Distributed Transactions in Practice

ACID across services is hard. Three patterns cover the practical space: 2PC for short cross-shard transactions, Saga for long cross-service flows, TCC as the middle ground. Picking the right one for the use case avoids the 'distributed transactions are unsolvable' despair.

The fundamental problem

You want operations across services to either all succeed or all fail. The network is unreliable. Services crash. Some operations have side effects (charge card, send email) that can't be 'rolled back' in the database sense. There's no perfect solution; only patterns matched to constraints.

2PC — the blocking pattern

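The coordinator asks every participant to 'prepare'. If all vote yes, it says 'commit'; if any votes no or times out, it says 'abort'. Participants hold locks between voting and hearing the decision. Works for: short transactions, bounded participants, within one trust domain. Doesn't work for: long transactions (lock contention), or a coordinator that crashes after participants have voted yes, which leaves them blocked holding locks until it recovers.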
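The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` flag, and `two_phase_commit` are hypothetical names, and a real implementation would also need durable logs, timeouts, and coordinator recovery.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant; a real one would log its vote durably."""
    name: str
    can_commit: bool = True   # assumption: stands in for "local work succeeded"
    state: str = "idle"       # idle -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Voting yes is a promise to commit later; locks are held from here on.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: ask everyone to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: broadcast the single decision to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

The gap between `prepare()` returning and `commit()` or `abort()` arriving is the window in which a prepared participant can do nothing but wait for the coordinator.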

Saga — long-running compensable steps

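Each step has a compensation: an action that semantically undoes it (a refund for a charge), not a database rollback. Step 1 succeeds, step 2 fails → run compensation for step 1; with more steps, compensate the completed ones in reverse order. Works for: long flows (checkout, onboarding) where steps cross service boundaries. Caveat: weaker isolation; intermediate states are visible to others until the saga finishes or compensates.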
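A minimal orchestrated saga might look like the sketch below. The `run_saga` function and the `(name, action, compensation)` step shape are assumptions made for illustration; a real orchestrator would persist its progress so it can resume after a crash, and would retry compensations that fail.

```python
from typing import Callable

# Hypothetical step shape: (name, action, compensation).
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    done: list[tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed {name}")
            # Undo what already happened, most recent step first.
            for prev_name, prev_compensate in reversed(done):
                prev_compensate()
                log.append(f"compensated {prev_name}")
            return False, log
        log.append(f"did {name}")
        done.append((name, compensate))
    return True, log
```

The failed step itself is not compensated here, on the assumption that a step which raised made no lasting change. Steps that can fail halfway need their own cleanup.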

TCC — Try, Confirm, Cancel

Each step: Try (reserve resources without committing), Confirm (commit once every Try has succeeded), Cancel (release the reservations if any Try failed). Like 2PC, but the holding happens at the resource level in application code rather than through database locks. Works for: inventory reservation, hotel-bookings-style workflows. Requires each service to model a 'reserved' state.
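The 'reserved' state can be made concrete with a small sketch. The `Inventory` class and `run_tcc` coordinator below are hypothetical, and they leave out things a real service would need, such as reservation expiry and durable storage.

```python
class Inventory:
    """Hypothetical service that models an explicit 'reserved' state."""

    def __init__(self, available: int) -> None:
        self.available = available
        self.reserved: dict[str, int] = {}  # tx_id -> quantity held

    def try_reserve(self, tx_id: str, qty: int) -> bool:
        if tx_id in self.reserved:
            return True  # retry of a Try that already succeeded
        if qty > self.available:
            return False
        self.available -= qty
        self.reserved[tx_id] = qty
        return True

    def confirm(self, tx_id: str) -> None:
        # Consume the hold; a repeated Confirm is a no-op.
        self.reserved.pop(tx_id, None)

    def cancel(self, tx_id: str) -> None:
        # Return held stock; a repeated or unmatched Cancel is a no-op.
        self.available += self.reserved.pop(tx_id, 0)


def run_tcc(tx_id: str, requests: list[tuple[Inventory, int]]) -> bool:
    tried: list[Inventory] = []
    for service, qty in requests:
        if not service.try_reserve(tx_id, qty):
            # One Try failed: release everything reserved so far.
            for t in tried:
                t.cancel(tx_id)
            return False
        tried.append(service)
    for service in tried:
        service.confirm(tx_id)
    return True
```

Keying reservations by `tx_id` is what makes Try, Confirm, and Cancel safe to retry in this sketch.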

Outbox + idempotency

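Whatever the pattern, every step must be idempotent, because retries happen. Also: write your state change and the outbound event in a single local transaction, then publish the event via a separate worker. The outbox pattern closes the gap between 'I committed locally' and 'the message reached the broker': a committed event is never lost, but the worker may publish it more than once, which is why consumers must be idempotent too.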
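Here is a minimal sketch of both halves, using Python's built-in `sqlite3` as a stand-in for the service's database. The table names, `place_order`, `relay`, and `Consumer` are all hypothetical, and the `publish` callback stands in for a real broker client.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
    """)
    return db


def place_order(db: sqlite3.Connection, order_id: str) -> None:
    # One local transaction: the state change and the event commit together
    # or roll back together.
    event = json.dumps({"type": "order_placed", "order_id": order_id})
    with db:
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))


def relay(db: sqlite3.Connection, publish) -> int:
    # Separate worker: publish pending rows, marking each only after publish
    # returns. A crash between the two steps causes a re-publish, not a loss.
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(row_id, json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


class Consumer:
    """Idempotent consumer: remembers event ids it has already handled."""

    def __init__(self) -> None:
        self.seen: set[int] = set()
        self.handled: list[str] = []

    def handle(self, event_id: int, event: dict) -> None:
        if event_id in self.seen:
            return  # duplicate delivery, ignore
        self.seen.add(event_id)
        self.handled.append(event["order_id"])
```

Calling `relay(db, consumer.handle)` after `place_order(db, "o-1")` delivers the event once; if the same event id is delivered again, the consumer ignores it. In a real system the consumer's `seen` set would need to live in durable storage.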

2PC for short cross-shard. Saga for long cross-service. TCC for reservable resources. Outbox + idempotency under everything.