Two-Phase Commit (2PC) gets a bad rap as 'the blocking protocol'. Used as a general-purpose distributed transaction mechanism, that reputation is earned. Used narrowly, for short transactions with few participants and a robust coordinator, it remains the right tool for specific jobs.

The protocol

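Phase 1 (prepare): the coordinator asks every participant 'can you commit?'. Each participant durably writes a tentative log entry and replies yes or no. Phase 2 (commit/abort): if every participant voted yes, the coordinator tells them all to commit; participants apply the change and acknowledge. If any participant votes no or times out, the coordinator tells them all to abort.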
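
The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names invented for this example, the "log" is a plain list rather than durable storage, and a raised exception stands in for a timeout.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class Participant:
    """Hypothetical in-memory participant; a real one would persist its log."""
    name: str
    can_commit: bool = True
    log: list = field(default_factory=list)

    def prepare(self, txid: str) -> Vote:
        if not self.can_commit:
            return Vote.NO
        # Record the tentative entry before voting yes.
        self.log.append((txid, "prepared"))
        return Vote.YES

    def commit(self, txid: str) -> None:
        self.log.append((txid, "committed"))

    def abort(self, txid: str) -> None:
        self.log.append((txid, "aborted"))


def two_phase_commit(txid: str, participants: list) -> str:
    # Phase 1: collect votes. An exception is treated like a timeout (a "no").
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare(txid))
        except Exception:
            votes.append(Vote.NO)

    # Commit only if every participant voted yes.
    decision = "commit" if all(v is Vote.YES for v in votes) else "abort"

    # Phase 2: send the same decision to every participant.
    for p in participants:
        if decision == "commit":
            p.commit(txid)
        else:
            p.abort(txid)
    return decision
```

A real coordinator would also write its decision to stable storage before starting phase 2, so that it can resend the decision after a crash; the sketch leaves that out for brevity.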

The blocking problem

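If the coordinator crashes after phase 1 but before phase 2, participants that voted yes are stuck: they can neither commit nor abort on their own, so they hold their locks until the coordinator recovers. Other transactions that need those locks queue up behind them, and with many participants and a slow coordinator the stall can cascade.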
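
The reason a prepared participant cannot simply give up is easier to see as a decision table. The function below is a hedged sketch with made-up names (`on_coordinator_timeout`, and the state strings `"working"` and `"prepared"`); it only shows which outcomes are safe once the coordinator goes silent.

```python
from typing import Optional


def on_coordinator_timeout(state: str, decision: Optional[str]) -> str:
    """Return what a participant may safely do after losing the coordinator.

    state:    "working" (has not voted yet) or "prepared" (voted yes).
    decision: "commit" or "abort" if the outcome was learned, else None.
    """
    if decision is not None:
        # The outcome is known, so just apply it.
        return decision
    if state == "working":
        # No yes vote was sent, so the coordinator cannot have committed.
        return "abort"
    # Voted yes and heard nothing: the coordinator may have decided either
    # way, so the only safe choice is to keep the locks and wait.
    return "blocked"
```

For example, `on_coordinator_timeout("prepared", None)` returns `"blocked"`, which is the case the protocol's reputation comes from.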

Where 2PC still fits

2PC fits cross-shard transactions inside a single trust domain (e.g., CockroachDB, Spanner). It works best for short critical-section transactions (<100ms) with a bounded number of participants (2-5), and with the coordinator's state replicated through a consensus protocol such as Raft so that the coordinator itself stays highly available.

Saga as the alternative

Sagas handle long transactions that span services by using retry and compensation: each step has an inverse that undoes it if a later step fails. Checkout flows and payment processing commonly use them. Nothing blocks, but isolation is weaker because intermediate states are visible to other readers. That is the right shape when 2PC's locks would cause too much contention.
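
A saga's control flow is small enough to sketch directly. The `run_saga` function and the step tuple layout below are assumptions made for illustration, not an API from any particular framework; retries and persistence of saga state are deliberately left out.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensation). The compensation is the inverse
# of the action and runs only if a later step fails.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Roll back everything that already succeeded, newest first.
            for _done_name, undo in reversed(completed):
                undo()
            return False
        completed.append((name, compensate))
    return True
```

In a checkout flow the steps might be "reserve inventory" (inverse: release it) and "charge card" (inverse: refund). Between the reservation and a failed charge, other readers can see the reserved stock, which is the weaker isolation mentioned above.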

Per-message idempotency

Whatever protocol you pick, make participants idempotent. Coordinators crash, retries happen, and networks duplicate messages. An operation that's safe to repeat survives all of this; an operation that isn't requires careful 'have I done this?' tracking.
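
One common way to do that tracking is to key every request by a message ID and remember the result. The sketch below assumes the sender attaches a unique `message_id` to each logical operation; the `IdempotentAccount` class is hypothetical and keeps everything in memory.

```python
class IdempotentAccount:
    """Hypothetical participant that ignores duplicate deliveries."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        # message_id -> balance returned when the message was first applied
        self._results: dict = {}

    def credit(self, message_id: str, amount: int) -> int:
        if message_id in self._results:
            # Duplicate or retry: return the original result, change nothing.
            return self._results[message_id]
        self.balance += amount
        self._results[message_id] = self.balance
        return self.balance
```

Calling `credit("m1", 50)` twice on an account that starts at 100 leaves the balance at 150 both times. In a real participant the balance update and the dedup record would need to be written in the same local transaction, otherwise a crash between the two writes reopens the problem.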

Use 2PC for short cross-shard transactions inside one system and sagas for cross-service flows. Either way, keep participants idempotent.