Byzantine Fault Tolerance — When It Earns Its Cost

Byzantine Fault Tolerance (BFT) handles nodes that lie, not just nodes that crash. The cost is real: at least 3f+1 nodes to tolerate f Byzantine failures, plus more message rounds per decision. The benefit is real too, but only when your threat model genuinely includes lying nodes.

The Byzantine fault model

Crash failures: the node stops. Byzantine failures: the node sends arbitrary messages, whether from malice, corrupted memory, or bugs. To tolerate f Byzantine nodes you need at least 3f+1 nodes in total (versus 2f+1 for crash-only), with quorums of 2f+1 so that any two quorums overlap in at least one honest node. The extra node count is the floor cost.
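The sizing arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular library; the function names are hypothetical. It shows the cluster sizes for both fault models and why a 2f+1 quorum in a 3f+1 cluster forces any two quorums to share at least f+1 nodes, so at least one shared node is honest.

```python
def crash_cluster_size(f: int) -> int:
    """Minimum nodes to tolerate f crash failures with majority quorums."""
    return 2 * f + 1


def bft_cluster_size(f: int) -> int:
    """Minimum nodes to tolerate f Byzantine failures."""
    return 3 * f + 1


def bft_quorum(f: int) -> int:
    """Quorum size used by a 3f+1 BFT cluster."""
    return 2 * f + 1


def min_quorum_overlap(n: int, q: int) -> int:
    """Fewest nodes that two quorums of size q must share in a cluster of n."""
    return max(0, 2 * q - n)


def max_byzantine_faults(n: int) -> int:
    """Largest f a cluster of n nodes can tolerate (n >= 3f+1)."""
    return (n - 1) // 3


if __name__ == "__main__":
    for f in range(1, 4):
        n, q = bft_cluster_size(f), bft_quorum(f)
        overlap = min_quorum_overlap(n, q)
        # overlap is f+1, so even if f of the shared nodes lie, one is honest
        print(f"f={f}: crash n={crash_cluster_size(f)}, "
              f"bft n={n}, quorum={q}, overlap={overlap}")
```

Note that cluster sizes between the 3f+1 steps buy nothing: under this model, 5 or 6 nodes still tolerate only one Byzantine fault.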

PBFT and its descendants

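Practical Byzantine Fault Tolerance (Castro & Liskov, 1999) was the first BFT protocol efficient enough to be considered practical for real systems. It runs a three-phase protocol (pre-prepare, prepare, commit) and needs O(n²) messages per decision in the normal case, because replicas broadcast to each other in the prepare and commit phases. Newer variants such as HotStuff route votes through the leader and reduce this to O(n) per phase.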
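To make the difference between quadratic and linear concrete, here is a rough message-count sketch. It is a simplified model under stated assumptions: it counts only normal-case replica-to-replica messages, ignores client traffic, checkpoints, and view changes, and models a leader-relayed phase as one leader broadcast plus one vote back from each replica. The function names and the `phases` parameter are hypothetical, and the real phase count depends on the protocol variant.

```python
def pbft_normal_case_messages(n: int) -> int:
    """Approximate replica-to-replica messages for one PBFT decision."""
    if n < 4:
        raise ValueError("PBFT needs at least 4 replicas (3f+1 with f=1)")
    pre_prepare = n - 1            # primary -> every backup
    prepare = (n - 1) * (n - 1)    # each backup -> every other replica
    commit = n * (n - 1)           # every replica -> every other replica
    return pre_prepare + prepare + commit


def leader_relayed_messages(n: int, phases: int = 3) -> int:
    """Approximate messages when every phase goes through the leader.

    Assumption: each phase is one leader broadcast (n-1 messages) plus
    one vote from each non-leader replica (n-1 messages).
    """
    if n < 4:
        raise ValueError("BFT needs at least 4 replicas (3f+1 with f=1)")
    return phases * 2 * (n - 1)


if __name__ == "__main__":
    for n in (4, 16, 64, 100):
        print(f"n={n}: all-to-all={pbft_normal_case_messages(n)}, "
              f"leader-relayed={leader_relayed_messages(n)}")
```

At n=4 the two counts are close (24 versus 18 in this model), which is why the quadratic cost is tolerable for small clusters and becomes the bottleneck only as validator sets grow.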

Tendermint and HotStuff

Tendermint (Cosmos): leader-based BFT with rotating proposers and round timeouts. HotStuff (the basis of the Libra/Diem consensus protocol): linear message complexity, pipelined. These are the modern blockchain-targeted BFT protocols, and both have production deployments.
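Timeout-driven protocols in this family typically lengthen the timeout each time a round fails, so that a slow network eventually gets enough time to make progress. A minimal sketch of that idea follows, assuming a linear increase per round; the function name and the default values are illustrative and not taken from any real configuration.

```python
def round_timeout_ms(round_number: int,
                     base_ms: int = 3000,
                     delta_ms: int = 500) -> int:
    """Timeout for a given round, growing linearly with each failed round.

    base_ms and delta_ms are hypothetical defaults chosen for illustration.
    """
    if round_number < 0:
        raise ValueError("round_number must be non-negative")
    return base_ms + round_number * delta_ms


if __name__ == "__main__":
    # Round 0 uses the base timeout; each later round waits a bit longer.
    for r in range(4):
        print(f"round {r}: wait up to {round_timeout_ms(r)} ms for a proposal")
```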

Where BFT earns its cost

Public blockchains (untrusted participants). Cross-org consortium systems (members don't fully trust each other). Critical infrastructure where adversarial node compromise is in scope (defense, financial settlement). Rarely needed in a single-org internal system.

Where to skip

Inside one trust domain (your company's services): use Raft, not BFT. The cost-benefit doesn't work. If you're seeing 'BFT' in an internal architecture proposal, push back — usually it's cargo culting from the blockchain world.
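The rule of thumb from the last two sections can be written down as a small checklist function. This is only a sketch of the reasoning in this article, with hypothetical field names; a real decision would also weigh latency budgets, operational cost, and regulatory requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatModel:
    """Hypothetical summary of who runs the nodes and what can go wrong."""
    trust_domains: int               # number of independent organizations
    open_membership: bool            # can anyone join, as in a public chain?
    node_compromise_in_scope: bool   # is an adversarial node a modeled threat?


def recommend_consensus(tm: ThreatModel) -> str:
    """Return "bft" only when lying nodes are genuinely in the threat model."""
    if tm.trust_domains < 1:
        raise ValueError("trust_domains must be at least 1")
    if tm.open_membership or tm.trust_domains > 1 or tm.node_compromise_in_scope:
        return "bft"
    return "raft"


if __name__ == "__main__":
    internal = ThreatModel(trust_domains=1, open_membership=False,
                           node_compromise_in_scope=False)
    consortium = ThreatModel(trust_domains=5, open_membership=False,
                             node_compromise_in_scope=False)
    print("internal service:", recommend_consensus(internal))
    print("cross-org consortium:", recommend_consensus(consortium))
```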

BFT for genuinely Byzantine threat models: blockchains, cross-org consortiums. Raft elsewhere. Don't add BFT to single-trust-domain systems.