When to Pick Cassandra — Belgavi.AI Lab

Cassandra is the right answer for a specific shape of problem: write-heavy, multi-region, time-series-friendly workloads that can tolerate eventual consistency. Outside that shape, it's usually the wrong answer. It gets misused because 'NoSQL scales' became a meme; the actual fit is much narrower.

The sweet spot

Time-series-like data with predictable access patterns. Writes >> reads. Need multi-DC deployment with active-active writes. Tolerate read-your-own-writes anomalies of milliseconds-to-seconds. Examples: IoT telemetry, ad impression logs, audit trails, time-series metrics.

The wrong fit

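Heavy ad-hoc queries (Cassandra has no joins, and its secondary indexes and aggregations are limited and expensive). Strong consistency required (you can read and write at QUORUM, but you pay latency, especially across regions). Small datasets (< 100 GB: Postgres is simpler). Frequently changing query patterns (adding a column is cheap, but the primary key is fixed when the table is created, so a new access pattern usually means a new table and a backfill).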
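The QUORUM trade-off above comes down to simple replica arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes a single datacenter with a fixed replication factor. A read is guaranteed to touch at least one replica that holds the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when any read replica set must share a node with any write replica set."""
    return write_acks + read_acks > replication_factor


rf = 3                                   # assumed replication factor
q = quorum(rf)
print(q)                                 # 2
print(read_overlaps_write(rf, q, q))     # True: QUORUM writes + QUORUM reads
print(read_overlaps_write(rf, 1, 1))     # False: ONE + ONE can return stale data
```

Waiting for two of three replicas instead of one is where the extra latency comes from; in a multi-region cluster some of those replicas may sit in another datacenter.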

Partition key design rules

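Pick a partition key that gives you roughly 10K-1M rows per partition. Too small → reads fan out across many partitions and per-partition overhead adds up. Too large → hot partitions, slow reads, heavier compaction and repair. Use clustering keys for in-partition ordering. Never let one partition exceed ~100 MB; for rows much larger than 100 bytes, this size limit is reached before the 1M-row limit.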

Operational reality

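Cassandra is unforgiving operationally: repair must complete on every node within gc_grace_seconds (10 days by default, which is why a weekly cadence is common), otherwise deleted data can reappear once tombstones are purged; the compaction strategy must match the access pattern (TWCS for time-series, LCS for read-heavy, STCS as the general-purpose default); and JVM tuning matters. Plan for an SRE who knows it, or use a managed service (DataStax Astra, ScyllaDB Cloud).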
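The weekly repair cadence is tied to a table setting, gc_grace_seconds, which defaults to 864000 seconds (10 days): tombstones older than that can be purged, and a replica that missed the delete and was never repaired can then bring the data back. The check below is a conservative sketch with a hypothetical function name; it assumes the worst case is a delete that lands just after a repair has passed over its token range, so it has to wait for the next full cycle to finish.

```python
DAY = 86_400
DEFAULT_GC_GRACE_SECONDS = 864_000   # Cassandra's default: 10 days


def repair_schedule_is_safe(repair_interval_s, repair_duration_s,
                            gc_grace_s=DEFAULT_GC_GRACE_SECONDS):
    """True if the next full repair finishes before a missed tombstone can be purged."""
    return repair_interval_s + repair_duration_s < gc_grace_s


print(repair_schedule_is_safe(7 * DAY, 1 * DAY))   # True: 8 days < 10 days
print(repair_schedule_is_safe(7 * DAY, 4 * DAY))   # False: 11 days > 10 days
```

If repairs on a large cluster take several days, either start them more often or raise gc_grace_seconds on the affected tables, at the cost of keeping tombstones around longer.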

ScyllaDB alternative

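ScyllaDB is a C++ reimplementation of Cassandra that speaks the same CQL API and has no JVM. The commonly cited figure is ~3x throughput, though the real gain depends on workload and hardware. If you'd choose Cassandra, evaluate ScyllaDB first: for most workloads it works as a drop-in replacement, but feature parity is not complete, so test your own queries before committing.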

Cassandra (or Scylla) for write-heavy, time-series, multi-region. Postgres for everything else under 1B rows.