Designing a Metrics Aggregation System

A metrics system ingests millions of points per second, lets you query over arbitrary time ranges with low latency, and doesn't cost millions to operate. The architecture: ingest pipeline + time-series store + pre-aggregation rollups + smart query planner.

Advertisement

Ingest pipeline

Agents (Prometheus, Telegraf, OpenTelemetry collectors) push samples to a write-side service. Write service deduplicates, batches, partitions by metric name, writes to time-series store. At 1M points/sec, this stage uses a queue (Kafka or in-process buffer) to absorb bursts.

Time-series store

Specialized for append-heavy + range-query workloads. Prometheus TSDB, InfluxDB, VictoriaMetrics, TimescaleDB. All use compressed columnar layouts (Gorilla, Snappy) and per-series indexes. Query for a metric over a time range is O(log N) to find blocks, then linear scan.

Advertisement

Pre-aggregation rollups

A 24-hour query on raw 1-second samples = 86,400 points per series, slow. Precompute 1-minute, 5-minute, 1-hour rollups (avg/sum/min/max/p99). Query planner picks the coarsest resolution that satisfies the requested range. Sub-second queries even over a year.

Downsampling for retention

Keep raw samples for 7 days, 1-minute rollups for 30 days, 1-hour rollups for 1 year, daily for forever. Storage drops 100x while keeping enough resolution for historical comparison. Configure per-metric — some you keep raw forever (billing).

Cardinality control

'request_count{user_id=alice}' with 10M users = 10M time series. This is what kills metric systems. Hard cap on label cardinality (e.g., max 10K series per metric); reject writes that would exceed. Aggregate high-cardinality dimensions before ingest.

Specialized TSDB + pre-agg rollups + downsample-on-age + cardinality limits. Each of the four is non-negotiable at scale.