Designing a Quota System — Belgavi.AI Lab

Rate limiting and quota tracking look similar but differ in time horizon. A rate limit caps traffic over short intervals (for example, requests per second). A quota caps usage over a long period (for example, API calls per month). Both need distributed counters that survive node failures; the algorithms diverge from there.


Token bucket — short-term rate

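A bucket holds tokens up to a fixed capacity and refills at rate R. Each request consumes a token; a request that finds the bucket empty is rejected or delayed. Bursts are tolerated up to the bucket capacity. It can be implemented in Redis with a server-side script, so that refill and consume happen atomically, or in app memory with rebalancing across servers.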
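The refill-and-consume logic can be sketched in a few lines. This is a minimal single-process illustration, not a Redis implementation; the class name `TokenBucket`, the `allow` method, and the optional `now` parameter (included so the behaviour can be checked without sleeping) are assumptions made for this example.

```python
import time


class TokenBucket:
    """In-memory token bucket (hypothetical names, single process only)."""

    def __init__(self, rate, capacity, now=None):
        self.rate = float(rate)          # tokens added per second (R)
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None, cost=1.0):
        """Return True and consume `cost` tokens if enough are available."""
        now = time.monotonic() if now is None else now
        # Refill lazily: credit the tokens earned since the last call,
        # never exceeding the bucket capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=2` and `capacity=3`, three back-to-back requests pass, the fourth is refused, and one more is admitted after half a second of refill. A shared version would need the same read-modify-write to run atomically, which is why a Redis script is the usual choice.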

Sliding window — fixed-period quota

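Track requests in a rolling time window that ends at the current moment. This is more accurate than fixed windows, which suffer edge effects at minute boundaries: a client can spend a full allowance at the end of one window and another at the start of the next. Redis sorted sets with score=timestamp work well: remove entries older than the window, then count what remains.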
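A sliding-window log can be sketched in memory with a deque of timestamps. The names `SlidingWindowLog` and `allow` are hypothetical, and the caller passes `now` explicitly to keep the example deterministic. In Redis, the same three steps would roughly correspond to `ZREMRANGEBYSCORE`, `ZCARD`, and `ZADD` on a sorted set, run together atomically.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding-window limiter (hypothetical names, one client)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit            # max requests allowed per window
        self.window = window_seconds  # window length in seconds
        self.events = deque()         # timestamps of admitted requests, oldest first

    def allow(self, now):
        """Admit the request at time `now` if the window has room."""
        # Drop timestamps that have aged out of the window (now - window, now].
        cutoff = now - self.window
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        # Count what remains and admit only if under the limit.
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False
```

The log stores one entry per admitted request, so memory grows with the limit. That is acceptable for modest limits; for very large monthly quotas a plain counter per period is the cheaper trade-off.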


Distributed counters

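Quotas enforced across many app servers need shared state. Redis is the typical answer. CRDTs suit multi-region deployments that can accept eventual consistency. A central coordinator is warranted only for very strict quotas. Choose based on the SLA: how much accuracy is required versus how much latency is acceptable.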
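As one illustration of the CRDT option, a grow-only counter (G-Counter) keeps a separate count per node and merges replicas by taking the per-node maximum. This is a minimal sketch under assumed names (`GCounter`, `increment`, `merge`, `value`), with replication transport left out entirely.

```python
class GCounter:
    """Grow-only counter CRDT (hypothetical names, no networking)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> usage recorded by that node

    def increment(self, n=1):
        # A node only ever advances its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        # Total usage is the sum over all known nodes.
        return sum(self.counts.values())

    def merge(self, other):
        # Per-node maximum makes merging commutative, associative, and
        # idempotent, so replicas converge regardless of delivery order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Between merges each region sees only part of the total, so a quota checked against `value()` can be overshot until replicas converge. That is the accuracy-for-latency trade described above.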

Token bucket for rate. Sliding window for quota. Redis for distribution; CRDTs for multi-region.