Rate limiting and quota tracking look similar but differ in time horizon. A rate limit caps short-term throughput, such as requests per second. A quota caps total usage over a long period, such as API calls per month. Both need distributed counters that survive node failures; the algorithms diverge from there.
Token bucket — short-term rate
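A bucket holds tokens up to a fixed capacity and refills at rate R. Each request consumes one token; a request that finds the bucket empty is rejected or delayed. The scheme tolerates bursts up to the bucket capacity while holding the long-run average to R. It can be implemented in Redis with a Lua script, so that refill and consume happen atomically, or in app memory, with each node's share of the rate rebalanced as nodes join and leave.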
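The refill-and-consume logic can be sketched in a few lines. This is a minimal single-process sketch, not a distributed implementation: the class name `TokenBucket`, the `cost` parameter, and the injectable `clock` are illustrative assumptions, and the refill is computed lazily on each call rather than by a background timer.

```python
import time


class TokenBucket:
    """Single-process token bucket (illustrative sketch, not thread-safe)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second (R)
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Lazy refill: add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=2.0` and `capacity=3`, a fresh bucket admits three back-to-back requests and rejects the fourth; one second later it has earned two more tokens. A shared version would keep `tokens` and `updated` in a store such as Redis and run the same arithmetic atomically there.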
Sliding window — fixed-period quota
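Track requests in a rolling time window that ends at the current moment. This is more accurate than fixed windows, which reset at boundaries such as the top of each minute and so let a client spend a full allowance just before the boundary and another just after it. Redis sorted sets with score=timestamp work well: add each request with ZADD, drop expired entries with ZREMRANGEBYSCORE, and count what remains with ZCARD.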
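A rolling window can be illustrated with an in-memory log of timestamps per key. This is a sketch under stated assumptions: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and a `deque` stands in for the Redis sorted set (appending corresponds to ZADD, trimming old entries to ZREMRANGEBYSCORE, and `len` to ZCARD).

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log limiter kept in process memory (illustrative sketch)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit              # max requests allowed inside the window
        self.window = window_seconds    # window length in seconds
        self.clock = clock              # injectable so tests can control time
        self.events = {}                # key -> deque of request timestamps

    def allow(self, key):
        """Return True and record the request if `key` is under its limit."""
        now = self.clock()
        log = self.events.setdefault(key, deque())
        cutoff = now - self.window
        # Drop timestamps that have fallen out of the rolling window.
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Because the window ends at the current time rather than at a clock boundary, a client that used its allowance at seconds 0 and 30 of a 60-second window is still refused at second 59 and only regains one slot at second 60. The cost is memory proportional to the number of requests in the window, which is worth weighing for month-long quotas.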
Distributed counters
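Quotas enforced across many app servers need shared state. Redis is the typical answer. CRDT counters suit multi-region deployments that can accept eventual consistency, at the cost of briefly overshooting the quota while replicas converge. A central coordinator is worth its added latency and single point of failure only for very strict quotas. Choose based on the SLA: how much counting error is tolerable versus how much latency each request can absorb.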
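For the CRDT option, the simplest counter is a grow-only counter (G-Counter). The sketch below is illustrative only: the class name `GCounter`, the node identifiers, and the merge-by-direct-call replication are assumptions, and a real deployment would ship the per-node counts between regions over the network.

```python
class GCounter:
    """Grow-only CRDT counter: one monotonically increasing slot per node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}    # node_id -> count observed for that node

    def increment(self, n=1):
        """Record n units of usage on the local node."""
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        """Total usage as currently known to this replica."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state by taking the per-node maximum."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Taking the per-node maximum makes `merge` commutative, associative, and idempotent, so replicas converge regardless of the order or repetition of merges. Between merges each replica sees only part of the total, which is why a quota checked against `value()` can be exceeded for a short time.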