Rate Limiting Strategies Compared

Rate limiting protects services from abuse and from themselves. The algorithm you pick affects bursting behavior, memory cost per client, and fairness — and the wrong choice causes either lost legitimate traffic or service overload.

Advertisement

Fixed window — simple, broken

Count requests in the current minute; reject when count > limit; reset every minute. Easy to implement but lets a client send 2× the limit by timing requests around the boundary (e.g., 100 at 12:59:59, 100 at 13:00:00). Use only for non-critical limits.

Sliding window log — accurate, expensive

Store every request timestamp; on each new request, count timestamps in the trailing window. Perfectly accurate but O(N) memory per client. Fine for low-RPS APIs; impossible at scale.

Advertisement

Token bucket — the production default

Each client has a bucket with capacity C and refill rate R/sec. Request consumes 1 token; reject if bucket empty. Allows controlled bursting up to C, then enforces sustained rate R. 2 numbers per client. This is what most production rate limiters use.

Token bucket implementation

import time

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.cap, self.rate = capacity, refill_per_sec
        self.tokens = capacity; self.last = time.monotonic()
    def take(self, n=1):
        now = time.monotonic()
        self.tokens = min(self.cap, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n; return True
        return False

Distributed rate limiting

Single-instance buckets are easy. Distributed across 100 API servers, you have options: local + sloppy (each server tracks its share, total may overshoot by 10%) — usually fine. Redis-backed (every request hits Redis) — accurate but adds 1-2 ms. Sliding window counter in Redis (Cloudflare's approach) — best of both. Pick based on whether 10% overshoot is acceptable.

Token bucket per client, local-first with sloppy distributed enforcement, Redis backing for high-value limits.