A URL shortener is the classic interview problem, but at billions of links it becomes a real distributed-systems exercise. The interesting parts are the ID generation strategy, hot-key handling for viral links, and analytics ingestion that doesn't slow down the redirect path.
ID generation
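Three options. Hash-based: SHA-256 of the long URL → base62, keep the first 7 chars (62^7 ≈ 3.5 trillion possible IDs). Truncation means collisions do happen; resolve them at insert time by re-hashing with a salt or counter appended and retrying until a free ID turns up. Counter-based: monotonic counter → base62. No collisions, but URLs are predictable (anyone can enumerate them), and the counter is a central bottleneck unless servers lease blocks of IDs from it. Snowflake-like: 64-bit ID = timestamp + datacenter/worker + per-node sequence. Each node generates IDs independently with no central counter, at the cost of longer codes (a 63-bit value needs up to 11 base62 chars).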
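A minimal sketch of the three schemes in Python, assuming an in-memory dict stands in for the ID store. `hash_id`, `existing`, `snowflake`, and the attempt-counter salt are illustrative names, not from any library; the counter-based scheme is simply `base62(next_counter_value)`. The bit layout follows Twitter's original Snowflake (41-bit millisecond timestamp, 10-bit node ID, 12-bit sequence); other implementations split the bits differently.

```python
import hashlib

# 0-9, A-Z, a-z: 62 URL-safe characters
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62(n: int) -> str:
    """Encode a non-negative integer in base62, most significant digit first."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def hash_id(long_url: str, existing: dict[str, str], length: int = 7) -> str:
    """Hash-based ID: SHA-256 -> base62 -> first `length` chars.

    `existing` (hypothetical) maps short_id -> long_url. On a collision with a
    different URL, re-hash with an attempt counter appended as a salt and retry.
    """
    attempt = 0
    while True:
        data = long_url if attempt == 0 else f"{long_url}#{attempt}"
        digest = hashlib.sha256(data.encode("utf-8")).digest()
        candidate = base62(int.from_bytes(digest, "big"))[:length]
        owner = existing.get(candidate)
        if owner is None or owner == long_url:
            return candidate  # free, or already maps to this same URL
        attempt += 1


# Twitter's Snowflake epoch (2010-11-04); any fixed custom epoch works.
EPOCH_MS = 1_288_834_974_657


def snowflake(ts_ms: int, node_id: int, seq: int) -> int:
    """Pack 41-bit timestamp | 10-bit node (datacenter + worker) | 12-bit sequence."""
    if not (0 <= node_id < 1 << 10 and 0 <= seq < 1 << 12):
        raise ValueError("node_id or seq out of range")
    return ((ts_ms - EPOCH_MS) << 22) | (node_id << 12) | seq
```

For example, `base62(125)` is `"21"` (2*62 + 1). A Snowflake ID built from a present-day timestamp encodes to 11 base62 characters rather than 7, which is the length you pay for coordination-free generation. In a real store the "is it free?" check should be the insert itself (a conditional write), not a separate read.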
Storage
The read:write ratio is roughly 100:1, so optimize for lookups. Store the mapping in a KV store (DynamoDB, Cassandra, Redis Cluster) keyed by short_id. A single Postgres instance with a simple table and a primary-key index on short_id works up to roughly 1B rows; at ~100B rows you need sharding.
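DynamoDB, Cassandra, and Redis Cluster partition by key internally; if you shard Postgres yourself, something has to route each short_id to a shard. One common choice (an addition here, not prescribed above) is consistent hashing with virtual nodes, sketched below with hypothetical shard names `pg-0` to `pg-4` and a hypothetical `ShardRing` class.

```python
import bisect
import hashlib


def _hash64(key: str) -> int:
    # Stable across processes, unlike Python's salted built-in hash().
    return int.from_bytes(hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest(), "big")


class ShardRing:
    """Consistent-hash ring: each shard owns many small arcs (virtual nodes)."""

    def __init__(self, shards: list[str], vnodes: int = 100):
        points = sorted((_hash64(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._owners = [s for _, s in points]

    def shard_for(self, short_id: str) -> str:
        # First ring point clockwise from the key's hash, wrapping past the end.
        i = bisect.bisect(self._hashes, _hash64(short_id)) % len(self._hashes)
        return self._owners[i]
```

Growing from four shards to five moves only the keys that fall on the new shard's arcs (about a fifth of them), and every moved key lands on the new shard. Plain `hash(short_id) % N` would remap about 80% of keys for the same change.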
Hot-key handling
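A viral link (e.g., 50K req/sec to one short_id) overwhelms the single shard that owns that key; sharding spreads keys, not the traffic for one key. Mitigation: cache the redirect target in a CDN edge cache with a TTL of about 1 minute. Each edge location then fetches a hot key from origin roughly once per minute, so the DB sees a trickle of requests instead of 50K/sec and the CDN absorbs the storm. The trade-off: redirects answered at the edge never reach the origin, so click counts for those hits have to come from CDN logs.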
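A toy model of a single edge location, assuming an injected clock (`now`) and a hypothetical `origin` callable standing in for the redirect-store lookup. It illustrates why origin traffic scales with elapsed time divided by TTL rather than with request rate. In production the origin would also have to mark the 302 as cacheable, for example with `Cache-Control: public, s-maxage=60`, because HTTP caching rules do not treat a 302 as cacheable by default.

```python
from typing import Callable


class EdgeCache:
    """Toy model of one CDN edge caching short_id -> redirect target for `ttl` seconds."""

    def __init__(self, origin: Callable[[str], str], ttl: float = 60.0):
        self.origin = origin  # hypothetical: the redirect-store lookup
        self.ttl = ttl
        self._entries: dict[str, tuple[str, float]] = {}  # short_id -> (target, expires_at)
        self.origin_hits = 0

    def get(self, short_id: str, now: float) -> str:
        entry = self._entries.get(short_id)
        if entry is not None and now < entry[1]:
            return entry[0]  # served from the edge; origin untouched
        self.origin_hits += 1  # miss or expired entry: one trip to origin
        target = self.origin(short_id)
        self._entries[short_id] = (target, now + self.ttl)
        return target
```

Replaying 1,000 requests per second for two minutes against one hot key produces exactly two origin lookups. With many edge locations, each misses independently, so origin load is roughly (number of edges) / TTL; the tiered "origin shield" layer that major CDNs offer collapses those misses further.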
Analytics pipeline
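The redirect path must NOT block on analytics writes. Pattern: hand the click event to an asynchronous Kafka producer (a buffered write of about 1ms that never waits for a broker acknowledgment) and return the 302 immediately. A 302 rather than a 301 matters here too: browsers cache 301s, so repeat clicks would skip the service and go uncounted. A separate consumer ingests the Kafka topic into ClickHouse/BigQuery for aggregation. Real-time dashboards query the analytics store, not the redirect store.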
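A sketch of the non-blocking handoff, with a bounded in-process `queue.Queue` standing in for the Kafka producer's send buffer and a `Counter` standing in for ClickHouse/BigQuery. Both substitutions are assumptions for illustration, and `ClickPipeline` and `handle_redirect` are hypothetical names. The key property: `record` never waits, and when the buffer is full it drops the click rather than delay the redirect.

```python
import queue
import threading
import time
from collections import Counter
from typing import Callable, Optional


class ClickPipeline:
    def __init__(self, maxsize: int = 10_000):
        self.buffer: "queue.Queue[Optional[dict]]" = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # not thread-safe; a real service would use a metrics counter
        self.counts: Counter = Counter()  # stand-in for the analytics store
        self._worker: Optional[threading.Thread] = None

    def record(self, short_id: str) -> None:
        """Called on the redirect path; never blocks."""
        try:
            self.buffer.put_nowait({"short_id": short_id, "ts": time.time()})
        except queue.Full:
            self.dropped += 1  # shed analytics, never the redirect

    def _consume(self) -> None:
        # Separate consumer: drains events until it sees the None sentinel.
        while (event := self.buffer.get()) is not None:
            self.counts[event["short_id"]] += 1

    def start(self) -> None:
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def stop(self) -> None:
        self.buffer.put(None)  # sentinel; events queued before it are drained first
        if self._worker is not None:
            self._worker.join()


def handle_redirect(short_id: str, lookup: Callable[[str], str], clicks: ClickPipeline):
    target = lookup(short_id)
    clicks.record(short_id)  # fire-and-forget
    return 302, {"Location": target}
```

Dropping on overflow is a deliberate choice: losing a few clicks during a broker outage is cheaper than letting analytics back-pressure slow every redirect.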
Custom aliases
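A user wants tnyurl.co/my-link. Claim the alias with an atomic insert-if-absent (a unique constraint or conditional write) and reject it if taken; a separate read-then-write uniqueness check races with concurrent requests for the same alias. Reserve one namespace for system-generated IDs (e.g., exactly 7 base62 chars) and a separate one for custom aliases, and enforce the split at validation time, for example by rejecting any custom alias that matches the generated format, so a future generated ID can never collide with an existing alias.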
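A minimal sketch of both rules, using SQLite's PRIMARY KEY constraint as the atomic insert-if-absent (with DynamoDB, a conditional put on `attribute_not_exists(short_id)` plays the same role). The table name `links`, the 3 to 32 character alias rule, and the function names are assumptions for illustration.

```python
import re
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS links (short_id TEXT PRIMARY KEY, long_url TEXT NOT NULL)"

GENERATED_ID = re.compile(r"[0-9A-Za-z]{7}")  # system namespace: exactly 7 base62 chars
CUSTOM_ALIAS = re.compile(r"[0-9A-Za-z_-]{3,32}")  # hypothetical alias rule


def validate_alias(alias: str) -> bool:
    """Accept aliases that fit the alias rule but could never be a generated ID."""
    return CUSTOM_ALIAS.fullmatch(alias) is not None and GENERATED_ID.fullmatch(alias) is None


def reserve_alias(conn: sqlite3.Connection, alias: str, long_url: str) -> bool:
    """Atomically claim `alias`; returns False if it is already taken."""
    if not validate_alias(alias):
        raise ValueError(f"invalid custom alias: {alias!r}")
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO links (short_id, long_url) VALUES (?, ?)", (alias, long_url))
        return True
    except sqlite3.IntegrityError:
        return False  # PRIMARY KEY violation: the alias already exists
```

Because `my-link` contains a hyphen, it can never match the generated format, whereas `abc1234` is rejected as a custom alias even while it is still unused.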