Load testing often gets waved off with 'we'll just k6 it' and then executed poorly. A good load test reveals capacity, scaling characteristics, and failure modes; a bad one produces false confidence. The patterns that matter are about realism, not tooling.
Define what you're proving
Goal 1: 'can we serve 10K RPS at p99 < 200ms?' That's a capacity test. Goal 2: 'what breaks first when we push to 50K?' That's a stress test. Goal 3: 'are we faster after refactor X?' That's a regression test. Different goals → different tests.
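One way to keep the three goals from blurring together is to write each one down as an explicit pass/fail check before running anything. The sketch below assumes you already have a per-run summary (achieved RPS, p99, error rate) from your load tool. RunResult, the function names, and the 5% regression tolerance are hypothetical choices for illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    offered_rps: float   # load the generator tried to send
    achieved_rps: float  # requests per second actually served
    p99_ms: float
    error_rate: float    # fraction of failed requests, 0.0 to 1.0


def capacity_ok(r, target_rps, p99_budget_ms):
    """Capacity test: did we serve the target rate within the latency budget?"""
    return r.achieved_rps >= target_rps and r.p99_ms < p99_budget_ms


def first_breaking_step(steps, p99_budget_ms, max_error_rate):
    """Stress test: the first load step (ordered by increasing offered load)
    where latency or errors break the budget, or None if nothing broke."""
    for r in steps:
        if r.p99_ms >= p99_budget_ms or r.error_rate > max_error_rate:
            return r
    return None


def regressed(before, after, tolerance=0.05):
    """Regression test: is p99 at the same offered load worse by more than tolerance?"""
    return after.p99_ms > before.p99_ms * (1 + tolerance)
```

The stress check only tells you at what load the budget broke; finding which component broke first still means reading resource and dependency metrics for that step.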
Realistic traffic shape
Production traffic is bursty, not flat. It has read/write ratios, requests clustered into user sessions, and varying payload sizes. Replay production traffic if you can; synthesize realistic shapes if you can't. Flat-RPS tests miss the failures real users trigger.
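If you have to synthesize, a request schedule can encode all three properties at once. This is a minimal sketch under stated assumptions: session starts follow a time-varying Poisson process (generated by thinning) with a periodic burst, each session issues a short cluster of requests separated by think time, and payload sizes are log-normal. Every rate, ratio, and distribution parameter here is a hypothetical placeholder; fit them to your own production data.

```python
import random
from dataclasses import dataclass


@dataclass
class Request:
    t: float            # seconds since test start
    session_id: int
    kind: str           # "read" or "write"
    payload_bytes: int


def session_rate(t, base, burst_factor, burst_every_s, burst_len_s):
    """Session starts per second: the base rate, multiplied during periodic bursts."""
    in_burst = (t % burst_every_s) < burst_len_s
    return base * burst_factor if in_burst else base


def generate(duration_s, base_sessions_per_s=50.0, burst_factor=4.0,
             burst_every_s=60.0, burst_len_s=10.0, read_ratio=0.9,
             max_requests_per_session=8, mean_think_s=2.0, seed=0):
    if burst_factor < 1:
        raise ValueError("burst_factor must be >= 1")
    rng = random.Random(seed)
    peak = base_sessions_per_s * burst_factor
    requests, t, session_id = [], 0.0, 0
    while True:
        # Thinning: propose session starts at the peak rate and keep each with
        # probability rate(t) / peak, yielding a time-varying Poisson process.
        t += rng.expovariate(peak)
        if t >= duration_s:
            break
        rate = session_rate(t, base_sessions_per_s, burst_factor,
                            burst_every_s, burst_len_s)
        if rng.random() >= rate / peak:
            continue
        session_id += 1
        # A session is a short cluster of requests separated by think time.
        rt = t
        for _ in range(rng.randint(1, max_requests_per_session)):
            if rt >= duration_s:
                break
            kind = "read" if rng.random() < read_ratio else "write"
            # Log-normal sizes: median around 1 KB with a long tail of large bodies.
            size = int(rng.lognormvariate(7.0, 1.0))
            requests.append(Request(rt, session_id, kind, size))
            rt += rng.expovariate(1.0 / mean_think_s)
    requests.sort(key=lambda r: r.t)
    return requests
```

The resulting list is a schedule, not a load generator: you would still feed the time offsets into whatever tool actually sends the traffic.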
Measure the right things
RED metrics: rate, errors, duration. Plus: tail latency (p99 and p99.9, not the average), errors broken down by class, resource usage (CPU, memory, DB connections), and impact on downstream services. 'It handled the RPS' isn't enough.
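To see why averages mislead, here is a small summary function over raw samples. It is a sketch assuming each sample is a (latency_ms, status) pair, where status is an HTTP code or 0 for a client-side failure such as a timeout; that encoding and the error class names are hypothetical. Percentiles use the nearest-rank method. Resource usage and downstream impact come from your monitoring stack, not from the load generator, so they are out of scope here.

```python
import math
from collections import Counter


def percentile(sorted_vals, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not sorted_vals:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_vals) / 100)
    return sorted_vals[max(rank, 1) - 1]


def error_class(status):
    """Bucket one result; status 0 stands for a client-side failure (timeout, reset)."""
    if status == 0:
        return "client/timeout"
    if status == 429:
        return "429 throttled"
    if status >= 500:
        return "5xx"
    if status >= 400:
        return "4xx"
    return None  # success


def summarize(samples, window_s):
    """samples: list of (latency_ms, status) pairs collected over window_s seconds."""
    n = len(samples)
    lat = sorted(latency for latency, _ in samples)
    errors = Counter()
    for _, status in samples:
        cls = error_class(status)
        if cls is not None:
            errors[cls] += 1
    return {
        "rate_rps": n / window_s,
        "error_rate": sum(errors.values()) / n,
        "errors_by_class": dict(errors),
        "mean_ms": sum(lat) / n,
        "p50_ms": percentile(lat, 50),
        "p99_ms": percentile(lat, 99),
        "p99_9_ms": percentile(lat, 99.9),
    }
```

With 980 requests at 10 ms and 20 at 2000 ms, the mean is 49.8 ms, which looks healthy, while p99 is 2000 ms: one user in fifty waits two seconds.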
Test for hours, not minutes
Five-minute tests miss memory leaks, connection pool exhaustion, and cache eviction effects. Once the short tests pass, run for at least 30 minutes, and ideally several hours. The system either stabilizes (good) or degrades (bad); both are important to know.
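Telling 'stabilizes' from 'degrades' is easier with a rule written down in advance. The sketch below assumes per-minute samples of p99 latency and process memory (RSS) exported from your monitoring; it skips a warm-up period, compares early and late p99, and fits a least-squares slope to memory. The thresholds and names are hypothetical, and a linear trend is only a heuristic: garbage-collected runtimes with sawtooth memory may need longer runs before the slope means anything.

```python
from statistics import mean


def slope_per_hour(points):
    """Least-squares slope of (minute, value) points, expressed in units per hour."""
    xs = [x for x, _ in points]
    mx, my = mean(xs), mean(y for _, y in points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x in xs)
    return num / den * 60


def soak_verdict(p99_by_minute, rss_mb_by_minute, warmup_min=5,
                 max_p99_growth_pct=10.0, max_rss_mb_per_hour=50.0):
    """Each input is a list of (minute_since_start, value) samples."""
    # Ignore warm-up (cache fill, pool ramp-up) so it isn't mistaken for drift.
    p99 = [(m, v) for m, v in p99_by_minute if m >= warmup_min]
    rss = [(m, v) for m, v in rss_mb_by_minute if m >= warmup_min]
    # Compare the first and last five minutes of steady state.
    early = mean(v for _, v in p99[:5])
    late = mean(v for _, v in p99[-5:])
    p99_growth = (late - early) / early * 100
    rss_trend = slope_per_hour(rss)
    problems = []
    if p99_growth > max_p99_growth_pct:
        problems.append(f"p99 grew {p99_growth:.0f}% over the run")
    if rss_trend > max_rss_mb_per_hour:
        problems.append(f"memory climbing {rss_trend:.0f} MB/hour")
    return ("degrading" if problems else "stable"), problems
```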
Don't test in prod by accident
Load tests can DoS your own dependencies. Coordinate with downstream owners. Use synthetic accounts. Disable real-money side effects (no charges sent to actual payment processors). Have a rollback plan ready in case the test surfaces a problem.
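Rules like 'synthetic accounts only' and 'no real money' hold up better when the service enforces them than when testers remember them. Here is a minimal sketch of a fail-closed guard in front of a payment call. It assumes a hypothetical naming convention (synthetic account ids start with loadtest-) and a hypothetical is_load_test flag that your service derives from the request somehow; FakeProcessor and RealProcessor are stand-ins, not any real payment SDK.

```python
from dataclasses import dataclass

# Hypothetical convention: every synthetic load-test account id carries this prefix.
SYNTHETIC_PREFIX = "loadtest-"


class UnsafeLoadTest(Exception):
    """Raised when load-test traffic would touch a real customer account."""


@dataclass
class Charge:
    account_id: str
    amount_cents: int


class FakeProcessor:
    """Records charges in memory instead of moving money."""

    def __init__(self):
        self.charges = []

    def charge(self, c):
        self.charges.append(c)
        return "fake-ok"


class RealProcessor:
    """Stand-in for the real payment client; never chosen for synthetic accounts."""

    def charge(self, c):
        raise RuntimeError("would move real money")


def processor_for(charge, is_load_test, real, fake):
    synthetic = charge.account_id.startswith(SYNTHETIC_PREFIX)
    if is_load_test and not synthetic:
        # Fail closed: load traffic must only ever touch synthetic accounts.
        raise UnsafeLoadTest(f"load test hit real account {charge.account_id}")
    # Route synthetic accounts to the fake even if the load-test flag was lost upstream.
    return fake if synthetic else real
```

Checking both the flag and the account id means a dropped flag still cannot charge a synthetic account through the real processor, and a mislabeled real account stops the request instead of charging it.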