'Why is my service slow?' is often a lock contention question. CPU is mostly idle and latency is high because threads are waiting, not working. With the right tools, finding the contended lock is straightforward; teams without them can spend days guessing.

Symptoms

CPU usage is low (~30%) but latency is high. Throughput plateaus even as you add cores. A thread dump shows many threads in the BLOCKED or WAITING state. Tail latency is long (p99 much greater than p50), which is characteristic of requests queuing behind a lock.

Java: jstack + async-profiler

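jstack takes a snapshot of every thread's state. If many threads are BLOCKED waiting to lock the same monitor, you've found your lock; take a few dumps a few seconds apart to confirm it isn't a one-off. async-profiler in lock profiling mode (-e lock) goes further and aggregates lock wait time per call site, and its flame graph makes the contended lock obvious.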

Go: mutex profile

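Enable it with runtime.SetMutexProfileFraction, then load the mutex profile into go tool pprof (it is served at /debug/pprof/mutex when net/http/pprof is imported); it ranks contended mutexes by wait time. It's built into the runtime but off by default, so you just need to turn it on.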
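
A minimal sketch of enabling and capturing the mutex profile programmatically. runtime.SetMutexProfileFraction and pprof.Lookup("mutex") are standard library calls; the contend workload, the captureMutexProfile helper, the mutex.prof file name, and the sampling rate of 1 (record every contention event) are illustrative assumptions. A long-running service would more likely pick a larger rate to cut overhead and import net/http/pprof instead of writing a file.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// contend is a hypothetical workload: 8 goroutines fight over one mutex,
// holding it briefly each time, so the mutex profile has something to record.
func contend() {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				mu.Lock()
				time.Sleep(50 * time.Microsecond)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
}

// captureMutexProfile enables mutex profiling at the given rate (on average
// 1 in rate contention events is sampled), runs workload, and returns the
// profile in the gzipped protobuf format that `go tool pprof` reads.
// The previous rate is restored before returning.
func captureMutexProfile(rate int, workload func()) ([]byte, error) {
	prev := runtime.SetMutexProfileFraction(rate)
	defer runtime.SetMutexProfileFraction(prev)

	workload()

	var buf bytes.Buffer
	// debug=0 selects the binary (gzipped protobuf) output.
	if err := pprof.Lookup("mutex").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	prof, err := captureMutexProfile(1, contend)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := os.WriteFile("mutex.prof", prof, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("wrote mutex.prof (%d bytes)\n", len(prof))
}
```

Running go tool pprof -top mutex.prof afterwards should list the call sites with the most accumulated contention delay at the top.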

Linux: perf lock

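Kernel-level lock observation. perf lock records lock events, and perf lock report summarizes them per lock (acquisitions, contentions, and wait times). Because it works below any runtime it is language-agnostic; keep in mind that it observes locks from the kernel's side, so user-space mutex contention is visible only where it ends up waiting in the kernel (futex).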

Fixes by pattern

Split a coarse lock into finer ones (per shard, per key). Use a read-write lock for read-heavy workloads. Reserve lock-free data structures for the hottest paths, and build them carefully. For counters, keep eventually consistent local counts and sync them periodically. Pick the fix by traffic shape; don't rewrite for fun.

Profile lock wait time and the contended lock pops out. Then split it, switch to a read-write lock, or go lock-free, in that order.