'Why is my service slow?' is often a lock contention question. CPU is mostly idle while latency is high, because threads are waiting instead of working. With the right tools, finding the contended lock is straightforward; teams without them can spend days guessing.
Symptoms
CPU usage is low (around 30%, say) while latency is high. Throughput plateaus even as cores are added. A thread dump shows many threads in the BLOCKED or WAITING state. Tail latency is long (p99 far above p50), which is characteristic of requests queuing behind a lock.
Java: jstack + async-profiler
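jstack <pid> prints a snapshot of every thread's state and stack. If many threads are reported as "waiting to lock" the same monitor address, you have found your lock. async-profiler's lock profiling mode (-e lock) records the stack traces of contended lock acquisitions and aggregates wait time per call site; rendered as a flame graph, the contended lock stands out immediately.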
Go: mutex profile
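Enable it with runtime.SetMutexProfileFraction(rate). The profile is off by default (rate 0); once enabled, on average 1 in rate contention events is sampled. View it with go tool pprof, for example against the /debug/pprof/mutex endpoint that net/http/pprof registers. It ranks contended mutexes by wait time, attributing each sample to the call stack that held the lock. It is built into the runtime, so there is nothing to install; you only need to turn it on.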
Linux: perf lock
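Kernel-level lock observation. perf lock record followed by perf lock report summarizes each lock's acquisitions, contention count, and wait times in a table; on recent kernels, perf lock contention reports contended locks along with their wait times. Because it works at the kernel level, it needs no instrumentation and is language-agnostic. What it sees, however, are kernel locks: contention on user-space mutexes shows up only indirectly, when waiters fall through to futex system calls.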
Fixes by pattern
Split a coarse lock into finer-grained locks (per shard or per key). Use a read-write lock when reads dominate. Reserve lock-free data structures for the hottest paths, and use them carefully. For counters, keep local (for example, per-thread) counts and merge them periodically, accepting eventual consistency. Pick the fix that matches the traffic shape; don't rewrite for fun.