Profiling used to mean 'attach to a process locally and reproduce the bug'. Continuous profiling (always-on, low-overhead, aggregated across every instance) lets you ask 'what was the CPU doing during the spike yesterday?' from a UI. In 2025-2026 it became cheap enough to leave on permanently.
eBPF-based sampling
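An eBPF profiler attaches to perf_events, samples stack traces at roughly 100 Hz, and exports them in pprof format, with under 1% CPU overhead. Because sampling happens in the kernel rather than inside each runtime, one agent covers services written in different languages, with no per-language agents to deploy. Parca and Pyroscope both support an eBPF mode.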
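A sampling profiler never measures time directly. It counts how often each stack was on-CPU when the timer fired. At a fixed rate, each sample stands for about 1/hz seconds of CPU. The sketch below is a minimal illustration of that arithmetic. It assumes the ~100 Hz rate from the text and uses hypothetical stack strings. It is not code from Parca or Pyroscope.

```python
# Sketch: converting sampled stack counts into estimated CPU time.
# Assumption: a fixed sampling frequency (~100 Hz, as in the text), so each
# sample represents roughly 1/hz seconds of on-CPU time.
from collections import Counter

SAMPLE_HZ = 100  # assumed samples per second per CPU


def estimate_cpu_seconds(stack_counts: Counter, hz: int = SAMPLE_HZ) -> dict:
    """Map each sampled stack to its estimated on-CPU time in seconds."""
    return {stack: count / hz for stack, count in stack_counts.items()}


def samples_when_fully_busy(cpus: int, seconds: float, hz: int = SAMPLE_HZ) -> int:
    """Samples a host produces in the window if every CPU is busy throughout."""
    return int(cpus * seconds * hz)


if __name__ == "__main__":
    # Hypothetical stacks in "frame;frame;frame" form.
    counts = Counter({
        "main;handle_request;json_encode": 450,
        "main;handle_request;db_query": 150,
    })
    print(estimate_cpu_seconds(counts))   # json_encode ~4.5 s, db_query ~1.5 s
    print(samples_when_fully_busy(cpus=8, seconds=60))  # 48000
```

The same arithmetic explains the low overhead. A few hundred stack captures per second per CPU is very little work compared with instrumenting every function call.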
What you get
Flamegraphs are aggregated across all instances of a service and can be queried by time window: 'show me the CPU breakdown by function in the checkout service yesterday from 14:00 to 15:00'. You can also compare two time windows, such as before and after a deploy.
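Conceptually, aggregation across instances is just summing stack counts from every profile that falls in the requested window. The sketch below assumes a hypothetical `Profile` record (instance name, window start, stack counts). It emits the folded-stack text format ("frame;frame;frame count" per line) that common flamegraph renderers accept. Real backends store and query this differently.

```python
# Sketch: merge per-instance profiles for one time window into a single
# flamegraph input. `Profile` is a hypothetical record, not a real API.
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Profile:
    instance: str
    start: datetime
    stacks: Counter  # "frame;frame;frame" -> sample count


def merge_window(profiles, start: datetime, end: datetime) -> Counter:
    """Sum stack counts from every instance whose profile starts in [start, end)."""
    merged = Counter()
    for p in profiles:
        if start <= p.start < end:
            merged.update(p.stacks)  # Counter.update adds counts
    return merged


def to_folded(stacks: Counter) -> str:
    """Render merged counts as folded-stack lines, heaviest stack first."""
    return "\n".join(f"{stack} {count}" for stack, count in stacks.most_common())


if __name__ == "__main__":
    day = datetime(2025, 1, 1)
    profiles = [
        Profile("checkout-1", day.replace(hour=14, minute=5), Counter({"main;checkout;charge": 30})),
        Profile("checkout-2", day.replace(hour=14, minute=40),
                Counter({"main;checkout;charge": 20, "main;checkout;render": 5})),
        Profile("checkout-1", day.replace(hour=15, minute=10), Counter({"main;checkout;charge": 99})),
    ]
    merged = merge_window(profiles, day.replace(hour=14), day.replace(hour=15))
    print(to_folded(merged))
    # main;checkout;charge 50
    # main;checkout;render 5
```

The profile starting at 15:10 is excluded. The other two collapse into one flamegraph regardless of which instance produced them.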
Memory profiling too
Java: JFR allocation profiling. Go: heap profiles via pprof. Pyroscope ingests both. Memory leaks become visible as steady growth in retained (in-use) heap attributed to a specific allocation site.
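"Steady growth for a specific allocation site" can be turned into a simple check: fit a trend line to each site's in-use bytes across snapshots and flag the sites that keep climbing. The sketch below assumes you already have periodic snapshots mapping a site name to in-use bytes, such as values read from Go heap profiles. Site names and the threshold are hypothetical. It uses a least-squares slope rather than any detection method a specific tool ships.

```python
# Sketch: flag allocation sites whose retained heap trends upward.
# Assumption: snapshots are dicts {allocation_site: in_use_bytes}, taken at
# regular intervals; site names and thresholds are illustrative only.
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def suspected_leaks(snapshots, min_bytes_per_step: float) -> dict:
    """Return {site: slope} for sites whose in-use bytes grow at least
    min_bytes_per_step per snapshot on average. A trend fit tolerates the
    ups and downs that GC cycles add to individual snapshots."""
    sites = set().union(*snapshots)
    leaks = {}
    for site in sites:
        series = [snap.get(site, 0) for snap in snapshots]
        s = slope(series)
        if s >= min_bytes_per_step:
            leaks[site] = s
    return leaks


if __name__ == "__main__":
    snaps = [
        {"cache.put": 100, "parse": 50},
        {"cache.put": 200, "parse": 60},
        {"cache.put": 300, "parse": 40},
        {"cache.put": 400, "parse": 55},
    ]
    print(suspected_leaks(snaps, min_bytes_per_step=10))  # {'cache.put': 100.0}
```

"parse" fluctuates around a flat level, so its slope is slightly negative and it is not flagged.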
Cost
Storage is dominated by stack-trace cardinality, not request volume: samples with identical stacks collapse into one entry with a count, so more traffic mostly just increments counters. Even at 10K hosts, profile storage is small (~GB/day) compared to logs (~TB/day). The biggest cost is the engineer time lost by NOT looking at flamegraphs, because nobody knew they were available.
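The cardinality argument is easy to see in miniature. A profile keeps one entry per unique stack plus a count, as pprof does, so a hundredfold increase in samples over the same code paths adds no entries. The sketch below uses a hypothetical synthetic workload. It ignores pprof's actual encoding (string tables, location IDs) and only shows the counting.

```python
# Sketch: profile size tracks distinct stacks, not the number of samples.
# The workload is synthetic and hypothetical; real encodings add string
# tables and location IDs, but the counting principle is the same.
from collections import Counter


def build_profile(samples) -> Counter:
    """Aggregate raw stack samples into {stack: count}, one entry per unique stack."""
    return Counter(samples)


def synthetic_samples(num_samples: int, distinct_stacks: int):
    """Cycle through a fixed set of stacks to mimic a steady workload."""
    stacks = [f"main;handler;func_{i}" for i in range(distinct_stacks)]
    return (stacks[i % distinct_stacks] for i in range(num_samples))


small = build_profile(synthetic_samples(10_000, 40))
large = build_profile(synthetic_samples(1_000_000, 40))
print(len(small), len(large))   # 40 40: entry count stays flat
print(sum(large.values()))      # 1000000: only the counts grew
```

This is also why new code paths, not more requests, are what grow profile storage.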
Where it shines
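Diagnosing 'why is p99 latency 30% worse since Tuesday's deploy?': a flamegraph diff between the two windows often reveals the cause quickly. CPU regressions, lock contention, GC pressure, and unexpected I/O are all visible without instrumentation changes, provided you collect the matching profile type (CPU, off-CPU or mutex, allocation).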
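A before/after diff has to normalize first. The "after" window may simply have more traffic, and raw counts would then make every function look worse. The sketch below converts each profile to each leaf function's share of total samples, then ranks functions by how much that share grew. Inputs are assumed to be {folded stack: count} maps like the ones above, and function names are hypothetical. Flamegraph diff tools may weight or render this differently.

```python
# Sketch: rank functions by the change in their share of "self" samples
# between two windows. Assumption: profiles are {"a;b;leaf": count} and the
# leaf (last) frame is the function that was actually running.
from collections import Counter


def self_share(stacks: Counter) -> dict:
    """Fraction of all samples in which each function was the leaf frame."""
    total = sum(stacks.values())
    leaf = Counter()
    for stack, count in stacks.items():
        leaf[stack.split(";")[-1]] += count
    return {fn: c / total for fn, c in leaf.items()}


def diff(before: Counter, after: Counter, top: int = 5):
    """Return (function, share_after - share_before) pairs, largest growth first."""
    b, a = self_share(before), self_share(after)
    deltas = {fn: a.get(fn, 0.0) - b.get(fn, 0.0) for fn in set(a) | set(b)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    before = Counter({"main;handle;db_query": 60, "main;handle;json_encode": 40})
    after = Counter({"main;handle;db_query": 60, "main;handle;json_encode": 40,
                     "main;handle;regex_compile": 100})
    print(diff(before, after))
    # regex_compile appears with +0.5 at the top; the others shrink in share.
```

In this toy data, db_query and json_encode did the same absolute work in both windows. Normalization correctly points at the new regex_compile frame instead.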