Jaeger is an open-source distributed tracing system originally developed at Uber. It collects spans from your services, stores them, and lets you visualize request flows across systems. For microservice architectures, traces are often the most actionable observability signal.
Trace anatomy
A trace is a tree of spans. Root span = the entry point (e.g., an incoming HTTP request). Child spans = internal operations (DB query, downstream API call). Each span has a start/end time, a status, and attributes (HTTP method, URL, error message). Trace context (trace ID plus parent span ID) propagates between services via HTTP headers, using the traceparent header from the W3C Trace Context standard.
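To make the propagation step concrete, here is a minimal sketch of parsing a W3C `traceparent` header value. The `TraceContext` class and `parse_traceparent` function are hypothetical names for illustration, and the strict four-field regex is a simplification that only handles the version 00 layout; in practice an OpenTelemetry SDK does this for you.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


@dataclass(frozen=True)
class TraceContext:
    """Hypothetical container for the fields carried by traceparent."""
    version: str
    trace_id: str   # shared by every span in the trace
    parent_id: str  # span ID of the caller's span
    sampled: bool   # lowest bit of the trace-flags byte


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec treats version ff and all-zero IDs as invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

A service that receives this header starts its own span with the same `trace_id` and uses `parent_id` as that span's parent, which is how the tree of spans is stitched together across processes.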
Sampling strategy
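Sampling 100% of traces usually produces too much data. Head-based: decide at trace start (e.g., sample 1% of GET, 10% of POST). Simple, predictable storage. Tail-based: collect all spans, decide after the trace completes: sample errors at 100%, slow requests at 100%, others at 1%. More useful, but more complex, because every span of a trace must be buffered in one place until the decision is made.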
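The two strategies can be sketched as plain decision functions. This is an illustration under stated assumptions, not Jaeger's or the OTel Collector's implementation: the `Span` class, the rate table, the 1000 ms slowness threshold, and the trick of deriving the decision from the trace ID (so every service reaches the same verdict) are all choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical minimal span record."""
    trace_id: str        # 32 hex characters
    duration_ms: float
    is_error: bool = False


# Assumed per-method head-sampling rates, taken from the example in the text.
HEAD_RATES = {"GET": 0.01, "POST": 0.10}
DEFAULT_RATE = 0.01


def _trace_fraction(trace_id: str) -> float:
    """Map the low 64 bits of the trace ID to a fraction between 0 and 1."""
    return (int(trace_id, 16) & (2**64 - 1)) / 2**64


def head_sample(trace_id: str, method: str) -> bool:
    """Head-based: decide at trace start, knowing only the ID and the method."""
    rate = HEAD_RATES.get(method.upper(), DEFAULT_RATE)
    return _trace_fraction(trace_id) < rate


def tail_sample(spans: List[Span], slow_ms: float = 1000.0,
                base_rate: float = 0.01) -> bool:
    """Tail-based: decide after all spans of one trace have been collected."""
    if not spans:
        return False
    if any(s.is_error for s in spans):
        return True  # keep every trace that contains an error
    if max(s.duration_ms for s in spans) >= slow_ms:
        return True  # keep every slow trace
    return _trace_fraction(spans[0].trace_id) < base_rate  # keep ~1% of the rest
```

Note that `tail_sample` needs the complete list of spans for a trace, which is exactly the buffering cost that makes tail-based sampling harder to operate.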
Storage
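Jaeger supports Cassandra, Elasticsearch, and ClickHouse backends. Retention is typically 7-30 days; older traces are archived or dropped. As a rough figure, at 1B spans/day with ES, expect ~50GB/day at default sampling. Actual volume depends heavily on span size, attribute count, and index settings, so measure your own traffic before planning capacity.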
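A back-of-the-envelope capacity estimate can be written down as a few lines of arithmetic. The function names are hypothetical, and the 500-byte average span size and 1% sample rate in the usage lines are placeholder assumptions, not measured Jaeger or Elasticsearch figures; substitute numbers observed in your own cluster.

```python
GB = 1_000_000_000  # decimal gigabytes


def daily_storage_gb(spans_per_day: float, sample_rate: float,
                     avg_bytes_per_span: float) -> float:
    """Stored volume per day = generated spans x fraction kept x bytes per span."""
    return spans_per_day * sample_rate * avg_bytes_per_span / GB


def retained_storage_gb(daily_gb: float, retention_days: int) -> float:
    """Steady-state footprint once the retention window is full."""
    return daily_gb * retention_days


def implied_bytes_per_span(daily_gb: float, stored_spans_per_day: float) -> float:
    """Work backwards from an observed daily volume to an average span size."""
    return daily_gb * GB / stored_spans_per_day


# Assumed inputs: 1B spans generated per day, 1% kept, 500 bytes per stored span.
daily = daily_storage_gb(1_000_000_000, 0.01, 500)
total = retained_storage_gb(daily, 14)
```

With these assumed inputs the estimate comes to about 5 GB/day, or about 70 GB over a 14-day window. Running `implied_bytes_per_span(50, 1_000_000_000)` on the figure quoted above gives 50 bytes per span if all 1B spans were stored, which suggests the 1B counts spans generated before sampling rather than spans written to disk.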
Root-cause workflow
User reports slow request →
Find their trace by user_id attribute →
Click into trace timeline →
Identify span with longest self-time →
Read span attributes for parameters →
Reproduce locally
Integration with OpenTelemetry
Modern setup: instrument services with OTel SDK, export to OTel Collector, Collector forwards to Jaeger. Decouples instrumentation from storage. Switch from Jaeger to Tempo or Honeycomb later without re-instrumenting.