OpenTelemetry Full Stack — Belgavi.AI Lab

OpenTelemetry (OTel) is the open standard replacing vendor-specific instrumentation (Datadog APM, New Relic agents, etc.). One SDK per language, one collector, any backend. The promise is real, but there are many moving parts. Here's how they fit together.

The three signals

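OTel covers traces (request-spanning timelines), metrics (numbers over time), and logs (timestamped text events). All three can carry the same context (trace_id, span_id), so you can pivot between them: from a latency spike on a dashboard to a slow trace, and from that trace to the logs written while it ran. This is the killer feature versus legacy stacks.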
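To make the shared context concrete, here is a minimal sketch that builds and parses a W3C Trace Context `traceparent` header (the format OTel propagators commonly use) and stamps the same IDs onto a log line. The helper names `new_traceparent`, `parse_traceparent`, and `log_line` are hypothetical and exist only for this illustration; a real SDK handles propagation for you.

```python
import re
import secrets

# traceparent layout: version - trace_id (32 hex) - span_id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled=True):
    """Create a header value for a brand-new trace (hypothetical helper)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Split a traceparent header into the IDs every signal can share."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    _version, trace_id, span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit = sampled flag
    }


def log_line(message, ctx):
    """Attach the trace context to a log message so it can be joined to a trace."""
    return f"{message} trace_id={ctx['trace_id']} span_id={ctx['span_id']}"


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(log_line("payment declined", ctx))
```

Searching logs for that `trace_id` in the backend then returns exactly the lines written during that request.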

Instrumentation: auto + manual

Auto-instrumentation: drop in a library and common frameworks (Flask, Django, Express, Spring) emit traces with no code changes. Manual: wrap your own business logic in a span, for example with tracer.start_as_current_span() in Python. Start with auto, then add manual spans for the 5% of code that matters most.
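A manual span around business logic might look like the sketch below. The function `price_order` and the attribute names are hypothetical. So that the example runs without the OTel packages installed, it takes the tracer as a parameter and uses a small stand-in (`RecordingTracer`) that mimics only the two calls used here: `start_as_current_span(name)` and `span.set_attribute(key, value)`.

```python
from contextlib import contextmanager


def price_order(tracer, items):
    """Compute an order total inside a manually created span.

    `tracer` is anything exposing start_as_current_span(name) as a context
    manager, such as the object returned by opentelemetry.trace.get_tracer().
    """
    with tracer.start_as_current_span("price_order") as span:
        total = sum(qty * unit_price for qty, unit_price in items)
        # Attributes make the span searchable in the backend.
        span.set_attribute("order.item_count", len(items))
        span.set_attribute("order.total", total)
        return total


class RecordingSpan:
    """Hypothetical stand-in for a span; stores attributes in a dict."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


class RecordingTracer:
    """Hypothetical stand-in for a tracer; keeps finished spans in a list."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_as_current_span(self, name):
        span = RecordingSpan(name)
        try:
            yield span
        finally:
            self.finished.append(span)  # span "ends" when the block exits


tracer = RecordingTracer()
total = price_order(tracer, [(2, 5.0), (1, 10.0)])
print(total, tracer.finished[0].attributes)
```

With the `opentelemetry-api` package installed, passing `opentelemetry.trace.get_tracer(__name__)` in place of the stand-in should leave `price_order` unchanged.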

The Collector

The Collector is a standalone binary that receives OTLP data from your apps, processes it (sampling, attribute editing, batching), and exports it to any backend. Run it as a sidecar per pod (low overhead, simple) or as a fleet of dedicated nodes (better aggregation, more complex to operate). Sidecars usually win below roughly 100 services.
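The receive, process, export flow can be pictured with a toy model. This is not how the real Collector is configured (that is done declaratively, not in Python); it is only a sketch of the stages, with spans represented as plain dicts and all function names hypothetical.

```python
def drop_attributes(spans, keys):
    """Attribute editing: strip sensitive keys before data leaves the cluster."""
    return [{k: v for k, v in span.items() if k not in keys} for span in spans]


def batch(spans, size):
    """Batching: group spans so the exporter makes fewer, larger requests."""
    return [spans[i:i + size] for i in range(0, len(spans), size)]


def run_pipeline(received, export, batch_size=2, drop_keys=("user.email",)):
    """Receive -> process -> export, in the order described in the text."""
    processed = drop_attributes(received, set(drop_keys))
    for chunk in batch(processed, batch_size):
        export(chunk)  # `export` stands in for any backend exporter


received = [
    {"name": "GET /cart", "user.email": "a@example.com"},
    {"name": "SELECT items", "user.email": "a@example.com"},
    {"name": "POST /checkout"},
]
exported = []
run_pipeline(received, exported.append)
print(len(exported), "batches exported")
```

Because the processing lives in the Collector rather than in each app, changing what gets dropped or where data goes does not require redeploying services.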

Backend choice

Backend	Best for
Jaeger	Open-source self-host, traces only
Tempo + Prometheus + Loki (Grafana stack)	Self-hosted full-stack
Datadog	Hosted, polished UI, expensive
Honeycomb	High-cardinality trace querying
SigNoz	OTel-native self-host

Sampling strategy

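Keeping 100% of traces is expensive: every span gets processed, exported, and stored. Head-based sampling decides at trace start, before the outcome is known. Tail-based sampling collects all spans and decides after the fact based on outcome (for example, keep 100% of errors and slow requests and 1% of normal traffic). Tail is more useful, but it requires the collector to buffer each trace until it completes, and every span of a trace has to reach the same collector instance.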
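The difference between the two decisions can be sketched in a few lines. This is an illustration under assumptions, not the Collector's actual algorithm: the `Span` type, the 500 ms "slow" threshold, and the function names are all hypothetical. The 1% baseline mirrors the figure above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str        # 32 hex characters
    duration_ms: float
    is_error: bool = False


def ratio_keep(trace_id, rate):
    """Deterministic keep/drop derived from the trace ID.

    Every service that sees the same trace ID reaches the same decision,
    so a trace is kept or dropped as a whole.
    """
    low_64_bits = int(trace_id[-16:], 16)
    return low_64_bits < int(rate * 2**64)


def head_sample(trace_id, rate=0.01):
    """Head-based: only the trace ID is known; the outcome is not."""
    return ratio_keep(trace_id, rate)


def tail_sample(spans, slow_ms=500.0, baseline_rate=0.01):
    """Tail-based: runs once all spans of one trace have been buffered."""
    if not spans:
        return False
    if any(span.is_error for span in spans):
        return True   # keep every trace that contains an error
    if max(span.duration_ms for span in spans) >= slow_ms:
        return True   # keep every slow trace
    return ratio_keep(spans[0].trace_id, baseline_rate)  # 1% of the rest


unlucky_id = "f" * 32  # falls outside the 1% bucket
print(head_sample(unlucky_id))                                # dropped up front
print(tail_sample([Span(unlucky_id, 20.0, is_error=True)]))   # kept: error
```

The same failing request that head-based sampling drops is kept by the tail-based rule, which is the practical argument for paying the buffering cost.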

Auto-instrument first → sidecar collector → tail sample → pick backend last. Because your apps emit standard OTLP, you can change backends later without re-instrumenting.