Golden Signals Revisited — Belgavi.AI Lab

'Golden signals' became conventional wisdom; teams memorize the original four (latency, traffic, errors, saturation) without remembering why. In 2026, RED and USE have largely supplanted them as practical day-to-day frameworks. The underlying measurements are the same; what changes is the framing, and the framing matters.

The original four (Google SRE book)

Latency: how long requests take. Traffic: how much demand the service is receiving. Errors: the rate of failed requests. Saturation: how full the service is. The four were designed for request-serving services; they are not always a natural fit for stateful systems or batch jobs.

RED for services

Rate (requests/sec), Errors (errors/sec), Duration (latency). Three signals; remember three things. Most HTTP/gRPC instrumentation captures all three out of the box. This is the minimum-viable service dashboard.
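
As a rough illustration of how the three RED numbers fall out of the same raw request data, here is a minimal Python sketch. The `Request` record and the `red_summary` function are hypothetical names invented for this example, not part of any instrumentation library, and the nearest-rank p99 is just one of several reasonable ways to summarize duration.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record; real instrumentation would supply these fields.
    duration_s: float  # time taken to serve the request, in seconds
    ok: bool           # False if the request failed (for example HTTP 5xx)


def red_summary(requests, window_s):
    """Summarize one time window of requests as Rate, Errors, Duration."""
    n = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank p99: the smallest sample with at least 99% of samples
    # at or below it. Falls back to 0.0 for an empty window.
    p99 = durations[math.ceil(99 * n / 100) - 1] if n else 0.0
    return {
        "rate_per_s": n / window_s,
        "errors_per_s": errors / window_s,
        "p99_duration_s": p99,
    }
```

In practice a metrics backend would compute these from counters and histograms rather than from a list of raw requests; the sketch only shows that all three signals derive from the same two facts per request: how long it took and whether it succeeded.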

USE for resources

Utilization, Saturation, Errors. For CPUs, disks, and network interfaces. 'CPU at 90% utilization' is U. 'CPU run queue length' is S. 'CPU errors' is E (rare; cosmic rays). Together they let you reason about resource-bound systems.
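
The CPU example above can be sketched in Python as follows. The `CpuSample` fields and the `use_summary` function are assumptions made for illustration; in a real system these values would come from the operating system or a metrics agent. Saturation is modeled here as the number of runnable threads beyond the available cores, which is one common reading of 'run queue length', not the only one.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    # Hypothetical sample; field names are illustrative only.
    busy_s: float      # CPU-seconds spent doing work during the interval
    interval_s: float  # length of the sampling interval, in seconds
    cores: int         # number of cores available
    run_queue: int     # runnable threads, including those on a core
    errors: int        # hardware errors reported during the interval


def use_summary(sample):
    """Summarize one CPU sample as Utilization, Saturation, Errors."""
    capacity_s = sample.interval_s * sample.cores
    return {
        # U: fraction of available CPU time that was actually used.
        "utilization": sample.busy_s / capacity_s,
        # S: work that is waiting because every core is already busy.
        "saturation": max(0, sample.run_queue - sample.cores),
        # E: error events, usually zero for CPUs.
        "errors": sample.errors,
    }
```

The point of keeping U and S separate is that a CPU can sit at 90% utilization with nothing queued, or at the same utilization with a growing backlog; only the second case means work is being delayed.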

SLO-aligned signals

Pick a small number of user-impacting SLIs. Track them. Don't conflate 'CPU usage' (cause) with 'p99 latency' (user impact). SLO alerting is on the user-impact metric; dashboards include cause-side metrics for diagnosis.
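
To make the cause/impact split concrete, here is a minimal Python sketch of a latency SLI checked against an SLO target. The function names, the 300 ms threshold, and the 99.5% target are all assumptions chosen for the example; note that CPU usage does not appear anywhere in the alerting decision.

```python
def latency_sli(durations_s, threshold_s):
    """Fraction of requests served within threshold_s (a user-impact SLI)."""
    if not durations_s:
        return 1.0  # no traffic: treat the window as meeting the objective
    good = sum(1 for d in durations_s if d <= threshold_s)
    return good / len(durations_s)


def slo_breached(sli, target):
    """True if the measured SLI has fallen below the SLO target."""
    return sli < target


# Illustrative window: 990 fast requests and 10 slow ones.
window = [0.1] * 990 + [1.0] * 10
sli = latency_sli(window, threshold_s=0.3)  # hypothetical 300 ms threshold
alert = slo_breached(sli, target=0.995)     # hypothetical 99.5% target
```

Cause-side metrics such as CPU usage would sit on the same dashboard for diagnosis, but only `alert` would page someone.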

Common mistakes

Tracking every metric ever ('vanity dashboards'). No baseline understanding (alerts fire on absolute thresholds rather than on deviation from normal). No connection between the dashboard and any SLO. Burying the three or four key signals among 50 other charts.
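
The baseline point can be illustrated with a minimal Python sketch that flags a value by how far it sits from recent history instead of by a fixed number. The function name and the default threshold of three standard deviations are assumptions for the example; a z-score is only one simple way to express 'deviation from normal' and ignores things like daily seasonality.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, current, z_threshold=3.0):
    """True if current is more than z_threshold sample standard
    deviations away from the mean of history."""
    if len(history) < 2:
        return False  # not enough data to define a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_threshold
```

With a history of `[100, 102, 98, 101, 99]` requests per second, a reading of 103 is within normal variation while 110 is not, even though neither would trip an absolute threshold set at, say, 500.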

RED for services, USE for resources, SLO-aligned for user impact. Keep dashboards tight; the discipline is in what you don't show.