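A connection is 'half-open' when one side has died but the other doesn't know yet. TCP keepalive defaults are too slow: on Linux the first probe goes out only after about 2 hours of idle time. Application-level heartbeats are the most dependable way to detect dead peers in seconds, not hours, because a reply also proves that the remote application is still responding, not just its TCP stack.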

The three layers

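Network layer: ICMP echo. It shows that the host is reachable, not that the process behind an established TCP connection is alive, so it is useless here.
Transport layer: SO_KEEPALIVE. By default on Linux the first probe is sent after about 2 hours of idle time. Tune it system-wide with tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes, or per socket with TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT.
Application layer: PING/PONG messages. Fastest detection, full control.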
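To put a number on the transport-layer delay: the worst-case keepalive detection time is the idle time before the first probe plus the probe interval multiplied by the number of unanswered probes. The sketch below works through that arithmetic with the stock Linux values (tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9). The function name and the "tuned" values are illustrative assumptions, not recommendations.

```javascript
// Worst-case seconds for TCP keepalive to declare an idle peer dead:
// wait `time` before the first probe, then allow `probes` probes spaced `intvl` apart.
function keepaliveDetectionSeconds({ time, intvl, probes }) {
  return time + intvl * probes;
}

// Stock Linux sysctl defaults.
const linuxDefaults = { time: 7200, intvl: 75, probes: 9 };
// Hypothetical tuned values, for comparison only.
const tuned = { time: 10, intvl: 5, probes: 3 };

console.log(keepaliveDetectionSeconds(linuxDefaults)); // 7875 (about 2 h 11 min)
console.log(keepaliveDetectionSeconds(tuned));         // 25
```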

Heartbeat interval rule of thumb

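Heartbeat interval = (max acceptable detection time) / 2. If you want to detect dead peers within 30 sec, send heartbeats every 15 sec and declare the peer dead after 2 missed. Treat the figure as approximate: a peer that dies just after answering is not caught until the pings that follow go unanswered, so the worst case can run up to one interval longer. Heartbeats that are too frequent waste bandwidth and battery; heartbeats that are too sparse miss fast failures.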
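The rule can be sketched independently of any transport. The code below is a minimal illustration, not a library API: `heartbeatIntervalMs`, `HeartbeatMonitor`, `sendPing`, and `onDead` are all hypothetical names, and the caller is assumed to invoke `tick()` once per interval and `pong()` whenever the peer answers.

```javascript
// Rule of thumb from the text: interval = max detection time / missed limit.
function heartbeatIntervalMs(maxDetectionMs, missedLimit = 2) {
  return Math.floor(maxDetectionMs / missedLimit);
}

// Transport-agnostic miss counter. sendPing and onDead are caller-supplied.
class HeartbeatMonitor {
  constructor({ missedLimit = 2, sendPing, onDead }) {
    this.missedLimit = missedLimit;
    this.sendPing = sendPing;
    this.onDead = onDead;
    this.missed = 0; // pings sent since the last pong
    this.dead = false;
  }

  // Call once per heartbeat interval.
  tick() {
    if (this.dead) return;
    if (this.missed >= this.missedLimit) {
      this.dead = true;
      this.onDead();
      return;
    }
    this.sendPing();
    this.missed++;
  }

  // Call whenever the peer answers a ping.
  pong() {
    this.missed = 0;
  }
}

// Usage: a 30 s detection target gives a 15 s interval.
const intervalMs = heartbeatIntervalMs(30000); // 15000
const monitor = new HeartbeatMonitor({
  sendPing: () => console.log('PING'),
  onDead: () => console.log('peer presumed dead'),
});
// setInterval(() => monitor.tick(), intervalMs); // and call monitor.pong() on each reply
```

Keeping the counter separate from the timer and the socket makes the miss logic easy to unit-test with simulated ticks.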

WebSocket ping/pong

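const WebSocket = require('ws'); // Node.js 'ws' package; browsers expose no ping() API
const ws = new WebSocket('wss://example.com');
let missed = 0; // pings sent since the last pong
const interval = setInterval(() => {
  if (ws.readyState !== WebSocket.OPEN) return;
  // Two unanswered pings: presume the peer dead and drop the socket at once,
  // because a dead peer will never complete a close handshake.
  if (missed >= 2) { ws.terminate(); return; }
  ws.ping();
  missed++;
}, 15000);
ws.on('pong', () => { missed = 0; });
ws.on('close', () => clearInterval(interval)); // stop the timer with the socket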
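The snippet above depends on protocol-level ping frames, which the browser WebSocket API does not let scripts send. Where the client is a browser, the same idea can be carried by ordinary messages. The following is a minimal sketch that assumes a JSON envelope whose `type` field is `ping` or `pong`. The message shape and the `handleHeartbeat` name are assumptions for illustration, not part of any standard.

```javascript
// Handles one incoming text message. Returns true when the message was a
// heartbeat (so it should not reach application logic), false otherwise.
// `send` is whatever function writes a string to the connection.
function handleHeartbeat(raw, send, onPong) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return false; // not JSON, so not one of our heartbeat messages
  }
  if (msg === null || typeof msg !== 'object') return false;
  if (msg.type === 'ping') {
    send(JSON.stringify({ type: 'pong', ts: msg.ts })); // echo the sender's timestamp
    return true;
  }
  if (msg.type === 'pong') {
    onPong(msg.ts);
    return true;
  }
  return false;
}

// Browser-side wiring (illustrative):
// socket.onmessage = (e) => {
//   const wasHeartbeat = handleHeartbeat(e.data, (s) => socket.send(s), () => { missed = 0; });
//   if (!wasHeartbeat) { /* pass e.data to the application */ }
// };
```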

gRPC keepalive

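gRPC has built-in keepalive, implemented with HTTP/2 PING frames. Configure it through channel arguments rather than environment variables: for example, set GRPC_ARG_KEEPALIVE_TIME_MS to 10000 and GRPC_ARG_KEEPALIVE_TIMEOUT_MS to 5000. The server can refuse aggressive keepalive: if pings arrive more often than it permits (server-side argument GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS), it closes the connection with a GOAWAY frame carrying ENHANCE_YOUR_CALM and the debug data "too_many_pings". Coordinate the values between client and server.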
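As a sketch of where those values go: the GRPC_ARG_* names are C-core macros whose string keys are `grpc.keepalive_time_ms` and `grpc.keepalive_timeout_ms`, and gRPC libraries generally accept these keys as channel options. The object below assumes a Node.js client. Passing it as the options argument of a client constructor is an assumption about your setup, and `MyServiceClient` and `credentials` are hypothetical placeholders.

```javascript
// Channel options mirroring the values in the text.
const keepaliveOptions = {
  'grpc.keepalive_time_ms': 10000,   // send a keepalive ping after 10 s without activity
  'grpc.keepalive_timeout_ms': 5000, // give up on the connection if no ack arrives within 5 s
};

// Hypothetical usage with a generated client (all names are placeholders):
// const client = new MyServiceClient('example.com:443', credentials, keepaliveOptions);
```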

Mobile-aware tuning

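Doze mode on Android defers network access and timers for backgrounded apps, so their heartbeats stop firing on schedule. iOS suspends backgrounded apps and can tear down their sockets without notifying the app. For mobile, prefer push notifications (APNs/FCM) over keepalive for wake-up; use heartbeats only while the app is foregrounded.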
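One way to keep heartbeats foreground-only is to gate the timer on the app's lifecycle state. The sketch below is written in JavaScript to match the earlier snippet, so it applies directly to web and hybrid apps; a native app would call the same logic from its own lifecycle callbacks. `createForegroundHeartbeat` and its parameters are hypothetical names.

```javascript
// Runs `tick` every `intervalMs`, but only while the app is in the foreground.
// `timers` is injectable so the logic can be tested without real timers.
function createForegroundHeartbeat(
  tick,
  intervalMs,
  timers = { set: (fn, ms) => setInterval(fn, ms), clear: (h) => clearInterval(h) }
) {
  let handle = null;
  return {
    // Call from the platform's foreground/background lifecycle callbacks.
    setForeground(isForeground) {
      if (isForeground && handle === null) {
        handle = timers.set(tick, intervalMs);
      } else if (!isForeground && handle !== null) {
        timers.clear(handle);
        handle = null;
      }
    },
    isRunning: () => handle !== null,
  };
}

// Web wiring (illustrative), using the Page Visibility API:
// const hb = createForegroundHeartbeat(() => monitor.tick(), 15000);
// document.addEventListener('visibilitychange', () =>
//   hb.setForeground(document.visibilityState === 'visible'));
```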

Application-layer PING/PONG every 15s, declare dead after 2 missed. Don't rely on TCP keepalive defaults — they're hours, not seconds.