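A connection is 'half-open' when one side has died but the other doesn't know yet. TCP keepalive is off unless the application enables it, and its Linux defaults are too slow: about 2 hours of idle time before the first probe, then roughly 11 more minutes of unanswered probes. Application-level heartbeats are the most dependable way to detect dead peers in seconds, not hours, because they behave the same regardless of OS settings.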
The three layers
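Network layer: ICMP echo. It tests whether the host is reachable, not whether an established TCP connection is still alive, so it is useless here.
Transport layer: SO_KEEPALIVE. Disabled per socket by default; once enabled, Linux waits about 2 hours before the first probe. Tune with tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes.
Application layer: PING/PONG messages. Fastest detection and full control over interval and timeout.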
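To make the transport-layer numbers concrete, here is a small sketch of the worst-case detection time implied by the three Linux keepalive sysctls. The function and constant names are made up for illustration; the default values (7200 s, 75 s, 9 probes) are the stock Linux settings.

```javascript
'use strict';

// Stock Linux values for tcp_keepalive_time, tcp_keepalive_intvl (seconds)
// and tcp_keepalive_probes (count).
const LINUX_DEFAULTS = { time: 7200, intvl: 75, probes: 9 };

// Worst-case time before the kernel reports a silent peer as dead:
// the idle wait before the first probe, plus every probe going unanswered.
function keepaliveDetectionSeconds({ time, intvl, probes }) {
  return time + intvl * probes;
}

console.log(keepaliveDetectionSeconds(LINUX_DEFAULTS)); // 7875 (about 2 h 11 min)
console.log(keepaliveDetectionSeconds({ time: 30, intvl: 5, probes: 3 })); // 45
```

In Node.js, socket.setKeepAlive(true, initialDelay) enables SO_KEEPALIVE on a single socket and sets the idle delay in milliseconds, which avoids changing system-wide sysctls.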
Heartbeat interval rule of thumb
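Heartbeat interval = (max acceptable detection time) / 2. If you want to detect dead peers within 30 sec, send heartbeats every 15 sec and declare the peer dead after 2 consecutive misses. The second miss is only known once its reply is overdue, so give each heartbeat a reply timeout well under the interval if the 30 sec budget is strict; otherwise the worst case stretches toward 45 sec. Too frequent wastes bandwidth and battery; too sparse misses fast failures.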
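The rule of thumb can be sketched as a small planning function plus a miss counter. Both heartbeatPlan and MissCounter are hypothetical names, and generalizing the divisor from 2 to a configurable missedLimit is an assumption beyond what the rule states.

```javascript
'use strict';

// Derive the send interval from the detection budget.
// With the default missedLimit of 2 this is the "divide by 2" rule.
function heartbeatPlan(maxDetectionMs, missedLimit = 2) {
  if (!(maxDetectionMs > 0) || !Number.isInteger(missedLimit) || missedLimit < 1) {
    throw new RangeError('maxDetectionMs must be > 0 and missedLimit a positive integer');
  }
  return { intervalMs: maxDetectionMs / missedLimit, missedLimit };
}

// Counts unanswered heartbeats. Call tick() once per interval and
// onReply() whenever the peer answers.
class MissCounter {
  constructor(missedLimit) {
    this.missedLimit = missedLimit;
    this.missed = 0;
  }

  // Returns 'dead' once missedLimit heartbeats are outstanding; otherwise
  // records one more outstanding heartbeat and returns 'send'.
  tick() {
    if (this.missed >= this.missedLimit) return 'dead';
    this.missed += 1;
    return 'send';
  }

  onReply() {
    this.missed = 0;
  }
}

const plan = heartbeatPlan(30000);
console.log(plan.intervalMs, plan.missedLimit); // 15000 2
```

Keeping the counter separate from the timer makes the policy easy to unit-test without waiting on real clocks.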
WebSocket ping/pong
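// Node.js with the `ws` package; browser WebSockets do not expose ping() or 'pong' events.
const WebSocket = require('ws');
const ws = new WebSocket('wss://example.com');
let missed = 0; // pings sent since the last pong
const interval = setInterval(() => {
  if (ws.readyState !== WebSocket.OPEN) return;
  // Two pings unanswered: drop the socket. A close handshake would hang on a dead peer.
  if (missed >= 2) return ws.terminate();
  ws.ping();
  missed++;
}, 15000);
ws.on('pong', () => { missed = 0; });
ws.on('close', () => clearInterval(interval)); // stop the timer with the socket
gRPC keepalive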
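gRPC has built-in keepalive, implemented with HTTP/2 PING frames. Set the channel arguments GRPC_ARG_KEEPALIVE_TIME_MS=10000 and GRPC_ARG_KEEPALIVE_TIMEOUT_MS=5000 (string keys grpc.keepalive_time_ms and grpc.keepalive_timeout_ms). The server can refuse aggressive keepalive by closing the connection with a GOAWAY frame carrying ENHANCE_YOUR_CALM and the debug data too_many_pings, so coordinate values between client and server.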
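As a rough illustration, the two settings can be written as a channel options object keyed by grpc.keepalive_time_ms and grpc.keepalive_timeout_ms, the string keys behind the GRPC_ARG_* macros. The helper functions below are hypothetical, and serverMinPingIntervalMs is a value you would have to take from your own server configuration; gRPC core documents a 5-minute default for pings without data, which the 300000 in the example assumes.

```javascript
'use strict';

// Client channel options from the text.
const clientKeepalive = {
  'grpc.keepalive_time_ms': 10000,   // send a keepalive ping after 10 s without activity
  'grpc.keepalive_timeout_ms': 5000, // give up if the ping is not acknowledged within 5 s
};

// Rough upper bound on detecting a dead peer: idle wait plus ack timeout.
function detectionBudgetMs(options) {
  return options['grpc.keepalive_time_ms'] + options['grpc.keepalive_timeout_ms'];
}

// Hypothetical pre-flight check. serverMinPingIntervalMs is the minimum ping
// spacing the server tolerates (an input here, not something read from gRPC).
function pingsTooAggressive(options, serverMinPingIntervalMs) {
  return options['grpc.keepalive_time_ms'] < serverMinPingIntervalMs;
}

console.log(detectionBudgetMs(clientKeepalive));          // 15000
console.log(pingsTooAggressive(clientKeepalive, 300000)); // true
console.log(pingsTooAggressive(clientKeepalive, 10000));  // false
```

With @grpc/grpc-js, an options object of this shape is normally passed as the third argument when constructing a client; check your library version's list of supported channel options before relying on any key.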
Mobile-aware tuning
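Doze mode on Android defers network access and timers for backgrounded apps, so heartbeats stop firing on schedule. iOS suspends backgrounded apps and may tear down their sockets without notifying the app. For mobile, prefer push notifications (APNs/FCM) over keepalive for wake-up; use heartbeats only while the app is foregrounded.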
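One way to keep heartbeats foreground-only is a small controller that the platform's lifecycle hook drives. This is a sketch: ForegroundHeartbeat and setForeground are hypothetical names, and how you learn about foreground/background transitions depends on your framework and is not shown.

```javascript
'use strict';

// Runs a heartbeat only while the app is in the foreground.
// sendHeartbeat and the timer functions are injected so the class can be
// tested without real timers.
class ForegroundHeartbeat {
  constructor(sendHeartbeat, intervalMs, timers = {
    setInterval: (fn, ms) => setInterval(fn, ms),
    clearInterval: (handle) => clearInterval(handle),
  }) {
    this.sendHeartbeat = sendHeartbeat;
    this.intervalMs = intervalMs;
    this.timers = timers;
    this.handle = null;
  }

  get running() {
    return this.handle !== null;
  }

  // Call with true when the app comes to the foreground, false when it leaves.
  setForeground(isForeground) {
    if (isForeground && !this.running) {
      // Probe immediately: the socket may have died while backgrounded.
      this.sendHeartbeat();
      this.handle = this.timers.setInterval(() => this.sendHeartbeat(), this.intervalMs);
    } else if (!isForeground && this.running) {
      this.timers.clearInterval(this.handle);
      this.handle = null;
    }
  }
}
```

Sending one heartbeat immediately on return to the foreground means a connection that died in the background is noticed at the start of the first interval rather than after it.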