HTTP load-testing tools such as ab and wrk don't model stateful, long-lived bidirectional connections. k6 ships a WebSocket module (k6/ws) that does. Measure p99 message latency and maximum concurrent connections before launching any bidirectional service.

Why generic tools fail

ab/wrk: plain HTTP request/response, with no notion of a WebSocket session. WebSocket: one long-lived connection per user, many messages over it. Modeling 10K concurrent users sending 10 messages/sec each means 100K msg/sec across 10K open connections, so the tool has to hold one connection per VU instead of issuing independent requests.
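The sizing arithmetic above can be sketched as a small helper. This is only an illustration: the function name `aggregateLoad` and the 60-second test duration are assumptions, not part of k6 or the original setup.

```javascript
// Hypothetical helper: sizes a WebSocket load model from per-user numbers.
function aggregateLoad(users, msgsPerSecPerUser, durationSec) {
  const msgsPerSec = users * msgsPerSecPerUser;
  return {
    connections: users,                  // one held connection per user
    msgsPerSec: msgsPerSec,              // aggregate send rate
    totalMsgs: msgsPerSec * durationSec, // messages over the whole run
  };
}

// 10K users at 10 msg/sec each, over an assumed 60 s run.
console.log(aggregateLoad(10000, 10, 60));
```

The point of the model is that both numbers matter: the connection count stresses memory and file descriptors, while the message rate stresses CPU and the network path.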

k6 WebSocket script

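import ws from 'k6/ws';
import { check } from 'k6';

export const options = { vus: 1000, duration: '60s' };

export default function () {
  const url = 'wss://api.example.com/ws';
  // ws.connect blocks until the socket closes, so each VU holds one connection.
  const res = ws.connect(url, null, (socket) => {
    socket.on('open', () => {
      // Send one ping per second for the lifetime of the connection.
      socket.setInterval(() => {
        socket.send(JSON.stringify({ type: 'ping', ts: Date.now() }));
      }, 1000);
    });
    socket.on('message', (data) => {
      const msg = JSON.parse(data);
      check(msg, { 'has pong': (m) => m.type === 'pong' });
    });
    socket.on('error', (e) => console.error('ws error:', e.error()));
    // Close after 60 s so the iteration ends and the session is recorded.
    socket.setTimeout(() => socket.close(), 60000);
  });
  // A successful WebSocket upgrade returns HTTP 101 Switching Protocols.
  check(res, { 'status is 101': (r) => r && r.status === 101 });
}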

Key metrics

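ws_connecting: time to establish the connection (TLS + WebSocket handshake). ws_msgs_sent / ws_msgs_received: throughput. ws_session_duration: how long connections stayed up. Custom: app-level message round-trip time, recorded in a custom Trend metric from the message handler. check() only reports pass/fail rates, not timings.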
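A minimal sketch of the round-trip calculation, assuming the server echoes the ping's `ts` field back in its pong (the script above sends `ts`, but the echo is an assumption about the server). The helper name `roundTripMs` is hypothetical.

```javascript
// Hypothetical helper: round-trip time for a pong that echoes the ping's `ts`.
// Returns null when the message is not a usable pong.
function roundTripMs(rawMessage, nowMs) {
  let msg;
  try {
    msg = JSON.parse(rawMessage);
  } catch (err) {
    return null; // not JSON, nothing to measure
  }
  if (msg === null || typeof msg !== 'object') return null;
  if (msg.type !== 'pong' || typeof msg.ts !== 'number') return null;
  return nowMs - msg.ts; // both timestamps come from the load generator's clock
}

console.log(roundTripMs(JSON.stringify({ type: 'pong', ts: 1000 }), 1042));
```

In a k6 script the result would typically be fed to a `Trend` from `k6/metrics` inside the `message` handler, for example `rtt.add(roundTripMs(data, Date.now()))` after a null check, and a threshold such as `'p(99)<200'` on that metric can fail the run when p99 latency regresses. The metric name and the 200 ms budget are placeholders.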

OS limits

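Max file descriptors (one per open socket): ulimit -n 1000000 on the test box. Ephemeral port range: sysctl -w net.ipv4.ip_local_port_range="1024 65535". TIME_WAIT reuse: sysctl -w net.ipv4.tcp_tw_reuse=1. Without these, the load generator's defaults (often 1,024 file descriptors and roughly 28K ephemeral ports on Linux) are your ceiling, not the server's. Even with the widened range, one source IP can hold only about 64K connections to a single server IP and port. Going beyond that takes more source IPs or more load generators.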
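The port ceiling follows from simple arithmetic: every TCP connection needs a unique (source IP, source port, destination IP, destination port) tuple. The sketch below works that out; the function name and the four-source-IP case are illustrative assumptions.

```javascript
// Upper bound on outbound TCP connections from one load generator,
// given an inclusive ephemeral port range.
function maxOutboundConnections(portLow, portHigh, sourceIps, destEndpoints) {
  const portsPerPair = portHigh - portLow + 1;
  return portsPerPair * sourceIps * destEndpoints;
}

console.log(maxOutboundConnections(32768, 60999, 1, 1)); // 28232, the usual Linux default range
console.log(maxOutboundConnections(1024, 65535, 1, 1));  // 64512, the widened range
console.log(maxOutboundConnections(1024, 65535, 4, 1));  // 258048, with four source IPs (hypothetical)
```

The file descriptor limit applies on top of this bound, so the effective ceiling is whichever of the two is lower.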

Realistic load shape

Don't just ramp to N VUs and hold. Ramp slowly (e.g., 1K/min for 10 min) to expose connection-rate limits. Add bursts (a sudden 5x for 30s) to verify autoscaling. Mix message rates (most users idle, a few chatty) to mimic real traffic.
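One way to express that shape is as a list of ramp stages plus a per-user send interval. This is a sketch under stated assumptions: the helper names, the 5-minute hold, the 10-second burst ramp, and the 10% chatty split are all made up for illustration.

```javascript
// Hypothetical builder for a k6-style stages array: slow ramp, hold, burst, recover.
function buildStages(baseVus, burstFactor) {
  return [
    { duration: '10m', target: baseVus },               // linear ramp, baseVus / 10 per minute
    { duration: '5m', target: baseVus },                // steady state
    { duration: '10s', target: baseVus * burstFactor }, // sudden burst
    { duration: '30s', target: baseVus * burstFactor }, // hold the burst for 30 s
    { duration: '1m', target: baseVus },                // fall back to baseline
  ];
}

// Hypothetical per-user send interval: every tenth user is chatty (100 ms),
// the rest send once every 30 s.
function sendIntervalMs(vuId) {
  return vuId % 10 === 0 ? 100 : 30000;
}

console.log(buildStages(10000, 5));
console.log(sendIntervalMs(10), sendIntervalMs(3));
```

In a k6 script the array would go in `options.stages`, and the built-in `__VU` value could be passed as `vuId` to choose the interval given to `socket.setInterval`.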

Use k6 for WebSocket load. Bump OS limits on the load generator first. Measure connect time, message throughput, session duration, and message round-trip latency.