HTTP load-testing tools such as ab and wrk do not model stateful, long-lived bidirectional connections. k6, through its WebSocket module (k6/ws), is built for this. Measure p99 message latency and the maximum number of concurrent connections before launching any bidirectional service.
Why generic tools fail
ab/wrk: short request/response cycles per virtual user (VU). WebSocket: one long-lived connection per user, with many messages over it. Modeling 10K concurrent users sending 10 messages/sec each means 100K msg/sec across 10K connections, so each VU must hold its connection open for the whole session instead of issuing independent requests.
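The arithmetic above can be written down as a small helper, which is handy when sizing a test plan. This is only a sketch: loadModel and its field names are hypothetical, not part of k6.

```javascript
// Hypothetical helper: derive aggregate load from a per-user model.
function loadModel(users, msgsPerSecPerUser) {
  return {
    connections: users,                          // one long-lived connection per user
    totalMsgsPerSec: users * msgsPerSecPerUser,  // aggregate send rate across all users
    sendIntervalMs: 1000 / msgsPerSecPerUser,    // timer period inside each VU
  };
}

// The example from the text: 10K users, 10 messages/sec each.
const model = loadModel(10000, 10);
console.log(model); // connections: 10000, totalMsgsPerSec: 100000, sendIntervalMs: 100
```

The sendIntervalMs value is what a per-VU send timer would use as its period.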
k6 WebSocket script
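import ws from 'k6/ws';
import { check } from 'k6';
export const options = { vus: 1000, duration: '60s' };
export default function () {
  // ws.connect blocks until the socket closes, so each VU holds one connection per iteration.
  const res = ws.connect('wss://api.example.com/ws', null, (socket) => {
    socket.on('open', () => {
      // Send one ping per second, stamped with the send time.
      socket.setInterval(() => {
        socket.send(JSON.stringify({ type: 'ping', ts: Date.now() }));
      }, 1000);
    });
    socket.on('message', (data) => {
      const msg = JSON.parse(data);
      check(msg, { 'has pong': (m) => m.type === 'pong' });
    });
    socket.on('error', (e) => console.error('ws error: ' + e.error()));
    // Close after 60 s so the iteration ends.
    socket.setTimeout(() => socket.close(), 60000);
  });
  // 101 Switching Protocols means the WebSocket upgrade succeeded.
  check(res, { 'status is 101': (r) => r && r.status === 101 });
}
Key metrics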
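ws_connecting: time to establish the connection (TLS + WebSocket handshake). ws_msgs_sent / ws_msgs_received: message throughput. ws_session_duration: how long connections stayed up. Custom: app-level message round-trip time, recorded in a Trend metric (k6/metrics) from the message handler. check() records only pass/fail rates, not latency.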
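A minimal sketch of the round-trip calculation follows. It assumes the server echoes the ping's ts field back in its pong, which the script does not guarantee. roundTripMs and the metric name ws_msg_rtt are hypothetical. The k6 wiring is shown as comments because it only runs inside k6.

```javascript
// Hypothetical helper: round-trip time from a pong that echoes the ping's `ts`.
// Returns null when the frame is not JSON, not a pong, or has no numeric ts.
function roundTripMs(raw, nowMs) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch (e) {
    return null; // non-JSON frame
  }
  if (msg === null || typeof msg !== 'object') return null;
  if (msg.type !== 'pong' || typeof msg.ts !== 'number') return null;
  return nowMs - msg.ts;
}

// Wiring inside a k6 script (assumed usage, not runnable outside k6):
//   import { Trend } from 'k6/metrics';
//   const rtt = new Trend('ws_msg_rtt', true); // true marks values as time
//   socket.on('message', (data) => {
//     const ms = roundTripMs(data, Date.now());
//     if (ms !== null) rtt.add(ms);
//   });
//   // Illustrative budget only: fail the run if p99 exceeds 200 ms.
//   // thresholds: { ws_msg_rtt: ['p(99)<200'] }
```

Because both timestamps come from the load generator's clock, this measures the full round trip and does not depend on clock sync with the server.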
OS limits
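File descriptors: each socket uses one, and the default soft limit is often only 1024, so raise it on the test box (ulimit -n 1000000). Ephemeral port range: net.ipv4.ip_local_port_range = 1024 65535 (the usual Linux default, 32768 60999, allows only about 28K outbound ports). TIME_WAIT reuse: net.ipv4.tcp_tw_reuse = 1. Without these, the load generator runs out of descriptors or ports long before the server is stressed. Even with them, one source IP can hold only about 64K connections to a single server IP and port, so that limit is the test box's ceiling, not the server's. Going past it takes more source IPs or more load generators.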
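The port ceiling is simple arithmetic, sketched below. Both helper names are hypothetical, and the 100K-connection target is only an illustration. The calculation assumes all connections go to one server IP and port.

```javascript
// Number of ports in an inclusive ephemeral range.
function portsInRange(low, high) {
  return high - low + 1;
}

// Source IPs needed to hold `connections` sockets to ONE server IP:port.
// Each (source IP, source port) pair can be used once per destination.
function sourceIpsNeeded(connections, low, high) {
  return Math.ceil(connections / portsInRange(low, high));
}

console.log(portsInRange(32768, 60999));            // 28232 (common Linux default range)
console.log(portsInRange(1024, 65535));             // 64512 (widened range)
console.log(sourceIpsNeeded(100000, 1024, 65535));  // 2
```

If the server listens on several IPs or ports, the per-destination budget applies to each one separately, so fewer source IPs may be enough.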
Realistic load shape
Don't just ramp to N VUs and hold. Ramp slowly (e.g., 1K VUs/min for 10 min) to expose connection rate limits. Add bursts (a sudden 5x for 30s) to verify autoscaling. Mix message rates (most connections idle, a few chatty) to mimic real users.
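One way to express such a shape is with k6's ramping-vus executor, sketched below. All numbers are illustrative and scaled down to a 1,000-VU baseline. The helper names, the scenario name ws_load, and the 1-in-10 chatty ratio are assumptions. In a real script the options object would be exported.

```javascript
// Slow ramp: `steps` one-minute stages, adding `perStep` VUs each time.
function slowRamp(perStep, steps) {
  const stages = [];
  for (let i = 1; i <= steps; i++) {
    stages.push({ duration: '1m', target: perStep * i });
  }
  return stages;
}

// Burst: climb quickly to factor x baseline, hold, then drop back.
function burst(baseline, factor, holdSeconds) {
  return [
    { duration: '5s', target: baseline * factor },              // fast climb
    { duration: holdSeconds + 's', target: baseline * factor }, // hold the peak
    { duration: '5s', target: baseline },                       // back to baseline
  ];
}

// Mixed behavior: roughly 1 in 10 VUs is chatty, the rest are mostly idle.
// vuId would be k6's __VU (1-based) inside a script.
function sendIntervalMs(vuId) {
  return vuId % 10 === 0 ? 100 : 30000;
}

const baseline = 1000;
const stages = [
  ...slowRamp(100, 10),                 // 100 VUs/min for 10 min
  { duration: '2m', target: baseline }, // steady state
  ...burst(baseline, 5, 30),            // 5x for 30 s
  { duration: '1m', target: 0 },        // ramp down
];

// In a k6 script: export const options = { ... }
const options = {
  scenarios: {
    ws_load: { executor: 'ramping-vus', startVUs: 0, stages: stages },
  },
};
```

Inside the VU function, sendIntervalMs(__VU) would replace the fixed 1000 ms interval so that most connections stay quiet while a few send at 10 messages/sec.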