WebSocket connections are long-lived and stateful, which breaks the stateless load-balancing assumptions behind most web services. Scaling past a few thousand concurrent connections requires explicit architecture decisions.

Connection limits per server

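The often-cited 65K figure is not a cap on inbound connections. It is the number of ports available to one source IP for outbound connections to a single destination address and port, so it mainly affects proxies and load balancers sitting in front of the WebSocket servers. A listening process is bounded by file descriptors and memory instead, and with tuning it can hold 1M+ connections. Per machine: 100K-1M, depending on memory per connection (state, buffers). In practice, plan for ~10K-50K per node, then scale horizontally.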
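The sizing above can be sketched as back-of-the-envelope arithmetic. The figures used here (64 KiB per connection, 16 GiB of RAM, 50% headroom, one spare node) are illustrative assumptions rather than measurements, and the function names are hypothetical.

```python
import math

GIB = 1024 ** 3
KIB = 1024


def connections_per_node(ram_bytes: int, per_conn_bytes: int, headroom: float = 0.5) -> int:
    """Connections that fit in memory while keeping `headroom` (a fraction) of RAM free."""
    usable = ram_bytes * (1.0 - headroom)
    return int(usable // per_conn_bytes)


def nodes_needed(total_connections: int, per_node: int, spare_nodes: int = 1) -> int:
    """Nodes required for the target load, plus spares to absorb a node failure."""
    return math.ceil(total_connections / per_node) + spare_nodes


# Assumed inputs: 16 GiB node, 64 KiB of state and buffers per connection.
memory_bound = connections_per_node(16 * GIB, 64 * KIB)

# Cap at the conservative planning figure from the text (50K per node).
planned_per_node = min(memory_bound, 50_000)

# Fleet size for a hypothetical 1M concurrent connections.
fleet = nodes_needed(1_000_000, planned_per_node)
```

Measure the real per-connection memory under load before trusting numbers like these; buffer sizes and per-user state vary widely between applications.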

Sticky sessions or shared state

There are two options. With sticky routing, each connection stays on one server; if that server fails, the client reconnects and lands on another. With shared state (Redis pub/sub, NATS), any server can deliver messages to any connection. Sticky routing is simpler at small scale; shared state is required at large scale.
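One possible way to get sticky routing is to hash a stable client identifier onto the list of healthy servers. The sketch below uses rendezvous (highest-random-weight) hashing, which is a swapped-in technique chosen for illustration, not something the text prescribes. The node names, client IDs, and `pick_node` function are hypothetical.

```python
import hashlib


def _score(node: str, client_id: str) -> int:
    """Deterministic pseudo-random weight for a (node, client) pair."""
    digest = hashlib.sha256(f"{node}|{client_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(client_id: str, nodes: list[str]) -> str:
    """Route a client to the healthy node with the highest score for that client."""
    if not nodes:
        raise ValueError("no healthy nodes available")
    return max(nodes, key=lambda node: _score(node, client_id))


# The same client maps to the same node for as long as that node is healthy.
healthy = ["ws-1", "ws-2", "ws-3"]
first = pick_node("user-42", healthy)
again = pick_node("user-42", healthy)
```

When a node drops out of the list, only the clients that were on it are remapped on reconnect; everyone else keeps their server. That property is what keeps a single failure from reshuffling the whole fleet.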

Pub/sub fanout

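A server publishes an event such as 'user X has new message' to Redis or NATS. Every server holding a connection for user X receives it and forwards it over the WebSocket. This decouples publishers from the topology of active connections: the publisher never needs to know which server holds which connection.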
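The fanout flow can be sketched without any network at all. In the example below, `InMemoryBroker` is a hypothetical in-process stand-in for Redis pub/sub or NATS, `WsNode` stands in for one WebSocket server, and a plain callable stands in for a socket's send function. The `user:<id>` channel naming is also an assumption. None of this is a real client library API.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, str], None]  # (channel, payload)
Send = Callable[[str], None]          # stand-in for a WebSocket send


def channel_for(user_id: str) -> str:
    return f"user:{user_id}"


class InMemoryBroker:
    """Delivers each published message to every subscriber of the channel."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subs[channel].append(handler)

    def unsubscribe(self, channel: str, handler: Handler) -> None:
        self._subs[channel].remove(handler)

    def publish(self, channel: str, payload: str) -> int:
        handlers = list(self._subs.get(channel, []))
        for handler in handlers:
            handler(channel, payload)
        return len(handlers)  # number of subscribers reached


class WsNode:
    """One server: tracks its local sockets and subscribes per connected user."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self._sockets: dict[str, list[Send]] = {}

    def on_connect(self, user_id: str, send: Send) -> None:
        if user_id not in self._sockets:
            # First local socket for this user: start listening on their channel.
            self._sockets[user_id] = []
            self.broker.subscribe(channel_for(user_id), self._on_message)
        self._sockets[user_id].append(send)

    def on_disconnect(self, user_id: str, send: Send) -> None:
        self._sockets[user_id].remove(send)
        if not self._sockets[user_id]:
            # Last local socket gone: stop listening.
            del self._sockets[user_id]
            self.broker.unsubscribe(channel_for(user_id), self._on_message)

    def _on_message(self, channel: str, payload: str) -> None:
        user_id = channel.split(":", 1)[1]
        for send in self._sockets.get(user_id, []):
            send(payload)
```

A publisher on any node calls `broker.publish(channel_for("x"), "...")` and never learns which node holds the socket. With a real broker, the same shape applies, but delivery becomes asynchronous and messages published while a user has no live connection are typically dropped unless stored elsewhere.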

In short: plan for 10K-50K connections per node. Start with sticky sessions and add pub/sub fanout as you grow. Test the failover and reconnect logic.