WebSocket connections are long-lived and stateful, which breaks the stateless load-balancing assumptions behind most web services. Scaling past a few thousand concurrent connections requires explicit architecture decisions.
Connection limits per server
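Per OS process: the often-cited 65K port limit does not cap inbound connections. Each TCP connection is identified by the full (source IP, source port, destination IP, destination port) tuple, so one listening port can accept far more than 65K clients. The 65K ceiling applies to outbound connections from one source IP to one destination IP and port, for example a proxy or load balancer connecting to a single backend. The real per-process limits are file descriptors and memory; with tuning, 1M+ is achievable. Per machine: 100K-1M depending on memory per connection (state, buffers). Plan for ~10K-50K per node in practice, then scale horizontally.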
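The sizing reasoning above can be sketched as back-of-the-envelope arithmetic. This is a minimal sketch, not a benchmark: the function names, the 64 KiB per-connection figure, the 50% memory headroom, and the one spare node are all illustrative assumptions to be replaced with measured values.

```python
import math


def connections_per_node(ram_bytes: int, bytes_per_conn: int,
                         headroom: float = 0.5,
                         fd_limit: int = 1_000_000) -> int:
    """Estimate how many connections one node can hold.

    headroom is the fraction of RAM budgeted for connections (assumed value);
    fd_limit is the tuned file-descriptor ceiling (assumed value).
    """
    by_memory = int(ram_bytes * headroom // bytes_per_conn)
    return min(by_memory, fd_limit)


def nodes_needed(total_conns: int, per_node: int, spare_nodes: int = 1) -> int:
    """Nodes required for total_conns, plus spares to absorb a node failure."""
    return math.ceil(total_conns / per_node) + spare_nodes


# Assumed inputs: 16 GiB of RAM, 64 KiB of state and buffers per connection.
ceiling = connections_per_node(16 * 2**30, 64 * 2**10)  # 131072

# Planning at the conservative 50K per node for 1M total connections.
fleet = nodes_needed(1_000_000, 50_000)  # 21
```

The memory-derived ceiling is usually well above the planning figure; the gap leaves room for reconnect storms when a node fails and its clients land on the survivors.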
Sticky sessions or shared state
Two options: sticky routing (a client is pinned to one server; if that server fails, the client reconnects and is routed to another) or shared state (Redis pub/sub, NATS) so any server can deliver messages to any connection. Sticky routing is simpler at small scale; shared state is required at large scale.
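One possible way to implement sticky routing is to hash a stable client key onto the set of healthy servers. The sketch below uses rendezvous (highest-random-weight) hashing, which is a technique chosen here for illustration and not something the text prescribes; pick_server, the server names, and the client key format are hypothetical.

```python
import hashlib
from typing import Iterable


def pick_server(client_key: str, servers: Iterable[str]) -> str:
    """Return the server with the highest hash score for this client.

    The same key always maps to the same server while that server is in the
    list. Removing a server only remaps the clients that were pinned to it.
    """
    candidates = list(servers)
    if not candidates:
        raise ValueError("no healthy servers")

    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}|{client_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(candidates, key=score)


servers = ["ws-1", "ws-2", "ws-3"]  # hypothetical node names
home = pick_server("user-42", servers)

# Failover: the client reconnects and is routed among the surviving nodes.
survivors = [s for s in servers if s != home]
fallback = pick_server("user-42", survivors)
```

Because only the failed node's clients move, a single failure does not reshuffle connections that are still healthy.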
Pub/sub fanout
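A server publishes an event such as 'user X has new message' to Redis/NATS. Every server holding a connection for user X receives it and forwards it over the WebSocket. This decouples publishers from the active connection topology: a publisher never needs to know which server holds which connection.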
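The fanout flow can be sketched without any external dependency. In this minimal sketch, InMemoryBroker is a stand-in for Redis pub/sub or NATS and does not reproduce either client API; WsServer, the "user:<id>" channel naming, and the list used as a connection outbox are likewise hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Stand-in for Redis/NATS: delivers each message to every subscriber."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        """Deliver message to all subscribers; return how many received it."""
        handlers = list(self._subs.get(channel, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


class WsServer:
    """One WebSocket node. Each outbox list stands in for a live socket."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.connections: DefaultDict[str, List[List[str]]] = defaultdict(list)

    def attach(self, user_id: str) -> List[str]:
        """Register a new connection for user_id and return its outbox."""
        outbox: List[str] = []
        if not self.connections[user_id]:
            # First local connection for this user: subscribe once.
            self.broker.subscribe(
                f"user:{user_id}",
                lambda msg, u=user_id: self._forward(u, msg),
            )
        self.connections[user_id].append(outbox)
        return outbox

    def _forward(self, user_id: str, message: str) -> None:
        # Forward to every local connection held for this user.
        for outbox in self.connections[user_id]:
            outbox.append(message)


broker = InMemoryBroker()
node_a, node_b = WsServer("a", broker), WsServer("b", broker)
phone = node_a.attach("alice")    # alice's phone is connected to node a
laptop = node_b.attach("alice")   # alice's laptop is connected to node b
other = node_b.attach("bob")

# The publisher does not know or care which nodes hold alice's connections.
broker.publish("user:alice", "new message")
```

After the publish, both of alice's outboxes hold the message and bob's stays empty. The sketch omits unsubscribing on disconnect and delivery to users who are offline, both of which a real deployment would need to handle.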