Load Balancing: The Traffic Cop of Distributed Systems

Published on: Feb 14, 2026 | Category: System Design

Load balancing is a core concept in system design, ensuring that no single server bears too much demand. By spreading the work evenly, load balancers improve application responsiveness and increase availability.

What is a Load Balancer?

A load balancer sits between clients and servers, accepting incoming traffic and distributing it across multiple backend servers according to a configured algorithm. It acts as the "traffic cop" in front of your servers, routing each client request to a server capable of fulfilling it in a way that maximizes speed and capacity utilization.

L4 vs. L7 Load Balancing

Layer 4 (Transport Layer): Decisions are based on network and transport layer data only: source and destination IP addresses and TCP or UDP ports. It doesn't inspect the application payload, so it cannot route on URLs or headers. Examples: HAProxy (in TCP mode), IPVS.
Layer 7 (Application Layer): Decisions are based on data inside the application message (HTTP headers, URLs, cookies). This allows for smarter routing but is more CPU-intensive, because the load balancer has to parse each request. Examples: NGINX, HAProxy (in HTTP mode), AWS ALB.
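
To make the Layer 7 case concrete, here is a minimal sketch of a routing decision that looks at the request path and a header. The pool names, the `X-Canary` header, and the `choose_pool` function are all hypothetical, chosen only for illustration; a Layer 4 balancer could not make this decision because it never sees the path or headers.

```python
# Hypothetical routing table: (path prefix, pool name), checked in order.
RULES = [
    ("/api/", "api-pool"),
    ("/static/", "static-pool"),
]


def choose_pool(path, headers, rules=RULES, default="web-pool"):
    """Pick a backend pool from application-level data (Layer 7)."""
    # Header-based rule (assumed for this sketch): opted-in clients go to
    # a canary pool. Header names are assumed to be already normalized.
    if headers.get("X-Canary") == "1":
        return "canary-pool"
    # Path-based rules: first matching prefix wins.
    for prefix, pool in rules:
        if path.startswith(prefix):
            return pool
    return default


print(choose_pool("/api/users", {}))
print(choose_pool("/index.html", {}))
```

Real Layer 7 balancers express the same idea in configuration rather than code, but the shape of the decision is similar.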

Common Algorithms

1. Round Robin

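Requests are distributed across the group of servers sequentially: the first request goes to the first server, the next to the second, and so on, wrapping back to the first after the last. It works best when servers have similar capabilities and requests cost roughly the same.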
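
A minimal, single-threaded sketch of Round Robin might look like the following. The class name `RoundRobinBalancer` and the server names are placeholders, not taken from any particular load balancer.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("need at least one server")
        # cycle() repeats the sequence forever: s1, s2, s3, s1, ...
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [lb.pick() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Note that the balancer keeps no information about how busy each server is, which is why uneven servers or uneven requests can leave one backend overloaded.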

2. Least Connections

A new request is sent to the server with the fewest current connections to clients. Useful when requests have varying processing times.
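
As a rough sketch, Least Connections only needs a counter per server. The `LeastConnectionsBalancer` class and its `acquire`/`release` methods are illustrative names; ties are broken here by insertion order, which is an arbitrary choice for the example.

```python
class LeastConnectionsBalancer:
    """Tracks active connections and picks the least busy server."""

    def __init__(self, servers):
        # Active connection count per server, all starting at zero.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes; never drop below zero.
        if self.active[server] > 0:
            self.active[server] -= 1


lb = LeastConnectionsBalancer(["app-1", "app-2"])
first = lb.acquire()    # app-1 (tie, first in order)
second = lb.acquire()   # app-2 (app-1 now has one connection)
lb.release(first)       # the request on app-1 completes
third = lb.acquire()    # app-1 again, since it is idle
print(first, second, third)  # app-1 app-2 app-1
```

Because a slow request keeps its server's counter high until `release` is called, new work naturally drifts toward servers that finish quickly.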

3. IP Hash

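A hash of the client's IP address determines which server receives the request. As long as the server pool does not change, a given client is always directed to the same server (session stickiness). Clients that share an IP address, for example behind a NAT or corporate proxy, all land on the same server, which can skew the distribution.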
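
One way to sketch IP Hash is to hash the address and take the result modulo the number of servers. The function name `pick_by_ip` is hypothetical, and SHA-256 is just one reasonable choice; it is used instead of Python's built-in `hash()` because string hashes from `hash()` vary between processes by default.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP to a server; the same IP always maps the same way."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and wrap it to the pool size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
a = pick_by_ip("203.0.113.7", servers)
b = pick_by_ip("203.0.113.7", servers)
print(a == b)  # True: repeated requests from one IP hit one server
```

The weakness of the modulo step is that changing `len(servers)` reshuffles most clients, which is the problem consistent hashing addresses.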

4. Consistent Hashing

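Consistent hashing maps both servers and request keys onto the same hash ring, and each key is served by the first server found clockwise from its position. It is crucial for distributed caching because it minimizes reorganization of the mapping when nodes are added or removed: only about 1/N of the keys move, where N is the number of nodes, instead of nearly all of them as with a plain hash-modulo scheme.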
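
The following is a small sketch of a hash ring with virtual nodes, under the assumption that spreading each server over many ring positions (100 here, an arbitrary value) gives a reasonably even split. `HashRing`, `ring_hash`, and the cache node names are illustrative, not a specific library's API.

```python
import bisect
import hashlib


def ring_hash(key):
    """Stable 64-bit position on the ring for a string key."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node occupies `vnodes` positions to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First position at or after the key's hash, wrapping past the end.
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get(k) for k in keys}
ring.add("cache-d")
after = {k: ring.get(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves ends up on the new node, and the rest stay where they were; with four nodes, roughly a quarter of the keys would be expected to move.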

Health Checks

A load balancer should only route traffic to "healthy" servers. It regularly checks the health of backend servers by attempting to connect or sending a specific request. If a server fails the check, it is removed from the pool until it recovers.
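
A health checker along these lines could be sketched as follows. The `fall` and `rise` thresholds (consecutive failures before removal, consecutive successes before re-adding) are an assumption added for the example, since flapping on a single failed probe is usually undesirable; the `Pool` class and the addresses are hypothetical.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Pool:
    def __init__(self, backends, check=tcp_check, fall=3, rise=2):
        self.check = check
        self.fall, self.rise = fall, rise  # assumed thresholds
        self.healthy = {b: True for b in backends}
        # Consecutive probe results that contradict the current state.
        self._streak = {b: 0 for b in backends}

    def run_checks(self):
        for backend, is_healthy in list(self.healthy.items()):
            ok = self.check(*backend)
            if ok == is_healthy:
                self._streak[backend] = 0
                continue
            self._streak[backend] += 1
            needed = self.fall if is_healthy else self.rise
            if self._streak[backend] >= needed:
                self.healthy[backend] = ok  # flip state
                self._streak[backend] = 0

    def available(self):
        return [b for b, ok in self.healthy.items() if ok]


# Demo with a fake probe so no network access is needed.
status = {("10.0.0.1", 80): True, ("10.0.0.2", 80): False}
pool = Pool(list(status), check=lambda host, port: status[(host, port)], fall=2, rise=1)
pool.run_checks()
pool.run_checks()
print(pool.available())  # [('10.0.0.1', 80)]
```

In a real deployment `run_checks` would run on a timer, and the routing algorithm would choose only from `available()`.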

Conclusion

Choosing the right load balancing strategy is critical for building scalable, reliable systems. While Round Robin is a great starting point, production systems often require more sophisticated approaches like Least Connections or resource-based routing.