Context
A load balancer is a component that distributes traffic evenly among servers. Each service has a different workload and usage pattern.
Algorithms
Round Robin
This algorithm sends requests to each server in sequence and distributes the load evenly. It’s straightforward to set up and understand, but because it ignores server load, slow requests can still pile up and overload a server.
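A minimal sketch of the idea (the server names are illustrative, not from any real deployment):

```python
from itertools import cycle

# Hypothetical server pool; names are placeholders.
servers = ["server-a", "server-b", "server-c"]
pool = cycle(servers)

def route(request):
    """Return the next server in sequence, regardless of its load."""
    return next(pool)

# Six requests are spread evenly across the three servers.
assigned = [route(i) for i in range(6)]
print(assigned)
```

Note that `route` never inspects the request or the server, which is exactly why a slow backend keeps receiving its share of traffic.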
Weighted Round Robin
- The load balancer assigns a weight to each server based on its capacity.
- It then forwards requests in proportion to server weight; the higher the weight, the more requests the server receives.
Think of weighted round robin as an extension of the round robin algorithm: servers with higher capacity receive proportionally more requests, still in sequential order.
This approach offers better performance. Yet scaling requires manual updates to server weights, which increases operational costs.
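One simple way to sketch this is to expand the pool so each server appears once per unit of weight, then cycle through it like plain round robin. The weights here are made-up values for illustration:

```python
import itertools

# Hypothetical weights representing each server's capacity.
weights = {"server-a": 3, "server-b": 1}

# Expand the pool so each server appears once per unit of weight,
# then cycle through it just like plain round robin.
schedule = itertools.cycle(
    [name for name, w in weights.items() for _ in range(w)]
)

def route(request):
    return next(schedule)

# server-a handles 3 of every 4 requests.
assigned = [route(i) for i in range(8)]
print(assigned)
```

Scaling pain shows up right here: adding a server or resizing one means editing `weights` by hand and rebuilding the schedule.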
Least Response Time
- The load balancer monitors the response time of the servers.
- It then forwards the request to the server with the fastest response time.
- If two servers have the same latency, the server with the fewest connections gets the request.
This approach has the lowest latency, yet server monitoring adds overhead. Besides, latency spikes might cause wrong routing decisions.
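The selection rule above, including the tie-break on connection count, can be sketched as a single comparison. The response times and connection counts are hypothetical snapshot values:

```python
# Hypothetical moving-average response times (seconds) per server.
response_time = {"server-a": 0.12, "server-b": 0.05, "server-c": 0.05}
# Hypothetical active-connection counts, used only to break ties.
active_connections = {"server-a": 4, "server-b": 7, "server-c": 2}

def route(request):
    # Pick the fastest server; break latency ties by fewest connections.
    return min(
        response_time,
        key=lambda s: (response_time[s], active_connections[s]),
    )

# server-b and server-c tie at 0.05 s, but server-c has fewer connections.
print(route("req-1"))
```

The tuple key makes the tie-break explicit: latency is compared first, connections only when latencies are equal.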
Adaptive
- An agent runs on each server, which sends the server status to the load balancer in real-time.
- The load balancer then routes the requests based on server metrics, such as CPU and memory usage.
Put simply, servers with a lower load receive more requests, which improves fault tolerance. Yet it’s complex to set up, and the agent adds extra overhead.
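A toy version of the routing side, assuming the agents have already pushed their metrics. The metric values and the scoring formula are assumptions; a real balancer would tune the weighting:

```python
# Hypothetical metrics reported by an agent on each server.
metrics = {
    "server-a": {"cpu": 0.85, "mem": 0.60},
    "server-b": {"cpu": 0.30, "mem": 0.40},
}

def load_score(m):
    # A simple combined score; real systems tune these weights.
    return 0.5 * m["cpu"] + 0.5 * m["mem"]

def route(request):
    # Send the request to the least-loaded server.
    return min(metrics, key=lambda s: load_score(metrics[s]))

print(route("req-1"))
```

The complexity the text mentions lives outside this snippet: the agents, the reporting channel, and keeping `metrics` fresh are all extra moving parts.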
Least Connections
- The load balancer tracks the active connections to the server.
- It then routes requests to the server with the fewest active connections.
This ensures a server does not get overloaded during peak traffic. Yet tracking the number of active connections adds complexity. Also, session affinity needs extra logic.
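The connection tracking can be sketched with a min-heap keyed on connection count. The initial counts are illustrative, and this toy never decrements a count when a connection closes, which a real balancer must do:

```python
import heapq

# (active_connections, server) pairs; the counts are made up.
heap = [(2, "server-a"), (5, "server-b"), (1, "server-c")]
heapq.heapify(heap)

def route(request):
    # Pop the least-connected server, assign the request, push it back
    # with its count incremented.
    conns, server = heapq.heappop(heap)
    heapq.heappush(heap, (conns + 1, server))
    return server

# server-c (1 connection) goes first, then server-a, then server-c again.
assigned = [route(i) for i in range(3)]
print(assigned)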
IP Hash
- The load balancer uses a hash function to convert the client’s IP address into a number.
- It then finds the server using the number.
This approach avoids the need for external storage to support sticky sessions.
Yet there’s a risk of server overload if IP addresses aren’t random. Also, many clients might share the same IP address, thus making it less effective.
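A minimal sketch of the hash-and-modulo mapping (server names are placeholders, and the hash choice is an assumption; any stable hash works):

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def route(client_ip: str) -> str:
    # Hash the IP and map it onto the server list. The same IP always
    # lands on the same server, giving sticky sessions without storage.
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Deterministic: repeated requests from one IP hit the same server.
print(route("203.0.113.7") == route("203.0.113.7"))
```

The downside described above follows directly: if many clients sit behind one NAT gateway, they share an IP, hash to the same index, and pile onto one server.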
Least Bandwidth
Similar to least connections, but the load balancer routes requests to the server currently serving the least bandwidth (I/O traffic).
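The selection step can be sketched the same way as least connections, swapping the metric. The throughput figures are hypothetical snapshot values:

```python
# Hypothetical bytes/second currently served by each server.
bandwidth = {
    "server-a": 12_000_000,
    "server-b": 4_500_000,
    "server-c": 9_000_000,
}

def route(request):
    # Route to the server pushing the least traffic right now.
    return min(bandwidth, key=bandwidth.get)

print(route("req-1"))
```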
Implementation
- Smart clients
- Hardware load balancers
- Software load balancers
A hardware load balancer runs on a dedicated physical appliance. Although it offers high performance, it’s expensive.
Many teams instead set up a software load balancer, which runs on general-purpose hardware. Besides, it’s easy to scale and cost-effective.
References
- Neo Kim blog