Load balancing is a technique that ensures an organization's servers do not become overloaded with traffic. With load balancing measures in place, workloads and traffic requests are distributed across server resources to provide higher resilience and availability.
In the earliest days of the Internet, it became painfully obvious that a single application on a single server couldn’t handle high-traffic situations. Concurrent server requests from large volumes of traffic would frequently overwhelm a single server regardless of how powerful the underlying infrastructure was. A solitary instance of application availability was—and still is, in cases where load balancing isn’t implemented—a single point of failure. This poses a huge threat to a system’s reliability.
How Does Load Balancing Work?
A typical load balancing sequence works as follows:
- Traffic comes to your site. Visitors to your site send numerous concurrent requests to your server via the Internet.
- The traffic is distributed across server resources. The load balancing hardware or software intercepts each request and directs it to the appropriate server node.
- Each server handles a reasonable workload. Because the node is not overloaded with too many requests, it can accept the request and respond to the balancer.
- The server returns the response. The process completes in reverse order, with the load balancer delivering the server's response back to the user.
It may seem obvious, but it's worth noting that the steps above can only be completed if multiple resources (server, network, or virtual) have already been established. If there is only a single server or compute instance, all workloads land in the same place and there is nothing to balance.
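To make the sequence concrete, here is a minimal sketch in Python of a round-robin HTTP forwarder. The backend addresses and ports are illustrative assumptions, not part of any particular product, and a production balancer would add health checks, error handling, and concurrency.

```python
# A minimal sketch of the sequence above: a round-robin HTTP forwarder.
# The backend addresses/ports are illustrative assumptions.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = ["http://127.0.0.1:8081", "http://127.0.0.1:8082"]  # hypothetical pool
pool = itertools.cycle(BACKENDS)

class Balancer(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(pool)  # step 2: intercept the request and pick a node
        with urllib.request.urlopen(backend + self.path) as upstream:
            body = upstream.read()  # step 3: the chosen node handles the work
        self.send_response(200)  # step 4: relay the response back to the client
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # step 1: traffic arrives here instead of hitting a backend directly
    HTTPServer(("0.0.0.0", 8080), Balancer).serve_forever()
```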
Recommended: What is Failover Clustering?
Benefits of Load Balancing
A load balancer acts like a traffic cop or a filter for the traffic coming over the Internet. It prevents any one server from being overloaded and becoming unreliable. Therefore, every node is able to work more efficiently.
In recent years, load balancing has become a feature of a broader class of technology known as Application Delivery Controllers (ADCs). ADCs aim to provide multiple advanced load balancing features to ensure workload balancing, along with an overall high quality of application delivery.
In addition to preventing any one resource from becoming overwhelmed and unreliable, load balancing has benefits for security and productivity. ADCs are often used as appliances and control points for security, which helps protect against various types of threats, including Denial of Service (DoS) attacks. Load balancing also often involves duplicating content and application workloads, which allows more than one copy of a resource to be accessed at a time.
Recommended: How to Prevent DoS Attacks
Hardware Vs. Software Load Balancers
Load balancing can be accomplished using either hardware or software. Both approaches have their benefits and drawbacks, as illustrated in the table below. Check out our lineup of the Best Load Balancers for 2021 to figure out which hardware or software load balancer is the right fit for you.
| | Hardware load balancers | Software load balancers |
| --- | --- | --- |
| Performance | Higher | Lower |
| Flexibility | Less flexible | More flexible |
| Virtualization | Built-in | Happens externally |
| Architecture | Best for multi-tenant; occupies more physical space | Best for individual tenant; requires no physical space |
| Cost | Higher investment and maintenance costs | Lower costs overall |
| Configuration | Less configurable | Highly configurable |
Recommended: Does Virtualization’s Success Spell Decline for Server Sales?
Load Balancing Categories
There are several types of load balancing configurations that you might choose to deploy depending on the features that are most important to you. You might also decide to layer multiple configuration types as part of a load balancer or ADC appliance. Load balancing categories include server load balancing, network load balancing, global server load balancing (GSLB), container load balancing, and cloud load balancing.
Server Load Balancing
With server load balancing, the goal is to distribute workloads across server resources based on availability and capabilities. Because these configurations rely on application layer traffic to route requests, server load balancing is also referred to as Layer 7 load balancing.
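As a rough illustration, the sketch below routes on the request path, a piece of application layer data that only a Layer 7 balancer can inspect. The pool addresses and the /api/ convention are hypothetical.

```python
# Hedged sketch of Layer 7 routing: the request path (application-layer data)
# decides which backend pool serves the request. Addresses are hypothetical.
API_POOL = ["10.0.1.1:8080", "10.0.1.2:8080"]     # application servers
STATIC_POOL = ["10.0.2.1:8080", "10.0.2.2:8080"]  # static-content servers

def route_by_path(path: str) -> list[str]:
    # Send /api/ requests to the application pool; everything else to static hosts.
    return API_POOL if path.startswith("/api/") else STATIC_POOL
```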
Network Load Balancing
Network load balancing distributes traffic flow across IP addresses, switches, and routers to maximize utilization and availability. Because these configurations operate on transport layer data, network load balancing is also referred to as Layer 4 load balancing.
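A Layer 4 balancer, by contrast, sees only transport-layer details such as addresses and ports. A minimal sketch, assuming a hypothetical backend list, might hash a TCP connection's 4-tuple so every packet in a flow reaches the same node:

```python
# Hedged sketch of Layer 4 balancing: only the transport-layer 4-tuple is
# inspected, so all packets in a TCP flow map to the same backend.
import hashlib

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node addresses

def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]
```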
Global Server Load Balancing (GSLB)
In global server load balancing, an operator balances workloads across a globally distributed set of Layer 4 and Layer 7 load balancers. In a GSLB deployment, there are typically ADC assets at the global level as well as at the local level, where the traffic is finally delivered.
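As a hedged sketch (with hypothetical site names), the global tier's job can be reduced to choosing a healthy site near the client, after which that site's local balancer takes over:

```python
# Hedged GSLB sketch: the global tier picks a nearby healthy site; the local
# tier at that site then balances across its own servers. Names are hypothetical.
SITES = {
    "us-east": "lb-us-east.example.com",
    "eu-west": "lb-eu-west.example.com",
    "ap-south": "lb-ap-south.example.com",
}

def pick_site(client_region: str, healthy: set[str]) -> str:
    # Prefer the client's own region; otherwise fall back to any healthy site
    # (assumes at least one site is healthy).
    region = client_region if client_region in healthy else sorted(healthy)[0]
    return SITES[region]
```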
Container Load Balancing
Container load balancing distributes workloads across virtual, isolated instances of applications, typically organized in load balancing clusters. Among the most popular approaches is the Kubernetes container orchestration system, which can distribute loads across container pods to help balance availability.
Cloud Load Balancing
Within a cloud infrastructure, there are often multiple options for load balancing across compute instances. Load balancing in the cloud can include both network (Layer 4) and application (Layer 7) balancing.
Load Balancing Approaches
Depending on the load balancing configuration(s) being used from the list above, there are a variety of techniques that determine how workloads are balanced. These include the following, each sketched in code after the list:
- Round Robin: A set of IPs for server or network resources is provided, and traffic is directed to each resource in turn, cycling back to the start of the list.
- Weighted Round Robin: Each compute or network resource in a list is provided a weighted score, with the highest weight getting the most traffic.
- Least Connection: New incoming requests are directed to the resource with the fewest active connections.
- Weighted Response Time: Information about server instance response time is used to direct traffic, with the slowest servers getting the least amount of traffic.
- Source IP Hash: The IP addresses of the client and the receiving compute instance are run through a hashing algorithm (the "hash"), producing a key that helps keep a given client connected to the same resource.
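The sketch below illustrates each of these strategies as a small selection function. The Backend record, weights, connection counts, and response times are illustrative assumptions rather than measurements from a real deployment.

```python
# Hedged sketches of the strategies above. Backend is a hypothetical record
# type; the weights, connection counts, and response times are illustrative.
import hashlib
import itertools
from dataclasses import dataclass

@dataclass
class Backend:
    address: str
    weight: int = 1
    active_connections: int = 0
    avg_response_ms: float = 0.0

backends = [
    Backend("10.0.0.1", weight=3, active_connections=12, avg_response_ms=40.0),
    Backend("10.0.0.2", weight=1, active_connections=4, avg_response_ms=95.0),
]

# Round Robin: cycle through the pool in order, wrapping back to the start.
round_robin = itertools.cycle(backends)

# Weighted Round Robin: repeat each backend in proportion to its weight,
# so higher-weighted nodes receive more traffic.
weighted_round_robin = itertools.cycle(
    [b for b in backends for _ in range(b.weight)]
)

def least_connection() -> Backend:
    # Least Connection: the node with the fewest active connections wins.
    return min(backends, key=lambda b: b.active_connections)

def weighted_response_time() -> Backend:
    # Weighted Response Time: the fastest-responding node gets the traffic.
    return min(backends, key=lambda b: b.avg_response_ms)

def source_ip_hash(client_ip: str) -> Backend:
    # Source IP Hash: hashing the client address keeps the same client
    # pinned to the same backend across requests.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```

In practice, production balancers update connection counts and response times continuously and often combine several of these signals rather than relying on one.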