Load Balancing
System Design | Scalability | Availability | Performance | DNS
tl;dr
Load balancing is an integral part of system design and is key to the scalability, availability & performance of any application. I have been part of a few discussions where these concepts are mystified and made more cryptic than necessary.
I have actively participated in the load balancing design of my current app, but by no measure am I an expert. The main idea here is to simplify, declutter and understand load balancing and related topics. I will use Gmail (hypothetically) as an example to share my experience & understanding. I would really appreciate any feedback to keep this article updated and relevant.
What is a Domain Name System (DNS)?
A domain name system converts a domain name (part of a URL) to an IP address that serves the required content.
Example: The IP address for https://mail.google.com/ is 142.250.80.5 (for my current location; we will discuss later why it may differ based on location).
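As a quick sanity check, we can do this lookup ourselves. Here is a minimal sketch using Python's standard library; the addresses returned will almost certainly differ from mine based on your location:

```python
import socket

# The standard library asks the OS resolver, which in turn asks the
# ISP DNS Resolver described below. Output will vary by location.
hostname, aliases, ip_addresses = socket.gethostbyname_ex("mail.google.com")
print(ip_addresses)  # e.g., ['142.250.80.5']
```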
How does a DNS server resolve a domain name to an IP address?
Scenario 1 (cache hit): mail.google.com found in ISP DNS Resolver’s cache
In this scenario, the browser gets the IP address directly from the ISP DNS Resolver’s cache. Scenario 2 shows the flow when the IP address is not found in the cache.
Scenario 2 (cache miss): mail.google.com NOT found in DNS Resolver’s cache
Since the IP address for mail.google.com is not in the cache, the ISP DNS Resolver has to resolve the name over the internet. The ISP DNS Resolver does not host DNS domains itself; it contacts the various “authoritative DNS servers” across the internet to resolve DNS names.
Refer to “Root Servers” for a list of “authoritative DNS servers” and related information. In theory, if all of these DNS servers were down, we would never be able to reach Gmail servers even though they are up and running.
In our example, the ISP DNS Resolver first gets a list of name servers for google.com from the “authoritative DNS servers”. It then gets one or more IP addresses for mail.google.com from one of those name servers.
In our example, the ISP DNS Resolver caches the IP address 142.250.80.5 for “mail.google.com” for a configured period of time known as the TTL (time to live), e.g., 30 or 60 minutes (time-based cache eviction). This is how the cache in Scenario 1 (cache hit) is primed.
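To make the cache hit/miss flow concrete, here is a minimal sketch of a resolver-style TTL cache. The `resolve_upstream` callable is a hypothetical stand-in for the full Scenario 2 resolution flow:

```python
import time

class DnsCache:
    """Time-based (TTL) cache, as an ISP DNS resolver might keep one."""
    def __init__(self):
        self._entries = {}  # domain -> (ip_list, expires_at)

    def lookup(self, domain, resolve_upstream, ttl_seconds=1800):
        entry = self._entries.get(domain)
        if entry and entry[1] > time.time():
            return entry[0]                     # Scenario 1: cache hit
        ips = resolve_upstream(domain)          # Scenario 2: cache miss
        self._entries[domain] = (ips, time.time() + ttl_seconds)
        return ips

cache = DnsCache()
# resolve_upstream here is a hypothetical stand-in for the Scenario 2 flow.
print(cache.lookup("mail.google.com", lambda d: ["142.250.80.5"]))
```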
Both Scenario 1 and Scenario 2 are simple, as we are just resolving domain names to IP addresses (mail.google.com = [142.250.80.5]). We have not yet delved into the details of load balancing. We will now expand our example to understand load balancing.
Load Balancers
Layer 4 (L4)
L4 load balancers look at the transport layer to distribute traffic; they do not inspect packet contents.
Eg: Divert traffic purely based on IP address and port.
An L4 balancer does not look into the HTTP headers or cookies of a typical REST call. The scope of this article is Layer 7 (L7). Please refer to the NGINX documentation for more info on L4.
Layer 7 (L7)
L7 load balancers look at the application layer to distribute traffic.
Eg: Divert traffic based on header info, cookie info, or in extreme cases even the payload.
L7 is far more flexible than L4 but demands more compute and resources, depending on the implementation. We will dig more into L7 through different use cases in this article. Please refer to the NGINX documentation for more info on L7.
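To illustrate the difference, here is a toy sketch (not how any real balancer is implemented) contrasting an L4 decision with an L7 decision. The pool names, header, and path prefix are hypothetical:

```python
# Toy contrast between an L4 and an L7 routing decision.

def route_l4(packet: dict) -> str:
    # L4: only transport-level facts are visible (IP address, port).
    return "pool_a" if packet["dest_port"] == 443 else "pool_b"

def route_l7(request: dict) -> str:
    # L7: application-level data (headers, cookies, path) is visible.
    if request["headers"].get("X-User-Tier") == "paid":
        return "paid_pool"
    if request["path"].startswith("/attachments"):
        return "attachment_pool"
    return "default_pool"

print(route_l4({"dest_port": 443}))                                 # pool_a
print(route_l7({"headers": {"X-User-Tier": "paid"}, "path": "/"}))  # paid_pool
```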
Use Cases & FAQs
Let us go through a few use cases and their related FAQs to better understand how (L7) load balancers help with scalability, availability & performance.
What is a data center?
Simply put, a data center is a physical location that houses all the computing & network resources needed to successfully fulfill a request.
In our example, let us say Gmail has a data center in California. This data center is fully capable of authentication, authorization, and successfully responding to all REST calls (fetching emails, deleting emails, etc.). So the data center has microservices (hosting REST endpoints), databases, cache, storage, load balancers, and every other required resource.
Use Case 1: Single Data Center + Single Server or Instance
This is a hypothetical use case where Gmail has just one data center and just one instance of a microservice handling traffic across the globe. For this use case, we do not need load balancers, as all requests go to one data center and one server.
DNS Lookup
The browser gets just 1 IP address (not a list, as we have just 1 data center) from the DNS resolver (refer to Figure 2 above) for mail.google.com, regardless of the user’s geographical location.
CONS
- Poor performance as traffic increases.
- Users suffer network latency if they are geographically far from the data center.
- No way to scale horizontally. Vertical scaling (not a good option) is the only choice.
- Single point of failure, both in terms of data center and server instances.
In reality, this use case does not exist.
Use Case 2: Single Data Center + Multiple Servers or Instances in a Cluster
In this use case, Gmail has one data center in California but multiple instances of microservices to handle incoming traffic. We can use a load balancer within this data center to evenly distribute traffic.
What is a VIP (Virtual IP) Address?
Wikipedia: A virtual IP address (VIP or VIPA) is an IP address that doesn’t correspond to an actual physical network interface. Uses for VIPs include network address translation (especially, one-to-many NAT), fault-tolerance, and mobility.
- DNS Resolver (#2) returns the VIP IP address of Gmail’s load balancer (#3) to the browser.
- IP addresses of Microservices (#4) are internal to the Google network and not exposed to the outside world.
- Load balancer (#3) routes requests to different instances of Microservices (#4) to evenly distribute load and handle the traffic.
- Now technically, we can add more resources like DB replicas, instances of Microservices (#4), etc., to scale horizontally.
This is way better than Use Case 1, but it still has many limitations for products like Gmail that have a massive user base across the globe.
DNS Lookup
Even in this use case, the browser gets just 1 IP address (not a list, as we have just 1 data center and 1 load balancer cluster) from the DNS resolver (refer to Figure 4) for mail.google.com, regardless of the user’s geographical location.
Note: This IP address is the VIP IP address of the “load balancer & reverse proxy (#3)”.
Getting to know a little about Reverse Proxy
A reverse proxy is a web server that centralizes all services and provides a unified interface to the external world. It also handles security (connection limits per client, IP blacklisting), compression (gzip), SSL certificates, caching, hiding internal server details, etc.
Example: Suppose we have different Gmail microservices to handle Inbox, Updates, Promotions, Attachments, etc. The reverse proxy encapsulates these details, and the outside world just knows the different endpoints. Simply put, the reverse proxy takes up the responsibility of mapping endpoints to the applicable microservices.
Note: Products or solutions like NGINX support both Layer 7 load balancing and reverse proxying.
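To picture the endpoint-to-microservice mapping, here is a minimal sketch; the internal service URLs and names are hypothetical:

```python
# Hypothetical mapping of public endpoint prefixes to internal microservices.
# The outside world only ever sees mail.google.com/<endpoint>.
ROUTES = {
    "/inbox":       "http://inbox-svc.internal:8080",
    "/updates":     "http://updates-svc.internal:8080",
    "/promotions":  "http://promotions-svc.internal:8080",
    "/attachments": "http://attachments-svc.internal:8080",
}

def upstream_for(path: str) -> str:
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            return upstream
    return "http://default-svc.internal:8080"

print(upstream_for("/inbox/unread"))  # http://inbox-svc.internal:8080
```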
How does a Load Balancer evenly distribute the load?
Many techniques can be used based on application needs. Some of the common & widely used techniques:
Round Robin: This is a rotation system. If we have 5 instances of a microservice (M1 .. M5), the first request is routed to M1, the second to M2, and so on in a circular fashion (see the sketch below). This works well if the unit of work for each endpoint is roughly equal and consumes constant resources.
Note: In Kubernetes environments, a Kubernetes Ingress controller is a specialized load balancer that can use “weighted round-robin” based on the number of active threads of a pod (instance of a microservice).
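A minimal sketch of plain round-robin selection over five interchangeable instances:

```python
import itertools

# Five interchangeable instances behind the balancer.
instances = ["M1", "M2", "M3", "M4", "M5"]
rotation = itertools.cycle(instances)

def next_instance() -> str:
    return next(rotation)  # M1, M2, M3, M4, M5, M1, M2, ...
```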
IP Hashing: This is simply mapping a request to an instance using a hash technique like consistent hashing.
Example: We can hash the email ID of the incoming request and map it to a server instance.
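A minimal sketch of both the naive hash-and-mod approach and a bare-bones consistent-hashing ring (real implementations add virtual nodes per instance):

```python
import bisect
import hashlib

instances = ["M1", "M2", "M3", "M4", "M5"]

def stable_hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash().
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Naive version: the same email always lands on the same instance,
# but adding/removing an instance remaps almost every key.
def pick_naive(email: str) -> str:
    return instances[stable_hash(email) % len(instances)]

# Consistent-hashing version: instances sit on a ring; a key goes to
# the first instance clockwise from its hash, so resizing only remaps
# the keys adjacent to the changed instance.
ring = sorted((stable_hash(m), m) for m in instances)
ring_keys = [h for h, _ in ring]

def pick_consistent(email: str) -> str:
    idx = bisect.bisect(ring_keys, stable_hash(email)) % len(ring)
    return ring[idx][1]
```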
Least Connections: Calls are routed to an instance with the least number of connections at that instant of time.
Example: If we have 5 instances of a microservice (M1 .. M5) and each instance can handle 30 concurrent threads or connections, the request is routed to the instance that has the most available (free) connections.
Note: In many applications, the number of threads available in a microservice is greater than the number of concurrent connections it can make to the database.
Note: If the microservices (M1 .. M5) have different numbers of concurrent threads configured, then we need to factor this into the calculation. This is the “weighted least connections” method (see the sketch below).
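A minimal sketch of least connections and its weighted variant; the connection counts and capacities are hypothetical numbers:

```python
# Hypothetical live state: instance -> number of active connections.
active = {"M1": 12, "M2": 30, "M3": 7, "M4": 19, "M5": 25}

def least_connections() -> str:
    return min(active, key=active.get)  # -> "M3" (only 7 active)

# Weighted variant: instances have different capacities, so compare
# utilization (active / capacity) instead of raw connection counts.
capacity = {"M1": 30, "M2": 60, "M3": 10, "M4": 30, "M5": 30}

def weighted_least_connections() -> str:
    return min(active, key=lambda m: active[m] / capacity[m])  # -> "M1"
```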
Least Response Time: Calls are routed to the instance with the fewest active connections and the lowest average response time.
Custom Load: Calls are routed to the instance with the least load. The load on an instance is calculated based on CPU usage, memory, and average response times.
Note: In Kubernetes environments, a Kubernetes Ingress controller can be configured to route based on available CPU, memory, or a combination of both.
How does the Load Balancer know the intricate details of every instance of a microservice (or related resources) for dynamic load balancing?
To adopt dynamic load balancing methods like least connections, weighted least connections, least response time, or custom load, the load balancer needs certain metrics (depending on the method) at regular intervals. It achieves this by making periodic calls (HTTP or other protocols) to each instance of the microservice and caching the required metrics to make an informed decision.
Example: In Kubernetes environments, a Kubernetes Ingress controller is a specialized load balancer that can be configured to handle dynamic load balancing. Refer to the Kubernetes Ingress documentation to dig deeper.
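A minimal sketch of such a polling loop, assuming each instance exposes a hypothetical /metrics endpoint that returns JSON:

```python
import json
import threading
import urllib.request

# Hypothetical metrics endpoints exposed by each microservice instance.
INSTANCES = {
    "M1": "http://10.0.0.1:8080/metrics",
    "M2": "http://10.0.0.2:8080/metrics",
}
metrics_cache = {}  # instance -> latest metrics dict, read by the router

def poll_metrics(interval_seconds: int = 10) -> None:
    for name, url in INSTANCES.items():
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                metrics_cache[name] = json.load(resp)
        except OSError:
            metrics_cache.pop(name, None)  # unreachable -> treat as unhealthy
    # Schedule the next poll; routing decisions read metrics_cache meanwhile.
    threading.Timer(interval_seconds, poll_metrics, [interval_seconds]).start()
```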
CONS
- All traffic around the world is routed to a single DC in California. Users who are geographically far away suffer network latency.
- If this DC goes down for any reason, the entire application is unavailable. In the larger sense, this can be considered a single point of failure.
Some FAQs before we proceed with Use Case 3 & 4
In Use Case 2 (refer to Figure 4 above), what happens if the “Load Balancer + Reverse Proxy” cluster goes down? Is that a single point of failure?
Yes, in this case, our application (Gmail) is not accessible. But we can make some changes to mitigate this risk. Refer to Figure 5 below.
- We now have 2 load balancer clusters instead of 1. The 2 VIP IP addresses are registered with DNS (refer to Figure 1 for the complete flow of resolving DNS names to IP addresses).
- When a browser (#1) requests an IP address for mail.google.com, it gets a list of 2 IP addresses in random order. If the first IP times out, the browser tries to connect to the second IP address (see the sketch below).
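A minimal sketch of this client-side fallback behavior:

```python
import socket

def connect_with_fallback(ip_list, port=443, timeout=3):
    """Try each IP the DNS resolver returned until one accepts."""
    for ip in ip_list:
        try:
            return socket.create_connection((ip, port), timeout=timeout)
        except OSError:
            continue  # this load balancer cluster is down; try the next VIP
    raise ConnectionError("all load balancer VIPs are unreachable")
```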
In Use Case 2 (refer to Figure 4 or 5 above), is my DB (#5) or Storage (#6) a single point of failure?
Yes. We can mitigate this by adding multiple instances of DBs (#5) and Storage (#6) and introducing load balancers between the Microservices (#4) and the DB/Storage. Refer to Figure 5 below.
Multiple DB Nodes (#8): We now have multiple instances of DBs that should be kept in sync. Additionally, we have a new cluster of load balancers (#6) between the microservices (#4) and the DBs (#8). We can route reads and writes to specific DB nodes depending on the type of DB.
Example: All writes are routed to the master DB node and reads to the replica DB nodes, in the case of an RDBMS like MySQL or PostgreSQL.
Multiple Storage Nodes (#9): We now have multiple instances of Storage that should be kept in sync. Additionally, we have a new cluster of load balancers (#7) between the microservices (#4) and Storage (#9).
Use Case 3: Single Data Center + Multiple Farms
In this use case, we are still in a single data center, but we have multiple farms that are intentionally independent of each other.
Hypothetically, let us say Gmail wants to separate resources for paid corporate clients and free individual users. To handle this, we have 2 farms that are self-sufficient and independent of each other.
Farm 1 = Paid corporate clients
Farm 2 = Free individual users
The initial load balancer (#2) now routes the request to the load balancer of either Farm 1 or Farm 2 based on the user group or type. Within each farm, the request is handled as discussed in Use Case 2.
Note: Farm 1 and Farm 2 are independent of each other. Farm 1 can work fine and serve requests even if Farm 2 is down, & vice versa. So when either farm is down, not all users are affected.
We are definitely better off compared to Use Case 2, as we can introduce more farms to distribute load and resources and to mitigate risks. But we still have latency issues for users far away from California, and the threat of the application being unavailable if something untoward happens to this single DC.
Use Case 4: Multiple Data Centers
Multiple data centers can be geographically distributed within a country or across continents.
What is DNS Load Balancing?
We saw the flow of resolving domain names to a single IP address or a list of IP addresses (refer to Figure 2). We can extrapolate the same to route requests appropriately to geographically distributed data centers across the globe. This technique is DNS load balancing, aka Global Server Load Balancing (planet-wide).
DNS load balancing is the practice of configuring a domain in the Domain Name System (DNS) such that client requests to the domain are distributed across a group of server machines.
DNS Load Balancing
Let us understand DNS load balancing through Azure’s Traffic Manager. It is very mature and gives us a clear picture of how powerful DNS load balancing can be. Similar offerings: Amazon Route 53, Google Cloud DNS routing policies, and many others.
Azure Traffic Manager
This is a DNS-based load balancer. Traffic is optimally distributed while providing high availability & responsiveness.
Multi Data Center
We now have our app (Gmail) deployed in multiple data centers across the globe. Data is asynchronously replicated across these data centers. Technically, any data center can honor user requests from any corner of the world. We will take the help of the Traffic Manager to route traffic to the closest healthy data center (minimal latency).
Azure Traffic Manager takes up the responsibility of routing requests to the appropriate data center (playing the role of a DNS load balancer). We still have local load balancers (internal to a specific DC) in each of the data centers, as discussed in the previous use cases. Below (Figure 9) is a high-level flow of a user’s request getting routed to the closest healthy data center with the help of Azure Traffic Manager.
I strongly recommend reading “How Traffic Manager Works” for a detailed understanding of the entire flow.
High-level Steps to integrate with Traffic Manager
Step 1: Register multiple endpoints with Traffic Manager. These endpoints span multiple data centers.
Note: Endpoints need not be Azure services; they can be any internet-facing services.
Step 2: Implement a “health endpoint”. Each DC needs to implement a health endpoint that Traffic Manager periodically calls to know the status and health of the data center. Example: a REST call returning HTTP status 200.
Note: In many applications, the health of the data center is an aggregate of many resources. Regardless, the health endpoint should be a very lightweight call (a minimal sketch follows).
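A minimal sketch of such a lightweight health endpoint using only Python's standard library; a real implementation would aggregate a few cheap internal checks behind it:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Keep this cheap: no DB round trips on every probe.
            self.send_response(200)  # 200 -> Traffic Manager keeps routing here
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(404)
            self.end_headers()

HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```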
Step 3: Configure the traffic-routing method. There are different routing methods; below are some common and popular ones.
- Closest: Routes to the data center closest to the user, i.e., the one with the lowest network latency (Azure calls this the “Performance” method). Example: Requests from Australia may go to the Asia data center, as it is the closest with the lowest latency.
- Weighted: We can assign weights (a numeric value between 1–1000) to each data center. Example: Asia = 2, Europe = 3, South America = 1 & North America = 4. Then 20% of traffic goes to Asia, 30% to Europe, 10% to South America & 40% to North America (see the sketch after this list).
- Geographical: The request is routed based on the geographic origin of the DNS query (the ISP’s DNS resolver). Example: Any request from Japan should be routed to a North American data center (for legal reasons) even though the Asian data center is closer to Japan.
- Priority: We can define a priority for each data center. Requests are routed based on the order of priority and the health of the data centers. Example: North America = 4, Europe = 3, Asia = 2 & South America = 1. All requests are routed to the North American data center as long as it is healthy. Once the North American data center goes unhealthy, requests are routed to Europe & so on. This is more like reserving data centers for disaster recovery (DR).
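To make the weighted method concrete, here is a minimal sketch of weight-proportional selection using the weights from the example above:

```python
import random

# Weights from the example above: out of a total of 10,
# Asia gets 2/10 = 20%, Europe 3/10 = 30%, and so on.
weights = {"Asia": 2, "Europe": 3, "South America": 1, "North America": 4}

def pick_data_center() -> str:
    # random.choices picks proportionally to the given weights.
    return random.choices(list(weights), weights=list(weights.values()), k=1)[0]
```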
Can we have a nested Traffic Manager?
Yes, we can have multiple nested traffic managers with different routing methods based on application needs.
Again, I strongly recommend watching “Azure Traffic Manager Tutorial”.
CONCLUSION
Different projects adopt different techniques based on their needs & demands. In many applications, things get complex because of existing legacy layers. Regardless, load balancing is indeed the first step of performance tuning. I hope this article is helpful in understanding the basics of load balancing and provides a path to dig deeper based on your requirements & design.
Thank you.