You started your e-commerce business with just a single server running your Spring/Rails/(insert trendy JS + Node framework) monolith, and now you are successful. People are flooding your website for the latest products. However, there's only so much your single server behind the free reverse proxy on a Heroku dyno can do for you. With increased traffic, requests take longer to complete (a longer request-response cycle), end users begin noticing the slow experience, and as traffic keeps climbing your app eventually crashes. Speed is a feature, and you have none.
Thankfully, we can avoid this scenario by adding more servers and distributing our traffic across them. Now before you add more servers, you can of course optimize your application logic, hunt down inefficient database queries, cache frequent requests to minimize database hits, or even cache static content at your proxy. Even with all these optimizations, though, there's only so much traffic your single server can serve with acceptable latency per request.
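For the proxy-caching piece in particular: if the proxy in front of your app happens to be Nginx, static content can be cached with a handful of directives. The sketch below is illustrative only; the cache path, zone name, asset path, and the single upstream address are placeholder assumptions, not a drop-in config.

```nginx
# Rough sketch: cache static assets at the proxy so they never hit the app server.
# These directives live inside the http { } block of nginx.conf.
# The cache path, "static_cache" zone name, and 127.0.0.1:3000 upstream are placeholders.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static_cache:10m max_size=1g inactive=60m;

server {
    listen 80;

    # Static assets: serve from the proxy cache when possible
    # (assumes the app sends cacheable responses for these paths).
    location /assets/ {
        proxy_cache static_cache;
        proxy_cache_valid 200 302 10m;
        proxy_cache_valid 404 1m;
        proxy_pass http://127.0.0.1:3000;
    }

    # Everything else goes straight to the application.
    location / {
        proxy_pass http://127.0.0.1:3000;
    }
}
```

Caching like this buys headroom, but it doesn't raise the ceiling of what one application server can handle, which is where the extra servers come in.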
To tackle this, let's run a replica of our application code on another server. Now we have 2 servers, independent of each other, running the exact same application logic. Before we begin distributing traffic across these 2 servers, we need to perform certain checks.
First, is the application logic running on each server stateless? (read more about stateful vs stateless services). This check matters because, if the application logic is not stateless, we have to map each request to the same server it hit previously. E.g. user A logs in (session based) via the application running on server A; if user A's next request is not routed back to server A, we have to re-authenticate them, because server B has no knowledge of the session stored on server A. This makes load balancing hard to do, and we eventually end up building complex logic to manage traffic across multiple servers while persisting a mapping for every client. In a stateless service we can route the user to whichever server is available or selected, since each server is just a pipe of business logic, unaware of its previous requests.
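Jumping ahead slightly to Nginx (which we'll set up as the load balancer below): the crude version of that request-to-server mapping is session affinity, e.g. hashing on the client's IP so the same client always lands on the same backend. The addresses below are placeholders, and this is only a sketch of what a stateful app forces you into; it skews how evenly traffic is spread and drops sessions whenever a server leaves the pool.

```nginx
# Sketch: sticky-ish routing for a *stateful* app (inside the http { } block; addresses are placeholders).
# ip_hash pins each client IP to one backend, so server B never receives
# a request whose session lives only in server A's memory.
upstream app_servers {
    ip_hash;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}
```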
Second, and still on the theme of state, do we have any form of persistence that is local to our server? Let's say on server A and server B we are also running a cache process on the same instance: server A talks to cache A, server B talks to cache B, and both servers talk to a Postgres instance running on a separate server. Now if we are using a write-through cache (geeksForGeeks Article on write through caching), we can easily end up in a situation where cache A holds data that cache B does not, and when server A updates or invalidates an entry in cache A, cache B is left serving stale data on the requests it handles. There may be cases where you don't care too much about this inconsistency and can just put an expiration on the cache and refetch the data from the database when it lapses (totally plausible, and this is how I have caching set up at the moment for this website). If not, you can move the caches out of the instances running the application logic and onto a server of their own, sitting between your database and your application servers.
If both checks pass, we can now start distributing traffic across our 2 servers :). But how.......
The core problem of routing traffic can be handled in a few different ways. Do you want your client side code to switch between 2 URLs... and what happens when you have N instances? Do we now deal with switching between N URLs in client side code!?
Instead, we can use software and/or hardware that specializes in this very task: Load Balancers. A load balancer sits between the client and multiple servers and distributes traffic amongst the pool of servers. My choice of solution for this is Nginx, which can be used for load balancing as well as a plain reverse proxy.
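In Nginx terms, the minimal setup looks roughly like the config below. The upstream name and server addresses are placeholders for our two application instances; treat it as a sketch rather than a production config.

```nginx
# Minimal load balancing sketch (inside the http { } block; addresses are placeholders).
upstream app_servers {
    # No strategy specified, so Nginx distributes requests round robin by default.
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

server {
    listen 80;

    location / {
        # Each incoming request is forwarded to the next server in the pool.
        proxy_pass http://app_servers;
    }
}
```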
There are different strategies for routing incoming requests to servers; the one I have seen used most in production is round robin (alternate between servers sequentially). You can read more about different load balancing strategies here https://kemptechnologies.com/load-balancer/load-balancing-algorithms-techniques/. Load balancing is also categorized by how much of the incoming request it needs to inspect in order to route it. If the routing logic has to look at the content of the request and route traffic based on it, we call it layer 7 load balancing, e.g. we have one server for serving images and one for dynamic data, so we need to check whether a request is for an image or for dynamic data before forwarding it. If we can route the request without any awareness of its content, we are mostly in the realm of layer 4 load balancing, paired with content-agnostic strategies like round robin and weighted round robin.
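To make the layer 7 example above concrete, here is roughly what content-based routing looks like in Nginx, extending the earlier sketch: requests are split on the URL path, so image traffic goes to one pool and everything else goes to the application pool. Pool names and addresses are placeholder assumptions.

```nginx
# Layer 7 sketch: route based on request content (here, the URL path). Addresses are placeholders.
upstream image_servers {
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
}

upstream app_servers {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

server {
    listen 80;

    # Requests for images go to the image pool...
    location /images/ {
        proxy_pass http://image_servers;
    }

    # ...everything else (dynamic data) goes to the application pool.
    location / {
        proxy_pass http://app_servers;
    }
}
```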
If our services are stateless and every instance runs the same application logic, layer 4 load balancing makes sense: we can keep the routing logic dumb and keep the smarts in our application logic. Layer 4 load balancing is also generally faster than layer 7, since we do not need to decrypt and inspect the request content. Load balancers can additionally be configured to forward a request to a server but have that server return its response directly to the client (DSR/Direct Server Return), instead of relaying the response back through the load balancer. DSR is often employed when the load balancer itself starts to become a performance bottleneck.
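On the layer 4 side, Nginx's stream module (available when Nginx is built with it) can pass TCP connections straight through without looking at, or decrypting, the HTTP inside them; a rough sketch with placeholder ports and addresses is below. Note that Nginx still relays the response back through itself, so this is the relay model rather than DSR; DSR is usually the territory of dedicated layer 4 load balancers such as IPVS or hardware appliances.

```nginx
# Layer 4 sketch using the stream module (sits alongside, not inside, the http { } block).
# Ports and addresses are placeholders; TLS stays end-to-end because nothing is decrypted here.
stream {
    upstream app_servers_tls {
        server 10.0.0.1:443;
        server 10.0.0.2:443;
    }

    server {
        listen 443;
        # Raw TCP pass-through: round robin by default, no inspection of request content.
        proxy_pass app_servers_tls;
    }
}
```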
Now, with our load balancer and servers set up with round robin, client traffic is distributed across multiple servers instead of just one, and we can keep scaling our e-commerce business. If we hit a bottleneck with 2 servers, we can add more instances to the server pool and push past it, that is of course until our Postgres database starts becoming the bottleneck 😉....