The Burden of Monitoring Load Balancers
I was having one of “those” days, and someone asked me to describe what a load balancer did. My answer was something like this.
“Imagine your most critical application gained sentience, and had access to a genetics lab. There, it crossed a router with a switch to create a device that served its nefarious plans for world domination. In this scenario, what your mild-mannered-application-turned-mad-scientist created would look a lot like a load balancer.”
My point was that, like a switch, a load balancer has multiple devices plugged into it, and it passes traffic to and from those devices. Like a router, the load balancer creates boundaries between sets of devices, so that one group (a “pool”) cannot automatically receive data meant for a different pool.
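To make the pool idea a little more concrete, here is a minimal sketch in Python. The `LoadBalancer` class, the `route` method, the addresses, and the round-robin selection are all assumptions made up for illustration, not how any particular product works. The only point is that traffic addressed to one pool is never handed to a member of another.

```python
from itertools import cycle


class LoadBalancer:
    """Toy model (hypothetical): named pools, each with its own members."""

    def __init__(self, pools):
        # pools maps a pool name to a list of member addresses.
        self._pools = {name: list(members) for name, members in pools.items()}
        # One independent round-robin iterator per pool, so pools never mix.
        self._rotation = {
            name: cycle(members) for name, members in self._pools.items()
        }

    def route(self, pool_name):
        """Return the next member of the named pool, and only that pool."""
        if pool_name not in self._rotation:
            raise KeyError(f"unknown pool: {pool_name}")
        return next(self._rotation[pool_name])


# Made-up example addresses.
lb = LoadBalancer({
    "web": ["10.0.1.10", "10.0.1.11"],
    "api": ["10.0.2.10"],
})
```

Calling `lb.route("web")` repeatedly walks through the two “web” members in turn, while `lb.route("api")` only ever returns the single “api” member.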
This highlights part of the challenge that I introduced in my last blog post. You see, since it’s really just a box with ports in it, many monitoring solutions look at a load balancer and see a switch. They display metrics for CPU, memory, interface status, and bandwidth. While this will certainly tell you the basic health of the device, so much is missing that using this information to say the load balancer is “healthy” is almost worse than having nothing at all.
The problem is that none of those metrics reflects what the load balancer actually does. What you really need to know includes:
- The health and status of the actual service that is being presented
- The health and status of any higher-order controllers or managers
- Which devices are connected to the load balancer
- How those devices are grouped into pools
For each of those points, you want to know both the status and the performance.
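The list above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `response_ms` field, and the “up / degraded / down” rules are hypothetical choices of mine, not the schema of any real monitoring tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    address: str
    up: bool            # status: is this device passing its health check?
    response_ms: float  # performance: hypothetical latency measurement


@dataclass
class Pool:
    name: str
    members: List[Member] = field(default_factory=list)

    def healthy_members(self):
        return [m for m in self.members if m.up]


@dataclass
class Service:
    name: str
    pools: List[Pool]
    # None means there is no higher-order controller or manager to watch.
    controller_up: Optional[bool] = None


def service_status(service):
    """Summarize status as 'down', 'degraded', or 'up' (assumed rules)."""
    # A pool with no healthy members cannot serve anything.
    if any(not pool.healthy_members() for pool in service.pools):
        return "down"
    any_member_down = any(not m.up for p in service.pools for m in p.members)
    if any_member_down or service.controller_up is False:
        return "degraded"
    return "up"


def slowest_response_ms(service):
    """Worst latency among healthy members, or None if there are none."""
    times = [m.response_ms for p in service.pools for m in p.healthy_members()]
    return max(times) if times else None


# Made-up example: one web member is failing its health check.
shop = Service(
    name="shop",
    pools=[
        Pool("web", [Member("10.0.1.10", True, 12.0),
                     Member("10.0.1.11", False, 0.0)]),
        Pool("api", [Member("10.0.2.10", True, 40.0)]),
    ],
    controller_up=True,
)
```

In this example every interface on the box could be up and CPU could be idle, yet `service_status(shop)` reports a degraded service because one “web” member is failing. That is exactly the kind of answer switch-style metrics cannot give you.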
Without at least SOME of this information, it’s impossible for anyone to confidently state that the load balancer is doing anything remotely resembling its intended purpose.