Have you ever read about TV ratings? Almost everyone who watches TV has heard of the ratings produced by Nielsen Media Research. These statistics shape how we watch TV and determine whether shows are renewed for more episodes.
But how does Nielsen handle longer programs? How do they track the Super Bowl? Can they really tell how many people were tuned in for the entire event? Or who stopped watching at halftime after the commercials were finished? This type of tracking could tell advertisers when they want their commercials to air. And for the network broadcasting the event, it could help determine how much to charge during the busiest viewing times.
You might be interested to know that Nielsen tracks their programs in 15-minute increments. They can tell who was tuned in for a particular quarter-hour segment over the course of multiple hours. Nielsen has learned that monitoring the components of a TV show helps them understand the analytics behind the entire program. Understanding microtrends helps them give their customers the most complete picture possible.
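To make the quarter-hour idea concrete, here is a minimal sketch of how viewing sessions could be grouped into 15-minute segments. It is not Nielsen's actual method; the `(viewer, start, end)` tuple layout, the function name, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

QUARTER_HOUR = timedelta(minutes=15)


def quarter_hour_audience(sessions, bucket=QUARTER_HOUR):
    """Map each segment start time to the set of viewers tuned in during it.

    `sessions` is a hypothetical list of (viewer, start, end) tuples.
    """
    audience = defaultdict(set)
    for viewer, start, end in sessions:
        # Round the session start down to the beginning of its segment.
        slot = start - (start - datetime.min) % bucket
        # Credit the viewer to every segment the session overlaps.
        while slot < end:
            audience[slot].add(viewer)
            slot += bucket
    return dict(audience)


if __name__ == "__main__":
    day = datetime(2024, 2, 11)  # arbitrary sample date
    sessions = [
        ("viewer-a", day + timedelta(hours=18, minutes=30), day + timedelta(hours=22)),
        ("viewer-b", day + timedelta(hours=18, minutes=30), day + timedelta(hours=20, minutes=5)),
        ("viewer-c", day + timedelta(hours=19, minutes=50), day + timedelta(hours=20, minutes=20)),
    ]
    for slot, viewers in sorted(quarter_hour_audience(sessions).items()):
        print(slot.strftime("%H:%M"), len(viewers))
```

A total for the whole evening would hide the fact that one viewer left shortly after 8:00 and another only dropped in for half an hour. The per-segment view is what exposes those microtrends.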
Now, let's extend this type of analysis to application monitoring. In the old days, it was easy to figure out what we needed to monitor. There were one or two servers that ran each application. If we kept an eye on those devices, we could reliably predict the performance of the software and the happiness of the users. Life was simple.
Enter virtualization. Once we started virtualizing the servers that we used to rely on for applications, we gained the ability to move those applications around. Instead of an inoperable server causing our application to be offline, we could move that application to a different system and keep it running. As virtual machines matured, we could increase performance and reliability. We could also make applications run across data centers to provide increased capabilities across geographic locations.
This all leads to the cloud. Now, virtual machines could be moved hither and yon and didn't need to be located on-prem. Instead, we just needed to create new virtual machines to stand up an application. But even though the hardware was no longer located in our data center, we still needed to monitor what we were doing. Even if we couldn't monitor the hardware components, we still needed to monitor the virtual machines.
This is where our Nielsen example comes back into play. Nielsen knows how important it is to monitor the components of a program. So too must we keep an eye on the underlying components of our infrastructure. With virtual machines becoming the key components of our applications today, we must have an idea of how they are being maintained to understand how our applications are performing.
What if the component virtual machines are sitting on opposite sides of a relatively slow link? What if the database tier is in Oregon while the front end for the application is in Virginia? Would it cause an issue if replication between virtual machines on the back end failed because of a misconfiguration and we didn't catch it until they got out of sync? There is a multitude of things that might keep us up at night figuring out how to monitor virtual machines.
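As a rough illustration of catching silent replication failure before the tiers drift out of sync, the sketch below compares each replica's last applied change against the primary's. It assumes the database exposes some monotonically increasing change sequence number; `ReplicaStatus`, `lagging_replicas`, the threshold, and the host names are hypothetical and do not correspond to any particular product's API.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    region: str
    applied_seq: int  # last change sequence number the replica has applied


def lagging_replicas(primary_seq, replicas, max_lag=100):
    """Return (name, lag) pairs for replicas more than `max_lag` changes behind."""
    report = []
    for replica in replicas:
        lag = primary_seq - replica.applied_seq
        if lag > max_lag:
            report.append((replica.name, lag))
    return report


if __name__ == "__main__":
    # Made-up sample values: one healthy replica, one that has stalled.
    replicas = [
        ReplicaStatus("db-or-1", "oregon", applied_seq=9_990),
        ReplicaStatus("db-va-1", "virginia", applied_seq=7_200),
    ]
    for name, lag in lagging_replicas(primary_seq=10_000, replicas=replicas):
        print(f"{name} is {lag} changes behind the primary")
```

The point is less the arithmetic than where the check lives: it watches the relationship between two virtual machines, which no single-host health check would reveal.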
Now, amplify that mess even further with containers. The new vogue is to spin up containers with Docker, often orchestrated by Kubernetes, to provide short-lived services. If you think monitoring component virtual machines is hard today, just wait until those constructs have a short life and are destroyed as fast as they are created. Now, problems can disappear before they're even found. And then they get repeated over and over again.
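A small simulation shows why short-lived containers slip past traditional polling. It assumes a monitor that samples at fixed intervals starting at time zero. The container names and lifetimes are invented, and the function is a sketch of the sampling problem rather than a model of any real monitoring tool.

```python
def seen_by_polling(lifetimes, interval):
    """Return the names of containers alive at one or more poll instants.

    `lifetimes` is a hypothetical list of (name, start, stop) tuples in whole
    seconds; polls happen at 0, interval, 2 * interval, and so on.
    """
    seen = set()
    for name, start, stop in lifetimes:
        # First poll instant at or after the container starts (ceiling division).
        first_poll = -(-start // interval) * interval
        if first_poll < stop:
            seen.add(name)
    return seen


if __name__ == "__main__":
    lifetimes = [
        ("web-1", 0, 600),    # long-running service
        ("job-1", 65, 110),   # starts and exits between two polls
        ("job-2", 130, 170),  # same again
        ("job-3", 175, 190),  # happens to be alive when a poll fires
    ]
    seen = seen_by_polling(lifetimes, interval=60)
    missed = {name for name, _, _ in lifetimes} - seen
    print("seen:", sorted(seen))
    print("missed:", sorted(missed))
```

With a 60-second poll, two of the three short jobs are never observed at all. Shortening the interval only shrinks the gap. Recording container start and stop events as they happen removes it.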
The key is to monitor both the application and the infrastructure constructs. But it also requires a shift in thinking. You can't just rely on SNMP to save the day yet again. You have to do the research to figure out how best to monitor not only the application software but also the way it is hosted in your cloud provider or data center. If you don't know what to look for, you might miss the pieces that are critical to figuring out what's broken or, worse yet, what's degrading performance without actually causing anything to break.