If you’re looking at response time, what’s more useful: a mean or a percentile? Not sure? You probably know how to calculate the average or mean of a sample, but…
Statistical Process Control (SPC), or using numbers or data to study the characteristics of our process to make it behave the way we want it to behave, has been around…
At Velocity last week, I spoke about how we quantify abnormality in a system’s time-series metrics cheaply, in realtime, at high frequency. Note that this is not the same thing as…
Not too long ago, my primary programming language was Perl. I’ve written a lot of Perl, including some things that I think are quite clever. And therein lies the problem.…
In previous posts, I claimed that thresholds are a root of much evil in monitoring systems (not the root of all evil, but a root of much evil), and that…
If you don’t know socat, you probably should. From its man: Socat is a command line based utility that establishes two bidirectional byte streams and transfers data between them. Because the…
Why is a threshold-based alert such a disaster? There are two big reasons. Thresholds are always wrong. They’re worse than a broken clock, which is at least right twice a…
In this post I’ll tell a story that will feel familiar to anyone who’s ever monitored MySQL. Here’s a recipe for a threshold-based alert that will go horribly wrong, beyond…
This post is part of an ongoing series on the best practices for effective and insightful database monitoring. Much of what’s covered in these posts is unintuitive, yet vital to understand. Previous…
I wrote a couple of posts previously on SQL Server consolidation. The first post tried to give insight on some of the problems and associated motivating factors that most companies have…