- For each category of queries, we detect changes in frequency, total accumulated time, and latency
- We detect important changes in the overall error and warning rates globally (not per category of query)
Introducing Query Anomaly Detection
November 11, 2015
Database
Anomaly detection sure is a hot topic. We’ve written about it ourselves a number of times, and Preetam Jinka and I just coauthored a book for O’Reilly called Anomaly Detection for Monitoring. One of the challenges, as we’ve discussed so often, is that catch-all, generic anomaly detection is hard to do.
In special cases, however, there’s often a niche use case that can be done well and is highly beneficial. Query behavior changes are one such case, and I’m happy to announce that SolarWinds® Database Performance Monitor (DPM) now runs advanced statistical algorithms continually to detect important changes in your most important queries.
What does query anomaly detection mean? Good question! In general, a lot of anomaly detection techniques compare current behavior to past behavior and determine whether the system is within its range of expected behavior.
You’re probably familiar with various ways to do this, such as Holt-Winters forecasting. Holt-Winters accounts for seasonality, so your expectations at 5 a.m. won’t be skewed by 5 p.m.’s traffic, for example.
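To make the idea concrete, here is a minimal sketch of additive Holt-Winters forecasting used as a naive anomaly check. It is an illustration only, not the approach DPM uses. The function names, the smoothing parameters (alpha, beta, gamma), and the fixed tolerance are all assumptions chosen for the example.

```python
def holt_winters_additive(series, season_len, alpha=0.5, beta=0.1, gamma=0.1):
    """One-step-ahead additive Holt-Winters forecasts.

    Returns a list where forecasts[i] is the prediction for
    series[season_len + i], made before that value was seen.
    Parameter defaults are illustrative assumptions, not tuned values.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    first = series[:season_len]
    second = series[season_len:2 * season_len]

    # Simple initialization: level is the first season's mean, trend is the
    # average per-step change between the first two seasons, and each
    # seasonal term is the first season's deviation from the level.
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / (season_len ** 2)
    seasonal = [x - level for x in first]

    forecasts = []
    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        forecasts.append(level + trend + s)

        # Update the three components with the newly observed value.
        value = series[t]
        prev_level = level
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (value - level) + (1 - gamma) * s
    return forecasts


def flag_anomalies(series, season_len, tolerance):
    """Indices of points whose forecast error exceeds a fixed tolerance."""
    forecasts = holt_winters_additive(series, season_len)
    return [
        season_len + i
        for i, predicted in enumerate(forecasts)
        if abs(series[season_len + i] - predicted) > tolerance
    ]


if __name__ == "__main__":
    # Hypothetical traffic with a repeating four-step "daily" pattern.
    traffic = [10, 20, 50, 20] * 6
    traffic[13] = 200  # inject one spike
    print(flag_anomalies(traffic, season_len=4, tolerance=30))
```

Note that in this sketch a single spike also distorts the level and trend estimates, so points after it can be flagged as well. That is one small example of how a generic technique ends up telling you something is unusual too often.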
There are so many ways that system behavior can change, however, that most anomaly detection techniques produce lots of false positives. That is, they tell you something’s unusual far too often. To avoid this, we’ve taken a more sophisticated approach that we’ll write up in more detail later.
As for which metrics we check for anomalous changes, we’re again taking a very specific view. Database Performance Monitor measures many thousands of metrics per server every second (sometimes many more). We’re not going to check all of those; as I’ve discussed previously, it’s vital to understand the meaning of the metrics and only check meaningful metrics in sensible ways.
In brief, we look for anomalous changes in query metrics first of all: