Why You Should Almost Never Alert On Thresholds

April 8, 2013 | Database

This post is part of an ongoing series on the best practices for effective and insightful database monitoring. Much of what's covered in these posts is unintuitive, yet vital to understand. Previous posts have covered Why Percentiles Don't Work the Way You Think; how to avoid getting to a point When It's Too Late to Monitor; how to tell If a Query Is Bad; and an explanation of why, when looking at charts, you should understand that A Trendline is a Model.

Alerting when a metric crosses a threshold is one of the worst things you can do. Why? Because threshold-based monitoring is prone to false positives and false negatives, which correspond to false alarms and missed alarms (a.k.a. useless noise and useless silence).

One of the worst things about most monitoring systems is the incredible amount of noise they generate. IT staff members react to this predictably: they filter out as many alerts as they can, and ignore most of the rest. As a result, a system that didn’t really work well to begin with becomes even less useful.

This problem is universal. When I speak at conferences, I ask how many people are using Nagios. Usually well over half of the hands go up. I then ask how many people do not have email filters set up to shuffle some Nagios alerts to /dev/null. There’s usually one or two hands remaining, and everyone laughs uncomfortably.

We talk about root cause analysis for systems, but what’s the root cause of the email filters? It’s largely due to thresholds. Simple alive/dead status checks are easy to get right, but a lot of Nagios-style alerting is based on thresholds. This never works. In theory it could sometimes work (albeit rarely), but in practice it never does. In my next post I’ll explain why this is.
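To make the false alarm / missed alarm distinction concrete, here is a minimal sketch of a static threshold check. It is not taken from this series: the metric (CPU utilization), the 90% threshold, the sample values, and the function name `threshold_alerts` are all made up for illustration.

```python
# Hypothetical threshold, chosen only to illustrate the two failure modes.
THRESHOLD = 90.0  # alert when CPU utilization (%) exceeds this value


def threshold_alerts(samples, threshold=THRESHOLD):
    """Return the indices of samples that would trigger a threshold alert."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Assumed scenario 1: a healthy server with one brief, harmless spike.
healthy_with_spike = [35.0, 40.0, 38.0, 95.0, 37.0, 36.0]

# Assumed scenario 2: a server that normally sits near 20% but is now
# stuck around 85%. Something is wrong, yet it never crosses 90%.
degraded_below_threshold = [20.0, 21.0, 85.0, 86.0, 85.0, 87.0]

false_alarms = threshold_alerts(healthy_with_spike)
missed = threshold_alerts(degraded_below_threshold)

print("alerts on healthy server:", false_alarms)   # one alert, nothing wrong
print("alerts on degraded server:", missed)        # no alerts, something wrong
```

In this toy data the healthy server pages someone once and the degraded server pages no one, which is the noise-plus-silence combination described above. Real metrics are messier, so treat this as an illustration of the failure modes rather than an argument in itself.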

Baron Schwartz

Baron is a performance and scalability expert who participates in various database, open-source, and distributed systems communities. He has helped build and scale many large,…
