The USE Method
USE is an acronym for Utilization, Saturation, and Errors. Brendan Gregg suggests using it to get started quickly when you’re diving into an unknown system: “I developed the USE method to teach others how to solve common performance issues quickly, without overlooking important areas. Like an emergency checklist in a flight manual, it is intended to be simple, straightforward, complete, and fast.” A summary of USE is “For every resource, check utilization, saturation, and errors.” What do those things mean? Brendan defines the terminology:
- Utilization: the average time the resource was busy servicing work
- Saturation: the degree to which the resource has extra work which it can't service, often queued
- Errors: the count of error events
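As a rough illustration of the three definitions above, here is a minimal Python sketch that summarizes one resource over a set of observation windows. The `ResourceSample` shape and the `use_summary` helper are hypothetical names invented for this example, and treating saturation as a time-weighted mean queue length is an assumption; other measures of unserviced work would fit the definition too.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceSample:
    """One observation window for a single resource (hypothetical shape)."""
    interval_s: float  # length of the window, in seconds
    busy_s: float      # time within the window spent servicing work
    queue_len: int     # work items waiting, not being serviced, in this window
    errors: int        # error events counted in this window


def use_summary(samples: List[ResourceSample]) -> Dict[str, float]:
    """Summarize utilization, saturation, and errors for one resource."""
    total_s = sum(s.interval_s for s in samples)
    if total_s <= 0:
        raise ValueError("need at least one sample with a positive interval")
    return {
        # Utilization: fraction of the observed time the resource was busy.
        "utilization": sum(s.busy_s for s in samples) / total_s,
        # Saturation (assumed measure): time-weighted mean queue length.
        "saturation": sum(s.queue_len * s.interval_s for s in samples) / total_s,
        # Errors: a plain count of error events.
        "errors": sum(s.errors for s in samples),
    }
```

For three 10-second windows with 9, 10, and 5 seconds of busy time, queue lengths of 4, 12, and 0, and 3 errors in the second window, this reports a utilization of 0.8, a mean queue length of about 5.3, and 3 errors.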
The RED Method
I first saw this acronym in a talk on monitoring microservices in 2015. The acronym stands for Rate, Errors, and Duration. These are request-scoped, not resource-scoped as the USE method is. Duration is explicitly taken to mean distributions, not averages.
USE and RED: Two Sides of the Same Coin
What may not be obvious is that USE and RED complement one another. The USE method is an internal, service-centric view. The system or service’s workload is assumed, and USE directs attention to the resources handling the workload. The goal is to understand how these resources behave in the presence of the load. The RED method, on the other hand, is about the workload itself, and treats the service as a black box. It’s an externally visible view of the behavior of the workload as serviced by the resources. I define workload as a population of requests over a period of time. I’ve spoken and written extensively before about the importance of measuring the workload, since the system’s raison d’être is to do useful work. Taken together, RED and USE comprise minimally complete, maximally useful observability: a way to understand both aspects of a system, its users/customers and the work they request, and its resources/components and how they react to the workload. (I include users in the system. Users aren’t separate from the system; they’re an inextricable part of it.) I often refer to this duality as the "Zen of Performance," a holistic, unified system performance worldview I'm developing. It's a work in progress!
Mapping USE and RED to Standard Terminology
USE and RED are convenient, and part of the reason they’re so valuable is their atoms map directly to standard concepts that are core performance metrics:
- U = Utilization, as canonically defined
- S = Concurrency
- E = Error Rate, as a throughput metric
- R = Request Throughput, in requests per second
- E = Request Error Rate, as either a throughput metric or a fraction of overall throughput
- D = Latency, Residence Time, or Response Time; all three are widely used
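To make the R, E, and D mappings above concrete, here is a minimal Python sketch that derives them from a list of request records observed over a fixed window. The `Request` shape, the `red_summary` helper, and the choice of nearest-rank percentiles are assumptions made for this example, not part of the RED method itself. The mean is reported only to contrast it with the distribution.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One completed request (hypothetical shape)."""
    duration_s: float  # latency / residence time / response time, in seconds
    ok: bool           # False if the request ended in an error


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def red_summary(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Summarize rate, errors, and duration for one observation window."""
    if window_s <= 0 or not requests:
        raise ValueError("need a positive window and at least one request")
    durations = sorted(r.duration_s for r in requests)
    failed = sum(1 for r in requests if not r.ok)
    return {
        # R: request throughput, in requests per second.
        "rate_rps": len(requests) / window_s,
        # E: error rate as a throughput metric ...
        "error_rps": failed / window_s,
        # ... or as a fraction of overall throughput.
        "error_fraction": failed / len(requests),
        # D: describe the distribution, not only the average.
        "mean_s": sum(durations) / len(durations),
        "p50_s": percentile(durations, 50),
        "p99_s": percentile(durations, 99),
    }
```

With ten requests in a 5-second window, two of them failed, and one slow outlier at 1.5 seconds among durations that are otherwise 0.2 seconds or less, the median stays at 0.03 seconds while the 99th percentile is 1.5 seconds. That gap is the reason Duration is read as a distribution rather than an average.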