When you find spikes in your database's CPU consumption, locating the cause behind those peaks can be a daunting task. Database server software doesn’t make it easy to precisely analyze what's taxing CPU and IO, and, due to the system's complexity, locating the source of the burden on system resources isn't just a matter of pulling metrics from the database and connecting the dots. These figures must be calculated.
Database servers perform a lot of work asynchronously in order to share resources and gain efficiency. Many queries’ writes, for example, are aggregated and deferred to be done later, often with a single IO operation. Because of this disconnect, accurate, automatic performance cost accounting is impossible.
However, there are ways to powerfully leverage mathematical techniques, such as regression analysis, to discover just what's weighing on your system's resources. These methods make it possible to infer certain cause-and-effect relationships (i.e. queries impacting resources) and map resource loads directly to the offending queries.
But even a math-driven approach relies on a user's ability to interpret results and draw conclusions, to make sense of complexity. Getting to the bottom of performance issues—and locating valuable information such as the CPU load of a query—means making inferences and comparing things that might look similar, to locate significant trends. For many modern day technologies—not just performance management tools-- this kind of task involves using profiles.
Profiling is a powerful method for categorizing similar types of work and ranking those metrics by a defined dimension. In the world of database performance, a profiler can help with visualization and can answer questions like "Which queries run most frequently?" and “Which queries take the most time to complete?”
Profilers are probably familiar to most readers, appearing in various aspects of everyday lives. Online personal finance tools, such as Mint
, are a good example of basic profiles in action. Profiles like these let users sort and segment their income and expenses, compare expense categories over time, and—for those of us with the most holiday shopping ahead—use categorizations to determine the root cause of our dwindling checking account balances.
The online real estate site Zillow uses profiles too, allowing users to compare historic prices in a particular neighborhood, using sorting criteria such as number of bedrooms, bathrooms, square footage, etc. Zillow can also help calculate a “Zestimate” value using their proprietary algorithm to estimate the current value of your home—another kind of profiling, using Zillow's own, unique metric. These types of analyses don't only solve problems quickly (find a house that meets certain criteria), they also help structure future plans (help develop a strategy for maximizing your own house's Zestimate when it goes on market in six months).
Other examples of everyday profiling include the proliferation of popular health apps available in wearable tech and on smartphones. Health profilers can help you keep track of aspects of your life such as "steps taken" or "calories eaten"-- often difficult to view in the long term--so you can identify trends, peaks, valleys, and outliers. Because health technology can measure metrics and present them in neat profiles, they offer powerful new approaches in how users can manage (and optimize!) their personal health.
Using Profiles for Database Monitoring
Profiling is at the core of SolarWinds Database Performance Monitor’s (DPM) approach to workload analytics. With the Profiler dashboard, users can analyze their systems at the highest level (e.g. datacenter summary) all the way down to single query execution on a single server. For fast problem resolution, a DBA often needs to isolate specific issues, so having a single, interactive view to rank, filter, slice-and-dice, and drill down into your systems is critical. Visual comparisons make the task of finding your resource hogs much easier.
In this screenshot of DPM's Profiler, you can see a list of the seven most frequent queries in a database server. Each row displays a type of query along with statistics showing that query's frequency over a time interval.
The blue bar under the query name (1) indicates the percent of the total executed. The graph in the center (2) shows query execution over a specific time interval. So far, this is fairly standard profile analysis. Where it gets interesting is in the two columns to the right, where you see CPU and IO Ops performance (3). From here, you can drill down and map query performance and resource utilization. You can see that the CPU time is roughly proportional with how often the queries execute (e.g. query #1: 46.58% executed vs. 46.51% CPU time). However, you can also see that most of the server’s IO load comes from query #1 (58%), with query #2 using much, much less IO (5.36%). You now have a starting point for additional drill down and troubleshooting.
How and Why We Use Weighted Linear Regression
For the reasons mentioned above, the per query CPU Time and IO Ops metrics are virtually impossible to measure directly in the database server and therefore need to be calculated. After countless hours of research and analysis, Baron Schwartz found that using a weighted linear regression provided the most accurate results.
Compared to standard linear regression that would involve performing regression between each query’s metrics and the total CPU or IO metric for the server, Baron decided instead to go frame-by-frame in the time series of these metrics and assign a weighted proportion of the CPU or IO to each query present within that time frame, proportional to the magnitude of each query’s metric. We've found that this approach produces much better results than alternative, computing an estimate of each query’s relationship to an external metric such as CPU or IO.