Understanding your SQL Server Monitoring: Part 2 CPU
February 5, 2016
Infrastructure
Welcome back. Last time we met, I reiterated the importance of a good SQL Server monitoring solution, and not just monitoring, but monitoring intelligently. We also went over ways to monitor your storage so you can see when things are going in the wrong direction from a performance perspective. In this article, we are going to move on to some of the top ways for diagnosing CPU pressure in SQL Server. Like the last article, I will go over some of the metrics that are helpful for diagnosing CPU pressure.
Just like with monitoring storage, one of your best friends are your wait stats. Wait stats show how long, and how many tasks, have had to wait, and categorizes them in to useful and granular categories. One of the most common things that leads to CPU pressure is when SQL Server is installed on a machine and the default settings are not changed. The specific default settings that affect CPU the most are Cost Threshold of Parallelism and Maximum Degree of Parallelism, or MAXDOP as it’s most commonly known. To understand the basic idea of both of these metrics, it’s helpful to understand, at a high level, how a query is handled by SQL Server when you click execute. Once a query is submitted, it is received by the SQL Server engine which scores each step of the query with a cost, for instance the join are where clauses are scored on how much of a resource cost they are. If the cost is more than the set Cost Threshold of Parallelism, then the engine will look to split out the work amongst multiple threads. The number of threads it will use is dependent on the cost of the work of each additional thread spun up for the work. The MAXDOP setting tells SQL Server what the maximum number of threads that it can spin up is. It will keep spinning up threads until it hits the MAXDOP or the cost of the threads is below the Cost Threshold of Parallelism setting.
Here is where the default, “out of the box,” settings creates issues. The cost setting, by default is only 5. This low of a cost makes it very difficult for SQL Server to be picky about which queries are parallelized. There are a lot of different recommendations out there for how to configure this advanced option. I personally reference Jonathan Kehayias’s article here for a recommendation on how to tune the setting. This uses the existing query workload to determine what the setting should be. Now for the MAXDOP setting, this is set to 0 by default. That setting in itself can sound slightly concerning. You would think that a setting of 0 would mean no threads, but it actually means the exact opposite. This setting means there is no limit. It allows SQL Server to create as many threads of work as it determines are necessary. This can create problems if the cost is too low and MAXDOP is set to 0, because then SQL Server can spin up more threads than it should. When this happens, threads can get stuck in a situation of waiting for threads to complete. When the threads wait, this increments the wait statistic CXPACKET. It’s important to remember that the numbers you will see when you query wait statistics are cumulative since the last time the SQL Server Service was started. You need to look at the average length of the wait per category, to understand if you have, on average, long waits, which would indicate a problem, or short waits which may not be that big of a deal now. If I look at my wait stats and they show short CXPACKET waits, I start monitoring that statistic more often, to watch if it increases. If it increases over the next couple of weeks, this tells me that I need to look at either tuning my Cost Threshold for Parallelism configuration, or backing down my MAXDOP setting.
As a great transition to our next post, let’s take a second to go over another metric which doesn’t necessarily mean that you are experiencing CPU pressure when this is high, but that your CPU is getting a higher than normal number of requests. This metric is a total of compilations and recompilations of query plans. Compilations are the action that SQL Server does when it receives a query for the first time and it compiles the query plan. The query plan is the instructions that SQL Server has created for it’s worker threads to process the query. Recompilations are the same action, but are kicked off by SQL Server for queries that have had a plan in the query cache already, but the plan has expired. Compilations and recompilations, like everything in SQL Server, can go a lot deeper than this. A great post that goes in to deeper detail about compilations and recompilations, is available here on TechNet. It is based on 2005, but a lot of the principles still hold true.
In my next post, we’ll get back in to compilations and recompilations, to understand how they can be a sign of memory pressure. We’ll also cover other signs to look for in your monitoring, that indicate that memory pressure is negatively impacting your performance.