CPUs are running faster now than ever before, solid-state drives are replacing spinning disks, and database servers have terabytes of RAM. Because these resources are plentiful and increasingly affordable, it can seem tempting to solve database performance issues with hardware. But consider this example: if CPU utilization shoots up to 90% on one of your servers and stays at that level for a day or two, is the problem lack of CPU power or something else?
As DBAs with many years of experience in a variety of environments between us, we have each seen that most database performance issues result not from a lack of CPU power, I/O capacity, or some other hardware constraint, but rather from poorly written queries and inefficiently designed indexes.
We wanted to explore these issues with some of our colleagues, so we invited Pinal Dave, Pluralsight evangelist and SQL Server consultant; Colin Stasiuk of Benchmark IT Consulting, a SQL Server MVP; Jason Strate, SQL Server MVP and consultant at Pragmatic Works; and Tim Mitchell, business intelligence (BI) consultant, trainer, and SQL Server MVP, to share their own experiences with identifying the biggest impacts on database performance. We've included highlights of the discussion in this article.
(Note: We think there’s value in sharing the perspectives of our colleagues in this way. If you like this format, please let us know by sharing this article.)
Question: In your experience, do you see more performance issues as a result of hardware (CPU, RAM, disk, network) issues or a result of bad code and design?
I have often seen organizations take a shortcut by upgrading hardware to get an immediate performance boost. In most of these cases, though, it has turned out that hardware wasn't the issue. In fact, if the code is bad, adding hardware can amplify the performance problem. It's best to identify the problem areas in the code first rather than reaching for a hardware solution.
I have run into instances with a number of clients in which they had robust hardware being limited by inefficient coding. In my experience, most performance problems can be significantly reduced or eliminated through proper coding.
Bad code and design problems far exceed hardware issues.
Often, the root cause [of poor performance issues] is that the code for the application is not properly leveraging the [hardware] resources. In one case [on which I consulted], the addition of indexes and a change in transaction isolation level resulted in such significant performance improvements that anticipated daily outages ceased and the tradition of "everyone on call for the holidays" was removed.
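To illustrate how a single index can change a query plan in the way described above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (SQL Server's optimizer behaves analogously for a simple equality predicate; the table and index names here are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

query = "SELECT * FROM orders WHERE customer_id = 42"

# Without a supporting index, the engine must scan the whole table.
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

con.execute("CREATE INDEX ix_orders_customer_id ON orders (customer_id)")

# With the index in place, the same query becomes a targeted seek.
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

print(before)  # e.g. "SCAN orders"
print(after)   # e.g. "SEARCH orders USING INDEX ix_orders_customer_id (customer_id=?)"
```

On a real SQL Server workload you would compare estimated or actual execution plans instead, but the shift from scan to seek is the same effect that turns an anticipated outage into a non-event.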
Question: Have you ever spent money on new hardware and still had performance problems due to poor code? Can you give an example?
In my early career, we had a server whose CPU ran at 100% most of the day. We assumed that our application had grown and needed more CPU, so we purchased a new server with much more powerful CPUs and plenty of memory—but our issue was not resolved. In fact, more and more users started to complain about the application's performance. After much research, we figured out that the real culprit was a cursor running in our application. Once we decomposed the cursor into a series of operations, our CPU load dropped to one-quarter of the original load. We spent more money than we needed to, and delayed a solution to the real problem by focusing on hardware.
Row-by-row processing can be improved a bit by throwing more hardware at it (at most about a 5% improvement, if you're lucky), but in the end, writing proper code will always get you the biggest bang for your buck.
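To make the row-by-row point concrete, here is a small sketch (using Python's sqlite3 module and a made-up accounts table) contrasting a cursor-style loop with a single set-based statement. Both produce identical results, but the set-based version is one statement the engine can optimize as a whole:

```python
import sqlite3

def fresh_accounts():
    """Create an in-memory database with 1,000 hypothetical account rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)",
                    [(i, 100.0) for i in range(1, 1001)])
    return con

# Cursor-style: fetch each key, then issue one UPDATE per row.
con1 = fresh_accounts()
for (row_id,) in con1.execute("SELECT id FROM accounts").fetchall():
    con1.execute("UPDATE accounts SET balance = balance * 1.05 WHERE id = ?", (row_id,))

# Set-based: one statement updates every row in a single pass.
con2 = fresh_accounts()
con2.execute("UPDATE accounts SET balance = balance * 1.05")

# Both approaches end with identical data; the set-based form avoids
# 1,000 separate statement executions (and, on a real server, 1,000 round trips).
totals = [c.execute("SELECT SUM(balance) FROM accounts").fetchone()[0] for c in (con1, con2)]
```

The per-row loop is exactly the pattern that faster hardware can only nibble at; rewriting it as the single UPDATE removes the overhead entirely rather than renting more capacity to absorb it.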
In one of my first DBA assignments, we were just purchasing a SAN for the data platform and the expected outcome was that there would be sufficient IO throughput to resolve the performance issues related to some key tables. With another client, a new SAN was brought in to replace an existing, overallocated SAN. In the first situation, the poor process for assigning primary key values was not addressed and performance issues continued. In the second, while the immediate IO bottlenecks were resolved, the code that was really causing the performance issues wasn't addressed; within a few years, the IO performance continued to degrade, even without substantial increases in [data] volume. Last time I checked in with that DBA, they were pushing SQL Server IO requests at peak time that exceeded available throughput by 3x.
Question: Have you ever been told to “stop blaming the code” and to make queries perform better? Can you give an example?
I consulted once with an organization that had over 1100 stored procedures and over 300 views in a single database. It was impossible for me to fully understand the deeply nested logic of their code. Whenever I found the root cause of an issue there, it was poorly written code; the complexity was so high, however, that it wasn't really feasible to start reworking it. To improve performance without touching the code, I did some best-practices-based, server-level optimization with the help of SQL Server wait statistics and a few third-party tools (like SolarWinds Database Performance Analyzer).
Developers (just like anyone who takes pride in their work) never like to think that their code is wrong. I have seen it many times: a developer writes a proc that runs screaming fast on their box, only to find that it falls down when it gets to production. The developer is working to get to an answer, to solve a bug and move on to the next one. When the developer is working with an empty database (as are most databases developers work from locally), it's very difficult to predict how the code will behave under enterprise-level load and requirements. There's a disconnect between development and testing and production.
I hear this often, especially from developers. Unfortunately, hardware doesn't provide easy fixes for a lack of indexing, implicit conversions, poor isolation level decisions, and many other code-generated issues. The only way to address these issues is through changes to the applications themselves.
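Implicit conversions are easy to reproduce: when the engine has to convert the column side of a predicate, the index on that column can no longer be used for a seek. A rough stand-in using sqlite3 follows (in SQL Server, a type mismatch wraps the column in an internal CONVERT; here an explicit CAST plays that role, and the table name is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (code TEXT, name TEXT)")
con.execute("CREATE INDEX ix_customers_code ON customers (code)")

# Matching types: the predicate is sargable and the index can be seeked.
good = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE code = 'A42'").fetchone()[3]

# A conversion on the column side means every row must be converted and
# then compared, so the plan degrades to a full scan despite the index.
bad = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE CAST(code AS INTEGER) = 42").fetchone()[3]

print(good)  # e.g. "SEARCH customers USING INDEX ix_customers_code (code=?)"
print(bad)   # e.g. "SCAN customers"
```

No amount of additional hardware changes that second plan; only fixing the data types (or the query) does.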
As DBAs, it's critical that we have the skills and tools we need to accurately identify the real causes of database performance issues. Throwing hardware at a performance problem can waste time and money without solving the problem—and as we've seen here, it can even make a problem worse. We should consider hardware as a possible culprit, but we should be looking for opportunities to improve query execution and index design, too.