As a database administrator (aka DBA, or Default Blame Acceptor) throughout my career, I’ve worked with a myriad of developers, system administrators, and business users who have all had the same question—why is my query (or application) slow? Many organizations lack a full-time DBA, which makes the question even harder to answer. The answer is sometimes simple, sometimes complicated, but they all start with one bit of analysis you need to do: whether the relational database management system (RDBMS) you are using is DB2, MySQL, Microsoft SQL Server, Oracle, or PostgreSQL.
It’s All About the Execution Plan
A database engine balances CPU, memory, and storage resources to try to provide the best overall performance for all queries. As part of this, the engine will try to limit the number of times it executes expensive processes, by caching various objects in RAM—one use is saving blocks with the data needed to return results to a query. Another common use of caching is for execution plans or explain plans (different engines call these different things), which is probably the most important factor in your queries performance.
When you submit a query to a database engine, a couple of things happen—the query is first parsed, to ensure its syntax is valid, the objects (tables, views, functions) you’re querying exist, and you have permission to access them. This process is very fast and happens in a matter of microseconds. Next, the database engine will look to see if that query has been recently executed and if the cache of execution plans has a plan for that query. If there’s not an existing plan, the engine will have to generate a new plan. This process is very expensive from a CPU perspective, which is why the database engine will attempt to cache plans.
Execution or explain plans are simply the map and order of operations required to gather the data to answer your query. The engine uses statistics or metadata about the data in your table to build its best guess at the optimal way to gather your data. Depending on your database engine, other factors such as the number of CPUs, the amount of available memory, various server settings, and even the speed of your storage may impact the operations included in your plan (DBAs frequently refer to this as the shape of the plan).
How Do I Get This Plan and How Do I Read It?
Depending on your RDBMS, there are different approaches to gathering the plan. Typically, you can get the engine to give you a pre-plan, which tells you the operations the engine will perform to retrieve your data. This is helpful when you need to identify large operations like table scans, which would benefit from an index would be helpful. For example—if I had the following table called Employees:
EmployeeID |
LastName |
State |
02 |
Dantoni |
PA |
09 |
Brees |
LA |
If I wanted to query by LastName, e.g.,
SELECT STATE
FROM Employees
WHERE LastName = ‘Dantoni’
I would want to add an index to the LastName column. Some database engines will even flag a missing index warning, to let you know an index on that column would help the query go faster.
There is also the notion of a post plan, which includes the actual row counts and execution times of the query. This can be useful if your statistics are very out of date, and the engine is making poor assumptions about the number of rows your query will return.
Performance tuning database systems are a combination of dark arts and science and can require a deep level of experience. However, knowing about the existence of and how to capture execution plans, allows you to have a much better understanding of the work your database engine is doing, and can give you a path to fix it.