Have you ever deployed a new application that ran fine at first, then slowed to crawl as more and more data was added? Or tried to run a report that took minutes or even hours to come back? Database performance is a frequent bottleneck for many applications, and in this post you’ll learn about a critical aspect of database performance—indexes.
What Are Indexes?
An index is a linked copy of a subset of a table existing within the same database. It can consist of one or more columns (and it doesn’t have to include all of the rows). The index is linked to the table so when data is inserted, updated, or deleted in the table, it’s reflected in the index before the transaction is marked complete by the database. Indexes allow the query optimizer to use a subset of the table data. Here’s an example:
SELECT FirstName, LastName
FROM Employees
WHERE LastName = ‘Smith’
This query retrieves the first name and last name from the table named Employees. If you don’t have an index, the query would have to read the entire table. But if you have an index of the LastName column, instead of reading the entire table, the engine will only read the specific column.
While this might seem minor at first, if I told you the Employees table has 50 columns and 10,000,000 rows, the reduction in records read suddenly becomes significant. And this is especially the case when you’re retrieving a small number of records, such as the balance of a specific customer.
Should I Just Index Every Column?
When I’ve done performance tuning on systems, most of the time I see under-indexing, but every now and then I come across a system where someone got creative and put an index on every column in every table in the database. But indexes need their own space on disk, and are kept in sync with the data in the table. If we go back to our 50-column Employees table, 50 indexes would mean every time a new record was written, to complete the write to the main table, the database engine would need to write to every one of those 50 indexes. As you can imagine, this would drastically decrease the performance of write operations. And those 50 indexes would take up a lot of disk space.
Which columns should you index? The obvious answer is any columns referenced in the
where and
order by clauses of commonly executed queries. Those columns will benefit the most, as that’s where the initiation work in data retrieval is done. Most tables also have a primary key, which uniquely identifies each row. In the Employees table, this would likely be
EmployeeId. The primary key is inherently an index.
You should also index columns referenced in foreign key relationships; these reference columns in other tables and are the subject of frequent lookups. In some database systems, you can also create filtered indexes—an index with a WHERE clause.
For example, in the Employees table, if there’s an IsActive field to identify employees who are currently working for the company, we might create a filtered index on isActive = 1 as shown below:
CREATE INDEX IdxFilter on Employees (EmployeeID, LastName)
WHERE IsActive=1;
This would reduce the size of our working set of data and make queries against employees more efficient.
Types of Indexes
Depending on your database engine, you’ll find many types of indexes available. The two most common are clustered and nonclustered indexes.
A clustered index is the actual table data, sorted by the clustering key (usually the primary key of the table). Tables without clustered indexes are called heaps—this pattern is common in Oracle, but less common in other databases. An exception to this is temporary staging tables, which are designed strictly with data ingest speed in mind.
A nonclustered index, on the other hand, is an on-disk structure separate from the data rows in the table, like our LastName example. In the LastName example, LastName would be considered the key in our nonclustered index which has a pointer in the nonclustered index points at a row locator in a clustered index, or heap, allowing for rapid retrieval of the row.
You also have the option to include additional columns in a nonclustered index—these columns aren’t part of the key, and are used for additional columns potentially required for the query. These included columns can significantly improve performance by eliminating lookups to the base table to retrieve records.
Other common index types include those for nested datatypes like XML and JSON, which allow data to be queried more efficiently. Another modern type of index is columnstore, which stores data in columns. This allows for a great deal of data compression, and is helpful for large aggregation queries like those encountered in a data warehouse.
These latter types of indexes are more for special use, and you should investigate the options available in your database platform.
Keep Your Databases Humming
The importance of indexes to database performance cannot be overstated. Frequently, one index can serve multiple queries and drastically improve the overall performance of a server.
This overview of database indexes is one part of making sure your databases are working as efficiently as possible. To get the most out them, it’s helpful to get the right tools to help you monitor and manage them. The SolarWinds
database management portfolio, Database Insights for SQL Server, Database Performance Monitor (DPM), Database Performance Analyzer (DPA), and SQL Sentry, can help you with your monitoring and management needs. If you’re ready to find out what SolarWinds can do for your databases,
request a demo with their database management experts.