There's an important heuristic in database indexing that I call the left-prefix rule. It helps you understand whether the database will be able to make the best use of a multi-column index to speed up a query. It's one of a small handful of really important things to understand about indexes!
The left-prefix rule can be confusing, but once you grok it, it's simple and easy to use. It applies to MySQL, PostgreSQL, MongoDB, and many other databases. In this post, I'll explain what the left-prefix rule means and how it works.
How Multi-Column Indexes Work
Most database indexes are a sorted copy of some data from the table they index. Suppose you have a database of people and you want to find them by name. If you create an index on the first_name attribute, the database makes a sorted, searchable copy of everyone's first name, with pointers back to the original records.
Indexes can have more than a single column. Multi-column indexes are also called compound indexes or composite indexes. The underlying implementation is sorted by all of the columns, in order. Suppose you index (last_name, first_name): now the database creates a copy of both of those columns and sorts them first by the last name, then if there are any people with the same last name, it sorts them by first name.
Here's an example. Notice how there are two entries for Bach, and which one comes first.
The Left-Prefix Rule
The left-prefix rule says a query can search an index efficiently if it provides search terms (such as values in a WHERE clause) that match the leading columns of the index, in left-to-right order.
In our example above, if we're looking for people named Bach, we can search the index efficiently and find all of the Bachs. If we're looking for {Bach, Carl Philipp Emanuel}, we can find that record efficiently too.
But if we're looking for people named Irving, we're going to have to scan the whole index, because there could be Irvings anywhere in the index. It's not sorted in a way that keeps all the first names together, unless you're looking within a single last name. So another way to state the left-prefix rule is that searches that don't constrain the leading index columns aren't very efficient. The index might not even be usable for such searches.
Inequalities, Ranges, and the Left-Prefix Rule
There's more to the left-prefix rule. The usable prefix of the index, in most databases and under most circumstances, is
up to and including the first inequality or range search condition.
Suppose we have a four-column index on columns (a, b, c, d). If we search for records where a=X, and b=Y, and c>Q, and d=R, then the usable prefix is only the first three columns. That's because the database will be able to progressively narrow the search until it gets to the fourth column:
- Narrow the search to a=X. Good.
- Narrow the search to b=Y. Okay.
- Narrow the search to c>Q. That's a set (or range) of potentially many different values of c, so the narrowing stops after this.
- Look for d=R. This is a scan. Within each possible value of Q, the database must scan to find rows with d=R.
Gaps in the Prefix
There's one more case to consider: what if you provide search terms for some but not all of the columns in the index? What if you search for a=X, b=Y, and d=R?
The usable prefix of the index, that's constrained/narrowed by your criteria, is limited to the
contiguous prefix for which you've provided search terms. In this case, that's just (a, b). That's your usable left-prefix for constraining the search. A "hole" or gap in the index, meaning that the index has a column without a search term to apply to it, ends the prefix.
Concluding Thoughts
Now you know the three important tenets of the left-prefix rule!
- A query can search an index efficiently if it constrains the leading columns of the index, in left-to-right order...
- Up to and including the first inequality or range search condition...
- Without any gaps in the column prefix.
Next time you're examining the EXPLAIN plan of a query that uses an index, and you see that it doesn't use all the columns of a multi-column (compound/composite) index, check whether the query satisfies the left-prefix rule. If it doesn't, that might explain why the index's full power isn't being used to speed up the query as much as it could!