Houston, we have a problem. Google search for “What is innodb history list length?” and you get a bunch of nonsense mixed in with correct information, and it’s hard to tell which information is right. Let’s fix this.
What Is InnoDB History, Anyway?
InnoDB is an MVCC storage engine, which means you can start a transaction and continue to see a consistent snapshot even as the data changes. This is implemented by keeping old versions of rows as they're modified.
They’re kept in a linked list. The most recent version points to the previous one, which points to the previous one, and so on. Each version has a transaction ID with it, so when InnoDB goes looking for a row, it compares your current transaction ID to the row version and selects the right one for you. This can be done without locking.
MVCC is complicated
and I’m simplifying, but that’s the essence of what happens.
What Is the History List?
The InnoDB history list is the undo logs which are used to store these modifications. They're a fundamental part of InnoDB’s transactional architecture.
How Can I Inspect the History List?
In classic InnoDB, you had to look at the InnoDB transaction “monitor,” which is best-known as the
SHOW ENGINE INNODB STATUS
command. A little section of it looks like this:
------------ TRANSACTIONS ------------ Trx id counter 26168770192 Purge done for trx's n:o < 26168770192 undo n:o < 0 state: running but idle History list length 1839
This shows which transactions have been purged and how long the history list is.
In some variants of MySQL, such as Percona Server, there’s a
which makes this a lot more accessible to monitor.
What Are the Units of the History List?
So if the history list length is 1839, what exactly does that mean?
Various sources online claim this is in different units:
- Unpurged old row versions
- Unpurged transactions in the undo space
- Modifications to the database
- Undo segments (huh?)
- Unflushed segments in the undo log (not clearing things up…)
- Undo pages, or number of pages in the undo log (what?)
- Undo logs (you mean, like log files?)
So what is it, really?
Let’s go to the source:
So as you can see, the call stack is a bit twisty if you’re not familiar with the InnoDB internals!
The best name for the unit is undo logs, which are “update undo logs for committed transactions,” to quote the source. But an “undo log” is a special term with specific meaning in InnoDB. An undo log is a single set of atomic changes a transaction makes, which may actually modify multiple records.
Naming is hard. This is one of the 10 hard things in computer science, we all know that. (The other thing is cache invalidation. Binary arithmetic is not a third hard thing.) Most of the confusion around the history list comes from how the terms are defined and used in InnoDB’s source code. To understand these terms, you have to understand a lot of the foundations of InnoDB
In layman’s terms, to be a bit hand-wavy, the history list length is in units of changes. If you're a Git user, you might think of it as analogous to the commit history of the database. (Note, however, the entire history isn’t retained forever as it is in Git.)
How Does the History List Interact With InnoDB’s Internals?
In very complicated ways.
Old row versions are eventually deleted. What does “eventually” mean?
- When they’re not needed anymore – no currently running transaction could possibly see them anymore. Hence the advice that “long running transactions are bad” because they delay purge. InnoDB’s purge, by the way, is analogous to PostgreSQL’s VACUUM, although it is considerably more sophisticated. (This may not be a good thing. It’s really hard to understand, and has difficult-to-manage edge cases.)
- When the background purge thread(s) get around to it. The purge subsystem in InnoDB is really confusing for anyone not intimately familiar with the source code. It’s a bunch of algorithms trying to purge fast enough to keep up, but steadily enough not to cause serious performance problems. The algorithms have varied significantly over time. There are also variations on the number of purge threads (there used to not even be a dedicated thread for this, it was just catch-as-catch-can in a “main thread”). All kinds of difficult-to-balance things are traded off against each other.
- It’s not just about long-running transactions, but also about long-running statements.
- The transaction isolation levels influence which old row versions need to be kept, too.
- The history list can, with some nuances I can’t dive into here, cause InnoDB to delay DML operations.
The history list is unflushed
changes, by the way. This means as the history list grows, the amount of flushing (writing pages from the buffer pool to disk) the server has to do in the future grows.
What Is the Relationship to Purge Lag?
This is quite complicated and is explained in more details than I can include here, in some of the references at the end of this article.
However, in high-level terms, purge lag is closely related to the history list. To quote the fine manual
The InnoDB transaction system maintains a list of transactions that have index records delete-marked by UPDATE or DELETE operations. The length of this list represents the purge_lag value. When purge_lag exceeds innodb_max_purge_lag, each INSERT, UPDATE, and DELETE operation is delayed…. The lag value is displayed as the history list length
Where Is the History Stored in InnoDB?
There’s a special “undo space” which is page-organized in InnoDB. Like all other pages, it's kept in the buffer pool short-term and written to disk long-term. The undo space is transactional itself – any modifications to it are ACID.
This is really important for performance. It means finding old row versions has to, in general terms:
- Look at the row and see if it’s the right version to be used
- If not, go to the undo space and find an old version
- If the appropriate page in the undo space isn’t in the buffer pool, load it from disk
- If there’s no space for it, remove pages to make room in the buffer pool
- If there’s no “clean” pages to remove, flush pages to disk to make them clean to let them be removed to make room.
- If there’s too much flushing scheduled, get in line and wait your turn…
And so on.
And undo logs are different sizes; inserts can be small, but updates and deletes require storing more data to recreate the old row version correctly.
In other words, accessing old row versions can be, in certain circumstances, a very expensive operation.
Percona Server (XtraDB) has some features designed to help you examine the composition of your buffer pool, which can be helpful for understanding edge-case situations.
I won't mention rollback segments, because they explode the complexity of this topic even further.
Databases are complicated. Naming is hard. InnoDB’s history list is the array of changes transactions have performed, stored as undo logs that can be used for rollback, as well as recreating history for MVCC purposes.
Unfortunately, as I discovered halfway through writing this article, it’s difficult to explain without explaining all of InnoDB.
Here are some references written by people in their left minds.