Why WiredTiger Is the Default MongoDB Storage Engine
When we published our blog post about working sets in MongoDB, one of our colleagues told us he’d been aware that MMAPv1 had previously been MongoDB’s default storage engine, but he didn’t know why WiredTiger had since taken its place. We saw this as an opportunity to write a follow-up post explaining some of the contrasts between the two and exploring the advantages of WiredTiger, which (we think) led MongoDB to make it the system’s default storage engine.
But if you don’t have time for all that, TL;DR: WiredTiger is a more general-purpose storage engine; MMAPv1 is not.
A Brief History of WiredTiger
MongoDB acquired WiredTiger in December 2014, shortly before the release of MongoDB 3.0. The deal also brought over the team of storage engine gurus behind WiredTiger, including Keith Bostic and Michael Cahill, who were already well known for their work on Berkeley DB.
Upon acquisition, Cahill became Director of Engineering at MongoDB. When the acquisition was announced, WiredTiger was already slated for inclusion in MongoDB 3.0, billed as offering “lower storage costs, greater hardware utilization, and more predictable performance” under the headline “A New Storage Engine for High Scale Apps.” Scalability, in other words, remained at the core of MongoDB. The process of swapping to WiredTiger, MongoDB said, would be “non-disruptive for existing deployments; applications will be 100% compatible, and upgrades can be performed with zero downtime.”
WiredTiger Implementation and Architecture
Of course, when WiredTiger became the default storage engine in MongoDB 3.2 (having first shipped as an option in 3.0), it wasn’t being plugged into a vacuum: it was replacing the previous default, MMAPv1.
So, what are some of the distinguishing characteristics of WiredTiger?
In terms of implementation and architecture, WiredTiger is a general-purpose storage engine that uses MVCC (multi-version concurrency control) to give each transaction a point-in-time snapshot of the data. In MongoDB, this concurrency control operates at the document level.
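To make the snapshot idea concrete, here is a toy sketch in Python. It is not how WiredTiger is implemented; the class names and the integer commit timestamps are our own illustrative assumptions. The point is only that a write adds a new version instead of overwriting the old one, so a reader that took its snapshot earlier keeps seeing the older data.

```python
import itertools


class VersionedStore:
    """Toy multi-version store (hypothetical): writes never overwrite old versions."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing commit timestamps
        self._latest = 0                  # timestamp of the most recent commit
        self._versions = {}               # key -> list of (commit_ts, value), oldest first

    def write(self, key, value):
        # Append a new version rather than replacing the existing one.
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._latest = ts
        return ts

    def snapshot(self):
        """Return a reader pinned to the data as of the latest commit."""
        return Snapshot(self, self._latest)


class Snapshot:
    """Point-in-time view: sees only versions committed at or before its timestamp."""

    def __init__(self, store, ts):
        self._store = store
        self._ts = ts

    def read(self, key):
        # Walk versions newest-first and return the first one visible to this snapshot.
        for commit_ts, value in reversed(self._store._versions.get(key, [])):
            if commit_ts <= self._ts:
                return value
        return None


store = VersionedStore()
store.write("doc1", {"qty": 1})
before = store.snapshot()        # taken before the second write
store.write("doc1", {"qty": 2})
after = store.snapshot()

print(before.read("doc1"))  # {'qty': 1}
print(after.read("doc1"))   # {'qty': 2}
```

The reader holding `before` is never blocked by the later write and never sees it, which is the property that lets readers and writers proceed concurrently.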
MongoDB configures WiredTiger to create checkpoints; that is, it writes the snapshot data to disk every 60 seconds or after 2 gigabytes of journal data has been written, whichever comes first. Therefore, in the event of a crash, even without journaling, a database backed by WiredTiger should be able to recover to the last valid checkpoint. Internally, WiredTiger stores data using a B-tree layout, as opposed to MMAPv1’s memory-mapped files. It also supports a log-structured merge-tree (LSM) layout, although MongoDB does not currently support that as a configuration option.
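The checkpoint rule can be sketched as a simple predicate. This is a minimal illustration under our own assumptions, not MongoDB's code: the function and constant names are hypothetical, we read the rule as "whichever threshold is reached first," and we take "2 gigabytes" to mean 2 GiB.

```python
# Thresholds from the text above. Assumption: "2 gigabytes" is treated as 2 GiB.
CHECKPOINT_INTERVAL_SECONDS = 60
CHECKPOINT_JOURNAL_BYTES = 2 * 1024**3


def checkpoint_due(seconds_since_checkpoint, journal_bytes_since_checkpoint):
    """Return True once either threshold has been reached (hypothetical helper)."""
    return (
        seconds_since_checkpoint >= CHECKPOINT_INTERVAL_SECONDS
        or journal_bytes_since_checkpoint >= CHECKPOINT_JOURNAL_BYTES
    )


# A quiet system checkpoints on the timer; a write-heavy one on journal volume.
print(checkpoint_due(61, 10 * 1024**2))  # True: 60 seconds have elapsed
print(checkpoint_due(5, 3 * 1024**3))    # True: more than 2 GiB journaled
print(checkpoint_due(5, 10 * 1024**2))   # False: neither threshold reached
```

In this reading, the journal-size threshold bounds how much data would have to be replayed after a crash on a busy system, while the timer does the same job on an idle one.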
WiredTiger vs. MMAPv1
When comparing the two MongoDB default engines, the current and the former, there are four major differences to note.
- WiredTiger performs better on multicore systems.
  - MMAPv1 is not designed to scale with multiple cores; adding CPU cores does not improve its performance by much.
- WiredTiger performs its locking at the document level, whereas MMAPv1 locks at the collection level, so WiredTiger offers superior concurrency.
- WiredTiger supports zlib and snappy (the default) compression for indexes and collections; MMAPv1 does not support compression.
  - WiredTiger collections are smaller than their MMAPv1 equivalents, with or without compression enabled.
  - WiredTiger supports prefix compression for indexes, reducing their size both on disk and in memory.
- The enterprise version of MongoDB with WiredTiger includes an option for encryption at rest.
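As an illustration of the compression point, block compression can be chosen per collection at creation time through the `storageEngine` option. The sketch below only builds the options document; the helper name, the collection name, and the connection details in the usage comment are our own hypothetical choices, and zlib here is the library behind the gzip format.

```python
# Block compressors discussed above, plus "none" to turn compression off.
SUPPORTED_COMPRESSORS = {"snappy", "zlib", "none"}


def wiredtiger_collection_options(block_compressor="snappy"):
    """Build create-collection options selecting a WiredTiger block compressor.

    Hypothetical helper: it only assembles the options document and does not
    talk to a server.
    """
    if block_compressor not in SUPPORTED_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {block_compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={block_compressor}"}
        }
    }


print(wiredtiger_collection_options("zlib"))

# Usage with PyMongo against a running server (names are placeholders):
#   from pymongo import MongoClient
#   db = MongoClient()["example_db"]
#   db.create_collection("events", **wiredtiger_collection_options("zlib"))
```

zlib generally trades more CPU for a better compression ratio than snappy, so it tends to suit collections that are read rarely but take up a lot of disk.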
Wait, So Should I Ever Use MMAPv1?
In short: yes, but only if your workload suits it. For example, MMAPv1 works very well when you have large documents that you update frequently, but only in a few fields each time, because it can often modify those fields in place. WiredTiger writes a new version of the whole document on each update, so you’d see much more I/O in this workload; it might make sense to use MMAPv1 instead.
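A back-of-envelope model shows why this workload is lopsided. It rests on simplifying assumptions of ours: an in-place engine writes only the changed bytes, a rewrite-on-update engine writes the whole document, and journaling, page granularity, and compression are all ignored. The function name and the numbers are illustrative, not measurements.

```python
def bytes_written(doc_size, changed_bytes, updates, in_place):
    """Rough bytes written for a run of small updates to one large document.

    Hypothetical model: in-place updates cost only the changed bytes, while
    whole-document rewrites cost the full document size on every update.
    """
    per_update = changed_bytes if in_place else doc_size
    return per_update * updates


DOC_SIZE = 1024 * 1024  # a 1 MiB document
CHANGED = 64            # each update touches 64 bytes
UPDATES = 1000

in_place_total = bytes_written(DOC_SIZE, CHANGED, UPDATES, in_place=True)
rewrite_total = bytes_written(DOC_SIZE, CHANGED, UPDATES, in_place=False)

print(in_place_total)                   # 64000
print(rewrite_total)                    # 1048576000
print(rewrite_total // in_place_total)  # 16384
```

Real engines will not show a gap this stark, but the direction holds: the larger the document and the smaller the change, the more a rewrite-on-update design pays per update.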
Ultimately, WiredTiger performs well in most use cases, whereas MMAPv1’s design choices make it suitable for specific, specialized ones.
As MongoDB saw widespread use, it needed a storage engine better suited to general-purpose applications than MMAPv1. WiredTiger was an excellent, well-timed option, and, as a result, it’s the default storage engine for MongoDB in 2017.