Deciding Which Storage Engine Is Right for You: WiredTiger
We've written several articles about the history and advantages of MongoDB's storage engines, explaining why WiredTiger replaced MMAPv1 as the default engine and the contexts in which MMAPv1 might still be the best choice for some users.
In this article, we'll continue these conversations and consider how you should decide which storage engine is right for you, this time returning to WiredTiger. It may be MongoDB's default engine, but is it the best option in every situation?
As mentioned above and in our previous articles, WiredTiger is now MongoDB's default storage engine. MongoDB acquired WiredTiger in December 2014, shipped it as an optional engine in MongoDB 3.0, and made it the default in place of MMAPv1 starting with MongoDB 3.2. WiredTiger's development team also joined MongoDB, including Keith Bostic and Michael Cahill, who were already widely known for their work on Berkeley DB.
WiredTiger is a NoSQL storage engine built on multiversion concurrency control (MVCC). Under MVCC, each operation sees a consistent snapshot of the data as of the moment it starts, so concurrent readers and writers do not block one another. WiredTiger writes a consistent view of the data to disk at checkpoints: by default, a checkpoint is taken every 60 seconds or after every 2GB of journal data. Each checkpoint acts as a recovery point, so after a failure WiredTiger can return to the last one. If you ever suffer a crash between checkpoints, WiredTiger can also recover the un-checkpointed data by replaying its journal files.
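The checkpoint-plus-journal recovery described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `ToyStore`, `checkpoint_due`, and their fields are hypothetical names invented for illustration, and nothing here reflects WiredTiger's actual code or on-disk format. The two thresholds are the defaults quoted in the article.

```python
import copy

CHECKPOINT_SECONDS = 60           # default interval quoted in the article
CHECKPOINT_BYTES = 2 * 1024**3    # 2GB threshold quoted in the article


def checkpoint_due(seconds_since_last: float, bytes_since_last: int) -> bool:
    """Return True once either default threshold has been reached."""
    return (seconds_since_last >= CHECKPOINT_SECONDS
            or bytes_since_last >= CHECKPOINT_BYTES)


class ToyStore:
    """Hypothetical model: 'disk' is the last checkpoint plus a journal."""

    def __init__(self):
        self.memory = {}       # live, in-memory data
        self.checkpoint = {}   # last consistent snapshot written to "disk"
        self.journal = []      # writes recorded since that snapshot

    def write(self, key, value):
        self.journal.append((key, value))   # record the write in the journal first
        self.memory[key] = value

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.memory)
        self.journal.clear()   # these writes are now covered by the snapshot

    def crash_and_recover(self):
        # In-memory state is lost: start from the last checkpoint,
        # then replay the journaled writes made after it.
        self.memory = copy.deepcopy(self.checkpoint)
        for key, value in self.journal:
            self.memory[key] = value


store = ToyStore()
store.write("a", 1)
store.take_checkpoint()
store.write("b", 2)          # written after the checkpoint
store.crash_and_recover()
print(store.memory)          # {'a': 1, 'b': 2}
```

The write to `"b"` survives the simulated crash only because it was journaled. Without the journal, recovery would stop at the checkpoint and return `{'a': 1}`.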
WiredTiger scales well. It employs document-level locking, which enables highly concurrent workloads, and its concurrency model allows the server to take advantage of many-core CPUs. It stores its data in a B-tree structure, offering efficient reads and good write performance.
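WiredTiger keeps frequently used ("hot") data in an internal cache and evicts pages with a "least recently used" (LRU) algorithm. The default cache size depends on the MongoDB version: 3.0 used the larger of 50% of RAM or 1GB, while 3.4 and later use the larger of 50% of (RAM minus 1GB) or 256MB. WiredTiger also relies on the OS page cache, which holds data in its compressed on-disk form, so many reads that miss the internal cache can still be served without hitting your disk.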
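To make the eviction idea concrete, here is a minimal LRU cache sketch. It is a textbook LRU built on Python's `OrderedDict`, not WiredTiger's eviction code. The `LRUCache` class, its `load` callback, and the page names are assumptions made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache holding at most `capacity` pages."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pages = OrderedDict()   # oldest entry first, newest last

    def get(self, key, load):
        """Return the cached page, calling load(key) on a miss."""
        if key in self._pages:
            self._pages.move_to_end(key)      # mark as most recently used
            return self._pages[key]
        value = load(key)                     # miss: fetch from slower storage
        self._pages[key] = value
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)   # evict the least recently used page
        return value

    def keys(self):
        return list(self._pages)


cache = LRUCache(capacity=2)
for page in ["a", "b", "a", "c"]:
    cache.get(page, load=lambda k: f"contents of {k}")
print(cache.keys())   # ['a', 'c']
```

Page `b` is evicted rather than `a`, because `a` was touched again before `c` arrived.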
For collections, WiredTiger uses Snappy compression by default. It can also use zlib (gzip-style) compression, which trades more CPU for a better compression ratio. If necessary, you can override the compressor on a per-collection basis. For indexes, WiredTiger uses prefix compression, both on disk and in memory.
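As a sketch of the per-collection override, the helper below builds the options document that MongoDB's `create` command accepts for WiredTiger collections. The `collection_options` function and the `ALLOWED_COMPRESSORS` set are hypothetical names for this example. The `block_compressor` setting and the `storageEngine.wiredTiger.configString` layout follow the MongoDB documentation as we understand it, so check them against your server version.

```python
# Compressors covered in this article; newer servers may accept others.
ALLOWED_COMPRESSORS = {"none", "snappy", "zlib"}


def collection_options(compressor: str) -> dict:
    """Build create-collection options selecting a WiredTiger block compressor."""
    if compressor not in ALLOWED_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        }
    }


print(collection_options("zlib"))
```

With PyMongo and a running server, these options would typically be passed as keyword arguments, for example `db.create_collection("events", **collection_options("zlib"))`, where `db` and the collection name `events` are placeholders.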
Finally, a few additional features round out WiredTiger's architecture:
- Its disk footprint is small: WiredTiger's disk usage is much less than MMAPv1's, even with compression disabled. WiredTiger doesn't need to pad data and it has a more efficient data storage format in general.
- Write-ahead logging (the journal) facilitates automatic crash recovery and makes writes durable.
- MongoDB's enterprise edition supports on-disk encryption for WiredTiger.
WiredTiger has quite a few advantages that match what many people look for in a storage engine. Fundamentally, it's a very sound choice. In general, its advantages follow directly from the engine's architecture, as described above:
- WiredTiger is highly scalable with concurrent readers and writers.
- Its compression system allows for efficient storage use and less disk I/O.
- It supports encryption for sensitive data.
Despite a very good architecture and the benefits it provides, WiredTiger does have a handful of drawbacks, which you should keep in mind and consider anytime you're choosing a storage engine for MongoDB:
- WiredTiger's concurrency scheme prevents in-place updates; updating one field in a document re-writes the entire document.
- WiredTiger's inclusion in MongoDB is still relatively recent, so it is not fully battle-tested. It has only shipped with MongoDB since version 3.0, a period of a few years. It is also a relatively complicated storage engine (in comparison to MMAPv1, at least), and there is less deep experience with running it in production.
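The first drawback above can be pictured with a toy multiversion store in which every update writes a complete new copy of the document instead of patching a field in place. This is an illustrative sketch only: `VersionedCollection` and its methods are hypothetical, and the integer clock stands in for whatever ordering a real engine uses.

```python
class VersionedCollection:
    """Toy MVCC store: each update appends a full new document version."""

    def __init__(self):
        self._versions = {}   # doc_id -> list of (commit_ts, full_document)
        self._clock = 0       # logical commit counter

    def snapshot(self) -> int:
        """Return a timestamp that fixes what a reader can see."""
        return self._clock

    def upsert(self, doc_id, **fields) -> int:
        history = self._versions.setdefault(doc_id, [])
        latest = history[-1][1] if history else {}
        new_doc = {**latest, **fields}   # the whole document is rebuilt
        self._clock += 1
        history.append((self._clock, new_doc))
        return self._clock

    def read(self, doc_id, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for ts, doc in reversed(self._versions.get(doc_id, [])):
            if ts <= snapshot_ts:
                return doc
        return None


users = VersionedCollection()
users.upsert("u1", name="Ada", visits=1)
snap = users.snapshot()
users.upsert("u1", visits=2)                 # one field changes, full copy written
print(users.read("u1", snap))                # {'name': 'Ada', 'visits': 1}
print(users.read("u1", users.snapshot()))    # {'name': 'Ada', 'visits': 2}
```

The reader holding the older snapshot keeps seeing the original document, which is the benefit of the scheme. The cost is that a one-field change stores the whole document again.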
WiredTiger's big takeaway is straightforward: it's a great general-purpose storage engine. As such, it's an excellent choice if you're not sure what to use or if you don't have a strong reason to use an alternative. Conveniently, it is already the default engine in current versions of MongoDB.
Of course, your choice of storage engine depends on your particular use case, and there are scenarios in which WiredTiger is not the optimal pick. For example, users with many large documents in separate collections, or users with a mostly read-only workload, may want to choose MMAPv1 instead. Our earlier article on MMAPv1 covers its advantages in more detail. Though WiredTiger is an excellent default engine, you should ultimately let your use case drive your decision.