For MongoDB users, knowledge of working sets is critical. Understanding the interactions between your working set and physical memory can make a major difference in how your system performs.
To optimize MongoDB performance, you should understand the following:
- How working sets fit into the scheme of MongoDB in the first place
- Why it's important to fit your working set within your available RAM
- Why you should avoid pulling from disk
- When and how it's possible to tell there's a mismatch between your working set and your RAM
What Is a Working Set?
MongoDB's documentation defines a working set as follows:
"[A] working set represents the total body of data that the application uses in the course of normal operations… For best performance, the majority of your active set should fit in RAM."
As you run queries through MongoDB, your working set consists of any data the server requires to fulfill those queries. The server's cache adapts dynamically to what the working set needs: documents are loaded into physical memory as they are requested, and removed from it when your system's memory limits are reached.
Why is this important? Because going to disk is costly, fitting your working set in RAM is vital for performance. As this breakdown from a UC Berkeley study on latency numbers shows, reading from memory is significantly faster than reading from disk. As this visualization of 2017 speeds shows, even if you're using an SSD, reading from memory is an order of magnitude faster than reading from disk.
The Common MongoDB Storage Engines
There are two primary storage engines available for MongoDB:
MMAP
- As MongoDB's documentation explains, "MMAPv1 is MongoDB’s original storage engine based on memory mapped files. It excels at workloads with high volume inserts, reads, and in-place updates." As of MongoDB version 3.2, however, MMAP is no longer MongoDB's default storage engine; WiredTiger (discussed below) now holds that spot.
- With MMAP, MongoDB will use all available memory, as necessary. There are no configuration options to limit the data kept in memory.
WiredTiger
- As mentioned above, WiredTiger is now the default storage engine for MongoDB.
- WiredTiger has an internal data cache, but is configured to also leave memory for the operating system’s file cache. It's generally recommended to leave your cache size at its default setting.
- By default, the cache size is the larger of the following:
  - 50% of (RAM minus 1 GB).
  - 256 MB.
- Outside WiredTiger's cache, the operating system will cache frequently accessed files for the database in its own file cache. WiredTiger compresses data files by default, so leveraging the OS cache lets more data stay in the server's physical memory.
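The default sizing rule above can be sketched as a small calculation. This is only an illustration: it assumes the rule reads as 50% of (RAM minus 1 GB) with a 256 MB floor, and the function name `default_wiredtiger_cache_bytes` is made up for this example rather than taken from MongoDB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes: int) -> int:
    """Estimate the default WiredTiger cache size for a given amount of RAM.

    Assumes the rule "the larger of 50% of (RAM - 1 GB) or 256 MB".
    """
    half_of_ram_minus_1gb = (total_ram_bytes - GIB) // 2
    return max(half_of_ram_minus_1gb, 256 * MIB)


if __name__ == "__main__":
    # Print the estimated default cache size for a few server sizes.
    for ram_gib in (1, 4, 16, 64):
        cache = default_wiredtiger_cache_bytes(ram_gib * GIB)
        print(f"{ram_gib:>2} GiB RAM -> {cache / GIB:.2f} GiB WiredTiger cache")
```

Whatever RAM is left over after the WiredTiger cache and other processes remains available to the operating system's file cache.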
Other MongoDB storage engines include RocksDB and TokuDB, which are maintained independently and work through the pluggable storage engine API.
How Can I Tell If My Working Set Fits My RAM?
For MMAP, a high number of page faults is a leading indicator that MongoDB is going to disk to fetch data instead of finding it in RAM. This, of course, means the working set doesn't fit within your available RAM. Conversely, you can look directly at your RAM usage: as explained above, MMAP uses all available memory as necessary. Therefore, if the server isn't using all of its RAM, you can safely conclude that the working set fits in RAM with room to spare.
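One way to watch page faults is to compare two serverStatus snapshots and turn the counter into a rate. The sketch below assumes the counter is reported under `extra_info.page_faults` (as it is on Linux) and uses hand-written sample snapshots. The helper name `page_fault_rate` and the sample numbers are hypothetical.

```python
def page_fault_rate(first: dict, second: dict, interval_seconds: float) -> float:
    """Return page faults per second between two serverStatus snapshots."""
    faults_first = first["extra_info"]["page_faults"]
    faults_second = second["extra_info"]["page_faults"]
    return (faults_second - faults_first) / interval_seconds


# Hypothetical snapshots taken 60 seconds apart. With PyMongo, a real
# snapshot could come from client.admin.command("serverStatus").
earlier = {"extra_info": {"page_faults": 1_000}}
later = {"extra_info": {"page_faults": 1_600}}

rate = page_fault_rate(earlier, later, interval_seconds=60.0)
print(f"{rate:.1f} page faults per second")
```

What counts as a high rate depends on the workload and hardware, so the trend over time is usually more informative than any single reading.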
For WiredTiger, you can also begin by looking directly at memory usage. If your server has free RAM, the working set fits in memory, because WiredTiger automatically uses its configured cache plus your operating system's file cache. If your server looks to be using most of its memory, look at the "pages read into cache" and "unmodified pages evicted" metrics reported for WiredTiger by MongoDB’s serverStatus command. High activity for those two metrics indicates the server is cycling data through the cache, which means your working set doesn't fit in memory.
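The two WiredTiger counters are cumulative, so a single reading says little. A rough way to gauge cache churn is to diff two serverStatus snapshots. The sketch below assumes the counters sit under `wiredTiger.cache` with the names "pages read into cache" and "unmodified pages evicted". The helper name and sample values are hypothetical.

```python
from typing import Dict

# Counter names as reported under serverStatus()["wiredTiger"]["cache"].
CHURN_COUNTERS = ("pages read into cache", "unmodified pages evicted")


def cache_churn_per_second(
    first: dict, second: dict, interval_seconds: float
) -> Dict[str, float]:
    """Return the per-second increase of each churn counter between snapshots."""
    cache_first = first["wiredTiger"]["cache"]
    cache_second = second["wiredTiger"]["cache"]
    return {
        name: (cache_second[name] - cache_first[name]) / interval_seconds
        for name in CHURN_COUNTERS
    }


# Hypothetical snapshots taken 10 seconds apart. With PyMongo, a real
# snapshot could come from client.admin.command("serverStatus").
earlier = {"wiredTiger": {"cache": {
    "pages read into cache": 50_000, "unmodified pages evicted": 48_000}}}
later = {"wiredTiger": {"cache": {
    "pages read into cache": 62_000, "unmodified pages evicted": 59_500}}}

for name, rate in cache_churn_per_second(earlier, later, 10.0).items():
    print(f"{name}: {rate:.0f} pages/s")
```

Sustained high rates for both counters at once suggest pages are being read in and pushed back out without modification, which is the cycling pattern described above.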
The last metric to consider, of course, is the volume of disk reads: if you see a lot of read activity on your drive, it's generally a strong sign that your working set is overflowing your memory. Because WiredTiger also relies on your OS file cache, this may be a more reliable metric for judging whether your working set fits. For example, you may see high cache evictions but low disk I/O when evicted pages are still being served from the OS file cache.
Conclusions
In MongoDB, the ratio of working set to available memory has a major effect on your bottom-line system performance. Forcing your MongoDB storage engine to work from disk puts a large, costly strain on the system, but it's not always obvious when this mismatch happens. Instead of waiting to see your system performance suffer, you can proactively evaluate whether your systems have enough memory to cope with your working set, and ensure you continue to scale effectively.