Cloud Assessment: How MongoDB's Two Generations of Storage Engines Stack Up
This article compares two generations of MongoDB storage engines: MMAPV1 and WiredTiger. Percona support engineers Vinodh Krishnaswamy and Aayushi Mangal walk through the main features of each engine so that we can choose the right one for our needs.
Storage Engine
MongoDB's storage engine is the component that manages BSON data in memory and on disk and serves read and write operations. MMAPV1 and WiredTiger are the two generations of storage engine that MongoDB has used.
MMAPV1
MMAPV1 is the original MongoDB storage engine, present since the first version. It was deprecated in version 4.0 and removed in version 4.2.
WiredTiger
WiredTiger is a pluggable engine that was introduced in version 3.0 and became the default storage engine in version 3.2. It provides compression and document-level locking, and it is the engine on which multi-document transactions are built.
Data Compression
Data compression is an essential feature in MongoDB’s storage engines. MMAPV1 does not support data compression, relying on memory-mapped files. On the other hand, WiredTiger supports snappy and zlib compression.
MMAPV1
MMAPV1 does not support data compression. It maps the data files directly into memory, so a collection takes up the same space in memory as it does on disk. This suits workloads with high-volume inserts, reads, and in-place updates.
WiredTiger
WiredTiger supports snappy and zlib compression, so the same data occupies less storage space than it does under MMAPV1. It uses both its own internal cache and the file system cache.
Snappy
Snappy is the default algorithm, providing reasonable compression efficiency.
Zlib
Zlib increases the compression ratio at the expense of CPU usage.
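The trade-off between compression ratio and CPU work can be seen with Python's standard `zlib` module. This is only a rough illustration: it uses the standalone zlib library at two compression levels, not WiredTiger's block compressor, and the sample documents are made up for the example.

```python
import json
import zlib

# Hypothetical sample: repetitive JSON standing in for a batch of documents.
docs = [{"user_id": i, "status": "active", "city": "Chennai"} for i in range(2000)]
raw = json.dumps(docs).encode("utf-8")

fast = zlib.compress(raw, level=1)   # favours speed over size
small = zlib.compress(raw, level=9)  # spends more CPU looking for a smaller result


def ratio(compressed: bytes) -> float:
    """Return the original size divided by the compressed size."""
    return len(raw) / len(compressed)


print(f"raw bytes:     {len(raw)}")
print(f"level 1 ratio: {ratio(fast):.1f}x")
print(f"level 9 ratio: {ratio(small):.1f}x")
```

On repetitive data like this, the higher level typically yields a better ratio and takes longer to run, which mirrors the choice between a lighter and a heavier compressor.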
Data Directory
This section looks at how each engine keeps the files in its data directory consistent. Journaling is the critical mechanism here.
MMAPV1
MMAPV1 relies on its journal to make writes atomic. If a fault occurs, or MongoDB is terminated before changes have been flushed to the data files, MongoDB can replay the journal into the data files and return them to a consistent state.
WiredTiger
WiredTiger writes periodic checkpoints and journals every modification made between them. To recover from a crash or sudden termination of the database, it starts from the last checkpoint and replays the journal entries written since then.
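The recovery idea can be sketched with a toy model. The `ToyEngine` class below is hypothetical and keeps everything in Python dictionaries; it does not reflect WiredTiger's actual file formats or APIs, only the order of operations: restore the last checkpoint, then replay the journal.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Toy model of checkpoint-plus-journal recovery (not WiredTiger's real format)."""
    memory: dict = field(default_factory=dict)      # in-memory state, lost on crash
    checkpoint: dict = field(default_factory=dict)  # last durable snapshot
    journal: list = field(default_factory=list)     # durable log of writes since checkpoint

    def write(self, key, value):
        # Log the change first, then apply it in memory.
        self.journal.append((key, value))
        self.memory[key] = value

    def take_checkpoint(self):
        # Persist a consistent snapshot; earlier journal entries are no longer needed.
        self.checkpoint = dict(self.memory)
        self.journal.clear()

    def crash_and_recover(self):
        # Memory is gone: rebuild from the last checkpoint, then replay the journal.
        self.memory = dict(self.checkpoint)
        for key, value in self.journal:
            self.memory[key] = value
        return self.memory


engine = ToyEngine()
engine.write("a", 1)
engine.take_checkpoint()
engine.write("b", 2)  # journaled, but not part of any checkpoint yet
engine.write("a", 3)
recovered = engine.crash_and_recover()
print(recovered)  # {'a': 3, 'b': 2}
```

Both writes made after the checkpoint survive the simulated crash because they were journaled before being applied.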
Log Directory
The journal files, which MongoDB keeps in the journal subdirectory of the data directory, play a crucial role in maintaining data consistency.
MMAPV1
MMAPV1 uses journaling to ensure atomic writes. It records every change destined for the data files in the journal first, which allows MongoDB to recover from a crash or sudden termination of the database.
WiredTiger
WiredTiger uses checkpoints to keep the data files consistent. It journals every change made after a checkpoint and, after a crash or sudden termination of the database, replays the journal entries written since the last checkpoint.
Lock and Concurrency
Locking and concurrency are essential features in MongoDB’s storage engines.
MMAPV1
MMAPV1 uses read-write locks: many readers can share access to a resource, but a writer holds it exclusively. From version 3.0 onward these locks are taken at the collection level.
WiredTiger
WiredTiger supports document-level locking and uses optimistic concurrency control for most read and write operations.
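Optimistic concurrency control means a writer does not lock a document up front; it checks at commit time whether the document changed underneath it and retries if so. The sketch below is a toy model of that pattern, assuming a per-document version counter. `ToyCollection`, `WriteConflict`, and `update_with_retry` are hypothetical names for illustration and are not MongoDB or WiredTiger APIs.

```python
class WriteConflict(Exception):
    """Raised when a document changed between read and commit."""


class ToyCollection:
    def __init__(self):
        self._docs = {}  # _id -> (version, document)

    def insert(self, _id, doc):
        self._docs[_id] = (0, dict(doc))

    def read(self, _id):
        version, doc = self._docs[_id]
        return version, dict(doc)  # hand back a private copy

    def commit(self, _id, expected_version, new_doc):
        current_version, _ = self._docs[_id]
        if current_version != expected_version:
            raise WriteConflict(_id)  # someone else committed first
        self._docs[_id] = (current_version + 1, dict(new_doc))


def update_with_retry(coll, _id, mutate, before_commit=None, max_attempts=5):
    """Apply mutate() to a document, retrying on conflict; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        version, doc = coll.read(_id)
        mutate(doc)
        if before_commit is not None:
            before_commit(attempt)  # test hook to inject a competing write
        try:
            coll.commit(_id, version, doc)
            return attempt
        except WriteConflict:
            continue
    raise RuntimeError("gave up after repeated write conflicts")


coll = ToyCollection()
coll.insert(1, {"stock": 10})


def decrement(doc):
    doc["stock"] -= 1


def competing_writer(attempt):
    # Simulate another client committing first, but only during attempt 1.
    if attempt == 1:
        version, doc = coll.read(1)
        doc["stock"] -= 5
        coll.commit(1, version, doc)


attempts = update_with_retry(coll, 1, decrement, before_commit=competing_writer)
final_doc = coll.read(1)[1]
print(attempts, final_doc)  # 2 {'stock': 4}
```

The first attempt loses to the competing write and is retried against the fresh document, so neither update is lost and no writer ever blocked the other.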
Memory Usage
Memory usage is a critical aspect of MongoDB’s performance.
MMAPV1
MMAPV1 automatically uses all available memory on the computer as a cache. The operating system’s virtual memory subsystem manages memory usage of MongoDB.
WiredTiger
WiredTiger uses both its own internal cache and the file system cache. By default, the internal cache is sized to the larger of 50% of (RAM - 1 GB) or 256 MB.
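The sizing rule is easy to work through for a few machine sizes. The helper below is a small sketch of the formula exactly as stated above; the function name is made up for this example, and it treats 1 GB as 1024³ bytes.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes: int) -> int:
    """Larger of 50% of (RAM - 1 GB) and 256 MB, per the rule stated above."""
    half_of_remainder = (total_ram_bytes - GIB) // 2
    return max(half_of_remainder, 256 * MIB)


for ram_gib in (1, 4, 16):
    cache = default_wiredtiger_cache_bytes(ram_gib * GIB)
    print(f"{ram_gib:>2} GiB RAM -> {cache / GIB:.2f} GiB cache")
```

For 1, 4, and 16 GiB of RAM this gives 0.25, 1.50, and 7.50 GiB respectively. The default can be overridden with the `storage.wiredTiger.engineConfig.cacheSizeGB` setting or the `--wiredTigerCacheSizeGB` option.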
Summary: MMAPV1 vs WiredTiger
The following table summarizes the differences between MMAPV1 and WiredTiger:
| Feature | MMAPV1 | WiredTiger |
|---|---|---|
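| Data Compression | None | Snappy (default) and zlib |
| Journaling | Journal replayed into the data files | Checkpoints, plus journal replay since the last checkpoint |
| Locking | Collection-level read-write locks (3.0 and above) | Document-level locking with optimistic concurrency control |
| Memory Usage | All available memory, managed by the OS virtual memory subsystem | Internal cache and file system cache |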
In conclusion, MMAPV1 and WiredTiger are two generations of storage engines in MongoDB, each with its own strengths and weaknesses. By understanding the differences between these engines, we can choose the right engine according to our needs and optimize our MongoDB deployment for better performance.