MongoDB's Performance over RDBMS
You might be wondering why we get superior performance with MongoDB compared to relational databases (RDBMS). What is the secret behind it? I had the same question until I learned about the internal workings of MongoDB, especially data modeling, advanced indexing methods, and how the WiredTiger storage engine works.
I want to share what I learned and experienced so that it might be helpful to you, too.
MongoDB uses a document-oriented data model, storing data in JSON-like BSON documents. This allows for efficient storage and retrieval of complex data structures.
MongoDB's model can lead to simpler and more performant queries compared to the normalization requirements of RDBMS.
The initial phase of enhancing performance involves comprehending the query behaviors of your application. This understanding enables you to tailor your data model and choose suitable indexes to align with these patterns effectively.
Always keep MongoDB's maximum document size (16 MB) in mind, and avoid embedding images, audio, and video files in the same document as the rest of your data, as depicted in the image below.
Customizing your data model to match the query patterns of your application leads to streamlined queries, heightened throughput for insert and update operations, and better workload distribution across a sharded cluster.
While MongoDB offers a flexible schema, overlooking schema design is not advisable. Although you can adjust your schema as needed, adhering to schema design best practices from the outset of your project can prevent the need for extensive refactoring down the line.
A major advantage of BSON documents is that you have the flexibility to model your data any way your application needs. The inclusion of arrays and subdocuments within documents provides significant versatility in modeling intricate data relationships. But you can also model flat, tabular, and columnar structures, simple key-value pairs, text, geospatial and time-series data, or the nodes and edges of connected graph data structures. The ideal schema design for your application will depend on its specific query patterns.
A best-practice example for an address/contact book is to keep group and portrait information in separate collections, because both can grow large: groups through many-to-many (n-n) relationships, and portraits through image size. Embedded in the contact document, they could push it toward the 16 MB maximum document size.
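The address-book split can be sketched in plain Python. This is a minimal sketch under stated assumptions: the field names (`group_ids`, `portrait_ids`, `image_base64`) are hypothetical, and JSON length is used as a rough stand-in for BSON size, so treat the numbers as estimates only.

```python
import json

# MongoDB rejects any single BSON document larger than 16 MB.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_size_bytes(document: dict) -> int:
    """Rough estimate only: JSON length, not the true BSON size."""
    return len(json.dumps(document).encode("utf-8"))


def fits_in_one_document(document: dict) -> bool:
    return approx_size_bytes(document) <= MAX_DOCUMENT_BYTES


# Hypothetical portraits, about 6 MB of encoded image data each.
portraits = [
    {"_id": 500 + i, "contact_id": 1, "image_base64": "A" * (6 * 1024 * 1024)}
    for i in range(3)
]

# Contact document that keeps only references to groups and portraits.
contact = {
    "_id": 1,
    "name": "Ada Lovelace",
    "phones": [{"type": "mobile", "number": "555-0100"}],
    "group_ids": [10, 11],
    "portrait_ids": [p["_id"] for p in portraits],
}

# The same contact with every portrait embedded in it.
contact_with_embedded_portraits = {**contact, "portraits": portraits}

print(fits_in_one_document(contact))                          # small contact
print(all(fits_in_one_document(p) for p in portraits))        # each portrait alone
print(fits_in_one_document(contact_with_embedded_portraits))  # roughly 18 MB
```

Keeping references in the contact and the heavy data elsewhere means the frequently read document stays small, while each portrait is loaded only when it is actually needed.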
Embedding related data in a single collection in MongoDB (or at least minimizing the number of collections), instead of spreading it across multiple tables as in an RDBMS, offers large performance improvements because of data locality, which reduces data seeks, as shown in the picture below.
Data locality is the major reason why MongoDB data seeks are faster.
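To make the data locality argument concrete, here is a toy model in Python. It counts record fetches in in-memory dictionaries, not real disk seeks, and the customer/order data is invented for illustration. A real RDBMS uses buffer pools and join algorithms that this sketch ignores.

```python
# Normalized layout: three "tables" keyed by primary key.
customers = {1: {"name": "Ada"}}
orders = {
    100: {"customer_id": 1, "item_ids": [1000, 1001]},
    101: {"customer_id": 1, "item_ids": [1002]},
}
order_items = {
    1000: {"sku": "pen", "qty": 2},
    1001: {"sku": "ink", "qty": 1},
    1002: {"sku": "pad", "qty": 5},
}
# Stands in for a secondary index on orders.customer_id.
customer_order_index = {1: [100, 101]}

# Document layout: everything about the customer lives in one document.
customer_documents = {
    1: {
        "name": "Ada",
        "orders": [
            {"items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
            {"items": [{"sku": "pad", "qty": 5}]},
        ],
    }
}


def read_normalized(customer_id):
    """Assemble the customer from three tables, counting record fetches."""
    fetches = 0
    customer = dict(customers[customer_id])
    fetches += 1
    customer["orders"] = []
    for order_id in customer_order_index[customer_id]:
        order = orders[order_id]
        fetches += 1
        items = []
        for item_id in order["item_ids"]:
            items.append(order_items[item_id])
            fetches += 1
        customer["orders"].append({"items": items})
    return customer, fetches


def read_embedded(customer_id):
    """Read the same data as a single document: one fetch."""
    return customer_documents[customer_id], 1


print(read_normalized(1)[1], "fetches for the normalized layout")
print(read_embedded(1)[1], "fetch for the embedded layout")
```

Both functions return the same data. The difference is how many separate records had to be located to assemble it.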
Difference: tabular vs document
Aspect | Tabular | MongoDB |
---|---|---|
Steps to create the model | 1 - define the schema. 2 - develop the app and queries | 1 - identify the queries. 2 - define the schema |
Initial schema | 3rd normal form. One possible solution | Many possible solutions |
Final schema | Likely denormalized | Few changes |
Schema evolution | Difficult and not optimal. Likely downtime | Easy. No downtime |
Performance | Mediocre | Optimized |
WiredTiger is an open-source, high-performance storage engine for MongoDB. WiredTiger provides features such as document-level concurrency control, compression, and support for both in-memory and on-disk storage.
Cache:
WiredTiger cache architecture: WiredTiger utilizes a sophisticated caching mechanism to efficiently manage data in memory. The cache is used to store frequently accessed data, reducing the need to read from disk and improving overall performance.
Memory management: The cache dynamically manages memory usage based on the workload. It employs techniques such as eviction (removing less frequently used data from the cache) and promotion (moving frequently used data to the cache) to optimize memory utilization.
Configuration: WiredTiger allows users to configure the size of the cache based on their system's available memory and workload characteristics. Properly sizing the cache is crucial for achieving optimal performance.
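Durability: WiredTiger ensures durability by writing modified data from the cache to disk at checkpoints (every 60 seconds by default) and by recording changes in a write-ahead journal between checkpoints. Together, these keep data consistent and recoverable after a system failure.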
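As a starting point for cache sizing, the documented default for WiredTiger's internal cache is the larger of 50% of (RAM minus 1 GB) or 256 MB. The small helper below reproduces that rule; the function name is my own, and the result is only the default, not a tuning recommendation.

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Default internal cache size: max(50% of (RAM - 1 GB), 256 MB)."""
    return max(0.5 * (total_ram_gb - 1.0), 0.25)


for ram_gb in (1, 4, 16, 64):
    print(f"{ram_gb} GB RAM: {default_wiredtiger_cache_gb(ram_gb)} GB cache by default")
```

To override the default, set `storage.wiredTiger.engineConfig.cacheSizeGB` in the configuration file or pass `--wiredTigerCacheSizeGB` to `mongod`. This is mostly relevant when several processes share the same host.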
Compression:
Data compression: WiredTiger supports data compression to reduce the amount of storage space required. Compressing data can lead to significant disk space savings and improved I/O performance.
Configurable compression: Users can configure compression options based on their requirements. WiredTiger supports several block compression algorithms (snappy, which is the default for collections, as well as zlib and zstd), allowing users to choose the one that best suits their workload and performance goals.
Trade-offs: While compression reduces storage costs and can improve read/write performance, it may introduce additional CPU overhead during compression and decompression processes. Users need to carefully consider the trade-offs and select compression settings that align with their application's needs.
Compatibility: WiredTiger's compression features are transparent to applications and don't require any changes to the application code. The engine handles compression and decompression internally.
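The space-versus-CPU trade-off can be observed with Python's built-in `zlib` module. This is only an illustration: WiredTiger compresses its own on-disk blocks rather than JSON text, and the sample documents here are invented. zlib is used because it ships with Python and is also one of the algorithms WiredTiger offers.

```python
import json
import time
import zlib

# Invented sample data with the repetitive field names typical of documents.
documents = [
    {"user_id": i, "status": "active", "city": "Amsterdam", "tags": ["a", "b"]}
    for i in range(5000)
]
raw = json.dumps(documents).encode("utf-8")


def compress_report(data: bytes, level: int) -> dict:
    """Compress data at the given zlib level and report size and time."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return {
        "level": level,
        "ratio": len(data) / len(compressed),
        "seconds": elapsed,
        "compressed": compressed,
    }


for level in (1, 6, 9):
    report = compress_report(raw, level)
    print(f"level {level}: ratio {report['ratio']:.1f}x in {report['seconds']:.4f}s")
```

Higher levels generally spend more CPU time for a better ratio, which is the same kind of choice you make when picking a compressor for a collection.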
Overall, WiredTiger's cache and compression features contribute to its efficiency and performance characteristics. By optimizing memory usage and providing configurable compression options, WiredTiger aims to meet the diverse needs of MongoDB users in terms of both speed and storage efficiency.
Most RDBMS also employ caching (buffer pools, for example), so the relative performance benefit varies with the database system and its configuration.
MongoDB, being a NoSQL database, offers advanced indexing capabilities to optimize query performance and support efficient data retrieval. Here are some of MongoDB's advanced indexing features:
Compound indexes
MongoDB allows you to create compound indexes: a single index on multiple fields in a specific order. This is useful for queries that involve multiple criteria.
The order of fields in a compound index is crucial. MongoDB can use the index efficiently for queries that match a prefix of the index fields, read from left to right. For example, an index on last_name followed by first_name supports queries on last_name alone or on both fields, but not efficiently on first_name alone.
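The left-to-right rule can be expressed as a small helper. This is a simplified sketch for equality filters only; the real query planner also weighs sorts and range predicates. The index definition and field names are hypothetical, written in the list-of-tuples form that drivers such as PyMongo accept for `create_index`.

```python
from typing import Iterable, List, Tuple

IndexKeys = List[Tuple[str, int]]


def usable_prefix_length(index_keys: IndexKeys, query_fields: Iterable[str]) -> int:
    """Count how many leading index fields the query constrains.

    0 means the query does not touch the first index field, so it cannot
    use the index prefix at all.
    """
    wanted = set(query_fields)
    length = 0
    for field, _direction in index_keys:
        if field not in wanted:
            break
        length += 1
    return length


# Hypothetical compound index: last_name, then first_name, then city.
index = [("last_name", 1), ("first_name", 1), ("city", 1)]

for query in (["last_name"], ["last_name", "first_name"],
              ["last_name", "city"], ["first_name", "city"]):
    print(query, "uses", usable_prefix_length(index, query), "leading index field(s)")
```

A query on `last_name` and `city` still benefits from the index, but only the `last_name` part narrows the scan, which is why field order should follow your most common query shapes.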
Multikey indexes
MongoDB supports indexing on arrays. When you index an array field, MongoDB creates separate index entries for each element of the array.
Multikey indexes are helpful when working with documents that contain arrays, and you need to query based on elements within those arrays.
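Conceptually, a multikey index holds one entry per array element, each pointing back to the document. The toy model below imitates that with a dictionary; it is not how the index is stored internally, and the `tags` data is invented.

```python
from collections import defaultdict


def build_multikey_index(documents, field):
    """Map each array element (or scalar value) to the _ids that contain it."""
    index = defaultdict(set)
    for doc in documents:
        value = doc.get(field)
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            index[element].add(doc["_id"])
    return index


articles = [
    {"_id": 1, "tags": ["db", "nosql"]},
    {"_id": 2, "tags": ["db", "sql"]},
    {"_id": 3, "tags": "cache"},  # a scalar value yields a single entry
]

tag_index = build_multikey_index(articles, "tags")
print(sorted(tag_index["db"]))  # ids of documents whose tags contain "db"
```

Because every element gets its own entry, a query for a single tag can go straight to the matching documents instead of scanning each array.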
Text indexes
MongoDB provides text indexes to support full-text search. Text indexes tokenize and stem words, allowing for more flexible and language-aware text searches.
Text indexes are suitable for scenarios where users need to perform text search operations on large amounts of textual data.
Geospatial indexes
MongoDB supports geospatial indexes to optimize queries that involve geospatial data. These indexes can efficiently handle queries related to location-based information.
Geospatial indexes come in two types: 2d indexes for points on a flat plane, and 2dsphere indexes for geometries on an Earth-like sphere.
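For spherical queries, MongoDB works with GeoJSON, where a point's coordinates are listed as longitude first, then latitude. The sketch below stores a few places in that shape and imitates a nearest-first search with the haversine formula. It scans every place, which is exactly the work a geospatial index avoids, and the helper names are my own.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two [longitude, latitude] pairs."""
    lon1, lat1 = a
    lon2, lat2 = b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# GeoJSON points: coordinates are [longitude, latitude].
places = [
    {"name": "Berlin", "location": {"type": "Point", "coordinates": [13.4050, 52.5200]}},
    {"name": "Paris", "location": {"type": "Point", "coordinates": [2.3522, 48.8566]}},
    {"name": "Amsterdam", "location": {"type": "Point", "coordinates": [4.9041, 52.3676]}},
]


def near(documents, point, max_distance_m):
    """Names of places within max_distance_m of point, nearest first."""
    hits = []
    for doc in documents:
        distance = haversine_m(point, doc["location"]["coordinates"])
        if distance <= max_distance_m:
            hits.append((distance, doc["name"]))
    return [name for _distance, name in sorted(hits)]


brussels = [4.3517, 50.8503]
print(near(places, brussels, 300_000))
```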
Wildcard indexes
MongoDB supports wildcard indexes, which index every field under a given path (or every field in the document) without you having to name each one. This is useful when documents carry arbitrary or unpredictable fields, such as user-defined attributes, and you cannot know in advance which ones will be queried.
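A wildcard index is declared with the special `$**` key, optionally under a path such as `attributes.$**`. The toy function below lists the dotted field paths such an index would have to cover for one document. It handles nested subdocuments only and is meant as an illustration of the idea, with invented product data.

```python
def field_paths(document: dict, prefix: str = "") -> list:
    """Return the dotted path of every leaf field in a nested document."""
    paths = []
    for key, value in document.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(field_paths(value, path))
        else:
            paths.append(path)
    return paths


# Two products whose attributes share no field names.
laptop = {"attributes": {"cpu": "8-core", "screen": {"inches": 14, "panel": "oled"}}}
shirt = {"attributes": {"color": "red", "size": "M"}}

print(field_paths(laptop))
print(field_paths(shirt))
```

With a fixed list of indexes you would need one per attribute. A wildcard index covers whichever paths show up.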
Partial indexes
Partial indexes allow you to index only the documents that satisfy a specified filter expression. This can be beneficial when you have a large collection but want to create an index for a subset of documents that meet specific criteria.
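The effect of a partial index can be imitated in a few lines. In this sketch, the filter is a plain equality match, which is far simpler than the filter expressions MongoDB accepts, and the order data is invented.

```python
def build_partial_index(documents, key, partial_filter):
    """Index `key`, but only for documents matching every filter field."""
    index = {}
    for doc in documents:
        if all(doc.get(field) == value for field, value in partial_filter.items()):
            index.setdefault(doc[key], []).append(doc["_id"])
    return index


orders = [
    {"_id": 1, "customer": "ada", "status": "active"},
    {"_id": 2, "customer": "bob", "status": "archived"},
    {"_id": 3, "customer": "ada", "status": "active"},
    {"_id": 4, "customer": "cy", "status": "archived"},
]

full_index = build_partial_index(orders, "customer", {})
active_only = build_partial_index(orders, "customer", {"status": "active"})

print(len(full_index), "keys in the full index")
print(len(active_only), "key(s) in the partial index")
```

The partial index is smaller and cheaper to maintain, but it can only serve queries whose filter guarantees the indexed condition (here, `status` equal to `"active"`).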
Hashed indexes
Hashed indexes are useful for sharding scenarios. MongoDB automatically hashes the indexed field's values and distributes the data across the shards, providing a more even distribution of data and queries.
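The benefit is easiest to see with a monotonically increasing key. The simulation below compares range-based placement with hash-based placement for newly inserted keys. It is a toy: MongoDB assigns ranges of hash values to chunks rather than taking a modulo, and the MD5-based function here is only a stand-in for its hash.

```python
import hashlib
from collections import Counter


def range_shard(key: int, upper_bounds: list) -> int:
    """Place key in the first range whose upper bound exceeds it."""
    for shard, upper in enumerate(upper_bounds):
        if key < upper:
            return shard
    return len(upper_bounds)  # last shard holds everything above the bounds


def hashed_shard(key: int, shard_count: int) -> int:
    """Place key by hashing it first (illustrative hash, not MongoDB's)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Ranges were split for keys 0..999; new inserts keep counting upward.
new_keys = range(1000, 2000)
range_counts = Counter(range_shard(k, [250, 500, 750]) for k in new_keys)
hashed_counts = Counter(hashed_shard(k, 4) for k in new_keys)

print("range-based:", dict(range_counts))
print("hash-based: ", dict(sorted(hashed_counts.items())))
```

With ranges, every new insert lands on the last shard. With hashing, the same inserts spread across all four, at the cost of losing efficient range queries on the key.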
TTL (time-to-live) indexes
TTL indexes allow you to automatically expire documents from a collection after a certain amount of time. This is helpful for managing data that has a natural expiration, such as session information or log entries.
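The expiry rule itself is simple: a document becomes eligible for deletion once the value of the indexed date field plus `expireAfterSeconds` is in the past. The sketch below mirrors that rule. The session documents are invented, and in MongoDB the actual removal is done by a background task that runs periodically, so deletion is not instantaneous.

```python
from datetime import datetime, timedelta, timezone


def is_expired(doc: dict, field: str, expire_after_seconds: int, now: datetime) -> bool:
    """True once field + expire_after_seconds has passed.

    Documents whose field is missing or is not a date never expire.
    """
    value = doc.get(field)
    if not isinstance(value, datetime):
        return False
    return value + timedelta(seconds=expire_after_seconds) <= now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    {"_id": "fresh", "createdAt": datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)},
    {"_id": "stale", "createdAt": datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)},
    {"_id": "no-date", "createdAt": "2024-01-01"},  # a string, not a date
]

expired_ids = [s["_id"] for s in sessions if is_expired(s, "createdAt", 3600, now)]
print(expired_ids)
```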
These advanced indexing capabilities in MongoDB provide developers with powerful tools to optimize query performance for a wide range of scenarios and data structures. Properly leveraging these features can significantly enhance the efficiency and responsiveness of MongoDB databases.
In conclusion, the superior performance of MongoDB over traditional RDBMS databases stems from its adept handling of data modeling, advanced indexing methods, and the efficiency of the WiredTiger storage engine. By tailoring your data model to match application query patterns, leveraging MongoDB's optimized document structure, and harnessing advanced indexing capabilities, you can achieve enhanced throughput and more effective workload distribution.
Remember, while MongoDB offers flexibility in schema design, it's crucial not to overlook the importance of schema design best practices from the outset of your project. This proactive approach can save you from potential refactoring efforts down the line.
For further exploration and discussion on MongoDB and database optimization strategies, consider joining our Developer Community. There, you can engage with fellow developers, share insights, and stay updated on the latest developments in database technology.
Keep optimizing and innovating with MongoDB to unlock the full potential of your applications.