A key aspect of database administration and capacity planning for your application is monitoring your cluster's health and performance. MongoDB Atlas, a cross-cloud Database-as-a-Service platform, handles the vast majority of administration work and has built-in fault tolerance and scaling. Even so, users need to know how best to monitor their clusters so they can tune or scale whatever they need before a crisis hits.
Monitoring MongoDB databases allows you to:
There are four main areas to keep in mind when monitoring MongoDB: instance status and health, operations and connections, hardware metrics, and replication.
Each area can be covered by a few utilities and commands:
The status of a MongoDB server process is an immediate indication of whether we need to drill down into its activity or health. A process that is unresponsive or does not respond to our commands should be investigated immediately.
rs.status() for replica sets and sh.status() for sharded clusters provide a high-level status of the cluster. Use MongoDB’s built-in free monitoring feature to get information on Operation Execution Times, Memory Usage, CPU Usage, and Operation Counts.
When our application is struggling or underperforming, we need to rule out the database layer as the bottleneck. Since the application opens connections and issues operations against the database, we should pay close attention to their behavior.
MongoDB provides various metrics and mechanisms for identifying connection and operation patterns. On top of its active and proactive monitoring tools, Atlas provides a full alerting system and log collection.
How to monitor with MongoDB Atlas: Atlas provides built-in features such as the Performance Advisor, the Real-Time Performance Panel, and the Query Profiler to track operations and highlight slow or heavy ones. Additionally, the Metrics tab provides many graphs that plot operations and the number of connections. I will highlight and elaborate on the important ones later in this article.
How to monitor with self-managed MongoDB instances:
Hardware metrics are important to track. They help identify which resources may be the root cause of performance issues, or which need tuning and capacity re-planning.
How to monitor with MongoDB Atlas: The Atlas Metrics tab within a cluster provides plotted graphs for the hardware metrics, so you can correlate them with other database metrics. I will highlight and elaborate on the important ones later in this article.
How to monitor with self-managed MongoDB instances:
Replication is a key aspect of MongoDB clusters' high availability and durability. The health and performance of replication need to be monitored carefully to maintain a healthy cluster.
How to monitor with MongoDB Atlas: The Atlas Metrics tab within a cluster provides plotted graphs for the replication metrics, so you can correlate them with other database metrics. I will highlight and elaborate on the important ones later in this article.
How to monitor with self-managed MongoDB instances: Use the following MongoDB commands:
While monitoring MongoDB metrics, you should look out for the following.
Let’s cover the main metrics for operations and connection monitoring.
Opcounters is the average rate of operations performed per second over the selected sample period. The Opcounters graph shows the operation velocity and the breakdown of operation types for the instance.
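The opcounters section of db.serverStatus() holds counts accumulated since the mongod process started, so a per-second rate like the one Atlas plots has to come from the difference between two samples. The helper below is a minimal sketch in plain JavaScript (it should run in mongosh or Node.js); the function names are hypothetical, not part of any MongoDB API, and it assumes each sample is a serverStatus document with its localTime and opcounters fields.

```javascript
// Hypothetical helper: serverStatus counters may arrive as BSON Long values in
// mongosh, so convert them to plain JavaScript numbers before doing arithmetic.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Operation types reported in serverStatus().opcounters.
const OP_TYPES = ["insert", "query", "update", "delete", "getmore", "command"];

// Per-second rate of each operation type between two serverStatus samples.
function opcounterRates(prev, curr) {
  const seconds = (curr.localTime - prev.localTime) / 1000;
  if (seconds <= 0) {
    throw new Error("the second sample must be newer than the first");
  }
  const rates = {};
  for (const op of OP_TYPES) {
    rates[op] = (toNum(curr.opcounters[op]) - toNum(prev.opcounters[op])) / seconds;
  }
  return rates;
}

// Usage in mongosh:
//   const a = db.serverStatus(); sleep(10000); const b = db.serverStatus();
//   opcounterRates(a, b);
```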
Operation Execution Time is the average time taken by operations (reads and writes) over the selected sample period.
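On a self-managed instance, a comparable average can be derived from the opLatencies section of db.serverStatus(), where each category (reads, writes, commands) carries a cumulative latency in microseconds and a cumulative operation count. The sketch below assumes two such samples; the function name is hypothetical.

```javascript
// Hypothetical helper: convert BSON Long counters to plain numbers.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Average latency in milliseconds for one opLatencies category
// ("reads", "writes" or "commands") between two serverStatus samples.
// opLatencies.<type>.latency is cumulative time in microseconds and
// opLatencies.<type>.ops is the cumulative number of operations.
function avgLatencyMs(prev, curr, type) {
  const dMicros = toNum(curr.opLatencies[type].latency) - toNum(prev.opLatencies[type].latency);
  const dOps = toNum(curr.opLatencies[type].ops) - toNum(prev.opLatencies[type].ops);
  // No operations in the interval means there is no latency to average.
  return dOps > 0 ? dMicros / dOps / 1000 : 0;
}

// Usage in mongosh: avgLatencyMs(firstSample, secondSample, "reads");
```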
Query Executor is the average rate per second, over the selected sample period, of items scanned during queries and query-plan evaluation. Query Targeting is the ratio between the number of documents scanned and the number of documents returned. A high ratio may indicate suboptimal operations that scan many documents to return only a small portion of them.
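The query targeting ratio can be approximated from two db.serverStatus() samples: metrics.queryExecutor.scanned counts index keys examined, metrics.queryExecutor.scannedObjects counts documents examined, and metrics.document.returned counts documents returned. This is a minimal sketch with a hypothetical function name; a ratio close to 1 means queries return most of what they scan, and a steadily rising ratio is the warning sign described above.

```javascript
// Hypothetical helper: convert BSON Long counters to plain numbers.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Index keys and documents examined per document returned between two
// serverStatus samples. Ratios are null when no documents were returned.
function queryTargeting(prev, curr) {
  const delta = (get) => toNum(get(curr)) - toNum(get(prev));
  const keys = delta((s) => s.metrics.queryExecutor.scanned);
  const docs = delta((s) => s.metrics.queryExecutor.scannedObjects);
  const returned = delta((s) => s.metrics.document.returned);
  return {
    keysExaminedPerReturned: returned > 0 ? keys / returned : null,
    docsExaminedPerReturned: returned > 0 ? docs / returned : null,
  };
}
```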
Connections is the number of open connections to the instance. High numbers or spikes might indicate a suboptimal client-side connection strategy or an unresponsive server.
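The connections section of db.serverStatus() reports current (open connections), available (unused incoming connections), and totalCreated (a cumulative count). The sketch below, with hypothetical function names, derives how much of the connection limit is in use and how fast new connections are being opened. Reading a high, sustained creation rate as clients opening a connection per request instead of pooling is one possible interpretation, not a rule.

```javascript
// Hypothetical helper: convert BSON Long counters to plain numbers.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Share of the connection limit currently in use, from one serverStatus sample.
function connectionUsage(status) {
  const current = toNum(status.connections.current);
  const available = toNum(status.connections.available);
  return { current, available, usedFraction: current / (current + available) };
}

// New connections opened per second between two serverStatus samples.
function newConnectionsPerSecond(prev, curr) {
  const seconds = (curr.localTime - prev.localTime) / 1000;
  return (toNum(curr.connections.totalCreated) - toNum(prev.connections.totalCreated)) / seconds;
}
```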
Queues describe the number of operations waiting for a lock, either read or write. High queues may indicate conflicting write paths or a suboptimal schema design, both of which force heavy competition over database resources.
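On a self-managed instance, the queued readers and writers appear under globalLock.currentQueue in db.serverStatus() output (check your server version's output, since the exact sections can change between releases). This minimal sketch reads them and compares the total against maxQueued, a hypothetical alerting threshold you would tune to your own workload.

```javascript
// Hypothetical helper: convert BSON Long counters to plain numbers.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Operations currently waiting for a lock, from one serverStatus sample.
// maxQueued is a hypothetical threshold, not a MongoDB default.
function lockQueueCheck(status, maxQueued) {
  const q = status.globalLock.currentQueue;
  const readers = toNum(q.readers);
  const writers = toNum(q.writers);
  return { readers, writers, alert: readers + writers > maxQueued };
}
```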
Scan and Order is the average rate per second, over the selected sample period, of queries that return sorted results but cannot perform the sort operation using an index.
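The cumulative counter behind this metric is metrics.operation.scanAndOrder in db.serverStatus(). To find which query is responsible, you can check its explain() output for a blocking SORT stage. The sketch below walks only the winning plan, so a SORT in a rejected plan does not count; the function name is hypothetical, and the collection and fields in the usage comment are placeholders.

```javascript
// True if the winning plan of an explain() result contains a blocking SORT
// stage, meaning the sort was done in memory instead of by walking an index.
function usesBlockingSort(explainResult) {
  const walk = (node) => {
    if (node === null || typeof node !== "object") return false;
    if (node.stage === "SORT") return true;
    // Recurse through inputStage, inputStages (arrays) and nested plan objects.
    return Object.values(node).some(walk);
  };
  return walk(explainResult.queryPlanner.winningPlan);
}

// Usage in mongosh (collection and field names are hypothetical):
//   usesBlockingSort(db.orders.find({ status: "A" }).sort({ createdAt: -1 }).explain());
```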
Let’s cover the main metrics for hardware monitoring.
System CPU is the percentage of time the CPU spent on system calls servicing this MongoDB process, scaled to a range of 0-100% by dividing by the number of CPU cores. It covers CPU used by modules such as user, kernel, iowait, and steal. High kernel or user CPU might indicate CPU exhaustion by MongoDB operations (software), while high iowait most likely means that storage exhaustion is the root cause of the CPU exhaustion.
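The normalization described above is a single division, and reading the breakdown mostly means finding which component dominates. This minimal sketch shows both; the function names are hypothetical, and the input numbers are assumed to come from the Atlas chart or your own host tooling rather than from any specific MongoDB API.

```javascript
// Scale a raw CPU percentage, which can exceed 100% on multi-core hosts,
// to the 0-100% range by dividing by the number of CPU cores.
function normalizeCpu(rawPercent, cores) {
  if (cores <= 0) throw new Error("cores must be positive");
  return rawPercent / cores;
}

// Name of the largest component in a CPU breakdown such as
// { user: 20, kernel: 5, iowait: 60, steal: 1 }.
function dominantCpuComponent(breakdown) {
  const entries = Object.entries(breakdown);
  if (entries.length === 0) throw new Error("empty breakdown");
  return entries.reduce((best, e) => (e[1] > best[1] ? e : best))[0];
}
```

For example, a process using 350% of CPU on a 4-core host normalizes to 87.5%, and a breakdown dominated by iowait points at storage rather than at MongoDB's own work.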
Process CPU is the percentage of time the CPU spent on application software (MongoDB code) servicing this MongoDB process, scaled to a range of 0-100% by dividing by the number of CPU cores. It covers user and kernel CPU. High kernel CPU might indicate CPU exhaustion by operating system operations, while high user CPU most likely points to MongoDB operations as the root cause of CPU exhaustion.
Disk Latency is the read and write latency, in milliseconds, of the disk partition used by MongoDB. High values (above 500 ms) mean MongoDB might be impacted by the storage layer.
Disk IOPS is the average number of I/O operations per second consumed on the disk partition used by MongoDB.
Disk Space Free is the total number of bytes of free disk space on the disk partition used by MongoDB. Atlas offers disk auto-scaling capabilities based on this metric.
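On a self-managed instance, the dbStats command (db.stats() in mongosh) reports fsTotalSize and fsUsedSize for the filesystem that holds the data files, in bytes with the default scale. The sketch below derives free space from them; the function name and the minFreeFraction threshold are hypothetical and should be set to your own capacity policy.

```javascript
// Hypothetical helper: convert BSON Long values to plain numbers.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Free space on the data filesystem from dbStats output (default scale: bytes).
// minFreeFraction is a hypothetical alerting threshold, e.g. 0.1 for 10%.
function diskFree(stats, minFreeFraction) {
  const total = toNum(stats.fsTotalSize);
  const free = total - toNum(stats.fsUsedSize);
  const freeFraction = free / total;
  return { freeBytes: free, freeFraction, lowSpace: freeFraction < minFreeFraction };
}

// Usage in mongosh: diskFree(db.stats(), 0.1);
```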
System Memory shows the number of bytes of physical memory in use versus the number of free, unused bytes. The available metric estimates the number of bytes of system memory available for running new applications without swapping.
The Swap Usage graph shows how much memory is being placed on the swap device. A high used value in this graph means swap is in use, which directly indicates that memory is under-provisioned for the current workload.
Let’s cover the main metrics for replication monitoring.
Replication lag is the approximate number of seconds a secondary node is behind the primary in write operations. High replication lag indicates a secondary that is struggling to replicate, and, depending on the write and read concerns of your connections, it can increase your operations' latency.
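With rs.status(), lag can be approximated by comparing the optimeDate of the primary (the time of the last oplog entry it applied) with that of each secondary. This is a minimal sketch with a hypothetical function name; it assumes the output contains a member whose stateStr is PRIMARY.

```javascript
// Approximate lag of each secondary, in seconds, from rs.status() output:
// the primary's last applied optime minus the secondary's.
function replicationLag(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY found in rs.status() output");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({ name: m.name, lagSeconds: (primary.optimeDate - m.optimeDate) / 1000 }));
}

// Usage in mongosh: replicationLag(rs.status());
```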
Replication Oplog Window is the approximate number of hours available in the primary's replication oplog. If a secondary lags behind by more than this amount, it cannot catch up and will require a full resync.
Replication headroom is the difference between the primary's replication oplog window and the secondary's replication lag. A secondary can go into RECOVERING if this value goes to zero.
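Headroom is a subtraction once both inputs are in the same unit. In this sketch the oplog window in seconds is assumed to come from db.getReplicationInfo().timeDiff run on the primary, and the lag list from a helper like the replication-lag calculation above; the function name is hypothetical.

```javascript
// Replication headroom per secondary: the primary's oplog window minus the
// secondary's lag, both in seconds. A value at or below zero means the
// secondary can no longer catch up from the oplog.
function replicationHeadroom(oplogWindowSeconds, lags) {
  return lags.map(({ name, lagSeconds }) => {
    const headroomSeconds = oplogWindowSeconds - lagSeconds;
    return { name, headroomSeconds, atRisk: headroomSeconds <= 0 };
  });
}
```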
Oplog GB/Hour is the average rate, in gigabytes per hour, at which the primary generates oplog. Unexpectedly high oplog volumes might indicate a schema design issue or a highly inefficient write workload.
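A rough self-managed estimate can be taken from db.getReplicationInfo() on the primary, which reports usedMB (space used by the oplog) and timeDiffHours (time between the first and last oplog entries). Dividing one by the other gives an average over the whole oplog window, so short bursts are smoothed out; this is an approximation, not the calculation Atlas uses, and the function name is hypothetical.

```javascript
// Rough oplog generation rate in GB/hour, averaged over the whole oplog window,
// from db.getReplicationInfo() output. Returns null if the window is empty.
function oplogGbPerHour(info) {
  if (!(info.timeDiffHours > 0)) return null;
  return info.usedMB / 1024 / info.timeDiffHours;
}

// Usage in mongosh (on the primary): oplogGbPerHour(db.getReplicationInfo());
```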
Opcounters - Repl is the average rate per second of replicated operations over the selected sample period. The graph shows the operation velocity and the breakdown of operation types for the instance.
MongoDB provides built-in UI tools in Atlas, Cloud Manager, and Ops Manager to help you monitor performance. MongoDB also offers standalone tools and commands for inspecting raw data.
Let’s look at the tools available. You can run them from any host that has access to your environment, using a user with the appropriate role (clusterMonitor).
mongostat gives a quick overview of the status of a MongoDB server instance. It’s best used for watching a single instance for a specific event, as it provides a real-time view. You can use it to monitor basic server statistics such as the operation breakdown, MongoDB memory statistics, lock queues, and connections/network.
Run mongostat with the following syntax:
mongostat <options> <connection-string> <polling interval in seconds>
mongotop tracks the amount of time a MongoDB instance spends reading and writing data per collection.
Run mongotop with the following syntax:
mongotop <options> <connection-string> <polling interval in seconds>
rs.status() returns the replica set status from the point of view of the member on which the method is run.
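Each entry in the members array of rs.status() includes health (1 when the member is up, 0 when it is down) and stateStr (for example PRIMARY, SECONDARY, or RECOVERING). This minimal sketch lists members that are down or not in a steady state; the function name is hypothetical, and treating PRIMARY, SECONDARY, and ARBITER as the steady states is an assumption you may want to adjust.

```javascript
// Members that are down or not in a steady state, from rs.status() output.
function unhealthyMembers(status) {
  const steadyStates = new Set(["PRIMARY", "SECONDARY", "ARBITER"]);
  return status.members
    .filter((m) => m.health !== 1 || !steadyStates.has(m.stateStr))
    .map((m) => `${m.name}: ${m.stateStr} (health=${m.health})`);
}

// Usage in mongosh: unhealthyMembers(rs.status());
```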
To get an overview of the database’s state, use the db.serverStatus() command. It returns a document containing the instance’s current metric counters. Run it at a regular interval to collect statistics about the instance.
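Collecting at a regular interval can be as simple as a loop that samples, keeps only the sections you need, and pauses between samples. In this sketch the fetch and sleep functions are passed in so it can be tested without a server; in mongosh you would pass () => db.serverStatus() and the shell's sleep() helper. The function name and the chosen sections are assumptions.

```javascript
// Take `count` serverStatus samples `intervalMs` apart, keeping only a few
// sections so the collected data stays small.
function pollServerStatus(fetchStatus, sleepFn, count, intervalMs) {
  const samples = [];
  for (let i = 0; i < count; i++) {
    if (i > 0) sleepFn(intervalMs);
    const s = fetchStatus();
    samples.push({
      localTime: s.localTime,
      opcounters: s.opcounters,
      connections: s.connections,
      globalLock: s.globalLock,
    });
  }
  return samples;
}

// Usage in mongosh:
//   const samples = pollServerStatus(() => db.serverStatus(), sleep, 6, 10000);
```

Consecutive samples can then be compared pairwise to turn cumulative counters into rates.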
The dbStats command returns storage statistics for a given database, such as the total data size versus the storage size, the number of indexes and their size, and collection-related statistics (number of documents and collections).
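The raw dbStats numbers are easier to compare across databases as ratios. In dbStats output, dataSize is the uncompressed size of the data, storageSize is the space allocated on disk for documents (compressed by default under WiredTiger), and indexSize is the space used by indexes. The function below is a hypothetical summary helper, not a MongoDB API.

```javascript
// Hypothetical helper: convert BSON Long values to plain numbers.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Condense db.stats() (dbStats) output into counts and two ratios.
function summarizeDbStats(stats) {
  const dataSize = toNum(stats.dataSize);
  const storageSize = toNum(stats.storageSize);
  const indexSize = toNum(stats.indexSize);
  return {
    collections: toNum(stats.collections),
    documents: toNum(stats.objects),
    indexes: toNum(stats.indexes),
    // How much larger the logical data is than its on-disk allocation.
    dataToStorageRatio: storageSize > 0 ? dataSize / storageSize : null,
    // Fraction of the database's disk footprint taken by indexes.
    indexShareOfDisk: indexSize / (storageSize + indexSize),
  };
}

// Usage in mongosh: summarizeDbStats(db.stats());
```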
The collStats command collects statistics similar to those provided by dbStats, but at the collection level. Its output includes the number of objects in the collection, the collection’s size, the amount of disk space the collection consumes, and information about its indexes.
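Running collStats for every collection and sorting the results is a quick way to find where disk space goes. The sketch ranks collections by storageSize plus totalIndexSize; the function name is hypothetical, and the usage comment restricts the loop to regular collections with db.getCollectionInfos so views are not passed to collStats.

```javascript
// Hypothetical helper: convert BSON Long values to plain numbers.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Rank collections by on-disk footprint (document storage plus indexes)
// from a list of collStats results, largest first.
function largestCollections(statsList, topN) {
  return statsList
    .map((s) => ({
      ns: s.ns,
      count: toNum(s.count),
      onDiskBytes: toNum(s.storageSize) + toNum(s.totalIndexSize),
    }))
    .sort((a, b) => b.onDiskBytes - a.onDiskBytes)
    .slice(0, topN);
}

// Usage in mongosh (regular collections only):
//   const stats = db.getCollectionInfos({ type: "collection" })
//     .map((c) => db.runCommand({ collStats: c.name }));
//   largestCollections(stats, 5);
```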
In short, we can monitor MongoDB databases with tools such as mongostat and mongotop and with commands such as dbStats, collStats, and serverStatus. They provide real-time monitoring and reporting of the database server, letting us track errors and database performance and make informed decisions when optimizing a database.
MongoDB provides a variety of metrics and tools to monitor your database and ensure it's running at optimal performance. From UI tools to advisors to raw-data metrics, you're covered whether you're hosting your database yourself or using MongoDB Atlas.