Explore Developer Center's New Chatbot! MongoDB AI Chatbot can be accessed at the top of your navigation to answer all your MongoDB questions.

Join us at AWS re:Invent 2024! Learn how to use MongoDB for AI use cases.
MongoDB Developer
MongoDB
plus
Sign in to follow topics
MongoDB Developer Centerchevron-right
Developer Topicschevron-right
Productschevron-right
MongoDBchevron-right

Massive Number of Collections

Lauren Schaefer, Daniel Coupal6 min read • Published Feb 12, 2022 • Updated Oct 01, 2024
MongoDBSchema
Facebook Icontwitter iconlinkedin icon
Rate this article
star-empty
star-empty
star-empty
star-empty
star-empty
In the first post in this MongoDB Schema Design Anti-Patterns series, we discussed how we should avoid massive arrays when designing our schemas. But what about having a massive number of collections? Turns out, they're not great either. In this post, we'll examine why.
Are you more of a video person? This is for you.

Massive number of collections

Let's begin by discussing why having a massive number of collections is an anti-pattern. If storage is relatively cheap, who cares how many collections you have?
Every collection in MongoDB automatically has an index on the _id field. While the size of this index is pretty small for empty or small collections, thousands of empty or unused indexes can begin to drain resources. Collections will typically have a few more indexes to support efficient queries. All of these indexes add up.
In general, we recommend limiting collections to 10,000 per replica set. When users begin exceeding 10,000 collections, they typically see decreases in performance.
To avoid this anti-pattern, examine your database and remove unnecessary collections. If you find that you have an increasing number of collections, consider remodeling your data so you have a consistent set of collections.

Example

Let's take an example from the greatest TV show ever created: Parks and Recreation. Leslie is passionate about maintaining the parks she oversees, and, at one point, she takes it upon herself to remove the trash in the Pawnee River.
Let's say she wants to keep a minute-by-minute record of the water level and temperature of the Pawnee River, the Eagleton River, and the Wamapoke River, so she can look for trends. She could send her coworker Jerry to put 30 sensors in each river and then begin storing the sensor data in a MongoDB database.
One way to store the data would be to create a new collection every day to store sensor data. Each collection would contain documents that store information about one reading for one sensor.
1// 2020-05-01 collection
2{
3 "_id": ObjectId("5eac643e64faf3ff31d70d35"),
4 "river": "PawneeRiver",
5 "sensor": 1
6 "timestamp": "2020-05-01T00:00:00Z",
7 "water-level": 61.56,
8 "water-temperature": 72.1
9},
10{
11 "_id": ObjectId("5eac643e64faf3ff31d70d36"),
12 "river": "PawneeRiver",
13 "sensor": 2
14 "timestamp": "2020-05-01T00:00:00Z",
15 "water-level": 61.55,
16 "water-temperature": 72.1
17},
18...
19{
20 "_id": ObjectId("5eac643e64faf3ff31d70dfc"),
21 "river": "WamapokeRiver",
22 "sensor": 90
23 "timestamp": "2020-05-01T23:59:00Z",
24 "water-level": 72.03,
25 "water-temperature": 64.1
26}
27
28// 2020-05-02 collection
29{
30 "_id": ObjectId("5eac644c64faf3ff31d90775"),
31 "river": "PawneeRiver",
32 "sensor": 1
33 "timestamp": "2020-05-02T00:00:00Z",
34 "water-level": 63.12,
35 "water-temperature": 72.8
36},
37 {
38 "_id": ObjectId("5eac644c64faf3ff31d90776"),
39 "river": "PawneeRiver",
40 "sensor": 2
41 "timestamp": "2020-05-02T00:00:00Z",
42 "water-level": 63.11,
43 "water-temperature": 72.7
44},
45...
46{
47 "_id": ObjectId("5eac644c64faf3ff31d9079c"),
48 "river": "WamapokeRiver",
49 "sensor": 90
50 "timestamp": "2020-05-02T23:59:00Z",
51 "water-level": 71.58,
52 "water-temperature": 66.2
53}
Let's say that Leslie wants to be able to easily query on the river and sensor fields, so she creates an index on each field.
If Leslie were to store hourly data throughout all of 2019 and create two indexes in each collection (in addition to the default index on _id), her database would have the following stats:
  • Database size: 5.2 GB
  • Index size: 1.07 GB
  • Total Collections: 365
Each day she creates a new collection and two indexes. As Leslie continues to collect data and her number of collections exceeds 10,000, the performance of her database will decline.
Also, when Leslie wants to look for trends across weeks and months, she'll have a difficult time doing so since her data is spread across multiple collections.
Leslie makes a grimace
Let's say Leslie realizes this isn't a great schema, so she decides to restructure her data. This time, she decides to keep all of her data in a single collection. She'll bucket her information, so she stores one hour's worth of information from one sensor in each document.
1// data collection
2{
3 "_id": "PawneeRiver-1-2019-05-01T00:00:00.000Z",
4 "river": "PawneeRiver",
5 "sensor": 1,
6 "readings": [
7 {
8 "timestamp": "2019-05-01T00:00:00.000+00:00",
9 "water-level": 61.56,
10 "water-temperature": 72.1
11 },
12 {
13 "timestamp": "2019-05-01T00:01:00.000+00:00",
14 "water-level": 61.56,
15 "water-temperature": 72.1
16 },
17 ...
18 {
19 "timestamp": "2019-05-01T00:59:00.000+00:00",
20 "water-level": 61.55,
21 "water-temperature": 72.0
22 }
23 ]
24},
25...
26{
27 "_id": "PawneeRiver-1-2019-05-02T00:00:00.000Z",
28 "river": "PawneeRiver",
29 "sensor": 1,
30 "readings": [
31 {
32 "timestamp": "2019-05-02T00:00:00.000+00:00",
33 "water-level": 63.12,
34 "water-temperature": 72.8
35 },
36 {
37 "timestamp": "2019-05-02T00:01:00.000+00:00",
38 "water-level": 63.11,
39 "water-temperature": 72.8
40 },
41 ...
42 {
43 "timestamp": "2019-05-02T00:59:00.000+00:00",
44 "water-level": 63.10,
45 "water-temperature": 72.7
46 }
47 ]
48}
49...
Leslie wants to query on the river and sensor fields, so she creates two new indexes for this collection.
If Leslie were to store hourly data for all of 2019 using this updated schema, her database would have the following stats:
  • Database size: 3.07 GB
  • Index size: 27.45 MB
  • Total Collections: 1
By restructuring her data, she sees a massive reduction in her index size (1.07 GB initially to 27.45 MB!). She now has a single collection with three indexes.
With this new schema, she can more easily look for trends in her data because it's stored in a single collection. Also, she's using the default index on _id to her advantage by storing the hour the water level data was gathered in this field. If she wants to query by hour, she already has an index to allow her to efficiently do so.
Roe canoeing
For more information on modeling time-series data in MongoDB, see Building with Patterns: The Bucket Pattern.

Removing unnecessary collections

In the example above, Leslie was able to remove unnecessary collections by changing how she stored her data.
Sometimes, you won't immediately know what collections are unnecessary, so you'll have to do some investigating yourself. If you find an empty collection, you can drop it. If you find a collection whose size is made up mostly of indexes, you can probably move that data into another collection and drop the original. You might be able to use $merge to move data from one collection to another.
Below are a few ways you can begin your investigation.
Sherlock Traeger

Using MongoDB Atlas

If your database is hosted in Atlas, navigate to the Atlas Data Explorer. The Data Explorer allows you to browse a list of your databases and collections. Additionally, you can get stats on your database including the database size, index size, and number of collections.
The Atlas Data Explorer displays the database size, index size, and total number of collections for the riverstats database.
The Atlas Data Explorer displays the database size, index size, and total number of collections for the riverstats database.
If you are using an M10 cluster or larger on Atlas, you can also use the Real-Time Performance Panel to check if your application is actively using a collection you're considering dropping.

Using MongoDB Compass

Regardless of where your MongoDB database is hosted, you can use MongoDB Compass, MongoDB's desktop GUI. Similar to the Data Explorer, you can browse your databases and collections so you can check for unused collections. You can also get stats at the database and collection levels.
Caption: Compass shows the storage size, number of collections, and number of indexes for databases.
Caption: Compass shows the storage size, number of collections, and number of indexes for databases.
Compass shows the number of documents, average document size, total document size, number of indexes, and total index size for collections in the riverstats database.
Compass shows the number of documents, average document size, total document size, number of indexes, and total index size for collections in the riverstats database.

Using the MongoDB Shell

If you prefer working in a terminal instead of a GUI, connect to your database using the Mongo Shell.
To see a list of collections, run db.getCollectionNames(). Output like the following will be displayed:
1[
2 "2019-01-01",
3 "2019-01-02",
4 "2019-01-03",
5 "2019-01-04",
6 "2019-01-05",
7 ...
8]
To retrieve stats about your database, run db.stats(). Output like the following will be displayed:
1{
2 "db" : "riverstats",
3 "collections" : 365,
4 "views" : 0,
5 "objects" : 47304000,
6 "avgObjSize" : 118,
7 "dataSize" : 5581872000,
8 "storageSize" : 1249677312,
9 "numExtents" : 0,
10 "indexes" : 1095,
11 "indexSize" : 1145790464,
12 "scaleFactor" : 1,
13 "fsUsedSize" : 5312217088,
14 "fsTotalSize" : 10726932480,
15 "ok" : 1,
16 "$clusterTime" : {
17 "clusterTime" : Timestamp(1588795184, 3),
18 "signature" : {
19 "hash" : BinData(0,"orka3bVeAiwlIGdbVoP+Fj6N01s="),
20 "keyId" : NumberLong("6821929184550453250")
21 }
22 },
23 "operationTime" : Timestamp(1588795184, 3)
24}
You can also run db.collection.stats() to see information about a particular collection.

Summary

Be mindful of creating a massive number of collections as each collection likely has a few indexes associated with it. An excessive number of collections and their associated indexes can drain resources and impact your database's performance. In general, try to limit your replica set to 10,000 collections.
Come back soon for the next post in this anti-patterns series!
When you're ready to build a schema in MongoDB, check out MongoDB Atlas, MongoDB's fully managed database as a service. Atlas is the easiest way to get started with MongoDB. With a forever-free tier, you're on your way to realizing the full value of MongoDB.
Check out the following resources for more information:

Facebook Icontwitter iconlinkedin icon
Rate this article
star-empty
star-empty
star-empty
star-empty
star-empty
This is part of a series
MongoDB Schema Design Anti-Patterns
Up Next
Continue

More in this series
Related
Article

How Prisma Introspects a Schema from a MongoDB Database


May 19, 2022 | 8 min read
Article

Java 21: Unlocking the Power of the MongoDB Java Driver With Virtual Threads


Jan 31, 2024 | 2 min read
Article

From Zero to Hero with MrQ


Jun 13, 2023 | 3 min read
Tutorial

Orchestrating MongoDB & BigQuery for ML Excellence with PyMongoArrow and BigQuery Pandas Libraries


Feb 08, 2024 | 4 min read
Table of Contents