Aug 2024

When I check the slow queries in my database, I find the following aggregate query being executed on multiple collections:

[ { "$addFields": { "temporary_doc_size_field": { "$bsonSize": "$$ROOT" } } }, { "$match": { "temporary_doc_size_field": { "$lt": 16793600 } } }, { "$project": { "temporary_doc_size_field": 0 } }, { "$sample": { "size": 1000 } } ]

It performs a collection scan and causes a lot of disk reads. I checked my application and couldn't find this query anywhere in the source code.
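For reference, the collection scan can be confirmed by re-running the pipeline under explain in mongosh (a sketch; "mycoll" is a placeholder for one of the affected collections):

    db.mycoll.explain("executionStats").aggregate([
      { $addFields: { temporary_doc_size_field: { $bsonSize: "$$ROOT" } } },
      { $match: { temporary_doc_size_field: { $lt: 16793600 } } },
      { $project: { temporary_doc_size_field: 0 } },
      { $sample: { size: 1000 } }
    ])
    // Expect a COLLSCAN stage: temporary_doc_size_field is computed per
    // document, so the $match on it can never use an index.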

So, is this aggregate query executed automatically by MongoDB, and what is its purpose? Is there any way to limit it, since it is quite costly in terms of disk I/O?

Where and how are you running mongod?

This is a really bad query, and I am 99.999999% sure that MongoDB does not execute it automatically, especially not on multiple collections.

I think the value 16_793_600 is telling: it sits just above the 16 MB BSON document size limit (16 * 1024 * 1024 = 16,777,216 bytes), so every document will match.
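A quick sanity check of the arithmetic, runnable in mongosh:

    // The BSON document size limit is 16 MB.
    const bsonLimit = 16 * 1024 * 1024;  // 16,777,216 bytes
    const threshold = 16793600;          // value used in the $match stage
    print(threshold - bsonLimit);        // 16384 -> the threshold is 16 KB above
                                         // the limit, so $bsonSize < threshold
                                         // is true for every document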

How is your system secured?

Maybe you have some users analyzing the schema with MongoDB Compass. The $sample stage with { "size": 1000 } matches the remark Compass prints when analyzing a schema:

This report is based on a sample of 1000 documents.
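One way to test this hypothesis is to check which client issued the operation. A minimal sketch, assuming the database profiler is enabled (e.g. db.setProfilingLevel(1)) and the slow aggregations land in system.profile; Compass connections usually report an appName containing "MongoDB Compass":

    // Show the most recent slow aggregations together with the client that ran them.
    db.system.profile.find(
      { op: "command", "command.aggregate": { $exists: true } },
      { ts: 1, appName: 1, client: 1, "command.pipeline": 1 }
    ).sort({ ts: -1 }).limit(5)

    // For operations that are still running, $currentOp exposes the same metadata:
    db.getSiblingDB("admin").aggregate([
      { $currentOp: { allUsers: true } },
      { $match: { "command.pipeline": { $exists: true } } },
      { $project: { appName: 1, client: 1, "command.pipeline": 1 } }
    ])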

Are the queries spaced evenly over time? In principle, an automated process would run the queries at a regular interval, while a human would show more random intervals.
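If the profiler is capturing these operations, a rough sketch for eyeballing the spacing (field names as recorded in system.profile):

    // Print the gap, in seconds, between consecutive occurrences of the query.
    let prev = null;
    db.system.profile.find(
      { op: "command", "command.aggregate": { $exists: true } },
      { ts: 1 }
    ).sort({ ts: 1 }).forEach(doc => {
      if (prev) print((doc.ts - prev) / 1000 + " s");
      prev = doc.ts;
    });
    // Near-constant gaps suggest a scheduled job; irregular gaps suggest a human.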