Taking too much time to get the countDocument with filter

Pawan_Sonole · 2024-07-23T12:16:21.210Z

Hi Community,

I am using db.collection.countDocument({filter1 : val, filter2 : val2}) it is taking 5 mins to return the result, total records are 206 m after filter it is 80 m
I have created index as well but still no luck also the filter condition will change as per the user inputs.
Please suggest the faster approach to get the count and this will be used in Java spring boot.

TIA

Stanimira_Vlaeva · 2024-07-23T16:12:42.940Z

Hi Pawan! Thanks for your question!

What index have you created to cover this query? How can the filter condition change depending on the user input?

Pawan_Sonole · 2024-07-24T09:02:35.490Z

I have created regular index on filter1 and another regular index on filter 2
depends on the user its a search screen multiple search box are present on UI application as per user requirement they will enter the search value and hit the search button

chris · 2024-07-25T22:17:56.766Z

The best index for this specific query is the compound index {filter:1,filter:2}, both filter1 and filter2 would have to be matched on, and in that order for the index to be used.

In the sample here there are 3 indexes plus the _id index to begin with. The index I mention above is hidden:

 db.foo.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { f1: 1 }, name: 'f1_1' },
  { v: 2, key: { f2: 1 }, name: 'f2_1' },
  { v: 2, key: { f1: 1, f2: 1 }, name: 'f1_1_f2_1', hidden: true }
]

Here are the query times:

test> s=performance.now(); print(db.foo.countDocuments({ f1: 2, f2: 1 })); print(`Duration: ${performance.now() - s}`)
2406166
Duration: 21231.54756000638

test> s=performance.now(); print(db.foo.countDocuments({ f1: 2, f2: 1 })); print(`Duration: ${performance.now() - s}`)
2406166
Duration: 14208.398685008287

test> s=performance.now(); print(db.foo.countDocuments({ f1: 2, f2: 1 })); print(`Duration: ${performance.now() - s}`)
2406166
Duration: 13774.462996006012

Pretty slow. Now using the compound index:

test> db.foo.unhideIndex({f1:1,f2:1})
{ hidden_old: true, hidden_new: false, ok: 1 }

test> s=performance.now(); print(db.foo.countDocuments({ f1: 2, f2: 1 })); print(`Duration: ${performance.now() - s}`)
2406166
Duration: 1593.1604879945517

test> s=performance.now(); print(db.foo.countDocuments({ f1: 2, f2: 1 })); print(`Duration: ${performance.now() - s}`)
2406166
Duration: 1393.1082039922476

test> s=performance.now(); print(db.foo.countDocuments({ f1: 2, f2: 1 })); print(`Duration: ${performance.now() - s}`)
2406166
Duration: 1400.4836210012436

This is a big improvement. This is limited in that the index will support queries on f1 alone and f1,f2 combined, but not queries on f2 alone.

It is possible the the Attribute Pattern may suit this use case, updating the schema to support this could be beneficial. Have a read and see if it fits.

Stanimira_Vlaeva · 2024-07-27T14:57:15.306Z

I agree with Chris. Alternatively, you can create two indexes: one compound index on (f1, f2) and a single index on f2. This approach will handle all your queries involving f1, f2, and f1+f2. I don’t think the attribute pattern is necessary for just two fields.

Pawan_Sonole · 2024-07-31T09:57:09.299Z

Thank you Chris and Stanimira

I tried with composite index but not much improvement in performance, we are planning to upgrade the server with more memory and space.