I have a time series collection that is expected to grow to 50M records. Right now I'm doing a PoC with 3M records in which I aggregate using just a $group, and it takes 20 seconds. How can I make it faster? Note: I have an index on the source field, but MongoDB ends up doing a COLLSCAN.
The query:
[
  {
    $group: {
      _id: "$source",
      sum: { $sum: 1 }
    }
  }
]
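Since the question mentions the planner choosing a COLLSCAN despite the index, a useful first step is to confirm that with `explain`. Below is a minimal sketch (the `events` collection name is a placeholder); note that on a time series collection secondary indexes are built on the underlying buckets, so a bare `$group` that has to count every document may legitimately end up scanning everything:

```javascript
// The pipeline from the post, kept as-is.
const pipeline = [
  { $group: { _id: "$source", sum: { $sum: 1 } } },
];

// In mongosh you can inspect the chosen plan ("events" is a placeholder name):
//   db.events.explain("executionStats").aggregate(pipeline);
// Check queryPlanner.winningPlan and executionStats.totalDocsExamined to see
// whether the { source: 1 } index is used or a full scan over the buckets runs.
console.log(pipeline[0].$group._id);
```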
I am getting the same problem: for me it takes around 10 seconds for 6 million records with just a plain $project and $group, and my actual pipeline takes around a minute.
I wanted to use this for analytics, but the performance is not looking great. Did you find any solution?
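For analytics over tens of millions of documents, one common pattern (a sketch under assumptions — all collection names here are placeholders, not from the thread) is to precompute the per-source counts into a small summary collection with `$merge` on a schedule, so dashboards read the summary instead of re-grouping the raw data every time:

```javascript
// Hedged sketch of an on-demand materialized view: run this periodically
// (e.g. from a cron job) so analytics queries hit the small "source_counts"
// collection instead of grouping millions of raw documents.
const refreshPipeline = [
  { $group: { _id: "$source", sum: { $sum: 1 } } },
  {
    $merge: {
      into: "source_counts",     // summary collection read by the dashboard
      whenMatched: "replace",    // overwrite stale counts on refresh
      whenNotMatched: "insert",  // add counts for newly seen sources
    },
  },
];

// In mongosh / a driver ("events" is a placeholder source collection):
//   db.events.aggregate(refreshPipeline);
//   db.source_counts.find();   // fast lookup of the precomputed sums
console.log(refreshPipeline.length);
```

The trade-off is freshness: the counts are only as current as the last refresh, which is usually acceptable for dashboard-style analytics.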
steevej
(Steeve Juneau)
Try some of the things that were shared in
steevej
(Steeve Juneau)
@Ayush_Tiwari2, can you share any findings you got while trying the proposed alternatives?
Please provide some feedback.
Will try this and share the findings.