Hi,
We are using MongoDB v7.0 with feature compatibility version 6.0 enabled.

Version: 6.0.5
meta field: userid
time field: event time
granularity: hours
collection size: 40 billion events / 3 TB compressed
indexes: userid & userid_eventtime
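
For reference, here is a minimal sketch of how a collection with this configuration would be created. The field names "eventTime" and "uid" are my guesses based on the queries below; adjust to the real schema:

db.createCollection("userEvents", {
  timeseries: {
    timeField: "eventTime",  // "time field: event time" above
    metaField: "uid",        // "meta field: userid" above; the queries below use "uid"
    granularity: "hours"
  }
});
// secondary indexes matching "userid & userid_eventtime"
db.userEvents.createIndex({ uid: 1 });
db.userEvents.createIndex({ uid: 1, eventTime: 1 });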

We have already migrated around 40 billion events to the time series data model, and writes and reads were fine at first. Over the following 3 weeks, writes and reads started getting slow. After debugging, we found that the number of buckets per metaField value (user id) had grown very large, and that if I read all the documents for that user key, delete them, and re-insert them, performance improves more than 10x.

This is a crude way of merging the buckets; there should be a proper function to do it.
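
For the record, a sketch of that crude merge for a single user, assuming mongosh, the field names above, and no concurrent writes for this uid while it runs (if the process dies between the delete and the re-insert, the events are lost, so keep the array or a backup until it completes):

const uid = 380580264;
const docs = db.userEvents.find({ uid: uid }).toArray();   // read all events for the user
db.userEvents.deleteMany({ uid: uid });                    // metaField-only filter; drops the fragmented buckets
db.userEvents.insertMany(docs, { ordered: false });        // re-insert; events pack into far fewer buckets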

Also, why do hundreds or thousands of buckets get created when I expect only one bucket per month for a metaField value (userid)?

Below are the numbers for a sample user:

db.system.buckets.userEvents.find({"meta": 380580264}).count()
3120
db.userEvents.find({"uid": 380580264}).count()
20539

After deleting and re-inserting the user documents:

db.system.buckets.userEvents.find({"meta": 380580264}).count()
370
db.userEvents.find({"uid": 380580264}).count()
20539
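
To see how widespread the fragmentation is, bucket counts per meta value can be surveyed with a read-only aggregation on the internal bucket collection (a sketch, using the collection name above):

db.system.buckets.userEvents.aggregate([
  { $group: { _id: "$meta", buckets: { $sum: 1 } } },  // buckets per user
  { $sort: { buckets: -1 } },                          // worst offenders first
  { $limit: 10 }
])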

Please help with:

  • How can I make sure events are merged into their respective existing buckets instead of new buckets being created so frequently?
  • How can I merge the existing several hundreds of buckets per user?

Let me know if you need any further info.

Thanks.

Hi Dhanasekaran_S,
I am getting the same problem. For me it takes around 10 seconds for 6 million records with just a plain $project and $group, and my actual logic takes around a minute.
I wanted to use this for analytics, but the performance is not looking great. Did you find any solution? Also, how long does it take on your side, since you have 40 billion records?

  // last two stages of the pipeline that materializes the time series collection:
  {
    $project: {
      _id: 0,
      timestamp: 1,
      project: 1,
      category: 1,
      home: 1,
      productAndService: 1,
      type: 1,
      amount: { $round: ["$dailyAmount", 2] },
      orginalAmount: "$amount",
      startInterval: 1,
      endInterval: 1,
      metadata: {
        organizationId: "$organizationId",
      },
    },
  },
  {
    // $out replaces the target as a time series collection on every run
    $out: {
      db: "analytics",
      coll: "revenue_timeseries",
      timeseries: {
        timeField: "timestamp",
        metaField: "metadata",
        granularity: "hours",
      },
    },
  },

Basically this is how I am inserting into my time series db. Now the query pipeline:

 pipeline.push({
        $match: {
            "metadata.organizationId": mongoose.Types.ObjectId("620cb30720ec482d0c5191e7"),
            timestamp: {
                $gte: new Date(startDate),
                $lte: new Date(endDate),
            },
            "metadata.productAndService": {
                $nin: depositIds,
            },
            ...(projectObjectIds.length && {
                "project": { $in: projectObjectIds },
            }),
            ...(categoryObjectIds.length && {
                "category": { $in: categoryObjectIds },
            }),
        },
    });

    pipeline.push({
        $project: {
            yearMonth: {
                $dateToString: { format: "%Y-%m", date: "$timestamp" }
            },
            amount: 1,
            isRentRevenue: { $in: ["$metadata.productAndService", rentIds] }
        }
    });
    

    pipeline.push({
        $group: {
            _id: "$yearMonth",
            totalAmount: { $sum: "$amount" },
            rentRevenue: {
                $sum: {
                    $cond: ["$isRentRevenue", "$amount", 0],
                },
            },
            othersRevenue: {
                $sum: {
                    $cond: ["$isRentRevenue", 0, "$amount"],
                },
            },
        },
    });

    pipeline.push({
        $sort: {
            _id: 1,
        },
    });

For this it takes around 1 minute.
Can you help me with this? What am I doing wrong?
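
In case it helps with debugging, a sketch of how the plan could be inspected (assuming mongosh, with `pipeline` being the array built above) to see how much work the bucket scan is doing:

db.revenue_timeseries.explain("executionStats").aggregate(pipeline)
// check executionStats to compare how many buckets/documents are scanned
// and unpacked versus how many documents the pipeline actually returns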