I have a time series collection that contains nearly one billion documents. I'm simply running:
db.collection.find({ 'metaData.eventId': "FB-2348", 'metaData.marketName': "Match Odds", 'metaData.selectionId': 0 })
The query takes more than two minutes to finish; the explain("executionStats") output shows:
"executionStats":{
"executionSuccess":true,
"nReturned":10,
"executionTimeMillis":207968,
"totalKeysExamined":0,
"totalDocsExamined":5032348,
"executionStages":{
"stage":"COLLSCAN",
"filter":{
"$and":[
{
"meta.eventId":{
"$eq":"FB-2348"
}
},
{
"meta.marketName":{
"$eq":"Match Odds"
}
},
{
"meta.selectionId":{
"$eq":0
}
}
]
},
"nReturned":10,
"executionTimeMillisEstimate":200392,
"works":5032349,
"advanced":10,
"needTime":5032338,
"needYield":0,
"saveState":13268,
"restoreState":13268,
"isEOF":1,
"direction":"forward",
"docsExamined":5032348
},
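
From what I can tell, the COLLSCAN stage with "totalKeysExamined": 0 means the query is not using any index at all and is scanning buckets one by one. (I assume the meta.* names in the plan's filter are just how the time series buckets rewrite my metaData metaField.) Would a compound secondary index on the queried metaField subfields be the right fix? A sketch of what I have in mind; the field order is just my guess:

db.collection.createIndex({
  "metaData.eventId": 1,      // equality match in my query
  "metaData.marketName": 1,   // equality match in my query
  "metaData.selectionId": 1   // equality match in my query
})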
I have about a billion documents that look like this:
{
  ts: 2024-04-05T07:50:53.111+00:00,
  metaData: {
    betType: "for,ahover,18",
    bookie: "pin88",
    eventId: "2024-04-07,1514,1154",
    inRunning: false,
    sport: "FB"
  },
  min: null,
  price: 5.03,
  max: 118.6643
}
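
For reference, the collection was created as a time series collection along these lines (a sketch; the granularity value in particular is an assumption, I don't recall the exact setting):

db.createCollection("collection", {
  timeseries: {
    timeField: "ts",        // the timestamp field shown above
    metaField: "metaData",  // the sub-document my query filters on
    granularity: "seconds"  // assumption on my part
  }
})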
Is there anything wrong with my query, or is there anything I can optimize in my collection, schema design, or data configuration? Thank you.