Looking for some suggestions here. Even with indexes on the fields in the $match condition, this query still runs slowly. The collection has an index on start, which is a datetime field (e.g. start: 2024-04-28T23:24:35.088+00:00).
However, as the log below shows, the index scan doesn't seem to be helping as expected, and these queries take a long time, around 10-13 minutes. What improvements can be made?
Would changing $dateToString to something else that produces the same grouping help performance (see the variant after the log below)? At our data volume we have seen this query run for hours before we had to kill it.

"type": "command",
"command": {
  "aggregate": "collectionName",
  "pipeline": [
    {
      "$match": {
        "start": {
          "$gte": {
            "$date": "2024-05-22T16:25:52.029Z"
          }
        }
      }
    },
    {
      "$group": {
        "_id": {
          "date": {
            "$dateToString": {
              "date": "$start",
              "format": "%Y-%m-%dT%H:00:00.000Z"
            }
          },
          "Id": "$Id"
        },
        "count": {
          "$sum": 1
        }
      }
    }
  ]
},
"planSummary": "IXSCAN { start: -1 }",
"cursorid": 3388478945949195000,
"keysExamined": 1175129,
"docsExamined": 1175129,
"numYields": 69312,
"nreturned": 101,
"reslen": 10976
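
For reference, this is the kind of rewrite I had in mind, swapping $dateToString for $dateTrunc (assuming that's available to us, i.e. MongoDB 5.0+). The grouping buckets would be equivalent; the only difference is that _id.date comes back as a Date truncated to the hour instead of a formatted string:

// Hypothetical rewrite of the same pipeline with $dateTrunc instead of $dateToString.
// Grouping buckets are equivalent, but _id.date is a Date rather than a string.
db.collectionName.aggregate([
  { $match: { start: { $gte: ISODate("2024-05-22T16:25:52.029Z") } } },
  {
    $group: {
      _id: {
        date: { $dateTrunc: { date: "$start", unit: "hour" } },
        Id: "$Id"
      },
      count: { $sum: 1 }
    }
  }
])

Would that be expected to make a real difference here, or is the time mostly going into fetching the ~1.17M documents rather than into the string formatting?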

This isn't a one-off case; it's the general pattern we see for all such queries. We're running an M200 cluster.
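
One more thing I was wondering about: keysExamined and docsExamined are both 1175129, so it looks like every matching document is fetched even though the pipeline only reads start and Id. Would a compound index along these lines let the plan stay inside the index instead of fetching each document, or does the $group stage force the fetch regardless?

// Hypothetical compound index on the two fields the pipeline actually uses.
// We only have the single-field { start: -1 } index today; the field order here is just my guess.
db.collectionName.createIndex({ start: -1, Id: 1 })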