Hi @chris
I see. Since the driver we currently use to connect to MongoDB 4.4 falls into the "connects, but does not support all features" category, here is a case where data extracts run as aggregations sent from Spark.
Version: Scala/2.11.12:Spark/2.4.8.7.1.9.0-387
and what happens is that the aggregation ends up looking like this:
[
  {
    "$match": {
      "_id": { "$lt": "65b09952fc92b6115f6b07b4" }
    }
  },
  {
    "$match": {
      "date_index": 202401,
      "day_index": 24,
      "hour_index": 7,
      "field.field": "value"
    }
  },
  {
    "$project": {
      "field.field": 0
    }
  }
]
However, the only pipeline we build on the Spark side ourselves is the one below (with a sketch of how we attach it right after):
"$match": {
"date_index": 202401,
"day_index": 24,
"hour_index": 7,
"field.field": "value"
}
},
{
"$project": {
"field.field": 0
}
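For context, this is roughly how such a pipeline gets attached on our side (a minimal sketch against the 2.x connector's MongoSpark.load / withPipeline API; the app name, URI, and namespace below are placeholders, not our real ones):

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession
import org.bson.Document

val spark = SparkSession.builder()
  .appName("mongo-extract") // placeholder
  // Placeholder connection string and namespace.
  .config("spark.mongodb.input.uri", "mongodb://host:27017/mydb.mycoll")
  .getOrCreate()

// Only the $match and $project stages are supplied by us;
// nothing here adds a $match on _id.
val pipeline = Seq(
  Document.parse(
    """{ "$match": { "date_index": 202401, "day_index": 24,
      |              "hour_index": 7, "field.field": "value" } }""".stripMargin),
  Document.parse("""{ "$project": { "field.field": 0 } }""")
)

val rdd = MongoSpark.load(spark.sparkContext).withPipeline(pipeline)
println(rdd.count())
```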
So I am not sure why the stage
{ "$match": { "_id": { "$lt": "65b09952fc92b6115f6b07b4" } } }
is being added to the pipeline; it sometimes makes the aggregate run for over an hour…
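In case it helps narrow this down, I could rerun the extract with the read pinned to a single partition, on my assumption that the extra $match on _id comes from how the connector partitions the collection (the config keys and MongoSinglePartitioner name below are from the 2.x connector's configuration options; the URI is again a placeholder):

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("mongo-extract-single-partition") // placeholder
  // Placeholder connection string and namespace.
  .config("spark.mongodb.input.uri", "mongodb://host:27017/mydb.mycoll")
  // Force a single partition: with no partition boundaries to apply,
  // the injected $match on _id should disappear if partitioning is the cause.
  .config("spark.mongodb.input.partitioner", "MongoSinglePartitioner")
  .getOrCreate()

val rdd = MongoSpark.load(spark.sparkContext)
println(rdd.count())
```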
Thanks for the linked resource, I will look into it.