Spark settings:
{ 'pipeline': [ { '$match': { 'date': 202311,
'day': 30,
'hour': 11,
'array.field': 'data'}},
{ '$project': { 'array.field': 0,
'array.field.headers': 0,
'array.field': 0}}],
'spark.mongodb.input.batchSize': '1000',
'spark.mongodb.input.localThreshold': '15',
'spark.mongodb.input.readPreference.name': 'secondary',
'spark.mongodb.input.registerSQLHelperFunctions': False,
'spark.mongodb.input.sampleSize': '1000',
These are the Spark settings. But when the query runs and later shows up in the slow-query log, the logged command contains an extra $match stage on _id that is not in my configured pipeline:
"command": {
"aggregate": "collection",
"pipeline": [
{
"$match": {
"_id": {
"$lt": "23487fhisjdkcn"
}
}
},
{
"$match": {
"date": 202311,
"day": 30,
"hour": 11,
"array.field": "field"
}
},
{
"$project": {
"array.field": 0,
"array.field.headers": 0,
"array.field": 0
}
}
],
"cursor": {
"batchSize": 1000
},