Sensors check in periodically, but network connectivity issues may cause them to submit the same data more than once.
MongoDB does not allow the unique property on secondary indexes for time series collections (MongoDB 5.0; see the Time Series Collection Limitations documentation).
In addition, calculations need to be done on the data (preferably using aggregations) that involve counting the number of entries, and those counts will be inaccurate if there are duplicates. Duplicates also bloat the database and are just messy.
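A minimal sketch of the counting problem, assuming a hypothetical sensorData collection and the field names used later in this thread: if the same check-in is stored twice, any count-based aggregation reports it twice.

// Hypothetical collection; one sensor check-in stored twice.
db.sensorData.insertMany([
  { sensor: 1, timestamp: 2, mesure: "pressure", value: 10 },
  { sensor: 1, timestamp: 2, mesure: "pressure", value: 10 }  // duplicate
])

// Reports 2 check-ins where only 1 real check-in occurred.
db.sensorData.aggregate([ { $count: "checkins" } ])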
Question
Is there any way to prevent duplicate entries in a MongoDB Timeseries collection?
// Usual inserted document that produces duplicates
insert = { "sensor": 1,
           "timestamp": 2,
           "mesure": "pressure",
           "value": 10 }

// Split into a query that defines your uniqueness...
query = { "sensor": 1, "timestamp": 2, "mesure": "pressure" }

// ...and the remaining value
value = { "value": 10 }

// Then replace the following insert, which produces a new duplicate
// every time it is called:
c.insert( insert );

// with an update using upsert:true, which inserts only once no matter
// how often it is called. Note the $set: with a plain replacement
// document, the upserted document would not keep the query fields.
c.update( query , { $set: value } , { upsert: true } )
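For what it's worth, with the current shell and drivers the same pattern would be written with updateOne; a sketch, using the same assumed field names:

// Same upsert pattern with the non-deprecated API.
db.c.updateOne(
  { sensor: 1, timestamp: 2, mesure: "pressure" },  // uniqueness key
  { $set: { value: 10 } },                          // payload
  { upsert: true }
)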
I am not yet familiar with the new time series collections, so I do not know whether this is applicable, but I feel it could be.
It appears that the limitations imposed on time series collections are temporary: the upcoming MongoDB 5.1 allows some delete and update operations, so I'm guessing time series collections might support upsert and the unique property on secondary indexes in future releases.
I came up with my own method to ensure data is never retrieved from an API and accidentally inserted twice, which was my original problem.
I'm still learning the aggregation pipeline, but @sergeimeza's way of creating a query that ignores duplicates is probably the best solution for now.
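For reference, a minimal sketch of that read-time deduplication idea, assuming the field names from the example above (the actual pipeline @sergeimeza posted may differ):

// Group on the fields that define uniqueness so duplicates collapse,
// then count the distinct check-ins. Collection name is hypothetical.
db.sensorData.aggregate([
  { $group: {
      _id: { sensor: "$sensor", timestamp: "$timestamp", mesure: "$mesure" },
      value: { $first: "$value" }   // keep one value per unique key
  } },
  { $count: "uniqueEntries" }       // accurate count despite duplicates
])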