Mysterious loss of data, in a very strange manner.

Hi everyone. I am currently facing extremely weird behaviour from one of our test MongoDB clusters.
We are running some Glue-based data migration pipelines that map data from a bunch of CSVs into MongoDB. Everything seems fine, except for one Int32 field in one of the collections. Right after insertion, the field is populated with the correct data from the CSVs. But after one full read of the collection, of any kind (a normal query, a read through the Spark connector, a dump of the collection to CSV, etc.), every single value in that field turns into 0.
We were dumbfounded. We checked the input CSVs, checked the pipelines, printed the field during the Glue mapping job runs, and aggregated the field during those runs. None of it gave us any clue as to how this is happening.
I'm writing to ask the community for help with this strange problem. I'm looking for anyone who has experienced the same thing, or for any hint about what the root cause could be.

Every single value and for only this field.

This looks like a typo in the name of the field. Upper/lower case issues are the hardest to spot.
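One quick way to rule this out is to group the field names actually stored in the documents by their lowercase form and look for groups with more than one spelling. The sketch below works on plain dictionaries, so it runs without a database. In practice the documents would come from something like a PyMongo `collection.find()` call, and the field names `weeklyCount` / `weeklycount` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def find_case_variants(docs: Iterable[Mapping[str, object]]) -> Dict[str, Set[str]]:
    """Group top-level field names that differ only by letter case."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for name in doc:
            seen[name.lower()].add(name)
    # Keep only the names that appear with more than one spelling.
    return {lowered: names for lowered, names in seen.items() if len(names) > 1}


# Hypothetical sample documents, not taken from the original collection.
sample = [
    {"_id": 1, "weeklyCount": 42},
    {"_id": 2, "weeklycount": 0},
]
variants = find_case_variants(sample)
print(variants)
```

An empty result means no two top-level field names differ only by case, at least in the documents that were scanned.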

@Quan_Le_Anh this appears to be the same question as https://www.reddit.com/r/mongodb/comments/1h2p3px/mysterious_loss_of_data_in_a_very_strange_manner/.

The OP's response to some of the threads there also makes it sound like this is solved now:

The oplog was mad helpful, I inserted into a new field (no, renamed the old field) and it turns out one of the K8S Pods (we have a cloned app cluster, but essentially abandoned till the migration process is done) is actively re-inserting the 0 value because it is a cyclic field (renewed every week, the data is 2 months old, there is another field for cond check, the checking interval is 5 seconds). Shit was crazy haunted till the DevOps guy came by and said “yeah we cloned the whole thing even the K8S cron Pod” so while Atlas is not showing any native Mongo crons we saw 2 inserts in the oplog.rs. This stupid “bug” took us 4 days to track down. I guess I should be more specific when telling the Ops guy to provide an env that's “as close as possible to prod” lmao.
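For anyone chasing a similar "haunted" field, the oplog check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the oplog entry layout is internal to MongoDB and varies between server versions, so the two update shapes handled here (`$set` and `diff`) are what I would expect to see rather than a guaranteed format. The namespace `db.items`, the field `weeklyCount`, and the integer timestamps are all made up. With PyMongo and sufficient privileges, the entries would typically be read from `client.local["oplog.rs"]`.

```python
from typing import Any, Iterable, Iterator, Mapping


def touches_field(entry: Mapping[str, Any], field: str) -> bool:
    """Return True if an oplog-style entry inserts or updates `field`."""
    op = entry.get("op")
    o = entry.get("o", {})
    if op == "i":
        # Insert: "o" holds the full inserted document.
        return field in o
    if op == "u":
        # Replacement-style update: "o" is the new document itself.
        if field in o:
            return True
        # Assumed older update shape: {"$set": {...}}.
        if field in o.get("$set", {}):
            return True
        # Assumed newer update shape: {"diff": {"u": {...}, "i": {...}}}.
        diff = o.get("diff", {})
        return field in diff.get("u", {}) or field in diff.get("i", {})
    return False


def writes_to_field(
    entries: Iterable[Mapping[str, Any]], ns: str, field: str
) -> Iterator[Mapping[str, Any]]:
    """Yield the entries for namespace `ns` that write to `field`."""
    for entry in entries:
        if entry.get("ns") == ns and touches_field(entry, field):
            yield entry


# Hypothetical entries; real "ts" values are BSON timestamps, not ints.
entries = [
    {"ts": 1, "op": "i", "ns": "db.items", "o": {"_id": 1, "other": 5}},
    {"ts": 2, "op": "i", "ns": "db.items", "o": {"_id": 2, "weeklyCount": 7}},
    {"ts": 3, "op": "u", "ns": "db.items",
     "o": {"$v": 1, "$set": {"weeklyCount": 0}}, "o2": {"_id": 2}},
    {"ts": 4, "op": "u", "ns": "db.items",
     "o": {"$v": 2, "diff": {"u": {"weeklyCount": 0}}}, "o2": {"_id": 2}},
    {"ts": 5, "op": "u", "ns": "db.other",
     "o": {"$set": {"weeklyCount": 0}}, "o2": {"_id": 9}},
    {"ts": 6, "op": "d", "ns": "db.items", "o": {"_id": 1}},
]
hits = list(writes_to_field(entries, "db.items", "weeklyCount"))
for hit in hits:
    print(hit["ts"], hit["op"])
```

Unexpected writes to the field that do not line up with the migration job's own inserts point to another writer, which is how the cloned cron Pod showed up here.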

If you’re copy/pasting questions between sites like Reddit, SO or elsewhere it wouldn’t hurt to link those together to prevent duplication of effort and to promote discoverability :wink:

Sorry in advance for this rant. It’s kind of sad but I don’t think he will ever come back here (well, not until he has another “strange” issue). I am in the same country as him, so I know the culture over here. Most of the time they spam threads all over the place to find an answer for their specific problem. They’re pushy and demanding, but stop caring at all once they’ve found their solution. They don’t think it’s their responsibility to close what they’ve opened. So that’s that.

This is also why I’m very glad that you spent your valuable time helping the OP with his problem, even if the OP does not confirm it’s solved. I think you can close the thread and mark your answer as the solution, so other community members know this is done.

Thanks for the feedback Billy, it’s very much appreciated :slight_smile:

We eventually figure out the style of those who

and we stop caring too and leave their OP unanswered.

Thanks Billy_Bui for opening the door to some ranting.

;-)