kevinadi (Kevin Adistambha)
Hi @Franz_van_Betteraey
I did a similar repro using similar document sizes, 100,000 of them, following the procedure you described, and came across these results:
| Version | _id index size (bytes) | vs. 4.2.22 |
| ------- | ---------------------- | ---------- |
| 4.2.22  | 2088960                | 100.00%    |
| 4.4.16  | 2805760                | 134.31%    |
| 5.0.12  | 2809856                | 134.51%    |
| 6.0.1   | 2650112                | 126.86%    |
| Version | Secondary index size (bytes) | vs. 4.2.22 |
| ------- | ---------------------------- | ---------- |
| 4.2.22  | 1277952                      | 100.00%    |
| 4.4.16  | 1417216                      | 110.90%    |
| 5.0.12  | 1417216                      | 110.90%    |
| 6.0.1   | 1417216                      | 110.90%    |
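In case it helps anyone reproduce this, here is a minimal sketch of the kind of setup I used. The collection name and document shape below are illustrative assumptions; the actual shape came from the procedure in your post:

```javascript
// Insert 100,000 small documents in batches (shape is illustrative).
let batch = [];
for (let i = 0; i < 100000; i++) {
  batch.push({ _id: i, value: "payload-" + i });
  if (batch.length === 10000) {
    db.test.insertMany(batch);
    batch = [];
  }
}
```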
Before taking the size of each collection, I executed db.adminCommand({fsync: 1}) to force WiredTiger to take a checkpoint. This makes the sizes consistent with what is written on disk. Without the fsync, you may find that the sizes keep fluctuating before they settle after about a minute (WiredTiger takes a checkpoint every minute).
In addition to the _id index, I also created a secondary index just to double-check.
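Roughly, the measurement itself looked like this (the collection and index names are illustrative):

```javascript
// Create a secondary index to compare alongside _id.
db.test.createIndex({ value: 1 });

// Force a WiredTiger checkpoint so the on-disk sizes are stable.
db.adminCommand({ fsync: 1 });

// Per-index sizes in bytes, as written on disk.
db.test.stats().indexSizes;
// e.g. { "_id_": 2088960, "value_1": 1277952 }
```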
What I found is that secondary index sizes are quite consistent from 4.4 to 6.0, with 4.2 being the odd one out. For the _id index, 4.4 to 6.0 come in at roughly 130% of the 4.2 size, i.e. about 30% larger.
I believe what you’re seeing is caused by the new-ish (from MongoDB 5.0) WiredTiger feature of Snapshot History Retention. Introducing this feature changed a lot of WiredTiger internals, and this is one of the side effects of that change. For completeness, this issue was known, and was mentioned in SERVER-47652, WT-6082, and WT-6251.
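If you want to see the retention window this feature uses on your own deployment, the server parameter behind it can be inspected; a small example, assuming a MongoDB 5.0+ deployment:

```javascript
// Read the snapshot history retention window, in seconds
// (minSnapshotHistoryWindowInSeconds is available from MongoDB 5.0).
db.adminCommand({ getParameter: 1, minSnapshotHistoryWindowInSeconds: 1 });
```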
Hope this explains what you’re seeing here.
Best regards
Kevin