Hi @Franz_van_Betteraey

I did a similar repro with similarly sized documents, 100,000 of them, using the procedure you described, and came across these results:

**`_id` index size**

| Version | Size (bytes) | vs. 4.2.22 |
|---------|--------------|------------|
| 4.2.22  | 2088960      | 100.00%    |
| 4.4.16  | 2805760      | 134.31%    |
| 5.0.12  | 2809856      | 134.51%    |
| 6.0.1   | 2650112      | 126.86%    |

**Secondary index size**

| Version | Size (bytes) | vs. 4.2.22 |
|---------|--------------|------------|
| 4.2.22  | 1277952      | 100.00%    |
| 4.4.16  | 1417216      | 110.90%    |
| 5.0.12  | 1417216      | 110.90%    |
| 6.0.1   | 1417216      | 110.90%    |

Before taking the size of each collection, I executed `db.adminCommand({fsync: 1})` to force WiredTiger to take a checkpoint. This makes the reported sizes consistent with what is written on disk. Without the fsync, you might find that the sizes keep fluctuating before settling after about a minute (WiredTiger takes a checkpoint every 60 seconds).

In addition to the `_id` index, I also created a secondary index, just to double-check (a rough sketch of the whole measurement procedure is below).
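
For reference, here is a minimal mongosh sketch of the measurement procedure. The collection name (`sizetest`), the secondary index field (`a`), and the document shape are placeholders of mine, not necessarily what was used in the repro above:

```js
// Insert 100,000 small documents (document shape is a placeholder).
const docs = [];
for (let i = 0; i < 100000; i++) {
  docs.push({ _id: i, a: i });
}
db.sizetest.insertMany(docs);

// A secondary index, to compare against the _id index.
db.sizetest.createIndex({ a: 1 });

// Force a WiredTiger checkpoint so on-disk sizes are stable.
db.adminCommand({ fsync: 1 });

// Per-index sizes in bytes, as written on disk.
db.sizetest.stats().indexSizes;
// e.g. { _id_: ..., a_1: ... }
```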

What I found is that the secondary index sizes are quite consistent from 4.4 through 6.0, with 4.2 being the odd one out. As for the `_id` index, 4.4 through 6.0 are roughly 30% larger than 4.2 (about 130% of its size).

I believe what you're seeing is caused by the new-ish (introduced in MongoDB 5.0) WiredTiger feature of Snapshot History Retention. Introducing this feature changed a lot of WiredTiger internals, and the larger index footprint is one of the side effects of that change. For completeness, this is a known issue, mentioned in SERVER-47652, WT-6082, and WT-6251.
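
If you're curious, on a 5.0+ server you can inspect the retention window this feature uses (this is just for inspection, not something you need to change):

```js
// Snapshot history retention window, in seconds (MongoDB 5.0+).
db.adminCommand({ getParameter: 1, minSnapshotHistoryWindowInSeconds: 1 });
```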

Hope this explains what you're seeing here 🙂

Best regards
Kevin
