Hello, I have a replica set where the primary (db2) runs MongoDB 7.0.0 on Amazon Linux 2, one secondary (db3) runs 7.0.8 on Amazon Linux 2, and the other secondary (db1) runs 7.0.15 on Amazon Linux 2023. I know it's not ideal, but we ended up in this state after some upgrades when machines died. The machines are all i3en.2xlarge instances with 64 GB of RAM.
I'm having the following issue: db2 and db3 consistently use less than 50% of RAM, while db1 also starts below 50% after a reboot, but then its memory keeps growing unbounded until the machine becomes unresponsive and dies.
If I run top on db1, I can see that it is currently using 47 GB of virtual memory, of which 42 GB is resident.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
77416 mongod 20 0 47.2g 42.8g 27500 S 130.2 69.0 1225:27 mongod
vmstat:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 450596 24 19457108 0 0 1366 1407 31 17 12 1 86 1 0
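In case it helps, this is roughly the check I can run on db1 to see whether the growth is inside the WiredTiger cache or in the allocator/heap outside it. I'm assuming the usual serverStatus layout for 7.0 with gperftools tcmalloc; the tcmalloc fields may be named differently or missing on other builds:

mongosh --quiet --eval '
  const ss = db.serverStatus();
  // WiredTiger cache: configured maximum vs. bytes actually held in the cache
  print("WT cache max (GB):    ", ss.wiredTiger.cache["maximum bytes configured"] / 1024 / 1024 / 1024);
  print("WT cache in use (GB): ", ss.wiredTiger.cache["bytes currently in the cache"] / 1024 / 1024 / 1024);
  // tcmalloc view of the whole mongod heap (gperftools tcmalloc in 7.0)
  print("heap size (GB):       ", ss.tcmalloc.generic.heap_size / 1024 / 1024 / 1024);
  print("allocated (GB):       ", ss.tcmalloc.generic.current_allocated_bytes / 1024 / 1024 / 1024);
  print("pageheap free (GB):   ", ss.tcmalloc.tcmalloc.pageheap_free_bytes / 1024 / 1024 / 1024);
'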
On the other secondary (db3), I instead see higher virtual memory usage (52 GB), but the resident memory is much lower (25 GB).
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29720 mongod 20 0 52.9g 25.2g 17340 S 9.3 40.6 515443:49 mongod
vmstat:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 540764 0 37828440 0 0 1177 1188 0 0 12 2 85 1 0
I already checked, and the WiredTiger cache max size is not configured, so it should default to 50% of (RAM - 1 GB), i.e. roughly 31.5 GB on these machines.
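This is how I verified that no explicit cache size is set (assuming a standard package install with the config at /etc/mongod.conf; adjust the path if yours lives elsewhere):

# look for an explicit cacheSizeGB in the config file
grep -i cacheSizeGB /etc/mongod.conf
# and double-check the storage options mongod was actually started with
mongosh --quiet --eval 'printjson(db.adminCommand({ getCmdLineOpts: 1 }).parsed.storage)'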
The vm configuration is pretty much the same on both machines, and vm.swappiness is set to 0 on both.
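For completeness, these are the settings I compared on the two machines (the transparent huge pages path below is the usual one on these distros; it may differ depending on the kernel):

sysctl vm.swappiness vm.overcommit_memory vm.max_map_count
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag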
Any idea what this could be?