Hello MongoDB Community,
We are currently running MongoDB 6.0.8 (we have also tested 6.0.9) with the following replica set architecture:
Primary
Secondary (HA)
Secondary (DR)
Arbiter
Problem Description:
When the Secondary (HA) goes down, the Primary experiences significant performance degradation. However, this issue does not occur when the Secondary (DR) goes down.
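For context, below is the kind of snapshot we take on the Primary while the HA secondary is down (an illustrative mongosh snippet; the metric names come from the standard rs.status() and serverStatus outputs):

// Run on the Primary while the HA secondary is down.
// Member states and replication progress:
rs.status().members.forEach(m => print(m.name, m.stateStr, m.optimeDate));

// WiredTiger cache pressure (bytes held in cache vs. configured maximum):
const cache = db.serverStatus().wiredTiger.cache;
print("bytes in cache:", cache["bytes currently in the cache"]);
print("max bytes:", cache["maximum bytes configured"]);

// Flow control, which can throttle writes when the majority commit point lags:
printjson(db.serverStatus().flowControl);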
We have tested several replica set topologies (PSA, PSSAA, and PSSA), but the issue persists in all of them. Network, disk, and memory health checks show no anomalies. Our configuration is:
Writes on the Primary only.
Reads distributed across both the Primary and Secondary (illustrative connection string below).
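For illustration, our clients connect roughly as follows (hostnames and the exact read preference are placeholders, not our literal connection string; shown here with readPreference=nearest to spread reads across members):

mongodb://primary-host:27020,secondary-ha-host:27020,secondary-dr-host:27020/?replicaSet=ABC_Production&readPreference=nearest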
Current MongoDB Configuration:
Default Read/Write Concern:
db.adminCommand({ getDefaultRWConcern: 1 })
{
  defaultReadConcern: { level: 'local' },
  defaultWriteConcern: { w: 1, wtimeout: 0 },
  updateOpTime: Timestamp({ t: 1712918392, i: 367 }),
  updateWallClockTime: ISODate("2024-04-12T10:39:52.236Z"),
  defaultWriteConcernSource: 'global',
  defaultReadConcernSource: 'global',
  localUpdateWallClockTime: ISODate("2024-10-31T15:34:18.932Z"),
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1732012824, i: 127 }),
    signature: {
      hash: Binary.createFromBase64("AAAAAAAAAAAAAAAAAAAAAAAAAAA=", 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1732012824, i: 127 })
}
Initial Sync Transient Error Retry Period:
db.adminCommand({ getParameter: 1, initialSyncTransientErrorRetryPeriodSeconds: 1 })
{
  initialSyncTransientErrorRetryPeriodSeconds: 86400,
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1732192160, i: 24 }),
    signature: {
      hash: Binary.createFromBase64("AAAAAAAAAAAAAAAAAAAAAAAAAAA=", 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1732192160, i: 24 })
}
Oplog Initial Find Max Seconds:
db.adminCommand({ getParameter: 1, oplogInitialFindMaxSeconds: 1 })
{
  oplogInitialFindMaxSeconds: 60,
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1732192172, i: 69 }),
    signature: {
      hash: Binary.createFromBase64("AAAAAAAAAAAAAAAAAAAAAAAAAAA=", 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1732192172, i: 69 })
}
MongoDB Configuration File (mongod.conf):
### New MongoDB conf
systemLog:
  destination: file
  logAppend: false
  path: /Data/MongoDB_Live/MongoDB/log/mongo.log
storage:
  dbPath: /Data/MongoDB_Live/MongoDB/db
  journal:
    enabled: true
  directoryPerDB: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 150
processManagement:
  fork: true
net:
  port: 27020
  bindIp: 0.0.0.0
replication:
  oplogSizeMB: 5242880
  replSetName: ABC_Production
#setParameter:
#  initialSyncOplogFetcherBatchSize: 41943040  # 40 MB
#  bgSyncOplogFetcherBatchSize: 41943040
Observations:
The issue is consistent across MongoDB versions 6.0.8 and 6.0.9.
The Primary does not face performance degradation when the Secondary (DR) is down.
We tried adjusting batch sizes (initialSyncOplogFetcherBatchSize, bgSyncOplogFetcherBatchSize) without any significant improvement.
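(For reference, the applied values can be verified with a standard getParameter call:)

db.adminCommand({
  getParameter: 1,
  initialSyncOplogFetcherBatchSize: 1,
  bgSyncOplogFetcherBatchSize: 1
})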
Request for Assistance:
Is there any known issue related to Primary performance degradation when one specific secondary (HA) is unavailable?
Could there be specific replication settings or timeout parameters that need adjustment in this scenario?
Any suggestions on troubleshooting this further?
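If it helps, we are happy to share our full replica set configuration, for example the member roles, votes, and priorities:

rs.conf().members.map(m => ({
  host: m.host,
  priority: m.priority,
  votes: m.votes,
  hidden: m.hidden,
  arbiterOnly: m.arbiterOnly
}))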
Thank you in advance for your support!