I’ve been watching this thread for a couple months as we are experiencing the same issue. We are running three instances of mongo, each in an EC2 instance inside the same VPC and subnet, configured as a replica set. We have been receiving the same CompositeServerSelector error in a few different flavors, and upgrading to the 2.11.6 CSHARP driver version did not improve these timeouts.
We were hit a rash of these in production last week and applied a minPoolSize=20 to the most active mongoDB API clients, and that seems to have improved it slightly, but not eliminated the problem. This newer connection string is: mongodb://username:password@mongo0:27017,mongo1:27017,mongo2:27017/admin?authSource=admin&replicaSet=rs0&minPoolSize=20&maxPoolSize=500
The stack trace of the errors we are seeing:
[22:19:41 ERR] A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/mongo0:27017" }", EndPoint: "Unspecified/mongo0:27017", ReasonChanged: "Heartbeat", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while receiving a message from the server.
---> System.TimeoutException: The operation has timed out.
at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytesAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync(CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveMessageAsync(Int32 responseTo, IMessageEncoderSelector encoderSelector, MessageEncoderSettings messageEncoderSettings, CancellationToken cancellationToken)
at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ExecuteAsync(IConnection connection, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.IsMasterHelper.GetResultAsync(IConnection connection, CommandWireProtocol`1 isMasterProtocol, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.GetIsMasterResultAsync(IConnection connection, CommandWireProtocol`1 isMasterProtocol, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.HeartbeatAsync(CancellationToken cancellationToken)", LastHeartbeatTimestamp: "2021-03-14T22:18:27.4390847Z", LastUpdateTimestamp: "2021-03-14T22:18:27.4390850Z" }, { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/mongo1:27017" }", EndPoint: "Unspecified/mongo1:27017", ReasonChanged: "Heartbeat", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server.
---> MongoDB.Driver.MongoConnectionException: An exception occurred while receiving a message from the server.
---> System.TimeoutException: The operation has timed out.
at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytesAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync(CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveMessageAsync(Int32 responseTo, IMessageEncoderSelector encoderSelector, MessageEncoderSettings messageEncoderSettings, CancellationToken cancellationToken)
at MongoDB.Driver.Core.WireProtocol.CommandUsingQueryMessageWireProtocol`1.ExecuteAsync(IConnection connection, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.IsMasterHelper.GetResultAsync(IConnection connection, CommandWireProtocol`1 isMasterProtocol, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.ConnectionInitializer.InitializeConnectionAsync(IConnection connection, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.InitializeConnectionAsync(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.HeartbeatAsync(CancellationToken cancellationToken)", LastHeartbeatTimestamp: "2021-03-14T22:18:27.4350973Z", LastUpdateTimestamp: "2021-03-14T22:18:27.4350976Z" }, { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/mongo2:27017" }", EndPoint: "Unspecified/mongo2:27017", ReasonChanged: "Heartbeat", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server.
---> System.TimeoutException: Timed out connecting to 172.31.128.202:27017. Timeout was 00:00:30.
at MongoDB.Driver.Core.Connections.TcpStreamFactory.ConnectAsync(Socket socket, EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.TcpStreamFactory.CreateStreamAsync(EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.InitializeConnectionAsync(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.HeartbeatAsync(CancellationToken cancellationToken)", LastHeartbeatTimestamp: "2021-03-14T22:19:25.9946242Z", LastUpdateTimestamp: "2021-03-14T22:19:25.9946246Z" }] }.
We are not experiencing primary stepdowns, but may be seeing widely varying load conditions that are affecting the RTT.
From the primary:
"connections" : {
"current" : 1142,
"available" : 50058,
"totalCreated" : 7653175,
"active" : 3
},
"electionMetrics" : {
"stepUpCmd" : {
"called" : 0,
"successful" : 0
},
"priorityTakeover" : {
"called" : 1,
"successful" : 1
},
"catchUpTakeover" : {
"called" : 0,
"successful" : 0
},
"electionTimeout" : {
"called" : 0,
"successful" : 0
},
"freezeTimeout" : {
"called" : 0,
"successful" : 0
},
"numStepDownsCausedByHigherTerm" : 0,
"numCatchUps" : 0,
"numCatchUpsSucceeded" : 0,
"numCatchUpsAlreadyCaughtUp" : 1,
"numCatchUpsSkipped" : 0,
"numCatchUpsTimedOut" : 0,
"numCatchUpsFailedWithError" : 0,
"numCatchUpsFailedWithNewTerm" : 0,
"numCatchUpsFailedWithReplSetAbortPrimaryCatchUpCmd" : 0,
"averageCatchUpOps" : 0
},
As far I can tell with multiple dev reviews, we are correctly using the singleton instantiation of the MongoClient and rely on it to manage the connection pools when needed.
When the mongo pool hangs, the API response times spike, then monotonically decrease until the API container that is using the MongoClient is recycled because of connection timeouts.
My next thought is to start playing with localThreshold to increase the window above the default 15ms to make sure that one or more servers are available in extreme events. Is there a way to get the RTT to servers as seen from the mongo driver and dump those in the case of these timeouts to see if they are limiting?