Apr 2024

Hi all,

I have a replicaSet with two secondaries and one primary.
I want to direct all the readonly traffic to the secondaries.
I am doing this by using readPreference='secondary' or readPreference='secondaryPreferred'.

The clients are opened with pymongo, and the host part of the URI is a seed list containing the two secondaries and also the primary.

This worked for the last few years: all the traffic went to the secondaries, and the primary never received any connections.

Now… something changed in the last three months, and now it looks like each connection to the secondaries produces a connection also to the primary. This really blows out the machine where the primary is hosted.

I am trying to figure out how this could have happened. The only hint is that in some applications the pymongo was bumped from 3.13.0 to 4.3.3 around that time.

Now testing locally I can see that specifying a seed list or a single instance makes a difference. But only for pymongo==3.13.0.

  • with pymongo==3.13.0 - seed list → connects to primary
  • with pymongo==3.13.0 - one server → NO connection to primary
  • with pymongo==4.3.3 - seed list → connects to primary
  • with pymongo==4.3.3 - one server → connects to primary
  • with pymongo==4.3.3 - one server + directConnection=True → NO connection to primary
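For reference, the last case in the list above could be reproduced roughly as follows. This is only a minimal sketch: `build_uri` and `connect` are hypothetical helper names, the host and database names are the placeholders used elsewhere in this thread, and pymongo is imported lazily so the URI helper itself runs without it. Note that `directConnection=true` only makes sense with a single host; as far as I know, PyMongo rejects it when several hosts are given.

```python
from urllib.parse import quote_plus, urlencode


def build_uri(hosts, database, user=None, password=None, **options):
    """Assemble a mongodb:// URI from a host list and URI options (hypothetical helper)."""
    auth = ""
    if user is not None and password is not None:
        # Credentials must be percent-encoded inside a MongoDB URI.
        auth = f"{quote_plus(user)}:{quote_plus(password)}@"
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{auth}{','.join(hosts)}/{database}{query}"


# One server plus directConnection: the client talks to this host only
# and does not discover the rest of the replica set.
direct_uri = build_uri(
    ["secondary_1:27017"],
    "database",
    directConnection="true",
    readPreference="secondaryPreferred",
)


def connect(uri):
    """Create the client; pymongo is imported here so the helper above works without it."""
    from pymongo import MongoClient

    return MongoClient(uri)
```

The trade-off is that a direct connection gives up discovery and failover: if that one secondary goes away, the client has nowhere else to read from.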

So… how do I now go back to being able to give a seed list and not getting a connection to the primary? What else could have changed? How to solve?

Thanks so much for the help.

Best,
Carlo

Yes, starting in 4.0 pymongo will automatically connect to the entire replica set even if only one host is given unless directConnection is set to True, see: PyMongo 4 Migration Guide - PyMongo 4.6.3 documentation

However, this should not result in too many connections being opened to the primary unless the app is setting minPoolSize. Are you using minPoolSize? Can you share how you’re creating your MongoClient(s) including settings and how many get created?

Note that there was never a way to connect with a seedlist and discover the replica set secondaries without ever connecting to the primary.

Hi Shane,

thank you so much for your answer.

This is how we are initialising the client:

url = "secondary_1:27017,secondary2:27017,primary:27017/database"
uri = f"mongodb://{user}:{password}@{url}"
client = pymongo.MongoClient(
    uri,
    readPreference='secondaryPreferred',
    maxPoolSize=max_pool_size,  # default to 100
    socketTimeoutMS=socket_timeout,  # default to 60000
    connectTimeoutMS=connect_timeout  # default to 60000
)

There are sometimes thousands of simultaneous jobs running on different clusters in different sites.

I understand now the behaviour of pymongo>=4.0, and the origin of all these connections to the primary (a lot of them!). Still, I am quite sure that we used to run with the same settings with pymongo==3.13.0 and never got any connection to the primary, even with the seed list, but only to the two secondaries. Any idea how that could happen?

Best,
Carlo

> that we used to run with the same settings with pymongo==3.13.0, and never got any connection to the primary even with the seed list, but just to the two secondaries.

Even in PyMongo 3, it’s not possible for MongoClient to connect like this without creating at least one connection to the primary.

Back in PyMongo 3.11 we did increase the number of monitoring connections from 1 per node per MongoClient to 2 (part of https://jira.mongodb.org/browse/PYTHON-2123). If you are creating thousands of MongoClients then you could be seeing the effects of that change.
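To put rough numbers on that change, here is a back-of-the-envelope sketch. The client count is purely hypothetical (it is not a figure from this thread), and the calculation counts monitoring connections only, not the pooled connections used for actual queries.

```python
def monitoring_connections_per_node(num_clients, per_client_per_node):
    """Monitoring connections one replica set member sees from all clients."""
    return num_clients * per_client_per_node


clients = 1000  # hypothetical number of simultaneously running MongoClients

# Before PyMongo 3.11: 1 monitoring connection per node per MongoClient.
before = monitoring_connections_per_node(clients, 1)
# From PyMongo 3.11 on: 2 monitoring connections per node per MongoClient.
after = monitoring_connections_per_node(clients, 2)

print(f"per node, before 3.11: {before}, after: {after}")
```

Every member in the seed list, including the primary, is monitored, so the primary sees this load even when all reads go to the secondaries.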

Could you briefly describe the app architecture? For example: how many app servers are there, how many MongoClients are created at once, are the MongoClients reused between threads, are you using a function-as-a-service platform like AWS Lambda, etc.
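On the reuse question: MongoClient is thread-safe, so one common pattern is to create a single client per process and share it between threads. A minimal sketch of that idea follows; `get_client` and its `factory` parameter are hypothetical names, and `factory` exists only so the helper can be exercised without a running server.

```python
import threading

_client = None
_client_lock = threading.Lock()


def get_client(uri, factory=None, **kwargs):
    """Return the process-wide client, creating it on first use (hypothetical helper)."""
    global _client
    with _client_lock:
        if _client is None:
            if factory is None:
                # Imported lazily; the default factory is pymongo's MongoClient.
                from pymongo import MongoClient

                factory = MongoClient
            _client = factory(uri, **kwargs)
        return _client
```

With this pattern each process holds one set of monitoring connections per node, however many threads use the client. Note that later calls ignore their arguments and simply return the existing client.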

One last suggestion: what happens when you set serverMonitoringMode="poll" when creating the MongoClients? serverMonitoringMode was introduced in PyMongo 4.6 (https://jira.mongodb.org/browse/PYTHON-3668). serverMonitoringMode="poll" makes the client go back to using the pre-3.11 behavior which only requires 1 connection per node instead of 2.
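A minimal sketch of trying that suggestion, reusing the settings posted earlier in this thread. `client_options` and `make_client` are hypothetical helper names, the defaults mirror the values mentioned above, and the sketch assumes an installed PyMongo version that accepts `serverMonitoringMode`.

```python
def client_options(max_pool_size=100, timeout_ms=60000, poll=True):
    """Keyword arguments for MongoClient (hypothetical helper)."""
    options = {
        "readPreference": "secondaryPreferred",
        "maxPoolSize": max_pool_size,
        "socketTimeoutMS": timeout_ms,
        "connectTimeoutMS": timeout_ms,
    }
    if poll:
        # Polling monitoring: one monitoring connection per node instead of two.
        options["serverMonitoringMode"] = "poll"
    return options


def make_client(uri, **overrides):
    """Create the client; pymongo is imported lazily so the helper above runs without it."""
    from pymongo import MongoClient

    return MongoClient(uri, **client_options(**overrides))
```

Comparing the connection count on the primary with `poll=True` and `poll=False` should show whether the monitoring connections are what is driving the load.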