I ran some load tests on my Azure Functions, which talk to a dedicated MongoDB M10 cluster. The cluster is set to autoscale.

After trying to run 1000 concurrent users reading and writing to the database, I quickly started getting WaitForQueueTimeout exceptions. I fixed this by increasing the settings.MaxConnecting and settings.MaxConnectionPoolSize values in the driver settings.
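For reference, here's roughly how I bumped those values (a simplified sketch; the connection string and the numbers are placeholders):

```csharp
using MongoDB.Driver;

// Build the client settings from the Atlas connection string (placeholder URI).
var settings = MongoClientSettings.FromConnectionString(
    "mongodb+srv://user:password@mycluster.example.mongodb.net");

// Raise the pool limits well above the driver defaults
// (MaxConnectionPoolSize defaults to 100, MaxConnecting to 2).
settings.MaxConnectionPoolSize = 1000;
settings.MaxConnecting = 1000;

var client = new MongoClient(settings);
```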

But what happens when my game has 5,000, 10,000, or 1 million users? How am I supposed to scale my game confidently when the driver settings impose limits on its ability to scale?

I want my infrastructure (Azure Functions and the MongoDB cluster) to set the limits, not some arbitrary value in the driver settings.

Please advise which settings I should use on the driver for this case.

Hi @MeltdownInteractive,

Are there any other details you can share about the application architecture? Am I right in assuming that each game client (player) does not correspond to its own connection? And are all the functions sharing the same connection pool? If so, the pool might be getting starved by the concurrent workload. There are more details on how connection pooling works with the .NET/C# driver here.

Thanks,

Rishit.

Hi Rishit,

I have read that document.

My point is that I don't want the connection pool to be starved because of a setting on the driver. I want my Azure Functions to be able to push my MongoDB environment to scale all the way up to a dedicated M200 cluster if needed.

My Azure Functions instance uses a single MongoClient instance for all MongoDB operations.
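Roughly, it is wired up like this (a simplified sketch; the class name, environment variable, and database name are just illustrative):

```csharp
using System;
using MongoDB.Driver;

public static class Mongo
{
    // One MongoClient, and therefore one connection pool per server node,
    // shared by every function invocation running in this instance.
    private static readonly MongoClient Client = new MongoClient(
        MongoClientSettings.FromConnectionString(
            Environment.GetEnvironmentVariable("MONGODB_URI")));

    public static IMongoDatabase Database => Client.GetDatabase("game");
}
```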

So what is the recommended way to configure the driver settings so that the driver itself doesn't impose any limits?

Hi @MeltdownInteractive,

Sorry for the delay, but here are some pointers based on my conversations with the engineering team. Our connection pool does not support dynamic settings changes (this is not unique to the C# driver).

  • Connection pool settings are per node, so horizontal scaling (sharding) is not affected.
  • The MaxConnecting value should not need to change significantly across cluster tiers, and it should be much lower than MaxConnectionPoolSize; too many concurrent connection establishments will degrade overall performance on any tier (see the sketch after this list).
  • MaxConnectionPoolSize should match your expected maximal load. A higher value on a smaller cluster just means the server is not protected against load spikes; in that case, instead of WaitForQueueTimeout you would see server errors or degraded performance.
  • As a rule of thumb, scaling the cluster up should not require significantly more concurrent connections: because every request is processed faster, overall throughput increases with the same number of connections.
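
Purely as an illustration of how those two settings relate (the numbers here are examples, not a recommendation for your workload):

```csharp
using System;
using MongoDB.Driver;

var settings = MongoClientSettings.FromConnectionString(
    Environment.GetEnvironmentVariable("MONGODB_URI"));

// Size the pool for the concurrent operations a single application
// instance actually runs, not one connection per player.
settings.MaxConnectionPoolSize = 500;  // example value

// Keep MaxConnecting much lower than MaxConnectionPoolSize so the pool
// grows a few connections at a time instead of hitting the server with a
// burst of simultaneous connection handshakes during a spike.
settings.MaxConnecting = 4;            // example value; the driver default is 2

var client = new MongoClient(settings);
```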

Hope that helps.

Thanks,

Rishit.

@Rishit_Bhatia

My expected maximal load could be 10 000 concurrent users. So how do I configure the driver for this?

A higher value on a smaller cluster just means the server is not protected against load spikes; in that case, instead of WaitForQueueTimeout you would see server errors or degraded performance.

I don't want WaitForQueueTimeout exceptions, and I don't want server errors or degraded performance. I want a solution/plan so my players' API calls don't fail.

Currently I have MaxConnecting and MaxPoolSize both set to 1000, but with 1000 concurrent users this still results in 2-5% of requests not being served, which is not acceptable.

I need a solution that lets me handle up to 10 000 concurrent users with all requests being served. Please advise how I can achieve this.

Hi @MeltdownInteractive,

We don't have a way to change the driver's settings dynamically on the fly, and with the limited information I have it's difficult to give you a concrete scaling plan. The number of users you are catering to should not be directly proportional to the number of connections, so I'm not sure why you are seeing requests go unserved. Each Atlas tier has its connection limits mentioned here, and some connections are used up by monitoring itself, so I'm not sure whether all the connections are actually being exhausted in this scenario. If you have a way to reproduce the issue in a self-contained way, please feel free to raise a ticket here. The other option would be to reach out to our support team, but a self-contained repro would be needed in that case as well.
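
If it helps to spot-check connection usage from the application side, something like the following should work (a rough sketch; it assumes your database user is permitted to run the serverStatus command on your tier):

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient(Environment.GetEnvironmentVariable("MONGODB_URI"));

// serverStatus reports how many connections the node it is run against
// currently has open and how many it still has available.
var status = client.GetDatabase("admin")
    .RunCommand<BsonDocument>(new BsonDocument("serverStatus", 1));

Console.WriteLine(status["connections"]);
```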

Thanks,

Rishit.