I am using Node.js with Mongoose to connect with these settings: let connection = await mongoose.createConnection(process.env.MONGO_DB_URL, {useNewUrlParser: true, useUnifiedTopology: true});
Randomly, once a week or so, I start getting MongoNetworkError: read ECONNRESET connection errors that last between 10 minutes and 1.5 hours on some, but not all, of my connections. I am using a MongoDB serverless instance and was informed by support that serverless instances have downtimes.
I don’t currently have a retry connection flow and would like to implement one, but since these tasks need to be handled quickly to present data to the app users, does anyone have thoughts on what an optimized retry flow would look like for this? Are there any other connection settings you would recommend, like increasing timeouts?
Randomly, once a week or so, I start getting MongoNetworkError: read ECONNRESET connection errors that last between 10 minutes and 1.5 hours on some, but not all, of my connections.
Can you clarify what this means for your application? For example, are you “production down” during these times, or is there just a lot of “noise” in the logs about connections being reset?
I don’t currently have a retry connection flow and would like to implement one, but since these tasks need to be handled quickly to present data to the app users, does anyone have thoughts on what an optimized retry flow would look like for this?
All modern MongoDB Drivers (including the Node.js driver that is included with Mongoose) have Retryable Writes and Retryable Reads enabled by default. These should automatically handle most transient network failures without the need for additional retry logic.
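To make those defaults visible in the connection settings, the options can be spelled out explicitly. This is only a sketch: the name connectionOptions is hypothetical, retryWrites, retryReads and serverSelectionTimeoutMS are standard driver connection options, and the timeout value is an illustrative assumption rather than a tuned recommendation.

```javascript
// Hypothetical options object; values are illustrative, not tuned recommendations.
const connectionOptions = {
  // Both retry settings are already on by default in current drivers;
  // they are listed here only so the behavior is visible in the code.
  retryWrites: true,
  retryReads: true,
  // Assumption: fail faster than the driver's 30000 ms default so a
  // request does not hang for the full interval during a reset.
  serverSelectionTimeoutMS: 5000,
};

// Usage with the connection call from the question (requires mongoose):
// mongoose.createConnection(process.env.MONGO_DB_URL, connectionOptions);
```

Note that the built-in retry is a single attempt per operation, so it covers a brief blip but not an outage that lasts minutes.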
Are there any other connection settings you would recommend, like increasing timeouts?
Before trying to tune any settings, it would help if we understood what the impact of these errors is on your active workload.
Users who are trying to retrieve information on active web pages may not be served that information, across multiple websites, while the connections are down. Additionally, if an app user attempts a POST request during that time, the action will error out. Webhooks retry themselves, so that isn’t really much of an issue. Otherwise, yes, it’s a lot of noise in the logs.
Unfortunately, if there is genuinely a service disruption (as opposed to a transient network error), custom retry logic wouldn’t mitigate this issue. The serverless team is looking into improvements in this area, though.
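For the transient case only, an application-level retry with a small time budget might look roughly like the sketch below. The helper names (isTransientNetworkError, withRetry) and the default values are hypothetical and not part of Mongoose or the driver. It assumes the wrapped operation is safe to repeat, and it will not help during a real outage.

```javascript
// Treat driver network errors as retryable; anything else is rethrown as-is.
function isTransientNetworkError(err) {
  if (!err) return false;
  return err.name === 'MongoNetworkError' ||
    err.code === 'ECONNRESET' ||
    /ECONNRESET/.test(String(err.message));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `operation` with exponential backoff until it succeeds, a
// non-transient error occurs, or the attempt count / time budget runs out.
async function withRetry(
  operation,
  { maxAttempts = 3, baseDelayMs = 100, budgetMs = 2000 } = {}
) {
  const start = Date.now();
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (!isTransientNetworkError(err)) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      // Stop early if waiting again would exceed the user-facing budget.
      const outOfBudget = Date.now() - start + delay > budgetMs;
      if (attempt === maxAttempts || outOfBudget) break;
      await sleep(delay);
    }
  }
  throw lastError;
}

// Usage (hypothetical model name):
// const doc = await withRetry(() => Page.findById(id).exec());
```

Keeping the budget short matters here because users are waiting on the response. Wrapping non-idempotent writes this way risks applying them twice, so the sketch is best limited to reads or idempotent operations.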
Hey @chris_nguyen_210, it really depends on what the underlying issue is. ECONNRESET just means the underlying connection was closed, so understanding why that is happening in the first place would be necessary.
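One way to start gathering that information is to log connection lifecycle events with timestamps, so resets can be lined up against deployments or reported serverless downtime. A minimal sketch: attachConnectionLogging is a hypothetical helper, and it assumes the object passed in emits the Mongoose connection events connected, disconnected, reconnected, error and close.

```javascript
// Attach listeners for connection lifecycle events so resets can be
// correlated with timestamps. `connection` can be any EventEmitter,
// for example the object returned by mongoose.createConnection().
function attachConnectionLogging(connection, log = console.log) {
  const events = ['connected', 'disconnected', 'reconnected', 'error', 'close'];
  for (const name of events) {
    connection.on(name, (err) => {
      log({
        event: name,
        at: new Date().toISOString(),
        // Only the 'error' event is expected to carry an Error argument.
        error: err instanceof Error ? `${err.name}: ${err.message}` : undefined,
      });
    });
  }
  return connection;
}

// Usage (requires mongoose):
// attachConnectionLogging(connection);
```

Logs like these would show whether the resets hit every connection at once, which would point toward a service-side event, or only some of them, as described in the original post.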