Apr 2024

We’re running a MongoDB cluster on Atlas which makes use of a few triggered functions. One in particular populates a target collection with a transformed version of a document whenever that document is inserted/updated in a source collection (by executing an aggregation pipeline against the changed document).
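Schematically, the pipeline looks something like this (the transform stage and collection name below are simplified placeholders, not our real ones):

const pipeline = [
  // Only process the document that changed
  { $match: { _id: docId } },
  // Example transform stage -- stands in for our real transformation
  { $set: { transformedAt: "$$NOW" } },
  // Upsert the result into the target collection
  { $merge: {
      into: "target",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
  } }
];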

While investigating a discrepancy in document counts between the two collections, we’ve identified that the trigger fails to execute properly when the primary is down.

Specifically, during a cluster auto-scaling event, our triggers failed to execute and threw this error:

(PrimarySteppedDown) PlanExecutor error during aggregation :: caused by :: No primary exists currently

The meat of the function is pretty straightforward:

try {
  // If this is a "delete" event, delete the document in the other collection
  if (changeEvent.operationType === "delete") {
    await targetCollection.deleteOne({ "_id": docId });
    console.log("deleted doc id: ", docId.id);
  }
  // If this is an "insert", "update" or "replace" event, then execute the
  // pipeline on the doc in the source collection to replace the document
  // in the target collection
  else if (changeEvent.operationType === "insert" ||
           changeEvent.operationType === "update" ||
           changeEvent.operationType === "replace") {
    await sourceCollection.aggregate(pipeline).toArray();
    console.log("updated doc id: ", docId.id);
  }
} catch (err) {
  console.log("error performing mongodb write: ", err.message);
}
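For completeness, the variables above are wired up at the top of the function in the usual Atlas trigger fashion (the service, database, and collection names here are placeholders):

exports = async function(changeEvent) {
  // Standard Atlas trigger setup; names below are placeholders
  const docId = changeEvent.documentKey._id;
  const mongodb = context.services.get("mongodb-atlas");
  const sourceCollection = mongodb.db("myDb").collection("source");
  const targetCollection = mongodb.db("myDb").collection("target");
  // ... the try/catch block above ...
};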

What can we do to ensure that a trigger will execute properly in the face of an auto-scaling event?

Hey Robert_Lancaster,

I understand that you are seeing trigger failures during failover, so here is a suggestion that may help.

To help triggers complete during auto-scaling events on an Atlas cluster, you can handle the “PrimarySteppedDown” error by implementing a retry mechanism. Here is one approach:

try {
  // Your trigger logic here
} catch (err) {
  if (err.message.includes("PrimarySteppedDown")) {
    // Retry logic: wait for a short duration, then try once more
    await new Promise(resolve => setTimeout(resolve, 1000));
    try {
      // Your trigger logic again
    } catch (retryErr) {
      console.log("Error performing MongoDB write even after retry:", retryErr.message);
    }
  } else {
    console.log("Error performing MongoDB write:", err.message);
  }
}

This code catches the “PrimarySteppedDown” error, waits for a short duration (1 second in this example), and then retries the trigger logic. Depending on your requirements and the expected downtime during auto-scaling events, you can adjust the retry interval and the number of retries.
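If a single retry isn’t enough to outlast the election, the same idea generalizes to a small helper that retries with exponential backoff. A minimal sketch (the retry count and delays are illustrative, not tuned values):

// Retries fn() when the error looks like a stepped-down primary.
// maxRetries and baseDelayMs are illustrative defaults, not tuned values.
async function withRetry(fn, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err.message.includes("PrimarySteppedDown");
      if (!retryable || attempt >= maxRetries) throw err;
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * Math.pow(2, attempt)));
    }
  }
}

// Usage inside the trigger:
await withRetry(() => sourceCollection.aggregate(pipeline).toArray());

Backing off between attempts gives the replica set time to elect a new primary instead of retrying straight into the same error; just keep the total retry window comfortably inside the trigger function’s execution time limit.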

Thanks
(James)