Recover the Kubernetes Operator and Ops Manager for Multi-Cluster AppDB Deployments
If you host an Ops Manager resource in the same Kubernetes cluster as the Kubernetes Operator and have the Application Database (AppDB) deployed on selected member clusters in your multi-Kubernetes-cluster deployment, you can manually recover the Kubernetes Operator and Ops Manager if that cluster fails.
To learn more about deploying Ops Manager on a central cluster and the Application Database across member clusters, see Using Ops Manager with Multi-Kubernetes-Cluster Deployments.
Prerequisites
Before you can recover the Kubernetes Operator and Ops Manager, ensure that you meet the following requirements:
Configure backups for your Ops Manager and Application Database resources, including any ConfigMaps and secrets created by the Kubernetes Operator, so that they capture the previous running state of Ops Manager (see the export sketch after this list). To learn more, see Backup.
The Application Database must have at least three healthy nodes remaining after failure of the Kubernetes Operator's cluster.
The healthy clusters in your multi-Kubernetes-cluster deployment must contain a sufficient number of members to elect a primary node. To learn more, see Application Database Architecture.
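For the Kubernetes objects themselves, one possible approach is to export them with kubectl before any failure occurs. The following is a minimal sketch, assuming the mongodb namespace and a kubeconfig context for your original central cluster; the namespace-wide export and file names are illustrative, not a prescribed backup method:

```sh
# Illustrative export of the operator-created state from the original central
# cluster; the namespace, context variable, and file names are assumptions.
NAMESPACE="mongodb"
CONTEXT="$MDB_CENTRAL_CLUSTER_FULL_NAME"

# Export all Secrets and ConfigMaps in the Ops Manager namespace.
kubectl --context "$CONTEXT" --namespace "$NAMESPACE" get secrets -o yaml > secrets-backup.yaml
kubectl --context "$CONTEXT" --namespace "$NAMESPACE" get configmaps -o yaml > configmaps-backup.yaml

# Export the Ops Manager object specification itself.
kubectl --context "$CONTEXT" --namespace "$NAMESPACE" get om -o yaml > ops-manager-backup.yaml
```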
Considerations
Application Database Architecture
Because the Kubernetes Operator doesn't support forcing a replica set reconfiguration, the healthy Kubernetes clusters must contain a sufficient number of Application Database members to elect a primary node for this manual recovery process. A majority of the Application Database members must be available to elect a primary. To learn more, see Replica Set Deployment Architectures.
If possible, use an odd number of member Kubernetes clusters. Proper distribution of your Application Database members can help to maximize the likelihood that the remaining replica set members can form a majority during an outage. To learn more, see Replica Sets Distributed Across Two or More Data Centers.
Consider the following examples:
For a five-member Application Database, some possible distributions of members include:
Two clusters: three members to Cluster 1 and two members to Cluster 2.
If Cluster 2 fails, there are enough members on Cluster 1 to elect a primary node.
If Cluster 1 fails, there are not enough members on Cluster 2 to elect a primary node.
Three clusters: two members to Cluster 1, two members to Cluster 2, and one member to Cluster 3.
If any single cluster fails, there are enough members on the remaining clusters to elect a primary node.
If two clusters fail, there are not enough members on any remaining cluster to elect a primary node.
For a seven-member Application Database, consider the following distribution of members:
Two clusters: four members to Cluster 1 and three members to Cluster 2.
If Cluster 2 fails, there are enough members on Cluster 1 to elect a primary node.
If Cluster 1 fails, there are not enough members on Cluster 2 to elect a primary node.
Although Cluster 2 meets the three member minimum for the Application Database, a majority of the Application Database's seven members must be available to elect a primary node.
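As a concrete illustration of the three-cluster, five-member distribution above, the Application Database section of an Ops Manager resource might look like the following sketch. The cluster names, versions, and secret name are placeholders, and the multi-cluster fields (topology, clusterSpecList) should be verified against your Kubernetes Operator version and the multi-cluster documentation linked above:

```yaml
# Illustrative only: a five-member Application Database spread 2-2-1 across
# three member clusters, so that any single cluster failure leaves a majority.
apiVersion: mongodb.com/v1
kind: MongoDBOpsManager
metadata:
  name: ops-manager
  namespace: mongodb
spec:
  replicas: 1
  version: 6.0.22                     # example Ops Manager version
  adminCredentials: om-admin-secret   # placeholder secret name
  applicationDatabase:
    version: 6.0.5-ent                # example Application Database version
    topology: MultiCluster
    clusterSpecList:
      - clusterName: cluster-1.example.com
        members: 2
      - clusterName: cluster-2.example.com
        members: 2
      - clusterName: cluster-3.example.com
        members: 1
```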
Procedure
To recover the Kubernetes Operator and Ops Manager, restore the Ops Manager resource on a new Kubernetes cluster:
Configure the Kubernetes Operator in a new cluster.
Follow the instructions to install the Kubernetes Operator in a new Kubernetes cluster.
Note
If you plan to re-use an existing member cluster as the new central cluster, ensure that the appropriate service account and role exist on it. The service account and role names can overlap between the central cluster and member clusters while granting different permissions.
To see the appropriate role required for the Kubernetes Operator, refer to the sample in the public repository.
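The linked installation instructions are the authoritative path. As one example, a Helm-based installation of the Enterprise Operator into the new central cluster might look like the following sketch; the release name, namespace, and context variable are assumptions:

```sh
# Illustrative Helm install of the Kubernetes Operator on the new central cluster.
helm repo add mongodb https://mongodb.github.io/helm-charts
helm repo update

helm install enterprise-operator mongodb/enterprise-operator \
  --kube-context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace mongodb \
  --create-namespace
```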
Retrieve the backed-up resources from the failed Ops Manager resource.
Copy the object specification for the failed Ops Manager resource and retrieve the following resources, replacing the placeholder text with your specific Ops Manager resource name and namespace.
Resource Type | Values
---|---
Secrets | The secrets that the Kubernetes Operator created for the Ops Manager resource and its Application Database, named for your Ops Manager resource name and namespace.
ConfigMaps | The ConfigMaps that the Kubernetes Operator created for the Ops Manager resource and its Application Database, named for your Ops Manager resource name and namespace.
OpsManager | The object specification of the failed Ops Manager resource itself.
Then, paste the specification that you copied into a new file and configure the new resource by using the preceding values. To learn more, see Deploy an Ops Manager Resource.
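Before re-applying the Ops Manager resource, restore the backed-up Secrets and ConfigMaps into the new cluster. A minimal sketch, assuming the export files from the Prerequisites section; strip cluster-specific metadata (such as resourceVersion and uid) from the exported manifests before applying them:

```sh
# Illustrative restore of the backed-up Secrets and ConfigMaps; file names are
# assumptions and should match whatever backups you created before the failure.
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f secrets-backup.yaml
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f configmaps-backup.yaml
```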
Re-apply the Ops Manager resource to the new cluster.
Use the following command to apply the updated resource:
```sh
kubectl apply \
  --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace "mongodb" \
  -f https://raw.githubusercontent.com/mongodb/mongodb-enterprise-kubernetes/master/samples/ops-manager/ops-manager-external.yaml
```
To check the status of your Ops Manager resource, use the following command:
```sh
kubectl get om -o yaml -w
```
Once the central cluster reaches a Running state, you can re-scale the Application Database to your desired distribution of member clusters.
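For example, once the status reports Running, you might re-apply an updated Ops Manager specification whose applicationDatabase clusterSpecList reflects the new distribution; the file name below is illustrative:

```sh
# Re-apply an updated Ops Manager spec that redistributes the Application
# Database members across the remaining healthy clusters.
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f ops-manager-rescaled.yaml
```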
Re-apply the MongoDB resources to the new cluster.
To host your MongoDB resource or MongoDBMultiCluster resource on the new Kubernetes Operator instance, apply the following resources to the new cluster (see the sketch after this list):
The ConfigMap used to create the initial project.
The secrets used in the previous Kubernetes Operator instance.
The MongoDB or MongoDBMultiCluster custom resource at its last available state on the source cluster, including any annotations added by the Kubernetes Operator during its lifecycle.
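A minimal sketch of applying those three kinds of objects to the new cluster; the file names are assumptions and should match the backups you took of the project ConfigMap, the secrets, and the custom resource:

```sh
# Illustrative restore of the project ConfigMap, secrets, and custom resource.
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f project-configmap-backup.yaml
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f credentials-secret-backup.yaml
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f mongodb-resource-backup.yaml
```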
Note
If you deployed a MongoDB resource and not a MongoDBMultiCluster resource, and you want to migrate the failed Kubernetes cluster's data to the new cluster, you must complete the following additional steps:
Create a new MongoDB resource on the new cluster.
Migrate the data to the new resource by backing up and restoring the data in Ops Manager.
If you deployed a MongoDBMultiCluster resource and the failed cluster contained any Application Database nodes, you must re-scale the resource that you applied across the remaining healthy clusters.
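Re-scaling a MongoDBMultiCluster resource typically means adjusting its clusterSpecList so that all members run on the remaining healthy clusters, then re-applying it. The following is a sketch only; the resource name, cluster names, member counts, and the credentials and project ConfigMap references are placeholders to verify against your own deployment and operator version:

```yaml
# Illustrative only: a replica set re-scaled onto the two remaining clusters
# after a third cluster failed.
apiVersion: mongodb.com/v1
kind: MongoDBMultiCluster
metadata:
  name: multi-replica-set
  namespace: mongodb
spec:
  version: 6.0.5-ent
  type: ReplicaSet
  credentials: organization-secret     # placeholder Ops Manager API key secret
  opsManager:
    configMapRef:
      name: multi-project              # placeholder project ConfigMap
  duplicateServiceObjects: false
  clusterSpecList:
    - clusterName: cluster-1.example.com
      members: 3
    - clusterName: cluster-2.example.com
      members: 2
```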