Recover the Kubernetes Operator and Ops Manager for Multi-Cluster AppDB Deployments
If you host an Ops Manager resource in the same Kubernetes cluster as the Kubernetes Operator and have the Application Database (AppDB) deployed on selected member clusters in your multi-Kubernetes-cluster deployment, you can manually recover the Kubernetes Operator and Ops Manager if that cluster fails.
To learn more about deploying Ops Manager on a central cluster and the Application Database across member clusters, see Using Ops Manager with Multi-Kubernetes-Cluster Deployments.
Prerequisites
Before you can recover the Kubernetes Operator and Ops Manager, ensure that you meet the following requirements:
Configure backups for your Ops Manager and Application Database resources, including any ConfigMaps and secrets created by the Kubernetes Operator, so that they capture the previous running state of Ops Manager (see the export sketch after this list). To learn more, see Backup.
The Application Database must have at least three healthy nodes remaining after failure of the Kubernetes Operator's cluster.
The healthy clusters in your multi-Kubernetes-cluster deployment must contain a sufficient number of members to elect a primary node. To learn more, see Application Database Architecture.
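For the Kubernetes objects themselves, one possible approach is to export them with kubectl before any failure occurs. The following is a minimal sketch, assuming the mongodb namespace and a kubeconfig context for your original central cluster; the namespace-wide export and file names are illustrative, not a prescribed backup method:

```sh
# Illustrative export of the operator-created state from the original central
# cluster; the namespace, context variable, and file names are assumptions.
NAMESPACE="mongodb"
CONTEXT="$MDB_CENTRAL_CLUSTER_FULL_NAME"

# Export all Secrets and ConfigMaps in the Ops Manager namespace.
kubectl --context "$CONTEXT" --namespace "$NAMESPACE" get secrets -o yaml > secrets-backup.yaml
kubectl --context "$CONTEXT" --namespace "$NAMESPACE" get configmaps -o yaml > configmaps-backup.yaml

# Export the Ops Manager object specification itself.
kubectl --context "$CONTEXT" --namespace "$NAMESPACE" get om -o yaml > ops-manager-backup.yaml
```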
Considerations
Application Database Architecture
Because the Kubernetes Operator doesn't support forcing a replica set reconfiguration, the healthy Kubernetes clusters must contain a sufficient number of Application Database members to elect a primary node for this manual recovery process. A majority of the Application Database members must be available to elect a primary. To learn more, see Replica Set Deployment Architectures.
If possible, use an odd number of member Kubernetes clusters. Proper distribution of your Application Database members can help to maximize the likelihood that the remaining replica set members can form a majority during an outage. To learn more, see Replica Sets Distributed Across Two or More Data Centers.
Consider the following examples:
For a five-member Application Database, some possible distributions of members include:
Two clusters: three members to Cluster 1 and two members to Cluster 2.
If Cluster 2 fails, there are enough members on Cluster 1 to elect a primary node.
If Cluster 1 fails, there are not enough members on Cluster 2 to elect a primary node.
Three clusters: two members to Cluster 1, two members to Cluster 2, and one member to Cluster 3.
If any single cluster fails, there are enough members on the remaining clusters to elect a primary node.
If two clusters fail, there are not enough members on any remaining cluster to elect a primary node.
For a seven-member Application Database, consider the following distribution of members:
Two clusters: four members to Cluster 1 and three members to Cluster 2.
If Cluster 2 fails, there are enough members on Cluster 1 to elect a primary node.
If Cluster 1 fails, there are not enough members on Cluster 2 to elect a primary node.
Although Cluster 2 meets the three member minimum for the Application Database, a majority of the Application Database's seven members must be available to elect a primary node.
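As a concrete illustration of the three-cluster, five-member distribution above, the Application Database section of an Ops Manager resource might look like the following sketch. The cluster names, versions, and secret name are placeholders, and the multi-cluster fields (topology, clusterSpecList) should be verified against your Kubernetes Operator version and the multi-cluster documentation linked above:

```yaml
# Illustrative only: a five-member Application Database spread 2-2-1 across
# three member clusters, so that any single cluster failure leaves a majority.
apiVersion: mongodb.com/v1
kind: MongoDBOpsManager
metadata:
  name: ops-manager
  namespace: mongodb
spec:
  replicas: 1
  version: 6.0.22                     # example Ops Manager version
  adminCredentials: om-admin-secret   # placeholder secret name
  applicationDatabase:
    version: 6.0.5-ent                # example Application Database version
    topology: MultiCluster
    clusterSpecList:
      - clusterName: cluster-1.example.com
        members: 2
      - clusterName: cluster-2.example.com
        members: 2
      - clusterName: cluster-3.example.com
        members: 1
```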
Procedure
To recover the Kubernetes Operator and Ops Manager, restore the Ops Manager resource on a new Kubernetes cluster:
Configure the Kubernetes Operator in a new cluster.
Follow the instructions to install the Kubernetes Operator in a new Kubernetes cluster.
Note
If you plan to re-use an existing member cluster as the new central cluster, ensure that the appropriate service account and role exist on it. The service account and role names can overlap between the central cluster and member clusters while granting different permissions.
To see the appropriate role required for the Kubernetes Operator, refer to the sample in the public repository.
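The linked installation instructions are the authoritative path. As one example, a Helm-based installation of the Enterprise Operator into the new central cluster might look like the following sketch; the release name, namespace, and context variable are assumptions:

```sh
# Illustrative Helm install of the Kubernetes Operator on the new central cluster.
helm repo add mongodb https://mongodb.github.io/helm-charts
helm repo update

helm install enterprise-operator mongodb/enterprise-operator \
  --kube-context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace mongodb \
  --create-namespace
```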
Retrieve the backed-up resources from the failed Ops Manager resource.
Copy the object specification for the failed Ops Manager resource and retrieve the following resources, replacing the placeholder text with your specific Ops Manager resource name and namespace.
Resource Type | Values
---|---
Secrets | The secrets that the Kubernetes Operator created for the Ops Manager resource and its Application Database, named for your Ops Manager resource name and namespace.
ConfigMaps | The ConfigMaps that the Kubernetes Operator created for the Ops Manager resource and its Application Database, named for your Ops Manager resource name and namespace.
OpsManager | The object specification of the failed Ops Manager resource itself.
Then, paste the specification that you copied into a new file and configure the new resource by using the preceding values. To learn more, see Deploy an Ops Manager Resource.
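Before re-applying the Ops Manager resource, restore the backed-up Secrets and ConfigMaps into the new cluster. A minimal sketch, assuming the export files from the Prerequisites section; strip cluster-specific metadata (such as resourceVersion and uid) from the exported manifests before applying them:

```sh
# Illustrative restore of the backed-up Secrets and ConfigMaps; file names are
# assumptions and should match whatever backups you created before the failure.
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f secrets-backup.yaml
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f configmaps-backup.yaml
```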
Re-apply the Ops Manager resource to the new cluster.
Use the following command to apply the updated resource:
```sh
kubectl apply \
  --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace "mongodb" \
  -f https://raw.githubusercontent.com/mongodb/mongodb-enterprise-kubernetes/master/samples/ops-manager/ops-manager-external.yaml
```
To check the status of your Ops Manager resource, use the following command:
```sh
kubectl get om -o yaml -w
```
Once the central cluster reaches a Running state, you can re-scale the Application Database to your desired distribution of member clusters.
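For example, once the status reports Running, you might re-apply an updated Ops Manager specification whose applicationDatabase clusterSpecList reflects the new distribution; the file name below is illustrative:

```sh
# Re-apply an updated Ops Manager spec that redistributes the Application
# Database members across the remaining healthy clusters.
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f ops-manager-rescaled.yaml
```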
Re-apply the MongoDB resources to the new cluster.
To host your MongoDB resource or MongoDBMultiCluster resource on the new Kubernetes Operator instance, apply the following resources to the new cluster (see the sketch after this list):
The ConfigMap used to create the initial project.
The secrets used in the previous Kubernetes Operator instance.
The MongoDB or MongoDBMultiCluster custom resource at its last available state on the source cluster, including any annotations added by the Kubernetes Operator during its lifecycle.
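A minimal sketch of applying those three kinds of objects to the new cluster; the file names are assumptions and should match the backups you took of the project ConfigMap, the secrets, and the custom resource:

```sh
# Illustrative restore of the project ConfigMap, secrets, and custom resource.
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f project-configmap-backup.yaml
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f credentials-secret-backup.yaml
kubectl --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" --namespace mongodb \
  apply -f mongodb-resource-backup.yaml
```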
Note
If you deployed a MongoDB resource and not a MongoDBMultiCluster resource, and you want to migrate the failed Kubernetes cluster's data to the new cluster, you must complete the following additional steps:
Create a new MongoDB resource on the new cluster.
Migrate the data to the new resource by backing up and restoring the data in Ops Manager.
If you deployed a MongoDBMultiCluster resource and the failed cluster contained any Application Database nodes, you must re-scale the resource that you applied across the remaining healthy clusters.
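Re-scaling a MongoDBMultiCluster resource typically means adjusting its clusterSpecList so that all members run on the remaining healthy clusters, then re-applying it. The following is a sketch only; the resource name, cluster names, member counts, and the credentials and project ConfigMap references are placeholders to verify against your own deployment and operator version:

```yaml
# Illustrative only: a replica set re-scaled onto the two remaining clusters
# after a third cluster failed.
apiVersion: mongodb.com/v1
kind: MongoDBMultiCluster
metadata:
  name: multi-replica-set
  namespace: mongodb
spec:
  version: 6.0.5-ent
  type: ReplicaSet
  credentials: organization-secret     # placeholder Ops Manager API key secret
  opsManager:
    configMapRef:
      name: multi-project              # placeholder project ConfigMap
  duplicateServiceObjects: false
  clusterSpecList:
    - clusterName: cluster-1.example.com
      members: 3
    - clusterName: cluster-2.example.com
      members: 2
```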