There are various mechanisms that keep Kubernetes clusters up and running, such as high availability (HA), self-healing, and more. In many cases, even if something fails, the cluster recovers quickly without any effect on the workload.
In rare cases, such as when multiple instances fail at the same time, etcd can lose quorum, making the cluster fail completely. When that happens, the only option is to recreate the cluster and restore it from a backup.
This document explains how to manually recreate a cluster and recover from a backup that was made previously.
This approach should be used only when the etcd quorum is lost or when it's impossible to repair the cluster for some other reason. The general rule is that the etcd quorum is satisfied as long as a majority of etcd members, (n/2)+1, remain healthy in the etcd ring, since etcd requires a quorum to agree on updates to the cluster state.
This guide can also be used if you want to migrate to new infrastructure or a new provider.
In other cases, you should first try to repair the cluster by following the manual cluster repair guide.
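As an illustration, the quorum size and failure tolerance for a given ring size can be computed directly (note the integer division):

```shell
# Quorum for an n-member etcd ring is (n/2)+1 with integer division,
# so the ring tolerates n - ((n/2)+1) member failures before quorum is lost.
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done
```

This is why a 3-member ring survives one failed member, but losing two members at once makes the cluster unrecoverable without a restore.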
As long as the cluster endpoint (load balancer address) is the same, the worker nodes will automatically rejoin the new cluster after some time. Besides that, you’ll be able to use the old kubeconfig files to access the cluster.
If the cluster endpoint is different, the worker nodes and all kubeconfig files must be recreated.
This is important because, even if the control plane nodes are down, the workload should still be running on the worker nodes. However, the workload might be inaccessible, and you won't be able to make any changes, such as scheduling new pods or removing existing ones.
In this guide, we will assume that you have used our backups addon to create and manage backups. You can also use any other solution, but for a successful recovery, you will need all files mentioned in this step.
First, you need to instruct Restic how to access the S3 bucket containing the backups by exporting the environment variables with credentials, bucket name, and the encryption password:
```shell
export RESTIC_REPOSITORY="s3:s3.amazonaws.com/<<S3_BUCKET>>"
export RESTIC_PASSWORD="<<RESTIC_PASSWORD>>"
export AWS_ACCESS_KEY_ID="<<AWS_ACCESS_KEY_ID>>"
export AWS_SECRET_ACCESS_KEY="<<AWS_SECRET_ACCESS_KEY>>"
```
With the credentials and information about the bucket in place, you can now list all available backups:
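The listing is done with restic's `snapshots` subcommand, which reads the repository and credentials from the environment variables exported above:

```shell
restic snapshots
```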
You should see output such as:
```
repository cd5add2d opened successfully, password is correct
ID        Time                 Host                                         Tags  Paths
-----------------------------------------------------------------------------------------------
b1ea3ff1  2020-04-21 16:41:46  ip-172-31-122-61.eu-west-3.compute.internal  etcd  /backup
c43ea2d7  2020-04-21 16:46:46  ip-172-31-122-61.eu-west-3.compute.internal  etcd  /backup
92f33a9c  2020-04-21 16:51:46  ip-172-31-122-61.eu-west-3.compute.internal  etcd  /backup
3ee8cc50  2020-04-21 16:56:47  ip-172-31-122-61.eu-west-3.compute.internal  etcd  /backup
a2bbd29f  2020-04-21 17:01:48  ip-172-31-122-61.eu-west-3.compute.internal  etcd  /backup
8d9a9d63  2020-04-21 17:06:49  ip-172-31-122-61.eu-west-3.compute.internal  etcd  /backup
-----------------------------------------------------------------------------------------------
6 snapshots
```
Copy the ID of the backup you want to restore the cluster from. You should use a backup taken while the cluster was fully functional.
Run the following command to download the backup:
```shell
restic restore <<BACKUP_ID>> --target .
```
This command will download the backup to your current directory. After the command is done, you should have a backup directory with the following structure:

```
backup
├── ip-172-31-122-61.eu-west-3.compute.internal-snapshot.db
└── pki
    ├── etcd
    │   ├── ca.crt
    │   └── ca.key
    └── kubernetes
        ├── ca.crt
        ├── ca.key
        ├── front-proxy-ca.crt
        ├── front-proxy-ca.key
        ├── sa.key
        └── sa.pub

3 directories, 9 files
```
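Before continuing, it can help to confirm the backup is complete. The following sketch checks for every file in the structure above; the `BACKUP_DIR` default and the file list are taken from that structure, so adjust them if your layout differs:

```shell
# Sketch: sanity-check a restored backup directory before copying files
# to the control plane. BACKUP_DIR defaults to ./backup from the restore step.
BACKUP_DIR="${BACKUP_DIR:-./backup}"

check_backup() {
  local missing=0 f
  for f in pki/etcd/ca.crt pki/etcd/ca.key \
           pki/kubernetes/ca.crt pki/kubernetes/ca.key \
           pki/kubernetes/front-proxy-ca.crt pki/kubernetes/front-proxy-ca.key \
           pki/kubernetes/sa.key pki/kubernetes/sa.pub; do
    [ -f "$BACKUP_DIR/$f" ] || { echo "missing: $f"; missing=1; }
  done
  # The snapshot file name contains the hostname of the node it was taken on.
  ls "$BACKUP_DIR"/*-snapshot.db >/dev/null 2>&1 || { echo "missing: etcd snapshot"; missing=1; }
  [ "$missing" -eq 0 ] && echo "backup complete"
}

check_backup || echo "some files are missing; do not continue until fixed"
```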
As the cluster is beyond repair, we want to start from scratch and provision it again. The first step is to unprovision the existing cluster. There are two possible options: recreate the VM instances (recommended) or reset the cluster using the `kubeone reset` command.
The worker nodes should not be removed. Instead, we will attempt to reuse the existing nodes by rejoining them to the new cluster.
The best approach is to destroy the old VM instances and create new ones. This ensures that if anything is broken on the instance itself, it’ll not affect the newly provisioned cluster.
If you are using Terraform, you can mark instances for recreation using the `terraform taint` command. The `taint` command takes the address (resource type and name) of the resource to be recreated.
In the case of AWS, the following taint commands should be used:
```shell
terraform taint 'aws_instance.control_plane[0]'
terraform taint 'aws_instance.control_plane[1]'
terraform taint 'aws_instance.control_plane[2]'
```
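On Terraform v0.15.2 and newer, `taint` is deprecated in favor of the `-replace` planning option, which marks the resources for recreation as part of the plan itself. An equivalent invocation would be (the indexed addresses assume a count-based `aws_instance.control_plane` resource; adjust them to your configuration):

```shell
terraform apply \
  -replace='aws_instance.control_plane[0]' \
  -replace='aws_instance.control_plane[1]' \
  -replace='aws_instance.control_plane[2]'
```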
Running the `terraform apply` command will recreate the instances and update the load balancer to point to the new instances.
Export the new Terraform output, which contains the information about the new instances:

```shell
terraform output -json > tf.json
```
If you are managing the infrastructure manually, remove and create instances using your preferred approach. Once the new instances are created, update the KubeOne configuration manifest accordingly.
The information about the instances is located in the `controlPlane` part of the configuration manifest:

```yaml
apiVersion: kubeone.io/v1beta1
kind: KubeOneCluster
name: demo-cluster
controlPlane:
  hosts:
  - privateAddress: '172.18.0.1'
    ...
  - privateAddress: '172.18.0.2'
    ...
  - privateAddress: '172.18.0.3'
    ...
```
If you're not able to recreate the VM instances, you can reuse the existing ones. This is not recommended because, if something is broken on the instance itself, it can affect the newly created cluster as well.
Unprovision the cluster by running the `kubeone reset` command:

```shell
kubeone reset --manifest kubeone.yaml -t tf.json --destroy-workers=false
```
After this is done, ensure that the `/etc/kubernetes` directory is empty on all control plane instances. You can do that by SSH-ing to each instance and running:

```shell
sudo rm -rf /etc/kubernetes/*
```
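With multiple control plane nodes, the wipe can be scripted from your local machine. In the sketch below the host addresses and SSH user are placeholders; substitute the values for your environment:

```shell
# Sketch: clean /etc/kubernetes on every control plane instance in one pass.
# The addresses and SSH user below are assumptions, not values from this guide.
for host in 172.18.0.1 172.18.0.2 172.18.0.3; do
  ssh "ubuntu@$host" 'sudo rm -rf /etc/kubernetes/*'
done
```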
Once you have the new instances, you need to install the Kubernetes binaries before you restore the backup. We’ll not provision the new cluster at this stage because we want to restore all the needed files first.
Run the following command to install the prerequisites and the Kubernetes binaries:
```shell
kubeone install --manifest kubeone.yaml -t tf.json --no-init
```
At this point, we want to restore the backup. We’ll first restore the PKI and then the etcd backup.
Tasks mentioned in this step should be run only on the leader control plane instance. KubeOne will automatically synchronize files with other instances.
First, copy the downloaded backup to the leader control plane instance.
On Linux and macOS systems, that can be done using `rsync`:

```shell
rsync -av ./backup user@leader-ip:~/
```

If a bastion host is used, add `-e 'ssh -J user@bastion-ip'` to the command.
Once this is done, connect over SSH to the leader control plane instance.
Run the following command to restore the etcd PKI:
```shell
sudo rsync -av $HOME/backup/pki/etcd /etc/kubernetes/pki/
```
Run the following command to restore the Kubernetes PKI:
```shell
sudo rsync -av $HOME/backup/pki/kubernetes/ /etc/kubernetes/pki
```
With the PKI in place, ensure correct ownership of the `/etc/kubernetes` directory:

```shell
sudo chown -R root:root /etc/kubernetes
```
The easiest way to restore the etcd snapshot is to run a Docker container using the etcd image, which comes with `etcdctl`. In this case, we can use the same etcd image as used by Kubernetes.
Inside the container, we'll mount the `/var/lib` directory, as the etcd data is by default located in the `/var/lib/etcd` directory. Besides that directory, we need to mount the backup directory and provide some information about the cluster and the node.
Run the following command. Make sure to provide the correct hostname and IP address, and that the snapshot path matches the file name in your backup directory.
It's advised to use the same etcd major.minor version as was used for creating the snapshot.

```shell
sudo docker run --rm \
  -v $HOME/backup:/backup \
  -v /var/lib:/var/lib \
  -e ETCDCTL_API=3 \
  k8s.gcr.io/etcd:3.4.3-0 \
  etcdctl \
  snapshot restore \
  --data-dir=/var/lib/etcd \
  --name=<<INSTANCE-HOSTNAME-FQDN>> \
  --initial-advertise-peer-urls=https://<<LEADER-PRIVATE-IP-ADDRESS>>:2380 \
  --initial-cluster=<<INSTANCE-HOSTNAME-FQDN>>=https://<<LEADER-PRIVATE-IP-ADDRESS>>:2380 \
  /backup/etcd-snapshot.db
```
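As a quick sanity check (assuming the default data directory), the restore should have produced an etcd member directory containing the snapshot and write-ahead log:

```shell
sudo ls /var/lib/etcd/member
```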
After the command is done, the etcd data will be in place. Other nodes will get the data when they join the etcd cluster.
Finally, with all the needed files in place, along with the etcd data, proceed with provisioning the new cluster.
On your local machine, run the `kubeone apply` command:

```shell
kubeone apply --manifest kubeone.yaml -t tf.json
```
The provisioning process takes about 5-10 minutes. If the cluster endpoint (load balancer) is the same as the old one, the existing worker nodes will rejoin the new cluster after some time. Otherwise, the machine-controller will create new worker nodes.
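Once provisioning finishes, you can verify the recovery. The sketch below assumes the same manifest and Terraform output file names as used above:

```shell
# Fetch the kubeconfig for the new cluster, then check that the control plane
# nodes are Ready and the system pods are coming up.
kubeone kubeconfig --manifest kubeone.yaml -t tf.json > kubeconfig
kubectl --kubeconfig kubeconfig get nodes
kubectl --kubeconfig kubeconfig get pods --namespace kube-system
```

Worker nodes may take a few minutes longer to appear in the `get nodes` output while they rejoin the cluster.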