Overview
There are various mechanisms that keep Kubernetes clusters up and running,
such as high availability (HA), self-healing, and more. In many cases, even if
something fails, the cluster can recover quickly without any effect on the
workload.
In rare cases, such as when multiple instances fail at the same time, etcd
can lose quorum, causing the cluster to fail completely. When that happens,
the only option is to recreate the cluster and restore it from a backup.
This document explains how to manually recreate a cluster and restore it from
a previously created backup.
When Should I Recover The Cluster?
This approach should be used only when the etcd quorum is lost or when it's
impossible to repair the cluster for some other reason. The general rule is
that the etcd quorum is satisfied as long as at least (n/2)+1 etcd members
in the etcd ring are healthy.
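For example, a three-member etcd ring needs (3/2)+1 = 2 healthy members to
maintain quorum, so it tolerates the loss of one member. Losing two members at
the same time means the quorum is lost and recovery from a backup is required.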
This guide can also be used if you want to migrate to new infrastructure
or a new provider.
In other cases, you should first try to repair the cluster by following the
manual cluster repair guide.
Terminology
- etcd quorum: an etcd cluster needs a majority of nodes ((n/2)+1), a quorum,
to agree on updates to the cluster state.
- etcd ring: a group of etcd instances forming a single etcd cluster.
- etcd member: a known peer instance of etcd (running on the control plane
nodes) inside the etcd ring.
- leader instance: the VM instance where the cluster PKI is generated and the
first control plane components are launched at cluster initialization time.
Goals
- Destroy/unprovision old, non-functional cluster
- Recreate a cluster from a previously made backup
Non-goals (Out of Scope)
- Create a backup
  - You can use our backups addon to automatically back up all
important files and components.
- Repair a cluster without restoring from a backup
Requirements
- Restic installed on your local machine in case you’ve used our
backups addon.
As long as the cluster endpoint (load balancer address) stays the same, the
worker nodes will automatically rejoin the new cluster after some time. Besides
that, you'll be able to use the old kubeconfig files to access the cluster.
If the cluster endpoint is different, the worker nodes and all kubeconfig
files must be recreated.
This is important because even if the control plane nodes are down, the
workload should still be running on the worker nodes. However, the workload
might be inaccessible and you won't be able to make any changes, such as
scheduling new pods or removing existing ones.
Step 1 — Download and Unpack The Backup
In this guide, we will assume that you have used our backups addon
to create and manage backups. You can also use any other solution, but for a
successful recovery, you will need all files mentioned in this step.
First, you need to tell Restic how to access the S3 bucket containing
the backups by exporting environment variables with the credentials, bucket
name, and encryption password:
export RESTIC_REPOSITORY="s3:s3.amazonaws.com/<<S3_BUCKET>>"
export RESTIC_PASSWORD="<<RESTIC_PASSWORD>>"
export AWS_ACCESS_KEY_ID="<<AWS_ACCESS_KEY_ID>>"
export AWS_SECRET_ACCESS_KEY="<<AWS_SECRET_ACCESS_KEY>>"
With the credentials and information about the bucket in place, you can now
list all available backups:
restic snapshots
You should see output such as:
repository cd5add2d opened successfully, password is correct
ID Time Host Tags Paths
-----------------------------------------------------------------------------------------------
b1ea3ff1 2020-04-21 16:41:46 ip-172-31-122-61.eu-west-3.compute.internal etcd /backup
c43ea2d7 2020-04-21 16:46:46 ip-172-31-122-61.eu-west-3.compute.internal etcd /backup
92f33a9c 2020-04-21 16:51:46 ip-172-31-122-61.eu-west-3.compute.internal etcd /backup
3ee8cc50 2020-04-21 16:56:47 ip-172-31-122-61.eu-west-3.compute.internal etcd /backup
a2bbd29f 2020-04-21 17:01:48 ip-172-31-122-61.eu-west-3.compute.internal etcd /backup
8d9a9d63 2020-04-21 17:06:49 ip-172-31-122-61.eu-west-3.compute.internal etcd /backup
-----------------------------------------------------------------------------------------------
6 snapshots
Copy the ID of the backup you want to restore the cluster from. You should use
a backup from a time when the cluster was fully functional.
Run the following command to download the backup:
restic restore <<BACKUP_ID>> --target .
This command will download the backup to your current directory. After the
command is done, you should have the backup directory with the following
files:
backup
├── ip-172-31-122-61.eu-west-3.compute.internal-snapshot.db
└── pki
├── etcd
│ ├── ca.crt
│ └── ca.key
└── kubernetes
├── ca.crt
├── ca.key
├── front-proxy-ca.crt
├── front-proxy-ca.key
├── sa.key
└── sa.pub
3 directories, 9 files
Step 2 — Unprovision The Existing Cluster
As the cluster is beyond repair, we want to start from scratch and provision
the cluster again. The first step is to unprovision the existing cluster.
There are two possible options: recreate the VM instances (recommended) or
reset the cluster using the kubeone reset command.
The worker nodes should not be removed. Instead, we will attempt to reuse
the existing nodes by rejoining them to the new cluster.
Option 1: Recreating the VM instances (Recommended)
The best approach is to destroy the old VM instances and create new ones. This
ensures that if anything is broken on the instance itself, it won't affect
the newly provisioned cluster.
If you are using Terraform, you can mark instances for recreation using the
taint command. The taint command takes the resource type and the resource
name.
In the case of AWS, the following taint commands should be used:
terraform taint 'aws_instance.control_plane[0]'
terraform taint 'aws_instance.control_plane[1]'
terraform taint 'aws_instance.control_plane[2]'
Run the apply command to recreate the instances and update the load
balancer to point to the new instances:
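terraform apply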
Export the new Terraform output so KubeOne can pick up the new instances:
terraform output -json > tf.json
Recreating Instances Manually
If you are managing the infrastructure manually, you need to remove and create
instances using your preferred method for managing infrastructure. Once the
new instances are created, update the KubeOne configuration manifest.
The information about the instances is located in the .controlPlane.hosts
section of the configuration manifest:
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: demo-cluster
controlPlane:
  hosts:
    - privateAddress: '172.18.0.1'
      ...
    - privateAddress: '172.18.0.2'
      ...
    - privateAddress: '172.18.0.3'
      ...
Option 2: Reset The Cluster Using KubeOne
This is not recommended, because if something is broken on the instance itself,
it can affect the newly created cluster as well.
If you're not able to recreate the VM instances, you can reuse the existing
ones.
Unprovision the cluster by running the reset command:
kubeone reset config.yaml -t tf.json --destroy-workers=false
After this is done, ensure that the /etc/kubernetes directory is empty on all
control plane instances. You can do that by SSH-ing to each instance and
running:
sudo rm -rf /etc/kubernetes/*
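You can confirm the directory is now empty by listing it (no output means
it's empty):
sudo ls -A /etc/kubernetes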
Step 3 — Install The Kubernetes Binaries
Once you have the new instances, you need to install the Kubernetes binaries
before you restore the backup. We won't provision the new cluster at this
stage because we want to restore all the needed files first.
Run the following command to install the prerequisites and the Kubernetes
binaries:
kubeone apply --manifest kubeone.yaml -t tf.json --no-init
Step 4 — Restore The Backup
At this point, we want to restore the backup. We’ll first restore the PKI and
then the etcd backup.
Tasks mentioned in this step should be run only on the leader control plane
instance. KubeOne will automatically synchronize files with other instances.
First, copy the downloaded backup to the leader control plane instance.
On Linux and macOS systems, that can be done using rsync, such as:
rsync -av ./backup user@leader-ip:~/
Use -e 'ssh -J user@bastion-ip' if a bastion host is used, as shown in the
example below.
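For example, assuming a bastion host reachable as user@bastion-ip:
rsync -av -e 'ssh -J user@bastion-ip' ./backup user@leader-ip:~/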
Once this is done, connect over SSH to the leader control plane instance.
Restore the etcd PKI
Run the following command to restore the etcd PKI:
sudo rsync -av $HOME/backup/pki/etcd /etc/kubernetes/pki/
Restore the Kubernetes PKI
Run the following command to restore the Kubernetes PKI:
sudo rsync -av $HOME/backup/pki/kubernetes/ /etc/kubernetes/pki
With the PKI in place, ensure correct ownership of the /etc/kubernetes
directory:
sudo chown -R root:root /etc/kubernetes
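If you want to double-check the restore, you can list the directory; the files
should match the pki directory from the backup:
sudo ls -lR /etc/kubernetes/pki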
Restore the etcd backup
The easiest way to restore the etcd snapshot is to run a Docker container using
the etcd image, which comes with etcdctl. In this case, we can use the same
etcd image as used by Kubernetes.
Inside the container, we'll mount the /var/lib directory, as the etcd data
is by default located in the /var/lib/etcd directory. Besides the /var/lib
directory, we need to mount the backup directory and provide some information
about the cluster and the node.
Run the following command. Make sure to provide the correct hostname and IP
address. It's advised to use the same etcd major.minor version as was used for
creating the snapshot. Note that the snapshot file in the backup directory is
named after the host that created it (for example,
ip-172-31-122-61.eu-west-3.compute.internal-snapshot.db), so adjust the
/backup/etcd-snapshot.db path in the commands below to match your snapshot's
file name.
# NOTE: Recent kubeadm-based Kubernetes installations no longer ship Docker on the control plane machines. If Docker is not installed and only the ctr command is available on the control plane node, follow the ctr commands shown after the docker command.
sudo docker run --rm \
    -v $HOME/backup:/backup \
    -v /var/lib:/var/lib \
    -e ETCDCTL_API=3 \
    registry.k8s.io/etcd:3.5.9-0 \
    etcdctl \
    snapshot restore \
    --data-dir=/var/lib/etcd \
    --name=<<INSTANCE-HOSTNAME-FQDN>> \
    --initial-advertise-peer-urls=https://<<LEADER-PRIVATE-IP-ADDRESS>>:2380 \
    --initial-cluster=<<INSTANCE-HOSTNAME-FQDN>>=https://<<LEADER-PRIVATE-IP-ADDRESS>>:2380 \
    /backup/etcd-snapshot.db
# Run the same restore as above, but using the ctr command
# Make sure to pull the same etcd image that you reference in ctr run below
sudo ctr image pull registry.k8s.io/etcd:3.5.9-0
# "etcd-restore" is an arbitrary container ID, which ctr run requires
sudo ctr run --rm \
    --mount type=bind,src=$HOME/backup,dst=/backup,options=rbind:ro \
    --mount type=bind,src=/var/lib,dst=/var/lib,options=rbind:rw \
    --env ETCDCTL_API=3 \
    registry.k8s.io/etcd:3.5.9-0 \
    etcd-restore \
    etcdctl \
    snapshot restore \
    --data-dir=/var/lib/etcd \
    --name=<<INSTANCE-HOSTNAME-FQDN>> \
    --initial-advertise-peer-urls=https://<<LEADER-PRIVATE-IP-ADDRESS>>:2380 \
    --initial-cluster=<<INSTANCE-HOSTNAME-FQDN>>=https://<<LEADER-PRIVATE-IP-ADDRESS>>:2380 \
    /backup/etcd-snapshot.db
# e.g.
sudo ctr image pull registry.k8s.io/etcd:3.5.9-0
sudo ctr run --rm \
    --mount type=bind,src=$HOME/backup,dst=/backup,options=rbind:ro \
    --mount type=bind,src=/var/lib,dst=/var/lib,options=rbind:rw \
    --env ETCDCTL_API=3 \
    registry.k8s.io/etcd:3.5.9-0 \
    etcd-restore \
    etcdctl \
    snapshot restore \
    --data-dir=/var/lib/etcd \
    --name=dev-cp-1 \
    --initial-advertise-peer-urls=https://10.5.0.1:2380 \
    --initial-cluster=dev-cp-1=https://10.5.0.1:2380 \
    /backup/etcd-snapshot.db
After the command is done, the etcd data will be in place. Other nodes will get
the data when they join the etcd cluster.
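You can verify that the restore succeeded by checking that the restored data
directory exists on the leader instance (the restore creates a member
subdirectory):
sudo ls /var/lib/etcd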
Step 5 — Provision The New Cluster
Finally, with all the needed files in place, along with the etcd data,
proceed with provisioning the new cluster.
On your local machine, run the kubeone apply command:
kubeone apply --manifest kubeone.yaml -t tf.json
The provisioning process takes about 5-10 minutes. If the cluster endpoint
(load balancer) is the same as the old one, the existing worker nodes will
rejoin the new cluster after some time. Otherwise, the machine-controller will
create new worker nodes.
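Once provisioning is done, you can verify that the control plane is healthy
and that the worker nodes have joined. A minimal check, assuming you fetch the
kubeconfig using the kubeone kubeconfig command (the demo-cluster name comes
from the manifest shown earlier):
kubeone kubeconfig --manifest kubeone.yaml -t tf.json > demo-cluster-kubeconfig
kubectl --kubeconfig=demo-cluster-kubeconfig get nodes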