This page documents a manual restore procedure in case the legacy backup controllers were used to create
the backup object in question. Legacy backup controllers will be removed in a future release. The current implementation
supports automated restores, so this procedure should
not be used. Use the restore functionality in KKP directly instead.
Intro
The etcd of each user cluster is backed up on a configured interval.
This document walks through the process of restoring a complete etcd StatefulSet from a single snapshot.
Pausing the Cluster
Restoring an etcd requires manual intervention.
As the StatefulSet needs to be modified, the affected cluster must be removed from the controller's management:
# set cluster.spec.pause=true
kubectl edit cluster xxxxxxxxxx
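The same field can also be set non-interactively. A minimal sketch using kubectl patch, assuming the Cluster object is cluster-scoped as in the command above:
# equivalent non-interactive variant
kubectl patch cluster xxxxxxxxxx --type=merge -p '{"spec":{"pause":true}}'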
Pausing the StatefulSet
To restore an etcd, the etcd must not be running.
Therefore, the etcd StatefulSet must be configured to just execute exec /bin/sleep 86400 instead of its normal command.
# change command to run 'exec /bin/sleep 86400'
kubectl -n cluster-xxxxxxxxxx edit statefulset etcd
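If you prefer not to edit the StatefulSet interactively, the same change can be applied with a JSON patch. This is only a sketch: it assumes the etcd container is the first container in the Pod template and already defines a command, and it wraps the sleep in a shell so that exec works as intended:
# replace the container command with a plain sleep
kubectl -n cluster-xxxxxxxxxx patch statefulset etcd --type=json \
  -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["/bin/sh", "-c", "exec /bin/sleep 86400"]}]'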
Deleting All PVCs
To ensure that each Pod starts with an empty disk, we delete all PVCs.
The StatefulSet will automatically create new ones with empty PVs.
kubectl -n cluster-xxxxxxxxxx delete pvc -l app=etcd
Deleting All Pods
To ensure all Pods start with the sleep command and with new PVs, all etcd Pods must be deleted.
kubectl -n cluster-xxxxxxxxxx delete pod -l app=etcd
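Before continuing, wait until the StatefulSet has recreated all Pods and they are Running with the sleep command, since the snapshot can only be copied into running Pods. One way to watch this (stop with Ctrl-C once all Pods are Running):
# watch the etcd pods come back up with the sleep command
kubectl -n cluster-xxxxxxxxxx get pods -l app=etcd -w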
Restoring etcd (Must Be Executed on All etcd Pods)
The restore command is different for each member. Make sure to update it accordingly before executing it; the examples below use etcd-0.
# Copy snapshot into pod
kubectl cp snapshot.db cluster-xxxxxxxxxx/etcd-0:/var/run/etcd/
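# Optional: copy the snapshot into all members in one go
# (a sketch assuming the default of three replicas named etcd-0, etcd-1 and etcd-2)
for i in 0 1 2; do kubectl cp snapshot.db cluster-xxxxxxxxxx/etcd-$i:/var/run/etcd/; done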
# Exec into the pod
kubectl -n cluster-xxxxxxxxxx exec -ti etcd-0 -- sh
cd /var/run/etcd/
# Inside the pod, restore from the snapshot
# This command is specific to each member.
export MEMBER=etcd-0
export CLUSTER_ID=xxxxxxxxxx
With etcd-launcher Enabled
If etcd-launcher is enabled (which it is by default since KKP v2.22), the restore command needs to use TLS-enabled endpoints:
etcdctl snapshot restore snapshot.db \
--name ${MEMBER} \
--initial-cluster etcd-0=https://etcd-0.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2381,etcd-1=https://etcd-1.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2381,etcd-2=https://etcd-2.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2381 \
--initial-cluster-token ${CLUSTER_ID} \
--initial-advertise-peer-urls https://${MEMBER}.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2381 \
--data-dir /var/run/etcd/pod_${MEMBER}/
With etcd-launcher Disabled
If etcd-launcher is disabled (which is not recommended), the restore command needs to use plain HTTP networking:
etcdctl snapshot restore snapshot.db \
--name ${MEMBER} \
--initial-cluster etcd-0=http://etcd-0.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2380 \
--initial-cluster-token ${CLUSTER_ID} \
--initial-advertise-peer-urls http://${MEMBER}.etcd.cluster-${CLUSTER_ID}.svc.cluster.local:2380 \
--data-dir /var/run/etcd/pod_${MEMBER}/
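Whichever variant was used, etcdctl snapshot restore writes a fresh data directory for the member. A quick sanity check while still inside the pod (a sketch; the member/ sub-directory is created by etcdctl itself):
# the restored data directory should now contain a member/ directory with snap and wal
ls /var/run/etcd/pod_${MEMBER}/member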
Un-Pausing the Cluster
To let the Kubermatic Kubernetes Platform (KKP) controller manager bring the etcd back to its normal state, un-pause the cluster.
# set cluster.spec.pause=false
kubectl edit cluster xxxxxxxxxx
Deleting the etcd Pods
As the rolling update of the etcd won't finish on its own, all etcd Pods must be deleted manually.
kubectl -n cluster-xxxxxxxxxx delete pod -l app=etcd
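After the Pods have been deleted, the KKP controller manager recreates them with the normal etcd command and the restored data. A simple way to confirm the rollout completes:
# wait for the etcd StatefulSet to finish rolling out
kubectl -n cluster-xxxxxxxxxx rollout status statefulset etcd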