Cluster Reconciliation (apply)
The cluster reconciliation process determines the actual state of the cluster
and takes actions based on the difference between actual and expected states.
The reconciliation is capable of automatically installing, upgrading, and
repairing the cluster. On top of that, reconciliation can change some cluster
properties, such as apply addons and create new worker nodes.
More information about the technical implementation can be found in the
KubeOne Reconciliation Process (kubeone apply) proposal.
Using kubeone apply
Before getting started, make sure you have up-to-date Terraform output
file if you’re using the Terraform integration:
terraform output -json > tf.json
The cluster reconciliation is implemented under the kubeone apply
command
which has similar semantics as other KubeOne commands.
You can use the following command:
kubeone apply --manifest config.yaml -t .
For more details about available flags and properties, run the command with
the -h
flag:
By default, the apply
command expects the user confirmation before taking
any action. If you are running kubeone apply
in a script and/or without
terminal and ability to confirm actions, you can use the --auto-approve
flag:
kubeone apply --manifest config.yaml -t tf.json --auto-approve
Reconciliation Process
The reconciliation process is explained in details in the
technical proposal. This section includes important details
about the reconciliation process that you should be aware of.
The actual state of the cluster is determined by running a set of probes
(Bash scripts and Kubernetes API requests) on the cluster instances.
The expected state is defined with the KubeOne configuration manifest, and
optionally with the Terraform state.
The general reconciliation workflow is based on:
- If cluster is not provisioned, run the installation process
(
kubeone install
). - If the cluster is provisioned:
- If there are unhealthy control plane nodes, try to repair the cluster
and/or instruct the operator about the needed steps.
- If there are not provisioned or unhealthy static worker nodes, try
to provision/repair them and/or instruct the operator about the
needed steps.
- If all control plane and static worker nodes are healthy:
- If upgrade is needed (mismatch between expected and actual Kubernetes
versions), run the upgrade process (
kubeone upgrade
).- If there are new MachineDeployments, create them.
Repairing Unhealthy Clusters
The apply command has ability to detect is cluster in an unhealthy
state. The cluster is considered unhealthy if there’s at least one
node that’s unhealthy, which can happen if:
- kubelet is failing or not running
- API server is failing or unhealthy
- etcd is failing or unhealthy
In such a case, the operator needs to manually delete a broken instance,
update the KubeOne configuration file to remove the old instance and add
a new one, and then run the apply command again.
If Terraform is used for managing the infrastructure, the taint
command
can be used to mark instance for recreation. Running terraform apply
the
next time would recreate the instance. Make sure to update the Terraform output
file by running terraform apply
again. For example:
terraform taint 'aws_instance.control_plane[<index-of-instance>]'
terraform apply
terraform output -json > tf.json
If there are multiple unhealthy instances, it might be required to replace
and repair instance by instance in order to maintain the etcd quorum. KubeOne
recommends which instances are safe to be deleted without affecting the quorum.
It’s strongly advised to follow the order or otherwise you’re risking losing
the quorum and all cluster data. If it’s not possible to repair the cluster
without affecting the quorum, KubeOne will fail to repair the cluster. In that
case, disaster recovery might be required.
Dynamic Workers (MachineDeployments) Reconciliation
The apply command doesn’t modify or delete existing MachineDeployments.
The MachineDeployments are created by the apply command only if there’s
another action to be taken, such as install or upgrade. Managing
MachineDeployments should be done by the operator either by using kubectl or
the Kubernetes API directly.
To make managing MachineDeployments easier, the operator can generate the
manifest containing all MachineDeployments defined in the KubeOne
configuration by using the kubeone config machinedeployments
command:
kubeone config machinedeployments --manifest config.yaml -t tf.json
Static Workers Reconciliation
The apply command doesn’t remove or unprovision the static worker
nodes. That can be done by removing the appropriate instance manually.
If there is CCM (cloud-controller-manager) running in the cluster, the Node for
the removed worker should be deleted automatically. If there’s no CCM, you can
remove the Node object manually using kubectl.
Features Reconciliation
Currently, the apply command doesn’t reconcile features. If you enable/disable
any feature on already provisioned cluster, you have to run the upgrade process
for changes to be in the effect. The upgrade process can be run using the
following upgrade command:
kubeone upgrade --manifest config.yaml -t . --force