The CCM/CSI migration moves clusters that use a legacy in-tree cloud
provider (i.e. clusters created without .cloudProvider.external: true)
to external cloud controller managers (CCMs) and CSI drivers. The CCM/CSI
migration process is currently supported only for the following providers:
This guide explains:
- what in-tree providers, external CCMs, and CSI drivers are
- why and when you need to migrate
- how to migrate your clusters
- how to troubleshoot a failed migration
Support for Cloud Providers in Kubernetes
Cloud providers that want to support advanced Kubernetes use cases can
implement Kubernetes controller(s) that would connect Kubernetes clusters
with their cloud provider platform/API.
There are three common controllers:
- Node controller - annotates Node objects with information about the
instance (e.g. instance size, region, availability zone), including IP
addresses. It also deletes the Node object when the instance is removed.
- Route controller - configures routes in the cloud so that containers on
different nodes in your Kubernetes cluster can communicate with each other.
- Service controller - provides support for LoadBalancer Services backed
by the cloud provider’s load balancing offering.
In addition, cloud providers can implement controllers for volume operations
so that the cloud provider’s storage offerings can be used as Kubernetes
volumes. Each cloud provider chooses which controllers to implement: it
doesn’t need to implement all of the controllers above, and it can also
implement additional ones.
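The instance details recorded by the Node controller are typically visible as well-known node labels. The following is a minimal sketch for inspecting them: the helper name show_cloud_node_info and the variable CLOUD_LABELS are made up for illustration, and it assumes kubectl is already configured for your cluster.

```shell
# Well-known Kubernetes labels that carry instance type, region, and zone.
CLOUD_LABELS="node.kubernetes.io/instance-type,topology.kubernetes.io/region,topology.kubernetes.io/zone"

# Hypothetical helper: list nodes with one extra column per label above.
# -o wide additionally shows the internal and external IP addresses.
show_cloud_node_info() {
  kubectl get nodes -o wide -L "$CLOUD_LABELS"
}
```

Calling show_cloud_node_info prints one row per node, with a column for each of the three labels.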
In-tree Cloud Providers
In the early days of Kubernetes, those controllers were implemented as part of
kube-controller-manager. They became known as in-tree cloud providers, where
"in-tree" refers to the fact that the code for those controllers was part of
the main Kubernetes repository.
This approach worked in the beginning, when only very few cloud providers
supported Kubernetes. However, as the number of providers that support
Kubernetes increased, several problems appeared:
- Having many in-tree cloud providers integrated in kube-controller-manager
increased the binary size, even though you don’t need all of the providers
- All providers had to follow the Kubernetes release cycle. If they wanted to
change anything in their controllers (e.g. they changed the API or added a
new feature), they had to wait for a new Kubernetes release to bring the
changes to their users
- Testing and maintaining all the controllers became very complicated
Therefore, it has been decided to deprecate and remove in-tree cloud
providers in favor of external cloud controller managers and CSI drivers.
The in-tree cloud providers are disabled as of Kubernetes 1.29, and
permanently removed in Kubernetes 1.30.
External Cloud Controller Managers (CCMs) and CSI Drivers
An external cloud controller manager (CCM) is a set of controllers implemented
by a cloud provider. Those controllers are no longer part of the main
Kubernetes repository; instead, cloud providers can host them wherever they
want. They have the same purpose and tasks as the in-tree cloud provider
controllers. External CCMs are deployed like any other Kubernetes workload
(e.g. using Deployments or DaemonSets); however, other control plane
components must be configured properly to use them.
The controllers for volume operations, however, are not part of the external
CCM. Instead, volume operations are handled through the Container Storage
Interface (CSI). CSI drivers (also called plugins) are CSI implementations
that handle all volume operations, which were originally handled by
kube-controller-manager. Like external CCMs, CSI drivers are deployed like
any other Kubernetes workload.
Who Should Migrate?
If your KubeOneCluster manifest does not contain the following section, you’re
running an in-tree cloud provider and need to migrate:
...
cloudProvider:
  external: true
...
We recommend migrating as soon as possible if your provider is supported,
because many in-tree providers are no longer maintained, and new features
and improvements are only added to the external CCMs and CSI drivers.
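A quick way to check a manifest from the command line is sketched below. The function name uses_external_ccm is hypothetical, and the check is a heuristic that assumes a conventionally indented manifest with cloudProvider as a top-level key; it is not a full YAML parser.

```shell
# Heuristic check: is `external: true` set under the top-level
# `cloudProvider:` key of the given manifest?
# Exit status 0 means yes, non-zero means no (or the file is unreadable).
uses_external_ccm() {
  awk '
    /^cloudProvider:[ \t]*$/ { in_cp = 1; next }  # entering the cloudProvider block
    /^[^ \t#]/               { in_cp = 0 }        # any other top-level key ends it
    in_cp && /^[ \t]+external:[ \t]*true[ \t]*$/ { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}
```

For example, `uses_external_ccm kubeone.yaml || echo "in-tree provider: migration needed"` prints the message only when the section is missing.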
Migration Prerequisites
Make sure to familiarize yourself with the requirements for the external CCM
and CSI drivers. Those requirements are provided by the cloud providers, and
you can usually find them in the repository of each component:
- Azure: there are no special prerequisites for Azure CCM and CSI drivers
- GCE: there are no special prerequisites for GCE CCM and CSI drivers
Migrating Your Clusters
The migration is done in two phases as described below. Before the migration,
you need to update your KubeOneCluster manifest as appropriate.
Phase 0 — Preparing your KubeOneCluster manifest
You must set .cloudProvider.external to true, so KubeOne can deploy the
external CCM and CSI. For example:
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
versions:
  kubernetes: 1.29.4
cloudProvider:
  openstack: {}
  external: true
  cloudConfig: |
    ...
In addition to that, specific cloud providers might require additional
configuration.
Azure and GCE
In general, no additional configuration or changes are needed for Azure and
GCE, but make sure to check the documents linked in the Migration
Prerequisites section.
Phase 1 — Deploying External CCM and CSI Plugin, with in-tree cloud provider enabled
The first phase deploys the external CCM and CSI plugin while leaving the
in-tree provider enabled. The Kubernetes API server and
kube-controller-manager are configured to:
- use the controllers integrated in the external CCM instead of the in-tree
cloud provider for all cloud-related operations
- redirect all volume-related operations to the CSI plugin
The existing worker nodes will continue to use the in-tree provider, which is
why it is left enabled on the API server and kube-controller-manager.
Therefore, all worker nodes managed by machine-controller must be rolled out
after phase 1 is complete.
To start the first phase of the migration, run the following command. You’ll
be asked to confirm your intention by typing yes.
kubeone migrate to-ccm-csi --manifest kubeone.yaml -t tf.json
This command might take 5-10 minutes to finish. After it’s done, you need
to roll out all your worker nodes managed by machine-controller, so they start
using external CCM and CSI.
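Before rotating the worker nodes, it can be worth confirming that the external components are actually running. The sketch below is an assumption-heavy example: the helper names are hypothetical, the pod name patterns are guesses that vary by provider, and it assumes the components run in the kube-system namespace.

```shell
# Keep only pod lines whose name suggests a CCM or CSI component.
# The name patterns are assumptions; adjust them for your provider.
filter_ccm_csi_pods() {
  grep -i -E 'cloud-controller-manager|ccm|csi' || true
}

# Hypothetical helper: show matching pods and the registered CSIDriver objects.
check_external_components() {
  kubectl get pods -n kube-system --no-headers | filter_ccm_csi_pods
  kubectl get csidrivers
}
```

Running check_external_components should list the CCM and CSI pods with a Running status, followed by at least one CSI driver for your provider.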
The Rolling Restart MachineDeployments document describes possible approaches
for rotating worker nodes. You can use the following command to rotate all
worker nodes at the same time. For additional approaches, please check the
document.
# Build a merge patch that sets a fresh forceRestart annotation on the Machine template.
forceRestartAnnotations="{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"forceRestart\":\"$(date +%s)\"}}}}}"
# Apply the patch to every MachineDeployment in kube-system.
for md in $(kubectl get machinedeployments -n kube-system --no-headers | awk '{print $1}'); do
  kubectl patch machinedeployment -n kube-system "$md" --type=merge -p "$forceRestartAnnotations"
done
You can watch the progress by running kubectl get nodes. Make sure that all
worker nodes are rotated before proceeding to the next phase.
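One way to tell which workers still predate the migration is to compare node creation timestamps with the time you started phase 1. This is a minimal sketch under stated assumptions: the helper names are hypothetical, control plane nodes are assumed to carry the node-role.kubernetes.io/control-plane label, every other node is assumed to be a machine-controller worker, and the cutoff is a UTC ISO 8601 timestamp you noted yourself.

```shell
# Print the names of nodes created before the cutoff.
# Input: lines of "<name> <creationTimestamp>"; UTC ISO 8601 timestamps
# sort correctly as plain strings, so a string comparison is enough.
stale_nodes() {
  awk -v cutoff="$1" '$2 < cutoff { print $1 }'
}

# Hypothetical wrapper: list worker nodes that are older than the cutoff.
list_unrotated_workers() {
  kubectl get nodes -l '!node-role.kubernetes.io/control-plane' --no-headers \
    -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp |
    stale_nodes "$1"
}
```

For example, `list_unrotated_workers 2024-05-01T10:00:00Z` (an illustrative timestamp) prints nothing once every worker has been replaced.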
Phase 2 — Completely disabling in-tree cloud provider
The CCM/CSI migration is completed by fully disabling the in-tree provider.
Before triggering phase 2, make sure that all worker nodes managed by
machine-controller have been rotated as described in the previous phase.
After all worker nodes are rotated, you can run the following command to
complete the migration:
kubeone migrate to-ccm-csi --complete --manifest kubeone.yaml -t tf.json
This command might take up to 5 minutes to finish. After this command is done,
the CCM/CSI migration is complete. Congratulations!
Alternatives to the CCM/CSI migration
An alternative to the CCM/CSI migration is recreating the cluster from scratch
with .cloudProvider.external enabled from the beginning. While that approach
gives you a fresh cluster, it can be complicated because:
- you need to recreate all resources and redeploy your workloads. This might
cause some downtime, and it can be very complicated if you have stateful
workloads
- you need to manually migrate existing PersistentVolumes to the new cluster
Restoring from a backup when recreating the cluster is not an option, because
that might also restore the old configuration and eventually break the
cluster.