Upgrading to KKP 2.26 is only supported from version 2.25. Do not attempt to upgrade from versions prior to that; instead, apply the upgrade step by step over minor versions (e.g. from 2.24 to 2.25 and then to 2.26). It is also strongly advised to be on the latest 2.25.x patch release before upgrading to 2.26.
This guide will walk you through upgrading Kubermatic Kubernetes Platform (KKP) to version 2.26. For the full list of changes in this release, please check out the KKP changelog for v2.26. Please read the full document before proceeding with the upgrade.
Pre-Upgrade Considerations
Please review known issues before upgrading to understand if any issues might affect you.
KKP 2.26 adjusts the list of supported Kubernetes versions and removes support for Kubernetes 1.27. Existing user clusters need to be migrated to 1.28 or later before the KKP upgrade can begin.
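To check whether any user clusters still run Kubernetes 1.27, you can list the Cluster objects on each seed cluster and inspect their spec.version field. The command below is only a sketch and assumes your kubeconfig currently points at a seed cluster:

$ kubectl get clusters -o custom-columns=NAME:.metadata.name,VERSION:.spec.version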
Helm Chart Versioning
Beginning with KKP 2.26, Helm chart versions use strict semvers without a leading “v” (i.e. 1.2.3 instead of v1.2.3). This change was made to improve compatibility with GitOps tooling that enforces strict semver parsing. The Git tags and container tags have not changed.
Helm Chart Upgrades
KKP 2.26 ships a lot of major version upgrades for the Helm charts, most notably:
- Loki & Promtail v2.5 to v2.9.x
- Grafana 9.x to 10.4.x
Some of these updates require manual intervention or at least checking whether a given KKP system is affected by upstream changes. Please read the following sections carefully before beginning the upgrade.
Velero 1.14
Velero was updated from 1.10 to 1.14, which includes a number of significant improvements internally. In the same breath, KKP also replaced the custom Helm chart with the official upstream Helm chart for Velero.
Due to labelling changes, an in-place upgrade of Velero is not possible. It’s recommended to delete the previous Helm release and install the chart fresh; a possible command sequence is shown below. Existing backups will not be affected by the switch, but Velero configuration created by Helm (like BackupStorageLocations) will be deleted and re-created.
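Assuming the release is named velero, lives in the velero namespace and the chart is taken from the extracted KKP release archive (adjust names and paths to your environment), the re-installation could look like this:

$ helm --namespace velero delete velero
$ helm --namespace velero upgrade --install velero ./charts/backup/velero --values path/to/your/velero-values.yaml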
Configuration Changes
The switch to the upstream Helm chart requires adjusting the values.yaml used to install Velero. Most existing settings have a 1:1 representation in the new chart:
- velero.podAnnotations is now velero.annotations
- velero.serverFlags is now velero.configuration.* (each CLI flag is its own field in the YAML file, e.g. serverFlags: ["--log-format=json"] would become configuration.logFormat: "json")
- velero.uploaderType is now velero.configuration.uploaderType; note that the default has changed from restic to Kopia, see the next section below for more information.
- velero.credentials is now velero.credentials.*
- velero.schedulesPath is not available anymore, since putting additional files into a Helm chart before installing it is a rather unusual process. Instead, specify the desired schedules directly inside the values.yaml in velero.schedules.
- velero.backupStorageLocations is now velero.configuration.backupStorageLocation
- velero.volumeSnapshotLocations is now velero.configuration.volumeSnapshotLocation
- velero.defaultVolumeSnapshotLocations is now velero.configuration.defaultBackupStorageLocation
For example, a configuration for the old chart like this:

velero:
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::1234:role/velero
  backupStorageLocations:
    aws:
      provider: aws
      objectStorage:
        bucket: myclusterbackups
      config:
        region: eu-west-1
  # optionally define some of your volumeSnapshotLocations as the default;
  # each element in the list must be a string of the form "provider:location"
  defaultVolumeSnapshotLocations:
    - aws:aws
  # see https://velero.io/docs/v1.10/api-types/volumesnapshotlocation/
  volumeSnapshotLocations:
    aws:
      provider: aws
      config:
        region: eu-west-1
  uploaderType: restic
  serverFlags:
    - --log-format=json
  credentials:
    aws:
      accessKey: itsme
      secretKey: andthisismypassword
  schedulesPath: schedules/*
would be written like this for the new chart:

velero:
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::1234:role/velero
  # schedules are no longer loaded from external files, but must be included inline
  schedules:
    hourly-cluster:
      schedule: 0 * * * *
      template:
        includeClusterResources: true
        includedNamespaces:
          - '*'
        snapshotVolumes: false
        ttl: 168h # 7 days
  configuration:
    uploaderType: restic
    logFormat: json
    backupStorageLocation:
      - name: aws
        provider: aws
        bucket: myclusterbackups
        config:
          region: eu-west-1
    volumeSnapshotLocation:
      - name: aws
        provider: aws
        config:
          region: eu-west-1
    defaultBackupStorageLocation: aws
  credentials:
    useSecret: true
    name: aws-credentials
    secretContents:
      cloud: |
        [default]
        aws_access_key_id=itsme
        aws_secret_access_key=andthisismypassword
Kopia replaces restic
The default file backup solution in Velero is now Kopia, replacing the previous implementation using restic. From a Velero user perspective this can be seen as an implementation detail and commands like velero backup create will continue to work as before. However, Kopia’s data is stored in a new repository inside the backup storage location (for example, if you used restic before, now switch to Kopia and use an S3 bucket for storage, you would end up with 3 directories in the bucket: backups, restic and kopia).
When migrating to Kopia, new backups will be made using it, but existing backups made using restic can still be restored. Once no old restic backups are required anymore, the restic directory in the backup storage can be deleted.
KKP’s wrapper velero Helm chart does not configure the backup tool, so it defaults to Kopia. If you wish to continue using restic, set velero.configuration.uploaderType to restic in your Helm values.yaml file. Note that restic support will eventually be removed from Velero, so a switch will be necessary at some point.
If you decide to switch to Kopia and do not need the restic repository anymore, consider rotating the repository password by updating the velero-repo-credentials Secret in the velero namespace. This should only be done before the new repository is created by Velero, and changing the password will make all previously created repositories unavailable.
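A hedged example of such a rotation, assuming the Velero defaults (a Secret named velero-repo-credentials with the key repository-password in the velero namespace; verify both in your installation):

# only do this before Velero has created the new Kopia repository
$ kubectl --namespace velero patch secret velero-repo-credentials \
    --type merge --patch '{"stringData":{"repository-password":"<new-password>"}}'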
cert-manager 1.14
The configuration syntax for cert-manager has changed slightly.
- Breaking: If you have the .featureGates value set in values.yaml, the features defined there will no longer be passed to the cert-manager webhook, only to the cert-manager controller. Use the webhook.featureGates field instead to define features to be enabled on the webhook (see the example after this list).
- Potentially breaking: Webhook validation of CertificateRequest resources is stricter now: all KeyUsages and ExtendedKeyUsages must be defined directly in the CertificateRequest resource; the encoded CSR can never contain more usages than defined there.
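For illustration, an excerpt of the values passed to the cert-manager chart could now look like this (the feature gate name is only an example, and the surrounding chart key depends on how your values file is structured):

featureGates: "AdditionalCertificateOutputFormats=true"   # now applies to the controller only
webhook:
  featureGates: "AdditionalCertificateOutputFormats=true" # must be set separately for the webhook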
oauth2-proxy (IAP) 7.6
This upgrade includes a breaking change and a deprecation:
- A change to how auth routes are evaluated using the flags skip-auth-route/skip-auth-regex: the new behaviour uses the regex you specify to evaluate the full path including query parameters. For more details please read the detailed PR description. A hypothetical example follows after this list.
- The environment variable OAUTH2_PROXY_GOOGLE_GROUP has been deprecated in favor of OAUTH2_PROXY_GOOGLE_GROUPS. The next major release will remove this option.
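To illustrate the first change with a hypothetical flag value: an anchored regex that used to skip authentication for a path regardless of its query string may stop matching once query parameters are included in the evaluated string.

# before 7.6, this skipped auth for /healthz as well as /healthz?verbose=true
--skip-auth-route="^/healthz$"
# with 7.6, the query string is part of the evaluated path, so the pattern
# needs to account for it to keep the old behaviour
--skip-auth-route="^/healthz(\?.*)?$"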
Loki & Promtail 2.9 (Seed MLA)
The Loki upgrade from 2.5 to 2.9 might be the most significant bump in this KKP release. Due to the amount of changes, it’s necessary to delete the existing loki StatefulSet and let Helm recreate it. Deleting the StatefulSet will not touch the PVCs, and the new StatefulSet’s pods will reuse the existing PVCs after the upgrade.
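Assuming the Seed MLA logging stack runs in the logging namespace and the StatefulSet is called loki (verify both in your setup), the deletion could look like this:

$ kubectl --namespace logging delete statefulset loki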
Before upgrading, review your values.yaml for Loki, as a number of syntax changes were made:
- Most importantly, loki.config is now a templated string that aggregates many other individual values specified in loki, for example loki.tableManager gets rendered into loki.config.table_manager, and loki.loki.schemaConfig gets rendered into loki.config.schema_config. To follow these changes, if you have loki.config in your values.yaml, rename it to loki.loki. Ideally you should not need to manually override the templating string in loki.config from the upstream chart anymore. Additionally, some values moved out or were renamed slightly (see the example after this list):
  - loki.config.schema_config becomes loki.loki.schemaConfig
  - loki.config.table_manager becomes loki.tableManager (sic)
  - loki.config.server was removed; if you need to specify something, use loki.loki.server
- The base volume path for the Loki PVC was changed from /data/loki to /var/loki.
- Configuration for the default image has changed: the single loki.image.repository field is gone and the image is now specified via loki.image.registry and loki.image.repository.
- loki.affinity is now a templated string and enabled by default; if you use multiple Loki replicas, your cluster needs to have multiple nodes to host these pods.
- All fields related to the Loki pod (loki.tolerations, loki.resources, loki.nodeSelector etc.) were moved below loki.singleBinary.
- Self-monitoring, Grafana Agent and self-tests are disabled by default now, reducing the default resource requirements for the logging stack.
- loki.singleBinary.persistence.enableStatefulSetAutoDeletePVC is set to false to ensure that when the StatefulSet is deleted, the PVCs will not also be deleted. This allows for easier upgrades in the future, but if you scale down Loki, you will have to manually delete the leftover PVCs.
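To illustrate the renames above, a minimal before/after sketch (the schema and resource values are placeholders, not recommendations):

# KKP 2.25 chart
loki:
  config:
    schema_config:
      configs:
        - from: "2020-10-24"
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
  resources:
    requests:
      cpu: 100m
      memory: 256Mi

# KKP 2.26 chart
loki:
  loki:
    schemaConfig:
      configs:
        - from: "2020-10-24"
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
  singleBinary:
    resources:
      requests:
        cpu: 100m
        memory: 256Mi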
Alertmanager 0.27 (Seed MLA)
This version removes the v1 API, which had been deprecated since 2019. If you have custom integrations with Alertmanager, ensure none of them use the now-removed API.
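If you want to check by hand which API versions are served, you can port-forward to Alertmanager and probe both endpoints; the monitoring namespace and alertmanager Service name below are assumptions, adjust them to your Seed MLA setup:

$ kubectl --namespace monitoring port-forward svc/alertmanager 9093
# in a second terminal:
$ curl -s http://localhost:9093/api/v2/status   # served by Alertmanager 0.27
$ curl -s http://localhost:9093/api/v1/status   # no longer available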
blackbox-exporter 0.25 (Seed MLA)
This version changes the proxy_connect_header configuration structure to match Prometheus (see PR); update your values.yaml accordingly if you configured this option.
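Assuming the new structure follows the Prometheus convention of mapping each header name to a list of values, a module definition would change roughly like this (header value is a placeholder):

# before (blackbox-exporter < 0.25): single string value per header
modules:
  http_2xx:
    prober: http
    http:
      proxy_url: http://proxy.example.com:3128
      proxy_connect_header:
        Proxy-Authorization: Basic <credentials>

# after (blackbox-exporter 0.25): list of values per header
modules:
  http_2xx:
    prober: http
    http:
      proxy_url: http://proxy.example.com:3128
      proxy_connect_header:
        Proxy-Authorization:
          - Basic <credentials>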
helm-exporter 1.2.16 (Seed MLA)
KKP 2.26 removes the custom Helm chart and instead now reuses the official upstream chart. Before upgrading you must delete the existing Helm release in your cluster:
$ helm --namespace monitoring delete helm-exporter
Afterwards you can install the new release from the chart.
kube-state-metrics 2.12 (Seed MLA)
As is typical for kube-state-metrics, the upgrade is simple, but the devil is in the details. There were many minor changes since v2.8, please review the changelog carefully if you built upon metrics provided by kube-state-metrics:
- The deprecated experimental VerticalPodAutoscaler metrics are no longer supported and have been removed. It’s recommended to use CustomResourceState metrics to gather metrics from custom resources like the Vertical Pod Autoscaler.
- Label names were regulated to adhere to OTel-Prometheus standards, so existing label names that do not follow them may be replaced by ones that do. Please refer to the PR for more details.
- Label and annotation metrics aren’t exposed by default anymore to reduce the memory usage of the default configuration of kube-state-metrics. Before this change, they only included the name and namespace of the objects, which is not useful to users who have not opted in to these metrics.
node-exporter 1.7 (Seed MLA)
This new version comes with a few minor backwards-incompatible changes:
- metrics of offline CPUs in CPU collector were removed
- bcache cache_readaheads_totals metrics were removed
- ntp collector was deprecated
- supervisord collector was deprecated
Prometheus 2.51 (Seed MLA)
Prometheus had many improvements and some changes to the remote-write functionality that might affect you:
- Remote-write:
  - raise default samples per send to 2,000
  - respect Retry-After header on 5xx errors
  - error storage.ErrTooOldSample is now generating HTTP error 400 instead of HTTP error 500
- Scraping:
  - Do experimental timestamp alignment even if tolerance is bigger than 1% of scrape interval
nginx-ingress-controller 1.10
nginx v1.10 brings quite a few potentially breaking changes:
- does not support the chroot image (this will be fixed in a future minor patch release)
- dropped the OpenTracing and Zipkin modules; only OpenTelemetry is supported as of this release
- dropped support for PodSecurityPolicy
- dropped support for GeoIP (legacy), only GeoIP2 is supported
- The automatically generated NetworkPolicy from nginx 1.9.3 is now disabled by default; refer to https://github.com/kubernetes/ingress-nginx/pull/10238 for more information. A values sketch for re-enabling it follows below.
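If you depended on that NetworkPolicy, it can presumably be re-enabled via the upstream chart values; a sketch for the nginx-ingress-controller section of your values.yaml (verify the exact nesting against your file):

controller:
  networkPolicy:
    enabled: true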
Dex 2.40
The validation of username and password in the LDAP connector is much more strict now. Dex uses the EscapeFilter function to check for special characters in credentials and prevent injections by denying such requests. Please ensure this is not an issue before upgrading.
Additionally, the custom oauth Helm chart in KKP has been deprecated and will be replaced in KKP 2.27 by a new dex Helm chart based on the official upstream chart.
Upgrade Procedure
Before starting the upgrade, make sure your KKP Master and Seed clusters are healthy with no failing or pending Pods. If any Pod is showing problems, investigate and fix the individual problems before applying the upgrade. This includes the control plane components for user clusters; unhealthy user clusters should not be submitted to an upgrade.
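A quick, non-exhaustive way to spot problematic Pods is to filter out everything that is Running or Completed (also keep an eye on restart counts and not-ready containers):

$ kubectl get pods --all-namespaces | grep -Ev 'Running|Completed'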
KKP Master Upgrade
Download the latest 2.26.x release archive for the correct edition (ce for Community Edition, ee for Enterprise Edition) from the release page and extract it locally on your computer. Make sure you have the values.yaml you used to deploy KKP 2.25 available and already adjusted for any 2.26 changes (also see Pre-Upgrade Considerations), as you need to pass it to the installer. The KubermaticConfiguration is no longer necessary (unless you are adjusting it), as the KKP operator will use its in-cluster representation. From within the extracted directory, run the installer:
$ ./kubermatic-installer deploy kubermatic-master --helm-values path/to/values.yaml
# example output for a successful upgrade
INFO[0000] 🚀 Initializing installer… edition="Enterprise Edition" version=v2.26.0
INFO[0000] 🚦 Validating the provided configuration…
WARN[0000] Helm values: kubermaticOperator.imagePullSecret is empty, setting to spec.imagePullSecret from KubermaticConfiguration
INFO[0000] ✅ Provided configuration is valid.
INFO[0000] 🚦 Validating existing installation…
INFO[0001] Checking seed cluster… seed=kubermatic
INFO[0001] ✅ Existing installation is valid.
INFO[0001] 🛫 Deploying KKP master stack…
INFO[0001] 💾 Deploying kubermatic-fast StorageClass…
INFO[0001] ✅ StorageClass exists, nothing to do.
INFO[0001] 📦 Deploying nginx-ingress-controller…
INFO[0001] Deploying Helm chart…
INFO[0002] Updating release from 2.25.3 to 2.26.0…
INFO[0005] ✅ Success.
INFO[0005] 📦 Deploying cert-manager…
INFO[0005] Deploying Custom Resource Definitions…
INFO[0006] Deploying Helm chart…
INFO[0007] Updating release from 2.25.3 to 2.26.0…
INFO[0026] ✅ Success.
INFO[0026] 📦 Deploying Dex…
INFO[0027] Updating release from 2.25.3 to 2.26.0…
INFO[0030] ✅ Success.
INFO[0030] 📦 Deploying Kubermatic Operator…
INFO[0030] Deploying Custom Resource Definitions…
INFO[0034] Deploying Helm chart…
INFO[0035] Updating release from 2.25.3 to 2.26.0…
INFO[0064] ✅ Success.
INFO[0064] 📦 Deploying Telemetry
INFO[0065] Updating release from 2.25.3 to 2.26.0…
INFO[0066] ✅ Success.
INFO[0066] 📡 Determining DNS settings…
INFO[0066] The main LoadBalancer is ready.
INFO[0066]
INFO[0066] Service : nginx-ingress-controller / nginx-ingress-controller
INFO[0066] Ingress via hostname: <Load Balancer>.eu-central-1.elb.amazonaws.com
INFO[0066]
INFO[0066] Please ensure your DNS settings for "<KKP FQDN>" include the following records:
INFO[0066]
INFO[0066] <KKP FQDN>. IN CNAME <Load Balancer>.eu-central-1.elb.amazonaws.com.
INFO[0066] *.<KKP FQDN>. IN CNAME <Load Balancer>.eu-central-1.elb.amazonaws.com.
INFO[0066]
INFO[0066] 🛬 Installation completed successfully. ✌
Upgrading seed clusters is not necessary, unless you are running the minio Helm chart or User Cluster MLA as distributed by KKP on them. They will be automatically upgraded by KKP components.
You can follow the upgrade process by either supervising the Pods on master and seed clusters (by simply checking kubectl get pods -n kubermatic frequently) or by checking status information for the Seed objects. A possible command to extract the current status by seed would be:
$ kubectl get seeds -A -o jsonpath="{range .items[*]}{.metadata.name} - {.status}{'\n'}{end}"
kubermatic - {"clusters":5,"conditions":{"ClusterInitialized":{"lastHeartbeatTime":"2024-03-11T10:53:34Z","message":"All KKP CRDs have been installed successfully.","reason":"CRDsUpdated","status":"True"},"KubeconfigValid":{"lastHeartbeatTime":"2024-03-11T16:50:09Z","reason":"KubeconfigValid","status":"True"},"ResourcesReconciled":{"lastHeartbeatTime":"2024-03-11T16:50:14Z","reason":"ReconcilingSuccess","status":"True"}},"phase":"Healthy","versions":{"cluster":"v1.27.11","kubermatic":"v2.25.0"}}
Of particular interest to the upgrade process is whether the ResourcesReconciled condition succeeded and whether the versions.kubermatic field is showing the target KKP version. If this is not the case yet, the upgrade is still in flight. If the upgrade is stuck, try kubectl -n kubermatic describe seed <seed name> to see what exactly is keeping the KKP Operator from updating the Seed cluster.
Post-Upgrade Considerations
Deprecations and Removals
Some functionality of KKP has been deprecated or removed with KKP 2.26. You should review the full changelog and adjust any automation or scripts that might be using deprecated fields or features. Below is a list of changes that might affect you:
Next Steps