Known Issues

This page documents known issues in Kubermatic KubeOne, along with possible workarounds and recommendations.

This list applies to KubeOne 1.6 releases. For KubeOne 1.5, please refer to the v1.5 version of this document. For earlier releases, please consult the appropriate changelog.

node-role.kubernetes.io/master taint not removed on upgrade when using KubeOne 1.6.0-rc.1

Status: Fixed in KubeOne 1.6.0
Severity: Critical
GitHub issue: https://github.com/kubermatic/kubeone/pull/2688

Users who:

  • used KubeOne 1.6.0-rc.1 or a KubeOne binary built manually from a commit up to 8291a9f, AND
  • provisioned clusters running Kubernetes 1.25, OR upgraded clusters from Kubernetes 1.24 to Kubernetes 1.25

are affected by this issue.

Description

Kubernetes removed the node-role.kubernetes.io/master taint in 1.25. However, KubeOne had a bug that kept enforcing this taint on clusters running versions older than 1.26. Even though KubeOne no longer applies the taint to 1.26 clusters, kubeadm will not remove it when upgrading to 1.26, because the migration logic that used to remove the taint was itself removed in 1.26.

Recommendation

If you’re affected by this issue, you have to manually remove the taint from the affected control plane nodes. You can do that by using the following command:

kubectl taint nodes node-role.kubernetes.io/master- --all
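To verify that the taint is gone, you can list the taints on all nodes, for example:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'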

Not doing so might cause a major outage, because components deployed by both KubeOne and kubeadm no longer tolerate the node-role.kubernetes.io/master taint.

Cilium CNI is not working on clusters running CentOS 7

Status: Known Issue
Severity: Low
GitHub issue: N/A

Description

Cilium CNI is not supported on CentOS 7 because CentOS 7 ships a kernel (3.10) that is older than the minimum kernel version supported by Cilium. For more details, see the official Cilium documentation.

Recommendation

Please consider using an operating system with a newer kernel version, such as Ubuntu, Rocky Linux, or Flatcar. See the official Cilium documentation for a list of operating systems and versions supported by Cilium.
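You can check the kernel version on an existing node with:

uname -r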

Pod connectivity is broken for Calico VXLAN clusters

Status: Being Investigated
Severity: High for clusters using the Calico VXLAN addon
GitHub issue: https://github.com/kubermatic/kubeone/issues/2192

Description

In clusters running Calico VXLAN, pods might be unable to reach ClusterIP Services from the node where the pod is running.

Recommendation

We do NOT recommend upgrading to KubeOne 1.5 or 1.6 at this time if you’re using Calico VXLAN. Follow the linked GitHub issue and this page for updates.
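A rough way to check whether a cluster is affected (a sketch only; the kubernetes Service in the default namespace is used purely as an illustration) is to query a ClusterIP Service from one of the nodes:

# Get the ClusterIP of the kubernetes Service
kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'
# From an affected node, this request may time out
curl -k --connect-timeout 5 https://<cluster-ip>:443/version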

KubeOne is failing to provision a cluster on upgraded Flatcar VMs

Status: Workaround available
Severity: Low
GitHub issue: https://github.com/kubermatic/kubeone/issues/2318

Description

KubeOne is failing to provision a cluster on Flatcar VMs that were upgraded from a version older than 2969.0.0 to a newer version. This only affects VMs that were never used with KubeOne; existing KubeOne clusters are not affected by this issue.

Recommendation

If you’re affected by this issue, we recommend creating VMs with a newer Flatcar version or following the cgroups v2 migration instructions.
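Affected VMs are typically still running with cgroups v1. You can check which cgroup version a VM uses with, for example:

stat -fc %T /sys/fs/cgroup/
# cgroup2fs indicates cgroups v2, tmpfs indicates cgroups v1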

Networking issues with Cilium and Systemd based distributions

Status: Workaround available
Severity: High
GitHub issue: https://github.com/cilium/cilium/issues/18706

Description

A KubeOne cluster with Cilium CNI running on a systemd-based distribution can get into an unstable network state, because the default configuration does not necessarily meet Cilium’s requirements for systemd-based distributions.

An update to systemd introduced an incompatibility with Cilium: since that change, systemd-networkd manages foreign (externally created) routes by default. On a network change, this can cause systemd to delete routes owned by Cilium.

Recommendation

  • Adjust systemd manually based on the Cilium requirements (see the shell sketch after the profile below).

  • Use a custom OSP and configure systemd:

apiVersion: operatingsystemmanager.k8c.io/v1alpha1
kind: CustomOperatingSystemProfile
metadata:
  name: cilium-ubuntu
  namespace: kubermatic
spec:
  bootstrapConfig:
    files:
      - content:
          inline:
            data: |
              [Network]
              ManageForeignRoutes=no
              ManageForeignRoutingPolicyRules=no
            encoding: b64
        path: /etc/systemd/networkd.conf
        permissions: 644
    modules:
      runcmd:
        - systemctl restart systemd-networkd.service
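
For the first option, the same settings can be applied manually on existing nodes, for example via a systemd-networkd drop-in (a sketch, assuming systemd-networkd manages the node’s network):

sudo mkdir -p /etc/systemd/networkd.conf.d
cat <<EOF | sudo tee /etc/systemd/networkd.conf.d/10-cilium.conf
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no
EOF
sudo systemctl restart systemd-networkd.service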

Internal Kubernetes endpoints unreachable on vSphere with Cilium/Canal

Status: Workaround available
Severity: Low
GitHub issue: https://github.com/cilium/cilium/issues/21801

Description

Symptoms

  • Unable to perform CRUD operations on resources governed by webhooks (e.g. ValidatingWebhookConfiguration, MutatingWebhookConfiguration, etc.). The following error is observed:
    Internal error occurred: failed calling webhook "webhook-name": failed to call webhook: Post "https://webhook-service-name.namespace.svc:443/webhook-endpoint": context deadline exceeded
  • Unable to reach internal Kubernetes endpoints from pods/nodes.
  • ICMP is working but TCP/UDP is not.

Cause

On a recent enough VMware hardware compatibility version (i.e. version 15 or higher, possibly also 14), CNI connectivity breaks because of hardware segmentation offload. cilium-health status shows ICMP connectivity working, but TCP connectivity failing. cilium-health status may also fail completely.
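
You can run cilium-health from inside one of the Cilium agent pods, for example (the pod name is a placeholder):

kubectl -n kube-system exec <cilium-agent-pod> -- cilium-health status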

Recommendation

Disable the problematic segmentation offload features on the uplink interface (ens192 in this example):

sudo ethtool -K ens192 tx-udp_tnl-segmentation off
sudo ethtool -K ens192 tx-udp_tnl-csum-segmentation off

These flags are related to the hardware segmentation offload done by the vSphere network driver VMXNET3. We have observed this issue with both Cilium and Canal CNI running on Ubuntu 22.04.

We have two options to configure these flags for KubeOne installations:
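
For example, the flags could be applied at node provisioning time by mirroring the custom OperatingSystemProfile approach shown earlier (a sketch only; the profile name and the interface name ens192 are assumptions, and the exact procedure may differ):

apiVersion: operatingsystemmanager.k8c.io/v1alpha1
kind: CustomOperatingSystemProfile
metadata:
  name: vsphere-offload-workaround
  namespace: kubermatic
spec:
  bootstrapConfig:
    modules:
      runcmd:
        - ethtool -K ens192 tx-udp_tnl-segmentation off
        - ethtool -K ens192 tx-udp_tnl-csum-segmentation off

Note that ethtool settings applied this way do not persist across reboots, so a systemd unit or a similar mechanism may be needed to make the change permanent.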
