Known Issues

This page documents known issues in Kubermatic KubeOne, along with possible workarounds and recommendations.

This list applies to KubeOne 1.5 releases. For earlier releases, please consult the appropriate changelog.

Pod connectivity is broken for Calico VXLAN clusters

Status: Being Investigated
Severity: High for clusters using the Calico VXLAN addon
GitHub issue: https://github.com/kubermatic/kubeone/issues/2192

Description

Clusters running Calico VXLAN might be unable to reach ClusterIP Services from the node where the backing pod is running.
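A quick way to check for the symptom from an affected node (a sketch; the kubernetes Service in the default namespace is just a convenient ClusterIP target, and kubectl/curl availability on the node is assumed):

# Resolve the ClusterIP of the default kubernetes Service, then try to reach
# it from the node; on affected clusters the request times out.
SVC_IP=$(kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}')
curl -k --max-time 5 "https://${SVC_IP}:443/healthz" || echo "ClusterIP unreachable"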

Recommendation

We do NOT recommend upgrading to KubeOne 1.5 at this time if you’re using Calico VXLAN. Follow the linked GitHub issue and this page for updates.

Canal CNI crashing after upgrading from KubeOne 1.x to 1.5

Status: Fixed in KubeOne 1.5.4
Severity: Low; rare issue affecting only clusters created with older KubeOne versions
GitHub issue: https://github.com/projectcalico/calico/issues/6442

Description

Clusters created with older KubeOne versions might be affected by an issue where Canal (Calico) pods are stuck in CrashLoopBackOff after upgrading to KubeOne 1.5. This is caused by upgrading Canal from an older version to v3.23.

Recommendation

This issue is fixed in Calico v3.23.4+, which is used in KubeOne 1.5.4 and newer.

If you’re encountering this issue, we strongly recommend upgrading to the latest KubeOne patch release and running kubeone apply to upgrade your Canal CNI.
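For example (file names are placeholders for your own manifest and Terraform state):

# After upgrading the kubeone binary to the latest 1.5.x patch release:
kubeone apply --manifest kubeone.yaml --tfjson tf.json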

Alternatively, you can work around the issue by manually modifying the default-ipv4-ippool ippools.crd.projectcalico.org object to set vxlanMode: Never. See the linked upstream GitHub issue for more details.
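A minimal sketch of that manual change, assuming kubectl access to the cluster (review the upstream issue before applying it):

# Set vxlanMode to Never on the default Calico IP pool.
kubectl patch ippools.crd.projectcalico.org default-ipv4-ippool \
  --type=merge -p '{"spec":{"vxlanMode":"Never"}}'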

KubeOne is failing to provision a cluster on upgraded Flatcar VMs

Status: Workaround available
Severity: Low
GitHub issue: https://github.com/kubermatic/kubeone/issues/2318

Description

KubeOne fails to provision a cluster on Flatcar VMs that were upgraded from a version older than 2969.0.0 to a newer version. This affects only VMs that have never been used with KubeOne; existing KubeOne clusters are not affected by this issue.

Recommendation

If you’re affected by this issue, we recommend creating VMs with a newer Flatcar version or following the cgroups v2 migration instructions.
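To check which cgroups version a Flatcar VM is using before provisioning (a quick diagnostic; not a KubeOne command):

# Prints cgroup2fs on cgroups v2 systems and tmpfs on cgroups v1 systems.
stat -fc %T /sys/fs/cgroup/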

vSphere CSI webhook certificates are generated with an invalid domain/FQDN

Status: Fixed by #2366 in KubeOne 1.5.1
Severity: High
GitHub issue: https://github.com/kubermatic/kubeone/issues/2350

Description

In KubeOne 1.5.0 we moved the vSphere CSI driver from the kube-system namespace to the vmware-system-csi namespace. However, we didn’t update the domain/FQDN in certificates for CSI webhooks to use the new namespace. This causes issues when communicating with the CSI webhooks as described in the GitHub issue.

Recommendation

This issue has been fixed in KubeOne 1.5.1, so we advise upgrading your KubeOne installation to 1.5.1 or newer. You need to run kubeone apply to regenerate certificates after upgrading KubeOne.
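If you want to verify the regenerated certificate afterwards, a hedged sketch (substitute the actual webhook Service name from the vmware-system-csi namespace):

# Forward the webhook Service locally and inspect the certificate SANs;
# they should now reference the vmware-system-csi namespace.
kubectl -n vmware-system-csi port-forward svc/<webhook-service> 8443:443 &
openssl s_client -connect 127.0.0.1:8443 </dev/null 2>/dev/null \
  | openssl x509 -noout -ext subjectAltName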

CoreDNS PodDisruptionBudget is not cleaned up when disabled

Status: Fixed by #2364 in KubeOne 1.5.1
Severity: Low
GitHub issue: https://github.com/kubermatic/kubeone/issues/2322

Description

If the CoreDNS PodDisruptionBudget is enabled in the KubeOneCluster API, and then disabled, kubeone apply will not remove the PDB object from the cluster.

Recommendation

This issue has been fixed in KubeOne 1.5.1, so we advise upgrading your KubeOne installation to 1.5.1 or newer.
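If you need to stay on an older release for now, the leftover object can also be removed manually (a sketch; check the actual PDB name in your cluster first):

# List PodDisruptionBudgets in kube-system, then delete the CoreDNS one.
kubectl -n kube-system get poddisruptionbudgets
kubectl -n kube-system delete poddisruptionbudget <coredns-pdb-name>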

kubeone apply might fail to recover if the SSH connection is interrupted

Status: Fixed by #2345 in KubeOne 1.5.1
Severity: Low
GitHub issue: https://github.com/kubermatic/kubeone/issues/2319

Description

kubeone apply might fail if the SSH connection is interrupted (e.g. a VM is restarted while kubeone apply is running).

Recommendation

This issue has been fixed in KubeOne 1.5.1, so we advise upgrading your KubeOne installation to 1.5.1 or newer.

Internal Kubernetes endpoints unreachable on vSphere with Cilium/Canal

Status: Workaround available
Severity: Low
GitHub issue: https://github.com/cilium/cilium/issues/21801

Description

Symptoms

  • Unable to perform CRUD operations on resources governed by webhooks (e.g. ValidatingWebhookConfiguration, MutatingWebhookConfiguration, etc.). The following error is observed:

    Internal error occurred: failed calling webhook "webhook-name": failed to call webhook: Post "https://webhook-service-name.namespace.svc:443/webhook-endpoint": context deadline exceeded

  • Unable to reach internal Kubernetes endpoints from pods/nodes.
  • ICMP is working, but TCP/UDP is not.

Cause

On a recent enough VMware hardware compatibility version (i.e. version 15 or newer, possibly 14 or newer), CNI connectivity breaks because of hardware segmentation offload. cilium-health status reports working ICMP connectivity but failing TCP connectivity; it may also fail completely.
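You can check whether these offload features are enabled on a node (the interface name ens192 is an example and depends on your VM):

# On affected nodes, both features are typically reported as "on".
ethtool -k ens192 | grep tx-udp_tnl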

Recommendation

The workaround is to disable the relevant segmentation offload features on the affected network interface (ens192 in these examples):

sudo ethtool -K ens192 tx-udp_tnl-segmentation off
sudo ethtool -K ens192 tx-udp_tnl-csum-segmentation off

These flags are related to the hardware segmentation offload done by the vSphere driver VMXNET3. We have observed this issue for both Cilium and Canal CNI running on Ubuntu 22.04.

There are a couple of options for configuring these flags persistently for KubeOne installations.
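As an illustration only (the unit name, file path, and interface are assumptions, not an officially documented KubeOne option), the flags could be persisted across reboots with a oneshot systemd unit:

# Write a oneshot unit that reapplies the ethtool settings at boot.
cat <<'EOF' | sudo tee /etc/systemd/system/disable-vmxnet3-offload.service
[Unit]
Description=Disable VMXNET3 UDP tunnel segmentation offload
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-segmentation off
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-csum-segmentation off

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now disable-vmxnet3-offload.service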
