Customize Operating System Profiles

One of the unique selling propositions of OSM is that it allows users to use custom OperatingSystemProfiles (OSPs). OSPs can be tailored to various use cases: making them compatible with your infrastructure, installing and configuring custom or additional packages, or customizing the configuration of kubelet, the container runtime, networking, storage, and so on.

For customization, end users should create new, dedicated OSPs. Updating the "default" OSPs is not recommended since they are managed by OSM itself and any changes to them are reverted automatically.

Create a custom OSP

As an example, let's take the default OSP for Ubuntu and modify it to enable swap memory.
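
The easiest starting point is a copy of the existing default profile. A minimal sketch, assuming the default OSP is named osp-ubuntu and lives in the kube-system namespace (adjust the namespace to wherever OSM runs in your setup):

kubectl get operatingsystemprofile osp-ubuntu -n kube-system -o yaml > osp-ubuntu-swap-enabled.yaml
# Rename it (metadata.name), drop server-managed metadata (resourceVersion, uid, etc.),
# and apply the swap-related changes shown in the diff further below.

The resulting custom OSP looks like this: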

apiVersion: operatingsystemmanager.k8c.io/v1alpha1
kind: OperatingSystemProfile
metadata:
  name: osp-ubuntu-swap-enabled
  namespace: kube-system
spec:
  osName: "ubuntu"
  osVersion: "20.04"
  version: "v1.0.0"
  provisioningUtility: "cloud-init"
  supportedCloudProviders:
    - name: "aws"

  bootstrapConfig:
    files:
      - path: /opt/bin/bootstrap
        permissions: 755
        content:
          inline:
            encoding: b64
            data: |
              #!/bin/bash
              set -xeuo pipefail

              export DEBIAN_FRONTEND=noninteractive
              apt update && apt install -y curl jq
              curl -s -k -v --header 'Authorization: Bearer {{ .Token }}' {{ .ServerURL }}/api/v1/namespaces/cloud-init-settings/secrets/{{ .SecretName }} | jq '.data["cloud-config"]' -r | base64 -d > /etc/cloud/cloud.cfg.d/{{ .SecretName }}.cfg
              cloud-init clean

              {{- /* The default cloud-init configuration files have a bug on DigitalOcean that makes the machine inaccessible on the second cloud-init run, and in the case of Hetzner, IPv6 addresses are missing. Hence we disable network configuration. */}}
              {{- if (or (eq .CloudProviderName "digitalocean") (eq .CloudProviderName "hetzner")) }}
              rm /etc/netplan/50-cloud-init.yaml
              echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-custom-networking.cfg
              {{- end }}

              cloud-init --file /etc/cloud/cloud.cfg.d/{{ .SecretName }}.cfg init
              systemctl daemon-reload

              {{- if eq .CloudProviderName "digitalocean" }}
              netplan generate
              netplan apply
              {{- end }}

              systemctl daemon-reload
              systemctl restart setup.service

      - path: /etc/systemd/system/bootstrap.service
        permissions: 644
        content:
          inline:
            encoding: b64
            data: |
              [Install]
              WantedBy=multi-user.target

              [Unit]
              Requires=network-online.target
              After=network-online.target
              [Service]
              Type=oneshot
              RemainAfterExit=true
              ExecStart=/opt/bin/bootstrap

    modules:
      runcmd:
        - systemctl restart bootstrap.service
        - systemctl daemon-reload

  provisioningConfig:
    supportedContainerRuntimes:
      - name: containerd
        files:
          - path: /etc/systemd/system/containerd.service.d/environment.conf
            content:
              inline:
                data: |
                  [Service]
                  Restart=always
                  EnvironmentFile=-/etc/environment

          - path: /etc/crictl.yaml
            content:
              inline:
                data: |
                  runtime-endpoint: unix:///run/containerd/containerd.sock

          - path: /etc/containerd/config.toml
            permissions: 600
            content:
              inline:
                encoding: b64
                data: |
                  {{ .ContainerRuntimeConfig }}
        templates:
          containerRuntimeInstallation: |-
            apt-get update
            apt-get install -y apt-transport-https ca-certificates curl software-properties-common lsb-release
            curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
            add-apt-repository "deb https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

            apt-get install -y --allow-downgrades containerd.io=1.5*
            apt-mark hold containerd.io

            systemctl daemon-reload
            systemctl enable --now containerd

    templates:
      safeDownloadBinariesScript: |-
        {{- /* setup some common directories */}}
        opt_bin=/opt/bin
        usr_local_bin=/usr/local/bin
        cni_bin_dir=/opt/cni/bin

        {{- /* create all the necessary dirs */}}
        mkdir -p /etc/cni/net.d /etc/kubernetes/manifests "$opt_bin" "$cni_bin_dir"
        {{- /* HOST_ARCH can be defined outside of machine-controller (in kubeone for example) */}}
        arch=${HOST_ARCH-}
        if [ -z "$arch" ]
        then
        case $(uname -m) in
        x86_64)
            arch="amd64"
            ;;
        aarch64)
            arch="arm64"
            ;;
        *)
            echo "unsupported CPU architecture, exiting"
            exit 1
            ;;
        esac
        fi

        {{- /* # CNI variables */}}
        CNI_VERSION="${CNI_VERSION:-v0.8.7}"
        cni_base_url="https://github.com/containernetworking/plugins/releases/download/$CNI_VERSION"
        cni_filename="cni-plugins-linux-$arch-$CNI_VERSION.tgz"

        {{- /* download CNI */}}
        curl -Lfo "$cni_bin_dir/$cni_filename" "$cni_base_url/$cni_filename"

        {{- /* download CNI checksum */}}
        cni_sum=$(curl -Lf "$cni_base_url/$cni_filename.sha256")
        cd "$cni_bin_dir"

        {{- /* verify CNI checksum */}}
        sha256sum -c <<<"$cni_sum"

        {{- /* unpack CNI */}}
        tar xvf "$cni_filename"
        rm -f "$cni_filename"
        cd -

        {{- /* # cri-tools variables */}}
        CRI_TOOLS_RELEASE="${CRI_TOOLS_RELEASE:-v1.22.0}"
        cri_tools_base_url="https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRI_TOOLS_RELEASE}"
        cri_tools_filename="crictl-${CRI_TOOLS_RELEASE}-linux-${arch}.tar.gz"

        {{- /* download cri-tools */}}
        curl -Lfo "$opt_bin/$cri_tools_filename" "$cri_tools_base_url/$cri_tools_filename"

        {{- /* download cri-tools checksum */}}
        {{- /* the cri-tools checksum file has a filename prefix that breaks sha256sum so we need to drop it with sed */}}
        cri_tools_sum=$(curl -Lf "$cri_tools_base_url/$cri_tools_filename.sha256" | sed 's/\*\///')
        cd "$opt_bin"

        {{- /* verify cri-tools checksum */}}
        sha256sum -c <<<"$cri_tools_sum"

        {{- /* unpack cri-tools and symlink to path so it's available to all users */}}
        tar xvf "$cri_tools_filename"
        rm -f "$cri_tools_filename"
        ln -sf "$opt_bin/crictl" "$usr_local_bin"/crictl || echo "symbolic link is skipped"
        cd -

        {{- /* kubelet */}}
        KUBE_VERSION="${KUBE_VERSION:-{{ .KubeVersion }}}"
        kube_dir="$opt_bin/kubernetes-$KUBE_VERSION"
        kube_base_url="https://storage.googleapis.com/kubernetes-release/release/$KUBE_VERSION/bin/linux/$arch"
        kube_sum_file="$kube_dir/sha256"

        {{- /* create versioned kube dir */}}
        mkdir -p "$kube_dir"
        : >"$kube_sum_file"

        for bin in kubelet kubeadm kubectl; do
            {{- /* download kube binary */}}
            curl -Lfo "$kube_dir/$bin" "$kube_base_url/$bin"
            chmod +x "$kube_dir/$bin"

            {{- /* download kube binary checksum */}}
            sum=$(curl -Lf "$kube_base_url/$bin.sha256")

            {{- /* save kube binary checksum */}}
            echo "$sum  $kube_dir/$bin" >>"$kube_sum_file"
        done

        {{- /* check kube binaries checksum */}}
        sha256sum -c "$kube_sum_file"

        for bin in kubelet kubeadm kubectl; do
            {{- /* link kube binaries from versioned dir to $opt_bin */}}
            ln -sf "$kube_dir/$bin" "$opt_bin"/$bin
        done

      configureProxyScript: |-
        {{- if .HTTPProxy }}
        cat <<EOF | tee -a /etc/environment
        HTTP_PROXY={{ .HTTPProxy }}
        http_proxy={{ .HTTPProxy }}
        HTTPS_PROXY={{ .HTTPProxy }}
        https_proxy={{ .HTTPProxy }}
        NO_PROXY={{ .NoProxy }}
        no_proxy={{ .NoProxy }}
        EOF
        {{- end }}

    files:
      - path: /opt/bin/health-monitor.sh
        permissions: 755
        content:
          inline:
            encoding: b64
            data: |
              #!/usr/bin/env bash

              # Copyright 2016 The Kubernetes Authors.
              #
              # Licensed under the Apache License, Version 2.0 (the "License");
              # you may not use this file except in compliance with the License.
              # You may obtain a copy of the License at
              #
              #     http://www.apache.org/licenses/LICENSE-2.0
              #
              # Unless required by applicable law or agreed to in writing, software
              # distributed under the License is distributed on an "AS IS" BASIS,
              # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
              # See the License for the specific language governing permissions and
              # limitations under the License.

              # This script is for master and node instance health monitoring, which is
              # packed in kube-manifest tarball. It is executed through a systemd service
              # in cluster/gce/gci/<master/node>.yaml. The env variables come from an env
              # file provided by the systemd service.

              # This script is a slightly adjusted version of
              # https://github.com/kubernetes/kubernetes/blob/e1a1aa211224fcd9b213420b80b2ae680669683d/cluster/gce/gci/health-monitor.sh
              # Adjustments are:
              # * Kubelet health port is 10248 not 10255
              # * Removal of all references to the KUBE_ENV file

              set -o nounset
              set -o pipefail

              # We simply kill the process when there is a failure. Another systemd service will
              # automatically restart the process.
              function container_runtime_monitoring() {
                local -r max_attempts=5
                local attempt=1
                local -r container_runtime_name="${CONTAINER_RUNTIME_NAME:-docker}"
                # We still need to use 'docker ps' when container runtime is "docker". This is because
                # dockershim is still part of kubelet today. When kubelet is down, crictl pods
                # will also fail, and docker will be killed. This is undesirable especially when
                # docker live restore is disabled.
                local healthcheck_command="docker ps"
                if [[ "${CONTAINER_RUNTIME:-docker}" != "docker" ]]; then
                  healthcheck_command="crictl pods"
                fi
                # Container runtime startup takes time. Make initial attempts before starting
                # killing the container runtime.
                until timeout 60 ${healthcheck_command} > /dev/null; do
                  if ((attempt == max_attempts)); then
                    echo "Max attempt ${max_attempts} reached! Proceeding to monitor container runtime healthiness."
                    break
                  fi
                  echo "$attempt initial attempt \"${healthcheck_command}\"! Trying again in $attempt seconds..."
                  sleep "$((2 ** attempt++))"
                done
                while true; do
                  if ! timeout 60 ${healthcheck_command} > /dev/null; then
                    echo "Container runtime ${container_runtime_name} failed!"
                    if [[ "$container_runtime_name" == "docker" ]]; then
                      # Dump stack of docker daemon for investigation.
                      # Log file name looks like goroutine-stacks-TIMESTAMP and will be saved to
                      # the exec root directory, which is /var/run/docker/ on Ubuntu and COS.
                      pkill -SIGUSR1 dockerd
                    fi
                    systemctl kill --kill-who=main "${container_runtime_name}"
                    # Wait for a while, as we don't want to kill it again before it is really up.
                    sleep 120
                  else
                    sleep "${SLEEP_SECONDS}"
                  fi
                done
              }

              function kubelet_monitoring() {
                echo "Wait for 2 minutes for kubelet to be functional"
                sleep 120
                local -r max_seconds=10
                local output=""
                while true; do
                  local failed=false

                  if journalctl -u kubelet -n 1 | grep -q "use of closed network connection"; then
                    failed=true
                    echo "Kubelet stopped posting node status. Restarting"
                  elif ! output=$(curl -m "${max_seconds}" -f -s -S http://127.0.0.1:10248/healthz 2>&1); then
                    failed=true
                    # Print the response and/or errors.
                    echo "$output"
                  fi

                  if [[ "$failed" == "true" ]]; then
                    echo "Kubelet is unhealthy!"
                    systemctl kill kubelet
                    # Wait for a while, as we don't want to kill it again before it is really up.
                    sleep 60
                  else
                    sleep "${SLEEP_SECONDS}"
                  fi
                done
              }

              ############## Main Function ################
              if [[ "$#" -ne 1 ]]; then
                echo "Usage: health-monitor.sh <container-runtime/kubelet>"
                exit 1
              fi

              SLEEP_SECONDS=10
              component=$1
              echo "Start kubernetes health monitoring for ${component}"
              if [[ "${component}" == "container-runtime" ]]; then
                container_runtime_monitoring
              elif [[ "${component}" == "kubelet" ]]; then
                kubelet_monitoring
              else
                echo "Health monitoring for component ${component} is not supported!"
              fi

      - path: /etc/systemd/journald.conf.d/max_disk_use.conf
        content:
          inline:
            encoding: b64
            data: |
              [Journal]
              SystemMaxUse=5G

      - path: /opt/load-kernel-modules.sh
        permissions: 755
        content:
          inline:
            encoding: b64
            data: |
              #!/usr/bin/env bash
              set -euo pipefail

              modprobe ip_vs
              modprobe ip_vs_rr
              modprobe ip_vs_wrr
              modprobe ip_vs_sh

              if modinfo nf_conntrack_ipv4 &> /dev/null; then
                modprobe nf_conntrack_ipv4
              else
                modprobe nf_conntrack
              fi

      - path: /etc/sysctl.d/k8s.conf
        content:
          inline:
            encoding: b64
            data: |
              net.bridge.bridge-nf-call-ip6tables = 1
              net.bridge.bridge-nf-call-iptables = 1
              kernel.panic_on_oops = 1
              kernel.panic = 10
              net.ipv4.ip_forward = 1
              vm.overcommit_memory = 1
              fs.inotify.max_user_watches = 1048576
              fs.inotify.max_user_instances = 8192

      - path: /etc/default/grub.d/60-swap-accounting.cfg
        content:
          inline:
            encoding: b64
            data: |
              # Added by kubermatic machine-controller
              # Enable cgroups memory and swap accounting
              GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

      - path: /opt/bin/setup
        permissions: 755
        content:
          inline:
            encoding: b64
            data: |
              #!/bin/bash
              set -xeuo pipefail
              if systemctl is-active ufw; then systemctl stop ufw; fi
              systemctl mask ufw


              {{- /* As we added some modules and don't want to reboot, restart the service */}}
              systemctl restart systemd-modules-load.service
              sysctl --system

              # Override hostname if /etc/machine-name exists
              if [ -x "$(command -v hostnamectl)" ] && [ -s /etc/machine-name ]; then
                machine_name=$(cat /etc/machine-name)
                hostnamectl set-hostname ${machine_name}
              fi

              apt-get update

              DEBIAN_FRONTEND=noninteractive apt-get -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install -y \
                curl \
                ca-certificates \
                ceph-common \
                cifs-utils \
                conntrack \
                e2fsprogs \
                ebtables \
                ethtool \
                glusterfs-client \
                iptables \
                jq \
                kmod \
                openssh-client \
                nfs-common \
                socat \
                util-linux \
                {{- if or (eq .CloudProviderName "vsphere") (eq .CloudProviderName "vmware-cloud-director") }}
                open-vm-tools \
                {{- end }}
                {{- if eq .CloudProviderName "nutanix" }}
                open-iscsi \
                {{- end }}
                ipvsadm

              {{- /* iscsid service is required on Nutanix machines for CSI driver to attach volumes. */}}
              {{- if eq .CloudProviderName "nutanix" }}
              systemctl enable --now iscsid
              {{ end }}

              {{- template "containerRuntimeInstallation" }}

              {{- template "safeDownloadBinariesScript" }}

              {{- template "configureProxyScript" }}

              # set kubelet nodeip environment variable
              /opt/bin/setup_net_env.sh

              {{- /* fetch kubelet bootstrapping kubeconfig */}}
              curl -s -k -v --header 'Authorization: Bearer {{ .Token }}' {{ .ServerURL }}/api/v1/namespaces/cloud-init-settings/secrets/{{ .BootstrapKubeconfigSecretName }} | jq '.data["kubeconfig"]' -r | base64 -d > /etc/kubernetes/bootstrap-kubelet.conf

              systemctl enable --now kubelet
              systemctl enable --now --no-block kubelet-healthcheck.service
              systemctl disable setup.service

      - path: /opt/bin/supervise.sh
        permissions: 755
        content:
          inline:
            encoding: b64
            data: |
              #!/bin/bash
              set -xeuo pipefail
              while ! "$@"; do
                sleep 1
              done

      - path: /etc/systemd/system/kubelet.service
        content:
          inline:
            encoding: b64
            data: |
              [Unit]
              After={{ .ContainerRuntime }}.service
              Requires={{ .ContainerRuntime }}.service

              Description=kubelet: The Kubernetes Node Agent
              Documentation=https://kubernetes.io/docs/home/

              [Service]
              User=root
              Restart=always
              StartLimitInterval=0
              RestartSec=10
              CPUAccounting=true
              MemoryAccounting=true

              Environment="PATH=/opt/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin/"
              EnvironmentFile=-/etc/environment

              ExecStartPre=/bin/bash /opt/load-kernel-modules.sh
              ExecStartPre=/bin/bash /opt/bin/setup_net_env.sh
              ExecStart=/opt/bin/kubelet $KUBELET_EXTRA_ARGS \
                --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
                --kubeconfig=/var/lib/kubelet/kubeconfig \
                --config=/etc/kubernetes/kubelet.conf \
                --cert-dir=/etc/kubernetes/pki \
                {{- if .ExternalCloudProvider }}
                --cloud-provider=external \
                {{- else if .InTreeCCMAvailable }}
                --cloud-provider={{- .CloudProviderName }} \
                --cloud-config=/etc/kubernetes/cloud-config \
                {{- end }}
                {{- if ne .CloudProviderName "aws" }}
                --hostname-override=${KUBELET_HOSTNAME} \
                {{- else if and (eq .CloudProviderName "aws") (.ExternalCloudProvider) }}
                --hostname-override=${KUBELET_HOSTNAME} \
                {{- end }}
                {{- if semverCompare "<1.23" .KubeVersion }}
                --dynamic-config-dir=/etc/kubernetes/dynamic-config-dir \
                --feature-gates=DynamicKubeletConfig=true,NodeSwap=true \
                {{- end }}
                --exit-on-lock-contention \
                --lock-file=/tmp/kubelet.lock \
                {{- if .PauseImage }}
                --pod-infra-container-image={{ .PauseImage }} \
                {{- end }}
                {{- if .InitialTaints }}
                --register-with-taints={{- .InitialTaints }} \
                {{- end }}
                {{- if eq .ContainerRuntime "containerd" }}
                --container-runtime=remote \
                --container-runtime-endpoint=unix:///run/containerd/containerd.sock \
                {{- end }}
                {{- /* If external or in-tree CCM is in use we don't need to set --node-ip as the cloud provider will know what IPs to return.  */}}
                {{- if not (and (eq .NetworkIPFamily "IPv4+IPv6") (or (.InTreeCCMAvailable) (.ExternalCloudProvider))) }}
                --node-ip ${KUBELET_NODE_IP}
                {{- end }}

              [Install]
              WantedBy=multi-user.target

      - path: /etc/systemd/system/kubelet.service.d/extras.conf
        content:
          inline:
            encoding: b64
            data: |
              [Service]
              Environment="KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf"

      - path: /etc/kubernetes/cloud-config
        permissions: 600
        content:
          inline:
            encoding: b64
            data: |
              {{ .CloudConfig }}

      - path: /opt/bin/setup_net_env.sh
        permissions: 755
        content:
          inline:
            encoding: b64
            data: |
              #!/usr/bin/env bash
              echodate() {
                echo "[$(date -Is)]" "$@"
              }

              # get the default interface IP address
              {{- if eq .NetworkIPFamily "IPv6" }}
              DEFAULT_IFC_IP=$(ip -o -6 route get  1:: | grep -oP "src \K\S+")
              {{- else if eq .NetworkIPFamily "IPv4+IPv6" }}
              DEFAULT_IFC_IPv4=$(ip -o route get  1 | grep -oP "src \K\S+")
              DEFAULT_IFC_IPv6=$(ip -o -6 route get  1:: | grep -oP "src \K\S+")
              DEFAULT_IFC_IP=$DEFAULT_IFC_IPv4,$DEFAULT_IFC_IPv6
              {{- else }}
              DEFAULT_IFC_IP=$(ip -o  route get 1 | grep -oP "src \K\S+")
              {{- end }}

              if [ -z "${DEFAULT_IFC_IP}" ]
              then
                echodate "Failed to get IP address for the default route interface"
                exit 1
              fi

              # get the full hostname
              FULL_HOSTNAME=$(hostname -f)
              # if /etc/machine-name is not empty then use the hostname from there
              if [ -s /etc/machine-name ]; then
                FULL_HOSTNAME=$(cat /etc/machine-name)
              fi

              # write the nodeip_env file
              # we need the line below because flatcar has the same string "coreos" in that file
              if grep -q coreos /etc/os-release
              then
                echo "KUBELET_NODE_IP=${DEFAULT_IFC_IP}\nKUBELET_HOSTNAME=${FULL_HOSTNAME}" > /etc/kubernetes/nodeip.conf
              elif [ ! -d /etc/systemd/system/kubelet.service.d ]
              then
                echodate "Can't find kubelet service extras directory"
                exit 1
              else
                echo -e "[Service]\nEnvironment=\"KUBELET_NODE_IP=${DEFAULT_IFC_IP}\"\nEnvironment=\"KUBELET_HOSTNAME=${FULL_HOSTNAME}\"" > /etc/systemd/system/kubelet.service.d/nodeip.conf
              fi

      - path: /etc/kubernetes/pki/ca.crt
        content:
          inline:
            encoding: b64
            data: |
              {{ .KubernetesCACert }}

      - path: /etc/systemd/system/setup.service
        permissions: 644
        content:
          inline:
            encoding: b64
            data: |
              [Install]
              WantedBy=multi-user.target

              [Unit]
              Requires=network-online.target
              After=network-online.target

              [Service]
              Type=oneshot
              RemainAfterExit=true
              EnvironmentFile=-/etc/environment
              ExecStart=/opt/bin/supervise.sh /opt/bin/setup

      - path: /etc/profile.d/opt-bin-path.sh
        permissions: 644
        content:
          inline:
            encoding: b64
            data: |
              export PATH="/opt/bin:$PATH"

      - path: /etc/kubernetes/kubelet.conf
        content:
          inline:
            encoding: b64
            data: |
              apiVersion: kubelet.config.k8s.io/v1beta1
              kind: KubeletConfiguration
              authentication:
                anonymous:
                  enabled: false
                webhook:
                  cacheTTL: 0s
                  enabled: true
                x509:
                  clientCAFile: /etc/kubernetes/pki/ca.crt
              authorization:
                mode: Webhook
                webhook:
                  cacheAuthorizedTTL: 0s
                  cacheUnauthorizedTTL: 0s
              cgroupDriver: systemd
              clusterDNS:
              {{- range .ClusterDNSIPs }}
              - "{{ . }}"
              {{- end }}
              clusterDomain: cluster.local
              {{- if .ContainerLogMaxSize }}
              containerLogMaxSize: {{ .ContainerLogMaxSize }}
              {{- else }}
              containerLogMaxSize: 100Mi
              {{- end }}
              {{- if .ContainerLogMaxFiles }}
              containerLogMaxFiles: {{ .ContainerLogMaxFiles }}
              {{- else }}
              containerLogMaxFiles: 5
              {{- end }}
              cpuManagerReconcilePeriod: 0s
              evictionPressureTransitionPeriod: 0s
              featureGates:
              {{- if .KubeletFeatureGates -}}
                {{ range $key, $val := .KubeletFeatureGates }}
                {{ $key }}: {{ $val }}
                {{- end -}}
              {{- end }}
              fileCheckFrequency: 0s
              httpCheckFrequency: 0s
              imageMinimumGCAge: 0s
              logging:
                flushFrequency: 0
                options:
                  json:
                    infoBufferSize: "0"
                verbosity: 0
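              # Swap-related changes in this custom OSP (see the diff below):
              # do not fail when swap is enabled and restrict workload swap usage (LimitedSwap).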
              failSwapOn: false
              memorySwap:
                swapBehavior: LimitedSwap
              nodeStatusReportFrequency: 0s
              nodeStatusUpdateFrequency: 0s
              protectKernelDefaults: true
              readOnlyPort: 0
              rotateCertificates: true
              runtimeRequestTimeout: 0s
              serverTLSBootstrap: true
              shutdownGracePeriod: 0s
              shutdownGracePeriodCriticalPods: 0s
              staticPodPath: /etc/kubernetes/manifests
              streamingConnectionIdleTimeout: 0s
              syncFrequency: 0s
              kubeReserved:
              {{- if .KubeReserved -}}
                {{ range $key, $val := .KubeReserved }}
                {{ $key }}: {{ $val }}
                {{- end -}}
              {{- else }}
                cpu: 200m
                ephemeral-storage: 1Gi
                memory: 200Mi
              {{- end }}
              systemReserved:
              {{- if .SystemReserved -}}
                {{ range $key, $val := .SystemReserved }}
                {{ $key }}: {{ $val }}
                {{- end -}}
              {{- else }}
                cpu: 200m
                ephemeral-storage: 1Gi
                memory: 200Mi
              {{- end }}
              evictionHard:
              {{- if .EvictionHard -}}
                {{ range $key, $val := .EvictionHard }}
                {{ $key }}: {{ $val }}
                {{- end -}}
              {{- else }}
                imagefs.available: 15%
                memory.available: 100Mi
                nodefs.available: 10%
                nodefs.inodesFree: 5%
              {{- end }}
              {{- if .MaxPods }}
              maxPods: {{ .MaxPods }}
              {{- end }}
              tlsCipherSuites:
              - TLS_AES_128_GCM_SHA256
              - TLS_AES_256_GCM_SHA384
              - TLS_CHACHA20_POLY1305_SHA256
              - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
              - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
              - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
              - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
              - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
              - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
              volumePluginDir: /var/lib/kubelet/volumeplugins
              volumeStatsAggPeriod: 0s

      - path: /etc/systemd/system/kubelet-healthcheck.service
        permissions: 644
        content:
          inline:
            encoding: b64
            data: |
              [Unit]
              Requires=kubelet.service
              After=kubelet.service

              [Service]
              ExecStart=/opt/bin/health-monitor.sh kubelet

              [Install]
              WantedBy=multi-user.target


The changes we made to the OSP:

diff --git a/deploy/osps/default/osp-ubuntu.yaml b/deploy/osps/default/osp-ubuntu.yaml
index e72fe54..73d1d08 100644
--- a/deploy/osps/default/osp-ubuntu.yaml
+++ b/deploy/osps/default/osp-ubuntu.yaml
@@ -572,7 +572,6 @@ spec:
               Environment="PATH=/opt/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin/"
               EnvironmentFile=-/etc/environment

-              ExecStartPre=/bin/bash /opt/disable-swap.sh
               ExecStartPre=/bin/bash /opt/load-kernel-modules.sh
               ExecStartPre=/bin/bash /opt/bin/setup_net_env.sh
               ExecStart=/opt/bin/kubelet $KUBELET_EXTRA_ARGS \
@@ -596,7 +595,7 @@ spec:
                 {{- end }}
                 {{- if semverCompare "<1.23" .KubeVersion }}
                 --dynamic-config-dir=/etc/kubernetes/dynamic-config-dir \
-                --feature-gates=DynamicKubeletConfig=true \
+                --feature-gates=DynamicKubeletConfig=true,NodeSwap=true \
                 {{- end }}
                 --exit-on-lock-contention \
                 --lock-file=/tmp/kubelet.lock \
@@ -776,7 +775,9 @@ spec:
                   json:
                     infoBufferSize: "0"
                 verbosity: 0
-              memorySwap: {}
+              failSwapOn: false
+              memorySwap:
+                swapBehavior: LimitedSwap
               nodeStatusReportFrequency: 0s
               nodeStatusUpdateFrequency: 0s
               protectKernelDefaults: true
@@ -851,17 +852,3 @@ spec:

               [Install]
               WantedBy=multi-user.target
-
-      - path: /opt/disable-swap.sh
-        permissions: 755
-        content:
-          inline:
-            encoding: b64
-            data: |
-              #!/usr/bin/env bash
-              set -euo pipefail
-
-              # Make sure we always disable swap - Otherwise the kubelet won't start as for some cloud
-              # providers swap gets enabled on reboot or after the setup script has finished executing.
-              sed -i.orig '/.*swap.*/d' /etc/fstab
-              swapoff -a
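
Once the manifest is ready, create the OSP in the cluster where OSM is running. A minimal sketch, assuming the manifest above was saved as osp-ubuntu-swap-enabled.yaml:

kubectl apply -f osp-ubuntu-swap-enabled.yaml
kubectl get operatingsystemprofile osp-ubuntu-swap-enabled -n kube-system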

Consume the custom OSP

Create/update your MachineDeployment to consume the custom OSP. The annotation k8c.io/operating-system-profile is used to specify the OSP for a MachineDeployment.

apiVersion: cluster.k8s.io/v1alpha1
kind: MachineDeployment
metadata:
  name: << MACHINE_NAME >>
  namespace: kube-system
  annotations:
    k8c.io/operating-system-profile: "osp-ubuntu-swap-enabled"
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      name: << MACHINE_NAME >>
  template:
    metadata:
      labels:
        name: << MACHINE_NAME >>
    spec:
      providerSpec:
        value:
          cloudProvider: "aws"
          cloudProviderSpec:
            region: "eu-central-1"
            availabilityZone: "eu-central-1a"
            vpcId: "vpc-name"
            subnetId: "subnet-id"
            instanceType: "t2.micro"
            instanceProfile: "kubernetes-v1"
            isSpotInstance: false
            diskSize: 50
            diskType: "gp2"
            ebsVolumeEncrypted: false
            ami: "my-custom-ami"
          operatingSystem: ubuntu
          operatingSystemSpec:
            # For Ubuntu, 'distUpgradeOnBoot' controls whether a dist-upgrade runs on first boot.
            distUpgradeOnBoot: false
      versions:
        kubelet: "<< KUBERNETES_VERSION >>"

If you update the annotation on an existing MachineDeployment, the existing machines will not be rotated automatically. To perform a rotation for existing MachineDeployments, please follow the guide at Rolling Restart MachineDeployments.
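
To verify that the custom OSP was picked up, you can inspect the OperatingSystemConfig (OSC) resources that OSM renders for each MachineDeployment. A rough sketch, assuming OSM runs in the kube-system namespace (the namespace may differ depending on how OSM is deployed):

kubectl get operatingsystemconfigs -n kube-system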