High-Availability Deployment

The hardware foundation for Kubermatic Virtualization spans several layers: the Kubermatic Virtualization (Kube-V) management layer, the KubeVirt infrastructure nodes that host virtual machines, and the supporting services that run as part of the ecosystem.

Control Plane Nodes

  • Nodes: Minimum 3 control plane nodes to ensure a quorum for etcd (Kubernetes’ key-value store) and prevent a single point of failure. These should ideally be distributed across different failure domains (e.g., availability zones, racks).
  • CPU: At least 2 vCPUs per control plane node.
  • RAM: At least 4 GB RAM per control plane node. Recommended: 8-16 GB for robust performance.
  • Storage: Fast, persistent storage for etcd (SSD-backed recommended) with sufficient capacity.

Worker Nodes

  • Minimum 2 worker nodes (for KubeVirt VMs): HA requires more than one node capable of running VMs, so that VMs can be live-migrated or rescheduled if a node fails (see the example after this list).
  • CPU: A minimum of 8 CPU cores per node is suggested for testing environments. For production deployments, 16 CPU cores or more per node are recommended to accommodate multiple VMs and their workloads effectively. Each worker node must have Intel VT-x or AMD-V hardware virtualization extensions enabled in the BIOS/UEFI. This is a fundamental requirement for KubeVirt to leverage KVM (Kernel-based Virtual Machine) for efficient VM execution. Without this, KubeVirt can fall back to software emulation, but it’s significantly slower and not suitable for production HA.
  • RAM: At least 8 GB RAM per node. Recommended: 16-32 GB, depending on the number and memory requirements of your VMs.
  • Storage: SSDs or NVMe drives are highly recommended for good VM performance, along with sufficient storage capacity for your VM disk images and any data they store.
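
A minimal sketch of how a VM can opt into live migration for HA, assuming a KubeVirt installation with live migration enabled; the VM name, sizing, and PVC name (ha-example-vm-root) are illustrative:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ha-example-vm                   # hypothetical name
spec:
  runStrategy: Always
  template:
    spec:
      evictionStrategy: LiveMigrate     # live-migrate the VM instead of stopping it when its node is drained
      domain:
        cpu:
          cores: 2
        memory:
          guest: 4Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: ha-example-vm-root   # hypothetical PVC; must be on shared (RWX) storage for migration
```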

Storage

  • CSI Driver Capabilities (Crucial for HA/Live Migration): This is the most critical component for KubeVirt HA and live migration. You need a shared storage backend whose CSI driver supports the ReadWriteMany (RWX) access mode, with either Filesystem or Block-mode (volumeMode: Block) volumes (see the example after this list).
  • Capacity: Sufficient storage capacity based on the disk images of your VMs and any data they store.
  • Performance: SSDs or NVMe drives are highly recommended for good VM performance; for high-throughput services, low-latency, high-IOPS storage (often block storage) is critical.
  • Replication and Redundancy: To achieve HA, data must be replicated across multiple nodes or availability zones, so that if a node fails, the data remains accessible from another node.
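
A minimal sketch of a PersistentVolumeClaim suitable for live-migratable VM disks, assuming the StorageClass (here the hypothetical shared-rwx) is backed by a CSI driver that supports RWX block volumes:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ha-example-vm-root            # hypothetical name, matching the VM sketch above
spec:
  storageClassName: shared-rwx        # hypothetical StorageClass backed by shared, replicated storage
  accessModes:
    - ReadWriteMany                   # lets source and target nodes attach the volume during live migration
  volumeMode: Block                   # raw block device; a common choice for VM disks
  resources:
    requests:
      storage: 50Gi
```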

Networking

A well-planned and correctly configured network infrastructure is fundamental to the stability and performance of Kubermatic Virtualization. This includes considerations for IP addressing, DNS, load balancing, and inter-component communication.

  • High-bandwidth, low-latency connections: 1 Gbps NICs are a minimum; 10 Gbps or higher is recommended for performance-sensitive workloads and efficient live migration.
  • Load Balancing: External/internal load balancers for distributing traffic across control planes and worker nodes.
  • Dedicated network for live migration (recommended): While not strictly minimal, a dedicated Multus network for live migration can significantly reduce network saturation on tenant workloads during migrations.
  • Connectivity: Full and unrestricted network connectivity is paramount between all host nodes. Firewalls and security groups must be configured to permit all necessary Kubernetes control plane traffic, KubeVirt communication, and KubeV-specific inter-cluster communication.
  • DNS: DNS resolution is crucial for the Kube-V environment, enabling all nodes to find each other and external services. A potential conflict can arise if both the KubeVirt infrastructure and guest user clusters use NodeLocal DNSCache with the same default IP address, leading to DNS resolution issues for guest VMs. This can be mitigated by adjusting the dnsConfig and dnsPolicy of the guest VMs.
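
A minimal sketch of the DNS mitigation described above, assuming the guest VM should use its own cluster's DNS service instead of the infrastructure cluster's NodeLocal DNSCache address; the nameserver IP and search domains are illustrative:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: dns-example-vm                # hypothetical name
spec:
  runStrategy: Always
  template:
    spec:
      dnsPolicy: "None"               # ignore the node/cluster default DNS configuration
      dnsConfig:
        nameservers:
          - 10.96.0.10                # illustrative: the DNS service IP the guest should actually use
        searches:
          - svc.cluster.local
          - cluster.local
      domain:
        devices: {}
        memory:
          guest: 2Gi
```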

The following ports must be permitted by firewalls and security groups:

Component            Port(s)                 Protocol   Direction      Purpose
API Server           6443                    TCP        Inbound        All API communication with the cluster
etcd                 2379-2380               TCP        Inbound        etcd database communication
Kubelet              10250                   TCP        Inbound        Kubelet API for control plane communication
Kube-Scheduler       10259                   TCP        Inbound        Kube-Scheduler component
Controller-Manager   10257                   TCP        Inbound        Kube-Controller-Manager component
Kube-Proxy           10256                   TCP        Inbound        Kube-Proxy health checks and service routing
NodePort Services    30000-32767             TCP/UDP    Inbound        Default range for exposing services on node IPs
KubeVirt API         8443                    TCP        Internal       KubeVirt API communication
Live Migration       61000-61009 (approx)    TCP        Node-to-Node   For migrating VM state between nodes
OVN NB DB            6641                    TCP        Internal       OVN Northbound Database
OVN SB DB            6642                    TCP        Internal       OVN Southbound Database
OVN Northd           6643                    TCP        Internal       OVN Northd process
OVN Raft             6644                    TCP        Internal       OVN Raft consensus (for HA OVN DBs)
Geneve Tunnel        6081                    UDP        Node-to-Node   Default overlay network for pod communication (OVN)
OVN Controller       10660                   TCP        Internal       Metrics for OVN Controller
OVN Daemon           10665                   TCP        Internal       Metrics for OVN Daemon (on each node)
OVN Monitor          10661                   TCP        Internal       Metrics for OVN Monitor