This document describes the architecture and design principles of machine-controller.
Machine-controller is a Kubernetes controller that implements the Cluster API specification for managing worker nodes across multiple cloud providers. It provides a unified, declarative interface for machine lifecycle management.
The controller manager is the main component that runs as a Deployment in the kube-system namespace. It consists of several reconciliation loops:
Machine-controller uses three primary CRDs defined by the Cluster API:
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineDeployment
Provides declarative updates for Machines, similar to Kubernetes Deployments. Manages:
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineSet
Ensures a specified number of Machines are running. Typically created and managed by MachineDeployment but can be used independently.
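The replica guarantee described above comes down to comparing a desired count with the Machines that currently exist. The following is a minimal sketch of that comparison, not the controller's actual code; the `machine` type and the `replicaDiff` function are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// machine is a hypothetical, simplified stand-in for a Machine object.
type machine struct {
	Name     string
	Deleting bool // true once deletion has been requested
}

// replicaDiff returns how many Machines must be created (positive result)
// or deleted (negative result) so that the number of active Machines
// matches the desired replica count.
func replicaDiff(desired int, machines []machine) int {
	active := 0
	for _, m := range machines {
		if !m.Deleting {
			active++
		}
	}
	return desired - active
}

func main() {
	machines := []machine{
		{Name: "worker-a"},
		{Name: "worker-b", Deleting: true},
	}
	// One Machine is active and three are desired.
	fmt.Println("machines to create:", replicaDiff(3, machines))
}
```

A real controller would act on the result by creating or deleting Machine objects through the API server rather than returning a number.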
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
Represents a single worker node. Contains:
┌──────────────────────────────────────────────────────────────┐
│                    Kubernetes API Server                     │
└──────────────────────────────────────────────────────────────┘
                               ▲
                               │
                               │ Watch/Update
                               │
┌──────────────────────────────────────────────────────────────┐
│                      Machine Controller                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │ MachineDepl. │  │  MachineSet  │  │   Machine    │        │
│  │  Controller  │─▶│  Controller  │─▶│  Controller  │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                              │               │
└──────────────────────────────────────────────┼───────────────┘
                                               │
                                               │ Cloud API
                                               ▼
                            ┌──────────────────────────────────────┐
                            │    Cloud Provider (AWS, Azure,       │
                            │    GCP, Hetzner, OpenStack, etc.)    │
                            └──────────────────────────────────────┘
                                               │
                                               ▼
                            ┌──────────────────────────────────────┐
                            │    Cloud Instances (Worker Nodes)    │
                            └──────────────────────────────────────┘
The machine-controller follows the standard Kubernetes controller pattern:
┌──────────┐
│  Create  │
│  Machine │
└────┬─────┘
     │
     ▼
┌─────────────────┐
│   Validating    │ ◀─── Validate configuration
└────┬────────────┘
     │
     ▼
┌─────────────────┐
│  Provisioning   │ ◀─── Create cloud instance
└────┬────────────┘      Generate user-data
     │                   Apply cloud-init
     ▼
┌─────────────────┐
│     Joining     │ ◀─── Configure kubelet
└────┬────────────┘      Join cluster
     │                   Register node
     ▼
┌─────────────────┐
│     Running     │ ◀─── Monitor health
└────┬────────────┘      Update status
     │
     ▼
┌─────────────────┐
│    Deleting     │ ◀─── Drain node
└────┬────────────┘      Delete cloud instance
     │                   Clean up resources
     ▼
┌─────────────────┐
│     Deleted     │
└─────────────────┘
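The lifecycle diagram above can be read as a small state machine. The sketch below models only the happy path shown in the diagram; the `Phase` type, the constant names, and the `Advance` function are hypothetical and are not taken from the machine-controller code base. In practice the step from Running to Deleting presumably happens only once deletion is requested.

```go
package main

import "fmt"

// Phase names mirror the boxes in the lifecycle diagram.
// The type itself is a hypothetical illustration.
type Phase string

const (
	PhaseValidating   Phase = "Validating"
	PhaseProvisioning Phase = "Provisioning"
	PhaseJoining      Phase = "Joining"
	PhaseRunning      Phase = "Running"
	PhaseDeleting     Phase = "Deleting"
	PhaseDeleted      Phase = "Deleted"
)

// successor maps each phase to the phase that follows it in the diagram.
var successor = map[Phase]Phase{
	PhaseValidating:   PhaseProvisioning,
	PhaseProvisioning: PhaseJoining,
	PhaseJoining:      PhaseRunning,
	PhaseRunning:      PhaseDeleting,
	PhaseDeleting:     PhaseDeleted,
}

// Advance returns the phase that follows p. It returns an error for the
// terminal phase (Deleted) and for unknown phases.
func Advance(p Phase) (Phase, error) {
	n, ok := successor[p]
	if !ok {
		return "", fmt.Errorf("no transition from phase %q", p)
	}
	return n, nil
}

func main() {
	// Walk the lifecycle from the first phase to the terminal one.
	p := PhaseValidating
	for {
		fmt.Println(p)
		n, err := Advance(p)
		if err != nil {
			break
		}
		p = n
	}
}
```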
Machine-controller uses a provider abstraction layer that enables support for multiple cloud platforms.
Each cloud provider implements the following interface:
type Provider interface {
	// Validate validates the machine spec
	Validate(spec v1alpha1.MachineSpec) error
	// Create creates a new cloud instance
	Create(machine *v1alpha1.Machine, data *MachineCreateDeleteData, userdata string) (Instance, error)
	// Get retrieves an existing instance
	Get(machine *v1alpha1.Machine) (Instance, error)
	// Cleanup deletes the instance and associated resources
	Cleanup(machine *v1alpha1.Machine, data *MachineCreateDeleteData) (bool, error)
	// GetCloudConfig returns provider-specific cloud config
	GetCloudConfig(spec v1alpha1.MachineSpec) (config string, name string, err error)
	// AddDefaults adds default values to the machine spec
	AddDefaults(spec v1alpha1.MachineSpec) (v1alpha1.MachineSpec, error)
}
Currently implemented providers:
See Cloud Providers for detailed configuration.
Machine-controller supports multiple operating systems through a unified provisioning mechanism.
See Operating Systems for the support matrix.
Machine-controller supports multiple methods for cloud provider authentication:
Machine-controller requires specific permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: machine-controller
rules:
- apiGroups: ["cluster.k8s.io"]
  resources: ["machines", "machinesets", "machinedeployments"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]
For production deployments:
replicas: 2 or more
Example HA configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: machine-controller
  namespace: kube-system
spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: machine-controller
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: machine-controller
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: machine-controller
The -worker-count flag controls concurrent reconciliation operations:
Higher worker counts increase throughput but also resource usage.
Machine-controller respects cloud provider API rate limits:
Typical resource consumption:
Machine-controller exposes Prometheus metrics on port 8085:
machine_controller_machines_total{provider="aws"} - Total machines by provider
machine_controller_errors_total{operation="create"} - Error count by operation
machine_controller_workers_running - Active worker count
machine_controller_machine_deployment_replicas - Desired vs actual replicas
Machine-controller works seamlessly with Kubernetes Cluster Autoscaler: