This guide helps you diagnose and resolve common issues with machine-controller.
View the machine-controller logs to identify errors:
kubectl logs -n kube-system deployment/machine-controller -f
For more verbose logging, increase the log level by editing the deployment:
kubectl edit deployment machine-controller -n kube-system
Change the -v flag to a higher value (e.g., -v=6 for debug level).
Check the status of a specific machine:
kubectl describe machine <machine-name> -n kube-system
Look for:
kubectl get machines -n kube-system -o wide
Check for machines stuck in provisioning or with error states.
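The check above can be scripted. The following is a minimal sketch, not part of machine-controller: `filter_unhealthy_machines` is a hypothetical helper, and the field paths `.status.nodeRef.name` and `.status.errorReason` assume the cluster.k8s.io/v1alpha1 Machine status schema. It reads a three-column table and prints the machines that have no node yet or that report an error.

```shell
# Hypothetical helper: print the names of machines that have no node
# or that report an error reason.
# Expects "NAME NODE ERROR" rows on stdin, with one header line.
# kubectl prints <none> in custom-columns output when a field is unset.
filter_unhealthy_machines() {
  awk 'NR > 1 && ($2 == "<none>" || $3 != "<none>") { print $1 }'
}

# Usage against a live cluster (field paths are assumptions, see above):
# kubectl get machines -n kube-system \
#   -o custom-columns=NAME:.metadata.name,NODE:.status.nodeRef.name,ERROR:.status.errorReason \
#   | filter_unhealthy_machines
```

Each name it prints is a candidate for the kubectl describe machine step shown earlier.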
Symptoms:
kubectl get nodes
Possible Causes and Solutions:
Cloud Provider Credentials Invalid
kubectl logs -n kube-system deployment/machine-controller | grep -i auth
Solution: Verify credentials are correct and have necessary permissions
Instance Creation Failure
kubectl describe machine <machine-name> -n kube-system
Check events for cloud provider errors (quota limits, invalid instance type, etc.)
Network Connectivity Issues
User Data Script Errors
Access the instance via the cloud provider console and check:
sudo cat /var/log/cloud-init-output.log
Symptoms:
Common Solutions:
Invalid Configuration
kubectl get machine <machine-name> -n kube-system -o yaml
Verify all required fields are present and valid
Unsupported Operating System
Check the operating system support matrix
Cloud Provider Quota Exceeded
Symptoms:
kubectl get nodes
Debugging Steps:
Check kubelet status on the instance
SSH into the instance:
systemctl status kubelet
journalctl -u kubelet -f
Verify bootstrap token
Check if the token is valid:
kubectl get secrets -n kube-system | grep bootstrap-token
Check network connectivity
From the instance, test connectivity to the API server:
curl -k https://<api-server>:6443
Review cloud-init logs
sudo cat /var/log/cloud-init.log
sudo cat /var/log/cloud-init-output.log
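For the bootstrap token step above, finding the secret does not show whether the token is still valid. The sketch below compares an expiration timestamp with the current time. `token_expired` is a hypothetical helper. It assumes the secret has an `expiration` data key, as standard Kubernetes bootstrap token secrets do, and that the value is a UTC RFC 3339 string ending in `Z`. A timestamp with a numeric offset would need converting first.

```shell
# Hypothetical helper: succeed (exit 0) when the expiration timestamp is
# at or before "now". Both values are assumed to be UTC RFC 3339 strings
# such as 2024-05-01T12:00:00Z, which sort chronologically as plain text.
token_expired() {
  expiration=$1
  now=${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  # A token without an expiration never expires.
  [ -n "$expiration" ] || return 1
  earliest=$(printf '%s\n%s\n' "$expiration" "$now" | LC_ALL=C sort | head -n 1)
  [ "$earliest" = "$expiration" ]
}

# Usage against a live cluster (secret name is a placeholder):
# exp=$(kubectl get secret <bootstrap-token-secret> -n kube-system \
#   -o jsonpath='{.data.expiration}' | base64 -d)
# token_expired "$exp" && echo "bootstrap token has expired"
```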
Symptoms:
Solutions:
Check for finalizers
kubectl get machine <machine-name> -n kube-system -o yaml | grep finalizers
Force delete if necessary (use with caution)
kubectl patch machine <machine-name> -n kube-system -p '{"metadata":{"finalizers":[]}}' --type=merge
Manually delete cloud resources
If the cloud instance still exists, delete it via the cloud provider console/CLI
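Because removing finalizers skips the controller's own cleanup, it can help to guard the force-delete step. The following is a minimal sketch: `force_delete_machine` is a hypothetical wrapper, and it assumes kubectl is configured for the target cluster. It patches the finalizers only when the machine already has a deletion timestamp.

```shell
# Hypothetical wrapper: remove finalizers from a Machine only if it is
# already marked for deletion. Assumes machines live in kube-system,
# as in the commands above.
force_delete_machine() {
  name=$1
  ts=$(kubectl get machine "$name" -n kube-system \
    -o jsonpath='{.metadata.deletionTimestamp}') || return 1
  if [ -z "$ts" ]; then
    echo "machine $name is not marked for deletion; leaving finalizers in place" >&2
    return 1
  fi
  kubectl patch machine "$name" -n kube-system --type=merge \
    -p '{"metadata":{"finalizers":[]}}'
}

# Usage:
# force_delete_machine <machine-name>
```

After a forced deletion, the cloud instance may still exist and has to be removed manually, as described above.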
Symptoms:
Solutions:
Check MachineDeployment events
kubectl describe machinedeployment <name> -n kube-system
Verify selector matches template labels
spec:
  selector:
    matchLabels:
      name: my-workers # Must match template labels
  template:
    metadata:
      labels:
        name: my-workers
Check for validation errors
Look for events indicating schema validation failures
Symptoms:
Solutions:
Check update strategy
kubectl get machinedeployment <name> -n kube-system -o yaml
Verify maxSurge and maxUnavailable settings
Check machine creation errors
New machines might be failing to provision:
kubectl get machines -n kube-system | grep <deployment-name>
Manually delete problematic machines
If machines are stuck, delete them to allow new ones to be created
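For the update strategy check above, one setting worth ruling out is maxSurge and maxUnavailable both being zero. In that case a rolling update can neither add a new machine nor remove an old one. The sketch below is illustrative only: `rollout_can_progress` is a hypothetical helper, and the jsonpath expressions in the usage comment assume the fields live under `.spec.strategy.rollingUpdate`.

```shell
# Hypothetical helper: succeed when at least one of maxSurge and
# maxUnavailable is non-zero. Accepts absolute values ("1") or
# percentages ("25%"). Empty values count as non-zero here, on the
# assumption that a default then applies.
rollout_can_progress() {
  surge=${1%\%}
  unavailable=${2%\%}
  [ "$surge" != "0" ] || [ "$unavailable" != "0" ]
}

# Usage against a live cluster (field paths are assumptions, see above):
# surge=$(kubectl get machinedeployment <name> -n kube-system \
#   -o jsonpath='{.spec.strategy.rollingUpdate.maxSurge}')
# unavailable=$(kubectl get machinedeployment <name> -n kube-system \
#   -o jsonpath='{.spec.strategy.rollingUpdate.maxUnavailable}')
# rollout_can_progress "$surge" "$unavailable" || echo "rollout cannot progress"
```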
Issue: Instance creation fails with “unauthorized” error
Issue: Instances created in wrong subnet
Solution: Check subnetId in cloud provider spec
Issue: Authentication failures
Solution: Verify tenantID, clientID, clientSecret, and subscriptionID
Issue: VM size not available
Solution: Run az vm list-sizes --location <region> to see available sizes
Issue: Rate limiting errors
Issue: Droplet creation fails with “region not available”
Issue: Service account decoding errors
Solution: Base64-encode the service account with cat sa.json | base64 -w0 (Linux) or cat sa.json | base64 (macOS)
Issue: Quota exceeded errors
Issue: Location or server type not found
Issue: Network attachment fails
Issue: Authentication failures
Issue: Flavor or image not found
Issue: VM creation fails
Issue: Network configuration errors
Possible Causes:
Cloud Provider API Rate Limits
Low Worker Count
Increase workers in the machine-controller deployment:
kubectl edit deployment machine-controller -n kube-system
Change the -worker-count flag to a higher value (e.g., -worker-count=20)
Slow Image Downloads
Solutions:
Edit machine-controller deployment:
kubectl edit deployment machine-controller -n kube-system
Change logging level:
args:
- -logtostderr
- -v=6 # Debug level
Create a diagnostic bundle:
# Machine-controller logs
kubectl logs -n kube-system deployment/machine-controller --tail=1000 > mc-logs.txt
# All machines
kubectl get machines -n kube-system -o yaml > machines.yaml
# All machinesets
kubectl get machinesets -n kube-system -o yaml > machinesets.yaml
# All machinedeployments
kubectl get machinedeployments -n kube-system -o yaml > machinedeployments.yaml
# Events
kubectl get events -n kube-system --sort-by='.lastTimestamp' > events.txt
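The commands above can be wrapped in one function so the bundle lands in a single directory and archive. This is a minimal sketch under stated assumptions: `collect_diagnostics` and the default directory name `mc-diagnostics` are hypothetical, and kubectl and tar are assumed to be on the PATH.

```shell
# Hypothetical helper: collect the diagnostics listed above into one
# directory and pack it as <dir>.tar.gz.
# $1: output directory (default: mc-diagnostics)
# $2: namespace (default: kube-system)
collect_diagnostics() {
  out_dir=${1:-mc-diagnostics}
  ns=${2:-kube-system}
  mkdir -p "$out_dir" || return 1

  # Machine-controller logs
  kubectl logs -n "$ns" deployment/machine-controller --tail=1000 > "$out_dir/mc-logs.txt"

  # Machines, machinesets, and machinedeployments
  for kind in machines machinesets machinedeployments; do
    kubectl get "$kind" -n "$ns" -o yaml > "$out_dir/$kind.yaml"
  done

  # Events, oldest first
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$out_dir/events.txt"

  # Pack the directory using a relative path inside the archive
  tar -czf "$out_dir.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
}

# Usage:
# collect_diagnostics mc-diagnostics kube-system
```

Review the files before sharing them, since logs and manifests may contain sensitive values.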
If you’re still experiencing issues: