This document provides a comprehensive analysis of potential Personally Identifiable Information (PII) and personal data (indirect identifiers) that may be present in system logs from Kubernetes clusters deployed using KubeOne.
Target Audience: Platform operators, security teams, compliance officers
Prerequisites: Basic understanding of Kubernetes and KubeOne
While KubeOne tries to avoid logging any PII, in some cases it is unavoidable and outside the control of the platform operator. The source may be a component that KubeOne ships or one of the underlying Kubernetes components.
System logs from Kubernetes clusters may contain the following types of PII:
- `webapp-john-deployment`, `john-doe-dev` namespace
- `worker-john-prod-01.company.com`
- `owner=john.doe@company.com`
- `/home/john/data:/data`

| Component | User Identity | IP Addresses | Credentials | Cloud IDs | Risk Level |
|---|---|---|---|---|---|
| kube-apiserver | ✅ High | ✅ High | ✅ High | ❌ No | 🔴 HIGH | 
| kubelet | ⚠️ Medium | ✅ High | ✅ High | ❌ No | 🔴 HIGH | 
| etcd | ✅ High | ⚠️ Medium | ✅ High | ❌ No | 🔴 HIGH | 
| Cloud Controller Managers | ❌ No | ✅ High | ✅ High | ✅ High | 🔴 HIGH | 
| CSI Drivers | ❌ No | ⚠️ Medium | ✅ High | ✅ High | 🔴 HIGH | 
| Secrets Store CSI | ❌ No | ❌ No | ✅ High | ⚠️ Low | 🔴 HIGH | 
| Cilium | ⚠️ Medium | ✅ High | ❌ No | ❌ No | 🟡 MEDIUM-HIGH | 
| kube-controller-manager | ⚠️ Low | ⚠️ Medium | ⚠️ Medium | ⚠️ Medium | 🟡 MEDIUM | 
| kube-scheduler | ⚠️ Low | ❌ No | ❌ No | ❌ No | 🟡 MEDIUM | 
| kube-proxy | ❌ No | ✅ High | ❌ No | ❌ No | 🟡 MEDIUM | 
| CoreDNS | ⚠️ Low | ⚠️ Medium | ❌ No | ❌ No | 🟡 MEDIUM | 
| Canal | ❌ No | ✅ High | ❌ No | ❌ No | 🟡 MEDIUM | 
| WeaveNet | ❌ No | ✅ High | ⚠️ Low | ❌ No | 🟡 MEDIUM | 
| cluster-autoscaler | ⚠️ Low | ⚠️ Low | ⚠️ Low | ✅ High | 🟡 MEDIUM | 
| NodeLocalDNS | ⚠️ Low | ⚠️ Medium | ❌ No | ❌ No | 🟡 MEDIUM | 
| metrics-server | ⚠️ Low | ❌ No | ❌ No | ❌ No | 🟢 LOW-MEDIUM | 
| machine-controller | ⚠️ Low | ❌ No | ⚠️ Low | ✅ High | 🟢 LOW | 
| operating-system-manager | ⚠️ Low | ❌ No | ❌ No | ⚠️ Low | 🟢 LOW | 
Legend:
While the risk matrix gives a useful overview of potential PII exposure, a component's risk rating does not always reflect its actual exposure. For example, a low-risk component may have high exposure when it is combined with a high-risk component.
One example is a component that logs a full Kubernetes resource when validation fails. The resource itself may contain PII, and although the log message does not refer to the fields that hold personal data, the full resource is written out, so private data ends up in the logs. Always review and sanitize logs before sharing them anywhere.
Implement automated filtering in your log aggregation pipeline to remove PII and personal data from the logs.
# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
# IPv4 addresses
\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
# Basic Auth in URLs
https?://[^:/\s]+:[^@/\s]+@
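As a rough illustration of how such patterns could be applied in a filtering step, the following Python sketch redacts matches line by line. The `redact` function and the placeholder tokens are hypothetical names chosen for this example and are not part of KubeOne or any log aggregation tool. The Basic Auth pattern used here excludes slashes and whitespace so that a match cannot run past the host part of the URL.

```python
import re

# Illustrative patterns and placeholder tokens (assumptions for this sketch).
# Order matters: URL credentials are removed first so that "password@host"
# is not picked up by the email pattern.
PATTERNS = [
    (re.compile(r"(https?://)[^:/\s]+:[^@/\s]+@"), r"\1[REDACTED_CREDENTIALS]@"),
    (re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"), "[REDACTED_IP]"),
]


def redact(line: str) -> str:
    """Return the log line with every pattern match replaced by a placeholder."""
    for pattern, replacement in PATTERNS:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    sample = (
        "Failed to pull https://admin:s3cret@registry.example.com/v2/ "
        "for john.doe@company.com from 10.0.0.12"
    )
    print(redact(sample))
```

Note that the IPv4 pattern matches any four dot-separated groups of digits, so four-part version strings would also be redacted. Regex-based filtering of this kind is best treated as a first pass rather than a guarantee that all PII has been removed.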