The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs.
For more information on the NVIDIA GPU Operator, please refer to the official documentation.
The NVIDIA GPU Operator is available as part of KKP's default application catalog. It can be deployed to the user cluster either during cluster creation or after the cluster is ready (existing cluster) from the Applications tab via the UI.

-> Next button.
+ Add Application to deploy the NVIDIA GPU Operator application to the user cluster. To further configure the values.yaml, find more information in the NVIDIA GPU Operator Helm chart documentation.
DCGM (Data Center GPU Manager) metrics are health and performance measurements exported by NVIDIA software. They include useful signals such as GPU temperature, memory usage, and utilization. These metrics are ready to be consumed by Prometheus and visualized in Grafana.
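As a rough illustration of what consuming these metrics involves, the Python sketch below parses a few lines in the Prometheus text format. The metric names DCGM_FI_DEV_GPU_TEMP, DCGM_FI_DEV_GPU_UTIL and DCGM_FI_DEV_FB_USED are ones dcgm-exporter commonly exports, but the sample values, the label set, and the helper name parse_dcgm_metrics are invented for this example. The parser is deliberately simplified: it assumes no timestamps and no commas inside label values.

```python
# Illustrative sample only: values and labels are made up, not real measurements.
SAMPLE = """
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# TYPE DCGM_FI_DEV_GPU_TEMP gauge
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="gpu-node-1"} 41
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="gpu-node-1"} 87
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="gpu-node-1"} 10240
"""


def parse_dcgm_metrics(text):
    """Parse simple Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated field.
        head, _, value = line.rpartition(" ")
        name, _, label_part = head.partition("{")
        labels = {}
        for pair in label_part.rstrip("}").split(","):
            if "=" in pair:
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_dcgm_metrics(SAMPLE):
        print(name, labels.get("gpu"), value)
```

In practice Prometheus does this parsing itself when it scrapes the exporter; the sketch is only meant to show the shape of the data.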
The following explains how DCGM metrics are exposed when you deploy the NVIDIA GPU Operator via the KKP application catalog and how to check that everything is working.
When you deploy the NVIDIA GPU Operator from the Application Catalog, DCGM metrics are enabled by default. The application also deploys Node Feature Discovery (NFD), which automatically labels GPU nodes. These labels help the operator deploy a small exporter (dcgm-exporter) as a DaemonSet on those GPU nodes.
Key points:
- DCGM metrics rely on the dcgmExporter and nfd components.
- GPU nodes carry the feature.node.kubernetes.io/pci-10de.present=true label (this is done automatically by NFD).
- The pods in the nvidia-gpu-operator namespace are in the Running state.
If you'd like more detailed, technical steps (for example, changing scrape intervals or customizing the chart values), check the official GPU Operator Helm chart and the dcgm-exporter documentation.
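The label and pod checks mentioned above can be sketched in Python. This is a minimal example, assuming you have saved the output of `kubectl get nodes -o json` and `kubectl get pods -n nvidia-gpu-operator -o json` and loaded each with `json.load`. The function names and the sample data are hypothetical and exist only for this illustration.

```python
GPU_LABEL = "feature.node.kubernetes.io/pci-10de.present"


def gpu_node_names(node_list):
    """Return the names of nodes that carry the NFD label for NVIDIA PCI devices."""
    return [
        item["metadata"]["name"]
        for item in node_list.get("items", [])
        if item["metadata"].get("labels", {}).get(GPU_LABEL) == "true"
    ]


def pods_not_in_phase(pod_list, ok_phases=("Running",)):
    """Return (name, phase) pairs for pods whose phase is not in ok_phases."""
    return [
        (item["metadata"]["name"], item.get("status", {}).get("phase"))
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") not in ok_phases
    ]


# Hypothetical sample data shaped like kubectl's JSON list output.
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": "gpu-node-1", "labels": {GPU_LABEL: "true"}}},
        {"metadata": {"name": "cpu-node-1", "labels": {}}},
    ]
}
SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "exporter-a"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "exporter-b"}, "status": {"phase": "Pending"}},
    ]
}

if __name__ == "__main__":
    print(gpu_node_names(SAMPLE_NODES))
    print(pods_not_in_phase(SAMPLE_PODS))
```

Pods that run once and exit report the phase Succeeded rather than Running, so you may want to pass `ok_phases=("Running", "Succeeded")` if such pods exist in the namespace.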
To support AI workloads, Kubermatic Kubernetes Platform uses the NVIDIA GPU Operator to automatically expose GPU information through node labels.
Once the operator is installed, it discovers the GPUs available on your cluster nodes and applies a set of descriptive labels.
These labels provide useful details about the hardware, such as the GPU product name and the installed CUDA driver and runtime versions.
You can view these labels on the Nodes page.
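If you prefer to inspect the labels programmatically, the following Python sketch filters a node's labels by prefix. It assumes the GPU-related labels use the `nvidia.com/` prefix and that the node object comes from `kubectl get node <name> -o json`. The helper name gpu_labels, the specific label keys, and the values in the sample are illustrative assumptions and may differ in your cluster.

```python
def gpu_labels(node, prefix="nvidia.com/"):
    """Return the node's labels whose key starts with the given prefix, sorted by key."""
    labels = node.get("metadata", {}).get("labels", {})
    return {key: labels[key] for key in sorted(labels) if key.startswith(prefix)}


# Hypothetical node object; keys and values are examples, not real output.
SAMPLE_NODE = {
    "metadata": {
        "name": "gpu-node-1",
        "labels": {
            "kubernetes.io/hostname": "gpu-node-1",
            "nvidia.com/gpu.product": "Tesla-T4",
            "nvidia.com/cuda.driver.major": "535",
            "nvidia.com/cuda.runtime.major": "12",
        },
    }
}

if __name__ == "__main__":
    for key, value in gpu_labels(SAMPLE_NODE).items():
        print(f"{key}={value}")
```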
