This chapter describes the customization of the KKP Master / Seed Monitoring, Logging & Alerting Stack.
Monitoring configurations are highly dependent on the specific use case. This guide details the available customization options for the MLA stack. Customizations are available for four main areas: the per-user-cluster Prometheus, the seed-level Prometheus, Alertmanager, and Grafana.
Familiarity with the Installation of the Master / Seed MLA Stack is recommended before proceeding.
Each user cluster is monitored by a dedicated Prometheus instance that runs within its namespace on the seed cluster. This instance is responsible for collecting metrics from the user cluster’s control plane.
Note: The scope of this Prometheus is limited to the user cluster’s control plane. It does not collect metrics from applications or workloads running inside the user cluster.
While the lifecycle of this Prometheus is managed automatically by KKP, you can still add custom rules.
To do so, specify your desired rules in the KubermaticConfiguration custom resource.
KKP provides a default set of rules. However, new custom rules can be added, or the default set can be disabled.
Custom rules can be added by defining them as a YAML-formatted string under the spec.monitoring.customRules field.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # CustomRules can be used to inject custom recording and alerting rules. This field
    # must be a YAML-formatted string with a `group` element at its root, as documented
    # on https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
    # This value is treated as a Go template, which allows injecting dynamic values like
    # the internal cluster address or the cluster ID. Refer to pkg/resources/prometheus
    # and the documentation for more information on the available fields.
    customRules: |
      groups:
      - name: my-custom-group
        rules:
        - alert: MyCustomAlert
          annotations:
            message: Something happened in {{ $labels.namespace }}
          expr: |
            sum(rate(machine_controller_errors_total[5m])) by (namespace) > 0.01
          for: 10m
          labels:
            severity: warning
The default rules provided by KKP can be disabled by setting the spec.monitoring.disableDefaultRules flag to true.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # DisableDefaultRules disables the default recording and alerting rules.
    disableDefaultRules: true
The scraping behavior of Prometheus can be customized. New scraping configurations can be added, and the default configurations can be disabled.
Custom scraping configurations can be specified by adding them under the spec.monitoring.customScrapingConfigs field.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # CustomScrapingConfigs can be used to inject custom scraping rules. This must be a
    # YAML-formatted string containing an array of scrape configurations as documented
    # on https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config.
    # This value is treated as a Go template, which allows injecting dynamic values like
    # the internal cluster address or the cluster ID. Refer to pkg/resources/prometheus
    # and the documentation for more information on the available fields.
    customScrapingConfigs: |
      - job_name: 'schnitzel'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_kubermatic_scrape]
          action: keep
          regex: true
The default scraping configurations provided by KKP can be disabled. This is accomplished by setting the spec.monitoring.disableDefaultScrapingConfigs flag to true.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # DisableDefaultScrapingConfigs disables the default scraping targets.
    disableDefaultScrapingConfigs: true
The seed-level Prometheus is primarily used to collect metrics from the user clusters and provide them to Grafana. In contrast to the per-cluster Prometheus described above, this one is deployed via a Helm chart, so you can use Helm's native customization options.
Additional labels can be sent to Alertmanager with each alert. These are specified by adding an externalLabels element to the values.yaml file:
prometheus:
  externalLabels:
    mycustomlabel: a value
    rack: rack17
    location: europe
Rules include recording rules (for precomputing expensive queries) and alerts. There are three different ways of customizing them.
Custom rules can be added directly to the values.yaml file:
prometheus:
  rules:
    groups:
    - name: myrules
      rules:
      - alert: DatacenterIsOnFire
        annotations:
          message: |
            The datacenter has gone up in flames, someone should quickly find an extinguisher.
            You can reach the local emergency services by calling 0118 999 881 999 119 7253.
        expr: temperature{server=~"kubernetes.+"} > 100
        for: 5m
        labels:
          severity: critical
Rules defined this way are written to a dedicated _customrules.yaml file and included in Prometheus. This approach is recommended for adding a small number of rules.
For larger sets of rules, new YAML files can be placed inside the rules/ directory before the Helm chart is deployed; they will be included automatically. To ensure future compatibility and prevent update conflicts, existing files within the chart should not be modified. The method for disabling the predefined rules is described below.
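As an illustration, a file dropped into the rules/ directory might look like the sketch below. The file name, group name, and rules are hypothetical; the sketch assumes the files in rules/ use the standard Prometheus rule-file format with a groups element at the root:

# rules/my-team-rules.yaml (hypothetical file name)
groups:
- name: my-team-rules
  rules:
  # example recording rule; the metric selection is purely illustrative
  - record: job:up:sum
    expr: sum(up) by (job)
  - alert: TargetDown
    annotations:
      message: The target {{ $labels.job }} is not reachable.
    expr: up == 0
    for: 10m
    labels:
      severity: warning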
For large deployments with many independently managed rules, you can make use of custom volumes to mount your rules into Prometheus. For this to work, you need to create your own ConfigMap or Secret inside the monitoring namespace, then configure the Prometheus chart via the values.yaml to mount it appropriately, like so:
prometheus:
  volumes:
  - name: example-rules-volume
    mountPath: /example/rules
    configMap: example-rules
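The ConfigMap referenced above must exist in the monitoring namespace. The manifest below is a minimal sketch of what it could look like; the data key and the rule it contains are purely illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-rules
  namespace: monitoring
data:
  # file name and rule are illustrative; each file should be a standard Prometheus rule file
  independent-rules.yaml: |
    groups:
    - name: my-independent-rules
      rules:
      - record: instance:up:count
        expr: count(up) by (instance)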
After mounting the files into the pod, you need to make sure that Prometheus loads them by extending the ruleFiles list:
prometheus:
  ruleFiles:
  - '/etc/prometheus/rules/*.yaml'
  - '/example/rules/*.yaml'
Managing ruleFiles is also how you disable the predefined rules: simply remove the applicable entry from the list. You can also leave the list completely empty to disable all rules and alerts.
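For example, a values.yaml that drops the predefined rules and keeps only the custom volume from the previous example could look like this sketch:

prometheus:
  ruleFiles:
  # the default '/etc/prometheus/rules/*.yaml' entry is omitted to disable the predefined rules
  - '/example/rules/*.yaml'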
By default, the seed Prometheus is configured to store 15 days' worth of metrics.
This can be customized by overriding the prometheus.tsdb.retentionTime field in the values.yaml used for the chart installation.
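For example, to keep metrics for 30 days instead, the override could look like the following sketch (the retention value is illustrative and uses Prometheus duration notation):

prometheus:
  tsdb:
    # keep 30 days of metrics instead of the default 15
    retentionTime: 30d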
If you would like to store metrics for the long term, other solutions such as Thanos are typically used. Thanos integration is a more involved process; please read more in the Thanos integration documentation.
The Alertmanager configuration can be modified in the values.yaml file:
alertmanager:
  config:
    global:
      slack_api_url: https://hooks.slack.com/services/YOUR_KEYS_HERE
    route:
      receiver: default
      repeat_interval: 1h
      routes:
      - receiver: blackhole
        match:
          severity: none
    receivers:
    - name: blackhole
    - name: default
      slack_configs:
      - channel: '#alerting'
        send_resolved: true
Please review the Alertmanager Configuration Guide for detailed configuration syntax.
You can review the Alerting Runbook for a reference of the alerts that the Kubermatic Kubernetes Platform (KKP) monitoring setup can fire, along with a short description of each and steps to debug.
Customizing Grafana entails three different aspects: datasources, dashboard providers, and the dashboards themselves.
In all cases, you have two general approaches: either take the Grafana Helm chart and place additional files into the existing directory structure, or leave the Helm chart as-is and use the values.yaml and your own ConfigMaps/Secrets to hold your customizations. This is very similar to how customizing the seed-level Prometheus works, so if you have read that chapter, you will feel right at home.
To create a new datasource, you can either put a new YAML file inside the provisioning/datasources/ directory or extend your values.yaml like so:
grafana:
  provisioning:
    datasources:
      extra:
      # list your new datasources here
      - name: influxdb
        type: influxdb
        access: proxy
        org_id: 1
        url: http://influxdb.monitoring.svc.cluster.local:9090
        version: 1
        editable: false
You can also remove the default Prometheus datasource if you really want to by either deleting the prometheus.yaml or pointing the source directive inside your values.yaml to a different, empty directory:
grafana:
  provisioning:
    datasources:
      source: empty/
Note that if you remove the default Prometheus datasource and do not provide an alternative with the same name, the default dashboards will no longer work.
Configuring providers works much in the same way as configuring datasources: either place new files in the provisioning/dashboards/ directory or use the values.yaml accordingly:
grafana:
  provisioning:
    dashboards:
      extra:
      # list your new dashboard providers here
      - folder: "Example Resources"
        name: "example"
        options:
          path: /example/dashboards
        org_id: 1
        type: file
Customizing the providers is especially important if you also want to add your own dashboards. You can point options.path to a newly mounted volume to load dashboards from (see below).
Just like with datasources and providers, new dashboards can be placed in the existing dashboards/ directory. Do note though that if you create a new folder (like dashboards/example/), you also must create a new dashboard provider to tell Grafana about it. Your dashboards will be loaded and included in the default ConfigMap, but without the new provider, Grafana will not see them.
Following the example above, if you put your dashboards in dashboards/example/, you need a dashboard provider with the options.path set to /grafana-dashboard-definitions/example, because the ConfigMap is mounted to /grafana-dashboard-definitions.
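Following that example, such a provider entry in the values.yaml could look like the sketch below; the folder and provider names are illustrative:

grafana:
  provisioning:
    dashboards:
      extra:
      - folder: "Example"
        name: "example"
        options:
          # matches the mount point of the default dashboards ConfigMap plus the new subdirectory
          path: /grafana-dashboard-definitions/example
        org_id: 1
        type: file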
You can also use your own ConfigMaps or Secrets and have the Grafana deployment mount them. This is useful for larger customizations with lots of dashboards that you want to manage independently. To use an external ConfigMap, create it like so:
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboards
data:
  dashboard1.json: |
    { ... Grafana dashboard JSON here ... }
  dashboard2.json: |
    { ... Grafana dashboard JSON here ... }
Make sure to create your ConfigMap in the monitoring namespace and then use the volumes directive in your values.yaml to tell the Grafana Helm chart about your ConfigMap:
grafana:
  volumes:
  - name: example-dashboards-volume
    mountPath: /grafana-dashboard-definitions/example
    configMap: example-dashboards
Using a Secret instead of a ConfigMap works identically, just specify secretName instead of configMap in the volumes section.
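As a sketch, the Secret-based variant of the volume definition above would look like this (assuming a Secret named example-dashboards exists in the monitoring namespace):

grafana:
  volumes:
  - name: example-dashboards-volume
    mountPath: /grafana-dashboard-definitions/example
    # secretName replaces configMap when mounting a Secret
    secretName: example-dashboards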
Remember that you still need a custom dashboard provider to make Grafana load your new dashboards.
The kube-state-metrics Helm chart deployed on a seed/master cluster can be extended to collect state metrics for custom resources as well. For this, customResourceState must be enabled and a configuration for the custom state metrics must be provided.
kubeStateMetrics:
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
        - groupVersionKind:
            group: helm.toolkit.fluxcd.io
            version: "v2beta2"
            kind: HelmRelease
          metricNamePrefix: gotk
          metrics:
          - name: "resource_info"
            help: "The current state of a GitOps Toolkit resource."
            each:
              type: Info
              info:
                labelsFromPath:
                  name: [metadata, name]
            labelsFromPath:
              exported_namespace: [metadata, namespace]
              suspended: [spec, suspend]
              ready: [status, conditions, "[type=Ready]", status]
Additionally, the RBAC rules must be updated to allow kube-state-metrics to perform the necessary operations on the custom resource(s).
kubeStateMetrics:
  rbac:
    extraRules:
    - apiGroups:
      - helm.toolkit.fluxcd.io
      resources:
      - helmreleases
      verbs: [ "list", "watch" ]
For configuring more custom resources, refer to the example kube-state-metrics-config.yaml.