Customization of the Master / Seed MLA Stack
This chapter describes the customization of the KKP Master / Seed Monitoring, Logging & Alerting Stack.
Monitoring configurations are highly dependent on the specific use case. This guide details the available customization options for the MLA stack. Customizations are available for four main areas:
- User-cluster Prometheus
- Seed-cluster Prometheus
- Alertmanager rules
- Grafana dashboards
Familiarity with the Installation of the Master / Seed MLA Stack is recommended before proceeding.
User Cluster Prometheus
Each user cluster is monitored by a dedicated Prometheus instance that runs within its namespace on the seed cluster.
This instance is responsible for collecting metrics from the user cluster’s control plane.
Note: The scope of this Prometheus is limited to the user cluster’s control plane. It does not collect metrics from applications or workloads running inside the user cluster.
While the lifecycle of this Prometheus is managed automatically by KKP, you can still add custom rules.
To do so, specify your desired rules in the KubermaticConfiguration custom resource.
Rules
KKP provides a default set of rules. However, new custom rules can be added, or the default set can be disabled.
Custom rules
Custom rules can be added by defining them as a YAML-formatted string under the spec.monitoring.customRules field.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # CustomRules can be used to inject custom recording and alerting rules. This field
    # must be a YAML-formatted string with a `groups` element at its root, as documented
    # on https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
    # This value is treated as a Go template, which allows injecting dynamic values like
    # the internal cluster address or the cluster ID. Refer to pkg/resources/prometheus
    # and the documentation for more information on the available fields.
    customRules: |
      groups:
      - name: my-custom-group
        rules:
        - alert: MyCustomAlert
          annotations:
            message: Something happened in {{ $labels.namespace }}
          expr: |
            sum(rate(machine_controller_errors_total[5m])) by (namespace) > 0.01
          for: 10m
          labels:
            severity: warning
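Because the field accepts recording rules as well as alerting rules, a recording rule can be added in the same way. The following is a minimal sketch: the group name and the recorded series name are hypothetical and only illustrate the pattern, while the expression reuses the metric from the alert above.

```yaml
spec:
  monitoring:
    customRules: |
      groups:
      - name: my-recording-group
        rules:
        # Precompute the per-namespace error rate so that alerts and
        # dashboards can query the cheaper recorded series instead.
        # The series name below is a hypothetical example.
        - record: namespace:machine_controller_errors:rate5m
          expr: |
            sum(rate(machine_controller_errors_total[5m])) by (namespace)
```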
Disable the default rules
The default rules provided by KKP can be disabled by setting the spec.monitoring.disableDefaultRules flag to true.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # DisableDefaultRules disables the default recording and alerting rules.
    disableDefaultRules: true
Scraping Configs
The scraping behavior of Prometheus can be customized. New scraping configurations can be added, and the default configurations can be disabled.
Add Custom Scraping Configurations
Custom scraping configurations can be specified by adding them under the spec.monitoring.customScrapingConfigs field.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # CustomScrapingConfigs can be used to inject custom scraping rules. This must be a
    # YAML-formatted string containing an array of scrape configurations as documented
    # on https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config.
    # This value is treated as a Go template, which allows injecting dynamic values like
    # the internal cluster address or the cluster ID. Refer to pkg/resources/prometheus
    # and the documentation for more information on the available fields.
    customScrapingConfigs: |
      - job_name: 'schnitzel'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_kubermatic_scrape]
          action: keep
          regex: true
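The keep rule in this job retains only pods whose __meta_kubernetes_pod_annotation_kubermatic_scrape label equals true. Prometheus derives that label from a pod annotation and replaces characters that are not valid in label names with underscores, so an annotation key such as kubermatic/scrape would match. The following sketch shows a pod that opts in under that assumption; the pod name, image and port are placeholders, and which pods the instance can actually discover depends on the permissions KKP grants it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-exporter                 # hypothetical name
  annotations:
    # Surfaces as __meta_kubernetes_pod_annotation_kubermatic_scrape="true"
    kubermatic/scrape: "true"
spec:
  containers:
  - name: exporter
    image: example.org/exporter:latest   # placeholder image
    ports:
    - containerPort: 8080                # assumed metrics port
```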
Disable Default Scraping Configurations
The default scraping configurations provided by KKP can be disabled by setting the spec.monitoring.disableDefaultScrapingConfigs flag to true.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  # Monitoring can be used to fine-tune the in-cluster Prometheus.
  monitoring:
    # DisableDefaultScrapingConfigs disables the default scraping targets.
    disableDefaultScrapingConfigs: true
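All four fields live under the same spec.monitoring key, so they can be combined in a single resource. The sketch below replaces the defaults entirely with the custom rule and scrape job from the earlier examples; whether running without any default rules or targets is sensible depends on your setup.

```yaml
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
  name: <<mykubermatic>>
  namespace: kubermatic
spec:
  monitoring:
    # Drop everything KKP ships by default ...
    disableDefaultRules: true
    disableDefaultScrapingConfigs: true
    # ... and load only the custom rule and scrape job.
    customRules: |
      groups:
      - name: my-custom-group
        rules:
        - alert: MyCustomAlert
          expr: |
            sum(rate(machine_controller_errors_total[5m])) by (namespace) > 0.01
          for: 10m
          labels:
            severity: warning
    customScrapingConfigs: |
      - job_name: 'schnitzel'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_kubermatic_scrape]
          action: keep
          regex: true
```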
Seed Cluster Prometheus
This Prometheus is primarily used to collect metrics from the user clusters and then provide those to Grafana. In contrast to the user cluster Prometheus described above, this one is deployed via a Helm chart, so you can use Helm’s native customization options.
Labels
Additional labels can be sent to Alertmanager with each alert. These are specified by adding an externalLabels element to the values.yaml file:
prometheus:
  externalLabels:
    mycustomlabel: a value
    rack: rack17
    location: europe
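Because external labels are attached to every alert sent to Alertmanager, they can be used for routing. The sketch below sends alerts carrying location: europe to a separate receiver. The receiver name europe-team and the channel are hypothetical, and the structure mirrors the Alertmanager configuration shown in the Alertmanager section of this chapter.

```yaml
alertmanager:
  config:
    # Assumes global.slack_api_url is set as in the Alertmanager section.
    route:
      receiver: default
      routes:
      - receiver: europe-team          # hypothetical receiver
        match:
          location: europe             # external label set on the seed Prometheus
    receivers:
    - name: default
    - name: europe-team
      slack_configs:
      - channel: '#alerting-europe'    # hypothetical channel
```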
Rules
Rules include recording rules (for precomputing expensive queries) and alerts. There are three different ways of customizing them.
values.yaml
Custom rules can be added directly to the values.yaml file:
prometheus:
  rules:
    groups:
    - name: myrules
      rules:
      - alert: DatacenterIsOnFire
        annotations:
          message: |
            The datacenter has gone up in flames, someone should quickly find an extinguisher.
            You can reach the local emergency services by calling 0118 999 881 999 119 7253.
        expr: temperature{server=~"kubernetes.+"} > 100
        for: 5m
        labels:
          severity: critical
These rules are written to a dedicated _customrules.yaml file and loaded by Prometheus. This approach is recommended for adding a small number of rules.
Extending the Helm Chart
For larger sets of rules, new YAML files can be placed inside the rules/ directory before the Helm chart is deployed. These files will be included automatically. To ensure future compatibility and prevent update conflicts, existing files within the chart should not be modified. The method for disabling predefined rules is described in the next section.
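A file placed in rules/ is expected to be an ordinary Prometheus rule file with a groups element at its root. The following is a minimal sketch, assuming a hypothetical file name rules/my-team.yaml; the expression uses the standard Prometheus up metric, and the group and alert names are illustrative only.

```yaml
# rules/my-team.yaml (hypothetical file name)
groups:
- name: my-team
  rules:
  - alert: TargetDown
    annotations:
      message: A scrape target has been unreachable for 10 minutes.
    # up is 0 for every target whose last scrape failed
    expr: up == 0
    for: 10m
    labels:
      severity: warning
```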
Custom ConfigMaps/Secrets
For large deployments with many independently managed rules, you can use custom volumes to mount your configuration into Prometheus. For this to work, you need to create your own ConfigMap or Secret inside the monitoring namespace. Then configure the Prometheus chart via the values.yaml to mount it, like so:
prometheus:
  volumes:
  - name: example-rules-volume
    mountPath: /example/rules
    configMap: example-rules
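The example-rules ConfigMap referenced by this volume could look like the following sketch. Each key becomes a file under /example/rules; the key name example.yaml and the recording rule are hypothetical, and the .yaml suffix is chosen so that a glob such as /example/rules/*.yaml would pick the file up.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-rules
  namespace: monitoring
data:
  # Mounted as /example/rules/example.yaml
  example.yaml: |
    groups:
    - name: example
      rules:
      # Number of healthy targets per scrape job
      - record: job:up:sum
        expr: sum(up) by (job)
```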
After mounting the files into the pod, you need to make sure that Prometheus loads them by extending the ruleFiles list:
prometheus:
  ruleFiles:
  - '/etc/prometheus/rules/*.yaml'
  - '/example/rules/*.yaml'
Managing the ruleFiles list is also the way to disable the predefined rules: remove the applicable item from the list. You can also leave the list completely empty to disable all rules, and with them all alerts.
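As a sketch of the first case, the following values keep only the custom rules from the mounted volume and no longer load the predefined ones. The /example/rules path is the hypothetical mount path from the volume example above.

```yaml
prometheus:
  ruleFiles:
  # '/etc/prometheus/rules/*.yaml' is omitted, so the predefined
  # rules shipped with the chart are not loaded anymore.
  - '/example/rules/*.yaml'
```

Setting ruleFiles to an empty list ([]) would correspond to the second case.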
Long-term metrics storage
By default, the seed Prometheus is configured to store 15 days' worth of metrics.
This can be customized by overriding the prometheus.tsdb.retentionTime field in the values.yaml used for chart installation.
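A minimal sketch of such an override, assuming the field takes a Prometheus-style duration string (30d is an arbitrary example value). Keep in mind that a longer retention period requires correspondingly more disk space.

```yaml
prometheus:
  tsdb:
    # Keep metrics for 30 days instead of the default 15
    retentionTime: 30d
```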
If you would like to store the metrics for the long term, typically other solutions like Thanos are used. Thanos integration is a more involved process. Please read more about Thanos integration.
Alertmanager
The Alertmanager configuration can be modified in the values.yaml file:
alertmanager:
  config:
    global:
      slack_api_url: https://hooks.slack.com/services/YOUR_KEYS_HERE
    route:
      receiver: default
      repeat_interval: 1h
      routes:
      - receiver: blackhole
        match:
          severity: none
    receivers:
    - name: blackhole
    - name: default
      slack_configs:
      - channel: '#alerting'
        send_resolved: true
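Additional routes and receivers follow the same pattern. The sketch below extends the configuration above with a route that forwards critical alerts to a webhook; the receiver name oncall-webhook and its URL are hypothetical. Child routes are evaluated in order and, unless continue is set on a route, the first match wins.

```yaml
alertmanager:
  config:
    global:
      slack_api_url: https://hooks.slack.com/services/YOUR_KEYS_HERE
    route:
      receiver: default
      repeat_interval: 1h
      routes:
      - receiver: blackhole
        match:
          severity: none
      - receiver: oncall-webhook        # hypothetical receiver
        match:
          severity: critical
    receivers:
    - name: blackhole
    - name: default
      slack_configs:
      - channel: '#alerting'
        send_resolved: true
    - name: oncall-webhook
      webhook_configs:
      - url: http://oncall.example.org/alerts   # placeholder URL
```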
Please review the Alertmanager Configuration Guide for detailed configuration syntax.
You can review the Alerting Runbook for a reference of the alerts that the Kubermatic Kubernetes Platform (KKP) monitoring setup can fire, alongside a short description and steps to debug each of them.
Grafana Dashboards
Customizing Grafana entails three different aspects:
- Datasources (like Prometheus, InfluxDB, …)
- Dashboard providers (telling Grafana where to load dashboards from)
- Dashboards themselves
In all cases, you have two general approaches: either take the Grafana Helm chart and place additional files into the existing directory structure, or leave the Helm chart as-is and use the values.yaml and your own ConfigMaps/Secrets to hold your customizations. This is very similar to how customizing the seed-level Prometheus works, so if you have read that section, this will look familiar.
Datasources
To create a new datasource, you can either put a new YAML file inside the provisioning/datasources/ directory or extend your values.yaml like so:
grafana:
  provisioning:
    datasources:
      extra:
      # list your new datasources here
      - name: influxdb
        type: influxdb
        access: proxy
        org_id: 1
        url: http://influxdb.monitoring.svc.cluster.local:9090
        version: 1
        editable: false
You can also remove the default Prometheus datasource by either deleting the prometheus.yaml file or pointing the source directive inside your values.yaml to a different, empty directory:
grafana:
  provisioning:
    datasources:
      source: empty/
Note that by removing the default Prometheus datasource and not providing an alternative with the same name, the default dashboards will not work anymore.
Dashboard Providers
Configuring providers works much in the same way as configuring datasources: either place new files in the provisioning/dashboards/ directory or use the values.yaml accordingly:
grafana:
  provisioning:
    dashboards:
      extra:
      # list your new dashboard providers here
      - folder: "Example Resources"
        name: "example"
        options:
          path: /example/dashboards
        org_id: 1
        type: file
Customizing the providers is especially important if you also want to add your own dashboards. You can point options.path to a newly mounted volume to load dashboards from (see below).
Dashboards
Just like with datasources and providers, new dashboards can be placed in the existing dashboards/ directory. Note, however, that if you create a new folder (like dashboards/example/), you must also create a new dashboard provider to tell Grafana about it. Your dashboards will be loaded and included in the default ConfigMap, but without the new provider, Grafana will not see them.
Following the example above, if you put your dashboards in dashboards/example/, you need a dashboard provider with options.path set to /grafana-dashboard-definitions/example, because the ConfigMap is mounted to /grafana-dashboard-definitions.
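Put together, a provider for dashboards stored in dashboards/example/ could be declared as in the following sketch. It reuses the provider format from the Dashboard Providers section; the folder and provider names are only examples.

```yaml
grafana:
  provisioning:
    dashboards:
      extra:
      - folder: "Example Resources"
        name: "example"
        options:
          # dashboards/example/ inside the chart ends up below the
          # mount point of the default dashboards ConfigMap
          path: /grafana-dashboard-definitions/example
        org_id: 1
        type: file
```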
You can also use your own ConfigMaps or Secrets and have the Grafana deployment mount them. This is useful for larger customizations with lots of dashboards that you want to manage independently. To use an external ConfigMap, create it like so:
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboards
data:
  dashboard1.json: |
    { ... Grafana dashboard JSON here ... }
  dashboard2.json: |
    { ... Grafana dashboard JSON here ... }
Make sure to create your ConfigMap in the monitoring namespace and then use the volumes directive in your values.yaml to tell the Grafana Helm chart about your ConfigMap:
grafana:
  volumes:
  - name: example-dashboards-volume
    mountPath: /grafana-dashboard-definitions/example
    configMap: example-dashboards
Using a Secret instead of a ConfigMap works identically; just specify secretName instead of configMap in the volumes section.
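As a sketch of the Secret variant, assuming a Secret named example-dashboards exists in the monitoring namespace and holds the dashboard JSON files as keys:

```yaml
grafana:
  volumes:
  - name: example-dashboards-volume
    mountPath: /grafana-dashboard-definitions/example
    # secretName replaces configMap when the data lives in a Secret
    secretName: example-dashboards
```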
Remember that you still need a custom dashboard provider to make Grafana load your new dashboards.
Custom Resource State Metrics
The kube-state-metrics Helm chart deployed on a seed/master cluster can be extended to also expose state metrics for custom resources. To do so, enable customResourceState and pass the configuration for the custom state metrics.
kubeStateMetrics:
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
        - groupVersionKind:
            group: helm.toolkit.fluxcd.io
            version: "v2beta2"
            kind: HelmRelease
          metricNamePrefix: gotk
          metrics:
          - name: "resource_info"
            help: "The current state of a GitOps Toolkit resource."
            each:
              type: Info
              info:
                labelsFromPath:
                  name: [metadata, name]
            labelsFromPath:
              exported_namespace: [metadata, namespace]
              suspended: [spec, suspend]
              ready: [status, conditions, "[type=Ready]", status]
Additionally, the RBAC rules must be updated to allow kube-state-metrics to perform the necessary operations on the custom resource(s).
kubeStateMetrics:
  rbac:
    extraRules:
    - apiGroups:
      - helm.toolkit.fluxcd.io
      resources:
      - helmreleases
      verbs: [ "list", "watch" ]
For configuring more custom resources, refer to the example kube-state-metrics-config.yaml.
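kube-state-metrics builds the metric name from metricNamePrefix and name, so the HelmRelease configuration above should expose a gotk_resource_info series carrying the name, exported_namespace, suspended and ready labels. Assuming the seed Prometheus scrapes this kube-state-metrics instance, an alert on that series could be added through the rules mechanism described in the Seed Cluster Prometheus section. The group name, alert name and 15m duration below are hypothetical.

```yaml
prometheus:
  rules:
    groups:
    - name: helmreleases                 # hypothetical group name
      rules:
      - alert: HelmReleaseNotReady       # hypothetical alert name
        annotations:
          message: A HelmRelease has not been ready for 15 minutes.
        # The ready label mirrors the status of the Ready condition
        expr: gotk_resource_info{ready="False"} == 1
        for: 15m
        labels:
          severity: warning
```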