Customization of the Master / Seed MLA Stack

This chapter describes the customization of the KKP Master / Seed Monitoring, Logging & Alerting Stack.

When it comes to monitoring, no approach fits all use cases. You will most likely want to adjust things to your needs, and this page describes the various places where customizations can be applied. In broad terms, four main areas are discussed:

  • user-cluster Prometheus
  • seed-cluster Prometheus
  • Alertmanager configuration
  • Grafana dashboards

You will want to familiarize yourself with the Installation of the Master / Seed MLA Stack before reading any further.

User Cluster Prometheus

The basic source of metrics is the Prometheus inside each user cluster namespace. It tracks the control plane of the customer cluster. (IMPORTANT: It is NOT responsible for the components running in the customer clusters themselves.)

This Prometheus is deployed as part of Kubermatic Kubernetes Platform’s (KKP) cluster creation, which means you cannot directly affect its deployment.

To still allow customization of rules, KKP lets you specify them in the values.yaml that is fed to the KKP chart.

Rules

Custom rules can be added beneath the clusterNamespacePrometheus.rules key:

kubermatic:
  clusterNamespacePrometheus:
    disableDefaultRules: false
    rules:
      groups:
      - name: my-custom-group
        rules:
        - alert: MyCustomAlert
          annotations:
            message: Something happened in {{ $labels.namespace }}
          expr: |
            sum(rate(machine_controller_errors_total[5m])) by (namespace) > 0.01            
          for: 10m
          labels:
            severity: warning

If you’d like to disable the default rules coming with KKP itself, you can specify the disableDefaultRules flag:

kubermatic:
  clusterNamespacePrometheus:
    disableDefaultRules: true

Scraping Configs

Custom scraping configs can be specified by adding the corresponding entries beneath the clusterNamespacePrometheus.scrapingConfigs key in the values.yaml:

clusterNamespacePrometheus:
  scrapingConfigs:
  - job_name: 'schnitzel'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_kubermatic_scrape]
      action: keep
      regex: true

Also, the default KKP scraping configs can be disabled in the same way:

clusterNamespacePrometheus:
  disableDefaultScrapingConfigs: true
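
As an illustration of the schnitzel job shown earlier, a pod that its relabeling rule would keep might look like the sketch below. Prometheus replaces characters that are not valid in label names with underscores, so the annotation key kubermatic/scrape used here is only one possible key that maps to __meta_kubernetes_pod_annotation_kubermatic_scrape. The pod name and image are hypothetical, and whether the pod is discovered at all also depends on which namespaces this Prometheus is permitted to watch.

```yaml
# Hypothetical pod; only the annotation matters for the relabeling rule.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  annotations:
    # Exposed by Prometheus as __meta_kubernetes_pod_annotation_kubermatic_scrape
    kubermatic/scrape: "true"
spec:
  containers:
  - name: example-app
    image: example.com/example-app:1.0
```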

Seed Cluster Prometheus

This Prometheus is primarily used to collect metrics from the customer clusters and then provide those to Grafana. In contrast to the Prometheus mentioned above, this one is deployed via a Helm chart and you can use Helm’s native customization options.

Labels

To specify additional labels that are sent to Alertmanager whenever an alert fires, you can add an externalLabels element to your values.yaml and list your desired labels there:

prometheus:
  externalLabels:
    mycustomlabel: a value
    rack: rack17
    location: europe

Rules

Rules include recording rules (for precomputing expensive queries) and alerts. There are three different ways of customizing them.

values.yaml

You can add your own rules by adding them to the values.yaml like so:

prometheus:
  rules:
    groups:
      - name: myrules
        rules:
        - alert: DatacenterIsOnFire
          annotations:
            message: |
              The datacenter has gone up in flames, someone should quickly find an extinguisher.
              You can reach the local emergency services by calling 0118 999 881 999 119 7253.              
          expr: temperature{server=~"kubernetes.+"} > 100
          for: 5m
          labels:
            severity: critical

These rules are written to a dedicated _customrules.yaml file and included in Prometheus. Use this approach if you only have a few rules that you’d like to add.
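
Recording rules can be added through the same key. The following is a sketch only: the group name, the recorded series name and the expression (which reuses the machine_controller_errors_total metric from the user cluster example) are illustrative assumptions and need to be adapted to metrics that your seed Prometheus actually collects.

```yaml
prometheus:
  rules:
    groups:
    - name: myrecordingrules
      rules:
      # Precompute the per-namespace error rate so that dashboards and
      # alerts can query the cheaper, recorded series instead.
      - record: namespace:machine_controller_errors:rate5m
        expr: sum(rate(machine_controller_errors_total[5m])) by (namespace)
```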

Extending the Helm Chart

If you have more than a couple of rules, you can also place new YAML files inside the rules/ directory before you deploy the Helm chart. They will be included as you would expect. To prevent maintenance headaches further down the road you should never change the existing files inside the chart. If you need to get rid of the predefined rules, see the next section on how to achieve it.

Custom ConfigMaps/Secrets

For large deployments with many independently managed rules, you can make use of custom volumes to mount your configuration into Prometheus. For this to work, you need to create your own ConfigMap or Secret inside the monitoring namespace. Then configure the Prometheus chart using the values.yaml to mount those appropriately like so:

prometheus:
  volumes:
  - name: example-rules-volume
    mountPath: /example/rules
    configMap: example-rules

After mounting the files into the pod you need to make sure that Prometheus loads them by extending the ruleFiles list:

prometheus:
  ruleFiles:
  - '/etc/prometheus/rules/*.yaml'
  - '/example/rules/*.yaml'
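
A ConfigMap matching the volume above could look roughly like the following sketch. The file name example.yaml, the group name and the alert itself are assumptions made for illustration; the content is assumed to follow the regular Prometheus rule file format.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-rules
  namespace: monitoring
data:
  # Each key becomes a file below the mountPath,
  # in this case /example/rules/example.yaml.
  example.yaml: |
    groups:
    - name: example
      rules:
      - alert: ExampleTargetDown
        expr: up == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          message: Target {{ $labels.instance }} has been down for 10 minutes.
```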

Managing the ruleFiles is also the way to disable the predefined rules by just removing the applicable item from the list. You can also keep the list completely empty to disable any and all alerts.

Long-term metrics storage

By default, the seed Prometheus is configured to store 1 day’s worth of metrics. This can be customized by overriding the prometheus.tsdb.retentionTime field in the values.yaml used for the chart installation.
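
As a small sketch, the override could look like this. The value 7d is an arbitrary example and assumes that the chart passes the value to Prometheus as a duration; keep in mind that a longer retention also needs more disk space.

```yaml
prometheus:
  tsdb:
    # Example value only; Prometheus durations use units such as h, d and w.
    retentionTime: 7d
```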

If you would like to store metrics for a longer period, other solutions like Thanos are typically used. Thanos integration is a more involved process. Please read more about Thanos integration.

Alertmanager

Alertmanager configuration can be tweaked via values.yaml like so:

alertmanager:
  config:
    global:
      slack_api_url: https://hooks.slack.com/services/YOUR_KEYS_HERE
    route:
      receiver: default
      repeat_interval: 1h
      routes:
        - receiver: blackhole
          match:
            severity: none
    receivers:
      - name: blackhole
      - name: default
        slack_configs:
          - channel: '#alerting'
            send_resolved: true

Please review the Alertmanager Configuration Guide for detailed configuration syntax.

You can review the Alerting Runbook for a reference of the alerts that the Kubermatic Kubernetes Platform (KKP) monitoring setup can fire, alongside a short description and steps to debug each of them.
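
Building on the Alertmanager configuration shown earlier, an additional route could send critical alerts to a separate receiver. This is a sketch under assumptions: the receiver name critical and the channel #alerting-critical are hypothetical, and the global section with the slack_api_url is omitted here and stays as in your existing configuration. Because Helm replaces lists in values files rather than merging them, the routes and receivers lists are restated in full.

```yaml
alertmanager:
  config:
    route:
      receiver: default
      repeat_interval: 1h
      routes:
      - receiver: blackhole
        match:
          severity: none
      # Hypothetical extra route: critical alerts go to their own channel.
      - receiver: critical
        match:
          severity: critical
    receivers:
    - name: blackhole
    - name: default
      slack_configs:
      - channel: '#alerting'
        send_resolved: true
    - name: critical
      slack_configs:
      - channel: '#alerting-critical'
        send_resolved: true
```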

Grafana Dashboards

Customizing Grafana entails three different aspects:

  • Datasources (like Prometheus, InfluxDB, …)
  • Dashboard providers (telling Grafana where to load dashboards from)
  • Dashboards themselves

In all cases, you have two general approaches: either take the Grafana Helm chart and place additional files into the existing directory structure, or leave the Helm chart as-is and use the values.yaml and your own ConfigMaps/Secrets to hold your customizations. This is very similar to how customizing the seed-level Prometheus works, so if you have read that section, you will feel right at home.

Datasources

To create a new datasource, you can either put a new YAML file inside the provisioning/datasources/ directory or extend your values.yaml like so:

grafana:
  provisioning:
    datasources:
      extra:
      # list your new datasources here
      - name: influxdb
        type: influxdb
        access: proxy
        org_id: 1
        url: http://influxdb.monitoring.svc.cluster.local:9090
        version: 1
        editable: false

You can also remove the default Prometheus datasource if you really want to by either deleting the prometheus.yaml or pointing the source directive inside your values.yaml to a different, empty directory:

grafana:
  provisioning:
    datasources:
      source: empty/

Note that by removing the default Prometheus datasource and not providing an alternative with the same name, the default dashboards will not work anymore.

Dashboard Providers

Configuring providers works much in the same way as configuring datasources: either place new files in the provisioning/dashboards/ directory or use the values.yaml accordingly:

grafana:
  provisioning:
    dashboards:
      extra:
      # list your new dashboard providers here
      - folder: "Example Resources"
        name: "example"
        options:
          path: /example/dashboards
        org_id: 1
        type: file

Customizing the providers is especially important if you also want to add your own dashboards. You can point options.path to a newly mounted volume to load dashboards from (see below).

Dashboards

Just like with datasources and providers, new dashboards can be placed in the existing dashboards/ directory. Do note though that if you create a new folder (like dashboards/example/), you also must create a new dashboard provider to tell Grafana about it. Your dashboards will be loaded and included in the default ConfigMap, but without the new provider, Grafana will not see them.

Following the example above, if you put your dashboards in dashboards/example/, you need a dashboard provider with the options.path set to /grafana-dashboard-definitions/example, because the ConfigMap is mounted to /grafana-dashboard-definitions.
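
Put together, a provider for dashboards placed in dashboards/example/ could be declared as in the following sketch. It reuses the folder and provider names from the provider example above, which are themselves only examples.

```yaml
grafana:
  provisioning:
    dashboards:
      extra:
      - folder: "Example Resources"
        name: "example"
        options:
          # The dashboards ConfigMap is mounted at /grafana-dashboard-definitions,
          # so dashboards/example/ ends up in this directory.
          path: /grafana-dashboard-definitions/example
        org_id: 1
        type: file
```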

You can also use your own ConfigMaps or Secrets and have the Grafana deployment mount them. This is useful for larger customizations with lots of dashboards that you want to manage independently. To use an external ConfigMap, create it like so:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboards
data:
  dashboard1.json: |
    { ... Grafana dashboard JSON here ... }

  dashboard2.json: |
    { ... Grafana dashboard JSON here ... }

Make sure to create your ConfigMap in the monitoring namespace and then use the volumes directive in your values.yaml to tell the Grafana Helm chart about your ConfigMap:

grafana:
  volumes:
  - name: example-dashboards-volume
    mountPath: /grafana-dashboard-definitions/example
    configMap: example-dashboards

Using a Secret instead of a ConfigMap works identically; just specify secretName instead of configMap in the volumes section.
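
For a Secret, the volume definition might look like the following sketch. The Secret name example-dashboards is hypothetical; the Secret is assumed to exist in the monitoring namespace and to contain the dashboard JSON files as its keys.

```yaml
grafana:
  volumes:
  - name: example-dashboards-volume
    mountPath: /grafana-dashboard-definitions/example
    # Hypothetical Secret name; replaces the configMap key used for ConfigMaps.
    secretName: example-dashboards
```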

Remember that you still need a custom dashboard provider to make Grafana load your new dashboards.