The Kubeflow Addon (Flowmatic) automates the installation of Kubeflow, the Machine Learning Toolkit for Kubernetes, in KKP, with Kubeflow authentication integrated with KKP.
The Kubeflow Addon is still under development; the current version is only a feature preview.
Before installing the Kubeflow Addon in a KKP user cluster, the following prerequisites have to be met:
This addon works with KKP version 2.16+, in user clusters with the Service Account Token Volume Projection feature enabled. KKP clusters running Kubernetes v1.20+ have this feature enabled automatically; in KKP clusters with older Kubernetes versions, it has to be enabled explicitly, as described in the KKP documentation.
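As a rough illustration of the version rule above, the helper below decides whether a user cluster's Kubernetes version already has Service Account Token Volume Projection enabled by default (v1.20+) or needs it enabled explicitly. The function name and the accepted version-string formats (e.g. `v1.19.7` or `1.21`) are assumptions made for this sketch.

```python
import re

# Sketch only: per the text above, KKP clusters on Kubernetes v1.20+ have
# Service Account Token Volume Projection enabled automatically, while older
# clusters need it enabled explicitly. The helper name is hypothetical.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)(?:\.\d+)?")


def projection_enabled_by_default(version: str) -> bool:
    """Return True if the Kubernetes version (e.g. "v1.20.2") is v1.20 or newer."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 20)


if __name__ == "__main__":
    for v in ("v1.19.7", "v1.20.2", "1.21"):
        status = "enabled by default" if projection_enabled_by_default(v) else "enable explicitly"
        print(f"{v}: {status}")
```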
Before this addon can be deployed in a KKP user cluster, the KKP installation has to be configured to enable Kubeflow as an accessible addon. This needs to be done by the KKP installation administrator, once per KKP installation.
To do so, modify the KubermaticConfiguration as follows:
- set spec.userClusters.addons.kubernetes.dockerRepository to point to the provided addon Docker image repository,
- add kubeflow to spec.api.accessibleAddons.
For deploying Kubeflow in a KKP user cluster, please make sure that you go through the prerequisites for running Kubeflow.
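The administrator-side KubermaticConfiguration change described above can be sketched as a small transformation on the configuration loaded as a dict, for example from `kubectl get kubermaticconfiguration -n kubermatic -o json` (the `kubermatic` namespace is an assumption about your installation). The repository value is a hypothetical placeholder, not the real location of the addon image.

```python
import json

# Sketch only: applies the two KubermaticConfiguration edits described above.
# ADDON_REPOSITORY is a hypothetical placeholder for the provided addon image repo.
ADDON_REPOSITORY = "registry.example.com/kubeflow-addon"  # hypothetical


def enable_kubeflow_addon(config: dict, repository: str) -> dict:
    """Set the addon image repository and make "kubeflow" an accessible addon."""
    spec = config.setdefault("spec", {})
    addons = (
        spec.setdefault("userClusters", {})
        .setdefault("addons", {})
        .setdefault("kubernetes", {})
    )
    addons["dockerRepository"] = repository

    # Append instead of replacing, so addons that are already accessible stay so.
    accessible = spec.setdefault("api", {}).setdefault("accessibleAddons", [])
    if "kubeflow" not in accessible:
        accessible.append("kubeflow")
    return config  # the same dict, modified in place


if __name__ == "__main__":
    sample = {"kind": "KubermaticConfiguration",
              "spec": {"api": {"accessibleAddons": ["node-exporter"]}}}
    print(json.dumps(enable_kubeflow_addon(sample, ADDON_REPOSITORY), indent=2))
```

The function is idempotent, so running it twice does not add a duplicate `kubeflow` entry.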
If your machine learning workloads require GPU acceleration, make sure you are using GPU-enabled machines when creating a user cluster. For more details about GPU support, please refer to the GPU Acceleration Settings section below.
Once the Kubeflow Addon is installed in KKP, it can be deployed into a user cluster via the KKP UI as shown below:
The UI will provide several options that can be used to customize the Kubeflow installation, as shown below.
These options will be described in detail in the following section.
By default, the Kubeflow dashboard is only accessible via a k8s NodePort Service. This can be changed by enabling the Expose via LoadBalancer option, which exposes the Kubeflow dashboard using a k8s LoadBalancer Service.
For a LoadBalancer Service, an external address is assigned by the cloud provider on which the cluster is running (on some providers, such as AWS, this is a DNS hostname rather than an IP address).
This address can be retrieved by inspecting the istio-ingressgateway Service in the istio-system Namespace, e.g.:
```bash
$ kubectl get service istio-ingressgateway -n istio-system
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP
istio-ingressgateway   LoadBalancer   10.240.28.214   a286f5a47e9564e43ab4165039e58e5e-1598660756.eu-central-1.elb.amazonaws.com
```
This external address can be used to access the Kubeflow dashboard, or for the DNS setup if a custom Domain Name is used (see the Domain Name section).
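To read the external address programmatically instead of from the table above, one option is to parse the Service as JSON. Kubernetes reports it under `status.loadBalancer.ingress`, as either an `ip` or a `hostname` field. This is a sketch; the helper names are hypothetical, and `fetch_ingressgateway` assumes `kubectl` is installed and configured for the user cluster.

```python
import json
import subprocess
from typing import Optional


def external_address(service: dict) -> Optional[str]:
    """Return the first external IP or hostname of a LoadBalancer Service, if assigned."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None  # the cloud provider has not assigned an address yet


def fetch_ingressgateway() -> dict:
    """Fetch the istio-ingressgateway Service via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "service", "istio-ingressgateway",
         "-n", "istio-system", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Offline example shaped like the AWS output above (hostname, not IP).
    sample = {"status": {"loadBalancer": {"ingress": [{"hostname": "lb.example.elb.amazonaws.com"}]}}}
    print(external_address(sample) or "<pending>")
```

Against a live cluster, `external_address(fetch_ingressgateway())` returns the same value shown in the EXTERNAL-IP column, or `None` while the address is still pending.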
By default, the Kubeflow dashboard is served over insecure HTTP. To use secure HTTPS instead, select the Enable TLS option. When selected, the addon automatically requests a TLS certificate for the specified Domain Name (described in the Domain Name section) from the Let’s Encrypt certificate authority, and uses it to configure HTTPS access to the dashboard.
This option works only if the Expose via LoadBalancer option is enabled and a custom Domain Name is set.
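The dependency rule for Enable TLS can be expressed as a small pre-flight check. This is a sketch: the dataclass fields are hypothetical stand-ins for the UI options, not the addon's actual variable names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KubeflowAddonOptions:
    # Hypothetical field names mirroring the UI options described above.
    expose_via_load_balancer: bool = False
    enable_tls: bool = False
    domain_name: str = ""


def tls_problems(opts: KubeflowAddonOptions) -> List[str]:
    """Return why Enable TLS cannot work with these options (empty list if it can)."""
    problems: List[str] = []
    if not opts.enable_tls:
        return problems  # nothing to check when TLS is off
    if not opts.expose_via_load_balancer:
        problems.append("Enable TLS requires Expose via LoadBalancer")
    if not opts.domain_name.strip():
        problems.append("Enable TLS requires a custom Domain Name for the Let's Encrypt certificate")
    return problems


if __name__ == "__main__":
    print(tls_problems(KubeflowAddonOptions(enable_tls=True)))
```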
To access the Kubeflow dashboard via a custom domain name assigned specifically to a particular Kubeflow installation, use the Domain Name input box.
This setting configures the provided domain name on the Kubeflow side. To actually point that domain to the proper Kubernetes cluster and Service, a DNS record has to be created at your DNS provider as well.
The domain name needs to point to the external address of the istio-ingressgateway Service in the istio-system Namespace, e.g. using a CNAME DNS entry (a CNAME can only target a hostname; if the cloud provider assigned an IP address, use an A record instead).
Expose via LoadBalancer has to be enabled to use a custom domain name.
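The DNS step above can be sketched as a helper that renders one zone-file style record for the Kubeflow domain, choosing a CNAME for a hostname target (such as an AWS ELB name) and an A or AAAA record for an IP target. The helper name and the 300-second default TTL are arbitrary choices for this example.

```python
import ipaddress


def dns_record(domain: str, target: str, ttl: int = 300) -> str:
    """Return one DNS record line pointing `domain` at the ingress gateway address."""
    name = domain.rstrip(".") + "."
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal: treat it as a hostname and point a CNAME at it.
        return f"{name} {ttl} IN CNAME {target.rstrip('.')}."
    record_type = "A" if ip.version == 4 else "AAAA"
    return f"{name} {ttl} IN {record_type} {ip}"


if __name__ == "__main__":
    print(dns_record("kubeflow.example.com", "203.0.113.10"))
    print(dns_record("kubeflow.example.com", "lb.example.elb.amazonaws.com"))
```

A CNAME cannot be placed at the zone apex (for example `example.com` itself), so a subdomain such as `kubeflow.example.com` is the simpler choice when the external address is a hostname.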
By default, access to the Kubeflow dashboard is secured by basic authentication with static users.
Alternatively, an external OIDC authentication provider can be specified in the OIDC Provider URL field, with the client secret specified in OIDC Secret.
Since KKP already contains an OIDC provider (Dex) to authenticate users logging into KKP itself, it is possible to point the Kubeflow addon to this KKP OIDC service. If your KKP runs on the domain https://kubermatic.company.com/, you can configure the OIDC provider as https://kubermatic.company.com/dex.
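Following the example above, the OIDC Provider URL can be derived from the URL KKP runs on. This sketch assumes Dex is served under the `/dex` path of the KKP domain, exactly as in that example, and rejects non-HTTPS URLs as a deliberate design choice; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def kkp_dex_url(kkp_url: str) -> str:
    """Derive the Dex OIDC provider URL from the KKP URL (assumes Dex lives at /dex)."""
    parts = urlsplit(kkp_url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"expected an https:// URL with a host, got {kkp_url!r}")
    # Strip any trailing slash so both ".../" and "..." yield the same result.
    return f"https://{parts.netloc}{parts.path.rstrip('/')}/dex"


if __name__ == "__main__":
    print(kkp_dex_url("https://kubermatic.company.com/"))
```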
This setup, however, requires some configuration on the KKP platform side as well. The KKP installation administrator has to add the following section to KKP’s Helm values.yaml before installing the oauth chart (see the Securing System Services documentation for more details):
```yaml
dex:
  clients:
  - id: kubeflow-oidc-authservice
    name: kubeflow-oidc-authservice
    secret: <oidc-secret-passed-into-addon>
    RedirectURIs:
    - 'https://<kubeflow-installation-domain-name>/login/oidc'
```
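Because the same secret has to appear both in the values.yaml client entry and in the addon's OIDC Secret field, it can help to generate it once and render both from a single value. The sketch below mirrors the field names of the snippet above and emits JSON to stay within the Python standard library (valid JSON is also valid YAML 1.2); the function name and example domain are hypothetical.

```python
import json
import secrets


def kubeflow_dex_client(kubeflow_domain: str, secret: str = "") -> dict:
    """Build the Dex client entry above, generating a random secret if none is given."""
    secret = secret or secrets.token_urlsafe(32)
    return {
        "id": "kubeflow-oidc-authservice",
        "name": "kubeflow-oidc-authservice",
        "secret": secret,  # the same value goes into the addon's OIDC Secret field
        "RedirectURIs": [f"https://{kubeflow_domain}/login/oidc"],
    }


if __name__ == "__main__":
    client = kubeflow_dex_client("kubeflow.example.com")
    print(json.dumps({"dex": {"clients": [client]}}, indent=2))
    print("OIDC Secret for the addon:", client["secret"])
```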
To enable GPU (graphics processing unit) acceleration in a Kubeflow cluster, at least some of the nodes in the cluster need to have GPU devices installed. Depending on the GPU vendor (NVIDIA or AMD), the Kubeflow addon provides different options for making them available to k8s workloads.
For NVIDIA GPUs, the Kubeflow addon can automatically install all necessary software components on the cluster nodes using the NVIDIA GPU Operator, by selecting the Install NVIDIA Operator option.
The NVIDIA GPU Operator automatically takes care of installing the drivers, the container runtime, and the k8s device plugin on all nodes where NVIDIA GPUs are detected. This also applies to new nodes that join the cluster later.
Please review the NVIDIA GPU Operator Platform Support documentation to see which GPU models, operating systems and Kubernetes versions are supported.
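One way to confirm that the GPUs have become usable is to check which nodes advertise a GPU extended resource in `status.allocatable`. `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin and `amd.com/gpu` the one exposed by the AMD plugin. This is a sketch with hypothetical helper names; `fetch_nodes` assumes `kubectl` access to the user cluster.

```python
import json
import subprocess
from typing import Dict


def gpus_per_node(node_list: dict, resource: str = "nvidia.com/gpu") -> Dict[str, int]:
    """Map node name -> allocatable GPU count, for nodes that expose `resource`."""
    counts: Dict[str, int] = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        if resource in allocatable:
            counts[node["metadata"]["name"]] = int(allocatable[resource])
    return counts


def fetch_nodes() -> dict:
    """Return the parsed output of `kubectl get nodes -o json`."""
    result = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

For example, `gpus_per_node(fetch_nodes())` lists NVIDIA GPU nodes, and `gpus_per_node(fetch_nodes(), "amd.com/gpu")` lists AMD ones. A GPU node missing from the result usually means the driver or device plugin has not finished its setup on that node yet.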
For AMD GPUs, the addon only provides automated installation of the AMD GPU device plugin, by selecting the Install AMD GPU Device Plugin option.
Installing GPU drivers on individual Kubernetes nodes is outside the scope of the device plugin: the drivers have to be installed in a different way (e.g. manually, or by using a base node image with pre-installed AMD GPU drivers).
Please review the documentation of the AMD GPU device plugin for the driver installation instructions, prerequisites and limitations.
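Once a device plugin is running, workloads request GPUs through the corresponding extended resource in the container's `limits` (for extended resources, requests default to the limit and must equal it if set). The sketch below builds such a minimal Pod manifest for either vendor. The pod name and image are hypothetical, and the resulting JSON can be piped into `kubectl apply -f -`.

```python
import json

# Extended resource names exposed by the NVIDIA and AMD device plugins.
GPU_RESOURCES = {"nvidia": "nvidia.com/gpu", "amd": "amd.com/gpu"}


def gpu_pod(name: str, image: str, vendor: str = "amd", count: int = 1) -> dict:
    """Build a minimal Pod manifest that requests `count` GPUs of the given vendor."""
    if vendor not in GPU_RESOURCES:
        raise ValueError(f"unknown GPU vendor {vendor!r}; expected one of {sorted(GPU_RESOURCES)}")
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {GPU_RESOURCES[vendor]: count}},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name; replace it with an image that uses the GPU.
    print(json.dumps(gpu_pod("gpu-smoke-test", "registry.example.com/my-gpu-image:latest"), indent=2))
```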
By default, Istio RBAC (Role Based Access Control) enforcement is disabled in the Kubeflow installation, to avoid the RBAC-related issues described in the Limitations & Known Issues section of this document. However, this means less separation between multiple Kubeflow users within the cluster. If the Istio RBAC issue listed below is not a problem for your Kubeflow installation, you can still enable RBAC enforcement using the Enable Istio RBAC option.
This section contains a list of known issues in different components:
- Kubermatic Kubernetes Platform
- Istio RBAC in Kubeflow
- Kubeflow UI issues
- Kale Pipeline
- NVIDIA GPU Operator
- AMD GPU Support