Configure Kubeflow Notebooks

Learn how to configure Kubeflow Notebooks in deployKF. Use custom environments, GPU acceleration, faster storage, and more!


Overview

Kubeflow Notebooks allows users to spawn Pods running instances of JupyterLab, Visual Studio Code (code-server), and RStudio in profile namespaces.

As the cluster administrator, you may configure which options are available to users when spawning a Notebook Pod:

  • Container Images
  • Container Resources (CPU, Memory, GPU)
  • Storage Volumes
  • Advanced Pod Options (Affinity, Tolerations, PodDefaults)
  • Idle Notebook Culling

Kubeflow Notebooks Limitations

The current version of Kubeflow Notebooks exposes many Kubernetes-specific concepts to users, which may be confusing for non-technical users. There is an upstream proposal to abstract these concepts away in a more user-friendly way; see kubeflow/kubeflow#7156 for more information.


Updating the kubeflow_tools.notebooks.spawnerFormDefaults values has no effect on existing Notebook Pods; only newly spawned Pods will use the updated values.

Container Images

Container images are the "environment" which users will be working in when using a Notebook Pod, and can be configured to provide different tools and packages to users.

The following values configure which container images are available to users when spawning a Notebook Pod:
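As a rough illustration, the values might look like the following sketch. It assumes that the image options live under an `image` key with the same `value`/`options` shape as the other spawner options on this page; the registry and image names are hypothetical. Check the values reference for your deployKF version before relying on it.

```yaml
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      ## ASSUMPTION: the `image` key and its `value`/`options` shape are not
      ##             confirmed by this page, check your deployKF values reference
      image:
        ## the image selected by default in the spawner
        value: "registry.example.com/ml-team/jupyter-custom:1.0.0"
        ## the images that users may choose from
        options:
          - "registry.example.com/ml-team/jupyter-custom:1.0.0"
          - "registry.example.com/ml-team/jupyter-custom-gpu:1.0.0"
```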

Container Resources

Container resources directly correspond to Kubernetes Container Resources which are requested by the Notebook Pod.

The following values configure the resource requests/limits for containers in Notebook Pods:

Resource Requests

Kubernetes uses resource requests when scheduling Pods, but does not strictly enforce them at runtime. User Notebooks are not well-behaved applications (from a resource perspective), so a Notebook Pod that uses more than it requested will likely impact other Pods running on the same node.

However, setting resource limits can have unintended consequences for users, as the Notebook Pod will be terminated if it exceeds certain limits (like memory), which may result in lost work.

A common alternative is to use a dedicated node for each Notebook Pod, see Advanced Pod Options for information on how to do this with Affinity and Tolerations.
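To make the trade-off between requests and limits concrete, the following sketch shows how they appear on a container in a Pod spec. This is a generic Kubernetes fragment rather than deployKF values; the container name and the numbers are hypothetical.

```yaml
## Fragment of a Pod spec (generic Kubernetes, values are illustrative)
spec:
  containers:
    - name: "my-notebook"
      resources:
        ## used by the scheduler to choose a node, NOT enforced at runtime
        requests:
          cpu: "2"
          memory: "8Gi"
        ## enforced at runtime: exceeding the memory limit gets the
        ## container OOM-killed, while CPU usage above the limit is throttled
        limits:
          cpu: "4"
          memory: "16Gi"
```

With no limits set, the Notebook is free to use spare capacity on the node, which is why a dedicated node per Notebook Pod is a common way to contain the impact.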

Storage Volumes

Storage volumes are used to provide persistent storage to Notebook Pods between restarts, and are implemented using Kubernetes Persistent Volumes.

The following values configure the storage volumes for Notebook Pods:

StorageClass and Performance

The kubeflow_tools.notebooks.spawnerFormDefaults.workspaceVolume.value.newPvc.spec.storageClassName value defines which Kubernetes StorageClass is used to provision the workspace volume. If a storageClassName is not specified, the cluster's default StorageClass is used.

As ML workloads are often IO-intensive, it is recommended to use a high-performance StorageClass. Typically, this is only possible with drives that are attached to the node, rather than network-attached storage.
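For instance, the workspace volume could be pointed at a specific StorageClass with values like the sketch below. Only the storageClassName key is shown and the other keys of the workspace volume are omitted; the StorageClass name is hypothetical and must match one that exists in your cluster.

```yaml
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      workspaceVolume:
        value:
          newPvc:
            spec:
              ## ASSUMPTION: "local-nvme" is a hypothetical StorageClass name,
              ##             use one that exists in your cluster
              storageClassName: "local-nvme"
```

Running `kubectl get storageclass` lists the StorageClasses available in the cluster.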

Advanced Pod Options

Advanced Pod Options are additional configurations for Notebook Pods which manage things like Pod Affinity, Node Tolerations, and Kubeflow's PodDefaults.

The following values configure the advanced options for Notebook Pods:

Dedicated Node for each Notebook Pod

Because Notebook Pods are not well-behaved applications (from a resource perspective), it is common to want a dedicated node for each Notebook Pod. With a combination of Pod Affinity and Node Tolerations, this can be achieved.

Note, this will require your cluster to have node-autoscaling configured (e.g. Cluster Autoscaler or Karpenter), as the cluster will need to provision a new node for each Notebook Pod.


First, you will need to make one or more groups of nodes that are tainted to prevent other Pods from being scheduled on them. In the following example, we have four groups of nodes with different CPU/Memory configurations, that are each tainted with a different value of the dedicated key with effect NoSchedule:

  • Key: dedicated, Value: kubeflow-c5.xlarge, Effect: NoSchedule
  • Key: dedicated, Value: kubeflow-c5.2xlarge, Effect: NoSchedule
  • Key: dedicated, Value: kubeflow-c5.4xlarge, Effect: NoSchedule
  • Key: dedicated, Value: kubeflow-r5.8xlarge, Effect: NoSchedule
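As a sketch of what such a node looks like in the cluster, the following Node fragment carries the first taint from the list above, along with the lifecycle=kubeflow-notebook label that the example affinity config requires. The node name is hypothetical, and in practice taints and labels are usually set through the node group configuration of your cloud provider or autoscaler rather than by editing Node objects directly.

```yaml
## Sketch of a Node as it would appear in the cluster (name is hypothetical)
apiVersion: v1
kind: Node
metadata:
  name: "example-node-1"
  labels:
    ## label matched by the `nodeAffinity` rule in the example values
    lifecycle: "kubeflow-notebook"
spec:
  taints:
    ## keeps Pods without a matching toleration off this node
    - key: "dedicated"
      value: "kubeflow-c5.xlarge"
      effect: "NoSchedule"
```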

Next, you will need to configure an affinity config that does not allow two Notebook Pods to be scheduled on the same node. In the following example, we do this by:

  • Using nodeAffinity to require a Node with label lifecycle=kubeflow-notebook
  • Using podAntiAffinity to require a Node WITHOUT an existing Pod having notebook-name label

Finally, you may use the following values to expose these options to users:

kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      ## Affinity
      ##  - note, setting `readOnly` to `true` ensures that this affinity is always applied
      ##  - note, `namespaceSelector` was added in Kubernetes 1.22, 
      ##    so this will NOT work on older clusters
      ##
      affinityConfig:
        readOnly: true
        value: "dedicated_node_per_notebook"
        options:
          - configKey: "dedicated_node_per_notebook"
            displayName: "Dedicated Node Per Notebook"
            affinity:
              ## Require a Node with label `lifecycle=kubeflow-notebook`
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: "lifecycle"
                          operator: "In"
                          values:
                            - "kubeflow-notebook"

              ## Require a Node WITHOUT an existing Pod having `notebook-name` label
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                        - key: "notebook-name"
                          operator: "Exists"
                    topologyKey: "kubernetes.io/hostname"
                    namespaceSelector: {}

      ## Tolerations
      ##
      tolerationGroup:
        readOnly: false
        value: "group_1"
        options:
          - groupKey: "group_1"
            displayName: "4 CPU 8Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-c5.xlarge"
                effect: "NoSchedule"

          - groupKey: "group_2"
            displayName: "8 CPU 16Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-c5.2xlarge"
                effect: "NoSchedule"

          - groupKey: "group_3"
            displayName: "16 CPU 32Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-c5.4xlarge"
                effect: "NoSchedule"

          - groupKey: "group_4"
            displayName: "32 CPU 256Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-r5.8xlarge"
                effect: "NoSchedule"

Users will then be able to select which group of nodes they want to use by choosing the corresponding "Toleration" group when spawning their Notebook.

PodDefault for Kubeflow Pipelines Authentication

The kubeflow_tools.pipelines.profileResourceGeneration.kfpApiTokenPodDefault value configures whether a PodDefault named "kubeflow-pipelines-api-token" is automatically generated in each profile namespace.

If the user selects this "configuration" when spawning their Notebook, they will be able to use the Kubeflow Pipelines Python SDK from the Notebook without needing to manually authenticate.
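For context, a PodDefault of this kind typically mounts a projected ServiceAccount token into the Pod and tells the SDK where to find it. The sketch below is modeled on the example in the upstream Kubeflow Pipelines documentation. The manifest that deployKF actually generates may differ, and the namespace, selector label, description, and volume name here are assumptions.

```yaml
## Sketch only, modeled on the upstream Kubeflow Pipelines docs
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: "kubeflow-pipelines-api-token"
  ## ASSUMPTION: hypothetical profile namespace
  namespace: "my-profile"
spec:
  desc: "Mount a token for the Kubeflow Pipelines API"
  ## ASSUMPTION: the selector label is illustrative
  selector:
    matchLabels:
      kubeflow-pipelines-api-token: "true"
  ## a short-lived token, issued for the Kubeflow Pipelines audience
  volumes:
    - name: "volume-kf-pipeline-token"
      projected:
        sources:
          - serviceAccountToken:
              path: "token"
              expirationSeconds: 7200
              audience: "pipelines.kubeflow.org"
  volumeMounts:
    - name: "volume-kf-pipeline-token"
      mountPath: "/var/run/secrets/kubeflow/pipelines"
      readOnly: true
  ## the Python SDK reads the token path from this variable
  env:
    - name: "KF_PIPELINES_SA_TOKEN_PATH"
      value: "/var/run/secrets/kubeflow/pipelines/token"
```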

To have this "configuration" selected by default in the spawner, you may use the following values:

kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      configurations:
        value:
          - "kubeflow-pipelines-api-token"

For more information, see the Access Kubeflow Pipelines API user guide.

Idle Notebook Culling

Kubeflow Notebooks supports automatically culling idle Notebook Pods, which is configured by the kubeflow_tools.notebooks.notebookCulling values.

For example, the following values will enable idle culling after 1 day of inactivity:

kubeflow_tools:
  notebooks:
    notebookCulling:
      enabled: true
      idleTime: 1440 # 1 day in minutes

Jupyter Notebooks Only

Currently, only Jupyter Notebooks are supported for idle culling; see the upstream design proposal for more information.

Override Notebook Template

Sometimes, you may need to make additional changes that are not possible with the spawnerFormDefaults values. To achieve this, you may override the Notebook YAML template with the kubeflow_tools.notebooks.notebookTemplate value.

Default Notebook Template

You must include a FULL Notebook YAML template, because setting notebookTemplate completely replaces the default.

Find the default Notebook template by checking which version of Kubeflow Notebooks is included with your version of deployKF. Next, retrieve the default template from the kubeflow/kubeflow repository under ./components/crud-web-apps/jupyter/backend/apps/common/yaml/notebook_template.yaml (select the appropriate git tag).

For example, the following values set a container securityContext on all Notebook Pods:

kubeflow_tools:
  notebooks:
    notebookTemplate: |
      apiVersion: kubeflow.org/v1beta1
      kind: Notebook
      metadata:
        name: {name}
        namespace: "{namespace}"
        labels:
          app: {name}
        annotations:
          notebooks.kubeflow.org/server-type: ""
      spec:
        template:
          spec:
            serviceAccountName: {serviceAccount}
            containers:
              - name: {name}
                image: ""
                ## ============= BEGIN: Changes =============
                securityContext:
                  ## WARNING: these settings will NOT work until Kubeflow 1.9.0 / deployKF 0.2.0
                  ##          https://github.com/kubeflow/kubeflow/pull/7622
                  allowPrivilegeEscalation: false
                  capabilities:
                    drop:
                      - ALL
                  runAsNonRoot: true

                  ## WARNING: setting `readOnlyRootFilesystem` to `true` will NOT work,
                  ##          there are currently no plans to support this feature
                  #readOnlyRootFilesystem: true
                ## ============= END: Changes ===============
                volumeMounts: []
                env: []
                resources:
                  requests:
                    cpu: "0.1"
                    memory: "0.1Gi"
            volumes: []
            tolerations: []

Last update: 2024-06-29
Created: 2023-08-16