Configure Kubeflow Notebooks
Learn how to configure Kubeflow Notebooks in deployKF. Use custom environments, GPU acceleration, faster storage, and more!
Overview
Kubeflow Notebooks allows users to spawn Pods running instances of JupyterLab, Visual Studio Code (code-server), and RStudio in profile namespaces.
As the cluster administrator, you may configure which options are available to users when spawning a Notebook Pod:
- Container Images
- Container Resources (CPU, Memory, GPU)
- Storage Volumes
- Advanced Pod Options (Affinity, Tolerations, PodDefaults)
- Idle Notebook Culling
Kubeflow Notebooks Limitations
The current version of Kubeflow Notebooks exposes many Kubernetes-specific concepts to users, which may be confusing for non-technical users. There is an upstream proposal to abstract away these concepts in a more user-friendly way; see kubeflow/kubeflow#7156 for more information.
When the kubeflow_tools.notebooks.spawnerFormDefaults values are updated, existing Notebook Pods are not affected; only new Pods will use the updated values.
Container Images
Container images are the "environment" which users will be working in when using a Notebook Pod, and can be configured to provide different tools and packages to users.
The following values configure which container images are available to users when spawning a Notebook Pod:
- Jupyter-like: kubeflow_tools.notebooks.spawnerFormDefaults.image
- VSCode-like: kubeflow_tools.notebooks.spawnerFormDefaults.imageGroupOne
- RStudio-like: kubeflow_tools.notebooks.spawnerFormDefaults.imageGroupTwo
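As a rough sketch, the Jupyter-like image list might be customized as follows. This assumes each image value takes a `value` (the default selection) and an `options` list, as in the upstream spawner configuration; the registry and image names are hypothetical placeholders, so check the default values of your deployKF version for the exact schema.

```yaml
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      ## Jupyter-like images
      ## ASSUMPTION: `value` is the default selection, `options` is the list shown to users
      image:
        ## hypothetical image names, replace with your own
        value: "registry.example.com/ml/jupyter-custom:1.0.0"
        options:
          - "registry.example.com/ml/jupyter-custom:1.0.0"
          - "registry.example.com/ml/jupyter-custom-gpu:1.0.0"
```

The same shape would presumably apply to imageGroupOne and imageGroupTwo.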
Container Resources
Container resources directly correspond to Kubernetes Container Resources which are requested by the Notebook Pod.
The following values configure the resource requests/limits for containers in Notebook Pods:
- CPU: kubeflow_tools.notebooks.spawnerFormDefaults.cpu
- Memory: kubeflow_tools.notebooks.spawnerFormDefaults.memory
- GPU: kubeflow_tools.notebooks.spawnerFormDefaults.gpu
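For illustration, CPU and memory defaults might be set like this. The `value`, `limitFactor`, and `readOnly` fields are assumed from the upstream spawner configuration and are not confirmed by this page; the numbers are arbitrary examples.

```yaml
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      cpu:
        ## ASSUMPTION: when `readOnly` is false, users may change the value in the spawner form
        readOnly: false
        ## default CPU request
        value: "0.5"
        ## ASSUMPTION: the limit is derived as (request x limitFactor)
        limitFactor: "1.2"
      memory:
        readOnly: false
        ## default memory request
        value: "1.0Gi"
        limitFactor: "1.2"
```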
Resource Requests
Kubernetes uses resource requests when scheduling Pods, but does not strictly enforce them at runtime. User Notebooks are not well-behaved applications (from a resource perspective), so they will likely impact other Pods running on the same node.
However, setting resource limits can have unintended consequences for users, as the Notebook Pod will be terminated if it exceeds certain limits (like memory), which may result in lost work.
A common alternative is to use a dedicated node for each Notebook Pod, see Advanced Pod Options for information on how to do this with Affinity and Tolerations.
Storage Volumes
Storage volumes are used to provide persistent storage to Notebook Pods between restarts, and are implemented using Kubernetes Persistent Volumes.
The following values configure the storage volumes for Notebook Pods:
- Home Volume: kubeflow_tools.notebooks.spawnerFormDefaults.workspaceVolume
- Data Volume: kubeflow_tools.notebooks.spawnerFormDefaults.dataVolumes
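As a hedged sketch, a default data volume might be declared as follows. This assumes `dataVolumes.value` is a list whose entries have a `mount` path and a `newPvc` definition (mirroring the workspace volume), and that a `{notebook-name}` placeholder is substituted by the spawner; the mount path, PVC name, and size are hypothetical.

```yaml
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      dataVolumes:
        readOnly: false
        ## ASSUMPTION: a list of volumes, each with a `mount` path and a `newPvc` definition
        value:
          - mount: "/home/jovyan/data"   ## hypothetical mount path
            newPvc:
              metadata:
                name: "{notebook-name}-data"   ## hypothetical PVC name pattern
              spec:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 10Gi
```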
StorageClass and Performance
The kubeflow_tools.notebooks.spawnerFormDefaults.workspaceVolume.value.newPvc.spec.storageClassName value defines which Kubernetes StorageClass is used to provision the workspace volume. If a storageClassName is not specified, the cluster's default StorageClass is used.
As ML workloads are often IO-intensive, it is recommended to use a StorageClass which provides high performance. Typically, this is only possible with drives which are attached to the node, rather than network-attached storage.
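A minimal sketch of pinning the workspace volume to a specific StorageClass is shown below. The `storageClassName` path comes from the text above; the class name `local-nvme`, the mount path, the PVC name pattern, and the size are hypothetical, and the other `newPvc` fields are assumed to follow the standard PersistentVolumeClaim spec.

```yaml
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      workspaceVolume:
        value:
          mount: "/home/jovyan"   ## ASSUMPTION: home directory of the notebook image
          newPvc:
            metadata:
              name: "{notebook-name}-workspace"   ## hypothetical PVC name pattern
            spec:
              ## hypothetical StorageClass backed by node-attached drives
              storageClassName: "local-nvme"
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 5Gi
```

Omitting `storageClassName` would fall back to the cluster's default StorageClass, as described above.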
Advanced Pod Options
Advanced Pod Options are additional configurations for Notebook Pods which manage things like Pod Affinity, Node Tolerations, and Kubeflow's PodDefaults.
The following values configure the advanced options for Notebook Pods:
- Pod Affinity: kubeflow_tools.notebooks.spawnerFormDefaults.affinityConfig
- Node Tolerations: kubeflow_tools.notebooks.spawnerFormDefaults.tolerationGroup
- PodDefaults: kubeflow_tools.notebooks.spawnerFormDefaults.configurations
Dedicated Node for each Notebook Pod
Because Notebook Pods are not well-behaved applications (from a resource perspective), it is common to want a dedicated node for each Notebook Pod. With a combination of Pod Affinity and Node Tolerations, this can be achieved.
Note, this will require your cluster to have node-autoscaling configured (e.g. Cluster Autoscaler or Karpenter), as the cluster will need to provision a new node for each Notebook Pod.
First, you will need to make one or more groups of nodes that are tainted to prevent other Pods from being scheduled on them. In the following example, we have four groups of nodes with different CPU/Memory configurations, each tainted with a different value of the dedicated key with effect NoSchedule:
- Key: dedicated, Value: kubeflow-c5.xlarge, Effect: NoSchedule
- Key: dedicated, Value: kubeflow-c5.2xlarge, Effect: NoSchedule
- Key: dedicated, Value: kubeflow-c5.4xlarge, Effect: NoSchedule
- Key: dedicated, Value: kubeflow-r5.8xlarge, Effect: NoSchedule
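For reference, this is roughly how the first taint would appear on a Node object, together with the lifecycle=kubeflow-notebook label that the node affinity rule in this example requires. The node name is hypothetical, and in practice taints and labels are normally set through your node group or autoscaler configuration rather than by editing Node objects directly.

```yaml
## Fragment of a Kubernetes Node object (illustrative only)
apiVersion: v1
kind: Node
metadata:
  name: "example-node-1"   ## hypothetical node name
  labels:
    ## label matched by the `nodeAffinity` rule
    lifecycle: "kubeflow-notebook"
spec:
  taints:
    ## keeps Pods without a matching toleration off this node
    - key: "dedicated"
      value: "kubeflow-c5.xlarge"
      effect: "NoSchedule"
```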
Next, you will need to configure Pod Affinity configs that do not allow two Notebook Pods to be scheduled on the same node. In the following example, we do this by:
- Using nodeAffinity to require a Node with label lifecycle=kubeflow-notebook
- Using podAntiAffinity to require a Node WITHOUT an existing Pod having the notebook-name label
Finally, you may use the following values to expose these options to users:
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      ## Affinity
      ##  - note, setting `readOnly` to `true` ensures that this affinity is always applied
      ##  - note, `namespaceSelector` was added in Kubernetes 1.22,
      ##    so this will NOT work on older clusters
      ##
      affinityConfig:
        readOnly: true
        value: "dedicated_node_per_notebook"
        options:
          - configKey: "dedicated_node_per_notebook"
            displayName: "Dedicated Node Per Notebook"
            affinity:
              ## Require a Node with label `lifecycle=kubeflow-notebook`
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: "lifecycle"
                          operator: "In"
                          values:
                            - "kubeflow-notebook"
              ## Require a Node WITHOUT an existing Pod having `notebook-name` label
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                        - key: "notebook-name"
                          operator: "Exists"
                    topologyKey: "kubernetes.io/hostname"
                    namespaceSelector: {}
      ## Tolerations
      ##
      tolerationGroup:
        readOnly: false
        value: "group_1"
        options:
          - groupKey: "group_1"
            displayName: "4 CPU 8Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-c5.xlarge"
                effect: "NoSchedule"
          - groupKey: "group_2"
            displayName: "8 CPU 16Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-c5.2xlarge"
                effect: "NoSchedule"
          - groupKey: "group_3"
            displayName: "16 CPU 32Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-c5.4xlarge"
                effect: "NoSchedule"
          - groupKey: "group_4"
            displayName: "32 CPU 256Gb Mem at ~$X.XXX USD per day"
            tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "kubeflow-r5.8xlarge"
                effect: "NoSchedule"
Users will then be able to select which group of nodes they want to use by choosing the corresponding "Toleration" group when spawning their Notebook.
PodDefault for Kubeflow Pipelines Authentication
The kubeflow_tools.pipelines.profileResourceGeneration.kfpApiTokenPodDefault value configures if a PodDefault named "kubeflow-pipelines-api-token" is automatically generated in each profile namespace.
If the user selects this "configuration" when spawning their Notebook, they will be able to use the Kubeflow Pipelines Python SDK from the Notebook without needing to manually authenticate.
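As a sketch, generating this PodDefault in every profile namespace might look like the following. This assumes kfpApiTokenPodDefault is a simple boolean flag, which the description above suggests but does not state outright.

```yaml
kubeflow_tools:
  pipelines:
    profileResourceGeneration:
      ## ASSUMPTION: boolean flag; `true` generates the
      ## "kubeflow-pipelines-api-token" PodDefault in each profile namespace
      kfpApiTokenPodDefault: true
```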
To have this "configuration" selected by default in the spawner, you may use the following values:
kubeflow_tools:
  notebooks:
    spawnerFormDefaults:
      configurations:
        value:
          - "kubeflow-pipelines-api-token"
For more information, see the Access Kubeflow Pipelines API user guide.
Idle Notebook Culling
Kubeflow Notebooks supports automatically culling idle Notebook Pods, which is configured by the kubeflow_tools.notebooks.notebookCulling values.
For example, the following values will enable idle culling after 1 day of inactivity:
kubeflow_tools:
  notebooks:
    notebookCulling:
      enabled: true
      idleTime: 1440 # 1 day in minutes
Jupyter Notebooks Only
Currently, only Jupyter Notebooks are supported for idle culling; see the upstream design proposal for more information.
Override Notebook Template
Sometimes, you may need to make additional changes that are not possible with the spawnerFormDefaults values. To achieve this, you may override the Notebook YAML template with the kubeflow_tools.notebooks.notebookTemplate value.
Default Notebook Template
You must include a FULL Notebook YAML template, because setting notebookTemplate completely replaces the default.
Find the default Notebook template by checking which version of Kubeflow Notebooks is included with your version of deployKF. Next, retrieve the default template from the kubeflow/kubeflow repository under ./components/crud-web-apps/jupyter/backend/apps/common/yaml/notebook_template.yaml (select the appropriate git tag).
For example, the following values set a container securityContext on all Notebook Pods:
kubeflow_tools:
  notebooks:
    notebookTemplate: |
      apiVersion: kubeflow.org/v1beta1
      kind: Notebook
      metadata:
        name: {name}
        namespace: "{namespace}"
        labels:
          app: {name}
        annotations:
          notebooks.kubeflow.org/server-type: ""
      spec:
        template:
          spec:
            serviceAccountName: {serviceAccount}
            containers:
              - name: {name}
                image: ""
                ## ============= BEGIN: Changes =============
                securityContext:
                  ## WARNING: these settings will NOT work until Kubeflow 1.9.0 / deployKF 0.2.0
                  ##          https://github.com/kubeflow/kubeflow/pull/7622
                  allowPrivilegeEscalation: false
                  capabilities:
                    drop:
                      - ALL
                  runAsNonRoot: true
                  ## WARNING: setting `readOnlyRootFilesystem` to `true` will NOT work,
                  ##          there are currently no plans to support this feature
                  #readOnlyRootFilesystem: true
                ## ============= END: Changes ===============
                volumeMounts: []
                env: []
                resources:
                  requests:
                    cpu: "0.1"
                    memory: "0.1Gi"
            volumes: []
            tolerations: []
Created: 2023-08-16