Getting Started¶
Learn how to use deployKF in production.
Easily deploy the best of Kubeflow and other MLOps tools as a complete platform!
Introduction¶
This page is about using deployKF in production. We will cover requirements, configuration, the deployment process, and basic usage of the platform.
We recommend that new users start with the Introduction and Try Locally guides.
Existing Kubeflow users should use the Migration guide, and everyone is welcome to join our Slack.
1. Requirements¶
Please ensure you meet the following requirements before using deployKF in production.
Kubernetes Cluster¶
deployKF can run on any Kubernetes cluster, in any cloud or environment. See the version matrix for a list of supported Kubernetes versions.
For example, deployKF can run on the following Kubernetes distributions:
Target Platform | Kubernetes Distribution |
---|---|
Amazon Web Services | Amazon Elastic Kubernetes Service (EKS) |
Microsoft Azure | Azure Kubernetes Service (AKS) (see special requirements) |
Google Cloud | Google Kubernetes Engine (GKE) |
IBM Cloud | IBM Cloud Kubernetes Service (IKS) |
Self-Hosted | Rancher (RKE) // kOps // Kubespray // kubeadm |
Edge | k3s // k0s // MicroK8s |
Local Machine | k3d // Kind // Minikube |
Dedicated Cluster
We strongly recommend using a dedicated cluster for deployKF. This is because deployKF has a number of cluster-level dependencies which may conflict with other applications.
If you are unable to create a new Kubernetes cluster, you may consider using vCluster to create a virtual Kubernetes cluster within an existing one.
Argo CD Dependency¶
deployKF requires Argo CD for managing the platform.
You may either use deployKF with an existing Argo CD, or deploy a new one if you don't already have it. Both options are covered later in this guide.
Can I use <other tool> instead of Argo CD?
Not yet.
While we believe that Argo CD is currently the best in its category, we recognize that it's not the only option. In the future, we may support other Kubernetes GitOps tools (like Flux CD), or even build a deployKF-specific solution.
deployKF makes MLOps so much easier that it's still worth using, even if you don't already love Argo CD. If you want, you can largely treat Argo CD as a "black box" and just use the provided sync scripts to manage the platform.
To learn more about this decision, and participate in the discussion, see deployKF/deployKF#110.
Kubernetes Requirements¶
Your Kubernetes cluster must meet the following requirements:
Configuration | Requirement | Notes |
---|---|---|
Node Resources | The nodes must collectively have at least 4 vCPUs, 16 GB RAM, and 64 GB storage. | |
CPU Architecture | The cluster must have x86_64 CPU Nodes. | ARM64 Support |
Internet Access | The cluster must have internet access for pulling images and installing dependencies. | Offline Clusters |
Cluster Domain | The clusterDomain of your kubelet must be "cluster.local". | |
Service Type | By default, the cluster must have a LoadBalancer service type. | Override Service Type |
Default StorageClass | The default StorageClass must support the ReadWriteOnce access mode. | Override StorageClass |
Existing Argo Workflows | The cluster must NOT already have Argo Workflows installed. Note, other Argo tools like Argo CD are fine. | Join the discussion: deployKF#116 |
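As a rough pre-flight check for the default StorageClass requirement, a sketch like the following may help. It assumes the usual table output of `kubectl get storageclass`, where the default class is marked with `(default)` after its name. The function name `default_storage_classes` is hypothetical and not part of deployKF.

```shell
# Hypothetical helper (not part of deployKF).
# Reads the table printed by `kubectl get storageclass` on stdin and prints
# the name of every class that kubectl marks as "(default)".
default_storage_classes() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get storageclass | default_storage_classes
```

Exactly one name is the expected result. Note that this only finds the default class; whether it supports ReadWriteOnce depends on its provisioner and is not checked here.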
ARM64 Support¶
Currently, deployKF only supports x86_64 architecture clusters.
The next minor version of deployKF (v0.2.0) should have native ARM64 support for all core components. However, some upstream apps like Kubeflow Pipelines will need extra work to be production ready (#10309, #10308).
Offline Clusters¶
deployKF can be used in offline and air-gapped clusters, but there are additional steps required.
Please see the Air-Gapped Clusters guide for more information.
Override Service Type¶
By default, deployKF uses a LoadBalancer service type for the gateway.
For real-world usage, you should review the Expose the Gateway Service guide.
In some clusters, the LoadBalancer service type will create a public IP address. Consider the security implications before deploying, or use a different service type.
If you do not want this, you may override the service type to ClusterIP by setting the following value:
deploykf_core:
  deploykf_istio_gateway:
    gatewayService:
      type: "ClusterIP"
Override StorageClass¶
By default, deployKF requires a default StorageClass that supports the ReadWriteOnce access mode.
If you do NOT have a compatible default StorageClass, you might consider the following options:
- Configure a default StorageClass that has ReadWriteOnce support.
- Explicitly set the storageClass value for each component that requires storage.
- Disable the components which require the StorageClass, and use external alternatives.
Linux Node Requirements¶
If you are self-hosting your Kubernetes cluster, you must ensure that your Linux nodes meet the following requirements:
Configuration | Requirement | Notes |
---|---|---|
Inotify Limits | Linux nodes must have sufficient inotify limits. Note, common distributions like Ubuntu do not ship with sufficient defaults. | Increase Inotify Limits |
Kernel Modules | Linux nodes must have the required kernel modules for Istio. | Istio Kernel Modules |
Increase Inotify Limits¶
You may need to increase the fs.inotify.max_user_* sysctl values on your Linux nodes. Otherwise, you may encounter Pod crashes with an error message like this:
too many open files
This error has been discussed in the upstream Kubeflow repo (kubeflow/manifests#2087). To resolve it, increase your system's open/watched file limits:
- Modify /etc/sysctl.conf to include the following lines:
fs.inotify.max_user_instances = 1280
fs.inotify.max_user_watches = 655360
- Apply the changes immediately with the following command:
sudo sysctl -p
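To confirm that a node already meets these limits (or that the change took effect), a small check like the sketch below could be used. The thresholds are the values from the sysctl.conf lines above; the function name `inotify_limits_ok` is hypothetical.

```shell
# Minimums taken from the sysctl.conf lines in this section.
REQUIRED_INSTANCES=1280
REQUIRED_WATCHES=655360

# Hypothetical helper (not part of deployKF).
# inotify_limits_ok <max_user_instances> <max_user_watches>
# Succeeds only when both values meet the minimums above.
inotify_limits_ok() {
  local instances="$1" watches="$2"
  if [ "$instances" -lt "$REQUIRED_INSTANCES" ]; then return 1; fi
  if [ "$watches" -lt "$REQUIRED_WATCHES" ]; then return 1; fi
  return 0
}

# Usage on a Linux node:
#   inotify_limits_ok \
#     "$(sysctl -n fs.inotify.max_user_instances)" \
#     "$(sysctl -n fs.inotify.max_user_watches)" \
#     || echo "inotify limits are too low"
```

The function takes the two values as arguments so that it can be run against values collected from several nodes, not only the local machine.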
Istio Kernel Modules¶
Your nodes must have the required kernel modules for Istio. Otherwise, you may encounter crashes in the Istio sidecars or other strange network behaviour.
- Get a list of the currently loaded kernel modules by running lsmod:
lsmod | awk '{print $1}' | sort
- At the time of writing, the following command will enable the required kernel modules on boot:
## NOTE: if you are using Istio ambient mode, there are additional modules required
cat <<EOF | sudo tee /etc/modules-load.d/99-istio-modules.conf
br_netfilter
ip_tables
iptable_filter
iptable_mangle
iptable_nat
iptable_raw
nf_nat
x_tables
xt_REDIRECT
xt_conntrack
xt_multiport
xt_owner
xt_tcpudp
EOF
- Now, either reboot your nodes or immediately load the modules with the following commands (which will also indicate if any modules are missing):
sudo modprobe br_netfilter
sudo modprobe ip_tables
sudo modprobe iptable_filter
sudo modprobe iptable_mangle
sudo modprobe iptable_nat
sudo modprobe iptable_raw
sudo modprobe nf_nat
sudo modprobe x_tables
sudo modprobe xt_REDIRECT
sudo modprobe xt_conntrack
sudo modprobe xt_multiport
sudo modprobe xt_owner
sudo modprobe xt_tcpudp
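The comparison between the lsmod output and the module list above can be sketched as a small filter. The function name `missing_istio_modules` is hypothetical, and the module list is copied from this section. One caveat: modules that are compiled directly into the kernel do not appear in lsmod, so a name reported here is a hint to investigate rather than proof that the module is unavailable.

```shell
# Module names copied from the list in this section.
REQUIRED_ISTIO_MODULES="br_netfilter ip_tables iptable_filter iptable_mangle iptable_nat iptable_raw nf_nat x_tables xt_REDIRECT xt_conntrack xt_multiport xt_owner xt_tcpudp"

# Hypothetical helper (not part of deployKF or Istio).
# Reads `lsmod` output on stdin and prints every required module
# that does not appear in the first column.
missing_istio_modules() {
  local loaded module
  loaded="$(awk '{ print $1 }')"
  for module in $REQUIRED_ISTIO_MODULES; do
    if ! printf '%s\n' "$loaded" | grep -qxF "$module"; then
      printf '%s\n' "$module"
    fi
  done
}

# Usage on a Linux node:
#   lsmod | missing_istio_modules
```

An empty result means every listed module is currently loaded.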
2. Platform Configuration¶
deployKF is very configurable. You can use it to deploy a wide variety of machine learning platforms and integrate with your existing infrastructure.
deployKF Values¶
All aspects of your deployKF platform are configured with YAML-based configs named "values". See the values page for more information.
deployKF Versions¶
The "version" of your platform is the version of the generator package you are using. For information about upgrading, see the upgrade guide and changelog.
Can I be notified of new releases?
Yes. Watch the deployKF/deployKF repo on GitHub.
At the top right, click Watch → Custom → Releases, then confirm by selecting Apply.
Cluster Dependencies¶
deployKF has a number of cluster dependencies including Istio, cert-manager, and Kyverno. See the cluster dependencies page for an overview.
Existing Cluster Dependencies
deployKF installs its own versions of the cluster dependencies by default.
If you have existing versions on the cluster, you MUST configure deployKF to use them:
- Use Existing Istio
- Use Existing cert-manager
- Use Existing Kyverno (coming soon)
External Dependencies¶
deployKF has a number of external dependencies including MySQL and an Object Store. See the external dependencies page for an overview.
Connect External Dependencies
deployKF includes embedded versions of MySQL and MinIO for development and testing.
We strongly recommend connecting external versions for production use.
3. Deploy the Platform¶
Create ArgoCD Applications¶
deployKF uses ArgoCD to manage the deployment of the platform. The process to create the ArgoCD Applications depends on which mode of operation you have chosen: ArgoCD Plugin Mode or Manifests Repo Mode.
ArgoCD Plugin Mode
Step 1 - Prepare ArgoCD
You will need to have ArgoCD deployed on your cluster, and this ArgoCD instance must have the deployKF ArgoCD Plugin installed. Follow the appropriate guide for your situation.
Tips
- If you use the ArgoCD "management cluster" pattern, please see: Off-Cluster ArgoCD
- If you have an offline cluster, please see: Air-Gapped Clusters
Step 2 - Learn about Values
deployKF is configured by centralized values which define the desired state of the platform.
Sample Values:
Each version of deployKF has sample values with all ML & Data tools enabled, along with some sensible security defaults. We recommend using the sample values as a starting point for your custom values.
The sample-values.yaml for deployKF 0.1.5 is at the root of the deployKF/deployKF repo, at the v0.1.5 tag.
Custom Values:
In ArgoCD Plugin Mode, values can be defined inline (values), or from a git repository (values_files).
Both methods may be used together. When a value is defined in multiple places, the result is calculated by merging, with files listed later taking precedence, and inline values having the highest precedence.
Tip
Learn about common configuration tasks in the Configure deployKF guide.
Step 3 - Define an App-of-Apps
Create a local file named deploykf-app-of-apps.yaml with the contents of the YAML below.
In this example, we will define an app-of-apps that:
- Clones the deploykf/deploykf repo at the v0.1.5 tag.
- Sets the source_version parameter to use deployKF version 0.1.5.
- Sets the values_files parameter to read the sample-values.yaml from the repo.
- Sets the values parameter with inline values that override the sample-values.yaml.
What is an App-of-Apps?
An app-of-apps is a special ArgoCD Application which manages other applications.
Can I read values from my own repo?
Yes. In this example, we only use the deploykf/deploykf repo to easily read the default sample-values.yaml file. See Step 4 to read values from a different repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: deploykf-app-of-apps
  namespace: argocd
  labels:
    app.kubernetes.io/name: deploykf-app-of-apps
    app.kubernetes.io/part-of: deploykf
spec:
  ## NOTE: if not "default", you MUST ALSO set the `argocd.project` value
  project: "default"
  source:
    ## source git repo configuration
    ##  - we use the 'deploykf/deploykf' repo so we can read its 'sample-values.yaml'
    ##    file, but you may use any repo (even one with no files)
    ##
    repoURL: "https://github.com/deployKF/deployKF.git"
    targetRevision: "v0.1.5"
    path: "."
    ## plugin configuration
    ##
    plugin:
      name: "deploykf"
      parameters:
        ## the deployKF generator version
        ##  - available versions: https://github.com/deployKF/deployKF/releases
        ##
        - name: "source_version"
          string: "0.1.5"
        ## paths to values files within the `repoURL` repository
        ##  - the values in these files are merged, with later files taking precedence
        ##  - we strongly recommend using 'sample-values.yaml' as the base of your values
        ##    so you can easily upgrade to newer versions of deployKF
        ##
        - name: "values_files"
          array:
            - "./sample-values.yaml"
        ## a string containing the contents of a values file
        ##  - this parameter allows defining values without needing to create a file in the repo
        ##  - these values are merged with higher precedence than those defined in `values_files`
        ##
        - name: "values"
          string: |
            ##
            ## This demonstrates how you might structure overrides for the 'sample-values.yaml' file.
            ## For a more comprehensive example, see the 'sample-values-overrides.yaml' in the main repo.
            ##
            ## Notes:
            ##  - YAML maps are RECURSIVELY merged across values files
            ##  - YAML lists are REPLACED in their entirety across values files
            ##  - Do NOT include empty/null sections, as this will remove ALL values from that section.
            ##    To include a section without overriding any values, set it to an empty map: `{}`
            ##
            ## --------------------------------------------------------------------------------
            ## argocd
            ## --------------------------------------------------------------------------------
            argocd:
              namespace: argocd
              project: default
            ## --------------------------------------------------------------------------------
            ## kubernetes
            ## --------------------------------------------------------------------------------
            kubernetes:
              {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
            ## --------------------------------------------------------------------------------
            ## deploykf-dependencies
            ## --------------------------------------------------------------------------------
            deploykf_dependencies:
              ## --------------------------------------
              ## cert-manager
              ## --------------------------------------
              cert_manager:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
              ## --------------------------------------
              ## istio
              ## --------------------------------------
              istio:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
              ## --------------------------------------
              ## kyverno
              ## --------------------------------------
              kyverno:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
            ## --------------------------------------------------------------------------------
            ## deploykf-core
            ## --------------------------------------------------------------------------------
            deploykf_core:
              ## --------------------------------------
              ## deploykf-auth
              ## --------------------------------------
              deploykf_auth:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
              ## --------------------------------------
              ## deploykf-istio-gateway
              ## --------------------------------------
              deploykf_istio_gateway:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
              ## --------------------------------------
              ## deploykf-profiles-generator
              ## --------------------------------------
              deploykf_profiles_generator:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
            ## --------------------------------------------------------------------------------
            ## deploykf-opt
            ## --------------------------------------------------------------------------------
            deploykf_opt:
              ## --------------------------------------
              ## deploykf-minio
              ## --------------------------------------
              deploykf_minio:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
              ## --------------------------------------
              ## deploykf-mysql
              ## --------------------------------------
              deploykf_mysql:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
            ## --------------------------------------------------------------------------------
            ## kubeflow-tools
            ## --------------------------------------------------------------------------------
            kubeflow_tools:
              ## --------------------------------------
              ## katib
              ## --------------------------------------
              katib:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
              ## --------------------------------------
              ## notebooks
              ## --------------------------------------
              notebooks:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
              ## --------------------------------------
              ## pipelines
              ## --------------------------------------
              pipelines:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  destination:
    server: "https://kubernetes.default.svc"
    namespace: "argocd"
Step 4 - Read Values from Git (optional)
You may use the values_files parameter to read values from a git repo. This lets you version your values files in git, and easily update them without changing the app-of-apps resource.
Danger
We STRONGLY RECOMMEND using a PRIVATE repo for your values files!
If your git repo is private, you must configure ArgoCD with credentials to access the repo. For example, when using a GitHub repo, you might create a Secret with a Personal Access Token (PAT) as follows:
# create a secret with your GitHub credentials
# NOTE: kubectl can't create and label a secret in one command, so we use a pipe
kubectl create secret generic --dry-run=client -o yaml \
"argocd-repository--MY_GITHUB_REPO" \
--namespace "argocd" \
--from-literal=type="git" \
--from-literal=url="https://github.com/MY_GITHUB_ORG/MY_GITHUB_REPO.git" \
--from-literal=username="MY_GITHUB_USERNAME" \
--from-literal=password="MY_GITHUB_PAT" \
| kubectl label --local --dry-run=client -o yaml -f - \
"argocd.argoproj.io/secret-type"="repository" \
| kubectl apply -f -
If you use the upstream sample-values.yaml as a base, you will also need to push that file to your repo.
The following command will download the sample-values.yaml file for deployKF 0.1.5:
# download the `sample-values.yaml` file
curl -fL -o "sample-values-0.1.5.yaml" \
"https://raw.githubusercontent.com/deployKF/deployKF/v0.1.5/sample-values.yaml"
For example, say you now have the following files in your repo:
sample-values-0.1.5.yaml
values-1.yaml
values-2.yaml
Your app-of-apps resource may then be updated to look like this:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: deploykf-app-of-apps
  namespace: argocd
  labels:
    app.kubernetes.io/name: deploykf-app-of-apps
    app.kubernetes.io/part-of: deploykf
spec:
  project: "default"
  source:
    ## source git repo configuration
    ##
    repoURL: "https://github.com/MY_GITHUB_ORG/MY_GITHUB_REPO.git"
    targetRevision: "main"
    path: "."
    ## plugin configuration
    ##
    plugin:
      name: "deploykf"
      parameters:
        ## the deployKF generator version
        ##
        - name: "source_version"
          string: "0.1.5"
        ## paths to values files within the `repoURL` repository
        ##
        - name: "values_files"
          array:
            - "./sample-values-0.1.5.yaml"
            - "./values-1.yaml"
            - "./values-2.yaml"
        ## a string containing the contents of a values file
        ##  - this parameter allows defining values without needing to create a file in the repo
        ##  - these values are merged with higher precedence than those defined in `values_files`
        ##
        #- name: "values"
        #  string: |
        #    ...
        #    values file contents
        #    ...
  destination:
    server: "https://kubernetes.default.svc"
    namespace: "argocd"
Step 5 - Apply App-of-Apps Resource
Apply the deploykf-app-of-apps.yaml file to your cluster with the following command:
kubectl apply -f ./deploykf-app-of-apps.yaml
Manifests Repo Mode
Step 1 - Prepare ArgoCD
If you have not already deployed ArgoCD on your cluster, you will need to do so.
Please see the ArgoCD Getting Started Guide for instructions.
TIP: If you use an ArgoCD "management cluster" pattern, see the off-cluster ArgoCD guide.
Step 2 - Install the deployKF CLI
If you have not already installed the deploykf CLI on your local machine, you will need to do so.
Please see the CLI Installation Guide for instructions.
Step 3 - Prepare a Git Repo
You will need to create a git repo to store your generated manifests.
Danger
We STRONGLY RECOMMEND using a PRIVATE repo for your manifests!
If your git repo is private, you must configure ArgoCD with credentials to access the repo. For example, when using a GitHub repo, you might create a Secret with a Personal Access Token (PAT) as follows:
# create a secret with your GitHub credentials
# NOTE: kubectl can't create and label a secret in one command, so we use a pipe
kubectl create secret generic --dry-run=client -o yaml \
"argocd-repository--MY_GITHUB_REPO" \
--namespace "argocd" \
--from-literal=type="git" \
--from-literal=url="https://github.com/MY_GITHUB_ORG/MY_GITHUB_REPO.git" \
--from-literal=username="MY_GITHUB_USERNAME" \
--from-literal=password="MY_GITHUB_PAT" \
| kubectl label --local --dry-run=client -o yaml -f - \
"argocd.argoproj.io/secret-type"="repository" \
| kubectl apply -f -
Step 4 - Create Values Files
deployKF is configured by centralized values which define the desired state of the platform.
Sample Values:
Each version of deployKF has sample values with all ML & Data tools enabled, along with some sensible security defaults. We recommend using the sample values as a starting point for your custom values.
The following command will download the sample-values.yaml file for deployKF 0.1.5:
# download the `sample-values.yaml` file
curl -fL -o "sample-values-0.1.5.yaml" \
"https://raw.githubusercontent.com/deployKF/deployKF/v0.1.5/sample-values.yaml"
Custom Values:
In Manifests Repo Mode, values are passed to the deploykf generate command as YAML files. When a value is defined in multiple files, the result is calculated by merging, with files listed later taking precedence.
To make upgrades easier, we recommend using the sample values as a base, and applying custom override files with only the values you want to change. This allows you to swap out the sample values for a newer version in the future.
Tip
Learn about common configuration tasks in the Configure deployKF guide.
For example, you might structure your custom-overrides.yaml file like this:
##
## Notes:
##  - YAML maps are RECURSIVELY merged across values files
##  - YAML lists are REPLACED in their entirety across values files
##  - Do NOT include empty/null sections, as this will remove ALL values from that section.
##    To include a section without overriding any values, set it to an empty map: `{}`
##
## --------------------------------------------------------------------------------
## argocd
## --------------------------------------------------------------------------------
argocd:
  namespace: argocd
  project: default
  source:
    ## the git repo where you will store your generated manifests
    ##  - url: the URL of the git repo
    ##  - revision: the git branch/tag/commit to read from
    ##  - path: the repo folder path where the generated manifests are stored
    ##
    repo:
      url: "https://github.com/deployKF/examples.git"
      revision: "main"
      path: "./GENERATOR_OUTPUT/"
## --------------------------------------------------------------------------------
## kubernetes
## --------------------------------------------------------------------------------
kubernetes:
  {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
## --------------------------------------------------------------------------------
## deploykf-dependencies
## --------------------------------------------------------------------------------
deploykf_dependencies:
  ## --------------------------------------
  ## cert-manager
  ## --------------------------------------
  cert_manager:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  ## --------------------------------------
  ## istio
  ## --------------------------------------
  istio:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  ## --------------------------------------
  ## kyverno
  ## --------------------------------------
  kyverno:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
## --------------------------------------------------------------------------------
## deploykf-core
## --------------------------------------------------------------------------------
deploykf_core:
  ## --------------------------------------
  ## deploykf-auth
  ## --------------------------------------
  deploykf_auth:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  ## --------------------------------------
  ## deploykf-istio-gateway
  ## --------------------------------------
  deploykf_istio_gateway:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  ## --------------------------------------
  ## deploykf-profiles-generator
  ## --------------------------------------
  deploykf_profiles_generator:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
## --------------------------------------------------------------------------------
## deploykf-opt
## --------------------------------------------------------------------------------
deploykf_opt:
  ## --------------------------------------
  ## deploykf-minio
  ## --------------------------------------
  deploykf_minio:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  ## --------------------------------------
  ## deploykf-mysql
  ## --------------------------------------
  deploykf_mysql:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
## --------------------------------------------------------------------------------
## kubeflow-tools
## --------------------------------------------------------------------------------
kubeflow_tools:
  ## --------------------------------------
  ## katib
  ## --------------------------------------
  katib:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  ## --------------------------------------
  ## notebooks
  ## --------------------------------------
  notebooks:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
  ## --------------------------------------
  ## pipelines
  ## --------------------------------------
  pipelines:
    {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!
Step 5 - Generate Manifests
The deploykf generate command writes manifests into a folder based on your values. When more than one --values file is provided, they are merged, with later files taking precedence.
For example, to generate manifests using deployKF version 0.1.5 under ./GENERATOR_OUTPUT/:
deploykf generate \
--source-version "0.1.5" \
--values ./sample-values-0.1.5.yaml \
--values ./custom-overrides.yaml \
--output-dir ./GENERATOR_OUTPUT
Do NOT Edit Manifests Directly
In general, you should NOT edit the manifests generated by deployKF. Changes in the --output-dir will be overwritten each time the deploykf generate command runs.
If you need to change something which is not configurable via values, please raise an issue so we can understand your use-case and potentially add a new configuration option.
Step 6 - Commit Generated Manifests
After running deploykf generate, you will need to commit the manifests to your repo, so ArgoCD can apply them to your cluster:
# for example, to directly commit changes to the 'main' branch of your repo
git add GENERATOR_OUTPUT
git commit -m "my commit message"
git push origin main
Step 7 - Apply App-of-Apps Manifest
The only manifest you need to manually apply is the app-of-apps, which creates all the other ArgoCD applications.
The app-of-apps.yaml manifest is generated at the root of your --output-dir folder, so you can apply it with:
kubectl apply --filename GENERATOR_OUTPUT/app-of-apps.yaml
Required Values - Azure AKS
When deploying on Azure AKS, you MUST set the following values, or the platform will not work correctly:
kubernetes:
  azure:
    admissionsEnforcerFix: true
For more information, please see the PR which introduced this value: deployKF/deployKF#85.
Sync ArgoCD Applications ¶
Now that your deployKF app-of-apps has been applied, you must sync the ArgoCD applications to deploy your platform. Syncing an application will cause ArgoCD to reconcile the actual state in the cluster, to match the state defined by the application resource.
Danger
DO NOT sync all the Applications at once!
The deployKF Applications depend on each other, so they MUST be synced in the correct order to avoid errors. If you manually sync them all, you may need to uninstall and start over.
There are a few ways to sync the applications, but you only need to use ONE of them.
The recommended way to sync the applications is with the automated script.
Step 1 - Run the Sync Script
We provide the sync_argocd_apps.sh script to automatically sync the applications that make up deployKF. Learn more about the automated sync script from the scripts folder README.
For example, to run the script, you might use the following commands:
# download the latest version of the script
curl -fL -o "sync_argocd_apps.sh" \
"https://raw.githubusercontent.com/deployKF/deployKF/main/scripts/sync_argocd_apps.sh"
# ensure the script is executable
chmod +x ./sync_argocd_apps.sh
# ensure your kubectl context is set correctly
kubectl config current-context
# run the script
bash ./sync_argocd_apps.sh
About the sync script
- The script can take around 5-10 minutes to run on first install.
- If the script fails or is interrupted, you can safely re-run it, and it will pick up where it left off.
- There are a number of configuration variables at the top of the script which change the default behavior.
- Learn more about the automated sync script from the scripts folder README in the deployKF repo.
Please be aware of the following issue when using the automated sync script:
Bug in ArgoCD v2.9
There is a known issue (deploykf/deploykf#70, argoproj/argo-cd#16266) with all 2.9.X versions of the ArgoCD CLI that will cause the sync script to fail with the following error:
==========================================================================================
Logging in to ArgoCD...
==========================================================================================
FATA[0000] cannot find pod with selector: [app.kubernetes.io/name=] - use the --{component}-name flag in this command or set the environmental variable (Refer to https://argo-cd.readthedocs.io/en/stable/user-guide/environment-variables), to change the Argo CD component name in the CLI
Please upgrade your argocd CLI to at least version 2.10.0 to resolve this issue.
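Before running the sync script, the CLI version can be compared against 2.10.0 with a small helper such as the sketch below. The function `version_ge` is hypothetical and only compares the major.minor.patch fields. The usage comment assumes that `argocd version --client --short` prints a line of the form `argocd: v2.10.0+<commit>`; please check the output of your own CLI.

```shell
# Hypothetical helper (not part of deployKF or the argocd CLI).
# version_ge A B -> succeeds when version A is greater than or equal to B.
# Accepts forms like "v2.10.0+6eba5be"; compares major.minor.patch only.
version_ge() {
  local a="${1#v}" b="${2#v}" i x y
  a="${a%%[+-]*}.0.0"   # drop build or pre-release suffix, pad missing fields
  b="${b%%[+-]*}.0.0"
  for i in 1 2 3; do
    x="$(printf '%s' "$a" | cut -d. -f"$i")"
    y="$(printf '%s' "$b" | cut -d. -f"$i")"
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 0
}

# Usage (the output format of the argocd CLI is an assumption here):
#   current="$(argocd version --client --short | awk '{ print $2 }')"
#   version_ge "$current" "2.10.0" || echo "please upgrade the argocd CLI"
```

The fields are compared numerically, so 2.10.0 is correctly treated as newer than 2.9.3, which a plain string comparison would get wrong.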
Alternatively, you can sync the applications using the ArgoCD Web UI.
Step 1 - Access ArgoCD Web UI
For production usage, you may want to expose ArgoCD with a LoadBalancer or Ingress.
For testing, you may use kubectl port-forwarding to expose the ArgoCD Web UI on your local machine:
kubectl port-forward --namespace "argocd" svc/argocd-server 8090:https
The ArgoCD Web UI should now be available over HTTPS on local port 8090.
If this is the first time you are using ArgoCD, you will need to retrieve the initial password for the admin user:
echo $(kubectl -n argocd get secret/argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d)
You can now log in to the Web UI with the admin user and the above password.
Step 2 - Sync Applications
You MUST sync the deployKF applications in the correct order. For each application, click the SYNC button, and wait for the application to become "Healthy" before syncing the next.
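If you prefer to check application state from a terminal while working through the Web UI, a filter like the sketch below can list the applications that are not yet both "Synced" and "Healthy". The function name `not_ready_apps` is hypothetical; the usage comment assumes ArgoCD runs in the argocd namespace and reads the status.sync.status and status.health.status fields of each Application.

```shell
# Hypothetical helper (not part of deployKF or ArgoCD).
# Reads tab-separated "name<TAB>sync-status<TAB>health-status" lines on stdin
# and prints the names of applications that are not both Synced and Healthy.
not_ready_apps() {
  awk -F '\t' '$2 != "Synced" || $3 != "Healthy" { print $1 }'
}

# Usage against a live cluster (assumes the "argocd" namespace):
#   kubectl get applications.argoproj.io --namespace argocd \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}' \
#     | not_ready_apps
```

An empty result for the group you just synced suggests it is safe to move on to the next group.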
The applications are grouped and ordered as follows:
Group 0: "app-of-apps"
First, you must sync the app-of-apps application:
- deploykf-app-of-apps
- deploykf-namespaces (only exists when using off-cluster ArgoCD)
Group 1: "deploykf-dependencies"
Second, you must sync the applications with the label app.kubernetes.io/component=deploykf-dependencies:
- dkf-dep--cert-manager (may fail on first attempt)
- dkf-dep--istio
- dkf-dep--kyverno
WARNING: for this group, each application MUST be synced INDIVIDUALLY and the preceding application MUST be "Healthy" before syncing the next.
Group 2: "deploykf-core"
Third, you must sync the applications with the label app.kubernetes.io/component=deploykf-core:
- dkf-core--deploykf-istio-gateway
- dkf-core--deploykf-auth
- dkf-core--deploykf-dashboard
- dkf-core--deploykf-profiles-generator (may fail on first attempt)
Group 3: "deploykf-opt"
Fourth, you must sync the applications with the label app.kubernetes.io/component=deploykf-opt:
- dkf-opt--deploykf-minio
- dkf-opt--deploykf-mysql
Group 4: "deploykf-tools"
Fifth, you must sync the applications with the label app.kubernetes.io/component=deploykf-tools:
- (none yet)
Group 5: "kubeflow-dependencies"
Sixth, you must sync the applications with the label app.kubernetes.io/component=kubeflow-dependencies:
- kf-dep--argo-workflows
Group 6: "kubeflow-tools"
Seventh, you must sync the applications with the label app.kubernetes.io/component=kubeflow-tools:
- kf-tools--katib
- kf-tools--notebooks--jupyter-web-app
- kf-tools--notebooks--notebook-controller
- kf-tools--pipelines
- kf-tools--poddefaults-webhook
- kf-tools--tensorboards--tensorboard-controller
- kf-tools--tensorboards--tensorboards-web-app
- kf-tools--training-operator
- kf-tools--volumes--volumes-web-app
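The group order above can also be written down as an ordered list of label selectors, which is handy for listing each group from a terminal. This is only a sketch: the function name `deploykf_sync_selectors` is hypothetical, it covers Groups 1 to 6 (the Group 0 applications are named individually above), and it does not replace the provided sync script.

```shell
# Hypothetical helper (not part of deployKF).
# Prints the label selectors for Groups 1 to 6, one per line, in sync order.
deploykf_sync_selectors() {
  local component
  for component in \
    deploykf-dependencies \
    deploykf-core \
    deploykf-opt \
    deploykf-tools \
    kubeflow-dependencies \
    kubeflow-tools
  do
    printf 'app.kubernetes.io/component=%s\n' "$component"
  done
}

# Usage against a live cluster (assumes the "argocd" namespace):
#   deploykf_sync_selectors | while read -r selector; do
#     echo "== ${selector}"
#     kubectl get applications.argoproj.io --namespace argocd --selector "${selector}"
#   done
```

Remember that within Group 1 the applications must still be synced one at a time, in the order listed.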
4. Use the Platform¶
Now that you have a working deployKF machine learning platform, here are some things to try out!
Expose the deployKF Dashboard ¶
The deployKF dashboard is the web-based interface for deployKF. It gives users authenticated access to tools like Kubeflow Pipelines, Kubeflow Notebooks, and Katib.
All public deployKF services (including the dashboard) are accessed via the deployKF Istio Gateway, so you will need to expose its Kubernetes Service.
Step 1 - Expose the Gateway
You may expose the deployKF Istio Gateway Service in a number of ways:
- Expose with: kubectl port-forward (for local testing only)
- Expose with: LoadBalancer Service
- Expose with: Ingress
Step 2 - Configure DNS
Trying to access deployKF with an IP address will NOT work. You MUST use a domain name.
See Configure DNS Records for more information.
This step is REQUIRED: you MUST configure DNS records or local /etc/hosts entries.
Step 3 - Configure TLS (optional)
We recommend configuring valid TLS/HTTPS certificates to avoid browser warnings for your users.
See the Configure TLS Certificates guide for more information.
If you want to configure TLS later, just skip this step for now.
We use a self-signed certificate by default.
Step 4 - User Authentication (optional)
See the user authentication guides to configure user authentication on your platform.
If you want to configure authentication later, just skip this step for now.
We provide a few static credentials by default.
Step 5 - Define Profiles (optional)
deployKF uses the concept of "Profiles" to group users and resources together. You might define profiles for different teams, projects, or even individual users.
See the User Authorization and Profile Management guide for more information.
If you want to define profiles later, just skip this step for now.
We provide default profiles named team-1 and team-1-prod.
Step 6 - Log In
You should now be presented with a "Log In" screen when you visit the exposed URL.
Remember, you can NOT access deployKF with an IP address, you MUST use a domain name.
By default, there are a few static credentials set by the deploykf_core.deploykf_auth.dex.staticPasswords value:
Credentials: User 1
Username: user1@example.com
Password: user1
- This account has write access to the team-1 profile.
- This account has read access to the team-1-prod profile.
Credentials: User 2
Username: user2@example.com
Password: user2
- This account has write access to the team-1 profile.
- This account has read access to the team-1-prod profile.
Credentials: Admin (DO NOT USE - will be removed in future versions)
Username: admin@example.com
Password: admin
- This account is the default "owner" of all profiles.
- This account does NOT have access to "MinIO Console" or "Argo Server UI".
- We recommend NOT using this account, and actually removing its staticPasswords entry.
- We recommend leaving this account as the default "owner", even with @example.com as the domain (because profile owners can't be changed).
Step 7 - Explore the Tools
deployKF includes many tools which address different stages of the data & machine learning lifecycle.
We also provide a number of user-focused guides for these tools:
Tool | User Guide |
---|---|
Kubeflow Pipelines | Access Kubeflow Pipelines API |
Kubeflow Pipelines | GitOps for Kubeflow Pipelines Schedules |
Next Steps¶
Created: 2023-04-24