MySQL¶
Learn how and why deployKF needs MySQL. Learn how to use an external MySQL database to improve the performance and reliability of Kubeflow Pipelines and Katib.
What is MySQL?¶
MySQL is an extremely popular and open-source relational database management system. Many of the world's largest applications use MySQL to store and manage their data.
Why does deployKF use MySQL?¶
MySQL is a dependency of the following ML & Data tools, which are part of deployKF:
- Kubeflow Pipelines: stores metadata about pipelines, experiments, and runs
- Katib: stores metadata about hyperparameter tuning experiments
Connect an External MySQL¶
By default, deployKF includes an embedded MySQL instance. However, to improve the performance and reliability of Kubeflow Pipelines and Katib, we recommend using an external MySQL database.
Embedded MySQL
You should ALWAYS use an external MySQL database. The embedded MySQL is a single-instance server running in a Kubernetes Pod, with no backups or high-availability.
1. Prepare MySQL¶
You may use any MySQL database service, as long as it is accessible from the Kubernetes cluster where deployKF is running.
You may consider using one of the following services:
Platform | MySQL Service |
---|---|
Amazon Web Services | Amazon Relational Database Service (RDS) |
Microsoft Azure | Azure Database for MySQL |
Google Cloud | Cloud SQL |
Alibaba Cloud | ApsaraDB RDS for MySQL |
IBM Cloud | IBM Cloud Databases for MySQL |
Self-Hosted | MySQL Community Edition |
You must create some databases (schemas) and a user with the appropriate permissions to access them.
MySQL User Authentication
You MUST set the user's authentication plugin to mysql_native_password
, NOT caching_sha2_password
, which is the default in MySQL 8.0.4+. Kubeflow Pipelines does not support caching_sha2_password
(kubeflow/pipelines#9549
).
The following SQL command will show the authentication plugin for each user:
SELECT user, host, plugin FROM mysql.user;
For example, you might run the following SQL commands to create the databases and users:
-- create the databases
CREATE DATABASE IF NOT EXISTS `katib`;
CREATE DATABASE IF NOT EXISTS `kfp_cache`;
CREATE DATABASE IF NOT EXISTS `kfp_metadata`;
CREATE DATABASE IF NOT EXISTS `kfp_pipelines`;
-- create the 'kubeflow' user (allowing access from any host)
CREATE USER 'kubeflow'@'%' IDENTIFIED WITH mysql_native_password BY 'MY_PASSWORD';
-- grant access to the databases
GRANT ALL PRIVILEGES ON `katib`.* TO 'kubeflow'@'%';
GRANT ALL PRIVILEGES ON `kfp_cache`.* TO 'kubeflow'@'%';
GRANT ALL PRIVILEGES ON `kfp_metadata`.* TO 'kubeflow'@'%';
GRANT ALL PRIVILEGES ON `kfp_pipelines`.* TO 'kubeflow'@'%';
2. Disable the Embedded MySQL¶
The deploykf_opt.deploykf_mysql.enabled
value controls if the embedded MySQL instance is deployed.
The following values will disable the embedded MySQL instance:
deploykf_opt:
deploykf_mysql:
enabled: false
3. Connect Katib¶
To connect Katib to your external MySQL database, you will need to configure the following values:
Value | Purpose |
---|---|
kubeflow_tools.katib.mysqlDatabase | name of database/schema |
kubeflow_tools.katib.mysql | connection details & credentials |
The following values will connect Katib to an external MySQL database at mysql.example.com
on port 3306
, using the katib
database, reading the username and password from a Kubernetes secret called my-secret-name
(from the kubeflow
namespace):
kubeflow_tools:
katib:
mysqlDatabase: "katib"
mysql:
useExternal: true
host: "mysql.example.com"
port: 3306
auth:
## (OPTION 1):
## - set username/password with values (NOT RECOMMENDED)
#username: kubeflow
#password: password
## (OPTION 2):
## - read a kubernetes secret from the 'kubeflow' namespace
## - note, `existingSecret*Key` specifies the KEY NAMES in the
## secret itself, which contain the secret values
existingSecret: "my-secret-name"
existingSecretUsernameKey: "username"
existingSecretPasswordKey: "password"
4. Connect Kubeflow Pipelines¶
To connect Kubeflow Pipelines to your external MySQL database, you will need to configure the following values:
Value | Purpose |
---|---|
kubeflow_tools.pipelines.mysqlDatabases | names of databases/schemas |
kubeflow_tools.pipelines.mysql | connection details & credentials |
The following values will connect Kubeflow Pipelines to an external MySQL database at mysql.example.com
on port 3306
, using the kfp_cache
, kfp_metadata
, and kfp_pipelines
databases, reading the username and password from a Kubernetes secret called my-secret-name
(from the kubeflow
namespace):
kubeflow_tools:
pipelines:
mysqlDatabases:
cacheDatabase: kfp_cache
metadataDatabase: kfp_metadata
pipelinesDatabase: kfp_pipelines
mysql:
useExternal: true
host: "mysql.example.com"
port: 3306
auth:
## (OPTION 1):
## - set username/password with values (NOT RECOMMENDED)
#username: kubeflow
#password: password
## (OPTION 2):
## - read a kubernetes secret from the 'kubeflow' namespace
## - note, `existingSecret*Key` specifies the KEY NAMES in the
## secret itself, which contain the secret values
existingSecret: "my-secret-name"
existingSecretUsernameKey: "username"
existingSecretPasswordKey: "password"
Created: 2023-08-16