# Future Tools
A list of ML & Data tools which may be part of future versions of deployKF.
**How do I request or contribute a tool?**
If you would like to request or contribute support for a tool, please raise an issue on GitHub, or join the discussion on an existing issue.
## Tool Roadmap
The following is a roadmap of ML & Data tools which are planned for future versions of deployKF, grouped by priority.
### Higher Priority
Name (Click for Details) | Purpose |
---|---|
MLflow Model Registry | Model Registry |
KServe | Model Serving |
### Medium Priority
Name (Click for Details) | Purpose |
---|---|
Feast | Feature Store |
Apache Airflow | Workflow Orchestration |
### Lower Priority
Name (Click for Details) | Purpose |
---|---|
DataHub | Data Catalog |
Airbyte | Data Integration |
Label Studio | Data Labeling |
BentoML Yatai | Model Serving |
Seldon Core | Model Serving |
## Tool Details
The following sections describe each tool that is planned for future versions of deployKF.
### MLflow Model Registry
Purpose | Model Registry |
---|---|
Maintainer | Databricks |
Documentation | Documentation |
Source Code | mlflow/mlflow |
Roadmap Priority | Higher |
A model registry decouples model training from model deployment, allowing you to break the model lifecycle down into three separate concerns. This separation enables you to have well-scoped pipelines, rather than trying to go from training to deployment all at once.
- Model Training: Training new versions of models and logging them into the registry.
- Model Evaluation: Evaluating versions of models and logging the results into the registry.
- Model Deployment: Making informed decisions about which models to deploy and then deploying them.
The key features of MLflow Model Registry are:
- Model Versioning: Version your model artifacts and attach metadata to each version.
- Model Stage Transitions: Transition models between stages (e.g. staging to production).
- Web UI: A graphical web interface for managing models.
- Python API: A Python API for managing models.
- REST API: A REST API for managing models.
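As an illustration of the Python API, here is a minimal sketch of the version-and-promote workflow. This assumes a reachable MLflow tracking server with a registry backend; the tracking URI, the model name `churn-classifier`, and `sk_model` (a trained scikit-learn model) are placeholder assumptions, not values from this document.

```python
import mlflow
from mlflow import MlflowClient

# Sketch only: assumes a running MLflow tracking server with a
# registry-capable backend, and `sk_model`, a trained scikit-learn
# model (both placeholders).
mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run() as run:
    # Model Versioning: log the artifact and a metric against this run.
    mlflow.sklearn.log_model(sk_model, artifact_path="model")
    mlflow.log_metric("val_accuracy", 0.93)

# Register the logged artifact as a new version of the named model.
version = mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn-classifier")

# Model Stage Transitions: promote the new version (e.g. to production).
MlflowClient().transition_model_version_stage(
    name="churn-classifier", version=version.version, stage="Production"
)
```

Because each step is an ordinary API call, the training, evaluation, and deployment pipelines described above can each talk to the registry independently.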
### KServe
Purpose | Model Serving |
---|---|
Maintainer | Linux Foundation |
Documentation | Documentation |
Source Code | kserve/kserve |
Roadmap Priority | Higher |
The core features of KServe are:
- Support for Many Frameworks: KServe natively supports many ML frameworks (e.g. PyTorch, TensorFlow, scikit-learn, XGBoost).
- Autoscaling, Even to Zero: KServe can autoscale model replicas to meet demand, even scaling to zero when there are no requests.
- Model Monitoring: KServe integrates tools like Alibi Detect to provide model monitoring for drift and outlier detection.
- Model Explainability: KServe integrates tools like Alibi Explain to provide model explainability.
- Request Batching: KServe can batch requests to your model, improving throughput and reducing cost.
- Canary Deployments: KServe can deploy new versions of your model alongside old versions, and route requests to the new version based on a percentage.
- Feature Transformers: KServe can do feature pre/post processing alongside model inference (e.g. using Feast).
- Inference Graphs: KServe can chain multiple models together to form an inference graph.
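Several of the features above (scale-to-zero, canary traffic splitting, framework selection) are configured declaratively on an `InferenceService` resource. A sketch of such a manifest, written here as a Python dict; the name, storage URI, and percentages are illustrative assumptions:

```python
# Sketch of a KServe InferenceService manifest as a Python dict.
# "sklearn-iris" and the storageUri are placeholders.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris"},
    "spec": {
        "predictor": {
            "minReplicas": 0,            # allow autoscaling down to zero when idle
            "canaryTrafficPercent": 10,  # route 10% of requests to this new revision
            "model": {
                "modelFormat": {"name": "sklearn"},  # one of many supported frameworks
                "storageUri": "gs://example-bucket/models/iris",  # placeholder path
            },
        }
    },
}
```

In practice this would be serialized to YAML and applied to the cluster with `kubectl`.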
### Feast
Purpose | Feature Store |
---|---|
Maintainer | Tecton |
Documentation | Documentation |
Source Code | feast-dev/feast |
Roadmap Priority | Medium |
A good way to understand the purpose of a feature store is to think about the data access patterns encountered during the model lifecycle. A feature store should somehow make these data access patterns easier.
- Feature Engineering: Accesses and transforms historical data to create features.
- Target Engineering: Accesses and transforms historical data to create targets.
- Model Training: Accesses features and targets to train and evaluate the model.
- Model Inference: Accesses features of new data to predict the target.
The key features of Feast are:
- Feature Registry: Where Feast persists feature definitions (not data) that are registered with it (e.g. Local-Files, S3, GCS).
- Python SDK: The primary interface for managing feature definitions, and retrieving feature values from Feast.
- Offline Data Stores: A store which Feast can read feature values from, for historical data retrieval (e.g. Snowflake, BigQuery, Redshift).
- Online Data Stores: A store which Feast can materialize (write) feature values into, for online model inference (e.g. Snowflake, Redis, DynamoDB, Bigtable).
- Batch Materialization Engine: A data processing engine which Feast can use to materialize feature values from an Offline Store into an Online Store (e.g. Snowflake, Spark, Bytewax).
A good feature store is NOT a database, but rather a data access layer between your data sources and your ML models. Be very wary of any feature store that requires you to load your data into it directly.
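To make the "data access layer" idea concrete, here is a toy, stdlib-only sketch of the offline-to-online materialization pattern. This is illustrative Python, not Feast's actual API (in Feast, the equivalent operations are `FeatureStore.materialize` and `FeatureStore.get_online_features`); the entities and feature names are invented:

```python
from datetime import datetime

# Toy "offline store": historical feature rows, keyed by entity and timestamp.
offline_store = [
    {"driver_id": 1001, "event_ts": datetime(2023, 4, 1), "trips_today": 5},
    {"driver_id": 1001, "event_ts": datetime(2023, 4, 2), "trips_today": 9},
    {"driver_id": 1002, "event_ts": datetime(2023, 4, 2), "trips_today": 3},
]

def materialize(offline_rows):
    """Copy the latest feature value per entity into an 'online store'."""
    online = {}
    for row in sorted(offline_rows, key=lambda r: r["event_ts"]):
        online[row["driver_id"]] = {"trips_today": row["trips_today"]}
    return online

online_store = materialize(offline_store)

# Model inference reads fresh features by entity key, with low latency:
features = online_store[1001]  # {"trips_today": 9}
```

Note that the "store" here only indexes data that already lives elsewhere, which is exactly the access-layer role described above.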
### Apache Airflow
Purpose | Workflow Orchestration |
---|---|
Maintainer | Apache Software Foundation |
Documentation | Documentation |
Source Code | apache/airflow |
Roadmap Priority | Medium |
The versatility and extensibility of Apache Airflow make it a great fit for many different use cases, including machine learning.
The key features of Apache Airflow are:
- Python Centered: Airflow is written in Python and uses a Python DSL to define workflows.
- Dynamic Workflows: Airflow's code-driven workflow definitions enable powerful patterns like dynamically generating workflows.
- Extensive Plugins: Airflow has a rich ecosystem of plugins and integrations with other tools.
- User Interface: Airflow is known for its powerful user interface which allows users to monitor and manage workflows.
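The dynamic-workflow idea can be sketched with stdlib Python alone: because the workflow is defined in code, tasks and dependencies can be generated in a loop. This is an illustration of the pattern, not Airflow's API (in an Airflow DAG file, the loop would create task objects and set dependencies between them); the table names are invented:

```python
from graphlib import TopologicalSorter

# Dynamically generate an extract -> load task pair per table,
# plus a final report task that depends on every load.
tables = ["users", "orders", "payments"]

graph = {}  # maps each task to the set of tasks it depends on
for table in tables:
    extract, load = f"extract_{table}", f"load_{table}"
    graph[load] = {extract}          # load depends on its extract
    graph.setdefault(extract, set()) # extracts have no dependencies
graph["report"] = {f"load_{t}" for t in tables}

# A valid execution order that respects every dependency:
order = list(TopologicalSorter(graph).static_order())
```

Adding a table to the list adds its tasks and wiring automatically, which is the kind of pattern that is awkward to express in purely static workflow definitions.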
### DataHub
Purpose | Data Catalog |
---|---|
Maintainer | Acryl Data |
Documentation | Documentation |
Source Code | datahub-project/datahub |
Roadmap Priority | Lower |
The core features of DataHub are:
- Support for Many Data Sources: DataHub supports ingestion of metadata from many sources.
- Search & Discovery: DataHub provides a search interface for discovering data.
- Data Lineage: DataHub can capture and visualize complex data lineage.
### Airbyte
Purpose | Data Integration |
---|---|
Maintainer | Airbyte |
Documentation | Documentation |
Source Code | airbytehq/airbyte |
Roadmap Priority | Lower |
The core features of Airbyte are:
- Comprehensive Connector Catalog: Airbyte has an extremely large catalog of connectors for data sources and destinations.
- Airbyte Web UI: Airbyte provides a graphical web interface for managing data connectors and orchestrating data syncs.
### Label Studio
Purpose | Data Labeling |
---|---|
Maintainer | Heartex |
Documentation | Documentation |
Source Code | heartexlabs/label-studio |
Roadmap Priority | Lower |
The core features of Label Studio are:
- Data Types: Label Studio supports a variety of data types, including text, images, audio, video, and time series.
- Task Templates: Label Studio provides many templates for common labeling tasks, including text classification, named entity recognition, and object detection.
- Label Studio Web UI: Label Studio provides a graphical web interface for labeling data and managing labeling projects.
### BentoML Yatai
Purpose | Model Serving |
---|---|
Maintainer | BentoML |
Documentation | Documentation |
Source Code | bentoml/Yatai |
Roadmap Priority | Lower |
The core features of BentoML Yatai are:
- Model Registry: A central registry for packaged Bentos.
- Model Deployment: Managing the deployment of BentoML models to Kubernetes, including building model container images.
- Web UI: A graphical web interface for viewing, deploying, and monitoring models.
- REST APIs: A REST API for viewing, deploying, and monitoring models.
- Kubernetes CRDs: Manage the deployment of models in a DevOps-friendly way.
### Seldon Core
Purpose | Model Serving |
---|---|
Maintainer | Seldon |
Documentation | Documentation |
Source Code | SeldonIO/seldon-core |
Roadmap Priority | Lower |
The core features of Seldon Core are:
- Support for Many Frameworks: Seldon Core natively supports many ML frameworks (e.g. TensorFlow, scikit-learn, XGBoost, HuggingFace, NVIDIA Triton).
- Reusable Model Servers: Seldon Core removes the need to build a container image for each model, by providing a system to download model artifacts at runtime.
- Model Deployment CRD: Seldon Core provides a simple, yet powerful, Kubernetes CRD for deploying models.
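A sketch of a `SeldonDeployment` resource using one of Seldon Core's prepackaged (reusable) model servers, shown here as a Python dict; the resource name and model URI are placeholder assumptions:

```python
# Sketch of a SeldonDeployment manifest as a Python dict.
# "iris-model" and the modelUri are placeholders. SKLEARN_SERVER is a
# prepackaged model server: it downloads the model artifact at runtime,
# so no per-model container image needs to be built.
seldon_deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "iris-model"},
    "spec": {
        "predictors": [
            {
                "name": "default",
                "replicas": 1,
                "graph": {
                    "name": "classifier",
                    "implementation": "SKLEARN_SERVER",
                    "modelUri": "gs://example-bucket/models/iris",  # placeholder
                },
            }
        ]
    },
}
```

As with other Kubernetes CRDs, this would be serialized to YAML and applied with `kubectl`, which is what makes the deployment flow DevOps-friendly.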
Created: 2023-04-27