Building and Using an MLOps Stack with ZenML

The purpose of this repository is to demonstrate how ZenML enables your machine learning projects in a multitude of ways:

  • By offering you a framework or template to develop within
  • By seamlessly integrating into the tools you love and need
  • By allowing you to easily switch orchestrators for your pipelines
  • By bringing much-needed Zen into your machine learning

ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions that are catered towards ML workflows.

At its core, ZenML pipelines execute ML-specific workflows, from sourcing data through splitting, preprocessing, and training, all the way to evaluating results and even serving models. There are many built-in batteries to support common ML development tasks. ZenML is not here to replace the great tools that solve these individual problems; rather, it integrates natively with popular ML tooling and provides a standard abstraction for writing your workflows.
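
To make that abstraction concrete, here is a minimal, hypothetical sketch of a ZenML pipeline: two decorated steps chained into a pipeline that runs on whatever stack is active. It assumes the step/pipeline decorators of ZenML >= 0.40; older releases (including the one this repo was written against) import them from zenml.steps and zenml.pipelines instead.

from zenml import pipeline, step

@step
def importer() -> list:
    # Stand-in for a real data-sourcing step.
    return [1.0, 2.0, 3.0]

@step
def trainer(data: list) -> float:
    # Stand-in "training" step: just averages the data.
    return sum(data) / len(data)

@pipeline
def toy_pipeline():
    data = importer()
    trainer(data)

if __name__ == "__main__":
    toy_pipeline()  # executes on the currently active stack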

Within this repo, we will use ZenML to build pipelines that seamlessly use Evidently, MLflow, and Kubeflow Pipelines, and that post results to our Discord.

Come watch along as Hamza Tahir, Co-Founder and CTO of ZenML, showcases an early version of this repo to the MLOps.community.

💻 System Requirements

To run this demo, you need a few tools installed on your machine.

Currently, this will only run on UNIX systems.

package   macOS installation             Linux installation
docker    Docker Desktop for Mac         Docker Engine for Linux
kubectl   kubectl for macOS              kubectl for Linux
k3d       Homebrew installation of k3d   k3d installation for Linux

🐍 Python Requirements

Once you've got the system requirements figured out, let's jump into the Python packages you need. Within the Python environment of your choice, run:

git clone https://github.com/zenml-io/zenml-projects
cd zenml-projects/nba-pipeline
pip install -r requirements.txt

If you are going to run the run_pipeline.py script, install the required ZenML integrations by running these commands in your terminal:

zenml integration install sklearn -y
zenml integration install aws -y
zenml integration install evidently -y
zenml integration install mlflow -y
zenml integration install kubeflow -y

🏀 The Task

A couple of weeks ago, we were looking for a fun project to work on for the next chapter of our ZenHacks. During our initial discussions, we realized that it would be really great to work with an NBA dataset, as we could quickly get close to a real-life application like a "3-Pointer Predictor" while simultaneously entertaining ourselves with one of the trending topics within our team.

As we were building the dataset around a "3-Pointer Predictor", we realized that there is one factor we needed to take into consideration first: Stephen Curry, the Baby-Faced Assassin. In our opinion, there is no denying that he changed the way the game is played in the NBA, and we wanted to prove that this was the case first.

That's why our story in this ZenHack will start with a pipeline dedicated to drift detection. As the breakpoint for this drift, we will use the famous "Double Bang" game that the Golden State Warriors played against the Oklahoma City Thunder back in 2016. Following that, we will build a training pipeline that generates a model predicting the number of three-pointers made by a team in a single game, and ultimately, we will use these trained models to create an inference pipeline for upcoming NBA matches.
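
For intuition, here is a standalone sketch of the kind of drift check Evidently performs, decoupled from ZenML. It assumes Evidently's Report API (version 0.2 and later); the version pinned in requirements.txt may expose the older Dashboard/Profile API instead, and the column name and values are purely illustrative.

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Illustrative data: three-pointers per game before and after the breakpoint.
reference = pd.DataFrame({"three_pointers": [5, 6, 7, 6, 5, 8, 6, 7]})
current = pd.DataFrame({"three_pointers": [11, 13, 12, 14, 12, 15, 13, 12]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")  # open in a browser to see the verdict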

Diagram depicting the Training and Inference pipelines

📓 Diving into the code

We're ready to go now. You have two options:

Notebook

You can spin up a step-by-step guide in Building and Using An MLOPs Stack With ZenML.ipynb:

jupyter notebook

Script

You can also directly run the code. First, set up the stack:

# You register an experiment tracker
zenml experiment-tracker register local_mlflow_tracker --flavor=mlflow

# You register a data validator
zenml data-validator register local_evidently --flavor=evidently

# Now it all is combined into the local_stack
zenml stack register local_stack -a default -o default -e local_mlflow_tracker -dv local_evidently

# And we activate the new stack; from now on, all pipelines will run within it
zenml stack set local_stack

# Check it out, your new stack is registered
zenml stack describe

Then, run the pipelines using the run_pipeline.py script:

python run_pipeline.py drift  # Run one-shot drift pipeline
python run_pipeline.py train  # Run training pipeline
python run_pipeline.py infer  # Run inference pipeline

🚀 From Local to Cloud Stack

In ZenML, you can run your pipeline on any infrastructure of your choice. The configuration of the infrastructure is called a Stack. By switching the Stack, you can choose to run your pipeline locally or in the cloud.

In any Stack, there must be at least two basic Stack Components: an orchestrator, which runs your pipeline code, and an artifact store, which persists the inputs and outputs of each step.

ZenML comes with a default local stack consisting of a local orchestrator and a local artifact store.
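
If you prefer Python over the CLI, recent ZenML releases also expose the active stack through a Client object. A hypothetical sketch (attribute names may differ across versions):

from zenml.client import Client

stack = Client().active_stack

# Print the active stack's name and its two mandatory components.
print(stack.name)
print(stack.orchestrator.name, stack.artifact_store.name)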

To transition from running our pipelines locally (see diagram above) to running them on Kubeflow Pipelines, we only need to register a new stack with some more stack components:

First, authenticate with Amazon ECR and configure kubectl access to the EKS cluster:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 715803424590.dkr.ecr.us-east-1.amazonaws.com
aws eks --region us-east-1 update-kubeconfig --name zenhacks-cluster --alias zenml-eks

Replace the --region values, the ECR registry URI, and the EKS cluster --name and --alias with your own.

Set the following environment variables to your Kubeflow namespace, username, and password.

export KUBEFLOW_NAMESPACE="your-namespace"
export KUBEFLOW_USERNAME="yourusername@yours.io"
export KUBEFLOW_PASSWORD="yourpassword"

Now let's register all the stack components that we will be using.

MLflow Experiment Tracker:

zenml experiment-tracker register aws_mlflow_tracker  --flavor=mlflow --tracking_insecure_tls=true --tracking_uri="https://ac8e6c63af207436194ab675ee71d85a-1399000870.us-east-1.elb.amazonaws.com/mlflow" --tracking_username="{{mlflow_secret.tracking_username}}" --tracking_password="{{mlflow_secret.tracking_password}}" 
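
Once this tracker is part of the active stack, a step can opt into it and use the regular MLflow client API inside its body. A hypothetical sketch (assuming the experiment_tracker parameter of ZenML >= 0.40; the step and metric names are illustrative):

import mlflow
from zenml import step

@step(experiment_tracker="aws_mlflow_tracker")
def trainer(data: list) -> float:
    # Plain MLflow calls inside the step are routed to the tracker above.
    score = sum(data) / len(data)  # stand-in for real training
    mlflow.log_metric("score", score)
    return score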

Evidently Data Validator:

zenml data-validator register evidently --flavor=evidently

Kubeflow Orchestrator:

zenml orchestrator register multi_tenant_kubeflow \
  --flavor=kubeflow \
  --kubernetes_context=zenml-eks \
  --kubeflow_hostname=https://www.kubeflowshowcase.zenml.io/pipeline

Make sure to pass in your own --kubernetes_context and --kubeflow_hostname.

Artifact Store on Amazon S3:

zenml artifact-store register s3_store -f s3 --path=s3://zenml-projects

Replace --path with your own S3 bucket path.

Container registry on ECR:

zenml container-registry register ecr_registry --flavor=aws --uri=715803424590.dkr.ecr.us-east-1.amazonaws.com 

Replace --uri with your own ECR registry URI.

Finally, with all the stack components set up, let's register the stack:

zenml stack register kubeflow_gitflow_stack \
    -a s3_store \
    -c ecr_registry \
    -o multi_tenant_kubeflow \
    -e aws_mlflow_tracker \
    -dv evidently

Set the registered stack as the active stack:

zenml stack set kubeflow_gitflow_stack

Let's register our secret in the secrets store:

zenml secret create mlflow_secret -i

You will be prompted to key in the tracking_username and tracking_password for MLflow. The {{mlflow_secret.tracking_username}} and {{mlflow_secret.tracking_password}} references used when registering the experiment tracker above resolve to these values at runtime.
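
To double-check the secret from Python, recent ZenML releases let you read it back through the Client. A hypothetical sketch (method and attribute names may differ across versions):

from zenml.client import Client

secret = Client().get_secret("mlflow_secret")
print(sorted(secret.secret_values))  # expect the two tracking keys; values omitted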

Once all the configuration is done, you can provision the stack with:

zenml stack up

You are now ready to run the pipeline!

python run_pipeline.py drift  # Run drift detection pipeline
python run_pipeline.py train  # Run training pipeline
python run_pipeline.py infer  # Run inference pipeline

And head over to your Kubeflow central dashboard.

There you can inspect the drift detection, training, and inference pipelines, each of which shows up as a run in the Kubeflow dashboard.

🏁 Cleaning up when you're done

Once you are done running this project, you might want to stop all running processes. For this, run the following commands. (If your stack provisioned a local k3d cluster and Docker registry, this will tear them down.)

zenml stack set kubeflow_gitflow_stack
zenml stack down -f