forked from mlflow/mlflow
Run MLProjects on docker containers (mlflow#555)
Add ability to specify and run MLflow projects dependent on a docker environment
1 parent 3a37840, commit d7d6d5d
Showing 21 changed files with 5,428 additions and 31 deletions.
@@ -7,4 +7,4 @@ codecov
 coverage
 pypi-publisher
 scikit-learn
-scipy
+scipy
Empty file.
@@ -0,0 +1,8 @@
FROM continuumio/miniconda:4.5.4

RUN pip install mlflow==0.8.1 \
    && pip install azure-storage==0.36.0 \
    && pip install numpy==1.14.3 \
    && pip install pandas==0.22.0 \
    && pip install scikit-learn==0.19.1 \
    && pip install cloudpickle
@@ -0,0 +1,11 @@
name: docker-example

docker_env:
  image: mlflow-docker-example

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"
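The `entry_points` section above declares two parameters and a command template. As a rough illustration of how such a template could be filled in, the sketch below substitutes user-supplied values and falls back to declared defaults. It is not MLflow's actual implementation: `ENTRY_POINT` and `render_command` are hypothetical names, and the dictionary simply mirrors the YAML by hand.

```python
# Hypothetical sketch, not MLflow's implementation: the dictionary below
# mirrors the "main" entry point of the MLproject file by hand.
ENTRY_POINT = {
    "parameters": {
        "alpha": {"type": "float"},
        "l1_ratio": {"type": "float", "default": 0.1},
    },
    "command": "python train.py --alpha {alpha} --l1-ratio {l1_ratio}",
}


def render_command(entry_point, user_params):
    """Fill the command template, falling back to declared defaults."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in user_params:
            values[name] = user_params[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError("missing required parameter: %s" % name)
    return entry_point["command"].format(**values)


print(render_command(ENTRY_POINT, {"alpha": 0.5}))
# prints: python train.py --alpha 0.5 --l1-ratio 0.1
```

Since `alpha` has no default, omitting it raises an error in this sketch, while `l1_ratio` silently takes the value 0.1.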
@@ -0,0 +1,41 @@
Dockerized Model Training with MLflow
-------------------------------------
This directory contains an MLflow project that trains a linear regression model on the UC Irvine
Wine Quality Dataset. The project uses a Docker image to capture the dependencies needed to run
the training code. Running a project in a Docker environment (as opposed to conda) allows for capturing
non-Python dependencies, e.g. Java libraries. In the future, we also hope to add tools to MLflow
for running dockerized projects at scale, e.g. on a Kubernetes cluster.


Running this Example
^^^^^^^^^^^^^^^^^^^^

Install MLflow via `pip install mlflow` and install `Docker <https://www.docker.com/get-started>`_.
Then, build a Docker image containing MLflow via `docker build examples/docker -t mlflow-docker-example`
and run the example project via `mlflow run examples/docker -P alpha=0.5`.

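If you would rather drive the two steps from a script, the commands can be assembled as argument lists first. The sketch below only builds and prints them; `example_commands` is a hypothetical helper, and the only flags used are the ones shown in the instructions above.

```python
import shlex


def example_commands(project_dir="examples/docker",
                     image="mlflow-docker-example",
                     params=None):
    """Return the `docker build` and `mlflow run` argument lists."""
    build = ["docker", "build", project_dir, "-t", image]
    run = ["mlflow", "run", project_dir]
    # Each project parameter is passed as a separate -P key=value pair.
    for key, value in sorted((params or {}).items()):
        run += ["-P", "%s=%s" % (key, value)]
    return build, run


build_cmd, run_cmd = example_commands(params={"alpha": 0.5})
print(" ".join(shlex.quote(part) for part in build_cmd))
print(" ".join(shlex.quote(part) for part in run_cmd))
# prints: docker build examples/docker -t mlflow-docker-example
# prints: mlflow run examples/docker -P alpha=0.5
```

To actually execute the steps, each list could be handed to `subprocess.run(cmd, check=True)`, assuming both `docker` and `mlflow` are on the `PATH`.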
What happens when the project is run?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's start by looking at the MLproject file, which specifies the Docker image in which to run the
project via a `docker_env` field:

```
docker_env:
  image: mlflow-docker-example
```

Here, `image` can be any image reference accepted by `docker run`, such as the tag, ID or
URL of a Docker image (see `Docker docs <https://docs.docker.com/engine/reference/run/#general-form>`_).
The above example references a locally stored image (`mlflow-docker-example`) by tag.

Running `mlflow run examples/docker` builds a new Docker image based on `mlflow-docker-example`
that also contains our project code, then executes the default (main) project entry point
within the container via `docker run`.
The built image is tagged `mlflow-docker-example-<git-version>`, where `git-version` is the git
commit ID.

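The tagging scheme described above can be pictured with a small sketch. Both helpers are hypothetical and are not taken from MLflow's code; in particular, whether MLflow uses the full or an abbreviated commit ID is not stated here, so the short ID in the example is only an illustration.

```python
import subprocess


def current_git_commit(repo_dir="."):
    """Return the commit ID of HEAD for the repository at repo_dir."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_dir)
    return out.decode("utf-8").strip()


def derived_image_tag(base_image, git_commit):
    """Compose a tag of the form <base image>-<git commit ID>."""
    return "%s-%s" % (base_image, git_commit)


# Illustrative commit ID; current_git_commit() would supply a real one.
print(derived_image_tag("mlflow-docker-example", "d7d6d5d"))
# prints: mlflow-docker-example-d7d6d5d
```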
Environment variables such as `MLFLOW_TRACKING_URI` are
propagated inside the container during project execution. When running against a local tracking URI,
e.g. a local `mlruns` directory, MLflow mounts the host system's tracking directory inside the
container so that metrics and params logged during project execution are accessible afterwards.
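One way to picture the environment propagation and directory mounting described above is as extra `-e` and `-v` arguments on the `docker run` command line. The sketch below is an assumption about how such a command might be assembled, not MLflow's implementation: `docker_run_command` and the container path `CONTAINER_MLRUNS` are hypothetical, and only plain filesystem paths (no URI scheme) are treated as local.

```python
import os
from urllib.parse import urlparse

# Hypothetical mount point inside the container (an assumption).
CONTAINER_MLRUNS = "/mlflow/tmp/mlruns"


def docker_run_command(image, tracking_uri, entry_command):
    """Build a `docker run` argument list for the given tracking URI."""
    cmd = ["docker", "run", "--rm"]
    if urlparse(tracking_uri).scheme == "":
        # Local directory: mount it and point the container at the mount.
        host_dir = os.path.abspath(tracking_uri)
        cmd += ["-v", "%s:%s" % (host_dir, CONTAINER_MLRUNS)]
        cmd += ["-e", "MLFLOW_TRACKING_URI=%s" % CONTAINER_MLRUNS]
    else:
        # Remote server: pass the URI through unchanged.
        cmd += ["-e", "MLFLOW_TRACKING_URI=%s" % tracking_uri]
    return cmd + [image] + list(entry_command)


print(docker_run_command("mlflow-docker-example",
                         "http://localhost:5000",
                         ["python", "train.py", "--alpha", "0.5"]))
```

Because the mounted directory lives on the host, anything the run logs there would remain available after the container exits.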
@@ -0,0 +1,72 @@
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys
import argparse

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2



if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    parser = argparse.ArgumentParser()
    parser.add_argument('--alpha')
    parser.add_argument('--l1-ratio')
    args = parser.parse_args()

    # Read the wine-quality csv file (the path is resolved relative to this script's directory)
    wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
    data = pd.read_csv(wine_path)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(args.alpha)
    l1_ratio = float(args.l1_ratio)

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")
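To make the three logged metrics concrete, the sketch below recomputes them in plain Python on a tiny hand-made example. `eval_metrics_plain` is a hypothetical stand-in for `eval_metrics` that follows the standard definitions of RMSE, MAE and R² rather than calling scikit-learn, and the input values are made up for illustration.

```python
import math


def eval_metrics_plain(actual, pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)                   # root mean squared error
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                     # coefficient of determination
    return rmse, mae, r2


rmse, mae, r2 = eval_metrics_plain([3, 5, 7], [2, 5, 8])
print("RMSE: %.4f  MAE: %.4f  R2: %.2f" % (rmse, mae, r2))
# prints: RMSE: 0.8165  MAE: 0.6667  R2: 0.75
```

Here the errors are 1, 0 and -1, so the residual sum of squares is 2 against a total sum of squares of 8, which gives R² = 0.75.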