Run MLProjects on docker containers (mlflow#555)
Add ability to specify and run MLflow projects dependent on a docker environment
marcusrehm authored and smurching committed Jan 18, 2019
1 parent 3a37840 commit d7d6d5d
Showing 21 changed files with 5,428 additions and 31 deletions.
2 changes: 1 addition & 1 deletion dev-requirements.txt
@@ -7,4 +7,4 @@ codecov
coverage
pypi-publisher
scikit-learn
scipy
scipy
32 changes: 25 additions & 7 deletions docs/source/projects.rst
@@ -26,12 +26,12 @@ Name
A human-readable name for the project.

Dependencies
Libraries needed to run the project. MLflow currently uses the
`Conda <https://conda.io/docs>`_ package manager, which supports both Python packages and native
libraries (for example, CuDNN or Intel MKL), to specify dependencies. MLflow will use the
Conda installation given by the ``MLFLOW_CONDA_HOME`` environment variable if specified
(e.g. running Conda commands by invoking ``$MLFLOW_CONDA_HOME/bin/conda``), and default to
running ``conda`` otherwise.
Libraries needed to run the project. MLflow supports specifying dependencies either through a
`Docker <https://docs.docker.com/>`_ environment, in which case the project runs inside a container,
or through the `Conda <https://conda.io/docs>`_ package manager, which supports both Python packages
and native libraries (for example, CuDNN or Intel MKL). MLflow will use the Conda installation given
by the ``MLFLOW_CONDA_HOME`` environment variable if specified (e.g. running Conda commands by
invoking ``$MLFLOW_CONDA_HOME/bin/conda``), and default to running ``conda`` otherwise.
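As a minimal sketch of the Conda lookup described above (the helper name ``get_conda_bin_path`` is
hypothetical, not MLflow's own API):

.. code:: python

    import os

    def get_conda_bin_path():
        # Prefer the Conda installation pointed to by MLFLOW_CONDA_HOME, if set.
        conda_home = os.environ.get("MLFLOW_CONDA_HOME")
        if conda_home:
            return os.path.join(conda_home, "bin", "conda")
        # Otherwise, fall back to whatever `conda` resolves to on the PATH.
        return "conda"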

Entry Points
Commands that can be executed within the project, and information about their
@@ -64,6 +64,10 @@ following conventions to determine its parameters:
is specified in ``conda.yaml``, if present. If no ``conda.yaml`` file is present, MLflow
will use a Conda environment containing only Python (specifically, the latest Python available to
Conda) when running the project.
* Alternatively, you may provide a Docker environment for project execution, which allows for capturing
non-Python dependencies such as Java libraries.
`See here <https://github.com/mlflow/mlflow/tree/master/examples/docker>`_ for an example of an
MLflow project with a Docker environment.
* Any ``.py`` and ``.sh`` file in the project can be an entry point, with no parameters explicitly
declared. When you execute such a command with a set of parameters, MLflow will pass each
parameter on the command line using ``--key value`` syntax, as sketched below.
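As a rough sketch of that convention (``build_entry_point_command`` is a hypothetical helper, not
part of MLflow's API), the command for such an entry point could be assembled as:

.. code:: python

    def build_entry_point_command(script, params):
        # Append each user-supplied parameter as `--key value`.
        args = []
        for key, value in params.items():
            args += ["--{}".format(key), str(value)]
        return ["python", script] + args

    # build_entry_point_command("train.py", {"alpha": 0.5})
    # -> ["python", "train.py", "--alpha", "0.5"]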
@@ -76,6 +80,9 @@ YAML syntax. The MLproject file looks like this:
name: My Project

conda_env: my_env.yaml
# Can have a docker_env instead of a conda_env, e.g.
# docker_env:
#    image: mlflow-docker-example

entry_points:
  main:
@@ -88,7 +95,7 @@ YAML syntax. The MLproject file looks like this:
      data_file: path
    command: "python validate.py {data_file}"
As you can see, the file can specify a name and a different environment file, as well as more
As you can see, the file can specify a name and a conda or docker environment, as well as more
detailed information about each entry point. Specifically, each entry point has a *command* to
run and *parameters* (including data types). We describe these two pieces next.

@@ -219,6 +226,17 @@ where ``<uri>`` is a Git repository URI or a folder. You can pass Git credential
``MLFLOW_GIT_PASSWORD`` environment variables.


Execution on Docker containers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can run MLflow projects inside a Docker container instead of a Conda environment. To do so,
specify a ``docker_env`` attribute with an ``image`` field in the MLproject file, as described below.
MLflow mounts the project's local directory as a volume inside the container at the path
``/mlflow/projects/code``.

.. code::

    docker_env:
      image: mlflow-run-image
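As a rough illustration of the volume mount described above (the project path, image name, and
entry-point command below are placeholders, and the exact flags MLflow passes to ``docker run`` may
differ), the container invocation is conceptually equivalent to:

.. code:: python

    import os
    import subprocess

    project_dir = os.path.abspath("path/to/my_project")  # local project directory
    image = "mlflow-run-image"                           # image named under docker_env
    entry_point_cmd = ["python", "train.py", "--alpha", "0.5"]

    # Mount the project at /mlflow/projects/code and run the entry point there.
    subprocess.check_call(
        ["docker", "run",
         "-v", "{}:/mlflow/projects/code".format(project_dir),
         "-w", "/mlflow/projects/code",
         image] + entry_point_cmd)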
Iterating Quickly
-----------------

2 changes: 2 additions & 0 deletions examples/README.md
@@ -23,3 +23,5 @@ and stores (logs) them as MLflow artifacts.
* `sklearn_logistic_regression` is a simple MLflow example with hooks to log training data to MLflow
tracking server.
* `tensorflow` is an end-to-end one run example from train to predict.
* `docker` demonstrates how to create and run an MLflow project using Docker (rather than Conda)
  to manage project dependencies.
Empty file added examples/docker/.dockerignore
Empty file.
8 changes: 8 additions & 0 deletions examples/docker/Dockerfile
@@ -0,0 +1,8 @@
FROM continuumio/miniconda:4.5.4

RUN pip install mlflow==0.8.1 \
&& pip install azure-storage==0.36.0 \
&& pip install numpy==1.14.3 \
&& pip install pandas==0.22.0 \
&& pip install scikit-learn==0.19.1 \
&& pip install cloudpickle
11 changes: 11 additions & 0 deletions examples/docker/MLproject
@@ -0,0 +1,11 @@
name: docker-example

docker_env:
  image: mlflow-docker-example

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"
41 changes: 41 additions & 0 deletions examples/docker/README.rst
@@ -0,0 +1,41 @@
Dockerized Model Training with MLflow
-------------------------------------
This directory contains an MLflow project that trains a linear regression model on the UC Irvine
Wine Quality Dataset. The project uses a Docker image to capture the dependencies needed to run the
training code. Running a project in a Docker environment (as opposed to Conda) allows for capturing
non-Python dependencies, e.g. Java libraries. In the future, we also hope to add tools to MLflow
for running dockerized projects, e.g. on a Kubernetes cluster for scale-out.


Running this Example
^^^^^^^^^^^^^^^^^^^^

Install MLflow via ``pip install mlflow``, and install `Docker <https://www.docker.com/get-started>`_.
Then, build the Docker image containing MLflow via ``docker build examples/docker -t mlflow-docker-example``
and run the example project via ``mlflow run examples/docker -P alpha=0.5``.
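If you prefer to launch the run from Python, a roughly equivalent call (a sketch using the
``mlflow.projects`` API; it assumes the image has already been built as above) is:

.. code:: python

    import mlflow

    # Programmatic counterpart of `mlflow run examples/docker -P alpha=0.5`.
    submitted_run = mlflow.projects.run(
        uri="examples/docker",
        entry_point="main",
        parameters={"alpha": 0.5},
    )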

What happens when the project is run?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's start by looking at the MLproject file, which specifies the Docker image in which to run the
project via a ``docker_env`` field:

.. code:: yaml

    docker_env:
      image: mlflow-docker-example

Here, ``image`` can be any valid argument to ``docker run``, such as the tag, ID, or
URL of a Docker image (see the `Docker docs <https://docs.docker.com/engine/reference/run/#general-form>`_).
The above example references a locally stored image (``mlflow-docker-example``) by tag.

Running ``mlflow run examples/docker`` builds a new Docker image based on ``mlflow-docker-example``
that also contains our project code, then executes the default (``main``) project entry point
within the container via ``docker run``.
The built image is tagged as ``mlflow-docker-example-<git-version>``, where ``<git-version>`` is the
Git commit ID of the project.

Environment variables such as ``MLFLOW_TRACKING_URI`` are
propagated inside the container during project execution. When running against a local tracking URI,
e.g. a local ``mlruns`` directory, MLflow mounts the host system's tracking directory inside the
container so that metrics and params logged during project execution are accessible afterwards.
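For example (a sketch; the tracking server URL below is a placeholder), you can point the run at a
specific tracking server before launching it, and the setting will be visible inside the container:

.. code:: python

    import os
    import mlflow

    # Propagated into the Docker container when the project runs; omit this to
    # default to a local ./mlruns directory, which MLflow mounts into the container.
    os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"

    mlflow.projects.run(uri="examples/docker", parameters={"alpha": 0.5})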

72 changes: 72 additions & 0 deletions examples/docker/train.py
@@ -0,0 +1,72 @@
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys
import argparse

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2



if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    parser = argparse.ArgumentParser()
    parser.add_argument('--alpha')
    parser.add_argument('--l1-ratio')
    args = parser.parse_args()

    # Read the wine-quality csv file (make sure you're running this from the root of MLflow!)
    wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
    data = pd.read_csv(wine_path)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(args.alpha)
    l1_ratio = float(args.l1_ratio)

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")