Use single POV, fix typos, etc. (mlflow#42)
Stephanie Bodoff authored and mateiz committed Jun 14, 2018
1 parent 769f507 commit 46ee5a8
Showing 1 changed file with 56 additions and 62 deletions.
118 changes: 56 additions & 62 deletions docs/source/tutorial.rst
@@ -3,37 +3,32 @@
Tutorial
========

What We're Building
-------------------
This tutorial showcases how you can use MLflow end-to-end to:

In this tutorial, we will showcase how a data scientist can use MLflow end to end to create a
linear regression model; how we can use MLflow to package the code
which trains this model in a reusable and reproducible model format; and finally how we can use
MLflow to create a simple HTTP server which will enable us to score predictions.
- Create a linear regression model
- Package the code that trains the model in a reusable and reproducible model format
- Load the model into a simple HTTP server that will enable you to score predictions

For this tutorial we will use a dataset where we attempt to predict the quality of wine based on
quantative features like the wine's "fixed acidity", "pH", "residual sugar", etc. The data-set
we are using for this tutorial is from UCI's `machine learning repository <http://archive.ics.uci.edu/ml/datasets/Wine+Quality>`_.
This tutorial uses a dataset to predict the quality of wine based on quantitative features
like the wine's "fixed acidity", "pH", "residual sugar", and so on. The dataset
is from UCI's `machine learning repository <http://archive.ics.uci.edu/ml/datasets/Wine+Quality>`_.
[Ref]_

What You'll Need
----------------
For this tutorial, we'll be using MLflow, ``conda``, and the tutorial code located at
This tutorial uses MLflow, `conda <https://conda.io/docs/user-guide/install/index.html#>`_, and the tutorial code located at
``example/tutorial`` in the MLflow repository. To download the tutorial code run::

git clone https://github.com/databricks/mlflow

Training the Model
------------------
The first thing we'll do is train a linear regression model which takes two hyperparameters:
alpha and l1_ratio.

The code which we will use is located at ``example/tutorial/train.py`` and is reproduced
below.
First, train a linear regression model that takes two hyperparameters: ``alpha`` and ``l1_ratio``. The code is located at ``example/tutorial/train.py`` and is reproduced below.

.. code:: python
# Read the wine-quality csv file (make sure you're running this from the root of MLflow!)
# Run from the root of MLflow
# Read the wine-quality csv file
wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
data = pd.read_csv(wine_path)
@@ -70,48 +65,44 @@ below.
mlflow.sklearn.log_model(lr, "model")
In this code, we use the familiar pandas, numpy, and sklearn APIs to create a simple machine learning
model. In addition, we also use the :doc:`MLflow tracking APIs<tracking/>` to log information about each
training run, like the hyperparameters ``alpha`` and ``l1_ratio`` we used to train the model and metrics like
the root mean square error which we will use to evaluate the model. In addition, we serialize the
model which we produced in a format that MLflow knows how to deploy.
This example uses the familiar pandas, numpy, and sklearn APIs to create a simple machine learning
model. The :doc:`MLflow tracking APIs<tracking/>` log information about each
training run, like the hyperparameters ``alpha`` and ``l1_ratio``, used to train the model and metrics, like
the root mean square error, used to evaluate the model. The example also serializes the
model in a format that MLflow knows how to deploy.
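
As a rough illustration of the metric being logged, the following sketch computes a root mean square error with the standard library only. It is not the contents of ``train.py``: the ``rmse`` helper and the sample values are hypothetical, and the ``mlflow.log_param`` and ``mlflow.log_metric`` calls appear only as comments so that the snippet runs without MLflow installed.

.. code:: python

    import math


    def rmse(actual, predicted):
        """Return the root mean square error of two equal-length sequences."""
        if not actual or len(actual) != len(predicted):
            raise ValueError("inputs must be non-empty and of equal length")
        squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
        return math.sqrt(sum(squared_errors) / len(squared_errors))


    # Hypothetical quality scores and predictions, for illustration only.
    actual = [5.0, 6.0, 7.0]
    predicted = [5.0, 6.0, 4.0]
    error = rmse(actual, predicted)

    # In a training script, values like these would typically be recorded with:
    #   mlflow.log_param("alpha", alpha)
    #   mlflow.log_metric("rmse", error)

A lower value means the predictions are closer to the observed quality scores, which is why the metric is useful for comparing runs.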

To run this example execute::
You can run the example with default hyperparameters as follows::

python example/tutorial/train.py

Try out some other values for alpha and l1_ratio by passing them as arguments to ``train.py``.::
Try out some other values for ``alpha`` and ``l1_ratio`` by passing them as arguments to ``train.py``::

python example/tutorial/train.py <alpha> <l1_ratio>

After running this, MLflow has logged information about your experiment runs in the directory called
``mlruns``.
Each time you run the example, MLflow logs information about your experiment runs in the directory ``mlruns``.

Comparing the Models
--------------------
Next we will use the MLflow UI to compare the models which we have produced. Run ``mlflow ui``
in the same current working directory as the one which contains the ``mlruns`` directory and
navigate your browser to http://localhost:5000.

On this page, we can see the metrics we can use to compare our models.
Next, use the MLflow UI to compare the models that you have produced. Run ``mlflow ui``
in the same current working directory as the one that contains the ``mlruns`` directory and
open http://localhost:5000 in your browser.

On this page, you can see a list of experiment runs with metrics you can use to compare the models.

.. image:: _static/images/tutorial-compare.png

Using this page, we can see that the lower ``alpha`` is the better our model. We can also
use the search feature to quickly filter out many models. For example the query ``metrics.rmse < 0.8``
would return all the models with root mean squared error less than 0.8. For more complex manipulations,
we can download this table as a CSV and use our favorite data munging software to analyze it.
You can see that the lower ``alpha`` is, the better the model. You can also
use the search feature to quickly filter out many models. For example, the query ``metrics.rmse < 0.8``
returns all the models with root mean squared error less than 0.8. For more complex manipulations,
you can download this table as a CSV and use your favorite data munging software to analyze it.
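
If you export the table, a filter equivalent to ``metrics.rmse < 0.8`` might look like the sketch below. The column names (``run``, ``alpha``, ``l1_ratio``, ``rmse``) and the rows are hypothetical placeholders; the real CSV layout depends on what the UI exports, so adjust the keys accordingly.

.. code:: python

    import csv
    import io

    # Hypothetical export; in practice, open the CSV file downloaded from the UI.
    exported = io.StringIO(
        "run,alpha,l1_ratio,rmse\n"
        "a,0.1,0.1,0.74\n"
        "b,0.5,0.1,0.79\n"
        "c,1.0,0.1,0.86\n"
    )

    rows = list(csv.DictReader(exported))

    # Keep only the runs whose error is below the threshold,
    # mirroring the search query metrics.rmse < 0.8.
    good_runs = [row for row in rows if float(row["rmse"]) < 0.8]

    # Pick the single run with the lowest error.
    best = min(rows, key=lambda row: float(row["rmse"]))

Because ``csv.DictReader`` returns every field as a string, the values are converted with ``float`` before comparing.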

Packaging the Training Code
---------------------------
Now that we have our training code written, we would like to package it so that
other data scientists can easily reuse our model, or so that we can run the training remotely e.g. on
Databricks. To do this, we use the :doc:`projects` conventions to specify the
dependencies and entry points to our code. In the ``example/tutorial/MLproject`` file we specify
that our project has the dependencies located in the
Now that you have your training code, you can package it so that other data scientists can easily reuse the model, or so that you can run the training remotely, for example on Databricks. You do this by using :doc:`projects` conventions to specify the
dependencies and entry points to your code. The ``example/tutorial/MLproject`` file specifies that the project has the dependencies located in a
`Conda environment file <https://conda.io/docs/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually>`_
called ``conda.yaml`` and that our project has one entry point which takes two parameters:
alpha and l1_ratio.
called ``conda.yaml`` and has one entry point that takes two parameters: ``alpha`` and ``l1_ratio``.

.. code:: yaml
@@ -127,6 +118,9 @@ alpha and l1_ratio.
alpha: float
l1_ratio: {type: float, default: 0.1}
command: "python train.py {alpha} {l1_ratio}"
The Conda file lists the dependencies:

.. code:: yaml
@@ -142,68 +136,68 @@ alpha and l1_ratio.
- pip:
- mlflow
To run this project, we simply invoke ``mlflow run example/tutorial -P alpha=0.42``. After running
this command, MLflow will run your training code in a new conda environment with the dependencies
To run this project, invoke ``mlflow run example/tutorial -P alpha=0.42``. After running
this command, MLflow will run your training code in a new Conda environment with the dependencies
specified in ``conda.yaml``.
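
To see how ``-P alpha=0.42`` relates to the entry point's ``command`` string, consider the sketch below. It only illustrates the idea of filling a command template with supplied values and declared defaults; ``build_command`` is a hypothetical helper and is not how MLflow is implemented internally.

.. code:: python

    def build_command(template, declared, supplied):
        """Fill a command template from supplied values and declared defaults."""
        values = {}
        for name, spec in declared.items():
            if name in supplied:
                values[name] = supplied[name]
            elif isinstance(spec, dict) and "default" in spec:
                values[name] = spec["default"]
            else:
                raise ValueError("missing required parameter: " + name)
        return template.format(**values)


    # Parameter declarations mirroring the MLproject entry point shown above.
    declared = {
        "alpha": "float",
        "l1_ratio": {"type": "float", "default": 0.1},
    }

    command = build_command(
        "python train.py {alpha} {l1_ratio}",
        declared,
        {"alpha": 0.42},
    )

Here ``alpha`` comes from the command line and ``l1_ratio`` falls back to its declared default, so the resulting command is ``python train.py 0.42 0.1``.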

Projects can also be run directly from Github if the repository has a ``MLproject`` file in the
root. We've duplicated this tutorial to the https://github.com/databricks/mlflow-example repository
If the repository has an ``MLproject`` file in the root you can also run a project directly from GitHub. This tutorial is duplicated in the https://github.com/databricks/mlflow-example repository
which can be run with ``mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.42``.

Serving the Model
-----------------
Now that we have packaged our model using the MLproject convention and have identified the best model,
Now that you have packaged your model using the MLproject convention and have identified the best model,
it is time to deploy the model using :doc:`models`. An MLflow Model is a standard format for
packaging machine learning models that can be used in a variety of downstream tools — for example,
real-time serving through a REST API or batch inference on Apache Spark.

In our example training code, after training the linear regression model, we invoked a function
in MLflow which saved the model as an artifact within the run.
In the example training code, after training the linear regression model, a function
in MLflow saved the model as an artifact within the run.

.. code::
mlflow.sklearn.log_model(lr, "model")
To view this artifact, we can use the UI again. By clicking on a row in the listing of experiment
runs we'll see this page.
To view this artifact, you can use the UI again. When you click a date in the list of experiment
runs you'll see this page.

.. image:: _static/images/tutorial-artifact.png

At the bottom, we can see that the call to ``mlflow.sklearn.log_model`` produced two files in
At the bottom, you can see that the call to ``mlflow.sklearn.log_model`` produced two files in
``/Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model``.
The first file, ``MLmodel`` is a metadata file which tells MLflow how to load the model. The
second file, ``model.pkl`` is a serialized version of the linear regression model which we trained.
The first file, ``MLmodel``, is a metadata file that tells MLflow how to load the model. The
second file, ``model.pkl``, is a serialized version of the linear regression model that you trained.

In our example, we'll demonstrate how we can use this MLmodel format with MLflow to deploy a local
REST server which can serve predictions.
In this example, you can use this MLmodel format with MLflow to deploy a local REST server that can serve predictions.

To deploy the server run:
To deploy the server, run:

.. code::
mlflow sklearn serve /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model -p 1234
.. note::

The version of Python used to create the model must be the same as the one which is running
``mlflow sklearn``.
If this is not the case, you may run into the error
The version of Python used to create the model must be the same as the one running ``mlflow sklearn``.
If this is not the case, you may see the error
``UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 1: ordinal not in range(128)``
or ``raise ValueError, "unsupported pickle protocol: %d"``.

To serve a prediction run:
To serve a prediction, run:

.. code::
curl -X POST -H "Content-Type:application/json" --data '[{"fixed acidity": 6.2, "volatile acidity": 0.66, "citric acid": 0.48, "residual sugar": 1.2, "chlorides": 0.029, "free sulfur dioxide": 29, "total sulfur dioxide": 75, "density": 0.98, "pH": 3.33, "sulphates": 0.39, "alcohol": 12.8}]' http://127.0.0.1:1234/invocations
# RESPONSE
# {"predictions": [6.379428821398614]}
which should return something like:

.. code::
{"predictions": [6.379428821398614]}
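
The same request can be assembled in Python with the standard library, as in the sketch below. It assumes the server from the previous step is listening on port 1234; ``build_request`` is a hypothetical helper, and the call that actually sends the request is left as a comment so the snippet runs without a server.

.. code:: python

    import json
    import urllib.request


    def build_request(features, url="http://127.0.0.1:1234/invocations"):
        """Build a POST request carrying one record as a JSON list."""
        body = json.dumps([features]).encode("utf-8")
        return urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )


    # Same feature values as the curl example above.
    sample = {
        "fixed acidity": 6.2,
        "volatile acidity": 0.66,
        "citric acid": 0.48,
        "residual sugar": 1.2,
        "chlorides": 0.029,
        "free sulfur dioxide": 29,
        "total sulfur dioxide": 75,
        "density": 0.98,
        "pH": 3.33,
        "sulphates": 0.39,
        "alcohol": 12.8,
    }

    request = build_request(sample)

    # With the server running, send the request and decode the reply:
    #   with urllib.request.urlopen(request) as response:
    #       print(json.loads(response.read().decode("utf-8")))

The payload is a JSON list with one object per record, so scoring several wines at once would mean sending a longer list.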
More Resources
--------------
Congratulations on finishing the tutorial! For more reading reference :doc:`tracking`, :doc:`projects`, :doc:`models`,
Congratulations on finishing the tutorial! For more reading, see :doc:`tracking`, :doc:`projects`, :doc:`models`,
and more.

