Change pyfunc scoring server pandas format to split (mlflow#690)
* Use 'split' record format

* Fix azure test

* Format

* Test helper funcs fix

* Scoring server handle

* Print traceback when handling pyfunc server exception

* pyfunc scoring tests

* scoring server test file

* More tests

* Add scoring server tests

* Lint fix

* Azure var name

* Shorten java test name

* Docs and remove legacy sklearn serve_model

* Docs update

* Docs and new header

* Fix sagemaker scoring tests

* Azure docs update

* Return pandas record oriented frame

* Lint

* Docs, lint

* Docs improvement

* Adjust content type naming and semantics

* Content type adjustments

* Lint

* Address comments

* Address comments

* Address more comments

* Lint

* Include stacktrace as json key in exception text rather than formatted string for easier parsing

* Message fix

* Doc fixes

* Doc tweak

* remove redundant comments

* Another docs fix

* Fix content types

* Address docs comments

* Only log content type warning once

* Doc formatting

* Remove another instance of

* Spacing fix

* Address docs comments

* Address more docs comments

* Docs and java comments

* python docs tweaks

* Fix test and lint issue

* Lint and test fixes

* Tweak content type supported error response

* Spark test fix

* Remove unused sklearn imports

* Fix lint errors, sagemaker test

* Fix tests
dbczumar committed Nov 9, 2018
1 parent 1b09159 commit 1e0c2bd
Showing 26 changed files with 732 additions and 272 deletions.
106 changes: 80 additions & 26 deletions docs/source/models.rst
@@ -247,12 +247,34 @@ MLflow provides tools for deploying models on a local machine and to several pro
Not all deployment methods are available for all model flavors. Deployment is supported for the
Python Function format and all compatible formats.

.. _pyfunc_deployment:

Deploy a ``python_function`` model as a local REST API endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

MLflow can deploy models locally as REST API endpoints or use them to score CSV files directly.
This functionality is a convenient way of testing models before deploying to a remote model server.
You deploy the Python Function flavor locally using the CLI interface to the :py:mod:`mlflow.pyfunc` module.
The local REST API server accepts the following data formats as inputs:

* JSON-serialized Pandas DataFrames in the ``split`` orientation. For example,
``data = pandas_df.to_json(orient='split')``. This format is specified using a ``Content-Type``
request header value of ``application/json; format=pandas-split``. Starting in MLflow 0.9.0,
this will be the default format if ``Content-Type`` is ``application/json`` (i.e., with no format
specification).

* JSON-serialized Pandas DataFrames in the ``records`` orientation. *We do not recommend using
this format because it is not guaranteed to preserve column ordering.* Currently, this format is
specified using a ``Content-Type`` request header value of ``application/json; format=pandas-records``
or ``application/json``. Starting in MLflow 0.9.0, ``application/json`` will refer to the
``split`` format instead. For forward compatibility, we recommend using the ``split`` format
or specifying the ``application/json; format=pandas-records`` content type.

* CSV-serialized Pandas DataFrames. For example, ``data = pandas_df.to_csv()``. This format is
specified using a ``Content-Type`` request header value of ``text/csv``.

For more information about serializing Pandas DataFrames, see
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
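
These serialization options can be exercised directly with pandas. The sketch below builds each request body and notes the matching ``Content-Type`` header; the frame and its columns are purely illustrative:

```python
import json

import pandas as pd

# A toy frame standing in for real model input.
df = pd.DataFrame({"alcohol": [8.8], "pH": [3.0]})

# Content-Type: application/json; format=pandas-split
split_body = df.to_json(orient="split")

# Content-Type: application/json; format=pandas-records (column order not guaranteed)
records_body = df.to_json(orient="records")

# Content-Type: text/csv
csv_body = df.to_csv(index=False)

print(sorted(json.loads(split_body)))  # ['columns', 'data', 'index']
```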

* :py:func:`serve <mlflow.pyfunc.cli.serve>` deploys the model as a local REST API server.
* :py:func:`predict <mlflow.pyfunc.cli.predict>` uses the model to generate a prediction for a local
@@ -266,16 +288,23 @@ For more info, see:
mlflow pyfunc serve --help
mlflow pyfunc predict --help
.. _azureml_deployment:

Microsoft Azure ML
^^^^^^^^^^^^^^^^^^
The :py:mod:`mlflow.azureml` module can package ``python_function`` models into Azure ML container images.
These images can be deployed to Azure Kubernetes Service (AKS) and the Azure Container Instances (ACI)
platform for real-time serving. The resulting Azure ML ContainerImage will contain a webserver that
accepts the following data formats as input:

* JSON-serialized Pandas DataFrames in the ``split`` orientation. For example,
``data = pandas_df.to_json(orient='split')``. This format is specified using a ``Content-Type``
request header value of ``application/json``.

* :py:func:`build_image <mlflow.azureml.build_image>` registers an MLflow model with an existing Azure ML
workspace and builds an Azure ML container image for deployment to AKS and ACI. The `Azure ML SDK`_ is
required in order to use this function. *The Azure ML SDK requires Python 3. It cannot be installed with
earlier versions of Python.*

.. _Azure ML SDK: https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py

@@ -324,18 +353,25 @@ platform for real-time serving.
import requests
import json
# `sample_input` is a JSON-serialized Pandas DataFrame with the `split` orientation
sample_input = {
"residual sugar": {"0": 20.7},
"alcohol": {"0": 8.8},
"chlorides": {"0": 0.045},
"density": {"0": 1.001},
"sulphates": {"0": 0.45},
"total sulfur dioxide": {"0": 170.0},
"fixed acidity": {"0": 7.0},
"citric acid": {"0": 0.36},
"pH": {"0": 3.0},
"volatile acidity": {"0": 0.27},
"free sulfur dioxide": {"0": 45.0}
"columns": [
"alcohol",
"chlorides",
"citric acid",
"density",
"fixed acidity",
"free sulfur dioxide",
"pH",
"residual sugar",
"sulphates",
"total sulfur dioxide",
"volatile acidity"
],
"data": [
[8.8, 0.045, 0.36, 1.001, 7, 45, 3, 20.7, 0.45, 170, 0.27]
]
}
response = requests.post(
url=webservice.scoring_uri, data=json.dumps(sample_input),
@@ -358,19 +394,25 @@ platform for real-time serving.
scoring_uri=$(az ml service show --name <deployment-name> -v | jq -r ".scoringUri")
# `sample_input` is a JSON-serialized Pandas DataFrame with the `split` orientation
sample_input='
{
"residual sugar": {"0": 20.7},
"alcohol": {"0": 8.8},
"chlorides": {"0": 0.045},
"density": {"0": 1.001},
"sulphates": {"0": 0.45},
"total sulfur dioxide": {"0": 170.0},
"fixed acidity": {"0": 7.0},
"citric acid": {"0": 0.36},
"pH": {"0": 3.0},
"volatile acidity": {"0": 0.27},
"free sulfur dioxide": {"0": 45.0}
"columns": [
"alcohol",
"chlorides",
"citric acid",
"density",
"fixed acidity",
"free sulfur dioxide",
"pH",
"residual sugar",
"sulphates",
"total sulfur dioxide",
"volatile acidity"
],
"data": [
[8.8, 0.045, 0.36, 1.001, 7, 45, 3, 20.7, 0.45, 170, 0.27]
]
}'
echo $sample_input | curl -s -X POST $scoring_uri\
@@ -385,6 +427,8 @@ For more info, see:
mlflow azureml --help
mlflow azureml build-image --help
.. _sagemaker_deployment:

Deploy a ``python_function`` model on Amazon SageMaker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -394,7 +438,17 @@ To deploy remotely to SageMaker you need to set up your environment and user acc
To export a custom model to SageMaker, you need an MLflow-compatible Docker image to be available on Amazon ECR.
MLflow provides a default Docker image definition; however, it is up to you to build the image and upload it to ECR.
MLflow includes the utility function ``build_and_push_container`` to perform this step. Once built and uploaded, you can use the MLflow
container for all MLflow models.
container for all MLflow models. Model webservers deployed using the :py:mod:`mlflow.sagemaker`
module accept the following data formats as input, depending on the deployment flavor:

* ``python_function``: For this deployment flavor, the endpoint accepts the same formats
as the pyfunc server. These formats are described in the
:ref:`pyfunc deployment documentation <pyfunc_deployment>`.

* ``mleap``: For this deployment flavor, the endpoint accepts *only*
JSON-serialized Pandas DataFrames in the ``split`` orientation. For example,
``data = pandas_df.to_json(orient='split')``. This format is specified using a ``Content-Type``
request header value of ``application/json``.

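Since both flavors accept the ``split`` orientation with a plain ``application/json`` content type, a single helper can prepare the request body regardless of flavor. This is an illustrative sketch, not part of MLflow; the endpoint invocation itself is omitted because it requires a live deployment:

```python
import json

import pandas as pd

def split_request(df: pd.DataFrame):
    """Return a (body, content_type) pair accepted by pyfunc and mleap endpoints."""
    return df.to_json(orient="split"), "application/json"

body, content_type = split_request(pd.DataFrame({"x": [1, 2]}))
print(content_type)              # application/json
print(json.loads(body)["data"])  # [[1], [2]]
```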
* :py:func:`run-local <mlflow.sagemaker.run_local>` deploys the model locally in a Docker
container. The image and the environment should be identical to how the model would be run
18 changes: 11 additions & 7 deletions docs/source/quickstart.rst
@@ -153,23 +153,27 @@ When you run the example, it outputs an MLflow run ID for that experiment. If yo
``mlflow ui``, you will also see that the run saved a ``model`` folder containing an ``MLmodel``
description file and a pickled scikit-learn model. You can pass the run ID and the path of the model
within the artifacts directory (here "model") to various tools. For example, MLflow includes a
simple REST server for Python-based models:

.. code:: bash
mlflow pyfunc serve -r <RUN_ID> -m model
.. note::

By default the server runs on port 5000. If that port is already in use, use the `--port` option to
specify a different port. For example: ``mlflow pyfunc serve --port 1234 -r <RUN_ID> -m model``

Once you have started the server, you can pass it some sample data and see the
predictions.

The following example uses ``curl`` to send a JSON-serialized Pandas DataFrame with the ``split``
orientation to the pyfunc server. For more information about the input data formats accepted by
the pyfunc model server, see the :ref:`MLflow deployment tools documentation <pyfunc_deployment>`.

.. code:: bash
curl -d '[{"x": 1}, {"x": -1}]' -H 'Content-Type: application/json' -X POST localhost:5000/invocations
curl -d '{"columns":["x"], "data":[[1], [-1]]}' -H 'Content-Type: application/json; format=pandas-split' -X POST localhost:5000/invocations
which returns::

Expand All @@ -178,7 +182,7 @@ which returns::
.. note::

The ``sklearn_logistic_regression/train.py`` script must be run with the same Python version as
the version of Python that runs ``mlflow pyfunc serve``. If they are not the same version,
the stacktrace below may appear::

File "/usr/local/lib/python3.6/site-packages/mlflow/sklearn.py", line 54, in _load_model_from_local_file
Expand Down
20 changes: 12 additions & 8 deletions docs/source/tutorial.rst
@@ -50,8 +50,8 @@ Training the Model
------------------


First, train a linear regression model that takes two hyperparameters: ``alpha`` and ``l1_ratio``.

.. plain-section::

.. container:: python
@@ -220,7 +220,7 @@ On this page, you can see a list of experiment runs with metrics you can use to
.. image:: _static/images/tutorial-compare.png

.. container:: R

.. image:: _static/images/tutorial-compare-R.png

You can use the search feature to quickly filter out many models. For example, the query ``metrics.rmse < 0.8``
@@ -367,7 +367,7 @@ in MLflow saved the model as an artifact within the run.

.. code::
mlflow pyfunc serve /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model -p 1234
.. note::

@@ -376,13 +376,17 @@ in MLflow saved the model as an artifact within the run.
``UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 1: ordinal not in range(128)``
or ``raise ValueError, "unsupported pickle protocol: %d"``.

Once you have deployed the server, you can pass it some sample data and see the
predictions. The following example uses ``curl`` to send a JSON-serialized Pandas DataFrame
with the ``split`` orientation to the pyfunc server. For more information about the input data
formats accepted by the pyfunc model server, see the
:ref:`MLflow deployment tools documentation <pyfunc_deployment>`.

.. code::
curl -X POST -H "Content-Type:application/json" --data '[{"fixed acidity": 6.2, "volatile acidity": 0.66, "citric acid": 0.48, "residual sugar": 1.2, "chlorides": 0.029, "free sulfur dioxide": 29, "total sulfur dioxide": 75, "density": 0.98, "pH": 3.33, "sulphates": 0.39, "alcohol": 12.8}]' http://127.0.0.1:1234/invocations
curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations
the server should respond with output similar to::

{"predictions": [6.379428821398614]}

@@ -416,7 +420,7 @@ in MLflow saved the model as an artifact within the run.
.. image:: _static/images/tutorial-serving-r.png

.. note::

By default, a model is served using the R packages available. To ensure the environment serving
the prediction function matches the model, set ``restore = TRUE`` when calling
``mlflow_rfunc_serve()``.
9 changes: 7 additions & 2 deletions mlflow/azureml/__init__.py
@@ -31,6 +31,10 @@ def build_image(model_path, workspace, run_id=None, image_name=None, model_name=
The resulting image can be deployed as a web service to Azure Container Instances (ACI) or
Azure Kubernetes Service (AKS).
The resulting Azure ML ContainerImage will contain a webserver that processes model queries.
For information about the input data formats accepted by this webserver, see the
:ref:`MLflow deployment tools documentation <azureml_deployment>`.
:param model_path: The path to MLflow model for which the image will be built. If a run id
is specified, this should be a run-relative path. Otherwise, it
should be a local path.
@@ -307,6 +311,7 @@ def _get_mlflow_azure_resource_name():
from azureml.core.model import Model
from mlflow.pyfunc import load_pyfunc
from mlflow.pyfunc.scoring_server import parse_json_input
from mlflow.utils import get_jsonable_obj
@@ -316,8 +321,8 @@ def init():
model = load_pyfunc(model_path)
def run(json_input):
input_df = parse_json_input(json_input=json_input, orientation="split")
return get_jsonable_obj(model.predict(input_df))
"""
4 changes: 4 additions & 0 deletions mlflow/azureml/cli.py
@@ -51,6 +51,10 @@ def build_image(model_path, workspace_name, subscription_id, run_id, image_name,
Register an MLflow model with Azure ML and build an Azure ML ContainerImage for deployment.
The resulting image can be deployed as a web service to Azure Container Instances (ACI) or
Azure Kubernetes Service (AKS).
The resulting Azure ML ContainerImage will contain a webserver that processes model queries.
For information about the input data formats accepted by this webserver, see the following
documentation: https://www.mlflow.org/docs/latest/models.html#azureml-deployment.
"""
# The Azure ML SDK is only compatible with Python 3. However, this CLI should still be
accessible for inspection from Python 2. Therefore, we will only import from the SDK
2 changes: 0 additions & 2 deletions mlflow/cli.py
@@ -9,7 +9,6 @@

import mlflow.azureml.cli
import mlflow.projects as projects
import mlflow.sklearn
import mlflow.data
import mlflow.experiments
import mlflow.pyfunc.cli
@@ -204,7 +203,6 @@ def server(file_store, default_artifact_root, host, port, workers, static_prefix
sys.exit(1)


cli.add_command(mlflow.sklearn.commands)
cli.add_command(mlflow.data.download)
cli.add_command(mlflow.pyfunc.cli.commands)
cli.add_command(mlflow.rfunc.cli.commands)
16 changes: 14 additions & 2 deletions mlflow/exceptions.py
@@ -10,16 +10,28 @@ class MlflowException(Exception):
for debugging purposes. If the error text is sensitive, raise a generic `Exception` object
instead.
"""
def __init__(self, message, error_code=INTERNAL_ERROR, **kwargs):
"""
:param message: The message describing the error that occurred. This will be included in the
exception's serialized JSON representation.
:param error_code: An appropriate error code for the error that occurred; it will be included
in the exception's serialized JSON representation. This should be one of
the codes listed in the `mlflow.protos.databricks_pb2` proto.
:param kwargs: Additional key-value pairs to include in the serialized JSON representation
of the MlflowException.
"""
try:
self.error_code = ErrorCode.Name(error_code)
except (ValueError, TypeError):
self.error_code = ErrorCode.Name(INTERNAL_ERROR)
self.message = message
self.json_kwargs = kwargs
super(MlflowException, self).__init__(message)

def serialize_as_json(self):
exception_dict = {'error_code': self.error_code, 'message': self.message}
exception_dict.update(self.json_kwargs)
return json.dumps(exception_dict)
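
The serialization pattern above — folding extra keyword arguments into the exception's JSON — can be sketched in isolation. The class below is a simplified stand-in written for illustration (the real class also maps ``error_code`` through ``ErrorCode.Name``, omitted here):

```python
import json

class SketchException(Exception):
    """Simplified stand-in for MlflowException's JSON serialization."""

    def __init__(self, message, error_code="INTERNAL_ERROR", **kwargs):
        self.error_code = error_code
        self.message = message
        self.json_kwargs = kwargs  # extra keys, e.g. a stack trace, folded into the JSON
        super().__init__(message)

    def serialize_as_json(self):
        exception_dict = {"error_code": self.error_code, "message": self.message}
        exception_dict.update(self.json_kwargs)
        return json.dumps(exception_dict)

exc = SketchException("bad payload", error_code="BAD_REQUEST", stack_trace="trace...")
print(json.loads(exc.serialize_as_json())["stack_trace"])  # trace...
```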


class RestException(MlflowException):
Expand Down
@@ -55,17 +55,37 @@ public MLeapPredictor(String modelDataPath, String inputSchemaPath) {
@Override
protected PredictorDataWrapper predict(PredictorDataWrapper input)
throws PredictorEvaluationException {
PandasSplitOrientedDataFrame pandasFrame = null;
try {
pandasFrame = PandasSplitOrientedDataFrame.fromJson(input.toJson());
} catch (IOException e) {
logger.error(
"Encountered a JSON conversion error during conversion of Pandas dataframe to LeapFrame.",
"Encountered a JSON parsing error during conversion of input to a Pandas DataFrame"
+ " representation.",
e);
throw new PredictorEvaluationException(
"Failed to transform input into a JSON representation of an MLeap dataframe."
+ " Please ensure that the input is a JSON-serialized Pandas Dataframe"
+ " with the `record` orientation.",
"Encountered a JSON parsing error while transforming input into a Pandas DataFrame"
+ " representation. Ensure that the input is a JSON-serialized Pandas DataFrame"
+ " with the `split` orientation.",
e);
} catch (InvalidSchemaException e) {
logger.error(
"Encountered a schema mismatch while transforming input into a Pandas DataFrame"
+ " representation.",
e);
throw new PredictorEvaluationException(
"Encountered a schema mismatch while transforming input into a Pandas DataFrame"
+ " representation. Ensure that the input is a JSON-serialized Pandas DataFrame"
+ " with the `split` orientation.",
e);
} catch (IllegalArgumentException e) {
logger.error(
"Failed to transform input into a Pandas DataFrame because the parsed frame is invalid.",
e);
throw new PredictorEvaluationException(
"Failed to transform input into a Pandas DataFrame because the parsed frame is invalid."
+ " Ensure that the input is a JSON-serialized Pandas DataFrame with the `split`"
+ " orientation.",
e);
}

