forked from LineaLabs/lineapy
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Lin 673 lineapy.get for MLflow (LineaLabs#829)
* LIN-674, LIN-671 add mlflow configs (LineaLabs#825)
* LIN-674 Add mlflow related configs in lineapy config
* LIN-671-enable-pip-install-lineapy[mlflow]
* Use Enum instead of Literal for ML_MODELS_STORAGE_BACKEND
* Add test for mlflow config

This reverts commit 879ffa9.

* LIN-672 lineapy.save for mlflow (LineaLabs#828)
* LIN-668 Add metadata for mlflow storage backend
* Add mlflow_registry_uri into config items
* LIN-672 Implement lineapy.save for mlflow models
* LIN-670 Add test for lineapy.save to mlflow backend
* WIP-lineapy-get-metadata
* WIP - Implement Artifact.get_value and Artifact.get_metadata
* Implement delete for MLflow
* Add statsmodels and xgboost serializer/deserializer for MLflow
* Add doc
* Add RTD for MLflow
* Update docs to address PR review
* Address PR feedback
* Change mlflow deletion db logic
* refactor common code for different storage backend saving logic
* Add doc for backend storage
Showing 26 changed files with 1,188 additions and 164 deletions.
@@ -179,3 +179,6 @@ Untitled*.ipynb
tests/outputs
*.pickle
.linea/linea_pickles

# mlflow
mlruns/
@@ -0,0 +1,17 @@
lineapy.plugins.serializers package
===================================

Submodules
----------

lineapy.plugins.serializers.mlflow\_io module
---------------------------------------------

.. automodule:: lineapy.plugins.serializers.mlflow_io
   :members:

Module contents
---------------

.. automodule:: lineapy.plugins.serializers
   :members:
14 changes: 14 additions & 0 deletions
docs/source/guide/manage_artifacts/storage_backend/index.rst
@@ -0,0 +1,14 @@
Changing Storage Backend
========================

Out of the box, LineaPy is the default storage backend for all artifacts.
However, when artifacts are already saved in an existing storage system (e.g., MLflow or a database), saving one more copy in LineaPy can cause syncing issues between the two systems.
Thus, LineaPy supports using different storage backends for some data types.
This support lets users leverage functionality from both LineaPy and the tools they are already familiar with.

Currently, LineaPy supports MLflow as a storage backend for ML models.

.. toctree::
   :maxdepth: 1

   mlflow
110 changes: 110 additions & 0 deletions
docs/source/guide/manage_artifacts/storage_backend/mlflow.rst
@@ -0,0 +1,110 @@
.. _mlflow:

Using MLflow as Storage Backend to Save ML Models
=================================================

.. include:: ../../../snippets/slack_support.rstinc

By default, LineaPy uses its own storage backend to save artifacts of all object types.
However, users who have access to MLflow may prefer to save their ML models there.
Thus, LineaPy supports using MLflow as the storage backend for ML models.

Configure MLflow
----------------

Depending on how MLflow is configured, we might need to specify the ``tracking URI`` and (optionally) the ``registry URI`` in MLflow before we can start using it.

.. code:: python

    import mlflow

    mlflow.set_tracking_uri('your_mlflow_tracking_uri')
    mlflow.set_registry_uri('your_mlflow_registry_uri')

To make LineaPy aware of MLflow, we also need to set the corresponding config items if we want to use MLflow as the storage backend for ML models.

.. code:: python

    import lineapy

    lineapy.options.set('mlflow_tracking_uri', 'your_mlflow_tracking_uri')
    lineapy.options.set('mlflow_registry_uri', 'your_mlflow_registry_uri')

.. note::

    For objects not supported by MLflow, LineaPy falls back to using LineaPy as the storage backend as usual.

Set Default Storage Backend for ML Models
-----------------------------------------

Each user might have a different usage pattern for MLflow. Some use it for logging purposes and record every model under development. Others treat it as a public space and only publish models that meet specific criteria.
In the first case, users want to save artifacts (ML models) to MLflow by default; in the second case, users want to save artifacts to MLflow only when they explicitly ask for it.
Thus, we provide an option (``default_ml_models_storage_backend``) that lets users decide the default storage backend for ML models when ``mlflow_tracking_uri`` has been set.

The storage backend used for ML models is determined as follows:

* Only ``mlflow_tracking_uri`` is set, and ``default_ml_models_storage_backend`` is not

.. code:: python

    lineapy.options.set("mlflow_tracking_uri", "databricks")
    # If mlflow_tracking_uri is set, default_ml_models_storage_backend defaults to mlflow
    lineapy.save(model, 'model')  # Use MLflow
    lineapy.save(model, 'model', storage_backend='mlflow')  # Use MLflow
    lineapy.save(model, 'model', storage_backend='lineapy')  # Use LineaPy

* ``mlflow_tracking_uri`` is set and ``default_ml_models_storage_backend=='mlflow'``

.. code:: python

    lineapy.options.set("mlflow_tracking_uri", "databricks")
    lineapy.options.set("default_ml_models_storage_backend", "mlflow")
    lineapy.save(model, 'model')  # Use MLflow
    lineapy.save(model, 'model', storage_backend='mlflow')  # Use MLflow
    lineapy.save(model, 'model', storage_backend='lineapy')  # Use LineaPy

* ``mlflow_tracking_uri`` is set and ``default_ml_models_storage_backend=='lineapy'``

.. code:: python

    lineapy.options.set("mlflow_tracking_uri", "databricks")
    lineapy.options.set("default_ml_models_storage_backend", "lineapy")
    lineapy.save(model, 'model')  # Use LineaPy
    lineapy.save(model, 'model', storage_backend='mlflow')  # Use MLflow
    lineapy.save(model, 'model', storage_backend='lineapy')  # Use LineaPy

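The three cases above can be condensed into one small decision function. This is only an illustrative sketch, not LineaPy's actual implementation: the name ``resolve_ml_storage_backend`` and its parameters are hypothetical, and the behavior when no tracking URI is set (falling back to LineaPy) is an assumption.

```python
from typing import Optional


def resolve_ml_storage_backend(
    mlflow_tracking_uri: Optional[str],
    default_backend: Optional[str] = None,
    storage_backend: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the documented rules for ML models."""
    # An explicit storage_backend argument to save() always wins.
    if storage_backend is not None:
        return storage_backend
    # Assumption: without a tracking URI, LineaPy stays the backend.
    if mlflow_tracking_uri is None:
        return "lineapy"
    # With a tracking URI set, an unset default means "mlflow".
    return default_backend or "mlflow"


for default in (None, "mlflow", "lineapy"):
    print(default, "->", resolve_ml_storage_backend("databricks", default))
```

The explicit ``storage_backend`` argument is checked first because, in every case listed above, it overrides the configured default.
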
Note that when using MLflow as the storage backend, ``lineapy.save`` wraps ``mlflow.flavor.log_model`` under the hood (where ``flavor`` stands for the model flavor, e.g., ``sklearn``).
Users can therefore pass any argument accepted by ``mlflow.flavor.log_model`` to ``lineapy.save`` as well.
For instance, if we want to specify ``registered_model_name``, we can write the save statement as:

.. code:: python

    lineapy.save(model, name="model", storage_backend="mlflow", registered_model_name="clf")

Retrieve Artifact from Both LineaPy and MLflow
----------------------------------------------

Once ``lineapy.save`` has been executed with ``mlflow`` as the storage backend, users can retrieve the same artifact (ML model) through either the LineaPy API or the MLflow API, depending on what they want to do or which tool they are more familiar with.

* Retrieve artifact (model) with LineaPy API

.. code:: python

    artifact = lineapy.get('model')
    lineapy_model = artifact.get_value()

* Retrieve artifact (model) with MLflow API

.. code:: python

    client = mlflow.MlflowClient()
    latest_version = client.search_model_versions("name='clf'")[0].version
    # This loads the same model as `lineapy_model` in the previous snippet
    mlflow_model = mlflow.sklearn.load_model(f'models:/clf/{latest_version}')

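Taking element ``[0]`` of the search result relies on the order in which the registry returns versions. A more defensive sketch picks the highest version number explicitly. Here ``SimpleNamespace`` objects stand in for the entries returned by the registry so that the snippet runs without MLflow; the helper name ``latest_version_of`` is hypothetical, and the only assumption about the real entries is that they expose a ``version`` attribute convertible to ``int``.

```python
from types import SimpleNamespace
from typing import Iterable


def latest_version_of(model_versions: Iterable) -> str:
    """Return the highest version among entries exposing a `.version` attribute."""
    versions = list(model_versions)
    if not versions:
        raise ValueError("no registered versions found")
    # Compare numerically: as strings, "10" would sort before "9".
    return str(max(int(mv.version) for mv in versions))


# Stand-ins for the entries returned by a model version search.
fake_versions = [
    SimpleNamespace(version="9"),
    SimpleNamespace(version="10"),
    SimpleNamespace(version="2"),
]
print(latest_version_of(fake_versions))  # prints 10
```

With a real client, the result of ``client.search_model_versions("name='clf'")`` would be passed in place of ``fake_versions``.
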
Which MLflow Model Flavors Are Supported
----------------------------------------

Currently, the following flavors are supported: ``sklearn``, ``xgboost``, ``prophet`` and ``statsmodels``.
We plan to support all MLflow-supported model flavors soon.
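
One way a tool could map a model object to one of these flavors is by looking at the top-level package of the model's class. The sketch below illustrates that idea only; it is not necessarily how LineaPy detects flavors, and ``guess_flavor`` and ``SUPPORTED_FLAVORS`` are hypothetical names.

```python
from typing import Optional

SUPPORTED_FLAVORS = ("sklearn", "xgboost", "prophet", "statsmodels")


def guess_flavor(model: object) -> Optional[str]:
    """Hypothetical: guess the flavor from the model class's top-level package."""
    top_level_package = type(model).__module__.split(".")[0]
    if top_level_package in SUPPORTED_FLAVORS:
        return top_level_package
    return None


# A fake class that pretends to live in an sklearn module, so no ML library is needed.
FakeModel = type("LinearRegression", (), {"__module__": "sklearn.linear_model._base"})
print(guess_flavor(FakeModel()))
print(guess_flavor(object()))
```

A ``None`` result corresponds to the case where the object is not supported by MLflow and the default LineaPy storage is used instead.
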
42 changes: 42 additions & 0 deletions
lineapy/_alembic/versions/07d0db31e15f_mlflow_integration.py
@@ -0,0 +1,42 @@
"""mlflow_integration

Revision ID: 07d0db31e15f
Revises: 4907800d9126
Create Date: 2022-11-03 16:26:37.217174

"""
import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = "07d0db31e15f"
down_revision = "4907800d9126"
branch_labels = None
depends_on = None


def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table(
        "mlflow_artifact_storage",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("artifact_id", sa.Integer(), nullable=False),
        sa.Column("backend", sa.String(), nullable=False),
        sa.Column("tracking_uri", sa.String(), nullable=False),
        sa.Column("registry_uri", sa.String(), nullable=True),
        sa.Column("model_uri", sa.String(), nullable=False),
        sa.Column("model_flavor", sa.String(), nullable=False),
        sa.Column("delete_time", sa.DateTime(), nullable=True),
        sa.ForeignKeyConstraint(
            ["artifact_id"],
            ["artifact.id"],
        ),
        sa.PrimaryKeyConstraint("id"),
    )
    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table("mlflow_artifact_storage")
    # ### end Alembic commands ###
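
To see what this migration produces, the table can be approximated with the standard-library `sqlite3` module. This is a sketch under stated assumptions rather than part of the commit: the `artifact` table is reduced to a bare stand-in, the inserted values are made up for illustration, and treating a non-NULL `delete_time` as a soft-delete marker is an assumption suggested by the nullable column, not something the migration itself states.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artifact (id INTEGER PRIMARY KEY);  -- stand-in for the existing table
    CREATE TABLE mlflow_artifact_storage (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        artifact_id INTEGER NOT NULL REFERENCES artifact (id),
        backend VARCHAR NOT NULL,
        tracking_uri VARCHAR NOT NULL,
        registry_uri VARCHAR,
        model_uri VARCHAR NOT NULL,
        model_flavor VARCHAR NOT NULL,
        delete_time DATETIME
    );
    """
)
conn.execute("INSERT INTO artifact (id) VALUES (1)")
# Illustrative values only; registry_uri and delete_time are left NULL.
conn.execute(
    "INSERT INTO mlflow_artifact_storage "
    "(artifact_id, backend, tracking_uri, model_uri, model_flavor) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "mlflow", "databricks", "models:/clf/1", "sklearn"),
)

# Assumption: setting delete_time marks the row as deleted without removing it.
conn.execute(
    "UPDATE mlflow_artifact_storage SET delete_time = ? WHERE artifact_id = ?",
    (datetime.now(timezone.utc).isoformat(), 1),
)
active_rows = conn.execute(
    "SELECT COUNT(*) FROM mlflow_artifact_storage WHERE delete_time IS NULL"
).fetchone()[0]
print(active_rows)
```

Under that assumption, queries for live artifacts would filter on `delete_time IS NULL`, while the row itself stays available for auditing.
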