Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python fluent API search_runs auto-pagination #1548

Merged
merged 4 commits into from
Jul 8, 2019

Conversation

max-allen-db
Copy link
Contributor

@max-allen-db max-allen-db commented Jul 3, 2019

What changes are proposed in this pull request?

The mlflow.search_runs() API in the Python fluent client returns a pandas Dataframe with the active experiment's runs in it. Currently, it will return up to 50,000 runs by default and won't return more since current database implementations won't return more than 50,000 runs back in one request.

This PR changes this behavior to return up to 100,000 runs by default, with the option to get more by passing in a larger value for max_results. The changes in this PR automatically break up the requests to the database into pages of 10,000 runs, and will concatenate them together before returning the DataFrame.

How is this patch tested?

Unit Tests in test_fluent.py

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

Related to #1483. Allows users to get all their experiment data in a pandas dataframe format, with filterstring, order by, and max results supported.

What component(s) does this PR affect?

  • UI
  • CLI
  • API
  • REST-API
  • Examples
  • Docs
  • Tracking
  • Projects
  • Artifacts
  • Models
  • Scoring
  • Serving
  • R
  • Java
  • Python

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes


def test_get_paginated_runs_lt_maxresults_notoken():
# num runs is less than max_results, only one page
# the list returned is not a PagedList, no token attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this possible actually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might not be the case anymore now that tokens are implemented in OSS stores. Before the interface said that there would be a "token attribute" if the tracking server supported pagination. But at this version of MLflow, they should all support it now.



def test_get_paginated_maxresults_lt_runs_per_page():
# should only get max_result number of runs, should only call search_runs once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason we don't combine this with test_get_paginated_runs_lt_maxresults_notoken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of the notoken tests. This should now have a more descriptive test docstring

@max-allen-db max-allen-db merged commit 5a60ab3 into mlflow:master Jul 8, 2019
@andrewmchen andrewmchen added the rn/feature Mention under Features in Changelogs. label Jul 16, 2019
avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants