Python fluent API search_runs auto-pagination #1548

max-allen-db · 2019-07-03T21:28:17Z (edited by andrewmchen)

What changes are proposed in this pull request?

The mlflow.search_runs() API in the Python fluent client returns a pandas DataFrame containing the active experiment's runs. Currently, it returns up to 50,000 runs by default and cannot return more, because current database implementations won't return more than 50,000 runs in a single request.

This PR changes this behavior to return up to 100,000 runs by default, with the option to get more by passing a larger value for max_results. The changes automatically break the requests to the database into pages of 10,000 runs and concatenate the pages before returning the DataFrame.
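
The paging behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `get_paginated_runs`, `search_page`, `make_fake_backend`, and the two constants are hypothetical names, and plain lists of dicts stand in for the run objects and the final DataFrame.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical constants mirroring the numbers in the description above.
SEARCH_MAX_RESULTS_DEFAULT = 100000
NUM_RUNS_PER_PAGE = 10000

# A page is (runs, next_page_token); the token is None on the last page.
Page = Tuple[List[dict], Optional[str]]
SearchPage = Callable[[int, Optional[str]], Page]


def get_paginated_runs(
    search_page: SearchPage,
    max_results: int = SEARCH_MAX_RESULTS_DEFAULT,
    runs_per_page: int = NUM_RUNS_PER_PAGE,
) -> List[dict]:
    """Fetch runs page by page until max_results is reached or pages run out."""
    all_runs: List[dict] = []
    next_token: Optional[str] = None
    while len(all_runs) < max_results:
        # Never ask for more than is still needed to reach max_results.
        page_size = min(runs_per_page, max_results - len(all_runs))
        runs, next_token = search_page(page_size, next_token)
        all_runs.extend(runs)
        if not next_token:
            break  # the backend has no further pages
    return all_runs


def make_fake_backend(total_runs: int):
    """Hypothetical in-memory backend; the token is simply the next offset."""
    calls = []

    def search_page(max_results: int, page_token: Optional[str]) -> Page:
        calls.append((max_results, page_token))
        start = int(page_token) if page_token else 0
        end = min(start + max_results, total_runs)
        runs = [{"run_id": i} for i in range(start, end)]
        token = str(end) if end < total_runs else None
        return runs, token

    return search_page, calls
```

With a fake backend holding 25 runs and a page size of 10, the loop issues three requests. Capping max_results at 15 shrinks the second request to 5 runs, so the result never exceeds the requested maximum.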

How is this patch tested?

Unit Tests in test_fluent.py

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

Related to #1483. Allows users to get all their experiment data as a pandas DataFrame, with filter string, order by, and max results supported.

What component(s) does this PR affect?

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

andrewmchen · 2019-07-03T23:26:41Z

tests/tracking/test_fluent.py

+
+def test_get_paginated_runs_lt_maxresults_notoken():
+    # num runs is less than max_results, only one page
+    # the list returned is not a PagedList, no token attribute


Why is this possible actually?

max-allen-db · 2019-07-08

I think this is no longer the case now that tokens are implemented in the OSS stores. Previously, the interface said there would be a "token attribute" only if the tracking server supported pagination, but as of this version of MLflow all of them should support it.
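
For readers unfamiliar with the "token attribute" under discussion, here is a small sketch of the idea: a list subclass that carries a continuation token, plus a defensive lookup for results that are plain lists. `PagedListSketch` and `next_page_token` are hypothetical names used only for illustration and are not claimed to match MLflow's own classes.

```python
from typing import Optional


class PagedListSketch(list):
    """Hypothetical list subclass that carries a continuation token."""

    def __init__(self, items, token: Optional[str] = None):
        super().__init__(items)
        # None means there is no further page to fetch.
        self.token = token


def next_page_token(page) -> Optional[str]:
    """Return the page's token, or None if the result is a plain list."""
    return getattr(page, "token", None)
```

If every store returns a paged result, the `getattr` fallback becomes unnecessary, which is the point of the reply above: a test for the "no token attribute" case no longer describes a reachable situation.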

andrewmchen · 2019-07-03T23:28:28Z

tests/tracking/test_fluent.py

+
+
+def test_get_paginated_maxresults_lt_runs_per_page():
+    # should only get max_result number of runs, should only call search_runs once


What's the reason we don't combine this with test_get_paginated_runs_lt_maxresults_notoken?

max-allen-db · 2019-07-08

Got rid of the notoken tests. This test should now have a more descriptive docstring.
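
A test along the lines discussed here might look like the sketch below. It is not the test from this PR: `paginate` is a hypothetical stand-in for the helper under test, and the mocked `search_fn` is assumed to take a page size and a page token and return a `(runs, next_token)` pair.

```python
from unittest import mock


def paginate(search_fn, max_results, runs_per_page):
    """Hypothetical stand-in for the pagination helper under test."""
    runs, token = [], None
    while len(runs) < max_results:
        page_size = min(runs_per_page, max_results - len(runs))
        page, token = search_fn(page_size, token)
        runs.extend(page)
        if token is None:
            break
    return runs


def test_max_results_lt_runs_per_page():
    """max_results below the page size: one request, exactly max_results runs."""
    # The backend reports more pages, but the cap is already satisfied.
    search_fn = mock.Mock(return_value=(["run"] * 5, "more-available"))
    runs = paginate(search_fn, max_results=5, runs_per_page=10)
    assert len(runs) == 5
    search_fn.assert_called_once_with(5, None)


def test_num_runs_lt_max_results():
    """Fewer runs than max_results: one request, all available runs returned."""
    search_fn = mock.Mock(return_value=(["run"] * 3, None))
    runs = paginate(search_fn, max_results=5, runs_per_page=10)
    assert len(runs) == 3
    search_fn.assert_called_once_with(5, None)
```

Putting the expectation in the docstring rather than a trailing comment keeps the intent visible in test reports, which fits the request for a more descriptive docstring.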

max-allen-db added 2 commits July 3, 2019 10:14

auto-pagination and tests

0a0a2a3

Update tests and lint

a75b5e7

andrewmchen reviewed Jul 3, 2019

andrewmchen reviewed Jul 3, 2019

max-allen-db added 2 commits July 8, 2019 11:16

Update tests

e46f440

Update docstring

1d35c52

andrewmchen added the LGTM label Jul 8, 2019

max-allen-db merged commit 5a60ab3 into mlflow:master Jul 8, 2019

andrewmchen added the rn/feature label Jul 16, 2019

avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020

Python fluent API search_runs auto-pagination (mlflow#1548)

2fce0be

* auto-pagination and tests
