-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python fluent API search_runs auto-pagination #1548
Python fluent API search_runs auto-pagination #1548
Conversation
tests/tracking/test_fluent.py
Outdated
|
||
def test_get_paginated_runs_lt_maxresults_notoken(): | ||
# num runs is less than max_results, only one page | ||
# the list returned is not a PagedList, no token attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this possible actually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might not be the case anymore now that tokens are implemented in OSS stores. Before the interface said that there would be a "token attribute" if the tracking server supported pagination. But at this version of MLflow, they should all support it now.
tests/tracking/test_fluent.py
Outdated
|
||
|
||
def test_get_paginated_maxresults_lt_runs_per_page(): | ||
# should only get max_result number of runs, should only call search_runs once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason we don't combine this with test_get_paginated_runs_lt_maxresults_notoken
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of the notoken
tests. This should now have a more descriptive test docstring
* auto-pagination and tests
What changes are proposed in this pull request?
The
mlflow.search_runs()
API in the Python fluent client returns a pandas Dataframe with the active experiment's runs in it. Currently, it will return up to 50,000 runs by default and won't return more since current database implementations won't return more than 50,000 runs back in one request.This PR changes this behavior to return up to 100,000 runs by default, with the option to get more by passing in a larger value for
max_results
. The changes in this PR automatically break up the requests to the database into pages of 10,000 runs, and will concatenate them together before returning the DataFrame.How is this patch tested?
Unit Tests in
test_fluent.py
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
Related to #1483. Allows users to get all their experiment data in a pandas dataframe format, with filterstring, order by, and max results supported.
What component(s) does this PR affect?
How should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes