filter: Rewrite priority queue logic with pandas functions #809
Draft
victorlin wants to merge 9 commits into master from victorlin/filter/priority-speedup
Commits on Dec 10, 2021
-
Commit 1b106d5
-
rewrite PriorityQueue logic with pandas functions
- remove `class PriorityQueue`
- use `prioritized_metadata` DataFrame in place of `queues_per_group`
- repurpose `create_queues_per_group` to `create_sizes_per_group`
- other logical refactoring:
  - use global dummy group key and value
    - key is `list`: pd.DataFrame.groupby does not take a tuple as grouping key, also our `--group-by` is stored as list already.
    - value is `tuple`: `get_groups_for_subsampling` currently returns group values in this form.
  - use records_per_group for _dummy
  - replace conditional logic of `records_per_group is not None` with `group_by`
- add functional tests
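For readers unfamiliar with the approach, here is a minimal sketch of how per-group priority selection can be done with pandas alone, without a queue class. It is not the code from this PR: the `select_top_priority` function, the `DUMMY_GROUP_BY` constant, and the `region`/`priority` columns are hypothetical names chosen for illustration. It only shows the general idea of sorting by priority, grouping on a list of columns, and falling back to a single dummy group when no grouping is requested.

```python
import pandas as pd

# Hypothetical stand-in for a global dummy group key.
# It is a list, mirroring how --group-by values are stored.
DUMMY_GROUP_BY = ["_dummy"]


def select_top_priority(metadata, group_by, per_group):
    """Return sorted index labels of the `per_group` highest-priority rows in each group."""
    df = metadata.copy()
    if not group_by:
        # No grouping requested: place every record in one dummy group
        # so that the grouped code path below handles both cases.
        group_by = DUMMY_GROUP_BY
        df[group_by[0]] = "_dummy"
    # Highest priority first, then keep the first `per_group` rows of each group.
    ranked = df.sort_values("priority", ascending=False)
    return sorted(ranked.groupby(group_by).head(per_group).index)


# Illustrative metadata only; not taken from the PR's tests.
metadata = pd.DataFrame(
    {
        "strain": ["A", "B", "C", "D", "E"],
        "region": ["Asia", "Asia", "Asia", "Europe", "Europe"],
        "priority": [0.9, 0.1, 0.5, 0.3, 0.7],
    }
).set_index("strain")

grouped = select_top_priority(metadata, ["region"], 2)
ungrouped = select_top_priority(metadata, [], 2)
```

With grouping by `region`, the two highest-priority strains of each region are kept; with an empty `group_by`, the two highest-priority strains overall are kept.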
Commit dc0eda2
-
Commit 4e3e155
Commits on Dec 11, 2021
-
Commit 9ac13ea
-
Commit 9cf2264
-
Commit e01d302
Commits on Dec 17, 2021
-
Add test for grouping by month alone
This test currently fails with a pandas-specific index error.
Commit 897d00e
-
Implicitly group by year and month for month group
Instead of calculating a new (year, month) tuple when users group by month, add a "year" key to the list of group fields. This fixes a pandas indexing bug: calling `nlargest` on a SeriesGroupBy object whose month key is a (year, month) tuple causes pandas to treat the single month key as a MultiIndex that should be a list. Although this commit was motivated by that pandas issue, this implementation of the year/month disambiguation is also simpler and more idiomatic pandas, and it would not have been possible in the original augur filter implementation (before we switched to pandas for metadata parsing).
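The idea in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `expand_month_group` is a hypothetical helper, the position of `"year"` in the list is a guess, and the example assumes complete dates that `pd.to_datetime` can parse, which is simpler than real metadata.

```python
import pandas as pd

# Illustrative metadata only; real records may have incomplete dates.
metadata = pd.DataFrame(
    {
        "strain": ["A", "B", "C", "D"],
        "date": ["2020-01-05", "2020-01-20", "2021-01-10", "2021-02-01"],
        "priority": [0.2, 0.8, 0.5, 0.9],
    }
).set_index("strain")

# Store year and month as separate scalar columns
# instead of a single (year, month) tuple.
dates = pd.to_datetime(metadata["date"])
metadata["year"] = dates.dt.year
metadata["month"] = dates.dt.month


def expand_month_group(group_by):
    """Hypothetical helper: add "year" whenever "month" is requested without it."""
    if "month" in group_by and "year" not in group_by:
        # Assumption: "year" is placed first; the commit only says it is added.
        return ["year"] + list(group_by)
    return list(group_by)


group_by = expand_month_group(["month"])

# Every group key is now a plain column, so nlargest works on scalar keys.
top = metadata.groupby(group_by)["priority"].nlargest(1)
selected = sorted(top.index.get_level_values("strain"))
```

Grouping by `["month"]` alone would merge January 2020 with January 2021; with the implicit `"year"` key they remain separate groups.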
Commit 966da1d
-
Update unit and doc tests to match new month group
Simplifies unit tests and doctests by expecting a single value for each month instead of a tuple.
Commit eea96fb