This PR refactors most sorting APIs of Frame and its subclasses. To support these changes, it also refactors the implementation of `take`.

New Features:
- DataFrame nlargest/nsmallest will accept multiple columns. Previously this would fail unexpectedly.
- BaseIndex.sort_values now accepts na_position to be consistent with other sorts.
- DataFrame.argsort now accepts an (optional) by parameter to indicate what columns to order by.

Performance:
- DataFrame nlargest/nsmallest are up to 10x faster for small inputs.
- take is significantly faster for all classes. For instance I see about a 2x speedup for Series.
- DataFrame.sort_values is ~10% faster for small inputs.

Deprecations/Removals/Breaking Changes:
- Deprecating arguments to take other than numerical indexes. Boolean masks are deprecated and will no longer be supported in the future. This matches pandas behavior and allows us to simplify our code.
- The parameter for take has been renamed to `indices` from `positions` for consistency with pandas. This is a breaking change. If reviewers think it's important to still support positions as a kwarg we could add a backwards compatibility layer. My thinking is that this is probably not a frequently used API, and where it is used it's almost always used with a positional argument so renaming the first argument is not a huge issue.

There's one additional note that fits under a couple of the headings. While unifying implementations of argsort it made sense to change the behavior of DataFrame.argsort to return a cupy array instead of a Series. There's no corresponding pandas API so we have some freedom to choose the appropriate output, and I think an array makes more sense. However, `Column.values` is not that fast (yet, I plan to optimize soon), so it's actually slower right now to return the array than to return a Series constructed via `_from_data`. I think this is OK for now, but if reviewers feel strongly about it I can change it back to returning a Series.
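For reviewers weighing the backwards compatibility layer mentioned above, here is a minimal sketch of what one could look like. It is not the cuDF implementation: `take` here is a hypothetical free function over plain Python lists, and the warning messages and the choice of `FutureWarning` are assumptions made for illustration.

```python
import warnings


def take(values, indices=None, *, positions=None):
    """Gather elements of `values` at the given integer `indices`.

    Hypothetical stand-in for the real method. `positions` is accepted
    only as a deprecated alias for `indices`.
    """
    if positions is not None:
        if indices is not None:
            raise TypeError("pass either 'indices' or 'positions', not both")
        warnings.warn(
            "'positions' is deprecated, use 'indices' instead",
            FutureWarning,
            stacklevel=2,
        )
        indices = positions
    if indices is None:
        raise TypeError("take() missing required argument 'indices'")

    indices = list(indices)
    # Boolean masks are deprecated: warn, then convert the mask into the
    # integer positions of its True entries.
    if indices and all(isinstance(i, bool) for i in indices):
        if len(indices) != len(values):
            raise ValueError("boolean mask must match the length of values")
        warnings.warn(
            "boolean masks in take are deprecated, pass integer indices",
            FutureWarning,
            stacklevel=2,
        )
        indices = [pos for pos, keep in enumerate(indices) if keep]

    return [values[i] for i in indices]


data = [10, 20, 30, 40]
print(take(data, [3, 0]))  # positional call is unaffected by the rename
```

Positional callers see no change under this scheme. Only keyword use of `positions` and boolean masks trigger a warning, which matches the expectation stated above that most call sites pass the argument positionally.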
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #9464
Showing 10 changed files with 408 additions and 458 deletions.