Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable numpy ufuncs for DataFrame #10287

Merged
merged 10 commits into from
Feb 15, 2022

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Feb 14, 2022

This PR addresses the primary issue in #9083, enabling all numpy ufuncs for DataFrame objects. It builds on the work in #10217, generalizing that code path to support multiple columns and moving the method up to IndexedFrame to share the logic with DataFrame. The custom preprocessing of inputs before handing off to cupy that was implemented in #10217 has been replaced by reusing parts of the existing binop machinery for greater generality, which is especially important for DataFrame binops since they support a wider range of alternative operand types. The current internal refactor is intentionally minimal to leave the focus on the new ufunc features. I will make a follow-up to clean up the internal functions by adding a proper set of hooks into the binop and ufunc implementations so that we can share these implementations with Index types as well, at which point we will be able to remove the extraneous APIs discussed in #9083 (comment).

@vyasr vyasr added feature request New feature or request 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. non-breaking Non-breaking change labels Feb 14, 2022
@vyasr vyasr added this to the CuDF Python Refactoring milestone Feb 14, 2022
@vyasr vyasr requested a review from a team as a code owner February 14, 2022 22:24
@vyasr vyasr self-assigned this Feb 14, 2022
@codecov
Copy link

codecov bot commented Feb 14, 2022

Codecov Report

Merging #10287 (4928004) into branch-22.04 (a7d88cd) will increase coverage by 0.24%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10287      +/-   ##
================================================
+ Coverage         10.42%   10.67%   +0.24%     
================================================
  Files               119      122       +3     
  Lines             20603    20874     +271     
================================================
+ Hits               2148     2228      +80     
- Misses            18455    18646     +191     
Impacted Files Coverage Δ
python/cudf/cudf/_fuzz_testing/fuzzer.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/io.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/main.py 0.00% <ø> (ø)
python/cudf/cudf/_version.py 0.00% <ø> (ø)
python/cudf/cudf/comm/gpuarrow.py 0.00% <ø> (ø)
python/cudf/cudf/core/_base_index.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/categorical.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/column.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/datetime.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/methods.py 0.00% <ø> (ø)
... and 63 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 374b387...4928004. Read the comment docs.

Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@vyasr
Copy link
Contributor Author

vyasr commented Feb 15, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8b0737d into rapidsai:branch-22.04 Feb 15, 2022
rapids-bot bot pushed a commit that referenced this pull request Feb 25, 2022
This PR builds on #10217 and #10287 to bring full ufunc support for Index types, expanding well beyond the small set previously supported in the `cudf.core.ops` namespace. By using most of the machinery introduced for IndexedFrame in the prior two PRs we avoid duplicating much logic so that all ufunc dispatches flow through a relatively standard path of known methods prior to a common cupy dispatch. With this change we are also able to deprecate the various ufunc operations defined in cudf/core/ops.py that exist only for this purpose as well as a number of Frame methods that are not defined for the corresponding pandas types. Users of those APIs are recommended to calling the corresponding numpy/cupy ufuncs instead to leverage the new dispatch.

This PR also fixes a bug where index binary operations that output booleans would previously return instances of GenericIndex, whereas those pandas operations would return numpy arrays. cudf now returns cupy arrays in those cases.

Resolves #9083. Contributes to #9038.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10346
@vyasr vyasr mentioned this pull request Feb 25, 2022
rapids-bot bot pushed a commit that referenced this pull request Mar 2, 2022
This PR cleans up the implementation of `__array_function__` for `Series` and `DataFrame` to bring them further into alignment. It also inlines a number of functions defined in `utils/utils.py` that were previously used only in `Series.__array_ufunc__`, building on the improvements in #10217, #10287, and #10346 to clear out methods related to the old `__array_ufunc__` dispatch that are now only used by this `__array_function__` implementation. Inlining these methods also allows significant simplification since they were handling cases that are no longer relevant or possible. Unlike those previous PRs, this one does not actually enable any new features. Although it should marginally accelerate array functions by simplifying the dispatch logic, the fact that this API makes few promises about the nature of the function being applied and our desire to have it "just work" as much as possible means that we must simply adopt an EAFP approach and return `NotImplemented` if any part of the process fails.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #10364
@vyasr vyasr deleted the refactor/dataframe_array_ufunc branch March 9, 2022 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants