Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change aggregation class hierarchy to allow per-algorithm type enforcement. #8052

Merged
merged 15 commits into from
May 7, 2021

Conversation

nvdbaranec
Copy link
Contributor

Partially addresses #7106

Fundamentally, this changes the aggregation class hierarchy in the following ways:

  • The base aggregation class becomes abstract, with the clone() and finalize() functions being pure virtual.
  • Every aggregation type now has a concrete class associated with it, derived from aggregation.
  • "Intermediate" classes such as rolling_aggregation are used to allow individual algorithms to only accept aggregation types that are valid for it (as opposed to enforcing this internally at runtime).

All of the rolling_window interfaces have been updated to take a rolling_aggregation. Other algorithms such as groupby are not yet converted and still take generic aggregation objects.

Marking this as Do Not Merge for now since this is a breaking change with immediately implications for Spark.

@nvdbaranec nvdbaranec added libcudf Affects libcudf (C++/CUDA) code. 5 - DO NOT MERGE Hold off on merging; see PR for details improvement Improvement / enhancement to an existing function breaking Breaking change labels Apr 24, 2021
@nvdbaranec nvdbaranec requested a review from a team as a code owner April 24, 2021 01:13
@nvdbaranec
Copy link
Contributor Author

Pinging @revans2. Let me know how you want to handle the timing on merging this once it gets approved.

@nvdbaranec
Copy link
Contributor Author

rerun tests

@revans2
Copy link
Contributor

revans2 commented Apr 26, 2021

Pinging @revans2. Let me know how you want to handle the timing on merging this once it gets approved.

I am working on a patch for the JNI code right now. Will post it when I am done.

virtual std::vector<aggregation::Kind> get_simple_aggregations(data_type col_type) const;
virtual void finalize(cudf::detail::aggregation_finalizer& finalizer);
protected:
rolling_aggregation(){};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is a ctor needed at all in this type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets implicitly called by all the derived classes. eg

class count_aggregation final : public rolling_aggregation {
 public:
  count_aggregation(aggregation::Kind kind) : aggregation(kind) {}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but why isn't the compiler provided one sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@revans2
Copy link
Contributor

revans2 commented Apr 26, 2021

#8069 is to get the java code working properly

@nvdbaranec nvdbaranec requested a review from a team as a code owner April 29, 2021 22:08
@nvdbaranec nvdbaranec added 4 - Needs cuDF (Python) Reviewer and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Apr 29, 2021
@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 29, 2021
@codecov
Copy link

codecov bot commented Apr 30, 2021

Codecov Report

Merging #8052 (d4cc46e) into branch-0.20 (51336df) will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.20    #8052      +/-   ##
===============================================
- Coverage        82.88%   82.88%   -0.01%     
===============================================
  Files              103      104       +1     
  Lines            17668    17894     +226     
===============================================
+ Hits             14645    14832     +187     
- Misses            3023     3062      +39     
Impacted Files Coverage Δ
python/cudf/cudf/core/tools/datetimes.py 80.42% <0.00%> (-4.11%) ⬇️
python/cudf/cudf/core/column/decimal.py 91.04% <0.00%> (-1.89%) ⬇️
python/cudf/cudf/core/column/datetime.py 88.03% <0.00%> (-1.88%) ⬇️
python/cudf/cudf/core/column/struct.py 94.73% <0.00%> (-1.56%) ⬇️
python/cudf/cudf/utils/dtypes.py 82.20% <0.00%> (-1.24%) ⬇️
python/dask_cudf/dask_cudf/groupby.py 91.28% <0.00%> (-0.88%) ⬇️
python/cudf/cudf/core/series.py 91.17% <0.00%> (-0.56%) ⬇️
python/cudf/cudf/core/index.py 92.52% <0.00%> (-0.55%) ⬇️
python/cudf/cudf/core/column/column.py 88.20% <0.00%> (-0.44%) ⬇️
python/cudf/cudf/core/column/lists.py 86.98% <0.00%> (-0.43%) ⬇️
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ae73d5...d4cc46e. Read the comment docs.

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit. Looks good to me.

cpp/src/groupby/hash/groupby.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving Python/Cython.

It's a shame that we'll have to carbon copy nearly everything once for every aggregation type as #7106 progresses. Since the set of possible aggregation types is known a priori, we should see if we can use Cython's fused types to more transparently wrap at least some of the templates. Templates and fused types don't play nearly as nicely together as I'd like, but we may at least be able to merge the cdef classes like RollingAggregation/Aggregation. That's out of scope for this PR though.

@shwina
Copy link
Contributor

shwina commented May 7, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e2c7067 into rapidsai:branch-0.20 May 7, 2021
rapids-bot bot pushed a commit that referenced this pull request May 7, 2021
This depends on #8052 

It adds in similar changes to the java APIs to have the compiler be able to check if an aggregation will work for rolling window or not.

Authors:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - https://github.com/nvdbaranec

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #8069
rapids-bot bot pushed a commit that referenced this pull request Aug 19, 2021
…e their usage. (#8906)

Followup to #8052
Partially addresses #7106

Adds the `groupby_aggregation` class and forces usage of that type when calling `groupby::aggregate()`

Adds the `groupby_scan_aggregation` class and forces usage of that type when calling `groupby::scan()`

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Jake Hemstad (https://github.com/jrhemstad)
  - David Wendt (https://github.com/davidwendt)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Nghia Truong (https://github.com/ttnghia)
  - Devavret Makkar (https://github.com/devavret)

URL: #8906
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuDF (Python) Reviewer labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Needs Review Waiting for reviewer to review or respond breaking Breaking change improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants