Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove legacy Arrow interop APIs #16590

Merged
merged 15 commits into from
Aug 22, 2024

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Aug 17, 2024

Description

Contributes to #15193.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added improvement Improvement / enhancement to an existing function breaking Breaking change labels Aug 17, 2024
@vyasr vyasr self-assigned this Aug 17, 2024
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Aug 17, 2024
@github-actions github-actions bot added the Java Affects Java cuDF API. label Aug 19, 2024
@vyasr vyasr mentioned this pull request Aug 19, 2024
3 tasks
@vyasr vyasr marked this pull request as ready for review August 19, 2024 21:48
@vyasr vyasr requested review from a team as code owners August 19, 2024 21:48
Copy link
Contributor

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved CMake changes

Copy link
Contributor

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what is happening, but we are getting failures trying to use this in Spark when we write data to python processes for python UDF processing.

E                   Caused by: ai.rapids.cudf.CudfException: writer failed to write table with the following error: Invalid: Tried to write record batch with different schema
E                       at ai.rapids.cudf.Table.writeArrowIPCArrowChunk(Native Method)

java/src/main/native/src/ColumnVectorJni.cpp Show resolved Hide resolved
@revans2
Copy link
Contributor

revans2 commented Aug 20, 2024

I did some more debugging and the issue appears to happen when after writing batches with non-null values out, we then try to write a batch with only null values in it.

DEBUG ARROW WRITE Table{columns=[ColumnVector{rows=4, type=INT8, nullCount=Optional.empty, offHeap=(ID: 283 7ebea0322ac0)}], cudfTable=139357196914992, rows=4}
GPU COLUMN 0 - NC: 0 DATA: DeviceMemoryBufferView{address=0x40a006ec0, length=4, id=-1} VAL: DeviceMemoryBufferView{address=0x40a006e80, length=64, id=-1}
COLUMN 0 - INT8
0 48
1 -68
2 -107
3 6
...
DEBUG ARROW WRITE Table{columns=[ColumnVector{rows=1, type=INT8, nullCount=Optional.empty, offHeap=(ID: 464 7ebea038cab0)}], cudfTable=139357196742224, rows=1}
GPU COLUMN 0 - NC: 0 DATA: DeviceMemoryBufferView{address=0x40a00d2c0, length=1, id=-1} VAL: DeviceMemoryBufferView{address=0x40a00d280, length=64, id=-1}
COLUMN 0 - INT8
0 88
DEBUG ARROW WRITE Table{columns=[ColumnVector{rows=1, type=INT8, nullCount=Optional.empty, offHeap=(ID: 474 7ebea038cbf0)}], cudfTable=139357196750432, rows=1}
GPU COLUMN 0 - NC: 1 DATA: DeviceMemoryBufferView{address=0x40a00d4c0, length=1, id=-1} VAL: DeviceMemoryBufferView{address=0x40a00d480, length=64, id=-1}
COLUMN 0 - INT8
0 NULL

That is when the error happens. We end up doing this somewhat regularly when we do groupby aggregations using python UDFS. Spark has a kind of crappy way of doing UDFS in python where each grouping key gets a separate arrow batch written out for it. This happens when we generate a group by where all of the values we are computing the agg for ended up being null (in this case it is a single row).

I think the problem is that the schema translation either expects to never see nulls if it didn't see it in the first batch, or that it has some kind of issue with the batch being all nulls. Not really sure. All I know is the schema discovery is not happy with this situation.

@vyasr
Copy link
Contributor Author

vyasr commented Aug 20, 2024

OK got it, thanks for tracking this case down. If you have time to write a minimal C++ repro that would be amazing, but no worries if not; I will try and create one myself by EOD today.

@vyasr
Copy link
Contributor Author

vyasr commented Aug 20, 2024

OK, I pushed a fix. We've had a few issues crop up around the new API around the nullability of columns when converting from cudf if the cudf column has no nulls. We may want to rethink defaults, but when I tried fiddling with them here I kept finding edge cases in other scenarios where things broke, so I suspect that the defaults in other libraries like arrow are not entirely consistent. For now, I've just patched the implementation for the JNI since we presumably want to rewrite this to avoid libarrow altogether. I'll open a new issue to discuss the default behavior, but I think this solution ought to be OK for now.

@vyasr
Copy link
Contributor Author

vyasr commented Aug 20, 2024

Opened #16621 for further discussion.

@vyasr vyasr requested a review from revans2 August 20, 2024 23:37
@revans2
Copy link
Contributor

revans2 commented Aug 21, 2024

I just tested and I am still getting the same error. I'll do some more debugging and see what I can come up with for this.

Copy link
Contributor

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That did it

@vyasr
Copy link
Contributor Author

vyasr commented Aug 21, 2024

Thanks for the quick turnaround on testing!

@vyasr vyasr mentioned this pull request Aug 21, 2024
3 tasks
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small question otherwise looks good to me.

cpp/tests/interop/arrow_utils.hpp Outdated Show resolved Hide resolved
@vyasr
Copy link
Contributor Author

vyasr commented Aug 22, 2024

/merge

@rapids-bot rapids-bot bot merged commit 6c4905d into rapidsai:branch-24.10 Aug 22, 2024
79 checks passed
@vyasr vyasr deleted the chore/remove_legacy_interop branch August 22, 2024 04:13
rapids-bot bot pushed a commit that referenced this pull request Aug 22, 2024
I noticed from #16590 (comment) that there was one other file where `#pragma once` was not at the top. This PR fixes that.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #16636
vyasr added a commit to vyasr/cudf that referenced this pull request Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants