Fix the issue of exporting Column RDD [databricks] #4335

Merged: 1 commit merged on Dec 13, 2021

@wbo4958 (Collaborator) commented Dec 9, 2021:

This PR fixes #4334.

Starting with Spark 3.1.x (inclusive), ColumnarRDD(df) can no longer extract an RDD[Table] directly. Instead, the plan goes through a columnar-to-row conversion followed by a row-to-columnar conversion, which hurts performance. It turns out that exportColumnRdd is not passed to GpuColumnarToRowExecParent, so the flag is always false there.
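The failure mode described above (a flag that a subclass receives but never forwards to its parent) can be illustrated with a minimal, self-contained Scala sketch. All class names below are hypothetical stand-ins chosen for illustration; they are not the plugin's actual GpuColumnarToRowExec or GpuColumnarToRowExecParent definitions, and the sketch does not depend on Spark.

```scala
// Hypothetical model, NOT spark-rapids code: the parent owns the flag that
// decides whether columnar data can be exported without a row round trip.
abstract class ColumnarToRowParent(val exportColumnarRdd: Boolean = false) {
  // True only when the flag actually reached the parent
  def canExportDirectly: Boolean = exportColumnarRdd
}

// Buggy variant: the flag is accepted but the parent is built with its default (false)
class BuggyColumnarToRow(flag: Boolean) extends ColumnarToRowParent()

// Fixed variant: the flag is passed through to the parent constructor
class FixedColumnarToRow(flag: Boolean) extends ColumnarToRowParent(flag)

object Demo {
  def main(args: Array[String]): Unit = {
    println(new BuggyColumnarToRow(true).canExportDirectly) // prints false
    println(new FixedColumnarToRow(true).canExportDirectly) // prints true
  }
}
```

The buggy variant always reports false regardless of what the caller asked for, which mirrors the symptom in #4334, where the export path is always tagged False.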

@wbo4958 (Collaborator, Author) commented Dec 9, 2021:

build

@revans2 previously approved these changes Dec 9, 2021
@jlowe previously approved these changes Dec 9, 2021

@jlowe (Member) left a comment:

Ideally the test should be written as an integration test using the public ColumnarRdd API to replicate what user code would do, but I'm OK with this being a followup.

@sameerz (Collaborator) left a comment:

Please retarget to 22.02, as we are in code freeze for 21.12 and this is not a data-corruption, crash, or customer-critical fix (at this time).

@wbo4958 wbo4958 changed the base branch from branch-21.12 to branch-22.02 December 10, 2021 01:49
@wbo4958 wbo4958 dismissed stale reviews from jlowe and revans2 December 10, 2021 01:49


Signed-off-by: Bobby Wang <wbo4958@gmail.com>
@wbo4958 (Collaborator, Author) commented Dec 10, 2021, replying to "Ideally the test should be written as an integration test using the public ColumnarRdd API to replicate what user code would do, but I'm OK with this being a followup.":

Hmm, it seems users can't access the ColumnarRdd API from PySpark.

@wbo4958 (Collaborator, Author) commented Dec 10, 2021:

build

@sameerz (Collaborator) left a comment:

Thanks for moving to 22.02.

@sameerz sameerz added the bug Something isn't working label Dec 10, 2021
@jlowe (Member) commented Dec 10, 2021:

Premerge failed with what appears to be an unrelated Python environment error. Rekicking.

@jlowe (Member) commented Dec 10, 2021:

build

@wbo4958 (Collaborator, Author) commented Dec 13, 2021:

build

@wbo4958 wbo4958 merged commit dab062e into NVIDIA:branch-22.02 Dec 13, 2021
@wbo4958 wbo4958 deleted the columnar-rdd branch December 13, 2021 05:45
Labels: bug (Something isn't working)
Projects: None yet
Development: successfully merging this pull request may close these issues:
[BUG] GpuColumnarToRowExec will always be tagged False for exportColumnarRdd after Spark311
4 participants