Improved GpuArrowEvalPythonExec #783

Merged 2 commits on Sep 17, 2020

@revans2 (Collaborator) commented Sep 16, 2020

This gets GpuArrowEvalPythonExec to the point where I think it can be on by default. In my testing, performance is either better than the original version (for fast Python UDFs) or as good (for slow UDFs simulated with a sleep) but with much lower CPU utilization, so in practice it would likely end up faster because there would be less CPU contention.

I did fix one bug in the existing code to make this work. In SpillableColumnarBatch, if we try to create the batch from a batch of GpuColumnVectorFromBuffer, I messed up the reference counting and things were being deleted before they should be. That ended up producing NullPointerExceptions instead of a double free or a similar error message that would have been simpler to debug.
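
To illustrate the kind of reference-counting mistake described above, here is a minimal toy sketch in Python. It is not the actual SpillableColumnarBatch or GpuColumnVectorFromBuffer code; `RefCountedBuffer`, `wrap_without_retain`, `wrap_with_retain`, and `demo` are hypothetical names invented for this example. It only shows why a wrapper that keeps a shared buffer without taking its own reference can end up looking at freed data once the original owner closes it.

```python
class RefCountedBuffer:
    """Hypothetical stand-in for a buffer shared by several column views."""

    def __init__(self, data):
        self._data = data
        self._ref_count = 1  # the creator holds the first reference

    def inc_ref_count(self):
        if self._ref_count <= 0:
            raise RuntimeError("buffer already freed")
        self._ref_count += 1
        return self

    def close(self):
        if self._ref_count <= 0:
            raise RuntimeError("double free")
        self._ref_count -= 1
        if self._ref_count == 0:
            self._data = None  # simulate releasing the memory

    @property
    def data(self):
        # Returns None once the last reference has been closed.
        return self._data


def wrap_without_retain(buffer):
    """Buggy pattern (assumed for illustration): keep the buffer, take no reference."""
    return buffer


def wrap_with_retain(buffer):
    """Fixed pattern (assumed for illustration): the wrapper takes its own reference."""
    return buffer.inc_ref_count()


def demo(wrap):
    buf = RefCountedBuffer([1, 2, 3])
    held = wrap(buf)
    buf.close()        # the original owner is done with the batch
    return held.data   # what the wrapper sees afterwards


print(demo(wrap_without_retain))  # data was released out from under the wrapper
print(demo(wrap_with_retain))     # data is still alive
```

In this toy model the buggy wrapper silently observes a missing value rather than an explicit "double free" error, which is loosely analogous to seeing a null-pointer style failure far from the real cause.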

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 revans2 added the performance A performance related task/issue label Sep 16, 2020
@revans2 revans2 added this to the Sep 14 - Sep 25 milestone Sep 16, 2020
@revans2 revans2 self-assigned this Sep 16, 2020
@revans2 (Collaborator, Author) commented Sep 16, 2020

build

" when calling cuDF APIs in the UDF, also accelerates the data transfer between the" +
" Java process and Python process",
"The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the" +
" Java process and Python process. It also supports running the Python UDFs code on" +
@revans2 (Collaborator, Author) commented on these lines:

This probably needs some work and a reference to how to enable/configure the cuDF-on-GPU feature, but I am not sure what to say because we are not 100% ready to promote that yet.

@revans2 (Collaborator, Author) commented Sep 17, 2020

build

@jlowe (Member) left a comment

Found some time to review the rest, looks ok. Just a minor naming nit which isn't must-fix.

@revans2 revans2 merged commit 624344a into NVIDIA:branch-0.3 Sep 17, 2020
@revans2 revans2 deleted the python_udf branch September 17, 2020 15:12
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Improved GpuArrowEvalPythonExec

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#783)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
