Improved GpuArrowEvalPythonExec #783
Conversation
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
" when calling cuDF APIs in the UDF, also accelerates the data transfer between the" +
" Java process and Python process",
"The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the" +
" Java process and Python process. It also supports running the Python UDFs code on" +
This probably needs some work and a reference to how to enable/configure the cuDF-on-GPU feature, but I am not sure what to say because we are not 100% ready to promote that yet.
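As a rough sketch of what that documentation might point users at, enabling an exec in the RAPIDS plugin generally comes down to Spark conf flags. The exact key names below follow the plugin's usual `spark.rapids.sql.exec.*` naming convention but are assumptions, not confirmed by this PR:

```shell
# Hypothetical example: both conf keys are assumptions based on the
# spark-rapids naming convention; check the plugin's config docs.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.rapids.sql.exec.ArrowEvalPythonExec=true \
  my_pandas_udf_job.py
```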
sql-plugin/src/main/scala/com/nvidia/spark/rapids/SpillableColumnarBatch.scala
Found some time to review the rest; looks OK. Just a minor naming nit, which isn't a must-fix.
* Improved GpuArrowEvalPythonExec Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
…IDIA#783) Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
This gets GpuArrowEvalPythonExec to the point that I think it can be on by default. In my testing, performance is either better than the original version (for fast Python UDFs) or as good (for slow UDFs simulated with a sleep) but with much lower CPU utilization, so in practice it would likely end up being faster because there would be less CPU contention.
I did fix one bug in the existing code to make this work. In SpillableColumnarBatch, if we tried to create the batch from a batch of GpuColumnVectorFromBuffer, the reference counting was messed up and things were being deleted before they should be, which ended up resulting in NullPointerExceptions instead of a double free or other similar error messages that are simpler to debug.
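To illustrate the class of bug described above, here is a minimal, self-contained sketch of cuDF-style reference counting. The names (`RefCountedBuffer`, `Spillable`) are hypothetical and not the plugin's actual API; the point is that a wrapper like `SpillableColumnarBatch` must take its own reference on construction, or the caller closing its reference frees the buffer early and later reads fail with a confusing error:

```scala
// Hypothetical model of reference-counted buffers; not the spark-rapids API.
final class RefCountedBuffer {
  private var refs = 1 // creator holds the first reference
  def incRefCount(): this.type = { refs += 1; this }
  def close(): Unit = {
    require(refs > 0, "double free")
    refs -= 1
  }
  def isValid: Boolean = refs > 0
}

// A spillable wrapper must add its own reference, otherwise the caller
// closing its batch frees the buffer out from under the wrapper.
final class Spillable(buf: RefCountedBuffer) extends AutoCloseable {
  buf.incRefCount() // the fix: take our own reference
  def read(): Unit =
    require(buf.isValid, "use after free (shows up as an NPE in practice)")
  override def close(): Unit = buf.close()
}

val buf = new RefCountedBuffer
val sp = new Spillable(buf)
buf.close() // caller releases its reference
sp.read()   // still valid: the wrapper holds its own reference
sp.close()  // last reference released here
```

Without the `incRefCount()` in the wrapper's constructor, `buf.close()` would drop the final reference and `sp.read()` would fail, matching the premature-deletion symptom described in the PR.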