Support barrier mode for mapInPandas/mapInArrow #10343
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
build
Overall looks good, just a few nits.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/python/GpuMapInBatchExec.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/python/GpuMapInPandasExec.scala
build
Better to test these on DBs before merging.
build
@@ -33,7 +34,7 @@
raise AssertionError("incorrect pyarrow version during required testing " + str(e))
pytestmark = pytest.mark.skip(reason=str(e))

-from asserts import assert_gpu_and_cpu_are_equal_collect, assert_gpu_fallback_collect
+from asserts import assert_gpu_and_cpu_are_equal_collect, assert_gpu_fallback_collect, assert_equal
NIT: assert_equal is imported but never used.
Yeah, good catch. Let's remove it in another PR that touches this file.
This reverts commit 6723a68. Signed-off-by: Bobby Wang <wbo4958@gmail.com>
To fix #10344
Spark 3.5 introduced barrier-mode support for mapInPandas/mapInArrow; more details can be found at https://issues.apache.org/jira/browse/SPARK-42896. However, spark-rapids did not support this feature, which resulted in unexpected behavior. For example, when a query calls mapInPandas/mapInArrow with barrier mode enabled, its tasks run in barrier mode on the CPU but in non-barrier mode on the GPU with spark-rapids.
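A minimal PySpark sketch of such a query, assuming PySpark 3.5 or later (where `DataFrame.mapInPandas` accepts a `barrier` keyword) with pandas and pyarrow installed. The helper names `tag_barrier_mode` and `run_demo`, the two-partition `spark.range` input, and the output column `is_barrier` are hypothetical illustrations, not code from this PR. Each task probes `BarrierTaskContext.get()`, which only succeeds inside a barrier stage, and reports the result as one row per partition.

```python
# Hypothetical sketch: probe whether mapInPandas tasks run in a barrier stage.
# Assumes PySpark 3.5+ (for the `barrier` keyword), pandas and pyarrow.
# pyspark and pandas are imported lazily so this module imports without them.


def tag_barrier_mode(batches):
    """Executor side: consume an iterator of pandas DataFrames and yield a
    single-row DataFrame telling whether this task is in a barrier stage."""
    import pandas as pd
    from pyspark import BarrierTaskContext

    try:
        # Returns the barrier context inside a barrier stage; outside one it
        # raises (PySparkRuntimeError, a RuntimeError subclass, in 3.5).
        BarrierTaskContext.get()
        in_barrier = True
    except RuntimeError:
        in_barrier = False

    # Drain the input so the whole partition is consumed.
    for _ in batches:
        pass

    yield pd.DataFrame({"is_barrier": [in_barrier]})


def run_demo(spark):
    """Driver side: return one is_barrier flag per partition.

    `spark` is an existing SparkSession with at least 2 task slots, because a
    barrier stage must launch all of its tasks at the same time.
    """
    df = spark.range(0, 10, numPartitions=2)
    result = df.mapInPandas(tag_barrier_mode, "is_barrier boolean", barrier=True)
    return [row.is_barrier for row in result.collect()]
```

On the CPU, calling `run_demo(SparkSession.builder.master("local[2]").getOrCreate())` is expected to return one `True` per partition. Since the GPU path ran these tasks in non-barrier mode before this change, the same probe would be expected to report `False` there.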