Support barrier mode for mapInPandas/mapInArrow #10343

Merged
merged 3 commits into from
Feb 1, 2024

Conversation

wbo4958
Collaborator

@wbo4958 wbo4958 commented Jan 31, 2024

To fix #10344

Spark 3.5 introduced barrier mode support for mapInPandas/mapInArrow; see https://issues.apache.org/jira/browse/SPARK-42896 for details. However, spark-rapids does not handle this feature yet, which results in unexpected behavior. For example:

spark.range(10).mapInPandas(lambda x: x, "id long", True)

The tasks for the code above run in barrier mode on the CPU, but in non-barrier mode on the GPU with spark-rapids.
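For readers unfamiliar with the third positional argument, here is the same call with the flag spelled out as a keyword. This is only a minimal sketch: it assumes Spark 3.5 or later and an existing SparkSession passed in as `spark`, and the helper names `identity_batches` and `map_with_barrier` are illustrative, not part of this PR.

```python
def identity_batches(batches):
    """Yield each batch unchanged, equivalent to `lambda x: x` above.

    With mapInPandas, `batches` is an iterator of pandas DataFrames.
    """
    for batch in batches:
        yield batch


def map_with_barrier(spark, barrier=True):
    """Apply the identity function to spark.range(10) via mapInPandas.

    Assumption: `spark` is an existing SparkSession on Spark 3.5+,
    where mapInPandas accepts the `barrier` argument (SPARK-42896).
    """
    df = spark.range(10)
    # barrier=True asks Spark to run the stage in barrier mode.
    return df.mapInPandas(identity_batches, "id long", barrier=barrier)
```

Calling `map_with_barrier(spark).collect()` should exercise the barrier path, and `map_with_barrier(spark, barrier=False)` the regular one, which makes the CPU/GPU difference easier to compare.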

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
@wbo4958
Collaborator Author

wbo4958 commented Jan 31, 2024

build

Collaborator

@firestarman firestarman left a comment

Overall looks good, but some nits.

@wbo4958 wbo4958 marked this pull request as draft January 31, 2024 07:21
@wbo4958
Collaborator Author

wbo4958 commented Jan 31, 2024

build

@wbo4958 wbo4958 marked this pull request as ready for review January 31, 2024 08:50
Collaborator

@firestarman firestarman left a comment

Better to test these on DBs before merging.

@wbo4958
Collaborator Author

wbo4958 commented Feb 1, 2024

build

@@ -33,7 +34,7 @@
raise AssertionError("incorrect pyarrow version during required testing " + str(e))
pytestmark = pytest.mark.skip(reason=str(e))

-from asserts import assert_gpu_and_cpu_are_equal_collect, assert_gpu_fallback_collect
+from asserts import assert_gpu_and_cpu_are_equal_collect, assert_gpu_fallback_collect, assert_equal
Collaborator

NIT: assert_equal is imported but never used.

Collaborator Author

Yeah, good catch. Let's remove it in another PR that touches this file.

@wbo4958 wbo4958 merged commit 6723a68 into NVIDIA:branch-24.02 Feb 1, 2024
39 of 40 checks passed
@wbo4958 wbo4958 deleted the barrier branch February 1, 2024 05:19
wbo4958 added a commit to wbo4958/spark-rapids that referenced this pull request Feb 1, 2024
This reverts commit 6723a68.

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
wbo4958 added a commit to wbo4958/spark-rapids that referenced this pull request Feb 1, 2024
jlowe pushed a commit that referenced this pull request Feb 1, 2024
…0355)

This reverts commit 6723a68.

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
@sameerz sameerz added the task Work required that improves the product but is not user facing label Feb 4, 2024
Development

Successfully merging this pull request may close these issues.

[FEA] support barrier mode for mapInPandas/mapInArrow
3 participants