Accelerate data transfer for map Pandas UDF plan #2035

Merged: 5 commits, merged Apr 8, 2021


firestarman (Collaborator)

This PR accelerates the data transfer between the JVM and Python for the GpuMapInPandas plan by implementing it in a columnar way.

It also adds the related integration tests.

For issue #305
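For context, GpuMapInPandas is the GPU counterpart of the plan produced by PySpark's `DataFrame.mapInPandas(func, schema)`. The function passed to `mapInPandas` consumes an iterator of pandas DataFrames (which Spark builds from Arrow batches) and yields output DataFrames. The sketch below shows such a function and runs it against hand-made batches, so no Spark session is needed. The function name `filter_positive` and the columns `id`/`value` are hypothetical and only illustrate the shape of a map-style Pandas UDF.

```python
from typing import Iterator

import pandas as pd


def filter_positive(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Illustrative map-style Pandas UDF body (hypothetical example).

    Each element of `batches` is one pandas DataFrame that Spark built from
    an Arrow batch; the function may yield any number of output DataFrames,
    all matching the declared output schema.
    """
    for pdf in batches:
        # Vectorized filter over the whole batch: no per-row Python calls.
        yield pdf[pdf["value"] > 0]


if __name__ == "__main__":
    # Simulate Spark handing the UDF two batches, without a Spark session.
    fake_batches = iter([
        pd.DataFrame({"id": [1, 2, 3], "value": [5, -1, 7]}),
        pd.DataFrame({"id": [4, 5], "value": [-3, 2]}),
    ])
    out = pd.concat(list(filter_positive(fake_batches)), ignore_index=True)
    print(out["id"].tolist())  # [1, 3, 5]
```

In a real job this would be invoked as `df.mapInPandas(filter_positive, schema=df.schema)`. Because the UDF already works on whole batches, moving those batches between the JVM and Python in columnar form is where the transfer cost lies, which is the part this PR targets.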

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman (Collaborator, Author)

I will run some benchmarks locally before marking this ready for review.

@firestarman added the labels "feature request" (New feature or request) and "performance" (A performance related task/issue) on Mar 29, 2021
@firestarman marked this pull request as ready for review on March 30, 2021 04:22
@firestarman (Collaborator, Author)

build

@firestarman (Collaborator, Author)

> I will run some benchmarks locally before marking this ready for review.

I got some positive numbers, and the tests passed on Databricks. Marking this as ready for review.

@firestarman requested review from revans2, gerashegalov and jlowe and removed the request for gerashegalov on March 31, 2021 02:13
@jlowe (Member) commented Mar 31, 2021

> Got some positive numbers

Can you elaborate here what those numbers were? What was the use-case and how much improvement did you see?

@jlowe (Member) left a comment

I'll defer to @revans2 for reviewing the details of the iterator changes since he's much more familiar with that code, but it does seem like we could do a better job of testing the various type combinations.

Review thread on integration_tests/src/main/python/udf_test.py (outdated, resolved)
@firestarman (Collaborator, Author)

> I'll defer to @revans2 for reviewing the details of the iterator changes since he's much more familiar with that code, but it does seem like we could do a better job of testing the various type combinations.

Thanks for the review. I will try to cover more of the supported types.

@firestarman (Collaborator, Author)

> Got some positive numbers
>
> Can you elaborate here what those numbers were? What was the use-case and how much improvement did you see?

It is not easy to show the numbers here, so here is the link.

I just ran some data-transfer benchmarks locally to verify that the columnar approach performs better. This PR is for issue #305.

@firestarman (Collaborator, Author) commented Apr 2, 2021

The force-push only fixes the commit signature issue: the latest merge commit had no signature by default, so I added one and force-pushed.

@firestarman (Collaborator, Author)

build

@firestarman (Collaborator, Author)

@revans2 Hi, could you take a look at this PR? Thanks in advance.

@firestarman (Collaborator, Author)

I am going to merge this. If there are any concerns, I will address them in new PRs.

@firestarman merged commit 0f9b30e into NVIDIA:branch-0.5 on Apr 8, 2021
@firestarman deleted the map-columnar branch on April 8, 2021 01:11
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Accelerate data transfer for map Pandas UDF node.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>