Accelerate data transfer for map Pandas UDF plan #2035
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
I will run some benchmarks locally before marking this as ready for review.
build
I got some positive numbers, and the tests passed on Databricks. Marking as ready for review.
Can you elaborate here on what those numbers were? What was the use case, and how much improvement did you see?
I'll defer to @revans2 for reviewing the details of the iterator changes since he's much more familiar with that code, but it does seem like we could do a better job of testing the various type combinations.
Thanks for the review. I will try to cover more supported types.
It is not easy to show the numbers here, so here is the link. I ran some data transfer benchmarks locally to verify that the columnar approach gives better performance. This PR is for issue #305.
Force-pushed from 274078c to 8cc73e3
The force push was only to fix the commit signature issue. The latest merge commit had no signature by default, so I added one with a force push.
build
@revans2 Hi, could you take a look at this PR? Thanks in advance.
I am going to merge this. If there are any concerns, I will address them in new PRs.
* Accelerate data transfer for map Pandas UDF node. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
This PR accelerates the data transfer between the JVM and Python for the plan GpuMapInPandas by implementing a columnar approach. It also adds the related integration tests.
For issue #305
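
For context, a map Pandas UDF is a Python function that takes an iterator of pandas DataFrames and yields pandas DataFrames. The sketch below shows the general shape of such a function. The function name, column names, and data are hypothetical and are not taken from this PR's integration tests. It runs locally with pandas only, without Spark.

```python
from typing import Iterator

import pandas as pd


def filter_adults(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Keep rows with age >= 18, processing one pandas batch at a time."""
    for pdf in batches:
        yield pdf[pdf["age"] >= 18]


# Local check without Spark: feed two hand-built batches through the function.
# (Hypothetical data; in Spark, each batch would arrive from the JVM side.)
batches = [
    pd.DataFrame({"id": [1, 2], "age": [21, 15]}),
    pd.DataFrame({"id": [3], "age": [30]}),
]
result = pd.concat(filter_adults(iter(batches)), ignore_index=True)
```

With a Spark DataFrame `df` that has matching columns, the same function would be applied as `df.mapInPandas(filter_adults, schema=df.schema)`. Every batch passed to and returned from the function crosses the JVM/Python boundary, which is the transfer this PR aims to speed up.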