Cleanup some instances of excess closure serialization #1097

Merged 1 commit on Nov 11, 2020


jlowe
Member

@jlowe jlowe commented Nov 11, 2020

I ran across an instance in GpuCoalesceBatches where the code referenced child.output within the mapPartitions call, which requires the entire child instance to be serialized along with the closure. That essentially serializes the entire Catalyst plan up to that point.

This removes many instances of child.output being referenced in mapPartitions. It also cleans up other cases where member variables were accessed within the mapPartitions closure, which requires the entire enclosing object to be serialized.
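
To illustrate the capture behavior described above, here is a minimal sketch that uses plain Java serialization instead of Spark. The names `FakePlanNode`, `capturingThis`, `capturingLocal`, and `ClosureCaptureDemo` are hypothetical and do not appear in this change. The class is deliberately not Serializable so that capturing `this` shows up as a failure; real plan nodes would typically serialize successfully, just with far more data than needed. It assumes Scala 2.12 or later, where function literals are serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a plan node. It is intentionally NOT Serializable
// so that capturing `this` in a closure becomes visible as an error.
class FakePlanNode(val output: List[String]) {
  // Reads the member `output` inside the closure, so the closure captures `this`.
  def capturingThis: Int => String = i => output(i)

  // Copies the member to a local val first, so the closure captures only the List.
  def capturingLocal: Int => String = {
    val localOutput = output
    i => localOutput(i)
  }
}

object ClosureCaptureDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean = {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      oos.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      oos.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val node = new FakePlanNode(List("a", "b"))
    println(s"member reference serializes: ${serializes(node.capturingThis)}")
    println(s"local copy serializes: ${serializes(node.capturingLocal)}")
  }
}
```

The same pattern applies to a mapPartitions closure: assigning the needed value to a local val outside the closure keeps the enclosing object out of the serialized task.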

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@jlowe jlowe added the SQL part of the SQL/Dataframe plugin label Nov 11, 2020
@jlowe jlowe added this to the Nov 23 - Dec 4 milestone Nov 11, 2020
@jlowe jlowe self-assigned this Nov 11, 2020
Member Author

jlowe commented Nov 11, 2020

Note that I believe there are still many instances where class constructor arguments are referenced within mapPartitions closures, which probably requires the entire class instance to be serialized. I suggest tackling those cases in a followup issue if that's agreeable.
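
A constructor argument used inside a method body is stored as a field of the instance, so a closure that reads it captures `this` just as a member reference does. The sketch below shows that under the same assumptions as before: plain Java serialization, Scala 2.12 or later, and hypothetical names (`FakeExec`, `targetSize`, `CtorArgCaptureDemo`) that are not taken from the plugin.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical operator class, intentionally NOT Serializable.
// `targetSize` is a plain constructor argument, not a declared val.
class FakeExec(targetSize: Long) {
  // Using the constructor argument in the closure makes the compiler keep it
  // as a field, and the closure reaches it through `this`.
  def usesCtorArg: Long => Boolean = rows => rows >= targetSize

  // A local copy lets the closure capture only the Long value.
  def usesLocalCopy: Long => Boolean = {
    val localTarget = targetSize
    rows => rows >= localTarget
  }
}

object CtorArgCaptureDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean = {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      oos.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      oos.close()
    }
  }
}
```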

Member Author

jlowe commented Nov 11, 2020

build

@jlowe jlowe merged commit 53e7976 into NVIDIA:branch-0.3 Nov 11, 2020
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@jlowe jlowe deleted the closure-cleanup branch September 10, 2021 15:41
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#1097)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>