GPU support for DynamicPruningExpression and InSubqueryExec #9091

Merged (8 commits) on Aug 28, 2023

Member

@jlowe jlowe commented Aug 22, 2023

This adds GPU support for DynamicPruningExpression and InSubqueryExec, which have been seen outside of the normal DPP filter expressions within a scan on some Spark platforms. The implementations are relatively straightforward, with one caveat: GpuInSubqueryExec needs to ensure the results are serialized to the executor. The CPU version relies on codegen to get the results to the executor (i.e., the results are in the generated source code sent to the executors), and we do not participate in codegen.

The support is only added for Spark 3.3+ on non-Databricks platforms. Databricks has a different number of arguments for InSubqueryExec that would need to be investigated, and Spark versions prior to 3.3 also have differing arguments. We don't see InSubqueryExec on any Spark platforms before Spark 3.3, so not having a GPU replacement on these platforms should not be an issue in practice.
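
To illustrate the serialization caveat, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the subquery results are held in a field that is not serialized along with the expression. The classes `TransientResults` and `SerializedResults` and the helper `roundTrip` are hypothetical names made up for this example and are not the plugin's actual code. The round trip through Java serialization stands in for shipping a task from the driver to an executor.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in: results live in a @transient field, so Java
// serialization drops them when the object is sent to another JVM.
class TransientResults extends Serializable {
  @transient private var values: Array[Int] = null
  def update(v: Array[Int]): Unit = { values = v }
  def hasResults: Boolean = values != null
  def contains(x: Int): Boolean = hasResults && values.contains(x)
}

// Hypothetical stand-in: results are ordinary state and travel with the object.
class SerializedResults extends Serializable {
  private var values: Array[Int] = null
  def update(v: Array[Int]): Unit = { values = v }
  def hasResults: Boolean = values != null
  def contains(x: Int): Boolean = hasResults && values.contains(x)
}

object SerializationSketch {
  // Round-trips an object through Java serialization, standing in for
  // sending it from the driver to an executor.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val lost = new TransientResults
    lost.update(Array(1, 2, 3))
    val kept = new SerializedResults
    kept.update(Array(1, 2, 3))
    println(s"transient field survives: ${roundTrip(lost).hasResults}")
    println(s"regular field survives: ${roundTrip(kept).hasResults}")
  }
}
```

In this sketch the first class loses its results after the round trip and the second keeps them, which is the property an expression needs when it cannot rely on generated code to carry the values.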

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@jlowe jlowe self-assigned this Aug 22, 2023
@jlowe
Member Author

jlowe commented Aug 22, 2023

build

@jlowe
Member Author

jlowe commented Aug 22, 2023

build

@jlowe
Member Author

jlowe commented Aug 22, 2023

Manually tested that this PR resolves the fallback issues in the NDS benchmark for DynamicPruningExpression and InSubqueryExec on AWS EMR 6.12.

revans2
revans2 previously approved these changes Aug 22, 2023
Collaborator

@revans2 revans2 left a comment

Really just some nits

@jlowe
Member Author

jlowe commented Aug 22, 2023

build

@jlowe
Member Author

jlowe commented Aug 22, 2023

build

@sameerz sameerz added the performance A performance related task/issue label Aug 23, 2023
@jlowe jlowe changed the title GPU support for DynamicPruningExpression and InSubqueryExec [databricks] GPU support for DynamicPruningExpression and InSubqueryExec Aug 23, 2023
@jlowe
Member Author

jlowe commented Aug 23, 2023

build

revans2
revans2 previously approved these changes Aug 23, 2023
@jlowe
Member Author

jlowe commented Aug 23, 2023

build

revans2
revans2 previously approved these changes Aug 23, 2023
@jlowe jlowe marked this pull request as draft August 23, 2023 20:53
@jlowe
Member Author

jlowe commented Aug 23, 2023

Seeing some odd results when benchmarking this at scale, so marking this as draft while it is investigated.

@jlowe
Member Author

jlowe commented Aug 25, 2023

The runs at scale are quite noisy. This seems to significantly help NDS queries 37 and 82, but the results for the remaining queries that had DynamicPruningExpression and InSubqueryExec fallbacks on AWS EMR 6.12 were too noisy to tell.

@jlowe jlowe marked this pull request as ready for review August 25, 2023 20:00
@jlowe
Member Author

jlowe commented Aug 25, 2023

build

@jlowe
Member Author

jlowe commented Aug 28, 2023

build

@jlowe jlowe merged commit 3717cd7 into NVIDIA:branch-23.10 Aug 28, 2023
27 of 28 checks passed
@jlowe jlowe deleted the insubquery branch August 28, 2023 18:12
3 participants