GPU support for DynamicPruningExpression and InSubqueryExec #9091
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
build |
build |
Manually tested that this PR resolves the fallback issues in the NDS benchmark for DynamicPruningExpression and InSubqueryExec on AWS EMR 6.12.
Really just some nits
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuDynamicPruningExpression.scala
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/GpuInSubqueryExecSuite.scala
build |
build |
build |
build |
Seeing some odd results when benchmarking this at scale, so marking this as draft while it is investigated.
The runs at scale are quite noisy. This seems to significantly help NDS queries 37 and 82, but the results for the remaining queries that had DynamicPruningExpression and InSubqueryExec fallbacks on AWS EMR 6.12 were too noisy to tell.
build |
build |
This adds GPU support for DynamicPruningExpression and InSubqueryExec, which on some Spark platforms have been seen outside of the normal DPP filter expressions within a scan. The implementations are relatively straightforward, with one caveat: GpuInSubqueryExec needs to ensure the results are serialized to the executor. The CPU version relies on codegen to get the results to the executor (i.e., the results are embedded in the generated source code sent to the executors), and the GPU version does not participate in codegen.
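The serialization caveat above can be illustrated with a minimal sketch. It assumes plain Java serialization as a stand-in for shipping an expression from the driver to an executor. `SubqueryResultHolder`, `SerializationSketch`, and `roundTrip` are hypothetical names invented for this illustration and are not the plugin's classes or Spark's API. The sketch only shows the general point: a field excluded from serialization arrives empty on the other side, while a regular field travels with the object.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an expression that holds subquery results.
// This is NOT the plugin's GpuInSubqueryExec; it only illustrates the idea.
class SubqueryResultHolder extends Serializable {
  // @transient fields are skipped by Java serialization, so the receiving
  // side sees null here.
  @transient var driverOnly: Array[Int] = _
  // A regular field is written out with the object and survives the trip.
  var shipped: Array[Int] = _

  def update(values: Array[Int]): Unit = {
    driverOnly = values
    shipped = values
  }
}

object SerializationSketch {
  // Round-trips an object through Java serialization, mimicking (under the
  // assumption above) how an object would be shipped to an executor.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val holder = new SubqueryResultHolder
    holder.update(Array(37, 82))
    val onExecutor = roundTrip(holder)
    println(onExecutor.driverOnly == null)     // the transient field was not shipped
    println(onExecutor.shipped.mkString(","))  // the regular field was shipped
  }
}
```

In this sketch, any code path that evaluates on the receiving side must read from the shipped field. A version that embeds results in generated source code would not need that field at all, which is the difference the description points out.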
Support is added only for Spark 3.3+ on non-Databricks platforms. Databricks has a different number of arguments for InSubqueryExec that would need to be investigated, and Spark versions prior to 3.3 also have differing arguments. We don't see InSubqueryExec on any Spark platforms before Spark 3.3, so not having a GPU replacement on those platforms should not be an issue in practice.