-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for skewed join optimization when AQE is enabled #536
Conversation
Signed-off-by: Andy Grove <andygrove@nvidia.com>
6b866aa
to
79cdc87
Compare
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks really good. Just have a few nits with naming and things
tests/src/test/scala/com/nvidia/spark/rapids/AdaptiveQueryExecSuite.scala
Show resolved
Hide resolved
tests/src/test/scala/com/nvidia/spark/rapids/AdaptiveQueryExecSuite.scala
Show resolved
Hide resolved
tests/src/test/scala/com/nvidia/spark/rapids/AdaptiveQueryExecSuite.scala
Show resolved
Hide resolved
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@revans2 Thanks for the review. I've added the comments as suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
build |
The test failed against Spark 3.1:
The test works for me locally against Spark 3.1 (probably a different version). I will look into this. |
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build |
Although the tests passed this time, they are not passing consistently, so please don't merge. |
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
5105e42
to
2ce1d45
Compare
build |
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build |
@revans2 @jlowe This is ready for review / re-review. The only change since the last approval is increasing the size of the skew partition in the test to fix the intermittent failures caused by compression sometimes creating a partition that fell below the threshold that would trigger the optimization. |
* Implement skewed join test for AQE Signed-off-by: Andy Grove <andygrove@nvidia.com> * fix merge conflict + use named parameters Signed-off-by: Andy Grove <andygrove@nvidia.com> * Add comments to address PR feedback Signed-off-by: Andy Grove <andygrove@nvidia.com> * split test into three tests Signed-off-by: Andy Grove <andygrove@nvidia.com> * rig the skew amount to compensate for compression when running on GPU Signed-off-by: Andy Grove <andygrove@nvidia.com> * add comment re adjusted skew amount Signed-off-by: Andy Grove <andygrove@nvidia.com>
* Implement skewed join test for AQE Signed-off-by: Andy Grove <andygrove@nvidia.com> * fix merge conflict + use named parameters Signed-off-by: Andy Grove <andygrove@nvidia.com> * Add comments to address PR feedback Signed-off-by: Andy Grove <andygrove@nvidia.com> * split test into three tests Signed-off-by: Andy Grove <andygrove@nvidia.com> * rig the skew amount to compensate for compression when running on GPU Signed-off-by: Andy Grove <andygrove@nvidia.com> * add comment re adjusted skew amount Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com> Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
This ports a skewed join test over from Spark and closes #515.
The original test had a skew size on the right-hand side that was only just over the threshold that would cause the skew join optimization to become effective. This worked intermittently on GPU because compression would sometimes cause the resulting partition to fall below the threshold, depending on the ordering of data within the columns being written to the shuffle files. The test is modified to use a larger skew partition to avoid this.