Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for skewed join optimization when AQE is enabled #536

Merged
merged 8 commits into from
Aug 21, 2020

Conversation

andygrove
Copy link
Contributor

@andygrove andygrove commented Aug 10, 2020

This ports a skewed join test over from Spark and closes #515.

The original test had a skew size on the right-hand side that was only just over the threshold that would cause the skew join optimization to become effective. This worked intermittently on GPU because compression would sometimes cause the resulting partition to fall below the threshold, depending on the ordering of data within the columns being written to the shuffle files. The test is modified to use a larger skew partition to avoid this.

@andygrove andygrove added the test Only impacts tests label Aug 10, 2020
@andygrove andygrove added this to the Aug 3 - Aug 14 milestone Aug 10, 2020
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove
Copy link
Contributor Author

build

@andygrove andygrove changed the title [WIP] Add test for skewed join optimization when AQE is enabled Add test for skewed join optimization when AQE is enabled Aug 13, 2020
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks really good. Just have a few nits with naming and things

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove
Copy link
Contributor Author

@revans2 Thanks for the review. I've added the comments as suggested.

revans2
revans2 previously approved these changes Aug 14, 2020
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@revans2
Copy link
Collaborator

revans2 commented Aug 14, 2020

build

@andygrove
Copy link
Contributor Author

The test failed against Spark 3.1:

09:45:23  AdaptiveQueryExecSuite:
09:45:25  - SPARK-29544: adaptive skew join with different join types *** FAILED ***
09:45:25    ArrayBuffer() had length 0 instead of expected length 1 (AdaptiveQueryExecSuite.scala:91)

The test works for me locally against Spark 3.1 (probably a different version). I will look into this.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove changed the title Add test for skewed join optimization when AQE is enabled [WIP] Add test for skewed join optimization when AQE is enabled Aug 18, 2020
@andygrove
Copy link
Contributor Author

build

@andygrove
Copy link
Contributor Author

Although the tests passed this time, they are not passing consistently, so please don't merge.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove
Copy link
Contributor Author

build

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove
Copy link
Contributor Author

build

@andygrove andygrove changed the title [WIP] Add test for skewed join optimization when AQE is enabled Add test for skewed join optimization when AQE is enabled Aug 21, 2020
@andygrove
Copy link
Contributor Author

@revans2 @jlowe This is ready for review / re-review. The only change since the last approval is increasing the size of the skew partition in the test to fix the intermittent failures caused by compression sometimes creating a partition that fell below the threshold that would trigger the optimization.

@jlowe jlowe merged commit e65684a into NVIDIA:branch-0.2 Aug 21, 2020
@andygrove andygrove deleted the skewed-join-test branch December 17, 2020 15:24
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Implement skewed join test for AQE

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* fix merge conflict + use named parameters

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add comments to address PR feedback

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* split test into three tests

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* rig the skew amount to compensate for compression when running on GPU

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* add comment re adjusted skew amount

Signed-off-by: Andy Grove <andygrove@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Implement skewed join test for AQE

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* fix merge conflict + use named parameters

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add comments to address PR feedback

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* split test into three tests

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* rig the skew amount to compensate for compression when running on GPU

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* add comment re adjusted skew amount

Signed-off-by: Andy Grove <andygrove@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Write tests for AQE skewed join optimization
4 participants