Optimize shuffle before coalesce #338

Open
andygrove opened this issue Oct 10, 2022 · 3 comments
Labels
enhancement New feature or request performance

Comments

@andygrove
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This looks inefficient. We are writing lots of shuffle files, reading them, and coalescing them into a single partition. Can we do the coalesce step before the shuffle write in this case?

[Image: opt-coalesce]

Describe the solution you'd like
Optimize

Describe alternatives you've considered
None

Additional context
None

@andygrove added the enhancement and performance labels Oct 10, 2022
@mingmwang
Contributor

Could you please share the SQL to reproduce the issue?

@andygrove
Member Author

This is from benchmark q2, but I now think I may have been mistaken about this being an issue. The final stage has to coalesce into a single partition for the sort, and we still want the parallelism in the previous stage.
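To illustrate the trade-off described above, here is a minimal toy sketch in Python. It is not Ballista or DataFusion code: `stage_work`, `coalesce`, and the two pipeline functions are hypothetical names, and a simple filter stands in for whatever the upstream stage actually does. It only shows that both orderings produce the same result, while coalescing first would reduce the upstream work to a single task.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def stage_work(partition):
    """Hypothetical stand-in for the upstream stage's per-partition work (a filter)."""
    return [row for row in partition if row % 2 == 0]


def coalesce(partitions):
    """Merge many partitions into a single partition."""
    return list(chain.from_iterable(partitions))


def parallel_then_coalesce(partitions):
    # The upstream stage runs once per partition, so it can use many tasks.
    with ThreadPoolExecutor() as pool:
        processed = list(pool.map(stage_work, partitions))
    # Only the final stage needs a single partition for the global sort.
    return sorted(coalesce(processed))


def coalesce_then_process(partitions):
    # Coalescing first leaves one partition, so the upstream work
    # becomes a single task with no parallelism.
    return sorted(stage_work(coalesce(partitions)))


partitions = [[5, 2, 8], [7, 4, 1], [6, 3, 0]]
assert parallel_then_coalesce(partitions) == coalesce_then_process(partitions)
```

In this toy model the extra cost of the first ordering is the intermediate hand-off between stages (the shuffle files in the real system), which is the price paid for keeping the upstream stage parallel.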

@mingmwang
Contributor

OK, if it is not a bug, I think you can close the issue.

2 participants