[FEA] Can we reduce the cost of the partial sort agg heuristic #8623

Open
revans2 opened this issue Jun 27, 2023 · 0 comments
Labels
feature request: New feature or request
performance: A performance related task/issue

revans2 commented Jun 27, 2023

Is your feature request related to a problem? Please describe.
I don't know how important this is. When I was doing performance testing, I didn't see many cases where the cost of this heuristic significantly impacted overall performance. But it is also not free, which is why I added a metric for it in #8618.

One such example is a simple SUM on an int where the key is small and has a decent amount of overlap. Because the SUM of an int is a long, and because there are only two columns, it looks like the size of the output is growing by more than 10%, but in practice that is not true, and the cost of computing the distinct count might be more than the cost of doing the aggregation itself. I saw a 9% performance degradation in this one case. I don't know if my magic number of 10% should be closer to 200% or something, or if there is a better way to decide that the number of aggregations is just too small to even bother with the heuristic.
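
To make the arithmetic in that example concrete, here is a minimal sketch. It assumes the growth check compares estimated per-row byte widths of the input and the aggregated output and ignores validity masks; the helper names (`row_width`, `estimated_growth`, `trips_threshold`) are made up for illustration and are not the plugin's actual code.

```python
# Fixed-width byte sizes for the two types in the example.
# Assumption: validity masks and any other overhead are ignored.
TYPE_WIDTH_BYTES = {"int": 4, "long": 8}


def row_width(schema):
    """Estimated bytes per row for a list of column type names."""
    return sum(TYPE_WIDTH_BYTES[t] for t in schema)


def estimated_growth(input_schema, output_schema):
    """Fractional per-row growth going from the input to the aggregated output."""
    w_in = row_width(input_schema)
    return (row_width(output_schema) - w_in) / w_in


def trips_threshold(input_schema, output_schema, threshold=0.10):
    """True if the estimated growth exceeds the (hypothetical) threshold."""
    return estimated_growth(input_schema, output_schema) > threshold


# Key is an int, value is an int; SUM(int) produces a long.
input_schema = ["int", "int"]    # 4 + 4 = 8 bytes per row
output_schema = ["int", "long"]  # 4 + 8 = 12 bytes per row

growth = estimated_growth(input_schema, output_schema)
print(f"estimated per-row growth: {growth:.0%}")
print("trips 10% threshold: ", trips_threshold(input_schema, output_schema, 0.10))
print("trips 200% threshold:", trips_threshold(input_schema, output_schema, 2.00))
```

Under these assumptions the two-column case always looks like 50% growth per row, regardless of how much the keys overlap, so a 10% cutoff would trigger the distinct count while a cutoff near 200% would not.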

@revans2 added the "feature request" and "? - Needs Triage" labels on Jun 27, 2023
@mattahrens added the "performance" label and removed the "? - Needs Triage" label on Jul 5, 2023