[FEA] Can we reduce the cost of the partial sort agg heuristic #8623

revans2 · 2023-06-27T22:09:31Z

Is your feature request related to a problem? Please describe.
I don't know how important this is. In my performance testing I didn't see many cases where overall performance was significantly impacted by the cost of this heuristic. But it is also not free, which is why I added a metric for it in #8618.

One example where it does show up is a simple SUM on an int, where the key is small and has a decent amount of overlap. Because the SUM of an int is a long, and because there are only two columns, it looks like the size of the output is growing by more than 10%, but in practice that is not true, and the cost of computing the distinct count might be more than the cost of doing the aggregation itself. I saw a 9% performance degradation in this one case. I don't know if my magic number of 10% should be closer to 200% or something, or if there is a better way to decide that the number of aggregations is just too small to even bother with the heuristic.
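A rough sketch of the arithmetic behind this case may help. It assumes a simplified byte-size comparison between the input and the output of the first aggregation pass; the function names, the exact form of the check, and the choice to ignore validity masks are all assumptions for illustration, not the plugin's actual code.

```python
# Illustrative model only: the names and the exact size check are
# assumptions, not the spark-rapids implementation.
INT_BYTES = 4   # Spark IntegerType
LONG_BYTES = 8  # Spark LongType, the result type of SUM over an int


def size_growth_ratio(input_rows: int, output_rows: int) -> float:
    """Output bytes / input bytes for SELECT key, SUM(value) ... GROUP BY key,
    where key and value are both ints. Validity masks are ignored."""
    input_bytes = input_rows * (INT_BYTES + INT_BYTES)     # (int key, int value)
    output_bytes = output_rows * (INT_BYTES + LONG_BYTES)  # (int key, long sum)
    return output_bytes / input_bytes


def looks_like_growth(input_rows: int, output_rows: int,
                      threshold: float = 0.10) -> bool:
    """True when the output appears to grow by more than `threshold`
    (0.10 means 10%) relative to the input."""
    return size_growth_ratio(input_rows, output_rows) > 1.0 + threshold


if __name__ == "__main__":
    # 1000 input rows reduced to 800 groups: fewer rows, yet more bytes.
    print(size_growth_ratio(1000, 800))
    print(looks_like_growth(1000, 800))
    # With a 200% threshold the same shape no longer trips the check.
    print(looks_like_growth(1000, 800, threshold=2.0))
```

Under this model each output row is 12 bytes against 8 bytes of input, so the output looks larger unless the row count drops by more than about 27%. With a 200% threshold this particular two-column shape could never trip the check, because the ratio tops out at 1.5 when the output has no more rows than the input.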

revans2 added the "feature request" and "? - Needs Triage" labels on Jun 27, 2023

mattahrens added the "performance" label and removed the "? - Needs Triage" label on Jul 5, 2023
