Is your feature request related to a problem? Please describe.
In Spark, a distinct aggregation (e.g., count distinct) introduces an Expand operator before the aggregation. Most of the work done there is populating null columns. However, the current implementation shows a significant performance issue (e.g., it takes 9X longer than doing the same work on 16 CPU cores). We need to bring performance up to at least a similar level.
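To make the cost concrete, here is a minimal pure-Python sketch of what an expand step does, assuming a query with two distinct aggregates over columns `x` and `y` grouped by `k`. It is not Spark's or the plugin's implementation; the `expand` helper, the `gid` column name, and the sample rows are illustrative assumptions.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def expand(rows: List[Row], projections: List[Set[str]]) -> List[Row]:
    """Emit one output row per (input row, projection).

    Columns that a projection does not keep are filled with None,
    and a group id ("gid", a hypothetical name) records which
    projection produced the row.
    """
    out: List[Row] = []
    for row in rows:
        for gid, kept in enumerate(projections):
            expanded = {name: (row[name] if name in kept else None) for name in row}
            expanded["gid"] = gid
            out.append(expanded)
    return out


# Two input rows; one projection per distinct aggregate.
rows = [{"k": "a", "x": 1, "y": 10}, {"k": "a", "x": 1, "y": 20}]
projections = [{"k", "x"}, {"k", "y"}]
expanded = expand(rows, projections)

# Every expanded row carries exactly one null-filled column in this example.
null_cells = sum(v is None for r in expanded for v in r.values())
```

The output has `len(rows) * len(projections)` rows, and the number of null-filled cells grows with both, which is why the cost of producing null columns matters.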
Describe the solution you'd like
#10560 already describes one approach. Beyond that, the nsys trace shows that creating many null columns has a significant negative impact on overall performance. @binmahone also suggested a further optimization: adding a coalesce after the expand to increase the batch size for the aggregation.
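The two ideas (reusing null columns and coalescing after expand) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the approach from #10560 or the plugin's code: `NullColumnCache`, `expand_batch`, `coalesce`, and the `gid` column are hypothetical names, batches are modeled as dicts of lists, and sharing a null column is only safe if columns are treated as immutable.

```python
from typing import Dict, List, Optional, Set

Column = List[Optional[int]]
Batch = Dict[str, Column]


class NullColumnCache:
    """Hand out one shared all-null column per row count (hypothetical helper)."""

    def __init__(self) -> None:
        self._cache: Dict[int, Column] = {}
        self.allocations = 0

    def get(self, num_rows: int) -> Column:
        col = self._cache.get(num_rows)
        if col is None:
            col = [None] * num_rows  # allocate only on the first request
            self._cache[num_rows] = col
            self.allocations += 1
        return col


def expand_batch(batch: Batch, projections: List[Set[str]], nulls: NullColumnCache) -> List[Batch]:
    """Produce one output batch per projection, reusing cached null columns."""
    num_rows = len(next(iter(batch.values())))
    out: List[Batch] = []
    for gid, kept in enumerate(projections):
        expanded = {name: (col if name in kept else nulls.get(num_rows))
                    for name, col in batch.items()}
        expanded["gid"] = [gid] * num_rows
        out.append(expanded)
    return out


def concat(batches: List[Batch]) -> Batch:
    """Concatenate batches column by column."""
    return {name: [v for b in batches for v in b[name]] for name in batches[0]}


def coalesce(batches: List[Batch], target_rows: int) -> List[Batch]:
    """Merge consecutive batches until each has at least target_rows rows."""
    out: List[Batch] = []
    pending: List[Batch] = []
    pending_rows = 0
    for b in batches:
        pending.append(b)
        pending_rows += len(b["gid"])
        if pending_rows >= target_rows:
            out.append(concat(pending))
            pending, pending_rows = [], 0
    if pending:  # flush the remainder
        out.append(concat(pending))
    return out


batch = {"k": [1, 1, 2], "x": [5, 5, 6], "y": [7, 8, 9]}
nulls = NullColumnCache()
expanded = expand_batch(batch, [{"k", "x"}, {"k", "y"}], nulls)
coalesced = coalesce(expanded, target_rows=6)
```

Here both projections need a three-row null column, but only one is allocated, and the two small expanded batches are merged into a single six-row batch before they would reach the aggregation.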
winningsix changed the title from "[FEA] Optimize count distinct performance optimization" to "[FEA] Optimize count distinct performance optimization with null columns reuse and post expand coalesce" on May 13, 2024