Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve unparsing to handle aggregations with grouping set #36

Merged
merged 3 commits into from
Oct 1, 2024

Conversation

sgrebnov
Copy link

@sgrebnov sgrebnov commented Oct 1, 2024

Which issue does this PR close?

PR improves Aggregation unparsing logic to handle plans with grouping sets. When Aggregation is created (below) grouping_set_to_exprlist is used to retrieve grouping set expressions and populate schema (qualified_fields)

https://github.com/apache/datafusion/blob/f54712d8c2545fc0dd60cdc693704992e9d82eef/datafusion/expr/src/logical_plan/plan.rs#L2953C9-L2977C6

        let group_expr = enumerate_grouping_sets(group_expr)?;

        let is_grouping_set = matches!(group_expr.as_slice(), [Expr::GroupingSet(_)]);

        let grouping_expr: Vec<&Expr> = grouping_set_to_exprlist(group_expr.as_slice())?;

        let mut qualified_fields = exprlist_to_fields(grouping_expr, &input)?;
..
        qualified_fields.extend(exprlist_to_fields(aggr_expr.as_slice(), &input)?);

        let schema = DFSchema::new_with_metadata(
            qualified_fields,
            input.schema().metadata().clone(),
        )?;

        Self::try_new_with_schema(input, group_expr, aggr_expr, Arc::new(schema))
    }

PR adds the same step/logic to correctly retrieve all aggregation expressions when unproject columns. W/o this logic current unproject logic fails as schema does not match agg.group_exp + agg.aggr_expr (must be grouping_set_to_exprlist(agg.group_exp) + agg.aggr_expr) : can't get and then unwrap unprojected expression by index: https://github.com/apache/datafusion/blob/f54712d8c2545fc0dd60cdc693704992e9d82eef/datafusion/sql/src/unparser/utils.rs#L92

Concern

grouping_set_to_exprlist is used directly in find_agg_exp so invoked for each column to unproject (in case of grouping set). This could be optimized to do this only once. Decided to keep this logic this way as pre-calculating and passing such expressions as argument or changing find_agg_exp signature will make code less clean and grouping_set_to_exprlist does NOT create expressions copies (operates by Vec<&Expr>)

@sgrebnov sgrebnov changed the title Sgrebnov/improve unparsing grouping set Improve unparsing to handle aggregations with grouping set Oct 1, 2024
@sgrebnov sgrebnov self-assigned this Oct 1, 2024
@sgrebnov sgrebnov merged commit b15debb into spiceai-42 Oct 1, 2024
@sgrebnov sgrebnov deleted the sgrebnov/improve-unparsing-grouping-set branch October 1, 2024 23:45
@sgrebnov
Copy link
Author

sgrebnov commented Oct 1, 2024

Created PR to datafusion combining this and previous improvement. Merged this PR to unblock TPC-DS queries, will add required changes based on review/feedback if any:
apache#12705

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants