Refactor TreeNode recursions #7942

peter-toth · 2023-10-26T17:43:57Z

Which issue does this PR close?

This PR is a proof of concept to refactor TreeNode recursions and offer better alternatives to the current tree visit and transform/rewrite functions. Currently the PR contains multiple related ideas that can be split into smaller changes if any of them look reasonable to the community.

Rationale for this change

This PR introduces TreeNodeTransformer trait (to replace TreeNodeRewriter):
```
pub trait TreeNodeTransformer: Sized {
    /// The node type which is visitable.
    type Node: TreeNode;

    /// Invoked before any inner children or children of a node are modified.
    fn pre_transform(&mut self, node: &mut Self::Node) -> Result<TreeNodeRecursion>;

    /// Invoked after all inner children and children of a node are modified.
    fn post_transform(&mut self, node: &mut Self::Node) -> Result<TreeNodeRecursion>;
}
```
The main change in the behavior of TreeNodeTransformer compared to the old TreeNodeRewriter is that the pre_transform() and post_transform() methods mutate the nodes in place (node: &mut Self::Node).
The advantage over the value consuming and producing fn mutate(&mut self, node: Self::N) -> Result<Self::N> is that the self mutating behaviour encourages developers to reuse the existing objects / memory allocations and so to write more efficient transformation closures.
The current implementation of fn map_children<F>(self, transform: F) -> Result<Self> method of Expr is a good example of the issue:
https://github.com/apache/arrow-datafusion/blob/4578f3daeefd84305d3f055c243827856e2c036c/datafusion/expr/src/tree_node/expr.rs#L153-L425
An Expr tree uses Vecs and Boxes. The problem is that a TreeNode.rewrite() call on an expression tree basically creates a whole new tree, regardless of whether any change was made to any node, due to how transform_boxed(), transform_option_box(), transform_option_vec() and transform_vec() work.
As the type of the tree node can't change during rewrite and Rust prevents data races at compile time, an in place mutation seems more reasonable for such transformations. Also, the size of a reference to a tree node is usually smaller than the size of a node so deeper recursion can be supported with the same stack size.
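To make the difference between the two API shapes concrete, here is a minimal sketch on a toy tree. The `Expr` enum and the `flip_owned` / `flip_in_place` functions are hypothetical names invented for this illustration and are not DataFusion's types; the sketch only contrasts a value-consuming transform with an `&mut` based one and makes no performance claim.

```rust
// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(bool),
    And(Box<Expr>, Box<Expr>),
}

// Value-consuming style: each node is taken apart and a new node is built.
fn flip_owned(e: Expr) -> Expr {
    match e {
        Expr::Lit(b) => Expr::Lit(!b),
        Expr::And(l, r) => Expr::And(Box::new(flip_owned(*l)), Box::new(flip_owned(*r))),
    }
}

// In-place style: the existing boxes are kept and only the leaves change.
fn flip_in_place(e: &mut Expr) {
    match e {
        Expr::Lit(b) => *b = !*b,
        Expr::And(l, r) => {
            flip_in_place(l);
            flip_in_place(r);
        }
    }
}

fn main() {
    let tree = Expr::And(Box::new(Expr::Lit(true)), Box::new(Expr::Lit(false)));
    let owned = flip_owned(tree.clone());
    let mut in_place = tree;
    flip_in_place(&mut in_place);
    // Both styles produce the same tree.
    assert_eq!(owned, in_place);
    println!("{:?}", in_place);
}
```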

Please note that not all TreeNode trees suffer from the above issue. E.g. the LogicalPlan tree uses Vecs and Arcs. Cloning an Arc is cheap compared to a Box. Actually, in this case the proposed self mutating transform doesn't bring much improvement as an Arc can't be mutated and a new one needs to be created anyway, but Vecs can be reused.

Update: The above analysis about memory reuse is not correct. @sadboy showed in Perf: avoid unnecessary allocations when transforming Expr #8591 that current expression transformation functions do reuse memory due to Rust compiler optimizations.
But the suggested &mut Self::Node based transform functions still seem to make sense as Refactor TreeNode recursions #7942 (comment) and Refactor TreeNode recursions #7942 (comment) mini benchmarks show considerable performance improvement.
Let's wait for Benchmarks for planning queries #8638 to measure more concrete effects of the suggested changes.
This PR unifies the 2 recursion related enums (VisitRecursion and RewriteRecursion) as they are a bit confusing.
Currently VisitRecursion controls TreeNode.apply() and TreeNode.visit() and RewriteRecursion controls TreeNode.rewrite(). The Stop element behaves differently in the two: it fully stops the recursion in case of visit, but it doesn't do so in case of rewrite. Also, the Skip element prevents recursion into children in case of visit, but it doesn't in case of rewrite.
In this PR I'm proposing to use a new TreeNodeRecursion that can be used with both visit and transform/rewrite:
```
pub enum TreeNodeRecursion {
    /// Continue the visit to the next node.
    Continue,

    /// Prune the current subtree.
    /// If a preorder visit of a tree node returns [`TreeNodeRecursion::Prune`] then inner
    /// children and children will not be visited and postorder visit of the node will not
    /// be invoked.
    Prune,

    /// Stop recursion on current tree.
    /// If recursion runs on an inner tree then returning [`TreeNodeRecursion::Stop`] doesn't
    /// stop recursion on the outer tree.
    Stop,

    /// Stop recursion on all (including outer) trees.
    StopAll,
}
```
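The intended difference between Continue, Prune and Stop can be sketched with a toy pre-order visit. Everything here is an assumption made for illustration: the `Node` type, `visit_down`, `sample_tree` and `visited` are hypothetical helpers, the enum is a reduced copy without StopAll, and inner trees are not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Prune,
    Stop,
}

struct Node {
    name: &'static str,
    children: Vec<Node>,
}

// Pre-order visit: Prune skips the children of the current node,
// Stop ends the whole traversal.
fn visit_down<F>(node: &Node, f: &mut F) -> TreeNodeRecursion
where
    F: FnMut(&Node) -> TreeNodeRecursion,
{
    match f(node) {
        TreeNodeRecursion::Stop => return TreeNodeRecursion::Stop,
        TreeNodeRecursion::Prune => return TreeNodeRecursion::Continue,
        TreeNodeRecursion::Continue => {}
    }
    for child in &node.children {
        if visit_down(child, f) == TreeNodeRecursion::Stop {
            return TreeNodeRecursion::Stop;
        }
    }
    TreeNodeRecursion::Continue
}

// a -> (b -> d), c
fn sample_tree() -> Node {
    Node {
        name: "a",
        children: vec![
            Node { name: "b", children: vec![Node { name: "d", children: vec![] }] },
            Node { name: "c", children: vec![] },
        ],
    }
}

// Collects the names of the nodes that were actually visited.
fn visited(decide: impl Fn(&str) -> TreeNodeRecursion) -> Vec<&'static str> {
    let tree = sample_tree();
    let mut seen = Vec::new();
    let _ = visit_down(&tree, &mut |n: &Node| {
        seen.push(n.name);
        decide(n.name)
    });
    seen
}

fn main() {
    use TreeNodeRecursion::*;
    println!("{:?}", visited(|_| Continue));
    println!("{:?}", visited(|n| if n == "b" { Prune } else { Continue }));
    println!("{:?}", visited(|n| if n == "b" { Stop } else { Continue }));
}
```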
This PR also proposes to remove RewriteRecursion::Mutate as it doesn't seem to add any value. The pre_visit() method during rewrite could return the modified node (or mutate the node in place as suggested in 1.) and return how the recursion should continue.
This PR adds a new default value Nop to Expr enum.
This new expression does nothing and will not occur in any valid plan, but it is sometimes useful to be able to replace an expression with a dummy one. Please see Expr.unalias() for an example.

Update: This is not needed. See discussion: Refactor TreeNode recursions #7942 (comment)
This PR proposes to add transform_down_with_payload(), transform_up_with_payload() and transform_with_payload() methods to TreeNodes to be able to propagate down/up additional payloads during transformation. These new methods make EnforceSorting, EnforceDistribution and similar rules much simpler as there is no need to create special tree nodes like SortPushDown and PlanWithKeyRequirements.

Update: This idea is moved to issue: Get rid of special TreeNodes #8663 and PR: Transform with payload #8664
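As a rough sketch of the payload idea, the following toy propagates a value from each parent to its children during a top-down transform. The signature shown is an assumption for illustration only and is not taken from the PR; `Node` is a hypothetical tree type, and the closure returns one payload per child.

```rust
#[derive(Debug, PartialEq)]
struct Node {
    label: String,
    children: Vec<Node>,
}

// Hypothetical shape: `f` mutates the node using the payload it received
// and returns the payloads to hand down to the node's children, in order.
fn transform_down_with_payload<P, F>(node: &mut Node, payload: P, f: &mut F)
where
    F: FnMut(&mut Node, P) -> Vec<P>,
{
    let child_payloads = f(node, payload);
    for (child, p) in node.children.iter_mut().zip(child_payloads) {
        transform_down_with_payload(child, p, f);
    }
}

fn main() {
    let mut tree = Node {
        label: "root".to_string(),
        children: vec![Node { label: "leaf".to_string(), children: vec![] }],
    };
    // The payload here is simply the depth of the node.
    transform_down_with_payload(&mut tree, 0usize, &mut |n: &mut Node, depth: usize| {
        n.label = format!("{}@{}", n.label, depth);
        vec![depth + 1; n.children.len()]
    });
    println!("{:?}", tree);
}
```

With a payload passed alongside the node, the extra state does not need to live in a wrapper tree node.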

What changes are included in this PR?

This PR:

Adds TreeNodeTransformer and TreeNode.transform() method as a better alternative to TreeNodeRewriter and TreeNode.rewrite(). Some of the TreeNodeRewriter usages are refactored to TreeNodeTransformer as examples, the remaining occurrences can be refactored in follow-up PRs if this PR gets accepted.
Modifies TreeNode.transform_up() and TreeNode.transform_down() methods to be self mutating ones and refactors a few usages as examples. (The old methods are still kept as TreeNode.transform_up_old() and TreeNode.transform_down_old()).
Adds TreeNodeRecursion enum to control tree recursions. Modifies TreeNode methods to use the new enum.

Are these changes tested?

Using existing tests.

Are there any user-facing changes?

No.

peter-toth · 2023-10-26T17:47:34Z

#5609 has been closed so I can open a new issue for the PR if needed.

alamb · 2023-10-27T21:13:01Z

It might help to open a new ticket with a description of what you hope to do, but if you already have a PR that is probably good enough to get feedback

Thank you for helping to make DataFusion better 🙏

peter-toth · 2023-10-27T21:28:46Z

Thanks @alamb! I will open a ticket and describe my goals. I will also update this PR with a few fixes in a few days before review can start.
One question, is there a code style guide for this project?

alamb · 2023-10-29T11:38:42Z

One question, is there a code style guide for this project?

@peter-toth what we have is here: https://arrow.apache.org/datafusion/contributor-guide/index.html

I would say we follow clippy and rustfmt and in general try to use a style consistent with existing code, but there is nothing more formal that I know of

peter-toth · 2023-12-15T14:56:58Z

@alamb, I haven't created any issues yet, but wanted to put together a POC PR with some of the changes I would like to propose.

I think this is a bit related to #7775 and to this planning performance epic: #5637.

If any of the above make sense I'm happy to create dedicated issues and then update or split this PR.

alamb · 2023-12-18T19:17:15Z

Thanks @peter-toth -- I will try and review this proposal later today or tomorrow

alamb

Thank you very much for this PR @peter-toth. While I likely did not grok all the nuances of this PR and its implications, I really like where it is headed.

Thoughts about API breakages

Insofar as we can minimize or spread out over time the breaking API changes required, I think that would help roll out these changes in a way that users can accept. I also think we can take most/all of the ideas in this PR and minimize the breaking changes.

Some potential ways to keep the API change smaller:

Keep transform_up called transform_up rather than renaming it to transform_up_old
Add a type alias such as type VisitRecursion = TreeNodeRecursion so users don't have to change their code
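The type alias suggestion could look roughly like the sketch below. The enum here is a reduced stand-in, `legacy_callback` is a hypothetical piece of user code, and the `#[deprecated]` attribute is an optional addition that was not part of the suggestion. Note that an alias only keeps the old type name working; a renamed variant (for example Skip becoming Prune) would still need a code change.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TreeNodeRecursion {
    Continue,
    Prune,
    Stop,
}

// Old name kept as an alias so existing signatures keep compiling.
#[deprecated(note = "use TreeNodeRecursion instead")]
pub type VisitRecursion = TreeNodeRecursion;

// Hypothetical user code written against the old name.
#[allow(deprecated)]
fn legacy_callback() -> VisitRecursion {
    VisitRecursion::Continue
}

fn main() {
    // The alias and the new enum are the same type.
    let r: TreeNodeRecursion = legacy_callback();
    println!("{:?}", r);
}
```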

Things I am not sure about

I am not sure about introducing Expr::Nop -- the thinking is that then one has to check at runtime that no Expr::Nop is left, rather than using Option<Expr> where you can have the compiler check for you

I think it would be good to get some feedback from the broader community as well

cc @liukun4515 and @yahoNanJing who were instrumental in implementing the current system. cc @sadboy who has been looking into improving planning performance as well - this could be part of the story

@mustafasrepo and @metesynnada and @crepererum perhaps you have insights to share as well

alamb · 2023-12-19T18:30:25Z

datafusion/core/src/physical_optimizer/combine_partial_final_agg.rs

-                                        AggregateMode::Partial
-                                    ) && can_combine(
+        plan.transform_down(&mut |plan| {
+            plan.clone()


it is certainly nice to avoid the clone

alamb · 2023-12-19T18:34:01Z

datafusion/core/src/physical_optimizer/sort_pushdown.rs

-/// computational cost by pushing down `SortExec`s through some executors.
-///
-/// [`EnforceSorting`]: crate::physical_optimizer::enforce_sorting::EnforceSorting
-#[derive(Debug, Clone)]


This was inlined into pushdown_sort via transform_down_with_payload which I think is a nice change 👍

alamb · 2023-12-19T18:35:00Z

datafusion/expr/src/expr.rs

 pub enum Expr {
+    #[default]


Can you explain a bit about the need / usecase for Expr::Nop? Could the same be accomplished with Option<Expr>?

The reason why I added Expr::Nop is to be able to refactor unalias() to be a self mutating one: https://github.com/apache/arrow-datafusion/pull/7942/files#diff-204cfc4f999c3d12dc065f323cb952fb0ecb33c5570eed8dc1fb52b806e87004R960. I needed a dummy Expr for mem::take(), but as you showed in https://github.com/apache/arrow-datafusion/pull/7942/files#r1431799873 unalias() doesn't need to be self mutating.

But, maybe having a default dummy value of Expr is still useful in some cases, like in @sadboy's PR: https://github.com/apache/arrow-datafusion/pull/8591/files#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R388-R389, https://github.com/apache/arrow-datafusion/pull/8591/files#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R427-R428, Expr::Wildcard { qualifier: None } is used for such purposes.

Clarification: is nop "no op(eration)"? If so, could we use the more industry standard Noop? A quick google search seems to be evidence for its prevalence:

https://www.google.com/search?client=firefox-b-1-d&q=abbreviate+no+operation

Yes, that was my intention. Noop sounds good to me.

needed a dummy Expr

If it's just for this purpose, then I think the Null literal should serve it well enough:

impl Default for Expr { fn default() -> Self { Expr::Literal(ScalarValue::Null) } }

As @alamb mentioned above, there is a (relatively high) cost to introducing new Expr variants, as it increases potential invalid states along every step of the analyzer/optimizer pipeline.

Using Null literal sounds good to me.

Fixed in 6cd5d39.
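For context, this is roughly how a `Default` value enables `mem::take()` in a self mutating unalias. The sketch uses a hypothetical toy `Expr` whose `Null` variant stands in for `Expr::Literal(ScalarValue::Null)`; `unalias_in_place` is an illustrative name, not the method in the PR.

```rust
use std::mem;

// Toy expression type (hypothetical); `Null` plays the role of the dummy value.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Null,
    Column(String),
    Alias(Box<Expr>, String),
}

impl Default for Expr {
    fn default() -> Self {
        Expr::Null
    }
}

// Replaces an alias node with its inner expression without cloning it.
fn unalias_in_place(expr: &mut Expr) {
    if let Expr::Alias(inner, _) = expr {
        // `mem::take` moves the inner expression out and leaves the default behind.
        let taken = mem::take(&mut **inner);
        *expr = taken;
    }
}

fn main() {
    let mut e = Expr::Alias(Box::new(Expr::Column("a".to_string())), "x".to_string());
    unalias_in_place(&mut e);
    println!("{:?}", e);
}
```

The dummy value only exists for a moment inside the old alias node, which is dropped when `*expr` is overwritten.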

alamb · 2023-12-19T18:35:52Z

datafusion/expr/src/expr.rs

-        match self {
-            Expr::Alias(alias) => alias.expr.as_ref().clone(),
-            _ => self,
+    pub fn unalias(&mut self) -> &mut Self {


Nice spot -- I think we can avoid a copy like this too: #8588

alamb · 2023-12-19T18:41:17Z

datafusion/expr/src/logical_plan/plan.rs

-    /// children.
-    pub fn inspect_expressions<F, E>(self: &LogicalPlan, mut f: F) -> Result<(), E>
+    /// Apply `f` on expressions of the plan node.
+    /// `f` is not allowed to return [`TreeNodeRecursion::Prune`].


why can't it return Prune?

Here we basically iterate over the expressions in a logical plan tree node and apply f on each. Those expressions are the root nodes of expressions trees and the trees have no connection with each other. (Maybe we can think of them as siblings?)

So actually, I'm a bit uncertain about what we should do if f returns Prune here. (It is clear how to proceed with the other TreeNodeRecursion elements.) Shall we handle Prune as Continue and proceed to the next expression?

alamb · 2023-12-19T18:44:23Z

datafusion/expr/src/logical_plan/plan.rs

+                    .node
+                    .expressions()
+                    .iter()
+                    .for_each_till_continue(f)


this for_each_till_continue is an interesting concept

I was trying to define clear APIs on TreeNodes. After f4d28e0 we have

visit(), visit_down(),

transform(), transform_down(), transform_up(),

transform_with_payload(), transform_down_with_payload() and transform_up_with_payload()

functions on TreeNode and all can be controlled with TreeNodeRecursion.
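For readers wondering about the for_each_till_continue helper discussed above, one plausible reading is "apply f to each sibling while it keeps returning Continue". The sketch below is an assumption about that behaviour, written as a free function over a slice rather than the iterator method in the PR. It uses a reduced two-variant enum and a `String` error type as stand-ins.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Stop,
}

// Calls `f` on each item in order and returns early on the first result
// that is not `Continue` (or on the first error).
fn for_each_till_continue<T, F>(items: &[T], mut f: F) -> Result<TreeNodeRecursion, String>
where
    F: FnMut(&T) -> Result<TreeNodeRecursion, String>,
{
    for item in items {
        match f(item)? {
            TreeNodeRecursion::Continue => {}
            other => return Ok(other),
        }
    }
    Ok(TreeNodeRecursion::Continue)
}

fn main() {
    let exprs = ["a", "b", "c"];
    let mut seen = Vec::new();
    let result = for_each_till_continue(&exprs, |e| {
        seen.push(*e);
        Ok(if *e == "b" { TreeNodeRecursion::Stop } else { TreeNodeRecursion::Continue })
    });
    println!("{:?} after visiting {:?}", result, seen);
}
```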

alamb · 2023-12-19T18:50:31Z

datafusion/common/src/tree_node.rs

+    /// If a preorder visit of a tree node returns [`TreeNodeRecursion::Prune`] then inner
+    /// children and children will not be visited and postorder visit of the node will not
+    /// be invoked.
+    Prune,


Is this equivalent to RewriteRecursion::Skip? If so, perhaps we can use the same terminology

We can keep Skip if you prefer. (To me Prune better describes that children should not be visited.)

alamb · 2023-12-19T18:52:08Z

datafusion/common/src/tree_node.rs

+}
+
+impl TreeNodeRecursion {
+    pub fn and_then_on_continue<F>(self, f: F) -> Result<TreeNodeRecursion>


These are neat helpers, it would be useful to document their intended usecases if possible

Added a comment to and_then_on_continue() in 8882285. I will add more details and comments to other helpers later. Let's see first if we need fail_on_prune() at all in #7942 (comment).
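Only the signature of and_then_on_continue() appears in the diff excerpt, so the body below is a guess at the intended use: run the next step only if the previous one returned Continue, otherwise pass the previous result through. The `Result` alias with a `String` error and the `FnOnce` bound are stand-in assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Prune,
    Stop,
}

// Stand-in for DataFusion's `Result` type.
type Result<T> = std::result::Result<T, String>;

impl TreeNodeRecursion {
    // Assumed semantics: chain `f` only when `self` is `Continue`.
    fn and_then_on_continue<F>(self, f: F) -> Result<TreeNodeRecursion>
    where
        F: FnOnce() -> Result<TreeNodeRecursion>,
    {
        match self {
            TreeNodeRecursion::Continue => f(),
            other => Ok(other),
        }
    }
}

fn main() {
    let chained = TreeNodeRecursion::Continue
        .and_then_on_continue(|| Ok(TreeNodeRecursion::Stop));
    let skipped = TreeNodeRecursion::Prune
        .and_then_on_continue(|| Ok(TreeNodeRecursion::Continue));
    println!("{:?} {:?}", chained, skipped);
}
```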

alamb · 2023-12-19T18:53:23Z

datafusion/common/src/tree_node.rs

+        })
+    }
+
+    pub fn fail_on_prune(self) -> Result<TreeNodeRecursion> {


I don't understand the usecase for this method -- if there is going to be a panic, perhaps it would be clearer to put the check directly at the callsite with an explanation of why the situation warrants a panic

sadboy · 2023-12-20T06:14:44Z

Rust is somehow smart enough to optimize away the memory allocations already.

Playing around in godbolt pointed me to this -- https://doc.rust-lang.org/src/alloc/vec/in_place_collect.rs.html :)
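A small way to observe the optimization mentioned above is to compare the buffer address before and after a map and collect over a Vec. This is only an illustrative sketch: `increment_all` is a hypothetical helper, and whether the allocation is reused is an implementation detail of the standard library that is not guaranteed, so the program prints the comparison instead of asserting it.

```rust
// Maps every element and collects into a new Vec of the same element type.
// The standard library may specialize this to reuse the source allocation.
fn increment_all(v: Vec<u64>) -> Vec<u64> {
    v.into_iter().map(|x| x + 1).collect()
}

fn main() {
    let v: Vec<u64> = (0..1024).collect();
    let before = v.as_ptr();
    let w = increment_all(v);
    let after = w.as_ptr();
    // Not guaranteed either way; printed for inspection only.
    println!("same allocation: {}", before == after);
    println!("first elements: {:?}", &w[..3]);
}
```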

peter-toth · 2023-12-20T10:40:26Z

I wrote a small benchmark test:

#[cfg(test)]
mod test {
    use crate::{and, lit, Expr};
    use datafusion_common::tree_node::{Transformed, TreeNode, TreeNodeRecursion};
    use std::time::Instant;

    fn create_and_tree(level: u32) -> Expr {
        if level == 0 {
            lit(true)
        } else {
            and(create_and_tree(level - 1), create_and_tree(level - 1))
        }
    }

    #[test]
    fn transform_test() {
        let now = Instant::now();
        let mut and_tree = create_and_tree(25);
        println!("create_and_tree: {}", now.elapsed().as_millis());

        let now = Instant::now();
        and_tree = and_tree
            .transform_down_old(&mut |e| Ok(Transformed::No(e)))
            .unwrap();
        println!("and_tree.transform_down_old: {}", now.elapsed().as_millis());

        let now = Instant::now();
        let mut and_tree_clone = and_tree.clone();
        println!("and_tree.clone: {}", now.elapsed().as_millis());

        let now = Instant::now();
        and_tree_clone
            .transform_down(&mut |_e| Ok(TreeNodeRecursion::Continue))
            .unwrap();
        println!(
            "and_tree_clone.transform_down: {}",
            now.elapsed().as_millis()
        );

        println!("results: {}", and_tree == and_tree_clone);

        let now = Instant::now();
        and_tree = and_tree
            .transform_down_old(&mut |e| match e {
                Expr::Literal(_) => Ok(Transformed::Yes(lit(false))),
                o => Ok(Transformed::No(o)),
            })
            .unwrap();
        println!(
            "and_tree.transform_down_old 2: {}",
            now.elapsed().as_millis()
        );

        let now = Instant::now();
        and_tree_clone
            .transform_down(&mut |e| match e {
                Expr::Literal(_) => {
                    *e = lit(false);
                    Ok(TreeNodeRecursion::Continue)
                }
                _ => Ok(TreeNodeRecursion::Continue),
            })
            .unwrap();
        println!(
            "and_tree_clone.transform_down 2: {}",
            now.elapsed().as_millis()
        );

        println!("results: {}", and_tree == and_tree_clone);
    }
}

It is available here: https://github.com/peter-toth/arrow-datafusion/commits/refactor-treenode-benchmark/. I ran it with --release (cargo test --color=always --lib tree_node::expr::test::transform_test --release -- --show-output) and this is what I got (times in milliseconds):

---- tree_node::expr::test::transform_test stdout ----
create_and_tree: 8912
and_tree.transform_down_old: 6129
and_tree.clone: 12670
and_tree_clone.transform_down: 2137
results: true
and_tree.transform_down_old 2: 6507
and_tree_clone.transform_down 2: 2734
results: true

So transform_down() seems to be roughly 2.4-2.9x faster than transform_down_old() (6129 vs. 2137 and 6507 vs. 2734).
The above results already contain @sadboy's #8591 improvement to the current code (transform_down_old() in this PR).
I'm fairly new to Datafusion and Rust so please let me know if you would suggest a different benchmark.

Dandandan · 2023-12-20T12:16:25Z

I like where this is going 🚀

I suggest also adding some benchmarking. We could take for example TPC-H and TPC-DS (which we already have in the benchmarks / tests) and benchmark the time it takes to plan/optimize the queries rather than execute them. It seems it might not be much work to add an option to the benchmark code to only perform the planning rather than executing the queries.

…visit()`, `visit_down()`, `transform()`, `transform_down()`, `transform_up()`, `transform_with_payload()`, `transform_down_with_payload()` and `transform_up_with_payload()` functions on `TreeNode`, others can be deprecated and removed once no longer used

peter-toth · 2023-12-20T16:14:21Z

I like where this is going 🚀

I suggest also adding some benchmarking. We could take for example TPC-H and TPC-DS (which we already have in the benchmarks / tests) and benchmark the time it takes to plan/optimize the queries rather than execute them. It seems it might not be much work to add an option to the benchmark code to only perform the planning rather than executing the queries.

I like this idea, but I'm not sure that this PR itself can bring much improvement yet. This PR only refactors a few transform/rewrite operations but the old methods are still kept and used at many places.
Also, some trees like LogicalPlan use Arcs, and their new in place mutation method (transform_children() in this PR: https://github.com/apache/arrow-datafusion/pull/7942/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R62-R80) is not yet better than their old map_children() (https://github.com/apache/arrow-datafusion/pull/7942/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R39).
Actually I'm not sure yet it's possible to do in place mutation on Arcs at all.

BTW, can anyone explain why there is this difference between TreeNodes in Datafusion? Why do some of them use Boxes but others use Arcs? Do we share subtrees between threads?
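On the question of in place mutation through an Arc: the standard library offers `Arc::make_mut`, which mutates in place when the Arc is the only pointer to its allocation and clones the inner value otherwise. The sketch below shows both cases on a hypothetical `Plan` struct; it is not DataFusion's LogicalPlan and says nothing about whether this approach fits the PR.

```rust
use std::sync::Arc;

// Toy plan node (hypothetical); `Clone` is required by `Arc::make_mut`.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    name: String,
    inputs: Vec<Arc<Plan>>,
}

// Mutates through the Arc, cloning the node only if it is shared.
fn rename_in_place(plan: &mut Arc<Plan>, new_name: &str) {
    Arc::make_mut(plan).name = new_name.to_string();
}

fn main() {
    // Unique Arc: the node is mutated in place and the allocation is kept.
    let mut unique = Arc::new(Plan { name: "a".to_string(), inputs: vec![] });
    let ptr_before = Arc::as_ptr(&unique);
    rename_in_place(&mut unique, "b");
    assert_eq!(ptr_before, Arc::as_ptr(&unique));

    // Shared Arc: the node is cloned first, the other owner is untouched.
    let shared = Arc::new(Plan { name: "a".to_string(), inputs: vec![] });
    let mut other = Arc::clone(&shared);
    rename_in_place(&mut other, "c");
    assert!(!Arc::ptr_eq(&shared, &other));
    println!("{} {} {}", unique.name, shared.name, other.name);
}
```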

peter-toth · 2023-12-21T20:45:40Z

I've updated the https://github.com/peter-toth/arrow-datafusion/commits/refactor-treenode-benchmark/ with a LogicalPlan benchmark so as to see how the proposed PR affects trees with Arcs.

The benchmark is very similar to the previous Expr based one but uses Union and EmptyRelation to build up a tree: peter-toth@6d8ad17#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R83-R153

It is interesting to see that the new transform_down() is better than the old transform_down_old() on LogicalPlan trees as well, but the improvement is not that significant:

---- tree_node::plan::test::transform_test stdout ----
create_union_tree: 8481
union_tree.transform_down_old: 6406
union_tree.clone: 0
union_tree_clone.transform_down: 3861
results: true
union_tree.transform_down_old 2: 11855
union_tree_clone.transform_down 2: 9479
results: true
results: false

I think the key takeaway of these 2 benchmarks is that I had to scale down the LogicalPlan based one to run on a binary tree of height 23 (peter-toth@75cde35#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R102) vs. the Expr based one that ran on a tree of height 25 (peter-toth@75cde35#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R498) to get roughly similar transform down numbers. I think this is the cost of using Arcs vs. Boxes and the fact that we can't mutate in place. (Although there might be good reasons for using Arcs, please see my question above.)

alamb · 2023-12-23T11:33:19Z

I think this is the cost of using Arcs vs. Boxes and the fact that we can't mutate in place. (Although there might be good reasons for using Arcs, please see my question above.)

I think the particular choice of Boxes vs Arcs does not have a well thought out rationale, or if there is one I do not know of it.

alamb · 2023-12-23T12:00:09Z

Here is my suggestion in how to proceed with this PR

Create some basic end to end planning performance benchmarks (I elaborated on @Dandandan's idea from Refactor TreeNode recursions #7942 (comment) in Benchmarks for planning queries #8638)
Use that information to guide which part(s) of this PR are the most valuable for increasing performance.

@sadboy, do you have any benchmarks you could share that model your existing workload?

+1 to the importance of this -- our workloads involve lots of analysis/transformations on the Datafusion LogicalPlan, so any perf improvements in this department would be extremely beneficial to us.

It would be great if there's some kind of benchmark to demonstrate the concrete effects of this change -- perf-related impacts can often times be counter-intuitive and surprising.

100% agree

ozankabak · 2023-12-23T12:17:19Z

I like this general effort and we will be happy to help. The main challenge I see is that this touches many files and procedures, and we may lose/break certain behaviors that are not adequately tested. Therefore, IMO it makes sense to first clean-up some of the tree traversal logic in our planner/optimization rules as a stepping stone to this.

We will submit a cleanup PR early next week to simplify around half of the usages in physical planning/optimization so the job here will be easier.

sadboy · 2023-12-23T17:17:03Z

@sadboy, do you have any benchmarks you could share that model your existing workload?

Not readily, ours is all production queries that we can not share. But I can certainly synthesize some test cases from the more "pathological" cases we've encountered, e.g. large WITHs, deep nested IFs, 1000+ columns, etc. Would be a good augment to what you described in #8638.

peter-toth · 2023-12-27T15:18:14Z

Thank you all for the feedback! I've updated the PR description with the latest findings.

With 1. and 2. let's wait for the new benchmarks Benchmarks for planning queries #8638 and the TreeNode cleanup PR: TreeNode Refactor Part 2 #8653.
I discarded 3. as adding Expr::Nop isn't needed. We can use Expr::Literal(ScalarValue::Null) where a dummy Expr is required: Refactor TreeNode recursions #7942 (comment)
I extracted 4. into an issue: Get rid of special TreeNodes #8663 and opened a PR for it: Transform with payload #8664

# Conflicts: # datafusion/common/src/tree_node.rs # datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs # datafusion/core/src/physical_optimizer/enforce_distribution.rs # datafusion/core/src/physical_optimizer/enforce_sorting.rs # datafusion/core/src/physical_optimizer/pipeline_checker.rs # datafusion/core/src/physical_optimizer/replace_with_order_preserving_variants.rs # datafusion/core/src/physical_optimizer/sort_pushdown.rs # datafusion/expr/src/tree_node/expr.rs # datafusion/expr/src/tree_node/plan.rs # datafusion/optimizer/src/analyzer/count_wildcard_rule.rs # datafusion/optimizer/src/analyzer/type_coercion.rs # datafusion/optimizer/src/push_down_filter.rs # datafusion/physical-expr/src/equivalence.rs # datafusion/physical-expr/src/sort_properties.rs # datafusion/physical-expr/src/utils/mod.rs

github-actions · 2024-04-24T01:46:15Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Oct 26, 2023

peter-toth marked this pull request as draft October 27, 2023 09:31

peter-toth force-pushed the refactor-treenode-apply branch from 2eaf118 to 26b4811 Compare October 28, 2023 14:44

peter-toth force-pushed the refactor-treenode-apply branch 5 times, most recently from 606a6b0 to 36e36fd Compare December 15, 2023 11:44

peter-toth changed the title ~~Refactor TreeNode::apply and its relatives~~ Refactor TreeNode recursions Dec 16, 2023

This was referenced Dec 16, 2023

DataFusion weekly project plan (Andrew Lamb) - Dec 11, 2023 #8490

Closed

DataFusion weekly project plan (Andrew Lamb) - Dec 18, 2023 #8577

Closed

POC

c0990de

peter-toth force-pushed the refactor-treenode-apply branch from 36e36fd to cca3135 Compare December 19, 2023 13:01

github-actions bot added the sql SQL Planner label Dec 19, 2023

- refactor transform_down() and transform_up() to work on mutable…

5c61470

… `TreeNode`s and use them in a few examples - add `transform_down_with_payload()`, `transform_up_with_payload()`, `transform_with_payload()` and use it in `EnforceSorting` as an example

peter-toth force-pushed the refactor-treenode-apply branch 2 times, most recently from 5044144 to 09bbb6b Compare December 19, 2023 15:53

- refactor EnforceDistribution using transform_down_with_payload()

8fa80e7

peter-toth force-pushed the refactor-treenode-apply branch from 09bbb6b to 8fa80e7 Compare December 19, 2023 16:10

alamb mentioned this pull request Dec 19, 2023

Minor: avoid a copy in Expr::unalias #8588

Merged

alamb reviewed Dec 19, 2023

peter-toth added 3 commits December 20, 2023 15:52

fix transform_with_payload() to behave like `transform_down_with_pa…

9279c6a

…yload()` in its pre-order transform (`f_down`) function

add docs

8882285

Merge remote-tracking branch 'origin/main' into refactor-treenode-apply

aa333d1

# Conflicts: # datafusion/expr/src/expr.rs # datafusion/expr/src/utils.rs

peter-toth force-pushed the refactor-treenode-apply branch from a9caddf to aa333d1 Compare December 20, 2023 16:32

peter-toth added 2 commits December 20, 2023 17:41

remove Expr::Nop, define Expr::Literal(ScalarValue::Null) as defa…

6cd5d39

…ult for `Expr`

fix docs

25b75bb

alamb mentioned this pull request Dec 23, 2023

[Epic] A collection of issues to improve planning performance / speed / efficiency #5637

Open

15 tasks

alamb mentioned this pull request Dec 23, 2023

Benchmarks for planning queries #8638

Closed

berkaysynnada mentioned this pull request Dec 25, 2023

TreeNode Refactor Part 2 #8653

Merged

This was referenced Dec 27, 2023

Get rid of special TreeNodes #8663

Closed

Transform with payload #8664

Closed

peter-toth mentioned this pull request Dec 29, 2023

Cleanup TreeNode implementations #8672

Merged

peter-toth added 2 commits January 2, 2024 11:28

revert transform_with_payload, transform_down_with_payload and `t…

9e13bea

…ransform_up_with_payload` related changes

peter-toth mentioned this pull request Jan 17, 2024

Consolidate TreeNode transform and rewrite APIs #8891

Merged

peter-toth mentioned this pull request Mar 25, 2024

[EPIC] Stop copying LogicalPlan during OptimizerPasses #9637

Closed

31 tasks

github-actions bot added the Stale PR has not had any activity for some time label Apr 24, 2024

peter-toth closed this Apr 24, 2024
