Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Plan Properties caching #1

Closed
wants to merge 6 commits into from

Conversation

mustafasrepo
Copy link
Collaborator

@mustafasrepo mustafasrepo commented Feb 14, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

In great analysis by [MENTION NAME] at the issue 9084. [MENTION NAME] recognized that stack usage (depth) increases a lot during logical and physical planning. The root cause of aggressive stack usage in the logical planning is excessive use of .clone of LogicalPlan enum.

In physical planning the problem stems from recursive function calls in the getter APIs of the Arc<dyn ExecutionPlan>, such as EquivalenceProperties, output_partitioning, output_ordering, etc.

In the PR9084, [MENTION NAME] could reduce physical plan stack usage by caching equivalence_properties for ProjectionExec.

This PR introduces a new struct to cache PlanProperties (PlanPropertiesCache). This this struct, schema, output_partitioning, equivalence_properties, output_ordering is cached. This removes recursive calls during getter methods. Also, given .cache method is implemented, default implementations of the .output_partitioning, .equivalence_properties, output_ordering works.

With these changes flame graph for the query 54 convert from following graph
flamegraph_main_q54.svg.zip (github couldn't upload it, hence loaded as .zip)

to following graph

flamegraph_branch_q54 where stack usage decreases.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

api change

mustafasrepo pushed a commit that referenced this pull request Feb 16, 2024
…aTypes) (apache#8985)

* ScalarValue return types from argument values

* change file name

* try using ?Sized

* use Ok

* move method default impl outside trait

* Use type trait for ExprSchemable

* fix nit

* Proposed Return Type from Expr suggestions (#1)

* Improve return_type_from_args

* Rework example

* Update datafusion/core/tests/user_defined/user_defined_scalar_functions.rs

---------

Co-authored-by: Junhao Liu <junhaoliu2023@gmail.com>

* Apply suggestions from code review

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Fix tests + clippy

* rework types to use dyn trait

* fmt

* docs

* Apply suggestions from code review

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

* Add docs explaining what happens when both `return_type` and `return_type_from_exprs` are called

* clippy

* fix doc -- comedy of errors

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
mustafasrepo and others added 2 commits February 16, 2024 16:58
# Conflicts:
#	datafusion/physical-plan/src/streaming.rs
# Conflicts:
#	datafusion/physical-plan/src/projection.rs
@ozankabak
Copy link
Collaborator

Merged upstream.

@ozankabak ozankabak closed this Feb 29, 2024
mustafasrepo added a commit that referenced this pull request Mar 5, 2024
* refactor `TreeNode::rewrite()`

* use handle_tree_recursion in `Expr`

* use macro for transform recursions

* fix api

* minor fixes

* fix

* don't trust `t.transformed` coming from transformation closures, keep the old way of detecting if changes were made

* rephrase todo comment, always propagate up `t.transformed` from the transformation closure, fix projection pushdown closure

* Fix `TreeNodeRecursion` docs

* extend Skip (Prune) functionality to Jump as it is defined in https://synnada.notion.site/synnada/TreeNode-Design-Proposal-bceac27d18504a2085145550e267c4c1

* fix Jump and add tests

* jump test fixes

* fix clippy

* unify "transform" traversals using macros, fix "visit" traversal jumps, add visit jump tests, ensure consistent naming `f` instead of `op`, `f_down` instead of `pre_visit` and `f_up` instead of `post_visit`

* fix macro rewrite

* minor fixes

* minor fix

* refactor tests

* add transform tests

* add apply, transform_down and transform_up tests

* refactor tests

* test jump on both a and e nodes in both top-down and bottom-up traversals

* better transform/rewrite tests

* minor fix

* simplify tests

* add stop tests, reorganize tests

* fix previous merges and remove leftover file

* Review TreeNode Refactor (#1)

* Minor changes

* Jump doesn't ignore f_up

* update test

* Update rewriter

* LogicalPlan visit update and propagate from children flags

* Update tree_node.rs

* Update map_children's

---------

Co-authored-by: Mustafa Akur <mustafa.akur@synnada.ai>

* fix

* minor fixes

* fix f_up call when f_down returns jump

* simplify code

* minor fix

* revert unnecessary changes

* fix `DynTreeNode` and `ConcreteTreeNode` `transformed` and `tnr` propagation

* introduce TransformedResult helper

* fix docs

* restore transform as alias to trassform_up

* restore transform as alias to trassform_up 2

* Simplifications and comment improvements (#2)

---------

Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
Co-authored-by: Mustafa Akur <mustafa.akur@synnada.ai>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
@mustafasrepo mustafasrepo deleted the feature/properties_caching branch March 27, 2024 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants