[SPARK-16958] [SQL] Reuse subqueries within the same query #14548
Test build #63389 has finished for PR 14548.
@@ -502,15 +508,64 @@ case class OutputFakerExec(output: Seq[Attribute], child: SparkPlan) extends Spa

/**
 * Physical plan for a subquery.
 *
 * This is used to generate tree string for SparkScalarSubquery.
 */
case class SubqueryExec(name: String, child: SparkPlan) extends UnaryExecNode {
A large part of this class is shared with BroadcastExchangeExec. Should we try to factor out common functionality?
I think it's OK to have some duplicated code here; over-abstracted code is actually harder to read.
@davies this looks pretty good. I am very excited about the SparkPlan clean-up!
@hvanhovell I have posted a picture, check it out.
Test build #63560 has finished for PR 14548.
Test build #63563 has finished for PR 14548.
Cool picture!
LGTM
Merging it into master, thanks!
## What changes were proposed in this pull request?
This code comes from PR #11190, but it has never been used since PR #14548. Let's continue to fix it. Thanks.
## How was this patch tested?
N/A
Closes #23227 from heary-cao/unuseSparkPlan.
Authored-by: caoxuewen <cao.xuewen@zte.com.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@davies @hvanhovell @gatorsmile In practice, though, the stage of the same subquery may be executed more than once, as in the following:
@JkSelf can you file a JIRA ticket?
@hvanhovell, thanks for your help. I have filed JIRA 26639.
What changes were proposed in this pull request?
There can be multiple subqueries that generate the same result; we can reuse that result instead of running the subquery multiple times.
This PR also cleans up how we run subqueries.
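The reuse idea can be illustrated with a small, self-contained sketch. This is a toy model and not the code in this PR: `Plan`, `Scan`, `Filter`, `Aggregate`, `Subquery` and `ReuseSubqueries` are hypothetical names made up for the example, and structural equality of case classes stands in for whatever "generates the same result" check the real rule applies to physical plans.

```scala
import scala.collection.mutable

// Toy stand-ins for physical plans; these are NOT Spark classes.
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class Filter(condition: String, child: Plan) extends Plan
final case class Aggregate(function: String, child: Plan) extends Plan

// A named subquery wrapping a plan (hypothetical, loosely shaped like a name + child pair).
final case class Subquery(name: String, child: Plan)

object ReuseSubqueries {
  // For each input subquery, return the instance that should actually be executed.
  // Subqueries whose child plans are structurally equal share the first instance seen,
  // so the shared plan only needs to run once.
  def apply(subqueries: Seq[Subquery]): Seq[Subquery] = {
    val seen = mutable.HashMap.empty[Plan, Subquery]
    subqueries.map(sq => seen.getOrElseUpdate(sq.child, sq))
  }
}

// Usage: the first two subqueries compute the same thing, the third does not.
val resolved = ReuseSubqueries(Seq(
  Subquery("subquery#1", Aggregate("max(key)", Scan("t"))),
  Subquery("subquery#2", Aggregate("max(key)", Scan("t"))),
  Subquery("subquery#3", Aggregate("min(key)", Scan("t")))
))
```

In this sketch the second entry of `resolved` is the very same object as the first, so a result cached on that instance would be computed once and read twice. Keying the map on the whole plan is a simplification chosen for brevity; a real planner would need an equivalence check that ignores cosmetic differences between plans.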
For SQL query
The explain is
The visualized plan:
How was this patch tested?
Existing tests.