
[GLUTEN-4836][VL] Add support for WindowGroupLimitExec in gluten #5398

Merged (3 commits) on Apr 24, 2024

Conversation

ayushi-agarwal (Contributor)

What changes were proposed in this pull request?

A new operator, WindowGroupLimitExec, was introduced in Spark 3.5 (apache/spark#38799) to optimize windows that apply a filter on a rank-like function. This PR maps this operator, when used with the row_number function, to TopNRowNumberNode in Velox. Velox currently doesn't support TopN for rank and dense_rank; there is an issue open for it (facebookincubator/velox#9404). Once that support is added in Velox, this new operator in Gluten can be extended to cover those functions as well.

(Partially Fixes: #4836)
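To illustrate what the operator computes, here is a minimal, backend-agnostic sketch in Python (purely illustrative; the actual implementation lives in Gluten's Scala layer and Velox's C++ TopNRowNumber operator). It keeps only the rows of each partition that a `row_number() <= limit` filter would retain:

```python
from itertools import groupby
from operator import itemgetter

def window_group_limit(rows, partition_key, order_key, limit):
    """Keep the first `limit` rows per partition, ordered by `order_key` --
    exactly the rows a row_number() <= limit filter would keep."""
    rows = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    out = []
    for _, group in groupby(rows, key=partition_key):
        out.extend(list(group)[:limit])
    return out

sales = [("a", 10), ("a", 30), ("a", 20), ("b", 5), ("b", 15)]
print(window_group_limit(sales, itemgetter(0), itemgetter(1), 2))
# [('a', 10), ('a', 20), ('b', 5), ('b', 15)]
```

The point of the native TopNRowNumber mapping is that this pruning happens inside the operator instead of materializing full row numbers and filtering afterwards.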

How was this patch tested?

Added new UT


Run Gluten Clickhouse CI


ayushi-agarwal marked this pull request as ready for review on April 15, 2024.
ayushi-agarwal (Contributor, Author)

@JkSelf Could you please help with review?

JkSelf (Contributor) left a comment


@ayushi-agarwal Thanks for your great work.

@@ -319,6 +319,15 @@ message WindowRel {
}
}

message WindowGroupLimitRel {
JkSelf (Contributor)


@ayushi-agarwal It seems that the only difference between WindowGroupLimitRel and WindowRel is the "limit" parameter? If that's the case, can we simply use WindowRel and pass the "limit" parameter through advanced_extension? This would prevent the introduction of a new operator in the Substrait code.

ayushi-agarwal (Contributor, Author)


@JkSelf For now we only support row_number, but later, when we support rank and dense_rank, we would need an extra parameter to pass the function name, which is why I created the new operator. Should that info also be passed in advanced_extension?

JkSelf (Contributor)


@ayushi-agarwal Yes, it can be included in the advanced_extension. However, it appears that WindowGroupLimitRel is intended to map to the TopNRowNumber operator, not the Window operator, so it makes sense to introduce a new operator in Substrait for this purpose. @zhztheplayer Do you have any input?

zhztheplayer (Member)


> Therefore, it makes sense to introduce a new operator in Substrait for this purpose.

I am inclined to this approach, like adding a TopNRowNumberRel or something in proto. 1:1 mapping from Velox's operator to Substrait / Gluten operator is definitely encouraged, like discussed in #5409 (comment).

zhztheplayer (Member) commented Apr 19, 2024


To make things fancier, we can even add a TopNRowNumberExec in backends-velox module then use a rule to convert Spark's WindowGroupLimitExec conditionally to TopNRowNumberExec. That would be the best method to solve this kind of operator-mapping problems.

However, if you think WindowGroupLimitExec and TopNRowNumberRel don't have much semantic difference apart from the window function types they support, then we can do it without rules, and that's also fine.
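The rule-based conversion described above could look roughly like this. This is a toy Python sketch; all node and function names below are hypothetical, not Gluten's actual Scala API:

```python
# Hypothetical sketch of a conditional plan-rewrite rule. Node names are
# invented for illustration; Gluten's real rules operate on Spark plan nodes.
from dataclasses import dataclass

@dataclass
class WindowGroupLimitExec:
    rank_function: str   # "row_number", "rank", or "dense_rank"
    limit: int
    child: object

@dataclass
class TopNRowNumberExec:  # hypothetical backend-velox node
    limit: int
    child: object

def convert(plan):
    """Rewrite WindowGroupLimitExec to TopNRowNumberExec only when the rank
    function is row_number, the one case Velox's TopNRowNumber supports."""
    if isinstance(plan, WindowGroupLimitExec) and plan.rank_function == "row_number":
        return TopNRowNumberExec(limit=plan.limit, child=plan.child)
    return plan  # rank / dense_rank fall back to vanilla Spark
```

The conditional check is the key design point: unsupported rank functions keep the original node, so Spark's fallback path still works.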

ayushi-agarwal (Contributor, Author)


I kept it as WindowGroupLimitRel rather than TopNRowNumberRel so that we can later reuse it for rank and dense_rank.


override def requiredChildOrdering: Seq[Seq[SortOrder]] = {
  if (BackendsApiManager.getSettings.requiredChildOrderingForWindowGroupLimit()) {
    // Velox StreamingTopNRowNumber needs its child output to be ordered.
JkSelf (Contributor)


@ayushi-agarwal In Velox, TopNRowNumber will reorder the input channels based on the partition keys and order-by keys passed to it. Do we still need to set the sort order on the Scala side? Moreover, it seems that this method is not used anywhere; can we remove it? If it is indeed necessary, can we construct it on the Velox native side from the partition and order-by keys?

ayushi-agarwal (Contributor, Author) commented Apr 16, 2024


@JkSelf In this case it would go into the else branch, since TopNRowNumber sorts based on the partition and order keys passed to it. But because Spark already adds a sort operator, we don't need the sorting to be done by the TopNRowNumber operator, as you mentioned in point 3 of #4836 (comment); in that case we need the if branch. Please let me know if I have misunderstood.

JkSelf (Contributor)


@ayushi-agarwal My previous understanding was incorrect. We indeed require this check to confirm that the input data for the Window operator is pre-sorted by the partition key and the order-by key.
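To make the pre-sorted-input point concrete: when the child output is already ordered by (partition key, order-by key), as requiredChildOrdering guarantees, the top-N per partition can be computed in a single streaming pass without buffering or re-sorting. A minimal Python sketch (illustrative only; Velox's StreamingTopNRowNumber is the real C++ implementation):

```python
from operator import itemgetter

def streaming_top_n(rows, partition_key, limit):
    """One-pass top-N over input already sorted by (partition key, order key):
    emit the first `limit` rows of each partition, reset the counter whenever
    the partition key changes, and never hold more than one row in memory."""
    current = object()  # sentinel that matches no real key
    count = 0
    for row in rows:
        key = partition_key(row)
        if key != current:
            current, count = key, 0
        count += 1
        if count <= limit:
            yield row

# Input pre-sorted by (partition, order) columns, as the Sort Spark adds ensures.
rows = [("a", 10), ("a", 20), ("a", 30), ("b", 5), ("b", 15)]
print(list(streaming_top_n(rows, itemgetter(0), 2)))
# [('a', 10), ('a', 20), ('b', 5), ('b', 15)]
```

This is why the ordering check matters: without the pre-sort guarantee the operator would have to buffer and sort each partition itself.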

JkSelf (Contributor) commented Apr 22, 2024

@ayushi-agarwal Can you help to resolve the conflicts? Thanks.


ayushi-agarwal (Contributor, Author)

> @ayushi-agarwal Can you help to resolve the conflicts? Thanks.

@JkSelf I have updated it.

JkSelf (Contributor) commented Apr 23, 2024

@ayushi-agarwal Can you help to fix the following compile error in the ClickHouse backend?

17:47:31  [INFO] --- scala-maven-plugin:4.8.0:compile (scala-compile-first) @ backends-clickhouse ---
17:47:31  [WARNING] joda-time:joda-time/maven-metadata.xmlfailed to transfer from http://0.0.0.0/ during a previous attempt. This failure was cached in the local repository and resolution will not be reattempted until the update interval of maven-default-http-blocker has elapsed or updates are forced. Original error: Could not transfer metadata joda-time:joda-time/maven-metadata.xml from/to maven-default-http-blocker (http://0.0.0.0/): transfer failed for http://0.0.0.0/joda-time/joda-time/maven-metadata.xml
17:47:31  [INFO] /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/backends-clickhouse/src/main/java:-1: info: compiling
17:47:31  [INFO] /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/backends-clickhouse/src/main/delta-20:-1: info: compiling
17:47:31  [INFO] /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/backends-clickhouse/src/main/scala:-1: info: compiling
17:47:31  [INFO] Compiling 122 source files to /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/backends-clickhouse/target/scala-2.12/classes at 1713779251191
17:47:41  [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/gluten2/backends-clickhouse/src/main/scala/org/apache/spark/sql/delta/catalog/ClickHouseTableV2.scala:312: error: not found: type Expression
17:47:41  [ERROR]       filterExprs: Seq[Expression]): Seq[InputPartition] = {
17:47:41  [ERROR]                        ^
17:47:41  [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/backends-clickhouse/src/main/scala/org/apache/spark/sql/delta/catalog/ClickHouseTableV2.scala:312: error: not found: type Expression
17:47:41  [ERROR]       filterExprs: Seq[Expression]): Seq[InputPartition] = {
17:47:41  [ERROR]                        ^
17:47:41  [ERROR] one error found
17:47:41  [ERROR] one error found
17:47:41  [ERROR] exception compilation error occurred!!!


ayushi-agarwal (Contributor, Author) commented Apr 23, 2024

@JkSelf I have rebased it. The compile issue was fixed a day earlier in ayushi-agarwal@b4641c9.

JkSelf (Contributor) left a comment


LGTM. Thanks for your great work.

@JkSelf JkSelf merged commit 4e04d31 into apache:main Apr 24, 2024
39 checks passed
GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_5398_time.csv | log/native_master_04_23_2024_fd2519a2b_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1    | 39.00   | 36.36   | -2.645 | 93.22%  |
| q2    | 22.05   | 24.01   | 1.962  | 108.90% |
| q3    | 39.51   | 36.90   | -2.610 | 93.40%  |
| q4    | 39.61   | 39.08   | -0.525 | 98.68%  |
| q5    | 68.66   | 69.94   | 1.281  | 101.87% |
| q6    | 7.35    | 7.58    | 0.227  | 103.09% |
| q7    | 82.73   | 86.14   | 3.408  | 104.12% |
| q8    | 84.85   | 85.13   | 0.278  | 100.33% |
| q9    | 125.02  | 122.27  | -2.750 | 97.80%  |
| q10   | 44.51   | 43.15   | -1.359 | 96.95%  |
| q11   | 19.76   | 21.01   | 1.254  | 106.35% |
| q12   | 27.74   | 27.87   | 0.129  | 100.46% |
| q13   | 55.63   | 55.25   | -0.378 | 99.32%  |
| q14   | 17.54   | 20.45   | 2.910  | 116.60% |
| q15   | 31.60   | 28.88   | -2.719 | 91.39%  |
| q16   | 14.07   | 14.54   | 0.478  | 103.40% |
| q17   | 103.78  | 106.73  | 2.955  | 102.85% |
| q18   | 145.44  | 147.26  | 1.813  | 101.25% |
| q19   | 14.43   | 13.90   | -0.531 | 96.32%  |
| q20   | 27.31   | 28.89   | 1.582  | 105.79% |
| q21   | 287.22  | 295.12  | 7.904  | 102.75% |
| q22   | 14.67   | 17.71   | 3.040  | 120.73% |
| total | 1312.45 | 1328.16 | 15.704 | 101.20% |

Preetesh2110 pushed a commit to Preetesh2110/incubator-gluten that referenced this pull request Apr 25, 2024
…he#5398)

* Add support for WindowGroupLimitExec in gluten

---------

Co-authored-by: ayushi agarwal <ayaga@microsoft.com>
@@ -495,6 +504,7 @@ message Rel {
GenerateRel generate = 17;
WriteRel write = 18;
TopNRel top_n = 19;
WindowGroupLimitRel windowGroupLimit = 20;
EpsilonPrime (Contributor)


Substrait already has a field #20. This means that your Substrait is not going to be compatible with any other Substrait consumers or tools. The Substrait project is open to adding new relations -- all it takes is a PR, issue, or email to start the discussion.

Contributor


@EpsilonPrime can you explain what you mean and if it's still an issue? Does any change to algebra.proto need to be updated in the Substrait project too?

EpsilonPrime (Contributor)


Right now this copy differs in about a half dozen ways from the original project. A protobuf saved using this version will be loaded incorrectly, since DdlRel has field number 20 upstream. What should happen is that the differences introduced here get applied back to the main project; I've started that process with the CSV text format there. The Substrait project introduced ConsistentPartitionWindowRel a while back (as field number 17), which may actually do what you want here.

As long as you only talk to other consumers using this version of Substrait, your code will work, but you're missing out on the other tools. For instance, the Substrait Validator is great for checking that you've constructed a conforming plan. I run all of the plans generated by my end-to-end tests through it, and it catches issues all the time.

Contributor


Got it, I'm working on a PR that also touches algebra.proto: https://github.com/apache/incubator-gluten/pull/5632/files#diff-632cd96e81af6938b453cb8ec69d66123284f17c6e47f3f16b8f08e8046afc39. Can the reconciliation happen in one shot? I can try to port these changes over with your guidance!


Successfully merging this pull request may close these issues.

[VL] support partial window in Spark 3.5
6 participants