
[FEA] Audit WindowExec #388

Closed
nartal1 opened this issue Jul 17, 2020 · 6 comments
Assignees
Labels: documentation (Improvements or additions to documentation), feature request (New feature or request), P0 (Must have for release)

Comments

@nartal1
Collaborator

nartal1 commented Jul 17, 2020

Is your feature request related to a problem? Please describe.

  • Ensure the Spark and RAPIDS plugin versions of the exec match in functionality.

  • Verify that configs specific to the operator match.

  • Verify the API is consistent and fully translated.

  • Port relevant tests.

@nartal1 added the "feature request" and "? - Needs Triage" labels on Jul 17, 2020
@kuhushukla removed the "? - Needs Triage" label on Jul 17, 2020
@sameerz added the "documentation" and "P0" labels on Jul 22, 2020
@nartal1
Collaborator Author

nartal1 commented Jul 24, 2020

There is a difference in the API. I will sync up with @mythrocks to understand the missing parameters.

   case class WindowExec(
       windowExpression: Seq[NamedExpression],
       partitionSpec: Seq[Expression],
       orderSpec: Seq[SortOrder],
       child: SparkPlan)
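For reference while auditing, here is a hedged sketch of a query that plans to a WindowExec node; the column names and data are made up for illustration:

   // Hedged sketch: a DataFrame query whose physical plan contains a
   // Window (WindowExec) node. Columns "k", "v", "rn" are illustrative.
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.expressions.Window
   import org.apache.spark.sql.functions.row_number

   val spark = SparkSession.builder().master("local[1]").getOrCreate()
   import spark.implicits._

   val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("k", "v")
   val w  = Window.partitionBy($"k").orderBy($"v")  // -> partitionSpec / orderSpec
   df.withColumn("rn", row_number().over(w)).explain()
   // explain() output includes a Window node carrying the partition/order specs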

These configs appear in Spark's own test suites but are not required for the plugin:
SQLConf.OPTIMIZER_EXCLUDED_RULES
SQLConf.WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD
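For context, a hedged sketch of how Spark's own suites typically exercise these configs, using the withSQLConf test helper; the threshold value and excluded rule below are illustrative only:

   // Hedged sketch (not a specific Spark test): withSQLConf temporarily
   // overrides SQL configs for the enclosed block.
   withSQLConf(
       SQLConf.WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD.key -> "1",  // force early spill
       SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
         "org.apache.spark.sql.catalyst.optimizer.CollapseProject") {
     // run the window query under test
   }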

@mythrocks
Collaborator

Apologies for the delay in responding on this issue.

The reason for this variance is that these parameters were redundant. The partitionSpec, orderSpec, etc. are available from the Apache Spark WindowExpression class, via the WindowSpecDefinition, and are therefore available in GpuWindowExpression, where the evaluation happens. I found that GpuWindowExec was dropping those expressions on the floor and GpuWindowExpression was extracting them instead.

Examining the code again, I see that Apache Spark uses them in the definition of WindowExec::requiredChildOrdering(). The omission doesn't seem to affect execution, since the same information is available in GpuWindowExpression.
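To make the redundancy concrete, here is a hedged sketch showing that the specs WindowExec carries as constructor parameters can also be recovered from each window expression itself; the helper name specsOf is made up for illustration:

   // Hedged sketch: WindowExpression holds a WindowSpecDefinition, which
   // carries the same partitionSpec and orderSpec that WindowExec takes
   // as constructor parameters.
   import org.apache.spark.sql.catalyst.expressions.{
     Expression, SortOrder, WindowExpression, WindowSpecDefinition}

   def specsOf(winExpr: WindowExpression): (Seq[Expression], Seq[SortOrder]) = {
     val spec: WindowSpecDefinition = winExpr.windowSpec
     (spec.partitionSpec, spec.orderSpec)
   }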

@tgravescs, I'm not too familiar with requiredChildOrdering(). Would it be alright to ignore this API variance for now, until such a time as when the output column ordering is affected?

@nartal1
Collaborator Author

nartal1 commented Jul 30, 2020

Thanks @mythrocks for the detailed explanation. I did not know that this information is available from WindowExpression.

@tgravescs
Collaborator

So I think the default requiredChildOrdering is going to be empty, and when Spark goes to check that its requirements are fulfilled, the check will always pass. That means if the ordering got disturbed at all during shuffle, Spark won't fix it. I'm guessing that never happens right now, and I can't think of a condition off the top of my head that would cause it.
I think it would be good for us to at least file an issue to investigate this further. If we can add requiredChildOrdering back, that would be good in case it does hit some case. The only issue might be if Databricks is different.

   override def requiredChildOrdering: Seq[Seq[SortOrder]] =
     Seq(partitionSpec.map(SortOrder(_, Ascending)) ++ orderSpec)
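The concern above can be illustrated with a hedged sketch (not Spark's exact code) of how physical planning consumes requiredChildOrdering; the function name ensureOrdering is made up for illustration:

   // Hedged sketch: during physical planning, Spark's EnsureRequirements
   // rule compares each child's outputOrdering against the parent's
   // requiredChildOrdering and inserts a SortExec when it is not satisfied.
   // An empty requirement is trivially satisfied, so no sort is ever
   // inserted -- which is why dropping requiredChildOrdering means a
   // disturbed ordering would never be repaired.
   import org.apache.spark.sql.catalyst.expressions.SortOrder
   import org.apache.spark.sql.execution.{SortExec, SparkPlan}

   def ensureOrdering(child: SparkPlan, required: Seq[SortOrder]): SparkPlan =
     if (SortOrder.orderingSatisfies(child.outputOrdering, required)) child
     else SortExec(required, global = false, child = child)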

@mythrocks
Collaborator

If we can add requiredChildOrdering back I think it would be good in case it does hit some case.

I have filed #486. I'll try to add this back in the next sprint.

@nartal1
Collaborator Author

nartal1 commented Jul 31, 2020

Thanks @mythrocks and @tgravescs. I am closing this since the other items have been audited and follow-on issues have been filed.

@nartal1 nartal1 closed this as completed Jul 31, 2020
pxLi pushed a commit to pxLi/spark-rapids that referenced this issue May 12, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023