
[FEA] Audit WindowExec #388

Closed
nartal1 opened this issue Jul 17, 2020 · 6 comments
Assignees
Labels: documentation (Improvements or additions to documentation), feature request (New feature or request), P0 (Must have for release)

Comments

@nartal1
Collaborator

nartal1 commented Jul 17, 2020

Is your feature request related to a problem? Please describe.

  • Ensure the Spark and RAPIDS plugin versions of the exec match in functionality.

  • Verify that configs specific to the operator match.

  • Verify the API is consistent and fully translated.

  • Port relevant tests.

@nartal1 added the "feature request" and "? - Needs Triage" labels on Jul 17, 2020
@kuhushukla removed the "? - Needs Triage" label on Jul 17, 2020
@sameerz added the "documentation" and "P0" labels on Jul 22, 2020
@nartal1
Collaborator Author

nartal1 commented Jul 24, 2020

There is a difference in the API. I will sync up with @mythrocks to understand the missing parameters.

   case class WindowExec(
       windowExpression: Seq[NamedExpression],
       partitionSpec: Seq[Expression],
       orderSpec: Seq[SortOrder],
       child: SparkPlan)
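For reference while auditing, here is a hedged sketch of a query that plans to a WindowExec node; the column names and data are made up for illustration:

   // Hedged sketch: a DataFrame query whose physical plan contains a
   // Window (WindowExec) node. Columns "k", "v", "rn" are illustrative.
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.expressions.Window
   import org.apache.spark.sql.functions.row_number

   val spark = SparkSession.builder().master("local[1]").getOrCreate()
   import spark.implicits._

   val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("k", "v")
   val w  = Window.partitionBy($"k").orderBy($"v")  // -> partitionSpec / orderSpec
   df.withColumn("rn", row_number().over(w)).explain()
   // explain() output includes a Window node carrying the partition/order specs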

These configs appear in Spark's own test suites but are not required for the plugin:
SQLConf.OPTIMIZER_EXCLUDED_RULES
SQLConf.WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD
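For context, a hedged sketch of how Spark's own suites typically exercise these configs, using the withSQLConf test helper; the threshold value and excluded rule below are illustrative only:

   // Hedged sketch (not a specific Spark test): withSQLConf temporarily
   // overrides SQL configs for the enclosed block.
   withSQLConf(
       SQLConf.WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD.key -> "1",  // force early spill
       SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
         "org.apache.spark.sql.catalyst.optimizer.CollapseProject") {
     // run the window query under test
   }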

@mythrocks
Collaborator

Apologies for the delay in responding on this issue.

The reason for this variance is that these parameters were redundant. The partitionSpec, orderSpec, etc. are available from the Apache Spark WindowExpression class, via the WindowSpecDefinition, and are therefore available in GpuWindowExpression, where the evaluation happens. I found that GpuWindowExec was dropping those expressions on the floor and GpuWindowExpression was extracting them instead.

Examining the code again, I see that Apache Spark uses them in the definition of WindowExec::requiredChildOrdering(). The omission doesn't seem to affect execution, since the same information is available in GpuWindowExpression.
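To make the redundancy concrete, here is a hedged sketch showing that the specs WindowExec carries as constructor parameters can also be recovered from each window expression itself; the helper name specsOf is made up for illustration:

   // Hedged sketch: WindowExpression holds a WindowSpecDefinition, which
   // carries the same partitionSpec and orderSpec that WindowExec takes
   // as constructor parameters.
   import org.apache.spark.sql.catalyst.expressions.{
     Expression, SortOrder, WindowExpression, WindowSpecDefinition}

   def specsOf(winExpr: WindowExpression): (Seq[Expression], Seq[SortOrder]) = {
     val spec: WindowSpecDefinition = winExpr.windowSpec
     (spec.partitionSpec, spec.orderSpec)
   }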

@tgravescs, I'm not too familiar with requiredChildOrdering(). Would it be alright to ignore this API variance for now, until such a time as when the output column ordering is affected?

@nartal1
Collaborator Author

nartal1 commented Jul 30, 2020

Thanks @mythrocks for the detailed explanation. I did not know that this information is available from WindowExpression.

@tgravescs
Collaborator

So I think the default requiredChildOrdering is going to be empty, and when Spark goes to check that its requirements are fulfilled, the check will always pass. That means if the ordering got disturbed at all during shuffle, Spark won't fix it. I'm guessing that never happens right now, and I can't think of a condition off the top of my head that would cause it.
I think it would be good for us to at least file an issue to investigate this further. If we can add requiredChildOrdering back, that would be good in case it does hit some case. The only issue might be if Databricks is different.

   override def requiredChildOrdering: Seq[Seq[SortOrder]] =
     Seq(partitionSpec.map(SortOrder(_, Ascending)) ++ orderSpec)
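The concern above can be illustrated with a hedged sketch (not Spark's exact code) of how physical planning consumes requiredChildOrdering; the function name ensureOrdering is made up for illustration:

   // Hedged sketch: during physical planning, Spark's EnsureRequirements
   // rule compares each child's outputOrdering against the parent's
   // requiredChildOrdering and inserts a SortExec when it is not satisfied.
   // An empty requirement is trivially satisfied, so no sort is ever
   // inserted -- which is why dropping requiredChildOrdering means a
   // disturbed ordering would never be repaired.
   import org.apache.spark.sql.catalyst.expressions.SortOrder
   import org.apache.spark.sql.execution.{SortExec, SparkPlan}

   def ensureOrdering(child: SparkPlan, required: Seq[SortOrder]): SparkPlan =
     if (SortOrder.orderingSatisfies(child.outputOrdering, required)) child
     else SortExec(required, global = false, child = child)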

@mythrocks
Collaborator

If we can add requiredChildOrdering back I think it would be good in case it does hit some case.

I have filed #486. I'll try to add this back in the next sprint.

@nartal1
Collaborator Author

nartal1 commented Jul 31, 2020

Thanks @mythrocks and @tgravescs. I am closing this since the other items have been audited and follow-on issues have been filed.

@nartal1 nartal1 closed this as completed Jul 31, 2020
pxLi pushed a commit to pxLi/spark-rapids that referenced this issue May 12, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023