Update explain format to show what will and will not run on the GPU #1624

Merged 2 commits into NVIDIA:branch-0.4 from better_explain on Jan 29, 2021

@revans2 revans2 commented Jan 29, 2021

This fixes #906

Plans now look something like this:

    *Exec <ProjectExec> will run on GPU
      !Exec <FilterExec> cannot run on GPU because not all expressions can be replaced
        @Expression <And> (isnotnull(sum((ps_supplycost#95 * cast(ps_availqty#94L as double)))#1128) AND (sum((ps_supplycost#95 * cast(ps_availqty#94L as double)))#1128 > Subquery scalar-subquery#1122, [id=#14126])) could run on GPU
          @Expression <IsNotNull> isnotnull(sum((ps_supplycost#95 * cast(ps_availqty#94L as double)))#1128) could run on GPU
            @Expression <AttributeReference> sum((ps_supplycost#95 * cast(ps_availqty#94L as double)))#1128 could run on GPU
          @Expression <GreaterThan> (sum((ps_supplycost#95 * cast(ps_availqty#94L as double)))#1128 > Subquery scalar-subquery#1122, [id=#14126]) could run on GPU
            @Expression <AttributeReference> sum((ps_supplycost#95 * cast(ps_availqty#94L as double)))#1128 could run on GPU
            !NOT_FOUND <ScalarSubquery> Subquery scalar-subquery#1122, [id=#14126] cannot run on GPU because no GPU enabled version of expression class org.apache.spark.sql.execution.ScalarSubquery could be found

and

          *Exec <SortMergeJoinExec> will run on GPU
            #Exec <SortExec> could run on GPU but is going to be removed because removing SortExec as part replacing sortMergeJoin with shuffleHashJoin
              #Expression <SortOrder> s_suppkey#108L ASC NULLS FIRST could run on GPU but is going to be removed because parent plan is removed
              *Exec <ShuffleExchangeExec> will run on GPU
                *Partitioning <HashPartitioning> will run on GPU
                *Exec <FilterExec> will run on GPU
                  *Expression <IsNotNull> isnotnull(s_nationkey#111L) will run on GPU
                  *Exec <FileSourceScanExec> will run on GPU

The markers now mean:

  • * indicates things that will run on the GPU
  • @ indicates things that could run on the GPU but will not, because the Exec they are a part of will not run on the GPU
  • # indicates things that are being removed from the plan; the line states why
  • ! indicates things that cannot run on the GPU; the line states why
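
As a rough illustration of how the markers could be consumed by a script, here is a minimal Python sketch. It is not part of this PR. The regular expression, the `PlanLine` type, and the `summarize` and `blockers` helpers are hypothetical names, and the line layout is inferred only from the sample output above. The sample text mixes lines from both excerpts purely for illustration.

```python
import re
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple

# Marker meanings, as described in the list above.
MARKERS = {
    "*": "will run on the GPU",
    "@": "could run on the GPU, but its Exec will not",
    "#": "removed from the plan",
    "!": "cannot run on the GPU",
}

# Assumed line layout, inferred from the sample output:
#   <indent><marker><Kind> <ClassName> free text
LINE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<marker>[*@#!])(?P<kind>\w+) <(?P<name>\w+)>\s*(?P<rest>.*)$"
)


class PlanLine(NamedTuple):
    indent: int   # number of leading spaces
    marker: str   # one of * @ # !
    kind: str     # e.g. Exec, Expression, Partitioning, NOT_FOUND
    name: str     # class name between the angle brackets
    rest: str     # remaining text on the line


def parse_line(line: str) -> Optional[PlanLine]:
    """Parse one explain line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return PlanLine(len(m["indent"]), m["marker"], m["kind"], m["name"], m["rest"])


def summarize(text: str) -> Counter:
    """Count how many parsed lines carry each marker."""
    parsed = (parse_line(line) for line in text.splitlines())
    return Counter(p.marker for p in parsed if p is not None)


def blockers(text: str) -> List[Tuple[str, str]]:
    """Return (class name, reason) for every '!' line.

    The reason is the text after 'cannot run on GPU because ', or an
    empty string if that phrase is absent.
    """
    found = []
    for line in text.splitlines():
        p = parse_line(line)
        if p is not None and p.marker == "!":
            _, _, reason = p.rest.partition("cannot run on GPU because ")
            found.append((p.name, reason))
    return found


# Lines copied from the two excerpts above, combined for illustration only.
SAMPLE = """\
*Exec <ProjectExec> will run on GPU
  !Exec <FilterExec> cannot run on GPU because not all expressions can be replaced
    @Expression <AttributeReference> sum((ps_supplycost#95 * cast(ps_availqty#94L as double)))#1128 could run on GPU
    !NOT_FOUND <ScalarSubquery> Subquery scalar-subquery#1122, [id=#14126] cannot run on GPU because no GPU enabled version of expression class org.apache.spark.sql.execution.ScalarSubquery could be found
#Exec <SortExec> could run on GPU but is going to be removed because removing SortExec as part replacing sortMergeJoin with shuffleHashJoin
"""

if __name__ == "__main__":
    for marker, count in summarize(SAMPLE).items():
        print(f"{marker} ({MARKERS[marker]}): {count}")
    for name, reason in blockers(SAMPLE):
        print(f"{name}: {reason}")
```

Filtering on `!` lines is usually the quickest way to find the root cause: in the first excerpt, FilterExec stays off the GPU only because of the ScalarSubquery line nested beneath it.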

There is no real documentation for this format, so I did not update any docs. If this is something we want to add, I am happy to do it.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
revans2 added the "feature request" (New feature or request) label on Jan 29, 2021
revans2 added this to the Jan 18 - Jan 29 milestone on Jan 29, 2021
revans2 self-assigned this on Jan 29, 2021

revans2 commented Jan 29, 2021

build

@jlowe jlowe left a comment

Looks good overall, but should the user docs be updated to explain the meaning of the new symbols that are being added in the explain logging?

revans2 commented Jan 29, 2021

> Looks good overall, but should the user docs be updated to explain the meaning of the new symbols that are being added in the explain logging?

I am happy to do it, but I am not sure where. As I said, there is currently no documentation explaining the format.

I could update it in configs.md, but I am not sure that is the best place. The only other place it is mentioned is tuning-guide.md.

@tgravescs

I would think putting something in the FAQ section, and also in the Monitoring section of the getting-started on-prem guide, would be useful. But having the explanation in its own section somewhere and linking to it from those places makes sense to me. Or maybe we need to make the monitoring section its own.

jlowe commented Jan 29, 2021

> putting something in FAQ section

+1 for this. At a minimum, I think the FAQ should have an entry for "How do I tell if my query is running on the GPU?", which is arguably our most-asked question. The FAQ also already mentions explain in its first entry.

revans2 commented Jan 29, 2021

I added docs to the FAQ.

revans2 commented Jan 29, 2021

build

revans2 merged commit 7d7363b into NVIDIA:branch-0.4 on Jan 29, 2021
revans2 deleted the better_explain branch on January 29, 2021 20:00
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…VIDIA#1624)

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

Successfully merging this pull request may close these issues.

[FEA] Clarify query explanation to directly state what will run on GPU