Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify the default value of spark.rapids.sql.explain as NOT_ON_GPU #5819

Merged
merged 7 commits into from
Jun 24, 2022

Conversation

sinkinben
Copy link
Contributor

Signed-off-by: sinkinben sinkinben@outlook.com

  1. Set NOT_ON_GPU as the default value of "spark.rapids.sql.explain" in RapidsConf.scala.
  2. See [FEA] Set spark.rapids.sql.explain=NOT_ON_GPU by default #5702 .

@sinkinben
Copy link
Contributor Author

build

@sinkinben sinkinben self-assigned this Jun 14, 2022
firestarman
firestarman previously approved these changes Jun 14, 2022
tgravescs
tgravescs previously approved these changes Jun 14, 2022
Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the only other thing here is we may want to disable for our tests so the logs don't get much bigger. thoughts @revans2

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super concerned about the log size right now. But I have not tested it so I'm not totally sure what that impact would be.

I would like to see the docs updated though. The default getting started guide does not mention this config at all, and it might be confusing for users to see a WARNING message logged by default when it is mostly a performance issue.

Also there are several places in the logs that mention stetting this config, like https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/docs/get-started/getting-started-on-prem.md where it says

If you want to see why an operation did not run on the GPU you can turn on the configuration: --conf spark.rapids.sql.explain=NOT_ON_GPU. A log message will then be emitted to the driver log as to why a Spark operation is not able to run on the GPU.

I think we should go through the docs and make sure anywhere that sql.explain is mentioned that we make sure it is updated to match and that in the getting started guide and FAQ we mention what these WARNING messages are all about and why they are set to WARN.

@sameerz sameerz added the task Work required that improves the product but is not user facing label Jun 15, 2022
@GaryShen2008
Copy link
Collaborator

@GaryShen2008
Copy link
Collaborator

I simply searched sql.explain in branch-22.08, I think all the places look good. Most of them mentioned to set ALL to get more detailed explain logs.

Signed-off-by: sinkinben <sinkinben@outlook.com>
@sinkinben
Copy link
Contributor Author

sinkinben commented Jun 16, 2022

@tgravescs I have checked the docs related with "sql.explain", it seems that there is no effect on most of them. Shall I add some description about spark.sql.explain in docs/getting-started.md?

revans2
revans2 previously approved these changes Jun 16, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to see getting started updated to talk about it, but that is minor and we can do it if we start to see questions about it.

Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a section on debugging in the getting started docs that references it: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#debugging
As far as adding more, it might be nice like Bobby said but I'm ok if we want to wait to see if users complain or its not clear.

Also I just realized that this will break the explain API logic in explainCatalystSQLPlan, where normally the explain API defaults to ALL unless its set in to something other than NONE in spark.rapids.sql.explain. So we need to update that to keep the default of ALL there.

…alue of rapids.sql.explain

Signed-off-by: sinkinben <sinkinben@outlook.com>
Signed-off-by: sinkinben <sinkinben@outlook.com>
Comment on lines 104 to 107
logWarning("spark.rapids.sql.explain is set to the default value 'NOT_ON_GPU' to print " +
"operations not executed on GPU. Set to 'ALL' to print the complete query plan, " +
"display whether each operation is placed on the GPU. Set to 'NONE' to suppress the " +
"diagnostics about the query placement on the GPU.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move it inside the if branch if (conf.isSqlEnabled && conf.isSqlExecuteOnGPU) { on L85 so it prints only if GPU execution is enabled (default).

And it should only be printed when the value for spark.rapids.sql.explain implies some output (this is an evolution from my prior review) to make it shorter.

How about around L85:

if (conf.isSqlEnabled && conf.isSqlExecuteOnGPU) {
  logWarning("RAPIDS Accelerator is enabled, to disable GPU " +
    s"support set `${RapidsConf.SQL_ENABLED}` to false.")

  if (conf.explain != 'NONE') {
    logWarning("spark.rapids.sql.explain is set to ${conf.explain}. Set to 'NONE' to suppress the " +
        "diagnostics logging about the query placement on the GPU.")
  }
} else ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code here. Thanks.

Signed-off-by: sinkinben <sinkinben@outlook.com>
* Fix typo in `logPluginMode`.
* Add a function `isSQLExplainExplicitlySet` to judge rapids.sql.explain is explicitly set by the
users in `explainCatalystSQLPlan`.

Signed-off-by: sinkinben <sinkinben@outlook.com>
Signed-off-by: sinkinben <sinkinben@outlook.com>
Signed-off-by: sinkinben <sinkinben@outlook.com>
Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't force push your branch, it makes it hard to compare to the last set of comments to see what changed.

@tgravescs
Copy link
Collaborator

build

@sinkinben
Copy link
Contributor Author

sinkinben commented Jun 24, 2022

@tgravescs Is it fine to write commit messages like this?

Modify the default value of spark.rapids.sql.explain as NOT_ON_GPU (#5819)

* Add description about `spark.rapids.sql.explain` in `getting-started.md` and `FAQ.md`.
* Set the default value of `spark.rapids.sql.explain` to `NOT_ON_GPU`, both in spark2-sql-plugin and sql-plugin.
* Add log to notify the users that default value of `spark.rapids.sql.explain` is `NOT_ON_GPU` (in`RapidsPluginUtils`).
* Update the logic of `explainCatalystSQLPlan`.

Signed-off-by: sinkinben <sinkinben@outlook.com>

@sinkinben sinkinben merged commit 100c6af into NVIDIA:branch-22.08 Jun 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Set spark.rapids.sql.explain=NOT_ON_GPU by default
7 participants