Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where AQE does not respect entirePlanWillNotWork #6250

Merged
merged 2 commits into from
Aug 9, 2022

Conversation

andygrove
Copy link
Contributor

@andygrove andygrove commented Aug 6, 2022

Closes #6230

While testing some changes related to Delta Lake fallback, I discovered that entirePlanWillNotWork is not always respected by AQE because we weren't tagging the underlying plan, so during the planning for individual query stages, we could decide to run on GPU, and this can lead to invalid plans with mixed CPU/GPU usage that then fail to run.

For the example that I hit, the query failed with the following error due to an invalid transition being inserted. See the issue for more details.

java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:208)
  at org.apache.spark.sql.execution.ColumnarToRowExec.<init>(Columnar.scala:69)

This issue can only happen when spark.rapids.sql.detectDeltaLogQueries is enabled (because that is the only time we call entirePlanWillNotWork). This setting is currently enabled by default.

I have tried some different approaches to writing unit or integration tests for this change, but it is proving to be challenging, so I am open to suggestions on how to do this. I hit the error with some changes on another branch that I am no longer planning on merging.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove added this to the July 22 - Aug 5 milestone Aug 6, 2022
@andygrove andygrove self-assigned this Aug 6, 2022
@andygrove andygrove added the bug Something isn't working label Aug 6, 2022
@andygrove
Copy link
Contributor Author

build

1 similar comment
@andygrove
Copy link
Contributor Author

build

@sameerz sameerz requested a review from tgravescs August 8, 2022 06:14
tgravescs
tgravescs previously approved these changes Aug 8, 2022
Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we file a followup to try to find a test for this? I think we need to find better ways to test AQE in general

@andygrove
Copy link
Contributor Author

Can we file a followup to try to find a test for this? I think we need to find better ways to test AQE in general

Filed #6253

@andygrove
Copy link
Contributor Author

One downside to this fix is that it results in noisy logging

22/08/08 14:13:05 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU
22/08/08 14:13:05 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU
22/08/08 14:13:05 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU
22/08/08 14:13:05 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU
22/08/08 14:13:05 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU
22/08/08 14:13:05 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU,Delta Lake metadata queries are not efficient on GPU

@tgravescs
Copy link
Collaborator

can we do more to limit it to when AQE is on? Or perhaps we can make the logging smarter. Since this seems to be needed for 22.08, I think the logging is ok being noisy but we can file a followup to improve in 22.10. Unless others have other ideas.

@andygrove
Copy link
Contributor Author

can we do more to limit it to when AQE is on? Or perhaps we can make the logging smarter. Since this seems to be needed for 22.08, I think the logging is ok being noisy but we can file a followup to improve in 22.10. Unless others have other ideas.

Yes, I think we can change getReasonsNotToReplaceEntirePlan to return a Set instead of a Seq. Testing this now.

@andygrove
Copy link
Contributor Author

Using Set instead of Seq makes the logging much more reasonable:

22/08/08 14:31:43 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU
22/08/08 14:31:43 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU
22/08/08 14:31:43 WARN GpuOverrides: Can't replace any part of this plan due to: Delta Lake metadata queries are not efficient on GPU

@andygrove
Copy link
Contributor Author

build

@andygrove andygrove merged commit 8a50bc5 into NVIDIA:branch-22.08 Aug 9, 2022
@andygrove andygrove deleted the entire-plan-aqe-fix branch August 9, 2022 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants