Workaround for Databricks using AQE even when disabled #6159
Signed-off-by: Thomas Graves <tgraves@apache.org>
build
Are there any tests that we can add to verify that this is working? I know that #5981 is adding some support for Delta Lake tests. Perhaps we can have a follow-on issue to look into extending some of that for Databricks.
We don't currently have any tests because we haven't been able to reproduce it consistently. I will file a follow-up, though, to see if we can narrow it down enough to get a test added.
#6171 filed
The Databricks build is having timeout issues.
build
fixes #6158
It seems that Databricks is using AQE for some Delta stats even when we disable it in Spark. We see this on Databricks 10.4 with certain queries using delta files.
To work around this, we look for ProjectExecs that have ScalaUDFs using a function that contains tahoe.Snapshot, as that is what we have seen in the stack traces. We also check explicitly for AdaptiveSparkPlan. We don't support AQE on Databricks right now, so it shouldn't be used. In both cases we fall back to the CPU to run those operations.
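As a rough illustration of the check described above, here is a toy Python model of the idea. It is not the plugin's actual code: `PlanNode`, `needs_cpu_fallback`, `nodes_to_fall_back`, and the `example.tahoe.Snapshot$$Lambda$1` class name are all hypothetical, and matching on the function's class-name string is an assumption about how "contains tahoe.Snapshot" is evaluated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Toy stand-in for a Spark physical plan node (not a real Spark class)."""
    name: str
    # Class names of the functions wrapped by ScalaUDF expressions on this node
    # (assumption: the match is done on the class-name string).
    udf_function_classes: List[str] = field(default_factory=list)
    children: List["PlanNode"] = field(default_factory=list)


def needs_cpu_fallback(node: PlanNode) -> bool:
    """Return True if this node matches either condition from the description."""
    # Condition 1: an adaptive plan node showed up even though AQE was disabled.
    if "AdaptiveSparkPlan" in node.name:
        return True
    # Condition 2: a ProjectExec whose ScalaUDF function mentions tahoe.Snapshot.
    if node.name == "ProjectExec":
        return any("tahoe.Snapshot" in cls for cls in node.udf_function_classes)
    return False


def nodes_to_fall_back(plan: PlanNode) -> List[str]:
    """Walk the plan top-down and collect names of nodes to keep on the CPU."""
    found = [plan.name] if needs_cpu_fallback(plan) else []
    for child in plan.children:
        found.extend(nodes_to_fall_back(child))
    return found


# Hypothetical plan shaped like the failing case: an adaptive plan wrapping a
# projection whose UDF comes from a Delta snapshot class.
plan = PlanNode("AdaptiveSparkPlanExec", children=[
    PlanNode("ProjectExec",
             udf_function_classes=["example.tahoe.Snapshot$$Lambda$1"],
             children=[PlanNode("FileSourceScanExec")]),
])
print(nodes_to_fall_back(plan))  # ['AdaptiveSparkPlanExec', 'ProjectExec']
```

The scan node is left alone in this sketch; only the two patterns named in the description are flagged.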
Tested manually on queries that were failing; the failures no longer occur. Checking with the customer to validate as well.