Workaround for Databricks using AQE even when disabled #6159

tgravescs · 2022-07-29T21:15:02Z

Databricks appears to use AQE for some Delta stats even when we disable it in Spark. We see this on Databricks 10.4 with certain queries that read Delta files.

To work around this, we look for ProjectExecs that have ScalaUDFs using a function that contains tahoe.Snapshot, since that is what we have seen in the stack traces. We also check explicitly for AdaptiveSparkPlan. We don't support AQE on Databricks right now, so it shouldn't be used there. In both cases we fall back to the CPU to run those operations.
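As a rough illustration of the two checks described above, here is a minimal Python sketch over a toy plan tree. It is not the plugin's code: `PlanNode`, `cpu_fallback_reason`, and `collect_fallbacks` are hypothetical names, the node type is a simplified stand-in for a Spark physical plan node, the example class name is made up, and gating both checks on a Databricks flag is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PlanNode:
    # Toy stand-in for a Spark physical plan node (hypothetical, not a Spark class).
    name: str
    # Class names of the functions behind any ScalaUDFs in this node's expressions.
    udf_function_classes: List[str] = field(default_factory=list)
    children: List["PlanNode"] = field(default_factory=list)


def cpu_fallback_reason(node: PlanNode, is_databricks: bool) -> Optional[str]:
    # Assumption: both checks apply only when running on Databricks.
    if not is_databricks:
        return None
    # Check 1: an adaptive plan node showed up even though AQE was disabled.
    if node.name.startswith("AdaptiveSparkPlan"):
        return "AQE is not supported on Databricks"
    # Check 2: a projection whose ScalaUDF function mentions tahoe.Snapshot.
    if node.name == "ProjectExec" and any(
        "tahoe.Snapshot" in cls for cls in node.udf_function_classes
    ):
        return "ScalaUDF function references tahoe.Snapshot (Delta stats)"
    return None


def collect_fallbacks(root: PlanNode, is_databricks: bool) -> List[Tuple[str, str]]:
    # Walk the tree depth-first and record (node name, reason) for each CPU fallback.
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        reason = cpu_fallback_reason(node, is_databricks)
        if reason is not None:
            found.append((node.name, reason))
        stack.extend(node.children)
    return found


# Made-up plan: the UDF class name below is illustrative only.
plan = PlanNode(
    "AdaptiveSparkPlanExec",
    children=[
        PlanNode(
            "ProjectExec",
            udf_function_classes=["com.example.tahoe.Snapshot$$Lambda$1"],
            children=[PlanNode("FileSourceScanExec")],
        )
    ],
)
for name, reason in collect_fallbacks(plan, is_databricks=True):
    print(f"{name}: fall back to CPU ({reason})")
```

Matching on a substring of the function's class name is deliberately loose; it mirrors the description of looking for a function that "contains tahoe.Snapshot" rather than depending on an exact class.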

Tested manually on the queries that were failing; the failures no longer occur. Checking with the customer to validate as well.

Signed-off-by: Thomas Graves <tgraves@apache.org>

tgravescs · 2022-07-29T21:17:21Z

build

tgravescs · 2022-08-01T13:31:22Z

build

revans2

Are there any tests that we can add to verify that this is working? I know that #5981 is adding some support for Delta Lake tests. Perhaps we can have a follow-on issue to look into extending some of that for Databricks.

tgravescs · 2022-08-01T15:33:19Z

We don't currently have any tests because we haven't been able to reproduce it consistently. I will file a follow-up, though, to see if we can narrow it down enough to add a test.

tgravescs · 2022-08-01T15:35:04Z

#6171 filed

tgravescs · 2022-08-01T18:17:50Z

The Databricks build is having timeout issues.

tgravescs · 2022-08-01T18:18:18Z

build

tgravescs added 4 commits July 29, 2022 15:14

Databricks workaround AQE always on for Delta stats

431a898

Signed-off-by: Thomas Graves <tgraves@apache.org>

minor updates

622b02d

Only look for tahoe.Snapshot if DB

17087c4

cleanup

c55526b

tgravescs added the bug label Jul 29, 2022

tgravescs added this to the July 22 - Aug 5 milestone Jul 29, 2022

tgravescs self-assigned this Jul 29, 2022

tgravescs changed the title ~~Databricks using AQE even when disabled workaround [Databricks]~~ Workaround for Databricks using AQE even when disabled [Databricks] Jul 29, 2022

revans2 approved these changes Aug 1, 2022

tgravescs mentioned this pull request Aug 1, 2022

Add test for workaround for Databricks using AQE even when disabled #6171

Open

tgravescs changed the title ~~Workaround for Databricks using AQE even when disabled [Databricks]~~ Workaround for Databricks using AQE even when disabled Aug 1, 2022

tgravescs merged commit 852d0db into NVIDIA:branch-22.08 Aug 1, 2022
