
[BUG] Some queries fail when cost-based optimizations are enabled #1899

Closed
andygrove opened this issue Mar 9, 2021 · 5 comments · Fixed by #1910 or #1954
Assignees
Labels
bug Something isn't working

Comments

@andygrove
Contributor

Describe the bug
With the experimental cost-based optimizer enabled, 23 of the NDS queries fail due to inconsistent joins (incompatible mix of CPU/GPU operators).

The queries that fail are q7, q9, q26, q27, q28, q30, q32, q36, q44, q59, q81, q92, q1, q6, q10, q54, q85, q94, q11, q13, q16, q23a, q35

@andygrove andygrove added bug Something isn't working ? - Needs Triage Need team to review and classify labels Mar 9, 2021
@andygrove andygrove added this to the Mar 1 - Mar 12 milestone Mar 9, 2021
@andygrove andygrove self-assigned this Mar 9, 2021
@andygrove andygrove changed the title [BUG] Some queries fail due to inconsistent joins when cost-based optimizations are enabled [BUG] Some queries fail when cost-based optimizations are enabled Mar 9, 2021
@andygrove
Contributor Author

q6 fails with the following error when running against Spark 3.1.1, but works with Spark 3.0.2 (with AQE and the RAPIDS CBO enabled in both cases):

java.util.NoSuchElementException: key not found: numPartitions
        at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:101)
        at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:99)
        at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.sendDriverMetrics(CustomShuffleReaderExec.scala:122)
        at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.shuffleRDD$lzycompute(CustomShuffleReaderExec.scala:182)
        at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.shuffleRDD(CustomShuffleReaderExec.scala:181)
        at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.doExecuteColumnar(CustomShuffleReaderExec.scala:196)

@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Mar 9, 2021
@andygrove
Contributor Author

The q6 error above was misleading; the underlying cause is a regression in Spark 3.1.1's error handling when a canonicalized plan is executed. I filed https://issues.apache.org/jira/browse/SPARK-34682 to track it.
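For context on the stack trace above: Scala's `Map.apply` throws `NoSuchElementException("key not found: ...")` when the key is absent, and a canonicalized plan carries an empty metrics map, so the `numPartitions` lookup in `sendDriverMetrics` fails. A rough Python analogue (illustrative only, not Spark code; the variable names are hypothetical):

```python
# Rough analogue: Scala's Map.apply raises on a missing key, much like
# Python's dict indexing raises KeyError. These dicts are hypothetical
# stand-ins, not Spark's actual metrics structures.
metrics = {"numPartitions": 8}   # metrics registered on the original plan
canonicalized_metrics = {}       # canonicalization yields an empty metrics map

assert metrics["numPartitions"] == 8
try:
    canonicalized_metrics["numPartitions"]
    raise AssertionError("expected a lookup failure")
except KeyError:
    # mirrors: java.util.NoSuchElementException: key not found: numPartitions
    pass
```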

@andygrove
Contributor Author

Most of these failures share a single root cause: the CBO is sometimes forcing a GPU CustomShuffleReaderExec back onto the CPU, making it incompatible with the GPU shuffle that has already executed.
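The constraint being violated can be modeled abstractly. The sketch below is a simplified, hypothetical model, not the RAPIDS plugin's actual code: a shuffle reader must run on the same processor as the shuffle exchange that produced its data, so a cost rule that demotes only the reader yields an invalid plan.

```python
# Hypothetical model of the CPU/GPU placement constraint. Class and
# function names are illustrative and do not match the plugin's classes.
from dataclasses import dataclass

@dataclass
class ShuffleExchange:
    on_gpu: bool  # whether the shuffle data was written by the GPU

@dataclass
class ShuffleReader:
    child: ShuffleExchange
    on_gpu: bool

def is_valid(reader: ShuffleReader) -> bool:
    # A reader can only consume shuffle data in the format it was written,
    # so its placement must match the exchange that produced the data.
    return reader.on_gpu == reader.child.on_gpu

def cost_optimize_buggy(reader: ShuffleReader) -> ShuffleReader:
    # Models the bug: demote the reader to CPU on cost alone, ignoring
    # that the GPU shuffle has already happened.
    return ShuffleReader(child=reader.child, on_gpu=False)

def cost_optimize_fixed(reader: ShuffleReader) -> ShuffleReader:
    # Models the fix: never move a reader off the GPU when its input
    # shuffle is a GPU shuffle, regardless of the cost estimate.
    if reader.child.on_gpu:
        return reader
    return ShuffleReader(child=reader.child, on_gpu=False)

reader = ShuffleReader(child=ShuffleExchange(on_gpu=True), on_gpu=True)
assert not is_valid(cost_optimize_buggy(reader))  # inconsistent plan
assert is_valid(cost_optimize_fixed(reader))
```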

@sameerz
Collaborator

sameerz commented Mar 21, 2021

@andygrove is this resolved with #1910 ?

@andygrove
Contributor Author

> @andygrove is this resolved with #1910 ?

@sameerz No, but it is resolved by #1954
