
[BUG] test_hash_multiple_mode_query failing #1185

Closed
tgravescs opened this issue Nov 23, 2020 · 8 comments · Fixed by #1189
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

@tgravescs (Collaborator)

Describe the bug

11:37:59 FAILED src/main/python/hash_aggregate_test.py::test_hash_multiple_mode_query[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.hasNans': 'false', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', Null), ('b', Integer), ('c', Long)]][IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT]
11:37:59 FAILED src/main/python/hash_aggregate_test.py::test_hash_query_multiple_distincts_with_non_distinct[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.hasNans': 'false', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', Null), ('b', Integer), ('c', Long)]][IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT]
11:37:59 FAILED src/main/python/hash_aggregate_test.py::test_hash_query_max_with_multiple_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.hasNans': 'false', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', RepeatSeq(String)), ('b', Integer), ('c', Null)]][IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT]

One of the reasons appears to be that the final HashAggregate stages are not running on the GPU:

e = IllegalArgumentException('Part of the plan is not columnar class org.apache.spark.sql.execution.aggregate.HashAggregat...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:251)\n\tat java.lang.Thread.run(Thread.java:748)\n', None)
11:37:59  
11:37:59  >   ???
11:37:59  E   pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.aggregate.HashAggregateExec
11:37:59  E   HashAggregate(keys=[a#269263], functions=[finalmerge_first(merge first#269320L, valueSet#269321) AS first(if ((gid#269300 = 0)) count(`a`)#269306L else null) ignore nulls#269307L, finalmerge_first(merge first#269324, valueSet#269325) AS first(if ((gid#269300 = 0)) avg(CAST(`b` AS BIGINT))#269308 else null) ignore nulls#269309, finalmerge_first(merge first#269328, valueSet#269329) AS first(if ((gid#269300 = 0)) avg(CAST(`a` AS DOUBLE))#269310 else null) ignore nulls#269311, finalmerge_count(merge count#269331L) AS count(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null)#269288L, finalmerge_first(merge first#269334, valueSet#269335) AS first(if ((gid#269300 = 0)) sum(CAST(`a` AS DOUBLE))#269312 else null) ignore nulls#269313, finalmerge_first(merge first#269338, valueSet#269339) AS first(if ((gid#269300 = 0)) min(`a`)#269314 else null) ignore nulls#269315, finalmerge_first(merge first#269342, valueSet#269343) AS first(if ((gid#269300 = 0)) max(`a`)#269316 else null) ignore nulls#269317, finalmerge_sum(merge sum#269345L) AS sum(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null)#269278L, finalmerge_count(merge count#269347L) AS count(if ((gid#269300 = 1)) `c`#269301L else null)#269289L], output=[a#269263, count(a)#269279L, avg(b)#269280, avg(a)#269281, count(b)#269282L, sum(a)#269283, min(a)#269284, max(a)#269285, sum(DISTINCT b)#269286L, count(c)#269287L])
11:37:59  E   +- Exchange hashpartitioning(a#269263, 200), true, [id=#288411]
11:37:59  E      +- HashAggregate(keys=[a#269263], functions=[partial_first(if ((gid#269300 = 0)) count(`a`)#269306L else null, true) AS (first#269320L, valueSet#269321), partial_first(if ((gid#269300 = 0)) avg(CAST(`b` AS BIGINT))#269308 else null, true) AS (first#269324, valueSet#269325), partial_first(if ((gid#269300 = 0)) avg(CAST(`a` AS DOUBLE))#269310 else null, true) AS (first#269328, valueSet#269329), partial_count(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null) AS count#269331L, partial_first(if ((gid#269300 = 0)) sum(CAST(`a` AS DOUBLE))#269312 else null, true) AS (first#269334, valueSet#269335), partial_first(if ((gid#269300 = 0)) min(`a`)#269314 else null, true) AS (first#269338, valueSet#269339), partial_first(if ((gid#269300 = 0)) max(`a`)#269316 else null, true) AS (first#269342, valueSet#269343), partial_sum(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null) AS sum#269345L, partial_count(if ((gid#269300 = 1)) `c`#269301L else null) AS count#269347L], output=[a#269263, first#269320L, valueSet#269321, first#269324, valueSet#269325, first#269328, valueSet#269329, count#269331L, first#269334, valueSet#269335, first#269338, valueSet#269339, first#269342, valueSet#269343, sum#269345L, count#269347L])
11:37:59  E         +- GpuColumnarToRow false
11:37:59  E            +- GpuHashAggregate(keys=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300], functions=[gpucount(`a`#269303), gpuavg(CAST(`b` AS BIGINT)#269304L), gpuavg(CAST(`a` AS DOUBLE)#269305), gpusum(CAST(`a` AS DOUBLE)#269305), gpumin(`a`#269303), gpumax(`a`#269303)], output=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, count(`a`)#269306L, avg(CAST(`b` AS BIGINT))#269308, avg(CAST(`a` AS DOUBLE))#269310, sum(CAST(`a` AS DOUBLE))#269312, min(`a`)#269314, max(`a`)#269316])
11:37:59  E               +- ShuffleCoalesce com.nvidia.spark.rapids.RapidsConf@62a275fc
11:37:59  E                  +- GpuColumnarExchange gpuhashpartitioning(a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, 200), true, [id=#288406]
11:37:59  E                     +- GpuHashAggregate(keys=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300], functions=[partial_gpucount(`a`#269303), partial_gpuavg(CAST(`b` AS BIGINT)#269304L), partial_gpuavg(CAST(`a` AS DOUBLE)#269305), partial_gpusum(CAST(`a` AS DOUBLE)#269305), partial_gpumin(`a`#269303), partial_gpumax(`a`#269303)], output=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, count#269349L, sum#269352, count#269353L, sum#269356, count#269357L, sum#269359, min#269361, max#269363])
11:37:59  E                        +- GpuExpand [ArrayBuffer(a#269263, null, null, 0, a#269263, cast(b#269264 as bigint), cast(a#269263 as double)), ArrayBuffer(a#269263, c#269265L, null, 1, null, null, null), ArrayBuffer(a#269263, null, cast(b#269264 as bigint), 2, null, null, null)], [a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, `a`#269303, CAST(`b` AS BIGINT)#269304L, CAST(`a` AS DOUBLE)#269305]
11:37:59  E                           +- GpuRowToColumnar TargetSize(2147483647)
11:37:59  E                              +- Scan ExistingRDD[a#269263,b#269264,c#269265L]
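
For anyone trying to reproduce this outside the integration test harness, here is a minimal PySpark sketch of the same query shape (column names match the parametrization, but the data and exact aggregate list are illustrative, not the actual test generators). Run it with the RAPIDS plugin enabled and the three configs from the failing parametrization (`spark.rapids.sql.variableFloatAgg.enabled=true`, `spark.rapids.sql.hasNans=false`, `spark.rapids.sql.castStringToFloat.enabled=true`) to compare plans:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# 'a' is a NullType column (f.lit(None)), matching the ('a', Null) data gen
# in the failing tests; 'b' is an int and 'c' is a long.
df = spark.range(100).select(
    f.lit(None).alias("a"),
    (f.col("id") % 5).cast("int").alias("b"),
    f.col("id").alias("c"))

# One grouping key plus a mix of distinct and non-distinct aggregates;
# Spark rewrites this via Expand + First, which is where the plan above
# falls back to CPU HashAggregateExec.
df.groupBy("a").agg(
    f.count("a"), f.avg("b"), f.avg("a"), f.count("b"),
    f.sum("a"), f.min("a"), f.max("a"),
    f.sumDistinct("b"), f.countDistinct("c")).explain()
```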
tgravescs added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and P0 (Must have for release) labels on Nov 23, 2020
tgravescs added this to the Nov 23 - Dec 4 milestone on Nov 23, 2020
@jlowe (Member) commented Nov 23, 2020

I'm not sure this failure is specific to Databricks.

@kuhushukla (Collaborator)

Do you folks want me to take a look at this, in case @tgravescs and @jlowe are not?

@jlowe (Member) commented Nov 23, 2020

@kuhushukla if you could look into this that'd be great. The nightly Spark 3.0.0 integration tests have been failing with this as well, it appears.

kuhushukla self-assigned this on Nov 23, 2020
@kuhushukla (Collaborator)

Will update here asap.

@kuhushukla (Collaborator)

This seems related to NullType support, IMO; in each of the failing tests, at least one column is Null.

tgravescs changed the title from "[BUG] test_hash_multiple_mode_query failing on Databricks 3.0.1" to "[BUG] test_hash_multiple_mode_query failing" on Nov 23, 2020
@kuhushukla (Collaborator)

I am unable to repro this locally, even with the spark-3.0.0 artifact that is used by the nightly build. I will see what I can do further, and possibly add explain-all output for debugging if nothing works out.
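
For reference, the "explain all" debugging mentioned above comes from the plugin's `spark.rapids.sql.explain` config (values NONE, NOT_ON_GPU, or ALL); a minimal sketch:

```python
# Ask the RAPIDS plugin to report, for every operator and expression,
# whether it was placed on the GPU, and why not when it was not.
spark.conf.set("spark.rapids.sql.explain", "ALL")
```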

revans2 assigned revans2 and unassigned kuhushukla on Nov 23, 2020
@revans2 (Collaborator) commented Nov 23, 2020

Thanks @kuhushukla. It looks like the failure is happening on 3.0.1, and it is related to First.

            !Expression <First> first(if ((gid#1631 = 0)) max(`a`)#1648 else null) ignore nulls cannot run on GPU because unsupported data types in input: NullType; expression First first(if ((gid#1631 = 0)) max(`a`)#1648 else null) ignore nulls produces an unsupported type NullType

I'll see if I can fix it.
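
A hedged, minimal repro of just the First-over-NullType case described in that message (illustrative, not the integration test itself):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# 'a' is a NullType column. The multi-distinct rewrite wraps non-distinct
# aggregates in first(..., ignorenulls=True), so a NullType input reaches
# First, the expression the overrides rejected on 3.0.1.
df = spark.range(10).select(f.lit(None).alias("a"), f.col("id").alias("c"))
df.groupBy("c").agg(f.first("a", ignorenulls=True)).explain()
```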

@revans2 (Collaborator) commented Nov 23, 2020

Yup, it is my fault. I forgot to update the 3.0.1 shim layer to let first and last work with NullType; I only updated the 3.0.0 shim layer. I'll put up a PR shortly.
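
As a hypothetical sanity check once the PR lands (this assumes the fix makes the GPU first/last accept NullType on 3.0.1; it reuses `df` and `f` from the sketch above, and `GpuHashAggregate` is the operator name seen in the plan earlier in this issue):

```python
# PySpark's _jdf is an internal API, but this pattern is common in plugin
# tests: pull the executed plan string and assert the aggregate stayed on
# the GPU instead of falling back to CPU HashAggregateExec.
plan = (df.groupBy("c").agg(f.first("a", ignorenulls=True))
          ._jdf.queryExecution().executedPlan().toString())
assert "GpuHashAggregate" in plan, "first over NullType still falls back to CPU"
```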

sameerz removed the ? - Needs Triage (Need team to review and classify) label on Dec 14, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…IDIA#1185)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>