
[BUG] test_hash_multiple_mode_query failing #1185

Closed
tgravescs opened this issue Nov 23, 2020 · 8 comments · Fixed by #1189
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

@tgravescs (Collaborator)

Describe the bug

11:37:59 FAILED src/main/python/hash_aggregate_test.py::test_hash_multiple_mode_query[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.hasNans': 'false', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', Null), ('b', Integer), ('c', Long)]][IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT]
11:37:59 FAILED src/main/python/hash_aggregate_test.py::test_hash_query_multiple_distincts_with_non_distinct[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.hasNans': 'false', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', Null), ('b', Integer), ('c', Long)]][IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT]
11:37:59 FAILED src/main/python/hash_aggregate_test.py::test_hash_query_max_with_multiple_distincts[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.hasNans': 'false', 'spark.rapids.sql.castStringToFloat.enabled': 'true'}-[('a', RepeatSeq(String)), ('b', Integer), ('c', Null)]][IGNORE_ORDER, INCOMPAT, APPROXIMATE_FLOAT]

One of the reasons appears to be that the final HashAggregate stages are not running on the GPU:

e = IllegalArgumentException('Part of the plan is not columnar class org.apache.spark.sql.execution.aggregate.HashAggregat...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:251)\n\tat java.lang.Thread.run(Thread.java:748)\n', None)
11:37:59  
11:37:59  >   ???
11:37:59  E   pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.aggregate.HashAggregateExec
11:37:59  E   HashAggregate(keys=[a#269263], functions=[finalmerge_first(merge first#269320L, valueSet#269321) AS first(if ((gid#269300 = 0)) count(`a`)#269306L else null) ignore nulls#269307L, finalmerge_first(merge first#269324, valueSet#269325) AS first(if ((gid#269300 = 0)) avg(CAST(`b` AS BIGINT))#269308 else null) ignore nulls#269309, finalmerge_first(merge first#269328, valueSet#269329) AS first(if ((gid#269300 = 0)) avg(CAST(`a` AS DOUBLE))#269310 else null) ignore nulls#269311, finalmerge_count(merge count#269331L) AS count(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null)#269288L, finalmerge_first(merge first#269334, valueSet#269335) AS first(if ((gid#269300 = 0)) sum(CAST(`a` AS DOUBLE))#269312 else null) ignore nulls#269313, finalmerge_first(merge first#269338, valueSet#269339) AS first(if ((gid#269300 = 0)) min(`a`)#269314 else null) ignore nulls#269315, finalmerge_first(merge first#269342, valueSet#269343) AS first(if ((gid#269300 = 0)) max(`a`)#269316 else null) ignore nulls#269317, finalmerge_sum(merge sum#269345L) AS sum(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null)#269278L, finalmerge_count(merge count#269347L) AS count(if ((gid#269300 = 1)) `c`#269301L else null)#269289L], output=[a#269263, count(a)#269279L, avg(b)#269280, avg(a)#269281, count(b)#269282L, sum(a)#269283, min(a)#269284, max(a)#269285, sum(DISTINCT b)#269286L, count(c)#269287L])
11:37:59  E   +- Exchange hashpartitioning(a#269263, 200), true, [id=#288411]
11:37:59  E      +- HashAggregate(keys=[a#269263], functions=[partial_first(if ((gid#269300 = 0)) count(`a`)#269306L else null, true) AS (first#269320L, valueSet#269321), partial_first(if ((gid#269300 = 0)) avg(CAST(`b` AS BIGINT))#269308 else null, true) AS (first#269324, valueSet#269325), partial_first(if ((gid#269300 = 0)) avg(CAST(`a` AS DOUBLE))#269310 else null, true) AS (first#269328, valueSet#269329), partial_count(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null) AS count#269331L, partial_first(if ((gid#269300 = 0)) sum(CAST(`a` AS DOUBLE))#269312 else null, true) AS (first#269334, valueSet#269335), partial_first(if ((gid#269300 = 0)) min(`a`)#269314 else null, true) AS (first#269338, valueSet#269339), partial_first(if ((gid#269300 = 0)) max(`a`)#269316 else null, true) AS (first#269342, valueSet#269343), partial_sum(if ((gid#269300 = 2)) CAST(`b` AS BIGINT)#269302L else null) AS sum#269345L, partial_count(if ((gid#269300 = 1)) `c`#269301L else null) AS count#269347L], output=[a#269263, first#269320L, valueSet#269321, first#269324, valueSet#269325, first#269328, valueSet#269329, count#269331L, first#269334, valueSet#269335, first#269338, valueSet#269339, first#269342, valueSet#269343, sum#269345L, count#269347L])
11:37:59  E         +- GpuColumnarToRow false
11:37:59  E            +- GpuHashAggregate(keys=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300], functions=[gpucount(`a`#269303), gpuavg(CAST(`b` AS BIGINT)#269304L), gpuavg(CAST(`a` AS DOUBLE)#269305), gpusum(CAST(`a` AS DOUBLE)#269305), gpumin(`a`#269303), gpumax(`a`#269303)], output=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, count(`a`)#269306L, avg(CAST(`b` AS BIGINT))#269308, avg(CAST(`a` AS DOUBLE))#269310, sum(CAST(`a` AS DOUBLE))#269312, min(`a`)#269314, max(`a`)#269316])
11:37:59  E               +- ShuffleCoalesce com.nvidia.spark.rapids.RapidsConf@62a275fc
11:37:59  E                  +- GpuColumnarExchange gpuhashpartitioning(a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, 200), true, [id=#288406]
11:37:59  E                     +- GpuHashAggregate(keys=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300], functions=[partial_gpucount(`a`#269303), partial_gpuavg(CAST(`b` AS BIGINT)#269304L), partial_gpuavg(CAST(`a` AS DOUBLE)#269305), partial_gpusum(CAST(`a` AS DOUBLE)#269305), partial_gpumin(`a`#269303), partial_gpumax(`a`#269303)], output=[a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, count#269349L, sum#269352, count#269353L, sum#269356, count#269357L, sum#269359, min#269361, max#269363])
11:37:59  E                        +- GpuExpand [ArrayBuffer(a#269263, null, null, 0, a#269263, cast(b#269264 as bigint), cast(a#269263 as double)), ArrayBuffer(a#269263, c#269265L, null, 1, null, null, null), ArrayBuffer(a#269263, null, cast(b#269264 as bigint), 2, null, null, null)], [a#269263, `c`#269301L, CAST(`b` AS BIGINT)#269302L, gid#269300, `a`#269303, CAST(`b` AS BIGINT)#269304L, CAST(`a` AS DOUBLE)#269305]
11:37:59  E                           +- GpuRowToColumnar TargetSize(2147483647)
11:37:59  E                              +- Scan ExistingRDD[a#269263,b#269264,c#269265L]
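
For anyone trying to reproduce this outside the integration test harness, here is a minimal PySpark sketch of the same query shape (column names match the parametrization, but the data and exact aggregate list are illustrative, not the actual test generators). Run it with the RAPIDS plugin enabled and the three configs from the failing parametrization (`spark.rapids.sql.variableFloatAgg.enabled=true`, `spark.rapids.sql.hasNans=false`, `spark.rapids.sql.castStringToFloat.enabled=true`) to compare plans:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# 'a' is a NullType column (f.lit(None)), matching the ('a', Null) data gen
# in the failing tests; 'b' is an int and 'c' is a long.
df = spark.range(100).select(
    f.lit(None).alias("a"),
    (f.col("id") % 5).cast("int").alias("b"),
    f.col("id").alias("c"))

# One grouping key plus a mix of distinct and non-distinct aggregates;
# Spark rewrites this via Expand + First, which is where the plan above
# falls back to CPU HashAggregateExec.
df.groupBy("a").agg(
    f.count("a"), f.avg("b"), f.avg("a"), f.count("b"),
    f.sum("a"), f.min("a"), f.max("a"),
    f.sumDistinct("b"), f.countDistinct("c")).explain()
```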
tgravescs added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and P0 (Must have for release) labels on Nov 23, 2020
tgravescs added this to the Nov 23 - Dec 4 milestone on Nov 23, 2020
@jlowe (Member) commented Nov 23, 2020

I'm not sure this failure is specific to Databricks.

@kuhushukla (Collaborator)

Do you folks want me to take a look at this, in case @tgravescs and @jlowe are not?

@jlowe (Member) commented Nov 23, 2020

@kuhushukla if you could look into this that'd be great. The nightly Spark 3.0.0 integration tests have been failing with this as well, it appears.

kuhushukla self-assigned this on Nov 23, 2020
@kuhushukla (Collaborator)

Will update here asap.

@kuhushukla (Collaborator)

This seems related to NullType support, IMO; in each of the failing tests, at least one column is Null.

tgravescs changed the title from "[BUG] test_hash_multiple_mode_query failing on Databricks 3.0.1" to "[BUG] test_hash_multiple_mode_query failing" on Nov 23, 2020
@kuhushukla (Collaborator)

I am unable to repro this locally, even with the spark-3.0.0 artifact that is used by the nightly build. I will see what I can do further, and possibly add explain-all output for debugging if nothing works out.
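
For reference, the "explain all" debugging mentioned above comes from the plugin's `spark.rapids.sql.explain` config (values NONE, NOT_ON_GPU, or ALL); a minimal sketch:

```python
# Ask the RAPIDS plugin to report, for every operator and expression,
# whether it was placed on the GPU, and why not when it was not.
spark.conf.set("spark.rapids.sql.explain", "ALL")
```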

revans2 assigned revans2 and unassigned kuhushukla on Nov 23, 2020
@revans2 (Collaborator) commented Nov 23, 2020

Thanks @kuhushukla. It looks like the failure is happening on 3.0.1, and it is related to First.

            !Expression <First> first(if ((gid#1631 = 0)) max(`a`)#1648 else null) ignore nulls cannot run on GPU because unsupported data types in input: NullType; expression First first(if ((gid#1631 = 0)) max(`a`)#1648 else null) ignore nulls produces an unsupported type NullType

I'll see if I can fix it.
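
A hedged, minimal repro of just the First-over-NullType case described in that message (illustrative, not the integration test itself):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# 'a' is a NullType column. The multi-distinct rewrite wraps non-distinct
# aggregates in first(..., ignorenulls=True), so a NullType input reaches
# First, the expression the overrides rejected on 3.0.1.
df = spark.range(10).select(f.lit(None).alias("a"), f.col("id").alias("c"))
df.groupBy("c").agg(f.first("a", ignorenulls=True)).explain()
```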

@revans2 (Collaborator) commented Nov 23, 2020

Yup, it is my fault. I forgot to update the 3.0.1 shim layer to let first and last work with NullType; I only updated the 3.0.0 shim layer. I'll put up a PR shortly.
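
As a hypothetical sanity check once the PR lands (this assumes the fix makes the GPU first/last accept NullType on 3.0.1; it reuses `df` and `f` from the sketch above, and `GpuHashAggregate` is the operator name seen in the plan earlier in this issue):

```python
# PySpark's _jdf is an internal API, but this pattern is common in plugin
# tests: pull the executed plan string and assert the aggregate stayed on
# the GPU instead of falling back to CPU HashAggregateExec.
plan = (df.groupBy("c").agg(f.first("a", ignorenulls=True))
          ._jdf.queryExecution().executedPlan().toString())
assert "GpuHashAggregate" in plan, "first over NullType still falls back to CPU"
```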

sameerz removed the ? - Needs Triage (Need team to review and classify) label on Dec 14, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…IDIA#1185)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>