[BUG] Incorrect output from partial-only averages with nulls #154

Closed
kuhushukla opened this issue Jun 11, 2020 · 0 comments · Fixed by #157
Labels: bug (Something isn't working), SQL (part of the SQL/Dataframe plugin)

Comments

@kuhushukla
Collaborator

Describe the bug
In partial-only average aggregations (averages are sent down as (sum, count) columns), a columnar batch with a single null entry in the sum column produces the wrong average.

Steps/Code to reproduce bug
Here is a Scala test that reproduces the issue. It can be run by adding it to HashAggregatesSuite.scala:

IGNORE_ORDER_ALLOW_NON_GPU_testSparkResultsAreEqual(
    "PartMerge:avg_partOnly_null_corner_case", nullIntDf,
    execsAllowedNonGpu = Seq("HashAggregateExec", "AggregateExpression", "AttributeReference",
      "Alias", "Average", "Cast"),
    conf = partialOnlyConf,
    // Split the input into two partitions so each gets its own partial aggregate.
    repart = 2) {
  frame =>
    // Average over a column that is null in one of its two rows.
    val result = frame.agg(avg("more_ints"))
    checkExecNode(result)
    result.explain()
    result
}

def nullIntDf(session: SparkSession): DataFrame = {
  import session.sqlContext.implicits._
  // "more_ints" holds one value (15) and one null; "ints" is entirely null.
  Seq[(java.lang.Integer, java.lang.Integer)](
    (null, 15),
    (null, null)
  ).toDF("ints", "more_ints")
}

Expected behavior
The CPU returns avg = 15, but the GPU returns avg = null. The GPU result should match the CPU result of 15.
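To make the failure mode concrete, here is a minimal, self-contained Scala model of the (sum, count) buffers an average is split into. It is not code from the plugin or from Spark: `AvgBuffer`, `PartialAvgSketch`, and its methods are hypothetical names. It assumes (a) that Spark's CPU partial average starts the sum at 0 and skips null inputs, so an all-null partition yields (0.0, 0), (b) that the merge step adds partial sums with SQL null semantics (null + x = null), and (c) that with `repart = 2` each row of the repro lands in its own partition.

```scala
// Hypothetical model of an average's partial aggregation buffer: (sum, count).
// `None` stands in for a SQL null in the sum column.
final case class AvgBuffer(sum: Option[Double], count: Long)

object PartialAvgSketch {
  // Assumed CPU-style partial step: the sum starts at 0.0 and null inputs are
  // skipped, so an all-null partition yields AvgBuffer(Some(0.0), 0).
  def partialCpu(values: Seq[Option[Int]]): AvgBuffer = {
    val present = values.flatten
    AvgBuffer(Some(present.map(_.toDouble).sum), present.size.toLong)
  }

  // Partial step that mimics the reported symptom: an all-null partition
  // yields a null sum instead of 0.0.
  def partialNullSum(values: Seq[Option[Int]]): AvgBuffer = {
    val present = values.flatten
    val sum = if (present.isEmpty) None else Some(present.map(_.toDouble).sum)
    AvgBuffer(sum, present.size.toLong)
  }

  // Merge step: partial sums are added with SQL semantics (null + x = null),
  // counts are simply added.
  def merge(a: AvgBuffer, b: AvgBuffer): AvgBuffer = {
    val sum = for (x <- a.sum; y <- b.sum) yield x + y
    AvgBuffer(sum, a.count + b.count)
  }

  // Final step: sum / count, or null when the sum is null or the count is 0.
  def evaluate(buf: AvgBuffer): Option[Double] =
    buf.sum.filter(_ => buf.count != 0).map(_ / buf.count)

  def main(args: Array[String]): Unit = {
    // Assumed split of "more_ints" across the two partitions of the repro.
    val partitions: Seq[Seq[Option[Int]]] = Seq(Seq(Some(15)), Seq(None))
    val cpuStyle = partitions.map(partialCpu).reduce(merge(_, _))
    val nullSum = partitions.map(partialNullSum).reduce(merge(_, _))
    println(s"cpu-style avg = ${evaluate(cpuStyle)}") // cpu-style avg = Some(15.0)
    println(s"null-sum avg = ${evaluate(nullSum)}")   // null-sum avg = None
  }
}
```

Under these assumptions, a single null partial sum is enough to turn the merged average into null, which matches the reported `gpu avg = null`, while a partial buffer of (0.0, 0) for the all-null partition yields the CPU's 15.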

kuhushukla added the bug, ? - Needs Triage, and SQL labels and removed the ? - Needs Triage label Jun 11, 2020
kuhushukla changed the title from "[BUG] Incorrect output from partial-only hash averages with nulls" to "[BUG] Incorrect output from partial-only averages with nulls" Jun 11, 2020
kuhushukla added this to the Release 0.1 milestone Jun 12, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023