Fix test failures in string_test.py #11030

Closed
Tracked by #11004
razajafri opened this issue Jun 8, 2024 · 7 comments · Fixed by #11247
Assignees: mythrocks
Labels: bug (Something isn't working), Spark 4.0+ (Spark 4.0+ issues)

Comments

@razajafri (Collaborator)

FAILED ../../../../integration_tests/src/main/python/string_test.py::test_endswith
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_unsupported_fallback_substring_index
@razajafri added the bug and ? - Needs Triage labels Jun 8, 2024
@razajafri added the Spark 4.0+ label Jun 8, 2024
@mattahrens removed the ? - Needs Triage label Jun 11, 2024
@mythrocks (Collaborator)

test_unsupported_fallback_substring_index fails with a legitimate cause:

E               pyspark.errors.exceptions.captured.NumberFormatException: For input string: "rdd_value_2"

The other tests all pass with ANSI mode disabled.
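For reference, ANSI mode is controlled by the standard spark.sql.ansi.enabled conf. A minimal sketch of disabling it for a session (the session setup here is illustrative, not the integration-test harness):

# spark.sql.ansi.enabled is the standard Spark conf controlling ANSI mode.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.ansi.enabled", "false")
         .getOrCreate())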

@mythrocks self-assigned this Jun 12, 2024
@mythrocks (Collaborator)

This is odd. I can't seem to repro this failure now.

@mythrocks (Collaborator)

I have double-checked my work. These tests don't fail.

I'm closing this. We can reopen this if we see failures in the future.

@mythrocks (Collaborator)

Yep, I think I spoke too soon. Reopening.

@mythrocks reopened this Jun 25, 2024
@mythrocks (Collaborator) commented Jul 19, 2024

The problem with .endswith is proving elusive. While it can be reproduced reliably in the test, it occurs only occasionally from the REPL.
For a brief while, it could be reproduced simply by adding the plugin jar to the class path (i.e. without even enabling the plugin). It appeared to have been some sort of shading error.

I'm still investigating, but this is proving a time sink.

@mythrocks (Collaborator) commented Jul 22, 2024

Yep, this is still baffling. Here is the exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o206.endsWith.
: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.Column.expr()" because "x$1" is null
      at org.apache.spark.sql.Column$.$anonfun$fn$2(Column.scala:77)
      at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75)
      at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35)
      at org.apache.spark.sql.Column$.$anonfun$fn$1(Column.scala:77)
      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:84)
      at org.apache.spark.sql.package$.withOrigin(package.scala:111)
      at org.apache.spark.sql.Column$.fn(Column.scala:76)
      at org.apache.spark.sql.Column$.fn(Column.scala:64)
      at org.apache.spark.sql.Column.fn(Column.scala:169)
      at org.apache.spark.sql.Column.endsWith(Column.scala:1078)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)

This points into new code in Spark 4.0, in Column.fn:

Column {
  UnresolvedFunction(Seq(name), inputs.map(_.expr), isDistinct, ignoreNulls = ignoreNulls)
}

The complaint seems to be that .expr can't be called on the null passed into .endswith(). (Note that the code sees this as a null Column, and not a literal.)

I'm unable to repro this from the command line. Attaching a debugger allows this code to run through as well.

This is occasionally reproducible from the pyspark shell. The exception is thrown from Spark's own CPU code path, and should not need the plugin to reproduce.

I'm fairly confident that this is a bug in Spark 4.0 that routes None as a (null) Column instead of as a literal.
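A minimal sketch of the kind of pyspark call that appears to trigger this; the DataFrame and the column name "s" are illustrative, not taken from the original test, and (as noted above) the failure is only occasionally reproducible:

# Repro sketch against a Spark 4.0 pyspark shell. Passing None reaches
# Column.endsWith on the JVM side with a null argument; Spark 4.0's
# Column.fn then calls .expr on that null Column, raising the
# NullPointerException shown above. This happens on the CPU, without
# the plugin.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("abc",)], ["s"])
df.select(df.s.endswith(None)).show()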

@mythrocks (Collaborator) commented Jul 22, 2024

As for the problem highlighted in test_unsupported_fallback_substring_index, I'm fairly certain this is a bug in code-gen in Spark 4.0. Here's the stack trace:

scala> sql("select SUBSTRING_INDEX('a', '_', num) from mytable ").show(false)
java.lang.NumberFormatException: For input string: "columnartorow_value_0"
  at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
  at java.base/java.lang.Integer.parseInt(Integer.java:668)
  at org.apache.spark.sql.catalyst.expressions.SubstringIndex.$anonfun$doGenCode$29(stringExpressions.scala:1449)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.$anonfun$defineCodeGen$3(Expression.scala:869)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.nullSafeCodeGen(Expression.scala:888)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.defineCodeGen(Expression.scala:868)
  at org.apache.spark.sql.catalyst.expressions.SubstringIndex.doGenCode(stringExpressions.scala:1448)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:207)

Edit: I have filed https://issues.apache.org/jira/browse/SPARK-48989 against Spark 4.x, to track this WholeStageCodegen NumberFormatException problem. It is happening on the CPU, without the plugin's involvement.
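A rough PySpark equivalent of the spark-shell repro above, for completeness; "mytable" and the integer column "num" are the same assumed names. The parseInt frame in the stack trace suggests the generated code-gen variable name for the count is being parsed as an integer, so the count argument must be a column reference, not a literal, to reach the failing path:

# Rough PySpark equivalent of the spark-shell repro above; "mytable" and
# "num" are assumed names. A non-literal count trips Spark 4.0's codegen.
spark.createDataFrame([("a_b_c", 2)], ["s", "num"]) \
     .createOrReplaceTempView("mytable")
spark.sql("select SUBSTRING_INDEX('a', '_', num) from mytable") \
     .show(truncate=False)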

mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Jul 24, 2024
Fixes NVIDIA#11030.

This commit skips the tests pertaining to the following operations
on Apache Spark 4.0:

1. Column.endswith()
2. substring_index()

In both cases, the CPU version of Apache Spark 4.0 seems to cause
exceptions, unrelated to the Spark RAPIDS plugin. See:

1. https://issues.apache.org/jira/browse/SPARK-48989
2. https://issues.apache.org/jira/browse/SPARK-48995

Signed-off-by: MithunR <mithunr@nvidia.com>
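The skip itself is plain pytest; a minimal sketch of the pattern follows. The helper name is_before_spark_400 is an assumption about the integration tests' spark_session utilities, not confirmed from the diff:

# Sketch of the skip pattern; is_before_spark_400() is an assumed helper
# from the integration tests' spark_session module.
import pytest
from spark_session import is_before_spark_400

# Skip on Apache Spark 4.0+, where the CPU-side bugs above make these fail.
@pytest.mark.skipif(not is_before_spark_400(),
                    reason="https://github.com/NVIDIA/spark-rapids/issues/11030")
def test_endswith():
    ...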
razajafri pushed a commit that referenced this issue Jul 25, 2024