Fix test failures in string_test.py #11030

Closed
Tracked by #11004
razajafri opened this issue Jun 8, 2024 · 7 comments · Fixed by #11247
Assignees: mythrocks
Labels: bug (Something isn't working), Spark 4.0+ (Spark 4.0+ issues)

Comments

@razajafri (Collaborator)

FAILED ../../../../integration_tests/src/main/python/string_test.py::test_endswith
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_unsupported_fallback_substring_index
@razajafri added the bug and ? - Needs Triage labels Jun 8, 2024
@razajafri added the Spark 4.0+ label Jun 8, 2024
@mattahrens removed the ? - Needs Triage label Jun 11, 2024
@mythrocks (Collaborator)

test_unsupported_fallback_substring_index fails with a legitimate cause:

E               pyspark.errors.exceptions.captured.NumberFormatException: For input string: "rdd_value_2"

The other tests all pass with ANSI mode disabled.
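For reference, ANSI mode is controlled by the standard spark.sql.ansi.enabled conf. A minimal sketch of disabling it for a session (the session setup here is illustrative, not the integration-test harness):

# spark.sql.ansi.enabled is the standard Spark conf controlling ANSI mode.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.ansi.enabled", "false")
         .getOrCreate())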

@mythrocks self-assigned this Jun 12, 2024
@mythrocks (Collaborator)

This is odd. I can't seem to repro this failure now.

@mythrocks (Collaborator)

I have double-checked my work. These tests don't fail.

I'm closing this. We can reopen this if we see failures in the future.

@mythrocks (Collaborator)

Yep, I think I spoke too soon. Reopening.

@mythrocks reopened this Jun 25, 2024
@mythrocks (Collaborator) commented Jul 19, 2024

The problem with .endswith is proving elusive. While it can be reproduced reliably in the test, it occurs only occasionally from the REPL.
For a brief while, it could be reproduced simply by adding the plugin jar to the class path (i.e. without even enabling the plugin). It appeared to have been some sort of shading error.

I'm still investigating, but this is proving a time sink.

@mythrocks (Collaborator) commented Jul 22, 2024

Yep, this is still baffling. Here is the exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o206.endsWith.
: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.Column.expr()" because "x$1" is null
      at org.apache.spark.sql.Column$.$anonfun$fn$2(Column.scala:77)
      at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75)
      at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35)
      at org.apache.spark.sql.Column$.$anonfun$fn$1(Column.scala:77)
      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:84)
      at org.apache.spark.sql.package$.withOrigin(package.scala:111)
      at org.apache.spark.sql.Column$.fn(Column.scala:76)
      at org.apache.spark.sql.Column$.fn(Column.scala:64)
      at org.apache.spark.sql.Column.fn(Column.scala:169)
      at org.apache.spark.sql.Column.endsWith(Column.scala:1078)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)

This points into new code in Spark 4.0, in Column.fn:

Column {
  UnresolvedFunction(Seq(name), inputs.map(_.expr), isDistinct, ignoreNulls = ignoreNulls)
}

The complaint seems to be that .expr can't be called on the null passed into .endswith(). (Note that the code sees this as a null Column, and not a literal.)

I'm unable to repro this from the command line. Attaching a debugger allows this code to run through as well.

This is occasionally reproducible from the pyspark shell. The exception is thrown from Spark's own CPU code path, and should not need the plugin to reproduce.

I'm fairly confident that this is a bug in Spark 4.0 that routes None as a (null) Column instead of as a literal.
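A minimal sketch of the kind of pyspark call that appears to trigger this; the DataFrame and the column name "s" are illustrative, not taken from the original test, and (as noted above) the failure is only occasionally reproducible:

# Repro sketch against a Spark 4.0 pyspark shell. Passing None reaches
# Column.endsWith on the JVM side with a null argument; Spark 4.0's
# Column.fn then calls .expr on that null Column, raising the
# NullPointerException shown above. This happens on the CPU, without
# the plugin.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("abc",)], ["s"])
df.select(df.s.endswith(None)).show()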

@mythrocks (Collaborator) commented Jul 22, 2024

As for the problem highlighted in test_unsupported_fallback_substring_index, I'm fairly certain this is a bug in code-gen in Spark 4.0. Here's the stack trace:

scala> sql("select SUBSTRING_INDEX('a', '_', num) from mytable ").show(false)
java.lang.NumberFormatException: For input string: "columnartorow_value_0"
  at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
  at java.base/java.lang.Integer.parseInt(Integer.java:668)
  at org.apache.spark.sql.catalyst.expressions.SubstringIndex.$anonfun$doGenCode$29(stringExpressions.scala:1449)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.$anonfun$defineCodeGen$3(Expression.scala:869)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.nullSafeCodeGen(Expression.scala:888)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.defineCodeGen(Expression.scala:868)
  at org.apache.spark.sql.catalyst.expressions.SubstringIndex.doGenCode(stringExpressions.scala:1448)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:207)

Edit: I have filed https://issues.apache.org/jira/browse/SPARK-48989 against Spark 4.x, to track this WholeStageCodegen NumberFormatException problem. It is happening on the CPU, without the plugin's involvement.
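A rough PySpark equivalent of the spark-shell repro above, for completeness; "mytable" and the integer column "num" are the same assumed names. The parseInt frame in the stack trace suggests the generated code-gen variable name for the count is being parsed as an integer, so the count argument must be a column reference, not a literal, to reach the failing path:

# Rough PySpark equivalent of the spark-shell repro above; "mytable" and
# "num" are assumed names. A non-literal count trips Spark 4.0's codegen.
spark.createDataFrame([("a_b_c", 2)], ["s", "num"]) \
     .createOrReplaceTempView("mytable")
spark.sql("select SUBSTRING_INDEX('a', '_', num) from mytable") \
     .show(truncate=False)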

mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Jul 24, 2024
Fixes NVIDIA#11030.

This commit skips the tests pertaining to the following operations
on Apache Spark 4.0:

1. Column.endswith()
2. substring_index()

In both cases, the CPU version of Apache Spark 4.0 seems to cause
exceptions, unrelated to the Spark RAPIDS plugin. See:

1. https://issues.apache.org/jira/browse/SPARK-48989
2. https://issues.apache.org/jira/browse/SPARK-48995

Signed-off-by: MithunR <mithunr@nvidia.com>
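The skip itself is plain pytest; a minimal sketch of the pattern follows. The helper name is_before_spark_400 is an assumption about the integration tests' spark_session utilities, not confirmed from the diff:

# Sketch of the skip pattern; is_before_spark_400() is an assumed helper
# from the integration tests' spark_session module.
import pytest
from spark_session import is_before_spark_400

# Skip on Apache Spark 4.0+, where the CPU-side bugs above make these fail.
@pytest.mark.skipif(not is_before_spark_400(),
                    reason="https://github.com/NVIDIA/spark-rapids/issues/11030")
def test_endswith():
    ...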
razajafri pushed a commit that referenced this issue Jul 25, 2024