
[BUG] GpuArrayExists encounters a CudfException on an input partition consisting of just empty lists #5108

Closed
gerashegalov opened this issue Mar 31, 2022 · 2 comments
Labels
  • bug: Something isn't working
  • cudf_dependency: An issue or PR with this label depends on a new feature in cudf
  • P0: Must have for release
  • reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

@gerashegalov
Collaborator

Describe the bug

When a partition consists only of rows whose arrays are empty, the task fails with:

22/03/31 17:58:14 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) (127.0.0.1 executor 0): ai.rapids.cudf.CudfException: cuDF failure at: ../src/copying/copy.cu:368: Boolean mask column must be the same size as rhs column
	at ai.rapids.cudf.ColumnView.ifElseSV(Native Method)
	at ai.rapids.cudf.ColumnView.ifElse(ColumnView.java:590)
	at com.nvidia.spark.rapids.GpuArrayExists.$anonfun$imputeFalseForEmptyArrays$4(higherOrderFunctions.scala:338)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.GpuArrayExists.withResource(higherOrderFunctions.scala:312)
	at com.nvidia.spark.rapids.GpuArrayExists.$anonfun$imputeFalseForEmptyArrays$3(higherOrderFunctions.scala:337)

This is related to rapidsai/cudf#10556.
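The error message states the invariant that was violated: the Boolean mask passed to `ifElse` must have exactly as many rows as the column it selects from. The pure-Python sketch below models that check and one way to build a mask that always satisfies it, by deriving the mask from per-row list lengths. The names `if_else` and `impute_false_for_empty_arrays` are hypothetical stand-ins, not the cuDF or spark-rapids APIs, and the sketch makes no claim about the actual root cause tracked in rapidsai/cudf#10556.

```python
from typing import List, Optional, Sequence


def if_else(
    mask: Sequence[bool],
    lhs: Optional[bool],
    rhs: Sequence[Optional[bool]],
) -> List[Optional[bool]]:
    """Hypothetical model of a scalar-vs-column if/else: row i gets lhs
    where mask[i] is True, otherwise rhs[i]."""
    # The same precondition cuDF reports in the stack trace above.
    if len(mask) != len(rhs):
        raise ValueError("Boolean mask column must be the same size as rhs column")
    return [lhs if m else r for m, r in zip(mask, rhs)]


def impute_false_for_empty_arrays(
    per_row_result: Sequence[Optional[bool]],
    list_lengths: Sequence[int],
) -> List[Optional[bool]]:
    """Hypothetical helper: force False for every row whose array is empty."""
    # Build the mask from row-level list lengths so it has exactly one
    # entry per input row, even when every list in the batch is empty.
    is_empty = [n == 0 for n in list_lengths]
    return if_else(is_empty, False, per_row_result)


# Rows [], [10], [1, 2]; None is a placeholder for the empty row's value
# before imputation.
print(impute_false_for_empty_arrays([None, True, False], [0, 1, 2]))  # [False, True, False]

# A single row holding an empty array, as in the reproducer in this issue.
print(impute_false_for_empty_arrays([None], [0]))  # [False]

# A mask whose length does not match the row count trips the check.
try:
    if_else([], False, [None])
except ValueError as err:
    print(err)  # Boolean mask column must be the same size as rhs column
```

Deriving the mask from row-level metadata (list lengths) rather than from anything sized by the element count is what keeps the two inputs aligned in this model.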

Steps/Code to reproduce bug
To reproduce the bug deterministically, use a single row containing an empty array:

spark.createDataFrame(
    [
        [ [] ],
    ],
    'a array<int>'
).createOrReplaceTempView('df')

spark.sql("SELECT a, exists(a, x -> x = 10) AS exits10 FROM df").show()

Expected behavior
The query should not crash. It should produce one output row per input row, with the value false for each empty array:

+---+-------+
|  a|exits10|
+---+-------+
| []|  false|
+---+-------+
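As a CPU-side cross-check of the expected result, the sketch below models `exists` for non-null arrays of non-null integers in plain Python: the predicate is applied to each element, and an empty array yields false because no element satisfies it. The helper name `exists_reference` is hypothetical, and null arrays and null elements are deliberately out of scope.

```python
from typing import Callable, List, Sequence


def exists_reference(
    rows: Sequence[Sequence[int]],
    pred: Callable[[int], bool],
) -> List[bool]:
    """Hypothetical CPU model of exists() for non-null arrays of non-null ints."""
    # One output value per input row; any() over an empty array is False.
    return [any(pred(x) for x in arr) for arr in rows]


# Same input as the reproducer: a single row holding an empty array.
rows = [[]]
result = exists_reference(rows, lambda x: x == 10)
assert len(result) == len(rows)  # output row count matches input row count
print(result)  # [False]

print(exists_reference([[], [10], [1, 2]], lambda x: x == 10))  # [False, True, False]
```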

Environment details

  • Environment location: any
  • Spark configuration settings related to the issue: N/A

Additional context
rapidsai/cudf#10556

@gerashegalov added the bug, ? - Needs Triage, and cudf_dependency labels Mar 31, 2022
@gerashegalov changed the title [BUG] GpuArrayExists encounters a CudfError on an input partition consisting of just empty lists to [BUG] GpuArrayExists encounters a CudfException on an input partition consisting of just empty lists Mar 31, 2022
@gerashegalov self-assigned this Apr 4, 2022
@mattahrens removed the ? - Needs Triage label Apr 5, 2022
@revans2 added the P0 and reliability labels Apr 12, 2022
@sameerz added this to the May 2 - May 20 milestone Apr 29, 2022
@jlowe
Member

jlowe commented May 18, 2022

This should be fixed now that rapidsai/cudf#10876 has been merged.

@jlowe closed this as completed May 18, 2022
@gerashegalov
Collaborator Author

Verified as fixed.
