
[BUG] loading SPARK-32639 example parquet file triggers a JVM crash #1576

Closed
gerashegalov opened this issue Jan 25, 2021 · 1 comment · Fixed by #1661
Labels
bug Something isn't working

Comments

@gerashegalov
Collaborator

Describe the bug
Replaying the scenario from #1463 (SPARK-32639) in a GPU-enabled Scala Spark or pyspark shell results in a SIGSEGV in the executor call path:

C  [cudf_io5390148209696334138.so+0x170830]  cudf::io::detail::parquet::reader::impl::decode_page_data(hostdevice_vector<cudf::io::parquet::gpu::ColumnChunkDesc>&, hostdevice_vector<cudf::io::parquet::gpu::PageInfo>&, hostdevice_vector<cudf::io::parquet::gpu::PageNestingInfo>&, unsigned long, unsigned long, rmm::cuda_stream_view)+0x4b0

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  ai.rapids.cudf.Table.readParquet([Ljava/lang/String;Ljava/lang/String;JJIZ)[J+0
j  ai.rapids.cudf.Table.readParquet(Lai/rapids/cudf/ParquetOptions;Lai/rapids/cudf/HostMemoryBuffer;JJ)Lai/rapids/cudf/Table;+122
j  com.nvidia.spark.rapids.MultiFileParquetPartitionReader.$anonfun$readToTable$1(Lai/rapids/cudf/ParquetOptions;Lai/rapids/cudf/HostMemoryBuffer;JLcom/nvidia/spark/rapids/NvtxWithMetrics;)Lai/rapids/cudf/Table;+4
j  com.nvidia.spark.rapids.MultiFileParquetPartitionReader$$Lambda$2380.apply(Ljava/lang/Object;)Ljava/lang/Object;+16
j  com.nvidia.spark.rapids.Arm.withResource(Ljava/lang/AutoCloseable;Lscala/Function1;)Ljava/lang/Object;+2

Steps/Code to reproduce bug

  1. Start a (py)Spark REPL with the RAPIDS plugin and a GPU enabled

  2. Load the file attached to SPARK-32639:

spark.read.schema('value MAP<STRUCT<first:STRING, middle:STRING, last:STRING>, STRING>').parquet("/home/gshegalov/gits/spark-rapids/integration_tests/src/test/resources/SPARK-32639/000.snappy.parquet").take(1)

Expected behavior
With Spark 3.1.1 RC, the data loads fine on the CPU and should load the same on the GPU:

>>> spark.read.schema('value MAP<STRUCT<first:STRING, middle:STRING, last:STRING>, STRING>').parquet("/home/gshegalov/gits/spark-rapids/integration_tests/src/test/resources/SPARK-32639/000.snappy.parquet").take(1)
[Row(value={Row(first='John', middle='Y.', last='Doe'): 'brother'})]

Environment details (please complete the following information)

  • Environment location: Local master/shell
  • Spark configuration settings related to the issue:
	--conf spark.plugins=com.nvidia.spark.SQLPlugin \
	--conf spark.rapids.sql.enabled=true \
	--jars  ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
@gerashegalov added the "bug (Something isn't working)" and "? - Needs Triage (Need team to review and classify)" labels on Jan 25, 2021
gerashegalov added a commit that referenced this issue Jan 26, 2021
Add a test documenting the scenario failing on Spark (CPU) prior to Spark 3.1.0; closes #1463.

On GPU, the executor JVM still crashes (#1576). Since xfail does not handle the process crash gracefully and fails pytest, skip is used instead.

Signed-off-by: Gera Shegalov <gera@apache.org>
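
The skip-instead-of-xfail note above reflects a general limitation: pytest's xfail only catches Python-level failures, while a SIGSEGV in the executor JVM kills the whole worker process. A generic way to keep a hard crash from taking down the test runner is to exercise the crashing path in a child process. This is a standalone sketch of that pattern (not code from the repo; the function name is illustrative):

```python
import signal
import subprocess
import sys

def run_isolated(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit code.

    A hard crash (e.g. SIGSEGV) in the child surfaces as a negative
    return code (-signal number on POSIX) instead of killing the caller.
    """
    return subprocess.run([sys.executable, "-c", snippet]).returncode

# A well-behaved snippet exits cleanly...
assert run_isolated("print('ok')") == 0
# ...while a segfault is reported to, not propagated into, this process.
assert run_isolated(
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
) == -signal.SIGSEGV
print("crash contained")
```

Running the crashing load in a subprocess like this would let a test fail (or xfail) normally instead of aborting the whole pytest session.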
@sameerz removed the "? - Needs Triage (Need team to review and classify)" label on Jan 26, 2021
@jlowe
Member

jlowe commented Feb 3, 2021

cudf fixed the crash in rapidsai/cudf#7229. We just need to re-enable the test when the cudf jar picks up that fix.
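
Re-enabling the test amounts to turning the unconditional skip into a condition on the cudf build in use. A hypothetical, pure-Python version gate is sketched below; the function name and the `fixed_in` default are illustrative assumptions, not taken from the repo:

```python
def crash_is_fixed(cudf_version: str, fixed_in: str = "0.18") -> bool:
    """Return True if the cudf jar version carries the fix for this crash.

    'fixed_in' is a placeholder: the real threshold is whichever cudf
    release first includes rapidsai/cudf#7229.
    """
    def parse(v: str) -> tuple:
        # Compare only the leading numeric components ("0.18.0" -> (0, 18, 0)),
        # dropping any pre-release suffix after a dash.
        return tuple(int(p) for p in v.split("-")[0].split("."))
    return parse(cudf_version) >= parse(fixed_in)

# The pytest marker could then become a conditional skip, e.g.
# @pytest.mark.skipif(not crash_is_fixed(current_cudf_version), reason="#1576")
print(crash_is_fixed("0.17.0"), crash_is_fixed("0.18"))  # False True
```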

nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue Jun 9, 2021
Add a test documenting the scenario failing on Spark (CPU) prior to Spark 3.1.0; closes NVIDIA#1463.

On GPU, the executor JVM still crashes (NVIDIA#1576). Since xfail does not handle the process crash gracefully and fails pytest, skip is used instead.

Signed-off-by: Gera Shegalov <gera@apache.org>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
… handling of multiple GPUs by Docker (NVIDIA#1576)

Signed-off-by: Navin Kumar <navink@nvidia.com>

3 participants