
[BUG] test_parquet_read_merge_schema failed w/ TITAN V #5493

Closed
pxLi opened this issue May 16, 2022 · 2 comments · Fixed by NVIDIA/spark-rapids-jni#365
Labels
bug (Something isn't working) · P1 (Nice to have for release)

Comments

pxLi (Collaborator) commented May 16, 2022

Describe the bug
Blossom rapids_it-ubuntu16-dev-github build ID 14. This pipeline uses a TITAN V, which has relatively less GPU memory than a T4 or V100.

The failures are mostly caused by: java.lang.AssertionError: End address is too high for setBytes 0x7fcfe5a57628 < 0x7fcfe5a57624
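For context, the assertion comes from cudf's host-buffer bounds check (visible at the top of the stack trace below): the reader allocates a HostMemoryBuffer from a size estimate, and a copy past the buffer's end trips the check. A minimal sketch of that failure mode, assuming the cudf Java API named in the stack trace and JVM assertions enabled (-ea):

```scala
import ai.rapids.cudf.HostMemoryBuffer

// Allocate a host buffer from an (undersized) estimate, then try to copy in
// more bytes than were allocated. With -ea, setBytes fails its
// addressOutOfBoundsCheck with "End address is too high for setBytes".
val estimatedSize = 4L
val buffer = HostMemoryBuffer.allocate(estimatedSize)
try {
  val chunk = new Array[Byte](8) // actual data is larger than the estimate
  buffer.setBytes(0, chunk, 0, chunk.length) // write end exceeds the buffer end
} finally {
  buffer.close()
}
```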

[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[-reader_confs4]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[parquet-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[parquet-reader_confs4]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[-reader_confs4]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[parquet-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[parquet-reader_confs4]
[2022-05-15T15:44:49.702Z] =================================== FAILURES ===================================
[2022-05-15T15:44:49.702Z] ________________ test_parquet_read_merge_schema[-reader_confs3] ________________
[2022-05-15T15:44:49.702Z] 
[2022-05-15T15:44:49.702Z] spark_tmp_path = '/tmp/pyspark_tests//it-ub16-302-4-zgm59-x6qfb-main-1147-1482896633/'
[2022-05-15T15:44:49.702Z] v1_enabled_list = ''
[2022-05-15T15:44:49.702Z] reader_confs = {'spark.rapids.sql.format.parquet.reader.footer.type': 'NATIVE', 'spark.rapids.sql.format.parquet.reader.type': 'PERFILE'}
[2022-05-15T15:44:49.702Z] 
[2022-05-15T15:44:49.702Z]     @pytest.mark.parametrize('reader_confs', reader_opt_confs)
[2022-05-15T15:44:49.702Z]     @pytest.mark.parametrize('v1_enabled_list', ["", "parquet"])
[2022-05-15T15:44:49.702Z]     def test_parquet_read_merge_schema(spark_tmp_path, v1_enabled_list, reader_confs):
[2022-05-15T15:44:49.702Z]         # Once https://github.com/NVIDIA/spark-rapids/issues/133 and https://github.com/NVIDIA/spark-rapids/issues/132 are fixed
[2022-05-15T15:44:49.702Z]         # we should go with a more standard set of generators
[2022-05-15T15:44:49.702Z]         parquet_gens = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
[2022-05-15T15:44:49.702Z]         string_gen, boolean_gen, DateGen(start=date(1590, 1, 1)),
[2022-05-15T15:44:49.702Z]         TimestampGen(start=datetime(1900, 1, 1, tzinfo=timezone.utc))] + decimal_gens
[2022-05-15T15:44:49.702Z]         first_gen_list = [('_c' + str(i), gen) for i, gen in enumerate(parquet_gens)]
[2022-05-15T15:44:49.702Z]         first_data_path = spark_tmp_path + '/PARQUET_DATA/key=0'
[2022-05-15T15:44:49.702Z]         with_cpu_session(
[2022-05-15T15:44:49.702Z]                 lambda spark : gen_df(spark, first_gen_list).write.parquet(first_data_path),
[2022-05-15T15:44:49.702Z]                 conf=rebase_write_legacy_conf)
[2022-05-15T15:44:49.702Z]         second_gen_list = [(('_c' if i % 2 == 0 else '_b') + str(i), gen) for i, gen in enumerate(parquet_gens)]
[2022-05-15T15:44:49.702Z]         second_data_path = spark_tmp_path + '/PARQUET_DATA/key=1'
[2022-05-15T15:44:49.702Z]         with_cpu_session(
[2022-05-15T15:44:49.702Z]                 lambda spark : gen_df(spark, second_gen_list).write.parquet(second_data_path),
[2022-05-15T15:44:49.702Z]                 conf=rebase_write_corrected_conf)
[2022-05-15T15:44:49.703Z]         data_path = spark_tmp_path + '/PARQUET_DATA'
[2022-05-15T15:44:49.703Z]         all_confs = copy_and_update(reader_confs, {'spark.sql.sources.useV1SourceList': v1_enabled_list})
[2022-05-15T15:44:49.703Z] >       assert_gpu_and_cpu_are_equal_collect(
[2022-05-15T15:44:49.703Z]                 lambda spark : spark.read.option('mergeSchema', 'true').parquet(data_path),
[2022-05-15T15:44:49.703Z]                 conf=all_confs)
[2022-05-15T15:44:49.703Z] 
[2022-05-15T15:44:49.703Z] ../../src/main/python/parquet_test.py:364: 
[2022-05-15T15:44:49.703Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
[2022-05-15T15:44:49.703Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:428: in _assert_gpu_and_cpu_are_equal
[2022-05-15T15:44:49.703Z]     run_on_gpu()
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:422: in run_on_gpu
[2022-05-15T15:44:49.703Z]     from_gpu = with_gpu_session(bring_back, conf=conf)
[2022-05-15T15:44:49.703Z] ../../src/main/python/spark_session.py:131: in with_gpu_session
[2022-05-15T15:44:49.703Z]     return with_spark_session(func, conf=copy)
[2022-05-15T15:44:49.703Z] ../../src/main/python/spark_session.py:98: in with_spark_session
[2022-05-15T15:44:49.703Z]     ret = func(_spark)
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:201: in <lambda>
[2022-05-15T15:44:49.703Z]     bring_back = lambda spark: limit_func(spark).collect()
[2022-05-15T15:44:49.703Z] /home/jenkins/agent/workspace/jenkins-rapids_it-ubuntu16-dev-github-4/jars/spark-3.1.2-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
[2022-05-15T15:44:49.703Z]     sock_info = self._jdf.collectToPython()
[2022-05-15T15:44:49.703Z] /home/jenkins/agent/workspace/jenkins-rapids_it-ubuntu16-dev-github-4/jars/spark-3.1.2-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-05-15T15:44:49.703Z]     return_value = get_return_value(
[2022-05-15T15:44:49.703Z] /home/jenkins/agent/workspace/jenkins-rapids_it-ubuntu16-dev-github-4/jars/spark-3.1.2-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:111: in deco
[2022-05-15T15:44:49.703Z]     return f(*a, **kw)
[2022-05-15T15:44:49.703Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-05-15T15:44:49.703Z] 
[2022-05-15T15:44:49.703Z] answer = 'xro2558741'
[2022-05-15T15:44:49.703Z] gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f7b3ecf1c10>
[2022-05-15T15:44:49.703Z] target_id = 'o2558740', name = 'collectToPython'
[2022-05-15T15:44:49.703Z] 
[2022-05-15T15:44:49.703Z]     def get_return_value(answer, gateway_client, target_id=None, name=None):
[2022-05-15T15:44:49.703Z]         """Converts an answer received from the Java gateway into a Python object.
[2022-05-15T15:44:49.703Z]     
[2022-05-15T15:44:49.703Z]         For example, string representation of integers are converted to Python
[2022-05-15T15:44:49.703Z]         integer, string representation of objects are converted to JavaObject
[2022-05-15T15:44:49.703Z]         instances, etc.
[2022-05-15T15:44:49.703Z]     
[2022-05-15T15:44:49.703Z]         :param answer: the string returned by the Java gateway
[2022-05-15T15:44:49.703Z]         :param gateway_client: the gateway client used to communicate with the Java
[2022-05-15T15:44:49.703Z]             Gateway. Only necessary if the answer is a reference (e.g., object,
[2022-05-15T15:44:49.703Z]             list, map)
[2022-05-15T15:44:49.703Z]         :param target_id: the name of the object from which the answer comes from
[2022-05-15T15:44:49.703Z]             (e.g., *object1* in `object1.hello()`). Optional.
[2022-05-15T15:44:49.703Z]         :param name: the name of the member from which the answer comes from
[2022-05-15T15:44:49.703Z]             (e.g., *hello* in `object1.hello()`). Optional.
[2022-05-15T15:44:49.703Z]         """
[2022-05-15T15:44:49.703Z]         if is_error(answer)[0]:
[2022-05-15T15:44:49.703Z]             if len(answer) > 1:
[2022-05-15T15:44:49.703Z]                 type = answer[1]
[2022-05-15T15:44:49.703Z]                 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2022-05-15T15:44:49.703Z]                 if answer[1] == REFERENCE_TYPE:
[2022-05-15T15:44:49.703Z] >                   raise Py4JJavaError(
[2022-05-15T15:44:49.703Z]                         "An error occurred while calling {0}{1}{2}.\n".
[2022-05-15T15:44:49.703Z]                         format(target_id, ".", name), value)
[2022-05-15T15:44:49.703Z] E                   py4j.protocol.Py4JJavaError: An error occurred while calling o2558740.collectToPython.
[2022-05-15T15:44:49.703Z] E                   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 45957.0 failed 1 times, most recent failure: Lost task 5.0 in stage 45957.0 (TID 198266) (10.233.113.151 executor 0): java.lang.AssertionError: End address is too high for setBytes 0x7fcfe5a57628 < 0x7fcfe5a57624
[2022-05-15T15:44:49.703Z] E                   	at ai.rapids.cudf.MemoryBuffer.addressOutOfBoundsCheck(MemoryBuffer.java:138)
[2022-05-15T15:44:49.703Z] E                   	at ai.rapids.cudf.HostMemoryBuffer.setBytes(HostMemoryBuffer.java:313)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.HostMemoryOutputStream.write(HostMemoryStreams.scala:39)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$3(GpuParquetScan.scala:1404)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.closeOnExcept(Arm.scala:87)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.closeOnExcept$(Arm.scala:85)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.closeOnExcept(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$2(GpuParquetScan.scala:1396)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$1(GpuParquetScan.scala:1394)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile(GpuParquetScan.scala:1393)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile$(GpuParquetScan.scala:1388)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readPartFile(GpuParquetScan.scala:2008)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readToTable(GpuParquetScan.scala:2080)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readBatch$1(GpuParquetScan.scala:2061)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readBatch(GpuParquetScan.scala:2049)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.next(GpuParquetScan.scala:2035)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.PartitionReaderWithBytesRead.next(dataSourceUtil.scala:62)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues.next(ColumnarPartitionReaderWithPartitionValues.scala:36)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader.next(FilePartitionReaderFactory.scala:54)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:67)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
[2022-05-15T15:44:49.704Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:239)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:187)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:238)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:215)
[2022-05-15T15:44:49.704Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:255)
[2022-05-15T15:44:49.704Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
[2022-05-15T15:44:49.704Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2022-05-15T15:44:49.704Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2022-05-15T15:44:49.704Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2022-05-15T15:44:49.704Z] E                   
[2022-05-15T15:44:49.704Z] E                   Driver stacktrace:
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
[2022-05-15T15:44:49.704Z] E                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2022-05-15T15:44:49.704Z] E                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2022-05-15T15:44:49.704Z] E                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
[2022-05-15T15:44:49.704Z] E                   	at scala.Option.foreach(Option.scala:407)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2261)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:390)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3519)
[2022-05-15T15:44:49.704Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3516)
[2022-05-15T15:44:49.705Z] E                   	at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)
[2022-05-15T15:44:49.705Z] E                   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2022-05-15T15:44:49.705Z] E                   	at java.lang.reflect.Method.invoke(Method.java:498)
[2022-05-15T15:44:49.705Z] E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2022-05-15T15:44:49.705Z] E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
[2022-05-15T15:44:49.705Z] E                   	at py4j.Gateway.invoke(Gateway.java:282)
[2022-05-15T15:44:49.705Z] E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2022-05-15T15:44:49.705Z] E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2022-05-15T15:44:49.705Z] E                   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
[2022-05-15T15:44:49.705Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2022-05-15T15:44:49.705Z] E                   Caused by: java.lang.AssertionError: End address is too high for setBytes 0x7fcfe5a57628 < 0x7fcfe5a57624
[2022-05-15T15:44:49.705Z] E                   	at ai.rapids.cudf.MemoryBuffer.addressOutOfBoundsCheck(MemoryBuffer.java:138)
[2022-05-15T15:44:49.705Z] E                   	at ai.rapids.cudf.HostMemoryBuffer.setBytes(HostMemoryBuffer.java:313)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.HostMemoryOutputStream.write(HostMemoryStreams.scala:39)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$3(GpuParquetScan.scala:1404)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.closeOnExcept(Arm.scala:87)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.closeOnExcept$(Arm.scala:85)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.closeOnExcept(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$2(GpuParquetScan.scala:1396)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$1(GpuParquetScan.scala:1394)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile(GpuParquetScan.scala:1393)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile$(GpuParquetScan.scala:1388)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readPartFile(GpuParquetScan.scala:2008)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readToTable(GpuParquetScan.scala:2080)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readBatch$1(GpuParquetScan.scala:2061)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readBatch(GpuParquetScan.scala:2049)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.next(GpuParquetScan.scala:2035)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.PartitionReaderWithBytesRead.next(dataSourceUtil.scala:62)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues.next(ColumnarPartitionReaderWithPartitionValues.scala:36)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader.next(FilePartitionReaderFactory.scala:54)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:67)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
[2022-05-15T15:44:49.705Z] E                   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
[2022-05-15T15:44:49.705Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:239)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:187)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:238)
[2022-05-15T15:44:49.706Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:215)
[2022-05-15T15:44:49.706Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:255)
[2022-05-15T15:44:49.706Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
[2022-05-15T15:44:49.706Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
[2022-05-15T15:44:49.706Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2022-05-15T15:44:49.706Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2022-05-15T15:44:49.706Z] E                   	... 1 more
[2022-05-15T15:44:49.706Z] 
pxLi added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on May 16, 2022
revans2 self-assigned this on May 16, 2022
revans2 (Collaborator) commented May 17, 2022

I am skeptical that this has anything to do with the TITAN V. The tests fail in a part of the code that has not even touched the GPU yet. Low memory on the TITAN V would reduce the parallelism of the tests, but the test settings are set up so that it should not matter. I tried to recreate the failures with Spark 3.1.1, which is the version precommit was using, and with Spark 3.1.2, which is the version in the failing CI. Neither reproduced the problem.

I was able to verify that the test failure is reproducible in CI, so now I am going to slowly work toward reproducing it myself. Perhaps it is Ubuntu 16 instead of 20? Or it could be running all of the tests in the same application? Not sure.

revans2 (Collaborator) commented May 17, 2022

The only other idea I have right now is that it might be the order in which files and directories are returned. They could be returned in different orders, causing schema discovery to come up with something different. I am not really sure, though, because the reader should be merging the schemas to produce the read schema regardless of order. A sketch of the read pattern in question follows.
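To make that concrete, here is a hedged spark-shell sketch of the read pattern under suspicion; the paths and column names mirror the failing test, not the actual CI layout. Each partition directory carries only a subset of the merged read schema:

```scala
// Two partition directories with overlapping but different schemas.
val base = "/tmp/PARQUET_DATA"
Seq((1, "a")).toDF("_c0", "_c1").write.parquet(s"$base/key=0")
Seq((2, true)).toDF("_c0", "_b1").write.parquet(s"$base/key=1")

// The read schema is the union of both file schemas; any individual file
// contains only part of it, whatever order the files are listed in.
val df = spark.read.option("mergeSchema", "true").parquet(base)
df.printSchema() // _c0, _c1, _b1, key
```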

sameerz added the P1 (Nice to have for release) label and removed ? - Needs Triage (Need team to review and classify) on May 17, 2022
sperlingxx added a commit that referenced this issue May 19, 2022
…5500)

The native footer reader for Parquet fetches data fields based entirely on the read schema, which can lead to a buffer overflow when schema merging is enabled: with merge schema enabled, the file schema of each file partition may not contain the complete (read) schema, so the native footer reader comes up with incorrect footers.

Fall back to CPU Parquet reading when merge schema and the native footer reader are both enabled, to avoid buffer overflows like #5493.
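A minimal sketch of the fallback rule the commit message describes; the function name and shape are illustrative, not the plugin's actual code:

```scala
// Use the native footer reader only when schema merging is off; otherwise
// take the CPU (Java) footer path so buffers are sized from per-file schemas.
def useNativeFooterReader(footerType: String, mergeSchema: Boolean): Boolean =
  footerType.equalsIgnoreCase("NATIVE") && !mergeSchema

assert(!useNativeFooterReader("NATIVE", mergeSchema = true)) // falls back to CPU
assert(useNativeFooterReader("NATIVE", mergeSchema = false)) // native path is safe
```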