[BUG] parquet_test.py pytests FAILED on Databricks-9.1-ML-spark-3.0.2 #224

Closed
NvTimLiu opened this issue Nov 10, 2021 · 0 comments
Labels
bug Something isn't working
Describe the bug

[2021-11-10T06:53:08.173Z] 
[2021-11-10T06:53:08.173Z] =================================== FAILURES ===================================
[2021-11-10T06:53:08.173Z] _ test_nested_pruning_and_case_insensitive[true--reader_confs0-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['STRUCT', Struct(['case_INSENsitive', Long])]]] _
[2021-11-10T06:53:08.173Z] [gw0] linux -- Python 3.8.12 /databricks/conda/envs/cudf-udf/bin/python
[2021-11-10T06:53:08.173Z] 
[2021-11-10T06:53:08.173Z] spark_tmp_path = '/tmp/pyspark_tests//754491/'
[2021-11-10T06:53:08.173Z] data_gen = [['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]
[2021-11-10T06:53:08.173Z] read_schema = [['STRUCT', Struct(['case_INSENsitive', Long])]]
[2021-11-10T06:53:08.173Z] reader_confs = {'spark.rapids.sql.format.parquet.reader.type': 'PERFILE'}
[2021-11-10T06:53:08.173Z] v1_enabled_list = '', nested_enabled = 'true'
[2021-11-10T06:53:08.173Z] 
[2021-11-10T06:53:08.173Z]     @pytest.mark.parametrize('data_gen,read_schema', _nested_pruning_schemas, ids=idfn)
[2021-11-10T06:53:08.173Z]     @pytest.mark.parametrize('reader_confs', reader_opt_confs)
[2021-11-10T06:53:08.173Z]     @pytest.mark.parametrize('v1_enabled_list', ["", "parquet"])
[2021-11-10T06:53:08.173Z]     @pytest.mark.parametrize('nested_enabled', ["true", "false"])
[2021-11-10T06:53:08.173Z]     def test_nested_pruning_and_case_insensitive(spark_tmp_path, data_gen, read_schema, reader_confs, v1_enabled_list, nested_enabled):
[2021-11-10T06:53:08.173Z]         data_path = spark_tmp_path + '/PARQUET_DATA'
[2021-11-10T06:53:08.173Z]         with_cpu_session(
[2021-11-10T06:53:08.173Z]                 lambda spark : gen_df(spark, data_gen).write.parquet(data_path),
[2021-11-10T06:53:08.173Z]                 conf=rebase_write_corrected_conf)
[2021-11-10T06:53:08.173Z]         all_confs = copy_and_update(reader_confs, {
[2021-11-10T06:53:08.173Z]             'spark.sql.sources.useV1SourceList': v1_enabled_list,
[2021-11-10T06:53:08.173Z]             'spark.sql.optimizer.nestedSchemaPruning.enabled': nested_enabled,
[2021-11-10T06:53:08.173Z]             'spark.sql.legacy.parquet.datetimeRebaseModeInRead': 'CORRECTED'})
[2021-11-10T06:53:08.173Z]         # This is a hack to get the type in a slightly less verbose way
[2021-11-10T06:53:08.173Z]         rs = StructGen(read_schema, nullable=False).data_type
[2021-11-10T06:53:08.173Z] >       assert_gpu_and_cpu_are_equal_collect(lambda spark : spark.read.schema(rs).parquet(data_path),
[2021-11-10T06:53:08.173Z]                 conf=all_confs)
[2021-11-10T06:53:08.174Z] 
[2021-11-10T06:53:08.174Z] ../../src/main/python/parquet_test.py:504: 
[2021-11-10T06:53:08.174Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-11-10T06:53:08.174Z] ../../src/main/python/asserts.py:505: in assert_gpu_and_cpu_are_equal_collect
[2021-11-10T06:53:08.174Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2021-11-10T06:53:08.174Z] ../../src/main/python/asserts.py:425: in _assert_gpu_and_cpu_are_equal
[2021-11-10T06:53:08.174Z]     run_on_gpu()
[2021-11-10T06:53:08.174Z] ../../src/main/python/asserts.py:419: in run_on_gpu
[2021-11-10T06:53:08.174Z]     from_gpu = with_gpu_session(bring_back, conf=conf)
[2021-11-10T06:53:08.174Z] ../../src/main/python/spark_session.py:105: in with_gpu_session
[2021-11-10T06:53:08.174Z]     return with_spark_session(func, conf=copy)
[2021-11-10T06:53:08.174Z] ../../src/main/python/spark_session.py:70: in with_spark_session
[2021-11-10T06:53:08.174Z]     ret = func(_spark)
[2021-11-10T06:53:08.174Z] ../../src/main/python/asserts.py:198: in <lambda>
[2021-11-10T06:53:08.174Z]     bring_back = lambda spark: limit_func(spark).collect()
[2021-11-10T06:53:08.174Z] /databricks/spark/python/pyspark/sql/dataframe.py:697: in collect
[2021-11-10T06:53:08.174Z]     sock_info = self._jdf.collectToPython()
[2021-11-10T06:53:08.174Z] /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2021-11-10T06:53:08.174Z]     return_value = get_return_value(
[2021-11-10T06:53:08.174Z] /databricks/spark/python/pyspark/sql/utils.py:117: in deco
[2021-11-10T06:53:08.174Z]     return f(*a, **kw)
[2021-11-10T06:53:08.174Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-11-10T06:53:08.174Z] 
[2021-11-10T06:53:08.174Z] answer = 'xro499862'
[2021-11-10T06:53:08.174Z] gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f7ea9f70a00>
[2021-11-10T06:53:08.174Z] target_id = 'o499859', name = 'collectToPython'
[2021-11-10T06:53:08.174Z] 
[2021-11-10T06:53:08.174Z]     def get_return_value(answer, gateway_client, target_id=None, name=None):
[2021-11-10T06:53:08.174Z]         """Converts an answer received from the Java gateway into a Python object.
[2021-11-10T06:53:08.174Z]     
[2021-11-10T06:53:08.174Z]         For example, string representation of integers are converted to Python
[2021-11-10T06:53:08.174Z]         integer, string representation of objects are converted to JavaObject
[2021-11-10T06:53:08.174Z]         instances, etc.
[2021-11-10T06:53:08.174Z]     
[2021-11-10T06:53:08.174Z]         :param answer: the string returned by the Java gateway
[2021-11-10T06:53:08.174Z]         :param gateway_client: the gateway client used to communicate with the Java
[2021-11-10T06:53:08.174Z]             Gateway. Only necessary if the answer is a reference (e.g., object,
[2021-11-10T06:53:08.174Z]             list, map)
[2021-11-10T06:53:08.174Z]         :param target_id: the name of the object from which the answer comes from
[2021-11-10T06:53:08.174Z]             (e.g., *object1* in `object1.hello()`). Optional.
[2021-11-10T06:53:08.174Z]         :param name: the name of the member from which the answer comes from
[2021-11-10T06:53:08.174Z]             (e.g., *hello* in `object1.hello()`). Optional.
[2021-11-10T06:53:08.174Z]         """
[2021-11-10T06:53:08.174Z]         if is_error(answer)[0]:
[2021-11-10T06:53:08.174Z]             if len(answer) > 1:
[2021-11-10T06:53:08.174Z]                 type = answer[1]
[2021-11-10T06:53:08.174Z]                 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2021-11-10T06:53:08.174Z]                 if answer[1] == REFERENCE_TYPE:
[2021-11-10T06:53:08.174Z] >                   raise Py4JJavaError(
[2021-11-10T06:53:08.174Z]                         "An error occurred while calling {0}{1}{2}.\n".
[2021-11-10T06:53:08.174Z]                         format(target_id, ".", name), value)
[2021-11-10T06:53:08.174Z] E                   py4j.protocol.Py4JJavaError: An error occurred while calling o499859.collectToPython.
[2021-11-10T06:53:08.174Z] E                   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 14363.0 failed 1 times, most recent failure: Lost task 1.0 in stage 14363.0 (TID 56503) (ip-10-59-180-78.us-west-2.compute.internal executor driver): ai.rapids.cudf.CudfException: cuDF failure at: /home/jenkins/agent/workspace/jenkins-cudf_nightly-dev-github-518-cuda11/cpp/src/io/parquet/reader_impl.cu:386: Found no metadata for schema index
[2021-11-10T06:53:08.174Z] E                   	at ai.rapids.cudf.Table.readParquet(Native Method)
[2021-11-10T06:53:08.174Z] E                   	at ai.rapids.cudf.Table.readParquet(Table.java:862)
[2021-11-10T06:53:08.174Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readToTable$1(GpuParquetScanBase.scala:1491)
[2021-11-10T06:53:08.174Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2021-11-10T06:53:08.174Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2021-11-10T06:53:08.174Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:236)
[2021-11-10T06:53:08.174Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readToTable(GpuParquetScanBase.scala:1490)
[2021-11-10T06:53:08.174Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readBatch$1(GpuParquetScanBase.scala:1451)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:236)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readBatch(GpuParquetScanBase.scala:1439)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.next(GpuParquetScanBase.scala:1424)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.PartitionReaderWithBytesRead.next(GpuDataSourceRDD.scala:94)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues.next(ColumnarPartitionReaderWithPartitionValues.scala:36)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader.next(FilePartitionReaderFactory.scala:54)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:67)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.PartitionIterator.hasNext(GpuDataSourceRDD.scala:61)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(GpuDataSourceRDD.scala:78)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
[2021-11-10T06:53:08.175Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2021-11-10T06:53:08.175Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:223)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:178)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:222)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:199)
[2021-11-10T06:53:08.175Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:239)
[2021-11-10T06:53:08.175Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:178)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
[2021-11-10T06:53:08.175Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
[2021-11-10T06:53:08.175Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:150)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:119)
[2021-11-10T06:53:08.175Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:91)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:813)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1605)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:816)
[2021-11-10T06:53:08.175Z] E                   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[2021-11-10T06:53:08.175Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.175Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:672)
[2021-11-10T06:53:08.175Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-11-10T06:53:08.175Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-11-10T06:53:08.176Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2021-11-10T06:53:08.176Z] E                   
[2021-11-10T06:53:08.176Z] E                   Driver stacktrace:
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2828)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2775)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2769)
[2021-11-10T06:53:08.176Z] E                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2021-11-10T06:53:08.176Z] E                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2021-11-10T06:53:08.176Z] E                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2769)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1305)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1305)
[2021-11-10T06:53:08.176Z] E                   	at scala.Option.foreach(Option.scala:407)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1305)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3036)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2977)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2965)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1067)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2476)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:264)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:299)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:82)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:88)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:75)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:62)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:512)
[2021-11-10T06:53:08.176Z] E                   	at scala.Option.getOrElse(Option.scala:189)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:511)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:399)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:374)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:406)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3613)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3825)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:130)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:273)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:104)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
[2021-11-10T06:53:08.176Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
[2021-11-10T06:53:08.177Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:223)
[2021-11-10T06:53:08.177Z] E                   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3823)
[2021-11-10T06:53:08.177Z] E                   	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3611)
[2021-11-10T06:53:08.177Z] E                   	at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
[2021-11-10T06:53:08.177Z] E                   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2021-11-10T06:53:08.177Z] E                   	at java.lang.reflect.Method.invoke(Method.java:498)
[2021-11-10T06:53:08.177Z] E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2021-11-10T06:53:08.177Z] E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
[2021-11-10T06:53:08.177Z] E                   	at py4j.Gateway.invoke(Gateway.java:295)
[2021-11-10T06:53:08.177Z] E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2021-11-10T06:53:08.177Z] E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2021-11-10T06:53:08.177Z] E                   	at py4j.GatewayConnection.run(GatewayConnection.java:251)
[2021-11-10T06:53:08.177Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2021-11-10T06:53:08.177Z] E                   Caused by: ai.rapids.cudf.CudfException: cuDF failure at: /home/jenkins/agent/workspace/jenkins-cudf_nightly-dev-github-518-cuda11/cpp/src/io/parquet/reader_impl.cu:386: Found no metadata for schema index
[2021-11-10T06:53:08.177Z] E                   	at ai.rapids.cudf.Table.readParquet(Native Method)
[2021-11-10T06:53:08.177Z] E                   	at ai.rapids.cudf.Table.readParquet(Table.java:862)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readToTable$1(GpuParquetScanBase.scala:1491)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:236)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readToTable(GpuParquetScanBase.scala:1490)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readBatch$1(GpuParquetScanBase.scala:1451)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:236)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readBatch(GpuParquetScanBase.scala:1439)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReader.next(GpuParquetScanBase.scala:1424)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.PartitionReaderWithBytesRead.next(GpuDataSourceRDD.scala:94)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues.next(ColumnarPartitionReaderWithPartitionValues.scala:36)
[2021-11-10T06:53:08.177Z] E                   	at org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader.next(FilePartitionReaderFactory.scala:54)
[2021-11-10T06:53:08.177Z] E                   	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:67)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.PartitionIterator.hasNext(GpuDataSourceRDD.scala:61)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(GpuDataSourceRDD.scala:78)
[2021-11-10T06:53:08.177Z] E                   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
[2021-11-10T06:53:08.177Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2021-11-10T06:53:08.177Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:223)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2021-11-10T06:53:08.177Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:178)
[2021-11-10T06:53:08.178Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:222)
[2021-11-10T06:53:08.178Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:199)
[2021-11-10T06:53:08.178Z] E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:239)
[2021-11-10T06:53:08.178Z] E                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:178)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
[2021-11-10T06:53:08.178Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
[2021-11-10T06:53:08.178Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:150)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:119)
[2021-11-10T06:53:08.178Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:91)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:813)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1605)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:816)
[2021-11-10T06:53:08.178Z] E                   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[2021-11-10T06:53:08.178Z] E                   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
[2021-11-10T06:53:08.178Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:672)
[2021-11-10T06:53:08.178Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-11-10T06:53:08.178Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-11-10T06:53:08.178Z] E                   	... 1 more
[2021-11-10T06:53:08.178Z] 
[2021-11-10T06:53:08.178Z] /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py:326: Py4JJavaError
[2021-11-10T06:53:08.178Z] ----------------------------- Captured stdout call -----------------------------
..... 
[2021-11-10T06:53:08.875Z] =========================== short test summary info ============================
[2021-11-10T06:53:08.875Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs0-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['STRUCT', Struct(['case_INSENsitive', Long])]]]
[2021-11-10T06:53:08.875Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs0-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['struct', Struct(['CASE_INSENSITIVE', Long])]]]
[2021-11-10T06:53:08.875Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs0-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['stRUct', Struct(['CASE_INSENSITIVE', Long])]]]
[2021-11-10T06:53:08.875Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs1-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['STRUCT', Struct(['case_INSENsitive', Long])]]]
[2021-11-10T06:53:08.875Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs1-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['struct', Struct(['CASE_INSENSITIVE', Long])]]]
[2021-11-10T06:53:08.875Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs1-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['stRUct', Struct(['CASE_INSENSITIVE', Long])]]]
[2021-11-10T06:53:08.876Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs2-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', Short])]]-[['STRUCT', Struct(['case_INSENsitive', Long])]]]
[2021-11-10T06:53:08.876Z] FAILED ../../src/main/python/parquet_test.py::test_nested_pruning_and_case_insensitive[true--reader_confs2-[['struct', Struct(['c_1', String],['case_insensitive', Long],['c_3', 
14:53:09  = 36 failed, 10781 passed, 136 skipped, 404 xfailed, 156 xpassed, 76 warnings in 6261.85s (1:44:21) =

Steps/Code to reproduce bug
Build the rapids-4-spark plugin and run the integration tests (parquet_test.py) on Databricks 9.1 ML (Spark 3.1.2).
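
For reference, the failing case boils down to: write a Parquet file with a single nested struct column, then read it back through a pruned schema whose struct and field names differ only in case, with nested schema pruning enabled. The sketch below reproduces that shape with plain PySpark; the path, session setup, and sample rows are illustrative stand-ins for the test's gen_df/StructGen helpers, and the failure itself only shows up when the read runs under the RAPIDS plugin (the integration test drives it via with_gpu_session / assert_gpu_and_cpu_are_equal_collect).

```python
# Minimal sketch of the failing read pattern (hypothetical path and data).
from pyspark.sql import SparkSession
from pyspark.sql.types import (LongType, ShortType, StringType, StructField,
                               StructType)

spark = SparkSession.builder.appName("nested-pruning-repro").getOrCreate()
data_path = "/tmp/PARQUET_DATA"  # placeholder path

# Write schema: struct<c_1: string, case_insensitive: bigint, c_3: smallint>
write_schema = StructType([
    StructField("struct", StructType([
        StructField("c_1", StringType()),
        StructField("case_insensitive", LongType()),
        StructField("c_3", ShortType()),
    ]))
])
rows = [(("a", 1, 2),), (("b", 3, 4),)]
spark.createDataFrame(rows, write_schema).write.mode("overwrite").parquet(data_path)

# Read schema prunes the struct and changes the case of every name:
# STRUCT<case_INSENsitive: bigint>
read_schema = StructType([
    StructField("STRUCT", StructType([
        StructField("case_INSENsitive", LongType()),
    ]))
])
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
spark.read.schema(read_schema).parquet(data_path).collect()
```

On CPU Spark this case-insensitive, pruned read succeeds; in the failing CI run the GPU path (for example with spark.rapids.sql.format.parquet.reader.type=PERFILE) ends up in ai.rapids.cudf.Table.readParquet and aborts with "Found no metadata for schema index", as shown in the stack trace above.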

Environment details (please complete the following information)

  • Environment location: Local Spark, Databricks 9.1 ML (Spark 3.1.2)
NvTimLiu added the bug (Something isn't working) label on Nov 10, 2021