Describe the bug
If we enable PCBS, the mortgage XGBoost training job fails in the "Transformation and Show Result Sample" step.
Executor error log:
```
Executor task launch worker for task 2.1 in stage 7.0 (TID 25) 22/04/04 11:52:29:485 ERROR PythonRunner: This may have been caused by a prior exception:
java.lang.AssertionError: assertion failed: User-defined types in Catalyst schema should have already been expanded:
{
    "name" : "_col30",
    "type" : "double",
    "nullable" : false,
    "metadata" : { }
  } ]
}
	at scala.Predef$.assert(Predef.scala:223)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.<init>(ParquetRowConverter.scala:158)
	at org.apache.spark.sql.execution.datasources.parquet.rapids.shims.v2.ShimParquetRowConverter.<init>(ShimVectorizedColumnReader.scala:45)
	at org.apache.spark.sql.execution.datasources.parquet.rapids.shims.v2.ParquetRecordMaterializer.<init>(ParquetMaterializer.scala:47)
	at com.nvidia.spark.rapids.shims.v2.ParquetCachedBatchSerializer$CachedBatchIteratorConsumer$$anon$3.$anonfun$convertCachedBatchToInternalRowIter$1(ParquetCachedBatchSerializer.scala:760)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.shims.v2.ParquetCachedBatchSerializer.withResource(ParquetCachedBatchSerializer.scala:262)
	at com.nvidia.spark.rapids.shims.v2.ParquetCachedBatchSerializer$CachedBatchIteratorConsumer$$anon$3.convertCachedBatchToInternalRowIter(ParquetCachedBatchSerializer.scala:744)
	at com.nvidia.spark.rapids.shims.v2.ParquetCachedBatchSerializer$CachedBatchIteratorConsumer$$anon$3.hasNext(ParquetCachedBatchSerializer.scala:724)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:86)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:80)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
```
Environment details (please complete the following information)
Spark standalone cluster with 8 A100 GPUs (spark2a)
If we remove

```
--conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \
```

the job runs successfully.
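For context, PCBS is toggled entirely through the `spark.sql.cache.serializer` config. A minimal sketch of a launch command that reproduces the two cases follows; only the serializer conf line comes from this report, while the script name, master URL, and plugin conf are illustrative assumptions:

```shell
# Hypothetical launch command -- only the spark.sql.cache.serializer line
# is taken from the report; the other flags and the script name are
# illustrative assumptions.
spark-submit \
  --master spark://spark2a:7077 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \
  mortgage_xgboost_train.py

# Removing the spark.sql.cache.serializer conf (falling back to Spark's
# default cache serializer) avoids the AssertionError above.
```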