support testing parquet encryption #5997

Merged
2 commits merged into NVIDIA:branch-22.08 on Jul 15, 2022

Conversation

NvTimLiu
Collaborator

To fix #5771

Download parquet-hadoop tests jar to support testing parquet encryption

Signed-off-by: Tim Liu <timl@nvidia.com>

Download parquet-hadoop tests jar to support testing parquet encryption

Signed-off-by: Tim Liu <timl@nvidia.com>
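The commit fetches the parquet-hadoop jar with the `tests` classifier so the integration tests can exercise Parquet modular encryption. A minimal sketch of such a download using Maven's `dependency:get` goal (the version number here is an assumption; it should match the Parquet version bundled with the Spark under test):

```shell
# Hypothetical sketch: pull the test-classified parquet-hadoop jar from
# Maven Central so encryption test helpers are on the classpath.
PARQUET_VER=1.12.2   # assumption: align with Spark's bundled Parquet version
mvn dependency:get \
  -DgroupId=org.apache.parquet \
  -DartifactId=parquet-hadoop \
  -Dversion=${PARQUET_VER} \
  -Dclassifier=tests \
  -Dtransitive=false
```

The jar lands in the local Maven repository (`~/.m2/repository/org/apache/parquet/parquet-hadoop/`), from where a CI script can copy it onto the test classpath.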
@NvTimLiu NvTimLiu added the build Related to CI / CD or cleanly building label Jul 13, 2022
@NvTimLiu NvTimLiu requested a review from jlowe as a code owner July 13, 2022 12:15
@NvTimLiu NvTimLiu self-assigned this Jul 13, 2022
@NvTimLiu
Collaborator Author

Build

@NvTimLiu
Collaborator Author

Local tests threw the message below, which is expected, as mentioned in the PR: #5761

(Caused by: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available).

The full message thrown looks like:
Caused by: java.lang.RuntimeException: The GPU does not support reading encrypted Parquet files. To read encrypted or columnar encrypted files, disable the GPU Parquet reader via spark.rapids.sql.format.parquet.read.enabled.

22/07/13 12:12:25 ERROR TaskSetManager: Task 0 in stage 45.0 failed 1 times; aborting job
22/07/13 12:12:25 WARN TaskSetManager: Lost task 0.0 in stage 47.0 (TID 70) (172.17.0.2 executor 0): java.lang.RuntimeException: The GPU does not support reading encrypted Parquet files. To read encrypted or columnar encrypted files, disable the GPU Parquet reader via spark.rapids.sql.format.parquet.read.enabled.
        at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:658)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.withResource(GpuParquetScan.scala:467)
        at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.filterBlocks(GpuParquetScan.scala:616)
        at com.nvidia.spark.rapids.GpuParquetMultiFilePartitionReaderFactory.$anonfun$buildBaseColumnarReaderForCoalescing$1(GpuParquetScan.scala:992)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
        at com.nvidia.spark.rapids.GpuParquetMultiFilePartitionReaderFactory.buildBaseColumnarReaderForCoalescing(GpuParquetScan.scala:990)
        at com.nvidia.spark.rapids.MultiFilePartitionReaderFactoryBase.createColumnarReader(GpuMultiFileReader.scala:208)
        at com.nvidia.spark.rapids.shims.GpuDataSourceRDD.compute(GpuDataSourceRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available
        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:588)
        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:469)
        at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$1(GpuParquetScan.scala:605)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.withResource(GpuParquetScan.scala:467)
        at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.readAndSimpleFilterFooter(GpuParquetScan.scala:603)
        at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:652)
        ... 34 more

22/07/13 12:12:25 ERROR TaskSetManager: Task 0 in stage 47.0 failed 1 times; aborting job
22/07/13 12:12:25 WARN TaskSetManager: Lost task 1.0 in stage 47.0 (TID 71) (172.17.0.2 executor 0): TaskKilled (Stage cancelled)
PASSED [100%]
============== 12 passed, 14515 deselected, 9 warnings in 40.92s ===============
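The exception above confirms the expected behavior: the GPU cannot read encrypted Parquet, and the suggested workaround is to disable the GPU Parquet reader. A minimal sketch of applying that config (the application jar name is hypothetical; the config key is taken verbatim from the error message above):

```shell
# Fall back to Spark's CPU Parquet reader for encrypted files by disabling
# the RAPIDS GPU Parquet read path, per the error message.
spark-submit \
  --conf spark.rapids.sql.format.parquet.read.enabled=false \
  your-app.jar   # hypothetical application jar
```

The same key can be set at runtime with `spark.conf.set(...)` before the encrypted read is issued.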

Review threads on jenkins/spark-tests.sh (resolved; one outdated)
@NvTimLiu
Collaborator Author

build

@NvTimLiu NvTimLiu merged commit d84b518 into NVIDIA:branch-22.08 Jul 15, 2022
Linked issue closed by this pull request: Add jenkins/integration test support for parquet-hadoop tests jar for testing parquet encryption (#5771)