Commit
Parquet coalesce file reader for local filesystems (NVIDIA#990)
* Add back the small file consolidation for the parquet reader for non-cloud environments
Signed-off-by: Thomas Graves <tgraves@apache.org>
* make resolveURI local
Signed-off-by: Thomas Graves <tgraves@apache.org>
* debug
* fix debug
* Cleanup
* rework names
* Fix bug in footer position
* Add input file transition logic back and update tests
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Update configs so can control multi file optimization, multi file read, and coalesce reader
* remove debug
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Update tests for 3 parquet readers and small bug fix
* Update logging
* test fixes
* various fixes
* Update configs and fix parameters to GpuParquetScan
Signed-off-by: Thomas Graves <tgraves@apache.org>
* remove unneeded function dbshim
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* remove debug log and update configs
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* cleanup and debug
* Update configs.md
* cleanup
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* create a common function for getting small file opts for fileSourceScan
* Fix extra line and update config text
* Update text
* change to use close on exception
Signed-off-by: Thomas Graves <tgraves@apache.org>
* update configs doc
* Fix missing imports
* Fix import order
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Rework the parquet multi-file configs to have a single configuration and change the way they are passed around for the InputFileName
Signed-off-by: Thomas Graves <tgraves@apache.org>
* make rapidsConf transient
Signed-off-by: Thomas Graves <tgraves@apache.org>
* fix typo
Signed-off-by: Thomas Graves <tgraves@apache.org>
* forward rapidsconf
Signed-off-by: Thomas Graves <tgraves@apache.org>
* update test and fix missed config check
* Add log statement for original per file reader
* Update text and fix test
* add space
Signed-off-by: Thomas Graves <tgraves@apache.org>
* update config.md
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Fix parameter to spark 3.1.0 parquet scan
* Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Fix scalastyle line length
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Update docs and change tests to copy reader confs
* Update GpuColumnVector.from call to handle MapTypes
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>