
[BUG] Reconsider 8MB copy buffer size in GpuParquetScan #9269

Open
mythrocks opened this issue Sep 19, 2023 · 3 comments
Labels
performance: A performance related task/issue · reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

mythrocks (Collaborator) commented Sep 19, 2023

The GpuParquetScan has an 8MB copy buffer allocation per reader instance.

val copyBufferSize = conf.getInt("parquet.read.allocation.size", 8 * 1024 * 1024)

On setups with a high core count and a low heap allocation, this might account for a significant portion of the executor's available memory.

For instance, on machines with 128 cores, if the executor is set up with only 1GB of memory, then enabling the multi-file reader for Parquet could result in the entire 1GB of memory being consumed by the copy buffers across all threads. This leads to failures such as the one in #9135.
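
As a rough illustration of that accounting (the 128-core / 1GB numbers are the hypothetical setup above, not measurements), the aggregate footprint is simply the buffer size multiplied by the number of reader threads:

```scala
// Back-of-the-envelope accounting for the copy buffers. The thread count and
// buffer size mirror the hypothetical 128-core / 8MB setup described above.
object CopyBufferFootprint extends App {
  val copyBufferSizeBytes = 8L * 1024 * 1024 // default for parquet.read.allocation.size
  val readerThreads = 128                    // e.g. one reader thread per core
  val totalBytes = copyBufferSizeBytes * readerThreads
  println(f"Aggregate copy-buffer footprint: ${totalBytes / (1024.0 * 1024 * 1024)}%.1f GiB")
  // Prints 1.0 GiB -- the entire heap of a 1GB executor
}
```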

It would be good to evaluate whether this 8MB allocation can be lowered, or at least accounted for.

@mythrocks mythrocks added bug Something isn't working ? - Needs Triage Need team to review and classify labels Sep 19, 2023
@mattahrens mattahrens added performance A performance related task/issue and removed ? - Needs Triage Need team to review and classify bug Something isn't working labels Nov 7, 2023
@sameerz sameerz added the reliability Features to improve reliability or bugs that severely impact the reliability of the plugin label Jul 23, 2024
sameerz (Collaborator) commented Jul 23, 2024

The recommendation from within the team is to reduce the 8MB size to see whether it will address #11227.

There are alternative ideas here: https://aws.amazon.com/blogs/storage/optimizing-performance-of-apache-spark-workloads-on-amazon-s3/

The suggestion is to make a code change in copyDataRange to right-size the intermediate buffer for data transfer (see https://github.com/NVIDIA/spark-rapids/blob/6d65548e5716f02ae7fee564d5091e05a0ac219a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala#L1596C57-L1596C71).
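
A minimal sketch of that idea, assuming a stream-to-stream copy loop; the method signature and parameter names below are placeholders, not the actual code in GpuParquetScan.scala:

```scala
import java.io.{EOFException, InputStream, OutputStream}

// Hypothetical right-sized copy: never allocate a scratch buffer larger than the
// number of bytes actually being transferred, capped at the configured maximum.
def copyDataRange(in: InputStream, out: OutputStream,
                  bytesToCopy: Long, maxBufferSize: Int): Unit = {
  val bufferSize = math.max(1L, math.min(maxBufferSize.toLong, bytesToCopy)).toInt
  val buffer = new Array[Byte](bufferSize)
  var remaining = bytesToCopy
  while (remaining > 0) {
    val read = in.read(buffer, 0, math.min(buffer.length.toLong, remaining).toInt)
    if (read < 0) {
      throw new EOFException(s"Expected $remaining more bytes in the copy range")
    }
    out.write(buffer, 0, read)
    remaining -= read
  }
}
```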

Any changes will require rerunning all performance tests.

gerashegalov (Collaborator) commented:
The default of 8 MiB might have been inspired by Performance Design Patterns for S3 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-design-patterns.html#optimizing-performance-parallelization), which recommends it as a granularity for byte-range requests.
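
For context, the 8 MiB figure in that guidance is the size of an individual byte-range (positioned) read against S3. A hedged illustration using Hadoop's FileSystem API; the bucket, key, and offset are placeholders:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative only: one 8 MiB positioned read, i.e. the request granularity the
// S3 performance guidelines suggest. The path below is a placeholder.
val path = new Path("s3a://example-bucket/example.parquet")
val fs = path.getFileSystem(new Configuration())
val rangeSize = 8 * 1024 * 1024
val buffer = new Array[Byte](rangeSize)
val in = fs.open(path)
try {
  in.readFully(0L, buffer, 0, buffer.length) // read bytes [0, 8 MiB) in one ranged request
} finally {
  in.close()
}
```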

mythrocks (Collaborator, Author) commented Aug 4, 2024

> Recommendation from within the team is to reduce the 8MB size to see if it will address #11227.

Looking at #11227, one is reminded of a very similar failure (#9135) with Parquet-reader tests on CDH (in local mode). At the time, we had discussed either reducing the -Xmx or the thread-pool size.

That failure tended to happen in local mode. We had a PR to address the failure (#9177) by setting the thread-pool size. I suspect it might address this OOM as well, depending on how many threads are running.
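
If the same mitigation applies here, it would likely be a configuration change rather than a code change. A hedged sketch follows; the spark.rapids.sql.multiThreadedRead.numThreads key and both values are assumptions that should be checked against the plugin's configuration docs for the version in use:

```scala
import org.apache.spark.sql.SparkSession

// Hedged example: cap the multithreaded-reader thread pool and shrink the per-reader
// copy buffer. Keys and values here are illustrative, not verified recommendations.
val spark = SparkSession.builder()
  .appName("parquet-reader-memory-tuning")
  .config("spark.rapids.sql.multiThreadedRead.numThreads", "16")                   // assumed config key
  .config("spark.hadoop.parquet.read.allocation.size", (1 * 1024 * 1024).toString) // 1 MiB instead of 8 MiB
  .getOrCreate()
```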

At the time, adjusting the core-count or memory was deemed a workaround, instead of fixing the reader. We might consider this again.

cc @razajafri, just in case this might be pertinent to #11227.
