
Set `spark.executor.cores` for integration tests. #9177

Closed

Conversation

mythrocks (Collaborator)

Fixes #9135. (By workaround.)

This change sets `spark.executor.cores` to `10`, if it is unset. This allows integration tests to work around the failure seen in `parquet_test.py::test_small_file_memory`, where the `COALESCING` Parquet reader's thread pool accidentally uses 128 threads with 8 MB of memory each, thus consuming the entire heap.

Note that this is a bit of a workaround. A more robust solution would be to scale the Parquet reader's buffers based on the amount of available memory and the number of threads.

Signed-off-by: MithunR <mythrocks@gmail.com>
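The arithmetic behind the failure described above is straightforward (the thread count and per-thread buffer size come from the report; the totals are simply their product):

```python
# Rough arithmetic for the failure described above.
threads_unset = 128   # pool size observed when spark.executor.cores is unset
buffer_mb = 8         # per-thread Parquet read-buffer size, in MB
print(threads_unset * buffer_mb)   # 1024 MB claimed by read buffers alone

threads_capped = 10   # pool size once spark.executor.cores=10 takes effect
print(threads_capped * buffer_mb)  # 80 MB
```

A gigabyte of read buffers alone can exhaust a modestly sized test-executor heap, which is consistent with the OOM seen in the linked issue.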
@mythrocks (Collaborator, Author)

Build

mythrocks self-assigned this Sep 5, 2023
@mythrocks (Collaborator, Author)

Build


# Set per-executor cores, if unspecified.
# This prevents per-thread allocations (like Parquet read buffers) from overwhelming the heap.
export PYSP_TEST_spark_executor_cores=${PYSP_TEST_spark_executor_cores:-'10'}
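The `${VAR:-default}` expansion in the export above only substitutes the default when the variable is unset (or empty), so a value supplied by the environment is never overridden. A quick sketch of that behaviour (values here are hypothetical):

```shell
# Sketch of the ${VAR:-default} expansion used in the export above.
unset PYSP_TEST_spark_executor_cores
cores=${PYSP_TEST_spark_executor_cores:-10}
echo "$cores"    # 10: the default applies when the variable is unset

PYSP_TEST_spark_executor_cores=4
cores=${PYSP_TEST_spark_executor_cores:-10}
echo "$cores"    # 4: an explicit setting wins
```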
Collaborator

Why 10? We already have a few other places where we try to configure things for local mode; why is the number of executor cores out of sync with `LOCAL_PARALLEL` or `NUM_LOCAL_EXECS`?

LOCAL_PARALLEL=$(( $CPU_CORES > 4 ? 4 : $CPU_CORES ))

On a side note, are the Databricks tests being run in local mode and configured badly? Will we also run into this type of problem on a regular Databricks cluster? If so, this workaround feels very much like it is going in the wrong direction; we need to really fix the underlying problem ASAP instead of trying to work around it.
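The `LOCAL_PARALLEL` line quoted above uses shell arithmetic's C-style ternary to cap parallelism at 4. A minimal sketch of that expression (the `CPU_CORES` values are made up):

```shell
# $(( cond ? a : b )) caps the value at 4, as in the LOCAL_PARALLEL line above.
CPU_CORES=16
echo $(( CPU_CORES > 4 ? 4 : CPU_CORES ))   # 4

CPU_CORES=2
echo $(( CPU_CORES > 4 ? 4 : CPU_CORES ))   # 2
```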

Collaborator Author

I did see that. I figured we might want lower parallelism in local mode than in cluster mode, and that a more appropriate number might be suggested during review. I have verified that this works with 4.

@mythrocks (Collaborator, Author)

> we need to really fix the underlying problem ASAP instead of trying to work around it.

This was an attempt to get a clean build on CDH, as quickly as possible. But I'm supportive of closing this in favour of a proper fix.

Development

Successfully merging this pull request may close these issues.

[BUG] GC/OOM on parquet_test.py::test_small_file_memory
3 participants