Set `spark.executor.cores` for integration tests. #9177
Conversation
Fixes NVIDIA#9135 (by way of a workaround).

This change sets `spark.executor.cores` to `10` if it is unset. This allows the integration tests to work around the failure seen in `parquet_test.py:test_small_file_memory`, where the `COALESCING` Parquet reader's thread pool accidentally uses 128 threads with 8 MB of memory each, thus consuming the entire heap.

Note that this is a bit of a workaround. A more robust solution would be to scale the Parquet reader's buffers based on the amount of available memory and the number of threads.

Signed-off-by: MithunR <mythrocks@gmail.com>
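For context, the arithmetic behind the failure, as a hedged bash sketch (the thread and buffer counts come from the description above; the variable names are illustrative only):

```bash
# Illustration of the failure mode described above (names are illustrative):
# with spark.executor.cores unset, the COALESCING reader's pool ends up sized
# from the machine's visible core count.
THREADS=128    # thread-pool size observed on the failing machine
BUFFER_MB=8    # per-thread Parquet read buffer
echo "worst-case buffer memory: $(( THREADS * BUFFER_MB )) MB"   # 1024 MB

# Capping spark.executor.cores at 10 bounds the same product:
echo "with the workaround: $(( 10 * BUFFER_MB )) MB"             # 80 MB
```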
```bash
# Set per-executor cores, if unspecified.
# This prevents per-thread allocations (like Parquet read buffers) from overwhelming the heap.
export PYSP_TEST_spark_executor_cores=${PYSP_TEST_spark_executor_cores:-'10'}
```
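A brief usage sketch: in the spark-rapids integration tests, environment variables with the `PYSP_TEST_spark_*` prefix are forwarded as Spark confs to the test session, so the default above can be overridden at invocation time (treat the exact forwarding mechanism as an assumption):

```bash
# Override the default of 10 for a smaller local run (hedged example):
PYSP_TEST_spark_executor_cores=4 ./integration_tests/run_pyspark_from_build.sh
```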
Why 10? We already have a few other places where we try to configure things for local mode; why is the number of executor cores out of sync with `LOCAL_PARALLEL` or `NUM_LOCAL_EXECS`?
```bash
LOCAL_PARALLEL=$(( $CPU_CORES > 4 ? 4 : $CPU_CORES ))
```
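One way to keep the two values in sync, per the reviewer's suggestion (a hedged sketch; deriving `CPU_CORES` via `nproc` is an assumption):

```bash
# Derive executor cores from the same cap used for local parallelism
# (a sketch of the reviewer's suggestion, not the PR's actual change).
CPU_CORES=$(nproc)
LOCAL_PARALLEL=$(( CPU_CORES > 4 ? 4 : CPU_CORES ))
export PYSP_TEST_spark_executor_cores=${PYSP_TEST_spark_executor_cores:-$LOCAL_PARALLEL}
```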
On a side note, are the Databricks tests being run in local mode and configured badly? In a regular Databricks cluster, will we also run into this type of problem? If so, this workaround feels very much like it is going in the wrong direction; we need to really fix the underlying problem ASAP instead of trying to work around it.
I did see that. I figured we might consider lower parallelism for local mode than for cluster mode, and that a more appropriate number might be suggested in review. I have verified that this works with `4`.

This was an attempt to get a clean build on CDH as quickly as possible, but I'm supportive of closing this in favour of a proper fix.
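The description sketches the direction of that proper fix: size the per-thread buffers from the available memory and the pool size, rather than fixing them at 8 MB. A hedged bash sketch of the arithmetic (the heap size, pool size, and budget percentage are all assumptions, not the plugin's actual policy):

```bash
# Sketch of the 'more robust solution' from the description (all numbers and
# the budgeting policy are assumptions, not the plugin's actual behavior).
HEAP_MB=1024          # assumed available heap
POOL_THREADS=128      # assumed reader thread-pool size
BUDGET_PCT=25         # assumed cap: buffers may use 25% of the heap
BUFFER_MB=$(( HEAP_MB * BUDGET_PCT / 100 / POOL_THREADS ))
echo "per-thread buffer: ${BUFFER_MB} MB"   # 2 MB instead of a fixed 8 MB
```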