
[BUG] ThreadPool size is deduced incorrectly in MultiFileReaderThreadPool on YARN clusters #9271

Closed · Opened by mythrocks on Sep 19, 2023 · Fixed by #9381
Labels: bug (Something isn't working)

mythrocks commented Sep 19, 2023

On YARN clusters, when a Spark cluster is spun up without explicitly setting spark.executor.cores, the MultiFileReaderThreadPool is initialized to use all the cores on the executor host. The per-thread allocations then overwhelm the executor's memory allocation, and queries fail. (One example was observed in #9135.)

Part of the per-thread allocation problem will be tackled in #9269. But the core issue appears to be in RapidsPluginUtils.estimateCoresOnExec():

```scala
def estimateCoresOnExec(conf: SparkConf): Int = {
  conf.getOption(RapidsPluginUtils.EXECUTOR_CORES_KEY)
      .map(_.toInt)
      .getOrElse(Runtime.getRuntime.availableProcessors)
}
```

On YARN setups (and in local mode), this code falls back to Runtime.getRuntime.availableProcessors instead of using the default value for EXECUTOR_CORES_KEY: conf.getOption(RapidsPluginUtils.EXECUTOR_CORES_KEY) does not return the key's documented default when the key is unset.
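That matches how SparkConf behaves in general: it stores only explicitly set values, and documented defaults live on Spark's internal ConfigEntry objects. A minimal sketch against a plain SparkConf, for illustration only (not the plugin code):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf(loadDefaults = false)
// getOption reads only explicitly set values; the documented default
// for spark.executor.cores is not consulted.
assert(conf.getOption("spark.executor.cores").isEmpty)  // None, not Some("1")
conf.set("spark.executor.cores", "4")
assert(conf.getOption("spark.executor.cores").contains("4"))
```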

The right thing to do might be to fetch the option via a ConfigEntry object, instead of using the conf key string.
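For example, a minimal sketch of that approach, assuming the plugin can reach Spark's internal org.apache.spark.internal.config.EXECUTOR_CORES entry (these entries are private[spark], so this may need to live in a shim; the entry-based SparkConf.get is likewise package-private, so the sketch reads the key and default off the entry instead):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.internal.config.EXECUTOR_CORES

def estimateCoresOnExec(conf: SparkConf): Int = {
  // EXECUTOR_CORES is the ConfigEntry behind "spark.executor.cores";
  // its default (1) matches YARN/Kubernetes semantics. An unset key
  // then yields that default instead of the host's physical core count.
  conf.getInt(EXECUTOR_CORES.key, EXECUTOR_CORES.defaultValue.getOrElse(1))
}
```

One caveat: Spark's documented default for spark.executor.cores is 1 only on YARN and Kubernetes; standalone mode gives an executor all available cores, so a complete fix may still need cluster-manager-specific handling.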
