Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317)

In `jit/cache.cpp`, a program cache always reserves a `std::unordered_map` internally, with a size taken from the environment variable `LIBCUDF_KERNEL_CACHE_LIMIT_PER_PROCESS`. If that variable is not set, a default value (`std::numeric_limits<std::size_t>::max()`) is used. That default is so large that the reservation requests an impossibly large chunk of memory, which fails with `std::bad_alloc` and crashes the application.

This PR changes that default value from `std::numeric_limits<std::size_t>::max()` to `1024^2`. It is essentially a revert of PR #10312, but sets the default value to `1024` instead of `100`.

Note that `1024^2` is an arbitrary number, not based on any specific calculation.

Closes #10312 and closes #9362.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #10317
ttnghia authored Feb 25, 2022
1 parent 3a1dbe8 commit df646b2
Showing 1 changed file with 17 additions and 28 deletions.
45 changes: 17 additions & 28 deletions cpp/src/jit/cache.cpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2021, NVIDIA CORPORATION.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -104,13 +104,10 @@ std::string get_program_cache_dir()
 #endif
 }

-void try_parse_numeric_env_var(std::size_t& result, char const* const env_name)
+std::size_t try_parse_numeric_env_var(char const* const env_name, std::size_t default_val)
 {
-  auto value = std::getenv(env_name);
-
-  if (value != nullptr) {
-    result = std::stoull(value);  // fails if env var contains invalid value.
-  }
+  auto const value = std::getenv(env_name);
+  return value != nullptr ? std::stoull(value) : default_val;
 }

 jitify2::ProgramCache<>& get_program_cache(jitify2::PreprocessedProgramData preprog)
@@ -123,27 +120,19 @@ jitify2::ProgramCache<>& get_program_cache(jitify2::PreprocessedProgramData prep
   auto existing_cache = caches.find(preprog.name());

   if (existing_cache == caches.end()) {
-    std::size_t kernel_limit_proc = std::numeric_limits<std::size_t>::max();
-    std::size_t kernel_limit_disk = std::numeric_limits<std::size_t>::max();
-    try_parse_numeric_env_var(kernel_limit_proc, "LIBCUDF_KERNEL_CACHE_LIMIT_PER_PROCESS");
-    try_parse_numeric_env_var(kernel_limit_disk, "LIBCUDF_KERNEL_CACHE_LIMIT_DISK");
-
-    auto cache_dir = get_program_cache_dir();
-
-    if (kernel_limit_disk == 0) {
-      // if kernel_limit_disk is zero, jitify will assign it the value of kernel_limit_proc.
-      // to avoid this, we treat zero as "disable disk caching" by not providing the cache dir.
-      cache_dir = {};
-    }
-
-    auto res = caches.insert({preprog.name(),
-                              std::make_unique<jitify2::ProgramCache<>>(  //
-                                kernel_limit_proc,
-                                preprog,
-                                nullptr,
-                                cache_dir,
-                                kernel_limit_disk)});
-
+    auto const kernel_limit_proc =
+      try_parse_numeric_env_var("LIBCUDF_KERNEL_CACHE_LIMIT_PER_PROCESS", 10'000);
+    auto const kernel_limit_disk =
+      try_parse_numeric_env_var("LIBCUDF_KERNEL_CACHE_LIMIT_DISK", 100'000);
+
+    // if kernel_limit_disk is zero, jitify will assign it the value of kernel_limit_proc.
+    // to avoid this, we treat zero as "disable disk caching" by not providing the cache dir.
+    auto const cache_dir = kernel_limit_disk == 0 ? std::string{} : get_program_cache_dir();
+
+    auto const res =
+      caches.insert({preprog.name(),
+                     std::make_unique<jitify2::ProgramCache<>>(
+                       kernel_limit_proc, preprog, nullptr, cache_dir, kernel_limit_disk)});
     existing_cache = res.first;
   }
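
The comment in the last hunk describes a small rule: a disk limit of zero is treated as "disk caching disabled", which is signalled by passing an empty cache directory. A minimal sketch of that rule on its own is shown below. The function name `select_cache_dir` and the parameter `configured_dir` (standing in for the result of `get_program_cache_dir()`) are hypothetical, and the statement about how jitify handles a zero disk limit is taken from the diff's comment rather than verified here.

```cpp
#include <cstddef>
#include <string>

// Sketch of the selection rule from the diff, under the assumptions above.
// A zero disk limit yields an empty directory string, which the caller
// passes on so that no on-disk cache location is provided.
std::string select_cache_dir(std::size_t kernel_limit_disk, std::string const& configured_dir)
{
  return kernel_limit_disk == 0 ? std::string{} : configured_dir;
}
```

Expressing the choice as a single conditional expression, as the new code in the diff does, allows the result to be `const` instead of being assigned and then conditionally overwritten.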
