Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DX-24413, DX-24414 : Gandiva cache peformance improvements #15

Open
wants to merge 1 commit into
base: rel-460
Choose a base branch
from

Conversation

projjal
Copy link

@projjal projjal commented Aug 5, 2020

  • Prewarm the gandiva cache on gandiva startup time
  • Make the cache size configurable

@projjal projjal changed the title Prewarm the gandiva cache during loading time. Make the cache size co… DX-24413, DX-24414 : Gandiva cache peformance improvements Aug 5, 2020
@@ -42,8 +49,24 @@ class Cache {
}

private:
static int GetCapacity() {
int capacity;
const char* env_cache_size = std::getenv("GANDIVA_CACHE_SIZE");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant we instead pass this as a configuration (that is in turn passed from a support option) ? that way we can support other customers too..

https://github.com/apache/arrow/blob/master/cpp/src/gandiva/configuration.h

std::ifstream fin;
fin.open(dir_iter->path().string(), std::ios::binary);
if (!fin.is_open()) {
std::cerr << "[Gandiva Cache Prewarm] Failure opening warmup cache file"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also mention which file..


int32_t schema_len;
if (remaining < sizeof(schema_len)) {
std::cerr << "[Gandiva Cache Prewarm] Invalid file." << std::endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

across the board - file name along with errors.

@@ -117,6 +128,9 @@ jint JNI_OnLoad(JavaVM* vm, void* reserved) {
env->GetFieldID(vector_expander_ret_class_, "address", "J");
vector_expander_ret_capacity_ =
env->GetFieldID(vector_expander_ret_class_, "capacity", "J");

PrewarmCache();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this going to block all queries until the cache is warmed? should we do this in the background instead for e.g. if we have a 100 expression list this is going to take upto a minute to prepare the cache during which no query can progress

}

fs::path path(env_path);
if (!fs::is_directory(path) && !fs::create_directories(path)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this thread safe? two threads trying to create the directory at the same time

@@ -660,6 +861,13 @@ JNIEXPORT jlong JNICALL Java_org_apache_arrow_gandiva_evaluator_JniWrapper_build
holder = std::shared_ptr<ProjectorHolder>(
new ProjectorHolder(schema_ptr, ret_types, std::move(projector)));
module_id = projector_modules_.Insert(holder);

if (!cache_hit) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so is the expectation, apple will get files from all executors? wont there be duplicates in that case..

also is the expectation that cache size be at-least as big to accommodate all expressions needed?

are new queries expected in this cluster - if yes and new a query comes in now we might evict one of these expressions. is that ok

stevelorddremio pushed a commit to stevelorddremio/arrow that referenced this pull request Apr 17, 2024
…ache#41152)

### Rationale for this change

An error is received installing R duckdb:

```
dremio#15 18.13 > remotes::install_github('duckdb/duckdb-r', build = FALSE)
dremio#15 18.27 Error: Failed to install 'unknown package' from **GitHub:**
dremio#15 18.27   Line starting 'Roxyg ...' is malformed!
```

Some searching seems to suggest that this is because R cannot process UTF-8 characters in DESCRIPTION files if the `LANG` is set to `C`.

### What changes are included in this PR?

The `LANG` is set to `C.UTF-8` in the dockerfile for this CI job

### Are these changes tested?

The change only affects a test

### Are there any user-facing changes?

No
* GitHub Issue: apache#41145

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
lriggs pushed a commit to lriggs/arrow that referenced this pull request Apr 25, 2024
…ache#41152)

### Rationale for this change

An error is received installing R duckdb:

```
dremio#15 18.13 > remotes::install_github('duckdb/duckdb-r', build = FALSE)
dremio#15 18.27 Error: Failed to install 'unknown package' from **GitHub:**
dremio#15 18.27   Line starting 'Roxyg ...' is malformed!
```

Some searching seems to suggest that this is because R cannot process UTF-8 characters in DESCRIPTION files if the `LANG` is set to `C`.

### What changes are included in this PR?

The `LANG` is set to `C.UTF-8` in the dockerfile for this CI job

### Are these changes tested?

The change only affects a test

### Are there any user-facing changes?

No
* GitHub Issue: apache#41145

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants