Intentionally leak static CUDA resources to avoid crash (part 2) #462

@kingcrimsontianyu kingcrimsontianyu commented Sep 10, 2024

The NVbench application PARQUET_READER_NVBENCH in libcudf currently crashes with a segmentation fault. To reproduce:

./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1

The root cause is that (1) some thread_local objects on the main thread in libcudf and (2) some static objects in kvikio are destroyed at program termination, after NVbench has already called cudaDeviceReset(). Their destructors make CUDA calls, and making CUDA calls during program termination is undefined behavior in CUDA. These objects should therefore simply be leaked.

This simple PR is the kvikIO side of the fix. The other part is in rapidsai/cudf#16787.
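
For readers unfamiliar with the "leak it on purpose" pattern, here is a minimal sketch of the idea. It is not the actual kvikio change: `FakeCudaResource`, `destructor_calls`, `destroyed_at_exit`, and `leaked_instance` are hypothetical names, and the destructor only bumps a counter where a real one would call into the CUDA runtime.

```cpp
#include <string>
#include <utility>

// Hypothetical counter so destructor runs are observable in this sketch.
int& destructor_calls() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for a resource whose destructor would make CUDA calls.
class FakeCudaResource {
 public:
  explicit FakeCudaResource(std::string name) : name_(std::move(name)) {}
  ~FakeCudaResource() { ++destructor_calls(); }  // real code would call CUDA here
  FakeCudaResource(const FakeCudaResource&) = delete;
  FakeCudaResource& operator=(const FakeCudaResource&) = delete;
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Conventional function-local static: its destructor runs during static
// destruction at program exit, which may be after cudaDeviceReset().
FakeCudaResource& destroyed_at_exit() {
  static FakeCudaResource instance{"destroyed"};
  return instance;
}

// Intentionally leaked: allocated once on first use and never deleted, so the
// destructor never runs and no CUDA call is made during program termination.
FakeCudaResource& leaked_instance() {
  static FakeCudaResource* instance = new FakeCudaResource{"leaked"};
  return *instance;
}
```

The trade-off is that the leaked object's memory is only reclaimed by the OS at process exit, and leak checkers may need a suppression for it.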

@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review September 10, 2024 20:44
@kingcrimsontianyu kingcrimsontianyu requested a review from a team as a code owner September 10, 2024 20:44
@kingcrimsontianyu kingcrimsontianyu changed the title Intentionally leak static CUDA resources to avoid crash Intentionally leak static CUDA resources to avoid crash (part 2) Sep 11, 2024

@madsbk madsbk left a comment

Thanks @kingcrimsontianyu, looks good. I only have one suggestion.

cpp/include/kvikio/posix_io.hpp (resolved)
@madsbk madsbk left a comment

madsbk commented Sep 12, 2024

/merge

@madsbk madsbk added the bug (Something isn't working) and non-breaking (Introduces a non-breaking change) labels Sep 12, 2024
@rapids-bot rapids-bot bot merged commit 9d352ef into rapidsai:branch-24.10 Sep 12, 2024
56 checks passed
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Sep 19, 2024
#16787)

The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with a segmentation fault. To reproduce:

```
./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1
```
 
The root cause is that (1) some `thread_local` objects on the main thread in `libcudf` and (2) some `static` objects in `kvikio` are destroyed at program termination, after NVbench has already called `cudaDeviceReset()`. Their destructors make CUDA calls, and making CUDA calls during program termination is undefined behavior in CUDA. These objects should therefore simply be leaked.

This simple PR is the cuDF side of the fix. The other part is in rapidsai/kvikio#462.
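
The `thread_local` case can be handled with the same idea. The following is a rough sketch under stated assumptions, not the actual libcudf code: `PerThreadCache`, `cache_destructor_calls`, and `leaked_thread_cache` are hypothetical names, and the `handles` vector merely stands in for CUDA handles.

```cpp
#include <vector>

// Hypothetical counter so destructor runs are observable in this sketch.
int& cache_destructor_calls() {
  static int count = 0;
  return count;
}

// Hypothetical per-thread cache; a real one might own CUDA streams or events.
struct PerThreadCache {
  std::vector<int> handles;  // stand-in for CUDA handles
  ~PerThreadCache() { ++cache_destructor_calls(); }  // real code would call CUDA here
};

// Each thread lazily allocates its own cache and never deletes it, so no
// destructor (and hence no CUDA call) runs when the thread or process ends.
PerThreadCache& leaked_thread_cache() {
  thread_local PerThreadCache* cache = new PerThreadCache{};
  return *cache;
}
```

Note that in this sketch every thread that calls `leaked_thread_cache()` leaks one object when it exits, which is acceptable for a handful of long-lived threads but worth keeping in mind for short-lived ones.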

closes #13229

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: #16787
rjzamora pushed a commit to rjzamora/cudf that referenced this pull request Sep 24, 2024
rapidsai#16787)