Skip to content

Commit

Permalink
Fix race in ORC string dictionary creation (#13214)
Browse files Browse the repository at this point in the history
Unfortunately this is really hard to reproduce.  For whatever reason I had to try and reproduce this on a relatively small data set with at least 140,001 rows or more, where one column is a LIST<STRING> but all of the lists are empty lists and another column is a STRUCT column with two STRING child columns where all of the STRINGS are empty.  I also had to sort and partition the data before doing the write, and it had to be in a very specific environment with T4 GPUs.  I don't know why all of those were needed to make the race happen regularly, but it did.

Because of this complexity in reproducing it I have not added in any unit tests.

The problem was essentially a race when trying to calculate dictionary duplication for strings in ORC.  As a part of this a function `LoadNonNullIndices` was being called that was supposed to set a value `nnz` in a shared memory location `s`.  In the normal case a loop was taken where `__syncthreads()` was  called, but if there were no rows in the column (the LIST<STRING> column) then the loop was not taken and it was a race to see if `nnz` which was set to 0 by thread 0 showed up in all of the threads or not.

What made this crash is that this `nnz` value is used to determine what happens in the rest of the kernel to see if it reads data, or writes to temp memory (which is not allocated if previous processing shows that there is no need for it), or any of that. If `nnz` is non-zero then it tries to do all of those things and bad stuff starts to happen.

Authors:
  - Robert (Bobby) Evans (https://github.com/revans2)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #13214
  • Loading branch information
revans2 authored Apr 25, 2023
1 parent 73423f8 commit 086726c
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion cpp/src/io/orc/dict_enc.cu
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -72,6 +72,10 @@ static __device__ void LoadNonNullIndices(volatile dictinit_state_s* s,
Storage& temp_storage)
{
if (t == 0) { s->nnz = 0; }
if (s->chunk.num_rows <= 0) {
// A sync is needed for s->nnz if there are no times through the loop
__syncthreads();
}
for (uint32_t i = 0; i < s->chunk.num_rows; i += block_size) {
const uint32_t* valid_map = s->chunk.leaf_column->null_mask();
auto column_offset = s->chunk.leaf_column->offset();
Expand Down

0 comments on commit 086726c

Please sign in to comment.