This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Invalid result from DeviceSegmentedSort::SortPairs/SortKeys when keys are bool type #594

Closed
davidwendt opened this issue Nov 21, 2022 · 3 comments · Fixed by #595
Labels
P0: must have (Absolutely necessary. Critical issue, major blocker, etc.); type: bug: functional (Does not work as intended.)
Milestone

Comments

@davidwendt
Contributor

If the keys parameter is of type bool, then DeviceSegmentedSort::SortPairs and DeviceSegmentedSort::SortKeys return invalid results. The following example reproduces the error consistently. The error occurs with SortKeys as well but is easier to see with SortPairs.

#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>
#include <iostream>

template <typename T>
std::vector<int> segmented_sort(bool const *h_data, size_t input_size,
                                int const *h_segs, size_t segs_size)
{
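  // copy the bool keys to device memory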
  T *d_data;
  cudaMalloc(&d_data, input_size * sizeof(bool));
  cudaMemcpy(d_data, h_data, input_size * sizeof(bool), cudaMemcpyHostToDevice);

  thrust::device_vector<int> d_segs(segs_size);
  cudaMemcpy(d_segs.data().get(), h_segs, segs_size * sizeof(int), cudaMemcpyHostToDevice);

  T *d_output;
  cudaMalloc(&d_output, input_size * sizeof(bool));

  thrust::device_vector<int> d_indices(input_size);
  thrust::sequence(thrust::device, d_indices.begin(), d_indices.end(), 0);

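  // first call computes the required temporary storage size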
  void *d_temp_storage = nullptr;
  size_t temp_storage_bytes = 0;
  cub::DeviceSegmentedSort::SortPairs(d_temp_storage, temp_storage_bytes,
                                      d_data, d_output,
                                      d_indices.data().get(), d_indices.data().get(),
                                      input_size, d_segs.size() - 1,
                                      d_segs.begin(), d_segs.begin() + 1);

  cudaMalloc(&d_temp_storage, temp_storage_bytes);

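  // second call performs the segmented sort of the keys and paired indices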
  cub::DeviceSegmentedSort::SortPairs(d_temp_storage, temp_storage_bytes,
                                      d_data, d_output,
                                      d_indices.data().get(), d_indices.data().get(),
                                      input_size, d_segs.size() - 1,
                                      d_segs.begin(), d_segs.begin() + 1);

  cudaFree(d_temp_storage);
  cudaFree(d_data);
  cudaFree(d_output);

  std::vector<int> h_indices(d_indices.size());
  cudaMemcpy(h_indices.data(), d_indices.data().get(), d_indices.size() * sizeof(int), cudaMemcpyDeviceToHost);
  return h_indices;
}

int main()
{
  bool h_data[] = {true, false, false, true, true,
                   true, true, true, true, true,
                   true, true, true, false, false,
                   false, false, true, false, false,
                   true, true, true, true, true,
                   true, true, false, true, false,
                   true, true, true, true, true, true, false, true, false, false};
  auto input_size = sizeof(h_data) / sizeof(h_data[0]);
  std::vector<int> h_segs({0, 5, 10, 15, 20, 25, 30, 40});

  auto result = segmented_sort<bool>(h_data, input_size, h_segs.data(), h_segs.size());

  std::vector<int> h_valid(
      { 1,  2,  0,  3,  4,  5,  6,  7,  8,  9,
       13, 14, 10, 11, 12, 15, 16, 18, 19, 17,
       20, 21, 22, 23, 24, 27, 29, 25, 26, 28,
       36, 38, 39, 30, 31, 32, 33, 34, 35, 37});

  for (size_t i = 0; i < result.size(); ++i)  {
    if (result[i] != h_valid[i]) {
      std::cout << "error at position " << i << ": " << result[i] << "!=" << h_valid[i] << "\n";
      break;
    }
  }
}

The error consistently occurs at position 33 in the indices/result vector. The value itself is random. The values after position 33 are also incorrect but appear to be shifted: result[34] should be in result[33], result[35] should be in result[34], etc.

The error appears to be caused by the following line:

KeyT oob_default = reinterpret_cast<KeyT &>(default_key_bits);

The oob_default value is 0xFF, which is neither true nor false, so the sort algorithm is unable to process it.
Note that true does not compare consistently against def in this example: https://godbolt.org/z/f8bcjaMhW
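
For illustration, a minimal host-side sketch of the same failure mode (this is not the godbolt code; memcpy-ing 0xFF into a bool is used here only to mimic the out-of-bounds default, and the resulting comparisons are compiler-dependent):

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
  std::uint8_t default_key_bits = 0xFF;  // the bit pattern reported above for oob_default
  bool oob_default;
  std::memcpy(&oob_default, &default_key_bits, sizeof(bool));

  // With the 0xFF object representation, comparisons against true/false can
  // give inconsistent answers depending on how the compiler lowers them.
  std::cout << (oob_default == true) << " " << (oob_default > false) << "\n";
  return 0;
}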

@ttnghia

ttnghia commented Nov 21, 2022

CC myself to subscribe to the issue.

@gevtushenko
Collaborator

@davidwendt thank you for reporting the issue! I can reproduce it. We haven't tested bool type for keys. I guess the issue can be addressed if we do something like:

return Traits<KeyT>::TwiddleIn(lhs) > Traits<KeyT>::TwiddleIn(rhs);

in the AgentSubWarpSort::BinaryOpT. I'll create a PR with the fix and additional tests soon. Meanwhile, you could reinterpret_cast your bool* to std::uint8_t*, or use that type in the first place.
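
For concreteness, a sketch of that workaround applied to the reproducer above (variable names are reused from the example and std::uint8_t needs <cstdint>; this is an assumed wiring, not code from the upcoming fix):

// Workaround sketch: hand the bool keys to CUB as std::uint8_t so the 0xFF
// out-of-bounds default is a valid key value.
auto d_keys_in  = reinterpret_cast<std::uint8_t const *>(d_data);
auto d_keys_out = reinterpret_cast<std::uint8_t *>(d_output);

cub::DeviceSegmentedSort::SortPairs(d_temp_storage, temp_storage_bytes,
                                    d_keys_in, d_keys_out,
                                    d_indices.data().get(), d_indices.data().get(),
                                    input_size, d_segs.size() - 1,
                                    d_segs.begin(), d_segs.begin() + 1);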

@gevtushenko gevtushenko added type: bug: functional Does not work as intended. P0: must have Absolutely necessary. Critical issue, major blocker, etc. labels Nov 21, 2022
@gevtushenko gevtushenko added this to the 2.1.0 milestone Nov 21, 2022
@davidwendt
Contributor Author

I made this change to the oob_default declaration, which seems to work:

KeyT oob_default = std::is_same_v<KeyT, bool> ? !IS_DESCENDING : reinterpret_cast<KeyT &>(default_key_bits);

Perhaps Traits<bool> can be specialized and set with LOWEST_KEY=false and MAX_KEY=true.
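
Roughly the shape of that idea (a hypothetical illustration only, not CUB's actual Traits machinery; sort_key_limits is an invented stand-in):

// Hypothetical helper, not CUB code: pin the extreme key values for bool so
// that any out-of-bounds filler built from them is itself a valid bool.
template <typename KeyT> struct sort_key_limits;

template <> struct sort_key_limits<bool>
{
  static constexpr bool LOWEST_KEY = false;
  static constexpr bool MAX_KEY    = true;
};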

@gevtushenko gevtushenko linked a pull request Nov 22, 2022 that will close this issue
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Nov 29, 2022
Fix in CUB DeviceSegmentedSort allows the workaround to be removed. The CUB fix is applied as a patch in the libcudf build process.
Reference NVIDIA/cub#594 and #12217

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Elias Stehle (https://github.com/elstehle)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #12234