[BUG] Memcheck error cudf::io::parquet::detail::<unnamed>::gpuComputeStringPageBounds<unsigned char>() #14633

Closed
davidwendt opened this issue Dec 14, 2023 · 1 comment · Fixed by #14637
Labels
bug, cuIO, libcudf

Comments

@davidwendt

Describe the bug
The nightly memcheck build is reporting an out-of-bounds read error in cudf::io::parquet::detail::<unnamed>::gpuComputeStringPageBounds<unsigned char>() when running the gtest ParquetChunkedReaderTest.TestChunkedReadWithString:

```
[ RUN      ] ParquetChunkedReaderTest.TestChunkedReadWithString
========= Invalid __global__ read of size 8 bytes
=========     at 0x3a50 in void cudf::io::parquet::detail::<unnamed>::gpuComputeStringPageBounds<unsigned char>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long)
=========     by thread (0,0,0) in block (5,0,0)
=========     Address 0x7f04727bcc08 is out of bounds
=========     and is 489 bytes after the nearest allocation at 0x7f04727bca00 of size 32 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x3344e0]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x1488c]
=========                in /conda/envs/rapids/lib/libcudart.so.11.0
=========     Host Frame:cudaLaunchKernel [0x6c318]
=========                in /conda/envs/rapids/lib/libcudart.so.11.0
=========     Host Frame:cudf::io::parquet::detail::ComputePageStringSizes(cudf::detail::hostdevice_vector<cudf::io::parquet::detail::PageInfo>&, cudf::detail::hostdevice_vector<cudf::io::parquet::detail::ColumnChunkDesc> const&, rmm::device_uvector<unsigned char>&, unsigned long, unsigned long, int, unsigned int, rmm::cuda_stream_view) [0x14fcd2d]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::parquet::detail::reader::impl::decode_page_data(unsigned long, unsigned long) [0x151fe2a]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::parquet::detail::reader::impl::read_chunk_internal(bool, std::optional<std::reference_wrapper<cudf::ast::expression const> >) [0x15234d8]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::parquet::detail::reader::impl::read_chunk() [0x1524292]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::parquet::detail::chunked_reader::read_chunk() const [0x1513303]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::chunked_parquet_reader::read_chunk() const [0x1400f06]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:(anonymous namespace)::chunked_read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) [clone .constprop.0] [0x2d1545]
=========                in /cudf/cpp/build/gtests/PARQUET_TEST
=========     Host Frame:ParquetChunkedReaderTest_TestChunkedReadWithString_Test::TestBody() [0x2d6311]
=========                in /cudf/cpp/build/gtests/PARQUET_TEST
```
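
For anyone reading the report, the address arithmetic can be checked by hand. The sketch below uses only the three numbers from the memcheck output (faulting address, allocation base, allocation size). The helper name `bytes_past_last_valid` is made up for illustration, and the reading that the tool counts from the last valid byte of the allocation is an inference from these numbers, not something taken from the compute-sanitizer documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: distance from the last byte inside an allocation
// to a faulting address that lies beyond it.
constexpr std::uint64_t bytes_past_last_valid(std::uint64_t fault_addr,
                                              std::uint64_t alloc_base,
                                              std::uint64_t alloc_size)
{
  return fault_addr - (alloc_base + alloc_size - 1);
}

// Checks the numbers quoted in the memcheck report.
inline void check_report_arithmetic()
{
  // 0x7f04727bcc08 is 489 bytes past the last byte of the 32-byte allocation.
  assert(bytes_past_last_valid(0x7f04727bcc08ULL, 0x7f04727bca00ULL, 32) == 489);
  // Measured from the allocation base, the read is 520 bytes in.
  assert(0x7f04727bcc08ULL - 0x7f04727bca00ULL == 520);
}
```

The read lands 520 bytes from the base of a 32-byte allocation, which is at least consistent with a pointer that does not belong to that allocation at all rather than a small overrun.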

Steps/Code to reproduce bug
The following can be used to reproduce the error.

```
compute-sanitizer --tool memcheck gtests/PARQUET_TEST --gtest_filter=ParquetChunkedReaderTest.TestChunkedReadWithString --rmm_mode=cuda
```

This error may have been introduced sometime in 23.12, but it was masked by the memcheck failures caused by #14440, which was only recently resolved in 24.02.
@etseidl @nvdbaranec @vuule

@davidwendt added the bug, Needs Triage, libcudf, and cuIO labels on Dec 14, 2023
@etseidl

etseidl commented Dec 14, 2023

Thanks @davidwendt. On it...

rapids-bot pushed a commit that referenced this issue on Dec 19, 2023
Fixes #14633

When reading files in multiple passes, some pointer fields in `ColumnChunkDesc` that point to transient memory are not cleared out at the end of each pass. As a result, Parquet reader string preprocessing in a later pass can dereference memory that has already been deallocated.

Authors:
  - Ed Seidl (https://github.com/etseidl)
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #14637
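
The pattern described in the commit message can be illustrated with a small host-only sketch. This is not the libcudf code or the change made in #14637: `ChunkDescSketch`, its fields, `run_pass`, and `preprocess` are all hypothetical names, and a `std::vector` stands in for the per-pass device buffer. It only shows the general idea of resetting pointers to per-pass memory before that memory is released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a descriptor that outlives a single pass.
struct ChunkDescSketch {
  const std::uint8_t* transient_data = nullptr;  // valid only during one pass
  std::size_t transient_size         = 0;
};

// Reset every field that refers to per-pass memory.
inline void clear_transient_fields(std::vector<ChunkDescSketch>& chunks)
{
  for (auto& c : chunks) {
    c.transient_data = nullptr;
    c.transient_size = 0;
  }
}

// One pass: allocate a buffer, point the descriptors at it, use it, and
// clear the pointers before the buffer is destroyed on return.
inline std::size_t run_pass(std::vector<ChunkDescSketch>& chunks, std::size_t bytes_per_chunk)
{
  std::vector<std::uint8_t> pass_buffer(chunks.size() * bytes_per_chunk, 1);
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    chunks[i].transient_data = pass_buffer.data() + i * bytes_per_chunk;
    chunks[i].transient_size = bytes_per_chunk;
  }
  std::size_t total = 0;
  for (auto const& c : chunks) {
    for (std::size_t j = 0; j < c.transient_size; ++j) { total += c.transient_data[j]; }
  }
  // Without this call the descriptors would keep dangling pointers
  // into pass_buffer once it goes out of scope.
  clear_transient_fields(chunks);
  return total;
}

// A later stage can then tell "no data for this pass" apart from stale data.
inline std::size_t preprocess(ChunkDescSketch const& c)
{
  return c.transient_data == nullptr ? 0 : c.transient_size;
}
```

Calling `run_pass` twice on the same vector of descriptors leaves every `transient_data` null between passes, so `preprocess` never sees a pointer into a buffer from an earlier pass.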
abellina pushed a commit to abellina/cudf that referenced this issue Dec 28, 2023
abellina pushed a commit to abellina/cudf that referenced this issue Jan 17, 2024
@bdice removed the Needs Triage label on Mar 4, 2024