[BUG] Parquet reader segfaults loading file with nested type as map key #7229

Closed
jlowe opened this issue Jan 27, 2021 · 0 comments · Fixed by #7248

jlowe commented Jan 27, 2021

Describe the bug
libcudf crashes with a segfault when loading a Parquet file that contains a column of Map type. The file is unusual in that the Map key is a nested type rather than a primitive type. I verified that Apache Spark loads this file successfully.

The crash occurs only when the reader is asked to load a column by name. If no column names are specified, the crash does not occur.

Steps/Code to reproduce bug
A sample Parquet file is attached that can reproduce the crash.

maptest.parquet.zip

Unzip the archive and use the following test code to reproduce the crash.

#include <cudf/io/parquet.hpp>
#include <cudf/table/table.hpp>

#include <string>  // std::string

int main() {
  auto path = std::string("maptest.parquet");
  auto opts = cudf::io::parquet_reader_options::builder(cudf::io::source_info(path))
    // NOTE: without specifying the column name it does not crash
    .columns({"value"})
    .build();
  auto result = cudf::io::read_parquet(opts);
  return 0;
}

Expected behavior
libcudf methods should not segfault with valid inputs.

@jlowe jlowe added bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue Spark Functionality that helps Spark RAPIDS labels Jan 27, 2021
@devavret devavret self-assigned this Jan 28, 2021
@rapids-bot rapids-bot bot closed this as completed in #7248 Feb 1, 2021
rapids-bot bot pushed a commit that referenced this issue Feb 1, 2021
Only top level columns can be selected by name

Fixes #7229

Authors:
  - Devavret Makkar (@devavret)

Approvers:
  - Karthikeyan (@karthikeyann)
  - Vukasin Milovanovic (@vuule)
  - @nvdbaranec
  - Keith Kraus (@kkraus14)

URL: #7248
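
For readers wondering what "only top level columns can be selected by name" could look like in practice, here is a minimal sketch. It is not libcudf's implementation: `SchemaNode`, `select_top_level`, `select_any_depth`, and `example_schema` are hypothetical names, and the schema layout (a map column whose nested value field shares the name of a top-level column) is an assumption made for illustration, not something read from the attached file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattened schema entry (NOT libcudf's actual type).
struct SchemaNode {
  std::string name;
  int parent_idx;  // index of the parent node; -1 for the root
};

// Match a requested name only against direct children of the root (index 0).
std::vector<std::size_t> select_top_level(const std::vector<SchemaNode>& schema,
                                          const std::string& name) {
  // The sketch assumes the root node, if present, is stored first.
  assert(schema.empty() || schema[0].parent_idx == -1);
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < schema.size(); ++i) {
    if (schema[i].parent_idx == 0 && schema[i].name == name) { selected.push_back(i); }
  }
  return selected;
}

// Naive matcher for comparison: matches the name at any nesting depth.
std::vector<std::size_t> select_any_depth(const std::vector<SchemaNode>& schema,
                                          const std::string& name) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < schema.size(); ++i) {
    if (schema[i].parent_idx >= 0 && schema[i].name == name) { selected.push_back(i); }
  }
  return selected;
}

// Assumed example layout: a top-level map column "value" with a nested struct key.
std::vector<SchemaNode> example_schema() {
  return {
    {"schema", -1},    // 0: root
    {"id", 0},         // 1: top-level primitive column
    {"value", 0},      // 2: top-level map column
    {"key_value", 2},  // 3: repeated group inside the map
    {"key", 3},        // 4: map key, itself a nested type
    {"a", 4},          // 5: field of the nested key
    {"value", 3},      // 6: map value, same name as the top-level column
  };
}
```

With this layout, `select_any_depth(example_schema(), "value")` also picks up the nested node at index 6, while `select_top_level` returns only index 2. Restricting the match to the root's children avoids treating an inner node as a selected column.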