Fixes Unsupported column type error due to empty list columns in Nested JSON reader #11897

Merged: 5 commits into rapidsai:branch-22.12 on Oct 13, 2022

Conversation

karthikeyann
Contributor

Description

Fixes the "Unsupported column type" error raised during cudf column creation in the nested JSON reader when a list column is empty.

During JSON tree creation, an empty list column has no device_json_column child because it does not have any rows or a type.
This PR fixes the issue by creating an empty column as the element child column. The list column still retains its null and empty-list information.
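
The idea can be illustrated with a small, self-contained sketch. It does not use libcudf: `ToyJsonColumn`, `ToyColumn`, and `make_list_column` are hypothetical stand-ins invented for this example, and the "EMPTY" type tag is an assumption of the toy model. It only shows the shape of the fix: when no child was recorded for a list column, a zero-row child is synthesized so the list column can still be built.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the reader's intermediate column (NOT the libcudf type).
struct ToyJsonColumn {
  std::vector<int> child_offsets;                          // list offsets, size == num_rows + 1
  std::map<std::string, std::vector<int>> child_columns;   // empty if no element was ever seen
};

// Hypothetical stand-in for a finished column (NOT cudf::column).
struct ToyColumn {
  std::string type;                  // "LIST", "INT", or "EMPTY" in this toy model
  std::size_t size;                  // number of rows
  std::unique_ptr<ToyColumn> child;  // element column for lists, otherwise null
};

ToyColumn make_list_column(const ToyJsonColumn& json_col) {
  assert(!json_col.child_offsets.empty());
  const std::size_t num_rows = json_col.child_offsets.size() - 1;

  std::unique_ptr<ToyColumn> child;
  if (json_col.child_columns.empty()) {
    // Every list is empty or null: there is no child to convert,
    // so synthesize a zero-row, typeless element column instead of failing.
    child = std::make_unique<ToyColumn>(ToyColumn{"EMPTY", 0, nullptr});
  } else {
    // Toy conversion: treat the first child as a flat integer column.
    const auto& values = json_col.child_columns.begin()->second;
    child = std::make_unique<ToyColumn>(ToyColumn{"INT", values.size(), nullptr});
  }
  // The offsets (and, in the real reader, the null mask) still describe
  // which rows are empty or null; only the element column is a placeholder.
  return ToyColumn{"LIST", num_rows, std::move(child)};
}
```

With offsets `{0, 0, 0}` and no children, the sketch yields a two-row list column whose child has zero rows, rather than an error.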

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@karthikeyann karthikeyann added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. 4 - Needs Review Waiting for reviewer to review or respond 4 - Needs cuIO Reviewer non-breaking Non-breaking change labels Oct 11, 2022
@karthikeyann karthikeyann added this to the Nested JSON reader milestone Oct 11, 2022
@karthikeyann karthikeyann self-assigned this Oct 11, 2022
@karthikeyann karthikeyann requested review from a team as code owners October 11, 2022 08:35
@github-actions github-actions bot added the Python Affects Python cuDF API. label Oct 11, 2022
@codecov

codecov bot commented Oct 11, 2022

Codecov Report

Base: 87.40% // Head: 88.11% // Increases project coverage by +0.70% 🎉

Coverage data is based on head (61ae5af) compared to base (f72c4ce).
Patch coverage: 85.21% of modified lines in pull request are covered.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11897      +/-   ##
================================================
+ Coverage         87.40%   88.11%   +0.70%     
================================================
  Files               133      133              
  Lines             21833    21881      +48     
================================================
+ Hits              19084    19281     +197     
+ Misses             2749     2600     -149     
Impacted Files Coverage Δ
python/cudf/cudf/core/udf/__init__.py 97.05% <ø> (+47.05%) ⬆️
python/cudf/cudf/io/orc.py 92.94% <ø> (-0.09%) ⬇️
python/cudf/cudf/utils/ioutils.py 79.47% <ø> (ø)
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 18.86% <ø> (+4.94%) ⬆️
python/cudf/cudf/core/_base_index.py 82.20% <43.75%> (-3.35%) ⬇️
python/cudf/cudf/io/text.py 91.66% <66.66%> (-8.34%) ⬇️
python/strings_udf/strings_udf/__init__.py 86.27% <76.00%> (-10.61%) ⬇️
python/cudf/cudf/core/index.py 92.91% <98.24%> (+0.28%) ⬆️
python/cudf/cudf/__init__.py 90.69% <100.00%> (ø)
python/cudf/cudf/core/column/categorical.py 89.34% <100.00%> (ø)
... and 12 more

Contributor

@hyperbolic2346 hyperbolic2346 left a comment

C++ looks good to me.

Contributor

@vyasr vyasr left a comment

One small request, otherwise looks fine.

@@ -689,19 +689,24 @@ std::pair<std::unique_ptr<column>, std::vector<column_name_info>> device_json_co
   size_type num_rows = json_col.child_offsets.size() - 1;
   std::vector<column_name_info> column_names{};
   column_names.emplace_back("offsets");
-  column_names.emplace_back(json_col.child_columns.begin()->first);
+  column_names.emplace_back(
+    json_col.child_columns.empty() ? "element" : json_col.child_columns.begin()->first);

I'm guessing this name is only ever seen internally, so it probably doesn't matter much, but I would still prefer using some extremely obvious placeholder name.

Contributor Author

Moving list_child_name to a global constexpr.
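
A minimal sketch of what such a shared constant could look like follows. The value "element" is taken from the diff under review; the final value and placement in the merged code are not shown in this conversation, so treat both as assumptions. `list_column_names` is a hypothetical helper written for this example and is not part of libcudf.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Single source of truth for the placeholder child name.
// Assumed value: the literal used in the diff under review.
constexpr auto list_child_name = "element";

// Hypothetical helper: builds the child names of a list column.
// `child_columns` maps a child name to some payload (an int here, for brevity).
std::vector<std::string> list_column_names(const std::map<std::string, int>& child_columns) {
  std::vector<std::string> names;
  names.emplace_back("offsets");
  // Fall back to the placeholder when the list column never saw an element;
  // calling begin()->first on an empty map would be undefined behavior.
  names.emplace_back(child_columns.empty() ? std::string{list_child_name}
                                           : child_columns.begin()->first);
  return names;
}
```

Keeping the name in one constant means every place that builds or checks list children refers to the same string, instead of repeating a literal.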

cpp/src/io/json/nested_json_gpu.cu (outdated, resolved)
@karthikeyann
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 662f309 into rapidsai:branch-22.12 Oct 13, 2022
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
3 participants