Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out of bound write handling for get_json_object_multiple_paths #2285

Merged
merged 5 commits into from
Aug 1, 2024

Conversation

ttnghia
Copy link
Collaborator

@ttnghia ttnghia commented Jul 31, 2024

This fixes the code path for handling out-of-bound write when there are more than one JSON paths in get_json_object_multiple_paths. Currently there are two issues with it:

  • The kernel launcher does not allow running the kernel again with different number of paths, and
  • The indices of the out-of-bound write buffers are not correctly identified.

A unit test is also added to check for this particular code path.

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia ttnghia added the bug Something isn't working label Jul 31, 2024
@ttnghia ttnghia requested review from revans2 and res-life July 31, 2024 17:35
@ttnghia ttnghia self-assigned this Jul 31, 2024
@ttnghia
Copy link
Collaborator Author

ttnghia commented Jul 31, 2024

build

@ttnghia ttnghia changed the title Fix out of bound write for get_json_object_multiple_paths Fix out of bound write handling for get_json_object_multiple_paths Jul 31, 2024
Comment on lines +1107 to +1114
for (std::size_t idx = 0; idx < oob_indices.size(); ++idx) {
auto const out_idx = oob_indices[idx];
output[out_idx] =
cudf::make_strings_column(input.size(),
std::move(out_offsets_and_sizes[idx].first),
out_char_buffers[idx].release(),
out_null_masks_and_null_counts[idx].second,
std::move(out_null_masks_and_null_counts[idx].first));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indices out_idx and idx should be different. Previously they were the same. That is incorrect.

@@ -1112,7 +1093,7 @@ std::vector<std::unique_ptr<cudf::column>> get_json_object(
h_path_data, stream, rmm::mr::get_current_device_resource());
thrust::uninitialized_fill(
rmm::exec_policy(stream), d_has_out_of_bound.begin(), d_has_out_of_bound.end(), 0);
kernel.exec(*d_input_ptr, d_path_data, stream);
kernel_launcher::exec(*d_input_ptr, d_path_data, stream);
Copy link
Collaborator Author

@ttnghia ttnghia Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When launching the kernel a second time, the number of paths is different from that number in launching the first time. Thus, the kernel launch parameters are different.

@revans2
Copy link
Collaborator

revans2 commented Aug 1, 2024

I ran through all of the JSON tests and redid the performance testing I did before. Everything passed as expected, because this is just here for a very odd corner case that is not going to ever be needed in practice.

@revans2 revans2 merged commit 4faa6cf into NVIDIA:branch-24.08 Aug 1, 2024
4 checks passed
@ttnghia ttnghia deleted the fix_oob_handling branch August 1, 2024 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants