Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Sorted sum aggregations are not performed in the result type #10246

Closed
jlowe opened this issue Feb 8, 2022 · 6 comments · Fixed by #10250
Closed

[BUG] Sorted sum aggregations are not performed in the result type #10246

jlowe opened this issue Feb 8, 2022 · 6 comments · Fixed by #10250
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@jlowe
Copy link
Member

jlowe commented Feb 8, 2022

Describe the bug
When libcudf performs a sum aggregation on INT32 inputs it produces an INT64 output. This works as expected when the aggregation is using a hash-based implementation, but when cudf decides to use a sort-based implementation the output type is indeed INT64 but apparently the sum is being performed in INT32 and only casted to INT64 at the end. This can result in overflows that would not have occurred if the sum operation was performed in INT64.

Steps/Code to reproduce bug
Apply the following patch and run GROUPBY_TEST. Note how the hash-based aggregation passes but the sort-based aggregation fails, producing a sum of 0 instead of -4294967296

diff --git a/cpp/tests/groupby/sum_tests.cpp b/cpp/tests/groupby/sum_tests.cpp
index 5947e309be..5b2fd3f1b2 100644
--- a/cpp/tests/groupby/sum_tests.cpp
+++ b/cpp/tests/groupby/sum_tests.cpp
@@ -36,6 +36,21 @@ using supported_types =
   cudf::test::Concat<cudf::test::Types<int8_t, int16_t, int32_t, int64_t, float, double>,
                      cudf::test::DurationTypes>;
 
+struct PromotionTest : public cudf::test::BaseFixture {
+};
+
+TEST_F(PromotionTest, Sum_Promotion)
+{
+  fixed_width_column_wrapper<int32_t> keys{0, 0};
+  fixed_width_column_wrapper<int32_t> vals{-2147483648, -2147483648};
+  fixed_width_column_wrapper<int32_t> expect_keys{0};
+  fixed_width_column_wrapper<int64_t> expect_vals{-4294967296L};
+  auto agg = make_sum_aggregation<groupby_aggregation>();
+  test_single_agg(keys, vals, expect_keys, expect_vals, std::move(agg));
+  auto agg2 = make_sum_aggregation<groupby_aggregation>();
+  test_single_agg(keys, vals, expect_keys, expect_vals, std::move(agg2), force_use_sort_impl::YES);
+}
+
 TYPED_TEST_SUITE(groupby_sum_test, supported_types);
 
 TYPED_TEST(groupby_sum_test, basic)

Expected behavior
Sum aggregations should be performed in the result type rather than the input type. In addition, the behavior of hash-based and sort-based aggregations for an individual output row should be equivalent from the caller's perspective for sorted inputs.

@jlowe jlowe added bug Something isn't working Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS labels Feb 8, 2022
@jrhemstad
Copy link
Contributor

The input values should be cast to the result type when doing the reduction, but they aren't:

auto inp_values = cudf::detail::make_counting_transform_iterator(
0, null_replaced_value_accessor{*d_values_ptr, init, values.has_nulls()});

@jrhemstad
Copy link
Contributor

@ttnghia could you fix this? I believe you worked on this code most recently.

@ttnghia
Copy link
Contributor

ttnghia commented Feb 8, 2022

Sure. I'll look into it.

@bdice
Copy link
Contributor

bdice commented Feb 9, 2022

Issue #9988 is somewhat related -- both pertain to performing computations in the output precision rather than the input precision. It might be worth investigating whether this problem is commonly occurring in the codebase.

@ttnghia
Copy link
Contributor

ttnghia commented Feb 9, 2022

Hah, yes that is similar. Fixing everything should be a bigger PR instead.

@bdice
Copy link
Contributor

bdice commented Feb 9, 2022

Keep PR scopes small and fix issues one at a time — I just wanted to make a note that this might not be an isolated problem. We should keep our eyes out for similar cases in other algorithms since at least two such issues have been found.

edit: I see the previous comment was edited and now agrees with keeping PR scopes small. 😉

rapids-bot bot pushed a commit that referenced this issue Feb 14, 2022
… of target type (#10250)

In groupby reductions, some operations such as SUM on integers have the output type different from the source type. For example, target type is int64 while the source type is int32. This is to prevent overflow of the computation result.

This fixes a bug in groupby reductions that perform computation using the data in source type instead of target type.

Closes #10246.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Conor Hoekstra (https://github.com/codereport)

URL: #10250
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants