Fix crash when casting decimals to long #6103

razajafri · 2022-07-26T22:33:31Z

This PR avoids a type mismatch that can occur in ANSI mode when the min/max decimal bounds have a different precision than the column being cast.

For example, casting 222.22 to long triggers a check that the values are in range before they are converted to longs. That check crashes because the minimum bound for decimal(5,2) (Long.MinValue rescaled to scale 2) is a DECIMAL128 value, so when the assertValuesInRange method in GpuCast calls lessThan, cudf raises a type mismatch error.

@ttnghia suggested a workaround: first find the minimum value in the input column, then compare it to the DECIMAL128 bound using Java's BigDecimal.compareTo, which can compare decimals of different precisions.
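The suggested comparison can be sketched in plain Scala using only java.math.BigDecimal (no cudf). This assumes the bounds are built as in the GpuCast diff further down: Long.MinValue and Long.MaxValue rescaled to the input's scale with BigDecimal.RoundingMode.DOWN. At scale 2 the lower bound needs 21 digits of precision, more than the 18 a 64-bit decimal can hold, yet compareTo still orders it correctly against a decimal(5,2) value such as 222.22. The object and method names are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical helper mirroring the bound computation in the GpuCast diff.
object DecimalToLongBounds {
  // Largest precision that fits in a 64-bit decimal (DECIMAL64).
  val MaxLongDigits = 18

  // Long.MinValue / Long.MaxValue rescaled to the input column's scale,
  // using the same BigDecimal.RoundingMode.DOWN call as the original code.
  def bounds(scale: Int): (JBigDecimal, JBigDecimal) = {
    val min = BigDecimal(Long.MinValue).setScale(scale, BigDecimal.RoundingMode.DOWN).bigDecimal
    val max = BigDecimal(Long.MaxValue).setScale(scale, BigDecimal.RoundingMode.DOWN).bigDecimal
    (min, max)
  }

  // java.math.BigDecimal.compareTo compares numeric values only, so a
  // decimal(5,2) value can be checked against a bound that needs 21 digits.
  def inRange(value: JBigDecimal, scale: Int): Boolean = {
    val (min, max) = bounds(scale)
    value.compareTo(min) >= 0 && value.compareTo(max) <= 0
  }
}
```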

fixes #6128

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

ttnghia · 2022-07-27T13:51:00Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala

+        withResource(input.min()) { min =>
+          withResource(input.max()) { max =>


min and max should just be numbers, right? Then we wouldn't need to wrap them.

This is not just a number; it is a Scalar object (https://github.com/rapidsai/cudf/blob/branch-22.08/java/src/main/java/ai/rapids/cudf/Scalar.java#L35), so it has to be closed.

If it is a Scalar, then we may want to call isValid on the min and max values. Otherwise, if the input is all nulls, these values will be invalid.
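The null case can be sketched in plain Scala under a stated assumption: the column's min/max reductions are modeled as Option[java.math.BigDecimal], with None standing in for a Scalar whose isValid is false (as happens when every row is null). NullAwareRangeCheck and its methods are hypothetical and are not the cudf API.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical model: None stands in for an invalid (null) Scalar,
// which is what a min/max reduction yields when every row is null.
object NullAwareRangeCheck {
  def columnMin(values: Seq[Option[JBigDecimal]]): Option[JBigDecimal] =
    values.flatten.reduceOption((a, b) => if (a.compareTo(b) <= 0) a else b)

  def columnMax(values: Seq[Option[JBigDecimal]]): Option[JBigDecimal] =
    values.flatten.reduceOption((a, b) => if (a.compareTo(b) >= 0) a else b)

  // Only compare against the bounds when the reductions are valid;
  // an all-null column has nothing that can overflow.
  def outOfRange(values: Seq[Option[JBigDecimal]],
                 lower: JBigDecimal, upper: JBigDecimal): Boolean = {
    val tooSmall = columnMin(values).exists(_.compareTo(lower) < 0)
    val tooLarge = columnMax(values).exists(_.compareTo(upper) > 0)
    tooSmall || tooLarge
  }
}
```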

tgravescs · 2022-07-27T14:59:41Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala

            .setScale(dt.scale, BigDecimal.RoundingMode.DOWN).bigDecimal
-        assertValuesInRange(input, Scalar.fromDecimal(min), Scalar.fromDecimal(max))
+        withResource(input.min()) { min =>


Add a comment explaining why we are doing this.

In fact, the assertValuesInRange function is inefficient: it calls lessThan, then any, then greaterThan, then any, and each of these operations is O(N). The new approach achieves the same result with half the computation: compute min and max (each O(N)), then compare them with the bounds in O(1).

I've filed a corresponding issue: #6130
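A CPU-side sketch contrasting the two shapes on a plain Scala array; it is illustrative only and does not call cudf. maskAndAny mirrors the lessThan/any/greaterThan/any sequence (four full passes on the GPU), while minMax does two reductions followed by two constant-time comparisons. Both names are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical CPU-side illustration of the two strategies discussed above.
object RangeCheckStrategies {
  // Old shape: build a boolean mask, reduce it with any, then repeat for the
  // upper bound. On the GPU each step is a full O(N) kernel; Scala's exists
  // may stop early, so this only mirrors the structure.
  def maskAndAny(values: Array[JBigDecimal], lower: JBigDecimal, upper: JBigDecimal): Boolean = {
    val belowMask = values.map(_.compareTo(lower) < 0)
    val anyBelow = belowMask.exists(identity)
    val aboveMask = values.map(_.compareTo(upper) > 0)
    val anyAbove = aboveMask.exists(identity)
    anyBelow || anyAbove
  }

  // New shape: one reduction for min, one for max, then two O(1) comparisons.
  def minMax(values: Array[JBigDecimal], lower: JBigDecimal, upper: JBigDecimal): Boolean =
    values.nonEmpty && {
      val min = values.reduce((a, b) => if (a.compareTo(b) <= 0) a else b)
      val max = values.reduce((a, b) => if (a.compareTo(b) >= 0) a else b)
      min.compareTo(lower) < 0 || max.compareTo(upper) > 0
    }
}
```

Both functions return true when some value falls outside [lower, upper].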

tgravescs · 2022-07-27T15:00:27Z

It would be nice to have an associated issue that describes the problem and how to reproduce it.

tgravescs · 2022-07-27T15:04:20Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala

+          withResource(input.max()) { max =>
+            if (min.getBigDecimal().compareTo(bigDecimalMin) == -1 ||
+                max.getBigDecimal().compareTo(bigDecimalMax) == 1) {
+              throw new IllegalStateException(GpuCast.INVALID_INPUT_MESSAGE)


Does the same thing throw in Spark on the CPU?

Maybe we need to add an integration test to compare?

Does the same thing throw in Spark on the CPU?

The CPU throws an ArithmeticException, but I had kept the existing behavior. I have now changed it to match the CPU.

Maybe we need to add an integration test to compare?

Do you mean adding an integration test for a handful of values, or testing a wider range? I experimented with this, and it will be a much more involved test because many other types of exceptions can be thrown in ANSI mode, e.g. overflow.

I imagine the same is true for other cast operations.
If we do not have integration tests that compare CPU and GPU behavior, then we can create a follow-up issue to improve the tests, including "Decimal to Long".

+1 for adding a test to make sure that we are not regressing on the exception.
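A minimal sketch of a regression check on the exception type, using scala.util.Try instead of the plugin's test helpers such as testCastFailsForBadInputs. AnsiDecimalToLong, checkedToLong, and throwsArithmetic are hypothetical; the only grounded detail is that the CPU throws an ArithmeticException for out-of-range input, which the GPU path was changed to match. It does not reproduce Spark's exact error message or overflow rules.

```scala
import java.math.{BigDecimal => JBigDecimal}
import scala.util.{Failure, Try}

object AnsiDecimalToLong {
  private val LongMin = new JBigDecimal(Long.MinValue)
  private val LongMax = new JBigDecimal(Long.MaxValue)

  // Hypothetical ANSI-style cast: out-of-range inputs throw ArithmeticException.
  def checkedToLong(value: JBigDecimal): Long = {
    if (value.compareTo(LongMin) < 0 || value.compareTo(LongMax) > 0) {
      throw new ArithmeticException(s"$value cannot be represented as a long")
    }
    value.longValue() // discards any fractional part (truncates toward zero)
  }

  // True only if the call failed with the exception type the CPU uses.
  def throwsArithmetic(value: JBigDecimal): Boolean =
    Try(checkedToLong(value)) match {
      case Failure(_: ArithmeticException) => true
      case _ => false
    }
}
```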

amahussein · 2022-07-27T17:04:05Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala

+          withResource(input.max()) { max =>
+            if (min.getBigDecimal().compareTo(bigDecimalMin) == -1 ||
+                max.getBigDecimal().compareTo(bigDecimalMax) == 1) {
+              throw new IllegalStateException(GpuCast.INVALID_INPUT_MESSAGE)


Maybe we need to add an integration test to compare?

amahussein · 2022-07-27T17:05:17Z

tests/src/test/scala/com/nvidia/spark/rapids/AnsiCastOpSuite.scala

+    generateValidValuesDecimalDF(Short.MinValue, Short.MaxValue, 18, 3), sparkConf) {
+    frame => testCastTo(DataTypes.LongType)(frame)
+  }
+


Is it possible to test that we actually throw the expected exception?

This test only covers valid values. Maybe I need to add a test that just checks that an exception is thrown?

We won't need a new case as long as you throw the same exception, because this is already covered by testCastFailsForBadInputs("ansi_cast overflow decimals to longs",..).
If we change the behavior to match the CPU, then we would need a new test.

OK, then that test is already testing the exception that is thrown.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

gerashegalov · 2022-07-28T15:36:30Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala

+        val bigDecimalMin = BigDecimal(Long.MinValue)
            .setScale(dt.scale, BigDecimal.RoundingMode.DOWN).bigDecimal
-        val max = BigDecimal(Long.MaxValue)
+        val bigDecimalMax = BigDecimal(Long.MaxValue)


bigDecimalMax/Min are not the minimum/maximum of BigDecimal, so the names are confusing. The conversion from long is probably non-trivial; it might be worthwhile to make these class vals.

The BigDecimal will have to be rescaled anyway, and most of the computation is done in that method. I am not sure what we would gain by making just BigDecimal(Long.MinValue) a class val. I can still do it, but I wanted to make sure I understood you correctly.

Are we gaining anything by repeatedly recomputing constants like BigDecimal(Long.MaxValue)?

If nothing else, it's a bunch of unnecessary object allocations.

https://github.com/scala/scala/blob/2.12.x/src/library/scala/math/BigDecimal.scala#L211
https://github.com/frohoff/jdk8u-dev-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/share/classes/java/math/BigDecimal.java#L1217-L1223

Note that BigDecimal instances, just like Integer, are immutable. So setScale will produce a new object derived from objects we can cache, such as BigDecimal(Long.MaxValue).
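Because BigDecimal is immutable, a cached base value can be shared safely and rescaled on each call. A minimal sketch, assuming the constants live in an object as they eventually do in GpuCast; CachedLongBounds, minFor, maxFor, and cachedMinScale are hypothetical names.

```scala
object CachedLongBounds {
  // Built once; BigDecimal is immutable, so sharing these is safe.
  private val BIG_DECIMAL_LONG_MIN = BigDecimal(Long.MinValue)
  private val BIG_DECIMAL_LONG_MAX = BigDecimal(Long.MaxValue)

  // setScale returns a new instance and leaves the cached value untouched.
  def minFor(scale: Int): java.math.BigDecimal =
    BIG_DECIMAL_LONG_MIN.setScale(scale, BigDecimal.RoundingMode.DOWN).bigDecimal

  def maxFor(scale: Int): java.math.BigDecimal =
    BIG_DECIMAL_LONG_MAX.setScale(scale, BigDecimal.RoundingMode.DOWN).bigDecimal

  // Exposed only to show that rescaling never mutates the cached constant.
  def cachedMinScale: Int = BIG_DECIMAL_LONG_MIN.scale
}
```

Calling minFor(2) yields a scale-2 value, while the cached constant keeps scale 0, so only the per-scale result is allocated on each call.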

It is a valid argument, depending on how frequently a workload casts Decimals to Longs. The other concern is that static constants increase the footprint of the VM and slow down initialization.
Anyway, if this is going to be addressed within the same PR, then I suggest creating the constants in a util so they can be used from different classes.
The repo has two other locations that use constant BigDecimals.
In arithmetic.scala there are two local variables:

val zero = BigDecimal(0).bigDecimal

Then we can create three constants: BigDecimal(0).bigDecimal, BigDecimal(Long.MaxValue), and BigDecimal(Long.MinValue).

All other constant BigDecimals are in test classes which we can ignore.

Can we please handle creating of the util class in a separate PR?

I agree, that is fair. Neither BigDecimal(Long.MaxValue) nor BigDecimal(Long.MinValue) was introduced in this PR.

I don't think we need to worry about the constant pool yet just because of the two constants being added. We can do a more sweeping refactoring in a separate PR.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

razajafri · 2022-07-29T00:13:25Z

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

gerashegalov

LGTM, but it would be great to see this case validated against the CPU:
https://github.com/NVIDIA/spark-rapids/pull/6103/files#diff-e981882f5ee2f922528de849ae5397dd30e5bfe9dd5fdbe3421de4733a0eae1aR392

amahussein

LGTM.

One minor style issue: constants should be in capital letters.
I am not sure we have a clear formatting rule for this, but it looks like all the constants in GpuCast are in capital letters.

amahussein · 2022-07-29T17:50:35Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala

@@ -177,6 +177,8 @@ object GpuCast extends Arm {
  private val TIMESTAMP_TRUNCATE_REGEX = "^([0-9]{4}-[0-9]{2}-[0-9]{2} " +
    "[0-9]{2}:[0-9]{2}:[0-9]{2})" +
    "(.[1-9]*(?:0)?[1-9]+)?(.0*[1-9]+)?(?:.0*)?$"
+  private val bigDecimalLongMin = BigDecimal(Long.MinValue)
+  private val bigDecimalLongMax = BigDecimal(Long.MaxValue)


private val BIG_DECIMAL_LONG_MIN = BigDecimal(Long.MinValue)
private val BIG_DECIMAL_LONG_MAX = BigDecimal(Long.MaxValue)

amahussein · 2022-07-29T17:51:58Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala

-        val max = BigDecimal(Long.MaxValue)
-            .setScale(dt.scale, BigDecimal.RoundingMode.DOWN).bigDecimal
+        val min = bigDecimalLongMin.setScale(dt.scale, BigDecimal.RoundingMode.DOWN).bigDecimal
+        val max = bigDecimalLongMax.setScale(dt.scale, BigDecimal.RoundingMode.DOWN).bigDecimal


val min = BIG_DECIMAL_LONG_MIN.setScale(dt.scale, BigDecimal.RoundingMode.DOWN).bigDecimal
val max = BIG_DECIMAL_LONG_MAX.setScale(dt.scale, BigDecimal.RoundingMode.DOWN).bigDecimal

razajafri · 2022-07-29T18:34:17Z

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

amahussein

Thanks Raza!

LGTM.

sameerz · 2022-08-01T00:39:09Z

build

Fix Decimal to Long cast

61151c9

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

razajafri self-assigned this Jul 26, 2022

razajafri requested review from revans2, jlowe and ttnghia July 26, 2022 22:35

sameerz requested review from tgravescs and gerashegalov July 27, 2022 00:33

sameerz added the bug Something isn't working label Jul 27, 2022

ttnghia reviewed Jul 27, 2022

tgravescs reviewed Jul 27, 2022

amahussein reviewed Jul 27, 2022

razajafri added 2 commits July 27, 2022 15:45

addressed review comments

37d4225

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

added comment

e0d7755

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

tgravescs previously approved these changes Jul 28, 2022

gerashegalov requested changes Jul 28, 2022

gerashegalov mentioned this pull request Jul 28, 2022

[BUG] cast timezone-awareness check positive for date/time-unrelated types #6138

Closed

razajafri added 2 commits July 28, 2022 11:49

addressed review comments

96c2afa

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

renamed var

bc27e77

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

razajafri dismissed tgravescs’s stale review via bc27e77 July 29, 2022 00:13

refactored local vals to class vals

2abec0f

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

razajafri requested review from gerashegalov and amahussein July 29, 2022 17:36

gerashegalov previously approved these changes Jul 29, 2022

amahussein previously approved these changes Jul 29, 2022

capitalize constants

a267a7d

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

razajafri dismissed stale reviews from amahussein and gerashegalov via a267a7d July 29, 2022 18:54

amahussein approved these changes Jul 29, 2022

ttnghia approved these changes Jul 31, 2022

revans2 approved these changes Aug 1, 2022

razajafri merged commit e77b40f into NVIDIA:branch-22.08 Aug 1, 2022
