[BUG] `GpuToUnixTimestampImproved` off by 1 on GPU when handling timestamp before epoch #10018

NVnavkumar · 2023-12-11T23:54:55Z

Describe the bug
Tested this with 23.12 and 24.02

When passing a "negative" timestamp (i.e., one earlier than the epoch) to to_unix_timestamp, the result is off by 1. Looking through the code, I noticed this snippet in GpuToUnixTimestampImproved (added because of rapidsai/cudf#5166):

val longSecs = withResource(lhs.getBase.asTimestampSeconds()) { secs =>
  secs.asLongs()
}
withResource(longSecs) { secs =>
  val plusOne = withResource(Scalar.fromLong(1)) { one =>
    secs.add(one)
  }
  withResource(plusOne) { plusOne =>
    withResource(Scalar.fromLong(0)) { zero =>
      withResource(secs.lessThan(zero)) { neg =>
        neg.ifElse(plusOne, secs)
      }
    }
  }
}

It looks like cuDF was updated at some point to "fix" this issue, so this plusOne logic might no longer be needed.
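To make the arithmetic concrete, here is a minimal plain-Python sketch of what the snippet above does to a value that is already correct. It assumes that the seconds cast now floors toward negative infinity, which is what the CPU result in the reproduction suggests; it is not the plugin's or cuDF's actual code, and the helper names `floor_seconds` and `with_plus_one` are hypothetical.

```python
import datetime

MICROS_PER_SECOND = 1_000_000


def floor_seconds(micros: int) -> int:
    """Convert microseconds to whole seconds, flooring toward negative infinity.

    Assumption: this models a seconds cast that is already correct.
    """
    return micros // MICROS_PER_SECOND


def with_plus_one(secs: int) -> int:
    """Model of `neg.ifElse(plusOne, secs)`: add 1 only to negative values."""
    return secs + 1 if secs < 0 else secs


epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ts = datetime.datetime(1969, 1, 1, tzinfo=datetime.timezone.utc)

# Exact integer microseconds between the timestamp and the epoch.
micros = (ts - epoch) // datetime.timedelta(microseconds=1)

expected = floor_seconds(micros)    # value Spark reports on the CPU
adjusted = with_plus_one(expected)  # value after the extra adjustment

print(expected)  # -31536000
print(adjusted)  # -31535999
```

Under that assumption, the adjustment turns a correct negative result into one that is too large by exactly 1, which matches the GPU and CPU outputs shown in the reproduction.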

Steps/Code to reproduce bug
PySpark reproduction:

>>> import datetime
>>> df = spark.createDataFrame([[datetime.datetime(1969,1,1,0,0,0, tzinfo=datetime.timezone.utc)]], ["a"])
>>> spark.conf.set("spark.rapids.sql.improvedTimeOps.enabled", "true")
>>> df.show()
+-------------------+
|                  a|
+-------------------+
|1969-01-01 00:00:00|
+-------------------+
>>> df.selectExpr("to_unix_timestamp(a)").show()
+-----------------------------------------+
|to_unix_timestamp(a, yyyy-MM-dd HH:mm:ss)|
+-----------------------------------------+
|                                -31535999|
+-----------------------------------------+
>>> spark.conf.set("spark.rapids.sql.enabled", "false")
>>> df.selectExpr("to_unix_timestamp(a)").show()
+-----------------------------------------+
|to_unix_timestamp(a, yyyy-MM-dd HH:mm:ss)|
+-----------------------------------------+
|                                -31536000|
+-----------------------------------------+

NVnavkumar · 2023-12-12T20:20:38Z

More context here: this doesn't happen for all negative timestamp values. It's possible this is a rounding issue that has been masked somehow in testing.

revans2 · 2023-12-12T20:39:12Z

Yes, but let's just delete it. No one knows it exists. I added it because it was frustrating to have to jump through hoops to match what Spark was doing. But it was dumb to add it, because it really is just dead code.

NVnavkumar added the bug and ? - Needs Triage labels Dec 11, 2023

NVnavkumar added the tech debt label Dec 12, 2023

mattahrens assigned winningsix and NVnavkumar Dec 12, 2023

mattahrens removed the ? - Needs Triage label Dec 12, 2023

NVnavkumar unassigned winningsix Dec 12, 2023

NVnavkumar mentioned this issue Dec 12, 2023

Remove GpuToTimestampImproved and spark.rapids.sql.improvedTimeOps.enabled #10033

Merged

winningsix closed this as completed in #10033 Dec 13, 2023
