[BUG] GpuToUnixTimestampImproved off by 1 on GPU when handling timestamp before epoch #10018

Closed
NVnavkumar opened this issue Dec 11, 2023 · 2 comments · Fixed by #10033
Labels
bug (Something isn't working), tech debt

Comments

@NVnavkumar
Collaborator

Describe the bug
Tested with 23.12 and 24.02.

When passing a "negative" timestamp (i.e. one earlier than the epoch) to to_unix_timestamp, the result on the GPU is off by 1. Looking through the code, I noticed this snippet in GpuToUnixTimestampImproved (added because of rapidsai/cudf#5166):

val longSecs = withResource(lhs.getBase.asTimestampSeconds()) { secs =>
  secs.asLongs()
}
withResource(longSecs) { secs =>
  val plusOne = withResource(Scalar.fromLong(1)) { one =>
    secs.add(one)
  }
  withResource(plusOne) { plusOne =>
    withResource(Scalar.fromLong(0)) { zero =>
      withResource(secs.lessThan(zero)) { neg =>
        neg.ifElse(plusOne, secs)
      }
    }
  }
}

It looks like cuDF was updated at some point to "fix" this issue, so this plusOne logic might no longer be needed.
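To make the effect of the plusOne branch concrete, here is a minimal plain-Python sketch of the same arithmetic. It does not call cuDF or Spark; `plus_one_if_negative` and `epoch_seconds` are hypothetical helper names used only for illustration, and the sketch assumes the seconds value coming out of the cast is already exact.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def epoch_seconds(ts: datetime.datetime) -> int:
    """Whole seconds since the epoch (floor division on timedeltas)."""
    return (ts - EPOCH) // datetime.timedelta(seconds=1)


def plus_one_if_negative(secs: int) -> int:
    """Mirror of the quoted snippet: add 1 to every value below zero."""
    return secs + 1 if secs < 0 else secs


ts = datetime.datetime(1969, 1, 1, tzinfo=datetime.timezone.utc)
exact = epoch_seconds(ts)

print(exact)                        # -31536000, the CPU result in the reproduction
print(plus_one_if_negative(exact))  # -31535999, the GPU result in the reproduction
```

If the value being adjusted is already correct, the unconditional adjustment for negatives is enough by itself to produce the observed difference of 1.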

Steps/Code to reproduce bug
PySpark reproduce:

>>> import datetime
>>> df = spark.createDataFrame([[datetime.datetime(1969,1,1,0,0,0, tzinfo=datetime.timezone.utc)]], ["a"])
>>> spark.conf.set("spark.rapids.sql.improvedTimeOps.enabled", "true")
>>> df.show()
+-------------------+
|                  a|
+-------------------+
|1969-01-01 00:00:00|
+-------------------+
>>> df.selectExpr("to_unix_timestamp(a)").show()
+-----------------------------------------+
|to_unix_timestamp(a, yyyy-MM-dd HH:mm:ss)|
+-----------------------------------------+
|                                -31535999|
+-----------------------------------------+
>>> spark.conf.set("spark.rapids.sql.enabled", "false")
>>> df.selectExpr("to_unix_timestamp(a)").show()
+-----------------------------------------+
|to_unix_timestamp(a, yyyy-MM-dd HH:mm:ss)|
+-----------------------------------------+
|                                -31536000|
+-----------------------------------------+
@NVnavkumar NVnavkumar added bug Something isn't working ? - Needs Triage Need team to review and classify labels Dec 11, 2023
@NVnavkumar
Collaborator Author

More context: this doesn't happen for all negative timestamp values. It's possible that this is a rounding issue that has somehow been masked in testing.
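One arithmetic pattern that would affect only some negative values is sketched below in plain Python. This is a hypothetical model, not a statement of what Spark or cuDF actually do: it assumes the reference result truncates microseconds toward zero while the value being adjusted was produced by flooring. All function names are illustrative.

```python
MICROS_PER_SECOND = 1_000_000


def truncate_to_seconds(micros: int) -> int:
    """Assumed reference behaviour: integer division rounding toward zero."""
    whole = abs(micros) // MICROS_PER_SECOND
    return -whole if micros < 0 else whole


def floor_to_seconds(micros: int) -> int:
    """Assumed cast behaviour: rounding toward negative infinity."""
    return micros // MICROS_PER_SECOND


def floor_plus_one(micros: int) -> int:
    """Floor, then add 1 to negative results, as in the quoted snippet."""
    secs = floor_to_seconds(micros)
    return secs + 1 if secs < 0 else secs


def mismatches(values):
    """Return the inputs where the adjusted value differs from truncation."""
    return [v for v in values if floor_plus_one(v) != truncate_to_seconds(v)]


samples = [-1_500_000, -1_000_000, -31_536_000 * MICROS_PER_SECOND, 1_500_000]
print(mismatches(samples))
```

Under these assumptions the adjustment is right for negative values with a fractional-second part and wrong for negative values that fall exactly on a second, such as 1969-01-01 00:00:00. Test data that rarely lands on whole seconds could hide that.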

@revans2
Collaborator

revans2 commented Dec 12, 2023

Yes, but let's just delete it. No one knows it exists. I added it because it was frustrating to have to jump through hoops to match what Spark was doing. But it was dumb to add it, because it really is just dead code.
