-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) #9816
Conversation
Signed-off-by: Ferdinand Xu <ferdinandx@nvidia.com>
Signed-off-by: Ferdinand Xu <ferdinandx@nvidia.com>
@@ -262,6 +262,13 @@ def test_unsupported_fallback_to_unix_timestamp(data_gen): | |||
"to_unix_timestamp(a, b)"), | |||
"ToUnixTimestamp") | |||
|
|||
# TODO: has another test for this called test_unix_timestamp | |||
@pytest.mark.parametrize('data_gen', date_n_time_gens, ids=idfn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also test string gen.
Please refer to the Product plan doc, it requires String data type.
Also closes #9606 |
build |
LGTM |
failOnError) | ||
} | ||
case _: DateType => | ||
timeZoneId match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use zoneId
defined in TimeZoneAwareExpression
@transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still timeZoneId
to match non case but use it later.
@@ -892,8 +909,20 @@ abstract class GpuToTimestampImproved extends GpuToTimestamp { | |||
failOnError) | |||
} | |||
} else if (lhs.dataType() == DateType){ | |||
lhs.getBase.asTimestampSeconds() | |||
} else { // Timestamp | |||
timeZoneId match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use zoneId
defined in TimeZoneAwareExpression
@transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
def test_unix_timestamp(data_gen): | ||
assert_gpu_and_cpu_are_equal_collect( | ||
lambda spark : unary_op_df(spark, data_gen).select(f.unix_timestamp(f.col('a')))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add cases for not supported Time Zone.
e.g.:
@pytest.mark.skipif(not is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported")
def test_from_unixtime(data_gen, date_format):
@pytest.mark.skipif(is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported")
def test_from_unixtime_fall_back(data_gen, date_format):
And test TZs locally:
export TZ=Iran
export TZ= America/Los_Angeles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
build |
LGTM |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Eventually, need to do cleanup based on NVIDIA/spark-rapids-jni#1643
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this just got merged, but we need to go back and fix the tests. I am fine if it is a fast follow on, otherwise we should revert it and do it right. The ProjectExec allowed to not be on the GPU in all test cases is not acceptable.
@allow_non_gpu(*non_utc_allow) | ||
def test_string_to_unix_timestamp(data_gen, date_form, ansi_enabled): | ||
@allow_non_gpu('ProjectExec') | ||
def test_string_to_unix_timestamp(data_gen, date_form, ansi_enabled, non_utc_timezone_enabled): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter is not used and nonUTC.enabled is going away.
@pytest.mark.parametrize('data_gen,date_form', str_date_and_format_gen, ids=idfn) | ||
@allow_non_gpu(*non_utc_allow) | ||
def test_string_to_unix_timestamp(data_gen, date_form, ansi_enabled): | ||
@allow_non_gpu('ProjectExec') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to have a hard coded ProjectExec that does not know if we should fall back to the CPU or not!.
@pytest.mark.parametrize('data_gen,date_form', str_date_and_format_gen, ids=idfn) | ||
@allow_non_gpu(*non_utc_allow) | ||
def test_string_unix_timestamp(data_gen, date_form, ansi_enabled): | ||
@allow_non_gpu('ProjectExec') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
This fixes #9815, #9606 and #10006
TODOs:
ImprovedTimestamp
path