Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hour minute second for non-UTC time zone #10024

Merged
merged 3 commits into from
Dec 13, 2023

Conversation

thirtiseven
Copy link
Collaborator

@thirtiseven thirtiseven commented Dec 12, 2023

This PR supports hour minute and second for non-UTC time zone.

Closes #6834
Closes #6842

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven self-assigned this Dec 12, 2023
revans2
revans2 previously approved these changes Dec 12, 2023
assert_gpu_fallback_collect(
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'),
'ProjectExec',
conf = {'spark.rapids.sql.nonUTC.enabled': True})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a side note, but what is the plan for nonUTC.enabled? If we get the right answer or fall back correctly why not just turn it on by default?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to remove the config: #9881. This was only added for the 23.12 release for risk avoidance.

Copy link
Collaborator

@winningsix winningsix Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecated this function in #10038 Motivations listed in PR comments.

input.getBase.hour()
} else {
// Non-UTC time zone
withResource(GpuTimeZoneDB.fromUtcTimestampToTimestamp(input.getBase, zoneId.normalized())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I personally think it would be a lot cleaner to have GpuTimeZoneDB.fromUtcTimestampToTimestamp have the optimized path for UTC to just inc the ref count and return it. It would make the code for users of it a lot simpler and would only have the drawback of an added reference count increment/decrement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input.getBase.second()
} else {
// Non-UTC time zone
withResource(GpuTimeZoneDB.fromUtcTimestampToTimestamp(input.getBase, zoneId.normalized())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: does it make sense to just cache the normalized zoneId? Or normalize it before it is passed to the operator?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to normalize it, since it should be normalized automatically by GpuTimeZoneDB.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

winningsix
winningsix previously approved these changes Dec 12, 2023
Copy link
Collaborator

@winningsix winningsix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

def test_hour():
assert_gpu_and_cpu_are_equal_collect(
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'))
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can consider merging these two cases together.

non_utc_allow = ['ProjectExec'] if is_supported_time_zone() else []

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for all three cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one downside of this change is that we don't truly verify that we fell back to the CPU for the operator we care about. Getting the right answer is probably good enough though.


@allow_non_gpu('ProjectExec')
@pytest.mark.skipif(is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported")
def test_minute_fallback():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

conf = {'spark.rapids.sql.nonUTC.enabled': True})

@allow_non_gpu('ProjectExec')
@pytest.mark.skipif(is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@NVnavkumar
Copy link
Collaborator

Also closes #6842

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven dismissed stale reviews from winningsix and revans2 via f3cbbb4 December 13, 2023 06:53
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
def test_hour():
assert_gpu_and_cpu_are_equal_collect(
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'))
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one downside of this change is that we don't truly verify that we fell back to the CPU for the operator we care about. Getting the right answer is probably good enough though.

@revans2
Copy link
Collaborator

revans2 commented Dec 13, 2023

build

@winningsix winningsix merged commit 4d7ec5b into NVIDIA:branch-24.02 Dec 13, 2023
37 of 38 checks passed
@sameerz sameerz added the feature request New feature or request label Dec 16, 2023
@thirtiseven thirtiseven deleted the hour_timezone branch December 19, 2023 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
5 participants