-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support hour minute second for non-UTC time zone #10024
Conversation
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
assert_gpu_fallback_collect( | ||
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'), | ||
'ProjectExec', | ||
conf = {'spark.rapids.sql.nonUTC.enabled': True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a side note, but what is the plan for nonUTC.enabled? If we get the right answer or fall back correctly why not just turn it on by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is to remove the config: #9881. This was only added for the 23.12 release for risk avoidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deprecated this function in #10038 Motivations listed in PR comments.
input.getBase.hour() | ||
} else { | ||
// Non-UTC time zone | ||
withResource(GpuTimeZoneDB.fromUtcTimestampToTimestamp(input.getBase, zoneId.normalized())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I personally think it would be a lot cleaner to have GpuTimeZoneDB.fromUtcTimestampToTimestamp
have the optimized path for UTC to just inc the ref count and return it. It would make the code for users of it a lot simpler and would only have the drawback of an added reference count increment/decrement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed this NVIDIA/spark-rapids-jni#1643
input.getBase.second() | ||
} else { | ||
// Non-UTC time zone | ||
withResource(GpuTimeZoneDB.fromUtcTimestampToTimestamp(input.getBase, zoneId.normalized())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does it make sense to just cache the normalized zoneId? Or normalize it before it is passed to the operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to normalize it, since it should be normalized automatically by GpuTimeZoneDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
def test_hour(): | ||
assert_gpu_and_cpu_are_equal_collect( | ||
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)')) | ||
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we can consider merging these two cases together.
non_utc_allow = ['ProjectExec'] if is_supported_time_zone() else []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for all three cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one downside of this change is that we don't truly verify that we fell back to the CPU for the operator we care about. Getting the right answer is probably good enough though.
|
||
@allow_non_gpu('ProjectExec') | ||
@pytest.mark.skipif(is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported") | ||
def test_minute_fallback(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
conf = {'spark.rapids.sql.nonUTC.enabled': True}) | ||
|
||
@allow_non_gpu('ProjectExec') | ||
@pytest.mark.skipif(is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
Also closes #6842 |
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
def test_hour(): | ||
assert_gpu_and_cpu_are_equal_collect( | ||
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)')) | ||
lambda spark : unary_op_df(spark, timestamp_gen).selectExpr('hour(a)'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one downside of this change is that we don't truly verify that we fell back to the CPU for the operator we care about. Getting the right answer is probably good enough though.
build |
This PR supports
hour
minute
andsecond
for non-UTC time zone.Closes #6834
Closes #6842