Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Inconsistency between the time zone in the fallback reason and the actual time zone checked in RapidsMeta.checkTImeZoneId #5678

Closed
gerashegalov opened this issue May 27, 2022 · 5 comments · Fixed by #5767
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@gerashegalov
Copy link
Collaborator

gerashegalov commented May 27, 2022

Describe the bug
RapidsMeta.checkTImeZoneId compares the default time zone of the JVM to UTC in order to fallback on CPU, but quotes the session time zone as the offending actual

Steps/Code to reproduce bug

  1. Start a Spark REPL with an unsupported i.e., non UTC default time zone such as GMT-8:
pyspark --driver-java-options -Duser.timezone=GMT-8 \
  --packages com.nvidia:rapids-4-spark_2.12:22.04.0 \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.explain=ALL
  1. Set session's time zone to UTC
>>> spark.conf.set('spark.sql.session.timeZone', 'UTC')
  1. Execute a query with a TimestampType
    And observe a confusing message Only UTC zone id is supported. Actual zone id: UTC
>>> spark.createDataFrame([ ['2022-06-01 07:45'] ], 'a string').selectExpr('hour(a)').collect()
22/05/27 08:23:58 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> hour(cast(a#4 as timestamp), Some(UTC)) AS hour(a)#6 could run on GPU
    !Expression <Hour> hour(cast(a#4 as timestamp), Some(UTC)) cannot run on GPU because input expression Cast cast(a#4 as timestamp) (TimestampType is not supported when the JVM system timezone is set to GMT-08:00. Set the timezone to UTC to enable TimestampType support); Only UTC zone id is supported. Actual zone id: UTC
      !Expression <Cast> cast(a#4 as timestamp) cannot run on GPU because the GPU only supports a subset of formats when casting strings to timestamps. Refer to the CAST documentation for more details. To enable this operation on the GPU, set spark.rapids.sql.castStringToTimestamp.enabled to true.; Cast from StringType to TimestampType is not supported; Parsing the full rage of supported years is not supported. If your years are limited to 4 positive digits set spark.rapids.sql.hasExtendedYearValues to false.
        @Expression <AttributeReference> a#4 could run on GPU
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
    @Expression <AttributeReference> a#4 could run on GPU

[Row(hour(a)=7)]

Expected behavior
A clear actionable message after determining if we can support non-UTC session time zone provided that the JVM time zone is the supported UTC

Environment details (please complete the following information)

  • Any

Additional context
rapidsai/cudf#2477
h/t @viadea

@gerashegalov gerashegalov added bug Something isn't working ? - Needs Triage Need team to review and classify labels May 27, 2022
@gerashegalov gerashegalov changed the title [BUG] Inconsistency between fallback reason and actual timezone in RapidsMeta.checkTImeZoneId [BUG] Inconsistency between fallback reason and the actual time zone checked in RapidsMeta.checkTImeZoneId May 27, 2022
@gerashegalov gerashegalov changed the title [BUG] Inconsistency between fallback reason and the actual time zone checked in RapidsMeta.checkTImeZoneId [BUG] Inconsistency between the time zone in the fallback reason and the actual time zone checked in RapidsMeta.checkTImeZoneId May 27, 2022
@sameerz sameerz added good first issue Good for newcomers and removed ? - Needs Triage Need team to review and classify labels May 31, 2022
@firestarman
Copy link
Collaborator

Shall we require both of session time zone id and the JVM time zone id to be "UTC" ?

@gerashegalov
Copy link
Collaborator Author

gerashegalov commented Jun 1, 2022

Shall we require both of session time zone id and the JVM time zone id to be "UTC" ?

+1 It was my vote too on the internal thread.

@firestarman
Copy link
Collaborator

+1 It was my vote too on the internal thread.

Thanks for the information. Do we have a deal now ?

@gerashegalov
Copy link
Collaborator Author

Feel free to propose a PR @firestarman

@firestarman
Copy link
Collaborator

@gerashegalov Here is the fix #5767, could you help review ?

firestarman added a commit that referenced this issue Jun 14, 2022
There are two settings for time zone, the Spark session local time zone and JVM's default time zone. And they may be different from each other according to issue #5678.

With this PR, plugin now requires both of the two settings being equal to UTC to support timestamp type.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants