
[BUG] driver time zone check does not cover run-time default timezone changes #5820

Open
gerashegalov opened this issue Jun 14, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@gerashegalov (Collaborator)

gerashegalov commented Jun 14, 2022

Describe the bug
We have init code in the executor that is supposed to reject a non-UTC default timezone on the executor side when the driver side is on UTC.

However, it does not account for the fact that the JVM's default timezone is mutable at run time.
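The mutability is easy to demonstrate in isolation: `ZoneId.systemDefault()` re-reads the JVM default on every call, so any value captured by a one-time check goes stale as soon as `TimeZone.setDefault` is invoked. A minimal standalone sketch (class name is illustrative, not from the plugin):

```java
import java.time.ZoneId;
import java.util.TimeZone;

public class TimezoneMutability {
    // The JVM default zone as seen through ZoneId.systemDefault();
    // this is NOT a constant -- it tracks TimeZone.setDefault.
    static String currentZone() {
        return ZoneId.systemDefault().getId();
    }

    public static void main(String[] args) {
        TimeZone original = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone("GMT-8"));
            System.out.println(currentZone()); // a GMT-08:00 zone id
            // Any validation that ran before this call is now stale.
            TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
            System.out.println(currentZone()); // UTC
        } finally {
            TimeZone.setDefault(original);
        }
    }
}
```

A check that samples the zone once at executor start-up therefore cannot see a later `setDefault` call.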

Steps/Code to reproduce bug

Start the driver and executor in GMT-8:
 $SPARK_HOME/bin/spark-shell \
  --jars ./dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar \
  --driver-java-options -Duser.timezone="GMT-8" \
  --conf spark.executor.extraJavaOptions="-Duser.timezone=GMT-8" \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.explain=ALL \
  --master local-cluster[1,1,1200]

This satisfies the executor-side check at start-up:

if (TypeChecks.areTimestampsSupported(driverTimezone)) {
  val executorTimezone = ZoneId.systemDefault()
  if (executorTimezone.normalized() != driverTimezone.normalized()) {
    throw new RuntimeException(s" Driver and executor timezone mismatch. " +
      s"Driver timezone is $driverTimezone and executor timezone is " +
      s"$executorTimezone. Set executor timezone to $driverTimezone.")
  }
}
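One way to close the gap is to keep the driver's zone and re-validate against the live `ZoneId.systemDefault()` on the timestamp path, rather than only once at executor init. A hedged sketch in Java; `TimezoneGuard` and its methods are hypothetical names, not the plugin's actual API:

```java
import java.time.ZoneId;

// Sketch: re-check the executor's *current* default zone against the
// driver's zone each time timestamp work is about to run, so a
// TimeZone.setDefault call made after start-up is still caught.
public class TimezoneGuard {
    private final ZoneId driverZone;

    public TimezoneGuard(ZoneId driverZone) {
        this.driverZone = driverZone;
    }

    // Call before GPU timestamp processing, not just at executor init.
    public void check() {
        ZoneId current = ZoneId.systemDefault();
        if (!current.normalized().equals(driverZone.normalized())) {
            throw new IllegalStateException(
                "Driver timezone " + driverZone +
                " does not match current executor timezone " + current);
        }
    }
}
```

The trade-off is an extra `ZoneId.systemDefault()` read per check on the hot path, which is cheap but not free.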

Change the default timezone to UTC on the driver:
scala> java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("UTC"))
Read the ORC file from test_basic_reads on the GPU:
scala> spark.read.orc("integration_tests/src/test/resources/timestamp-date-test.orc").select($"time").take(1)
22/06/15 22:27:49 WARN GpuOverrides:
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU
  *Exec <FileSourceScanExec> will run on GPU

res45: Array[org.apache.spark.sql.Row] = Array([1900-05-05 12:34:56.1])
Then read it on the CPU:
scala> spark.conf.set("spark.rapids.sql.enabled", false)

scala> spark.read.orc("integration_tests/src/test/resources/timestamp-date-test.orc").select($"time").take(1)
res47: Array[org.apache.spark.sql.Row] = Array([1900-05-05 20:34:56.1])

and observe an 8-hour difference between the CPU and GPU results.
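The 8-hour gap matches the fixed offset between the two zones involved: the GPU path renders the value as if in UTC while the CPU path uses GMT-8. A small check of that arithmetic (`OffsetDemo` is an illustrative name):

```java
import java.time.Instant;
import java.time.ZoneId;

public class OffsetDemo {
    // Difference in UTC-offset, in hours, between UTC and GMT-8 at instant t.
    // GMT-8 is a fixed offset with no DST, so this is always 8 -- the same
    // gap seen between the GPU (12:34:56.1) and CPU (20:34:56.1) rows above.
    static long hoursBetweenZones(Instant t) {
        int utcOffset = ZoneId.of("UTC").getRules().getOffset(t).getTotalSeconds();
        int gmt8Offset = ZoneId.of("GMT-8").getRules().getOffset(t).getTotalSeconds();
        return (utcOffset - gmt8Offset) / 3600;
    }

    public static void main(String[] args) {
        System.out.println(hoursBetweenZones(Instant.now())); // prints 8
    }
}
```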

Expected behavior
The mismatch check should still fire (or the results should match the CPU) even when the default timezone is changed after executor start-up.
Environment details (please complete the following information)

  • Environment location: any
  • Spark configuration settings related to the issue: see repro

Additional context

Originally posted by @gerashegalov in #5767 (comment)

@gerashegalov gerashegalov added bug Something isn't working ? - Needs Triage Need team to review and classify labels Jun 16, 2022
@gerashegalov gerashegalov changed the title [BUG] driver time zone check is brittle [BUG] driver time zone check does not cover run-time default timezone changes Jun 16, 2022
@sameerz sameerz added P1 Nice to have for release and removed ? - Needs Triage Need team to review and classify labels Jun 21, 2022
@mattahrens mattahrens removed the P1 Nice to have for release label Aug 7, 2023