Fix failed cases for non-utc time zone (#10060)
* Fix failed cases for non-utc time zone

Signed-off-by: Chong Gao <res_life@163.com>

* Add doc

* Update doc

---------

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>
res-life and Chong Gao authored Dec 19, 2023
1 parent 217d702 commit e94d6be
Showing 2 changed files with 24 additions and 1 deletion.
20 changes: 20 additions & 0 deletions integration_tests/README.md
@@ -343,6 +343,26 @@ integration tests. For example:
$ DATAGEN_SEED=1702166057 SPARK_HOME=~/spark-3.4.0-bin-hadoop3 integration_tests/run_pyspark_from_build.sh
```
### Running with non-UTC time zone
When adding new cases, also check that they work in a non-UTC time zone, or the non-UTC nightly CIs will fail.
The non-UTC nightly CIs verify all cases with a non-UTC time zone,
but the pre-merge CI verifies only a small subset of cases with a non-UTC time zone due to limited GPU resources,
so non-UTC failures may not surface until the nightly run.
Please test the following time zones:
```shell
$ TZ=Iran ./integration_tests/run_pyspark_from_build.sh
$ TZ=America/Los_Angeles ./integration_tests/run_pyspark_from_build.sh
```
`Iran` is a non-DST (Daylight Saving Time) time zone and `America/Los_Angeles` is a DST time zone.
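To iterate on a single new case without running the whole suite, you can pass a pytest filter; a sketch, assuming `run_pyspark_from_build.sh` forwards extra arguments through to pytest (the test name is a hypothetical placeholder -- check the script's options if this does not work in your checkout):
```shell
$ TZ=Iran ./integration_tests/run_pyspark_from_build.sh -k test_my_new_added_case_for_sequence_operator
$ TZ=America/Los_Angeles ./integration_tests/run_pyspark_from_build.sh -k test_my_new_added_case_for_sequence_operator
```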
If a newly added case fails with a non-UTC time zone, allow the operator that does not support non-UTC time zones to fall back to the CPU.
For example, add the following annotation to the case:
```python
non_utc_allow_for_sequence = ['ProjectExec']  # Update after non-utc time zone is supported for sequence

@allow_non_gpu(*non_utc_allow_for_sequence)
def test_my_new_added_case_for_sequence_operator():
    ...
```
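A fuller sketch of such a test might look like the following, assuming the usual integration-test imports from `asserts` and `marks`; the query and test name are illustrative placeholders, not the exact case from this commit:
```python
non_utc_allow_for_sequence = ['ProjectExec']  # Update after non-utc time zone is supported for sequence

@allow_non_gpu(*non_utc_allow_for_sequence)
def test_my_new_added_case_for_sequence_operator():
    # sequence() does not yet support non-UTC time zones on the GPU, so under a
    # non-UTC TZ the enclosing ProjectExec falls back to the CPU; @allow_non_gpu
    # above permits that fallback while still comparing CPU and GPU results.
    assert_gpu_and_cpu_are_equal_collect(
        lambda spark: spark.range(1).selectExpr("sequence(0, 5) as s"))
```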
### Reviewing integration tests in Spark History Server
If the integration tests are run using [run_pyspark_from_build.sh](run_pyspark_from_build.sh) we have
5 changes: 4 additions & 1 deletion integration_tests/src/main/python/array_test.py
@@ -17,7 +17,7 @@
from asserts import assert_gpu_and_cpu_are_equal_collect, assert_gpu_and_cpu_are_equal_sql, assert_gpu_and_cpu_error, assert_gpu_fallback_collect
from data_gen import *
from conftest import is_databricks_runtime
from marks import incompat
from marks import incompat, allow_non_gpu
from spark_session import is_before_spark_313, is_before_spark_330, is_databricks113_or_later, is_spark_330_or_later, is_databricks104_or_later, is_spark_33X, is_spark_340_or_later, is_spark_330, is_spark_330cdh
from pyspark.sql.types import *
from pyspark.sql.types import IntegralType
@@ -332,11 +332,14 @@ def do_it(spark):

assert_gpu_and_cpu_are_equal_collect(do_it)

non_utc_allow_for_sequence = ['ProjectExec'] # Update after non-utc time zone is supported for sequence
@allow_non_gpu(*non_utc_allow_for_sequence)
def test_array_transform_non_deterministic():
assert_gpu_and_cpu_are_equal_collect(
lambda spark : spark.range(1).selectExpr("transform(sequence(0, cast(rand(5)*10 as int) + 1), x -> x * 22) as t"),
conf={'spark.rapids.sql.castFloatToIntegralTypes.enabled': True})

@allow_non_gpu(*non_utc_allow_for_sequence)
def test_array_transform_non_deterministic_second_param():
assert_gpu_and_cpu_are_equal_collect(
lambda spark : debug_df(spark.range(1).selectExpr("transform(sequence(0, cast(rand(5)*10 as int) + 1), (x, i) -> x + i) as t")),
