Fix TimestampGen to generate value not too close to the minimum allowed timestamp [databricks] (#9736)

* Add check for nested types

* Add check for nested types

* Recursively check for rebasing

* Extract common code

* Allow nested type in rebase check

* Enable nested timestamp in roundtrip test

* Fix another test

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Enable `LEGACY` rebase in read

* Remove comment

* Change function/class signatures

* Complete modification

* Misc

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Add explicit type

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Rename file and add some stuff in DateTimeRebaseHelpers.scala

* Move file and rename class

* Adopt new enum type

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Add name for the enum classes

* Change exception messages

* Does not yet support legacy rebase in read

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Change legacy to corrected mode

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Extract common code

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Rename functions

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Reformat

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Make classes serializable

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Revert "Support rebase checking for nested dates and timestamps (#9617)"

This reverts commit 401d0d8.

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala

* Implement date time rebase

* Optimize rebase op

* Change comment

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Move tests

* Add test for datetime rebase

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Various changes

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Various changes

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala

* Fix compile errors

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Fix comments

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Fix indentations

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Change comments and indentations

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Allow nested check for rebase

* Write different timestamp types in test

* Fix conversion if timestamp is not micros

* Rename var

* Don't have to down cast after up cast

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Change comment

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Still cast timestamp to the old type after rebasing

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Rename test

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Should not transform non-datetime types

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Fix test

* Update tests

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Enable int96 rebase in write

* Change tests

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Complete tests

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Revert unrelated changes

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Change configs

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Merge tests

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Simplify test data

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Add a new write test

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Add a mixed rebase test

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Change tests

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Fix `seed` in tests

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Rename tests

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Remove seed override

* Change TimestampGen

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Remove default seed

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Add default seed

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Remove default seed

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

---------

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
ttnghia authored Nov 17, 2023
1 parent 30c3df3 commit 244ceab
Showing 4 changed files with 2 additions and 6 deletions.
1 change: 0 additions & 1 deletion integration_tests/src/main/python/csv_test.py
@@ -402,7 +402,6 @@ def test_read_valid_and_invalid_dates(std_input_path, filename, v1_enabled_list,
 "'T'HH:mm[:ss]",
 "'T'HH:mm"]

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('ts_part', csv_supported_ts_parts)
 @pytest.mark.parametrize('date_format', csv_supported_date_formats)
 @pytest.mark.parametrize('v1_enabled_list', ["", "csv"])
4 changes: 2 additions & 2 deletions integration_tests/src/main/python/data_gen.py
@@ -578,9 +578,9 @@ def __init__(self, start=None, end=None, nullable=True, tzinfo=timezone.utc):
 # Spark supports times starting at
 # "0001-01-01 00:00:00.000000"
 # but it has issues if you get really close to that because it tries to do things
-# in a different format which causes roundoff, so we have to add a few days,
+# in a different format which causes roundoff, so we have to add a few days, even a month,
 # just to be sure
-start = datetime(1, 1, 3, tzinfo=tzinfo)
+start = datetime(1, 2, 1, tzinfo=tzinfo)
 elif not isinstance(start, datetime):
 raise RuntimeError('Unsupported type passed in for start {}'.format(start))
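
To make the size of the change concrete, the following is a minimal sketch of the default-start logic from the hunk above, pulled out into a standalone function. The helper name `default_timestamp_start` and the constant `SPARK_MIN` are hypothetical and exist only for illustration; the real `TimestampGen.__init__` in `data_gen.py` does more than this, and only the two `datetime(...)` defaults and the `RuntimeError` message are taken from the diff.

```python
from datetime import datetime, timezone

# Spark's minimum supported timestamp, as stated in the code comment above.
# (SPARK_MIN is an illustrative name, not part of data_gen.py.)
SPARK_MIN = datetime(1, 1, 1, tzinfo=timezone.utc)


def default_timestamp_start(start=None, tzinfo=timezone.utc):
    """Hypothetical helper mirroring the default-start branch of TimestampGen."""
    if start is None:
        # New default from this commit: one month past the minimum,
        # instead of the previous datetime(1, 1, 3).
        start = datetime(1, 2, 1, tzinfo=tzinfo)
    elif not isinstance(start, datetime):
        raise RuntimeError('Unsupported type passed in for start {}'.format(start))
    return start


# Compare the safety margin before and after the change.
old_margin = datetime(1, 1, 3, tzinfo=timezone.utc) - SPARK_MIN
new_margin = default_timestamp_start() - SPARK_MIN
print(old_margin.days, new_margin.days)
```

Under these assumptions the margin above the minimum grows from 2 days to 31 days, which is consistent with the commit dropping the `seed=0` overrides that referenced issue 9701.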

1 change: 0 additions & 1 deletion integration_tests/src/main/python/parquet_test.py
@@ -311,7 +311,6 @@ def test_parquet_pred_push_round_trip(spark_tmp_path, parquet_gen, read_func, v1
 lambda spark: rf(spark).select(f.col('a') >= s0),
 conf=all_confs)

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('parquet_gens', [parquet_nested_datetime_gen], ids=idfn)
 @pytest.mark.parametrize('ts_type', parquet_ts_write_options)
 @pytest.mark.parametrize('ts_rebase_write', [('CORRECTED', 'LEGACY'), ('LEGACY', 'CORRECTED')])
2 changes: 0 additions & 2 deletions integration_tests/src/main/python/parquet_write_test.py
@@ -458,7 +458,6 @@ def generate_map_with_empty_validity(spark, path):
 lambda spark, path: spark.read.parquet(path),
 data_path)

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('data_gen', parquet_nested_datetime_gen, ids=idfn)
 @pytest.mark.parametrize('ts_write', parquet_ts_write_options)
 @pytest.mark.parametrize('ts_rebase_write', ['EXCEPTION'])
@@ -475,7 +474,6 @@ def writeParquetCatchException(spark, data_gen, data_path):
 lambda spark: writeParquetCatchException(spark, data_gen, data_path),
 conf=all_confs)

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('data_gen', parquet_nested_datetime_gen, ids=idfn)
 @pytest.mark.parametrize('ts_write', parquet_ts_write_options)
 @pytest.mark.parametrize('ts_rebase_write', [('CORRECTED', 'LEGACY'), ('LEGACY', 'CORRECTED')])
