Fix TimestampGen to generate value not too close to the minimum allowed timestamp [databricks] #9736

Merged 86 commits, Nov 17, 2023
c578a64
Add check for nested types
ttnghia Aug 28, 2023
e368aa6
Add check for nested types
ttnghia Aug 28, 2023
7da416b
Recursively check for rebasing
ttnghia Nov 2, 2023
df8f861
Extract common code
ttnghia Nov 2, 2023
95d19ee
Allow nested type in rebase check
ttnghia Nov 2, 2023
b426610
Enable nested timestamp in roundtrip test
ttnghia Nov 2, 2023
7343b17
Fix another test
ttnghia Nov 2, 2023
0d48f57
Merge branch 'check_rebase_nested' into rebase_datatime
ttnghia Nov 2, 2023
024e6c9
Enable `LEGACY` rebase in read
ttnghia Nov 2, 2023
9a39628
Remove comment
ttnghia Nov 2, 2023
e686bb0
Change function/class signatures
ttnghia Nov 2, 2023
b49963e
Merge branch 'branch-23.12' into rebase_datatime
ttnghia Nov 3, 2023
2c232f8
Complete modification
ttnghia Nov 3, 2023
ac0f3e4
Misc
ttnghia Nov 3, 2023
c773794
Add explicit type
ttnghia Nov 3, 2023
29df7cd
Rename file and add some stuff in DateTimeRebaseHelpers.scala
ttnghia Nov 3, 2023
1b5112d
Move file and rename class
ttnghia Nov 4, 2023
63342a9
Adopt new enum type
ttnghia Nov 4, 2023
6b2d795
Add name for the enum classes
ttnghia Nov 4, 2023
37aa40b
Change exception messages
ttnghia Nov 4, 2023
d4cdc1b
Merge branch 'branch-23.12' into refactor_parquet_scan
ttnghia Nov 4, 2023
03f681e
Does not yet support legacy rebase in read
ttnghia Nov 5, 2023
14f230f
Change legacy to corrected mode
ttnghia Nov 5, 2023
1b464ec
Extract common code
ttnghia Nov 5, 2023
0d26d97
Rename functions
ttnghia Nov 5, 2023
c2504fd
Reformat
ttnghia Nov 5, 2023
edb6c81
Make classes serializable
ttnghia Nov 5, 2023
ea86e8f
Revert "Support rebase checking for nested dates and timestamps (#9617)"
ttnghia Nov 6, 2023
b14463f
Merge branch 'refactor_parquet_scan' into rebase_datatime
ttnghia Nov 6, 2023
adc8ae2
Implement date time rebase
ttnghia Nov 6, 2023
791573c
Optimize rebase op
ttnghia Nov 6, 2023
54e959f
Merge branch 'branch-23.12' into refactor_parquet_scan
ttnghia Nov 6, 2023
3f01690
Change comment
ttnghia Nov 6, 2023
6d9c20b
Merge branch 'refactor_parquet_scan' into rebase_datatime
ttnghia Nov 6, 2023
8c63273
Move tests
ttnghia Nov 6, 2023
1b1fdc3
Add test for datatime rebase
ttnghia Nov 6, 2023
e6559ce
Various changes
ttnghia Nov 6, 2023
74fe84a
Various changes
ttnghia Nov 6, 2023
a455a90
Fix compile errors
ttnghia Nov 6, 2023
b87493c
Fix comments
ttnghia Nov 6, 2023
321e516
Fix indentations
ttnghia Nov 6, 2023
4bc33be
Merge branch 'refactor_parquet_scan' into rebase_datatime
ttnghia Nov 6, 2023
4aab36b
Change comments and indentations
ttnghia Nov 6, 2023
1b4744a
Merge branch 'rebase_datatime' into rebase_nested_timestamp
ttnghia Nov 6, 2023
70310db
Allow nested check for rebase
ttnghia Nov 7, 2023
c615925
Merge branch 'branch-23.12' into rebase_datatime
ttnghia Nov 7, 2023
be92368
Write different timestamp types in test
ttnghia Nov 7, 2023
b09c61f
Fix conversion if timestamp is not micros
ttnghia Nov 7, 2023
00d96e4
Rename var
ttnghia Nov 7, 2023
7d81311
Dont have to down cast after up cast
ttnghia Nov 7, 2023
116bf3e
Change comment
ttnghia Nov 7, 2023
273b2c4
Still cast timestamp to the old type after rebasing
ttnghia Nov 7, 2023
996d9d4
Rename test
ttnghia Nov 7, 2023
5fd6ef5
Should not transform non-datetime types
ttnghia Nov 7, 2023
d53ecfa
Merge branch 'rebase_datatime' into rebase_nested_timestamp
ttnghia Nov 7, 2023
4144655
Fix test
ttnghia Nov 7, 2023
5a8b44c
Update tests
ttnghia Nov 7, 2023
a33bfd6
Merge branch 'rebase_datatime' into rebase_nested_timestamp
ttnghia Nov 7, 2023
e366e5a
Enable int96 rebase in write
ttnghia Nov 7, 2023
247f47f
Change tests
ttnghia Nov 7, 2023
8eba053
Complete tests
ttnghia Nov 7, 2023
bda59ef
Revert unrelated changes
ttnghia Nov 7, 2023
bbcd9d9
Merge branch 'branch-23.12' into int96_rebase_write
ttnghia Nov 7, 2023
fbe37d7
Merge branch 'branch-23.12' into rebase_datatime
ttnghia Nov 7, 2023
4a92d54
Change configs
ttnghia Nov 8, 2023
54c53d3
Merge branch 'rebase_datatime' into rebase_nested_timestamp
ttnghia Nov 8, 2023
2f30ce9
Merge branch 'int96_rebase_write' into rebase_nested_timestamp
ttnghia Nov 8, 2023
af817de
Merge tests
ttnghia Nov 8, 2023
13242f4
Simplify test data
ttnghia Nov 8, 2023
e1d9f74
Add a new write test
ttnghia Nov 8, 2023
82012b6
Add a mixed rebase test
ttnghia Nov 8, 2023
76694ad
Merge branch 'branch-23.12' into rebase_nested_timestamp
ttnghia Nov 15, 2023
cbef912
Change tests
ttnghia Nov 15, 2023
1474dda
Merge branch 'branch-23.12' into rebase_nested_timestamp
ttnghia Nov 15, 2023
14487bf
Fix `seed` in tests
ttnghia Nov 15, 2023
0fff5e6
Rename tests
ttnghia Nov 15, 2023
d47d55f
Remove seed override
ttnghia Nov 15, 2023
8bfca59
Merge branch 'branch-23.12' into rebase_nested_timestamp
ttnghia Nov 16, 2023
9392083
Merge branch 'rebase_nested_timestamp' into fix_9701
ttnghia Nov 16, 2023
3134dde
Change TimestampGen
ttnghia Nov 16, 2023
76b2d0a
Remove default seed
ttnghia Nov 16, 2023
61d7d3d
Add default seed
ttnghia Nov 16, 2023
c6f77e4
Merge branch 'rebase_nested_timestamp' into fix_9701
ttnghia Nov 16, 2023
ffc617a
Remove default seed
ttnghia Nov 16, 2023
33d13e0
Merge branch 'branch-23.12' into fix_9701
ttnghia Nov 16, 2023
107e6cd
Merge branch 'branch-23.12' into fix_9701
ttnghia Nov 16, 2023
1 change: 0 additions & 1 deletion integration_tests/src/main/python/csv_test.py
@@ -402,7 +402,6 @@ def test_read_valid_and_invalid_dates(std_input_path, filename, v1_enabled_list,
 "'T'HH:mm[:ss]",
 "'T'HH:mm"]

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('ts_part', csv_supported_ts_parts)
 @pytest.mark.parametrize('date_format', csv_supported_date_formats)
 @pytest.mark.parametrize('v1_enabled_list', ["", "csv"])
4 changes: 2 additions & 2 deletions integration_tests/src/main/python/data_gen.py
@@ -578,9 +578,9 @@ def __init__(self, start=None, end=None, nullable=True, tzinfo=timezone.utc):
 # Spark supports times starting at
 # "0001-01-01 00:00:00.000000"
 # but it has issues if you get really close to that because it tries to do things
-# in a different format which causes roundoff, so we have to add a few days,
+# in a different format which causes roundoff, so we have to add a few days, even a month,
 # just to be sure
-start = datetime(1, 1, 3, tzinfo=tzinfo)
+start = datetime(1, 2, 1, tzinfo=tzinfo)
 elif not isinstance(start, datetime):
 raise RuntimeError('Unsupported type passed in for start {}'.format(start))

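The effect of the `data_gen.py` change can be illustrated with a minimal standalone sketch. This is not the project's `TimestampGen`; `SAFE_MIN_START` and `random_timestamp` are hypothetical names, and the only detail taken from the diff is the new default lower bound of `datetime(1, 2, 1)`. The sketch draws a uniformly random microsecond offset between the bounds, so no generated value can fall closer to `0001-01-01` than the default start.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the new default lower bound from the diff.
SAFE_MIN_START = datetime(1, 2, 1, tzinfo=timezone.utc)
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_ONE_US = timedelta(microseconds=1)


def random_timestamp(rng, start=None, end=None):
    """Return a random UTC datetime in [start, end] at microsecond resolution.

    Illustrative helper only, not part of the spark-rapids test suite.
    """
    if start is None:
        # Stay a month away from 0001-01-01 instead of only two days.
        start = SAFE_MIN_START
    elif not isinstance(start, datetime):
        raise RuntimeError('Unsupported type passed in for start {}'.format(start))
    if end is None:
        end = datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc)
    elif not isinstance(end, datetime):
        raise RuntimeError('Unsupported type passed in for end {}'.format(end))

    # Convert both bounds to integer microseconds relative to the Unix epoch.
    start_us = (start - _EPOCH) // _ONE_US
    end_us = (end - _EPOCH) // _ONE_US
    return _EPOCH + rng.randint(start_us, end_us) * _ONE_US


rng = random.Random(0)
samples = [random_timestamp(rng) for _ in range(1000)]
```

With the default bounds, every value in `samples` is at or after `SAFE_MIN_START` regardless of the seed, which is presumably why the per-test `seed=0` overrides in the other files could be removed.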
1 change: 0 additions & 1 deletion integration_tests/src/main/python/parquet_test.py
@@ -311,7 +311,6 @@ def test_parquet_pred_push_round_trip(spark_tmp_path, parquet_gen, read_func, v1
 lambda spark: rf(spark).select(f.col('a') >= s0),
 conf=all_confs)

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('parquet_gens', [parquet_nested_datetime_gen], ids=idfn)
 @pytest.mark.parametrize('ts_type', parquet_ts_write_options)
 @pytest.mark.parametrize('ts_rebase_write', [('CORRECTED', 'LEGACY'), ('LEGACY', 'CORRECTED')])
2 changes: 0 additions & 2 deletions integration_tests/src/main/python/parquet_write_test.py
@@ -458,7 +458,6 @@ def generate_map_with_empty_validity(spark, path):
 lambda spark, path: spark.read.parquet(path),
 data_path)

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('data_gen', parquet_nested_datetime_gen, ids=idfn)
 @pytest.mark.parametrize('ts_write', parquet_ts_write_options)
 @pytest.mark.parametrize('ts_rebase_write', ['EXCEPTION'])
@@ -475,7 +474,6 @@ def writeParquetCatchException(spark, data_gen, data_path):
 lambda spark: writeParquetCatchException(spark, data_gen, data_path),
 conf=all_confs)

-@datagen_overrides(seed=0, reason='https://github.com/NVIDIA/spark-rapids/issues/9701')
 @pytest.mark.parametrize('data_gen', parquet_nested_datetime_gen, ids=idfn)
 @pytest.mark.parametrize('ts_write', parquet_ts_write_options)
 @pytest.mark.parametrize('ts_rebase_write', [('CORRECTED', 'LEGACY'), ('LEGACY', 'CORRECTED')])