Support spark.sql.parquet.int96RebaseModeInWrite=LEGACY [databricks] #9658

Merged
61 commits merged on Nov 14, 2023

Conversation

Collaborator

@ttnghia ttnghia commented Nov 7, 2023

This adds support for LEGACY mode in spark.sql.parquet.int96RebaseModeInWrite, which allows writing files that contain ancient timestamps (before 1582-10-15), rebasing them from the Proleptic Gregorian calendar to the Julian calendar.
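
For illustration, a minimal sketch of how a job might opt in to this mode from PySpark. The rebase config key is the one named in this PR; setting `spark.sql.parquet.outputTimestampType` to `INT96` is an assumption made here so that the INT96 rebase mode is the one that applies. `LEGACY_INT96_WRITE_CONFS` and `apply_confs` are hypothetical names, not part of the plugin, and `spark` is assumed to be an existing SparkSession.

```python
# Hypothetical helper for illustration only; not part of this PR.
LEGACY_INT96_WRITE_CONFS = {
    # Assumption: timestamps are written as INT96, so the INT96 rebase mode applies.
    "spark.sql.parquet.outputTimestampType": "INT96",
    # The mode this PR adds GPU support for.
    "spark.sql.parquet.int96RebaseModeInWrite": "LEGACY",
}


def apply_confs(spark, confs=LEGACY_INT96_WRITE_CONFS):
    """Set each conf on an existing SparkSession through spark.conf.set."""
    for key, value in confs.items():
        spark.conf.set(key, value)
    return spark
```

After `apply_confs(spark)`, an ordinary `df.write.parquet(...)` call would pick up the LEGACY mode for that session.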

Closes:

ttnghia and others added 30 commits November 2, 2023 10:52
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/RebaseHelper.scala
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
…IA#9617)"

This reverts commit 401d0d8.

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/RebaseHelper.scala
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
@ttnghia ttnghia added the task Work required that improves the product but is not user facing label Nov 7, 2023
@ttnghia ttnghia self-assigned this Nov 7, 2023
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia
Collaborator Author

ttnghia commented Nov 7, 2023

build

@ttnghia
Collaborator Author

ttnghia commented Nov 7, 2023

build

@ttnghia
Collaborator Author

ttnghia commented Nov 8, 2023

build

revans2 previously approved these changes Nov 8, 2023
Collaborator

@revans2 revans2 left a comment

I see the tests are updated and the code says that we now support rebase for int96 writes. But I don't see anywhere that the code was updated for it. I am assuming that the existing code just covered it and we are now enabling it after testing.

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia
Collaborator Author

ttnghia commented Nov 8, 2023

build

@ttnghia
Collaborator Author

ttnghia commented Nov 8, 2023

> I see the tests are updated and the code says that we now support rebase for int96 writes. But I don't see anywhere that the code was updated for it. I am assuming that the existing code just covered it and we are now enabling it after testing.

Right, the existing code already handles the rebase computation. Now we just enable the corresponding code path and update tests.
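
To make the rebase computation mentioned above concrete, here is a rough day-granularity sketch in plain Python. It is not the plugin's or Spark's implementation: it ignores time of day and time zones (which INT96 timestamps carry), and it does not special-case the date labels 1582-10-05 through 1582-10-14 that do not exist in the hybrid calendar. All names in it are hypothetical. The idea is to read a day count as a Proleptic Gregorian date, then return the day count of the same year-month-day label in the Julian calendar.

```python
from datetime import date

# Python's datetime.date uses the proleptic Gregorian calendar.
EPOCH_ORDINAL = date(1970, 1, 1).toordinal()
EPOCH_JDN = 2440588           # Julian Day Number of 1970-01-01 (Gregorian)
CUTOVER = date(1582, 10, 15)  # first day of the Gregorian calendar


def julian_calendar_to_jdn(year, month, day):
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def rebase_gregorian_to_julian_days(days):
    """Rebase days since 1970-01-01 from Proleptic Gregorian to Julian.

    Dates on or after the cutover are returned unchanged.
    """
    d = date.fromordinal(EPOCH_ORDINAL + days)
    if d >= CUTOVER:
        return days
    return julian_calendar_to_jdn(d.year, d.month, d.day) - EPOCH_JDN
```

Under these assumptions, the label 1582-10-04 moves forward by 10 days, while any date from 1582-10-15 onward is left as is.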

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia
Collaborator Author

ttnghia commented Nov 8, 2023

build

@ttnghia ttnghia requested a review from revans2 November 9, 2023 14:33
@@ -85,8 +85,6 @@ def do_write(spark, table_name):
@pytest.mark.skipif(not is_hive_available(), reason="Hive is missing")
@pytest.mark.parametrize("gens", [_basic_gens], ids=idfn)
@pytest.mark.parametrize("storage_with_confs", [
("PARQUET", {"spark.sql.legacy.parquet.datetimeRebaseModeInWrite": "LEGACY",
"spark.sql.legacy.parquet.int96RebaseModeInWrite": "LEGACY"}),
Collaborator

Why drop these?

Collaborator Author

This is a fallback test. We now have full support for LEGACY in write, so we no longer fall back.

Collaborator Author

@ttnghia ttnghia Nov 9, 2023

@ttnghia ttnghia linked an issue Nov 14, 2023 that may be closed by this pull request
@revans2 revans2 merged commit 4fdd7bd into NVIDIA:branch-23.12 Nov 14, 2023
37 checks passed
@ttnghia ttnghia deleted the int96_rebase_write branch November 14, 2023 17:46
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support spark.sql.parquet.int96RebaseModeInWrite= LEGACY
2 participants