
Support int96RebaseModeInWrite and int96RebaseModeInRead #3330

Merged 26 commits into NVIDIA:branch-21.10 on Sep 23, 2021

Conversation

@razajafri (Collaborator)

When writing or reading a Parquet file, a user can specify a separate config for rebasing INT96 values versus datetime values in general.

This PR honors the values users set for int96RebaseModeInWrite and int96RebaseModeInRead, matching what Apache Spark does.
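For context, Spark 3.1.1+ exposes the INT96-specific rebase settings separately from the general datetime ones. A user might pass them as ordinary Spark confs, e.g. (valid values are EXCEPTION, LEGACY, and CORRECTED; the keys below are the legacy-namespace names as of Spark 3.1):

```
--conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED
--conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED
```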


Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

build

@sameerz sameerz added the task Work required that improves the product but is not user facing label Aug 28, 2021
@sameerz sameerz added this to the Aug 30 - Sept 10 milestone Aug 28, 2021
@sameerz sameerz linked an issue Aug 28, 2021 that may be closed by this pull request
@jlowe jlowe changed the title Support int96RebaseModeInWrite and int96RebaeModeInRead Support int96RebaseModeInWrite and int96RebaseModeInRead Aug 30, 2021
@jlowe (Member) left a comment

I was expecting a shim function that would check the INT96 timestamp rebase mode: in Spark < 3.1.1 it would check the old datetime config, and in Spark >= 3.1.1 it would check the new INT96-specific configs.

@razajafri (Collaborator, Author)

> I was expecting a shim function that would check the INT96 timestamp rebase mode: in Spark < 3.1.1 it would check the old datetime config, and in Spark >= 3.1.1 it would check the new INT96-specific configs.

Ironically, that was my first implementation. The reason I didn't like it is that we would be returning the value of the dateTimeRebase mode instead of int96RebaseMode when asked for it; it's even worse when asking for int96RebaseModeWrite.key. But I can see that it's still better than running into an RTE when/if we ever support another 3.0.x version. I will make the change.

@jlowe (Member) commented Aug 30, 2021

> The reason why I didn't like that was because we will be returning the value of dateTimeRebase mode instead of int96RebaseMode when asked for it

But that is exactly what Spark < 3.1.1 does today. The shim function name would indicate we're trying to get the INT96 rebase mode; on Spark >= 3.1.1 it would check the INT96-specific config, and on Spark < 3.1.1 it would check the same config the Spark code checks in that situation.
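The dispatch jlowe describes can be sketched as follows. This is a hypothetical illustration, not the actual spark-rapids shim API: one entry point resolves "the INT96 rebase mode for reads", consulting the INT96-specific config on Spark >= 3.1.1 and falling back to the general datetime config on older versions, exactly as Spark itself does there. The function name, tuple-based version check, and dict-backed conf are all stand-ins.

```python
# Hypothetical sketch of the shim dispatch (names are illustrative, not the
# real spark-rapids API). On Spark >= 3.1.1 the dedicated INT96 config exists;
# on older versions Spark itself consults the general datetime config.

INT96_READ_KEY = "spark.sql.legacy.parquet.int96RebaseModeInRead"
DATETIME_READ_KEY = "spark.sql.legacy.parquet.datetimeRebaseModeInRead"

def int96_rebase_read_mode(spark_version, conf, default="EXCEPTION"):
    """Return the effective INT96 rebase mode for reads.

    spark_version is a (major, minor, patch) tuple; conf is a plain dict
    standing in for SQLConf in this sketch.
    """
    if spark_version >= (3, 1, 1):
        key = INT96_READ_KEY       # dedicated INT96 config exists
    else:
        key = DATETIME_READ_KEY    # pre-3.1.1: same config Spark checks
    return conf.get(key, default)
```

On Spark 3.0.x this deliberately returns whatever the datetime config says, which is what Spark checks in that situation, so the shim's name stays honest about intent without hard-coding a RuntimeException for unexpected versions.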

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

I have made some unnecessary changes to the SparkBaseShim in every version of 311. I will revert in a bit.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

build

@jlowe (Member) left a comment

The Databricks shims will also need to be updated accordingly.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

build

@pxLi (Collaborator) commented Aug 31, 2021

NFS issue, reported to SRE

@pxLi (Collaborator) commented Aug 31, 2021

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

build

(1 similar comment)
@razajafri (Collaborator, Author)

build

@pxLi (Collaborator) commented Sep 1, 2021

@razajafri hi, I saw you triggered premerge twice without aborting the previous build. May I ask why? Each build takes 2 GPUs.

@jlowe (Member) commented Sep 22, 2021

Need to resolve merge conflicts, but otherwise LGTM.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

build

jlowe previously approved these changes Sep 22, 2021
@razajafri (Collaborator, Author)

Looks like a failure in Databricks unrelated to my change; root-causing it.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

Skipping test_no_fallback_when_ansi_enabled until #3611 is resolved.

@razajafri (Collaborator, Author)

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
This reverts commit 14658b4.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

build

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
@razajafri (Collaborator, Author)

build

@razajafri (Collaborator, Author)

@revans2 or @jlowe can you please +1 this and merge before we see any more merge conflicts?

@jlowe jlowe changed the title Support int96RebaseModeInWrite and int96RebaseModeInRead [databricks] Support int96RebaseModeInWrite and int96RebaseModeInRead Sep 23, 2021
@jlowe jlowe merged commit fc40c00 into NVIDIA:branch-21.10 Sep 23, 2021
tgravescs added a commit to tgravescs/spark-rapids that referenced this pull request Sep 23, 2021
@tgravescs (Collaborator)

This PR is not up to date with the latest moves of shim files; reverted #3627

Successfully merging this pull request may close these issues.

[FEA] Spark3.1.0 test spark.sql.legacy.parquet.int96RebaseModeInRead