[HUDI-4678][RFC-61] RFC for Snapshot view management #6576
base: master
How would this incremental processing of snapshot views differ from the existing incremental processing of the Hudi table itself? Is it intended for a bigger incremental pull window? If that understanding is correct, then the original commits between two snapshots, along with the data they changed, will be missing from the incremental view. Not sure how practical this is.
Yes, users can only get one snapshot per savepoint. I think that may satisfy some SCD (slowly changing dimension) scenarios. WDYT? @xushiyan
A savepoint commit records all base files at that point in time, and those files are retained in the Hudi table, so it still holds the full data as of that point. What storage savings is this being compared against?
If some base files have not changed between two savepoints, the two savepoints can share those base files instead of retaining two full copies of the data.
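To make the comparison concrete, here is a minimal sketch under a hypothetical model, not Hudi's actual data structures: each savepoint maps a file-group ID to the base file version it pins, and `sizes` gives each base file's size in bytes. `storage_cost` and the file names are illustrative only.

```python
# Hypothetical model: savepoint name -> {file-group ID -> pinned base file}.
def storage_cost(savepoints: dict[str, dict[str, str]],
                 sizes: dict[str, int]) -> tuple[int, int]:
    """Return (full_copy_bytes, shared_bytes).

    full_copy_bytes: every savepoint keeps its own full copy of each base file.
    shared_bytes: each distinct base file is stored once and referenced by
    every savepoint that pins it.
    """
    full = sum(sizes[f] for files in savepoints.values() for f in files.values())
    distinct = {f for files in savepoints.values() for f in files.values()}
    shared = sum(sizes[f] for f in distinct)
    return full, shared


# fg1 is unchanged between sp1 and sp2, so both savepoints can point at f1_v1.
sps = {
    "sp1": {"fg1": "f1_v1", "fg2": "f2_v1"},
    "sp2": {"fg1": "f1_v1", "fg2": "f2_v2"},
}
sizes = {"f1_v1": 100, "f2_v1": 50, "f2_v2": 60}
print(storage_cost(sps, sizes))  # (310, 210)
```

The saving is exactly the size of the unchanged base files counted once per additional savepoint that pins them.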
In this regard, one critical fix we might need is support for archival beyond savepoints.
https://issues.apache.org/jira/browse/HUDI-3884
Even though the ticket is closed, we have not certified it across all the different table states, query types, etc. Without that, archival will stop at the first savepointed commit.
With 1.x, things might change. CC @danny0405 @codope
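A simplified model of the blocking behavior described above, not Hudi's actual archiver: completed instants are sorted oldest first, `keep_latest` is a hypothetical stand-in for the archival retention settings, and archival halts at the earliest savepointed instant. `archivable` is an illustrative name.

```python
# Simplified, hypothetical model of timeline archival with savepoints.
def archivable(completed: list[str], savepointed: set[str],
               keep_latest: int) -> list[str]:
    """Return instants that would be archived.

    `completed` is sorted oldest first. Instants outside the newest
    `keep_latest` are candidates, but archival stops at the first
    savepointed instant, so nothing after it is archived.
    """
    candidates = completed[:max(len(completed) - keep_latest, 0)]
    result = []
    for instant in candidates:
        if instant in savepointed:
            break  # the savepoint blocks archival of this and later instants
        result.append(instant)
    return result


timeline = ["001", "002", "003", "004", "005", "006"]
print(archivable(timeline, set(), keep_latest=2))    # ['001', '002', '003', '004']
print(archivable(timeline, {"003"}, keep_latest=2))  # ['001', '002']
```

"Archival beyond savepoint" would mean instants such as `004` become archivable even though an older instant is savepointed.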