[FEA] [Databricks 12.2] Add support for deletion vectors #8654

Open
andygrove opened this issue Jul 3, 2023 · 0 comments
Labels
feature request New feature or request

Is your feature request related to a problem? Please describe.

Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables. By default, when a single row in a data file is deleted, the entire Parquet file containing the record must be rewritten. With deletion vectors enabled for the table, some Delta operations use deletion vectors to mark existing rows as removed without rewriting the Parquet file. Subsequent reads on the table resolve current table state by applying the deletions noted by deletion vectors to the most recent table version.
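The mechanism above can be illustrated with a minimal Python sketch under simplifying assumptions. `DataFile`, `delete_where`, and `read` are hypothetical names, and a plain Python set of row positions stands in for the real on-disk deletion vector encoding. In Delta, each delete commits a new table version that associates the file with a new deletion vector; mutating the set in place here is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for one Parquet data file in a Delta table."""
    path: str
    rows: list  # physical rows; never rewritten after the file is created
    # Hypothetical deletion vector: positions of rows marked as removed.
    deleted: set = field(default_factory=set)


def delete_where(data_file, predicate):
    """Mark matching rows as removed by recording their positions.

    The physical rows are left untouched, so no Parquet rewrite is needed.
    """
    data_file.deleted |= {
        pos for pos, row in enumerate(data_file.rows) if predicate(row)
    }


def read(data_file):
    """Resolve the current state by skipping rows the deletion vector marks."""
    return [
        row for pos, row in enumerate(data_file.rows)
        if pos not in data_file.deleted
    ]


f = DataFile("part-00000.parquet", [{"id": 1}, {"id": 2}, {"id": 3}])
delete_where(f, lambda row: row["id"] == 2)
print(read(f))      # [{'id': 1}, {'id': 3}]
print(len(f.rows))  # 3: the underlying file still holds every row
```

The delete touches only the small set of positions, while the cost moves to read time, where every reader must apply the deletion vector to get correct results.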

Describe the solution you'd like
We need to ensure that reads on Databricks 12.2 and later respect deletion vectors. We should also add support for producing deletion vectors during write operations.
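For the read side, one detail a batched columnar reader must get right is that deletion vector positions are relative to the whole file, not to the current batch. The sketch below is a hypothetical illustration and does not reflect how the plugin or Databricks implement this: `keep_mask` and `filter_batch` are invented names, and a batch is modeled as a dict mapping column names to Python lists.

```python
def keep_mask(deleted_positions, batch_start, batch_len):
    """Per-row keep flags for a batch covering file rows
    [batch_start, batch_start + batch_len).

    Deletion vector positions are relative to the whole file, so each
    batch-local index is shifted by the batch's starting row before lookup.
    """
    return [
        (batch_start + i) not in deleted_positions for i in range(batch_len)
    ]


def filter_batch(batch, mask):
    """Drop masked-out rows from every column of a columnar batch."""
    return {
        name: [value for value, keep in zip(column, mask) if keep]
        for name, column in batch.items()
    }


# A 6-row file read as two 3-row batches, with rows 1 and 5 deleted.
dv = {1, 5}
first = {"id": [0, 1, 2]}
second = {"id": [3, 4, 5]}
print(filter_batch(first, keep_mask(dv, 0, 3)))   # {'id': [0, 2]}
print(filter_batch(second, keep_mask(dv, 3, 3)))  # {'id': [3, 4]}
```

Forgetting the offset would, for the second batch, drop id 4 and keep id 5, the opposite of the intended result. This is why the reader has to track each row's position within the file.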

Describe alternatives you've considered
None

Additional context
https://docs.delta.io/2.4.0/delta-deletion-vectors.html
