Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow restricting table features to auto-update protocol versions #1660

Closed
wants to merge 10 commits into from

Conversation

xupefei
Copy link
Contributor

@xupefei xupefei commented Mar 23, 2023

Description

This PR allows marking a metadata table feature as "auto-update capable", such that during ALTER TABLE when its metadata requirements are satisfied, the table's protocol will be automatically and silently bumped to the required version.

Note that this mechanism only applies to "metadata requirements". 'delta.feature.featureName' = 'supported' table prop will still auto-update the table's protocol to support table features.

Also, this mechanism only affects existing tables. For new tables, the protocol is always set to the min version that satisfies all enabled features, aka. all features are auto-update capable.

For compatibility, legacy table features (features supported by Protocol(2, 6)) are always auto-update capable. Specifically, Column Mapping implements its own mechanism to block the usage without protocol version bumps.

How was this patch tested?

New tests.

Does this PR introduce any user-facing changes?

No. This PR affects only feature developers.

Copy link
Contributor

@c27kwan c27kwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, pushing parts of my review + questions. Will review the rest later. :)

Comment on lines 662 to 670
"Your table schema requires the following table feature(s) that require manual enablement: <features>.",
"",
"To do this, run the following command for each of features listed above:",
" ALTER TABLE table_name SET TBLPROPERTIES ('delta.feature.feature_name' = 'supported')",
"Replace \"table_name\" and \"feature_name\" with real values.",
"",
"Required Delta protocol: <requiredVersion>",
"Current Delta protocol: <currentVersion>"
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Does that mean we now allow users to specify support for all table features? I thought they were not meant to be user-facing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, they are allowed to be user-facing. So a user can specify support for DVs like this or add CDC support and change the metadata at the same time, if they wish 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A feature can also be controlled exclusively by table features, such as TimestampNtz: #1626. In this case the config must be user-facing.

core/src/main/resources/error/delta-error-classes.json Outdated Show resolved Hide resolved
"Your table schema requires the following table feature(s) that require manual enablement: <features>.",
"",
"To do this, run the following command for each of features listed above:",
" ALTER TABLE table_name SET TBLPROPERTIES ('delta.feature.feature_name' = 'supported')",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table name can be passed in to this error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not trivial because we have only the Delta Log instance which is based on a path. The name is already lost at this stage.

* ([[FeatureAutomaticallyEnabledByMetadata.automaticallyUpdateProtocolOfExistingTables]] is set
* to `true`), or being supported in the same transaction via a table property.
*/
private def assertTableFeaturesAutomaticallySupported(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the description, rename to assertTableFeaturesSupportedByProtocol? It doesn't necessarily only check automatic supports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about assertRequiredFeaturesCanBeSupportedByProtocol 😄?

Comment on lines +114 to +115
* When the feature's metadata requirements are satisfied for <b>new tables</b>, or for
* <b>existing tables when [[automaticallyUpdateProtocolOfExistingTables]] set to `true`</b>, the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow html annotations in our doc comments? I thought it was just markdown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so... Someone could correct me if I'm wrong

Copy link
Contributor

@c27kwan c27kwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the other half of the review. Looks good, just some more nits and questions.

Map(TestReaderWriterMetadataNoAutoUpdateFeature.TABLE_PROP_KEY -> "true"),
tableProtocol =
Protocol(TABLE_FEATURES_MIN_READER_VERSION, TABLE_FEATURES_MIN_WRITER_VERSION),
expectedExceptionClass = Some("DELTA_FEATURES_REyQUIRE_MANUAL_ENABLEMENT"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AFAICT, for this test function this is always the same exception? Can we change it to always intercept the right exception and only make it more generic when it is necessary? Class string matching is more prone to mistakes where we make a typo and it compiles, but the test fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered I didn't do assert around intercept[] so the test can pass even with the typo 😂

Comment on lines 1419 to 1422
s"ALTER TABLE delta.`${dir.getCanonicalPath}` SET TBLPROPERTIES (" +
s" '${TestReaderWriterMetadataAutoUpdateFeature.TABLE_PROP_KEY}' = 'true'," +
s" '${TestWriterMetadataNoAutoUpdateFeature.TABLE_PROP_KEY}' = 'true')")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely interesting to test the combination of settings multiple features together. Do we already have test for the combination of two different auto-update features (e.g. CDC and Column Mapping)? A test for auto-update feature with the same feature delta.feature.xx = true/enabled/supported being set manually? For an auto-update feature that is set to not supported(e.g. default in the session), make sure it doesn't accidentally do a protocol update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests from this matrix:

  • legacy vs TF protocol
  • non-auto-update feature vs auto-update feature
  • enabled via metadata vs via delta.feature.xxx

Also the following:

  • a non-auto-update feature enabled via both metadata and delta.feature.xxx
  • a non-auto-update feature, an auto-update feature, and some legacy features enabled via metadata
  • the above with a session default

Should be enough.

@xupefei xupefei requested a review from c27kwan April 3, 2023 10:01
@xupefei xupefei requested a review from bart-samwel April 3, 2023 14:57
Copy link
Collaborator

@bart-samwel bart-samwel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits left from me. Please still process those!

@xupefei xupefei deleted the tf-auto-update-feature branch April 19, 2023 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants