Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Fix create v1 table on REST Catalog #10369

Merged
merged 2 commits into from
Jul 5, 2024
Merged

Conversation

hantangwangd
Copy link
Contributor

This PR fix the problems of Cannot downgrade v2 table to v1 when creating v1 table on REST Catalog.

Fix issue: #8756

@hantangwangd
Copy link
Contributor Author

Hi @nastra, can you please take a look at this fix when available? Thanks a lot!

@@ -861,6 +862,19 @@ public static Builder buildFromEmpty() {
return new Builder();
}

public static Builder buildFromMetadataUpdates(List<MetadataUpdate> metadataUpdates) {
Copy link
Contributor

@nastra nastra Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks correct but I don't think we should expose this in TableMetadata. It should be fine to have this inside CatalogHandlers#create

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion, that seems better, fixed!

SCHEMA,
PartitionSpec.unpartitioned(),
ImmutableMap.of("format-version", formatVersion));
assertThatNoException().isThrownBy(() -> createTableTransaction1.commitTransaction());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than checking that no exception was thrown it would be better to just load the table and ensure it exists.

Something like this short piece should be enough:

@ParameterizedTest
  @ValueSource(ints = {1, 2})
  public void createTableTransaction(int formatVersion) {
    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
    RESTCatalog catalog = catalog(adapter);

    if (requiresNamespaceCreate()) {
      catalog.createNamespace(NS);
    }

    catalog
        .newCreateTableTransaction(
            TABLE,
            SCHEMA,
            PartitionSpec.unpartitioned(),
            ImmutableMap.of("format-version", String.valueOf(formatVersion)))
        .commitTransaction();

    BaseTable table = (BaseTable) catalog.loadTable(TABLE);
    assertThat(table.operations().current().formatVersion()).isEqualTo(formatVersion);
  }

Can you please also update the other test accordingly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact we should move this to CatalogTests, so that we make sure this works across different catalog implementations:

@ParameterizedTest
  @ValueSource(ints = {1, 2})
  public void createTableTransaction(int formatVersion) {
    if (requiresNamespaceCreate()) {
      catalog().createNamespace(NS);
    }

    catalog()
            .newCreateTableTransaction(
                    TABLE,
                    SCHEMA,
                    PartitionSpec.unpartitioned(),
                    ImmutableMap.of("format-version", String.valueOf(formatVersion)))
            .commitTransaction();

    BaseTable table = (BaseTable) catalog().loadTable(TABLE);
    assertThat(table.operations().current().formatVersion()).isEqualTo(formatVersion);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests have been moved to CatalogTests, and modified as your suggestion. Please take a look when available, thanks a lot.

@@ -374,8 +374,7 @@ private static TableMetadata create(TableOperations ops, UpdateTableRequest requ
// the only valid requirement is that the table will be created
request.requirements().forEach(requirement -> requirement.validate(ops.current()));

TableMetadata.Builder builder = TableMetadata.buildFromEmpty();
request.updates().forEach(update -> update.applyTo(builder));
TableMetadata.Builder builder = TableMetadata.buildFromMetadataUpdates(request.updates());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just inline the fix here when iterating updates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -991,7 +991,7 @@ public Builder assignUUID(String newUuid) {

// it is only safe to set the format version directly while creating tables
// in all other cases, use upgradeFormatVersion
private Builder setInitialFormatVersion(int newFormatVersion) {
public Builder setInitialFormatVersion(int newFormatVersion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes in this PR LGTM, but we need to be careful about making this method public (see also #8381 (comment)). Let me think a bit about this to see whether there's a slightly different way to approach this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one alternative would be to lazily initialize formatVersion inside TableMetadata.Builder so that calling setInitialFormatVersion wouldn't be necessary from CatalogHandlers during table creation. Let's see what others think here
/cc @aokolnychyi @rdblue @amogh-jahagirdar @RussellSpitzer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be made public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hantangwangd I've raised a PR to your branch https://github.com/hantangwangd/iceberg/pull/1/files, but missed @nastra on the lazy formatVersion initialization. Not sure entirely what that meant, but in my approach we add a new buildFromEmpty(int formatVersion API so that the format version is only initialized once and never mutated (unless going through an actual upgrade API call)

.forEach(
update -> {
if (update instanceof UpgradeFormatVersion) {
builder.setInitialFormatVersion(((UpgradeFormatVersion) update).formatVersion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right way to set the format version. The initial metadata needs to be created as v1, rather than dangerously setting the version to 1 after constructing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue Thanks for your comment. Do you mean that we should expose something like buildFromSpecifiedVersion(formatVersion) in TableMetadata for getting a builder with specified format version, and check the updates list in advance here to get the format version before creating the builder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we use a builder that lazily initialize formatVersion as @nastra described above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stepped through the code/debugger with the new tests, I think the right solution is to use the TableMetadata#newTableMetadata API in the create method rather than buildFromEmpty and apply the updates. That will build the initial metadata as V1 rather than trying to mutate it later after it's built, which is what I think @rdblue is saying to avoid. I'll let him confirm though.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or an alternative is to add a buildFromEmpty(int formatVersion) API and apply the updates. Either way, agree with the idea that the format version should be defined once upfront when building a new table metadata instance for creation rather than mutate it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hantangwangd I've raised a PR to your branch https://github.com/hantangwangd/iceberg/pull/1/files to show what I mean. cc @nastra @rdblue @aokolnychyi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amogh-jahagirdar Thank you for the change, that makes sense. I absorbed it into this PR.

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks for fixing @hantangwangd and @amogh-jahagirdar

@nastra nastra merged commit 24d26b6 into apache:main Jul 5, 2024
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants