Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avro inferred mapping issues with nested struct #2774

Closed
jzhuge opened this issue Jul 1, 2021 · 3 comments
Closed

Avro inferred mapping issues with nested struct #2774

jzhuge opened this issue Jul 1, 2021 · 3 comments
Labels

Comments

@jzhuge
Copy link
Member

jzhuge commented Jul 1, 2021

When nameMapping is null, projected schema is automatically inferred from read schema. We discovered issues when there is extra nested struct.

  1. When there are two extra nested structs: projected schema can not be serialized to json because 2 types in the nested struct have the same name rnull
  2. When there is one extra nested struct: json serialization is fine but schema type name rnull still does not seem right

Unit tests:

  @Test
  public void testInferredMappingNestedStruct() throws IOException {
    Schema writeSchema = new Schema(
        Types.NestedField.required(0, "id", Types.LongType.get())
    );

    Record record = new Record(AvroSchemaUtil.convert(writeSchema, "table"));
    record.put("id", 34L);

    Schema readSchema = new Schema(
        Types.NestedField.required(0, "id", Types.LongType.get()),
        Types.NestedField.optional(3, "location", Types.StructType.of(
            Types.NestedField.required(1, "lat", Types.FloatType.get()),
            Types.NestedField.required(2, "long", Types.FloatType.get())))
    );

    Record projected = writeAndRead(writeSchema, readSchema, record, null);
    AvroTestHelpers.assertEquals(writeSchema.asStruct(), record, projected);

    validateJsonSerialization(projected);
    validateNewNestedStruct(projected, readSchema.findField("location"));
  }

  @Test
  public void testInferredMappingTwoNestedStructs() throws IOException {
    Schema writeSchema = new Schema(
        Types.NestedField.required(0, "id", Types.LongType.get())
    );

    Record record = new Record(AvroSchemaUtil.convert(writeSchema, "table"));
    record.put("id", 34L);

    Schema readSchema = new Schema(
        Types.NestedField.required(0, "id", Types.LongType.get()),
        Types.NestedField.optional(3, "location", Types.StructType.of(
            Types.NestedField.required(1, "lat", Types.FloatType.get()),
            Types.NestedField.required(2, "long", Types.FloatType.get()))),
        Types.NestedField.optional(4, "address", Types.StructType.of(
            Types.NestedField.required(5, "street", Types.IntegerType.get())))
    );

    Record projected = writeAndRead(writeSchema, readSchema, record, null);
    AvroTestHelpers.assertEquals(writeSchema.asStruct(), record, projected);

    validateJsonSerialization(projected);
    validateNewNestedStruct(projected, readSchema.findField("location"));
    validateNewNestedStruct(projected, readSchema.findField("address"));
  }

  private void validateJsonSerialization(Record projected) {
    Assert.assertFalse("Projected schema can be serialized to json", projected.getSchema().toString().isEmpty());
  }

  private void validateNewNestedStruct(Record projected, Types.NestedField field) {
    String newTypeName = "r" + field.fieldId();
    String newFieldName = field.name() + "_" + newTypeName;
    Assert.assertNull(field.name() + " field should be null", projected.get(newFieldName));
    Assert.assertNotNull(field.name() + " field is renamed to " + newFieldName,
        projected.getSchema().getField(newFieldName));
    Assert.assertEquals(field.name() + " field should have schema type name " + newTypeName,
        newTypeName, projected.getSchema().getField(newFieldName).schema().getTypes().get(1).getName());
  }
jzhuge added a commit to jzhuge/iceberg that referenced this issue Jul 1, 2021
Fixes apache#2774.

Co-authored-by: Wei Liu <liuwei7923bupt@gmail.com>
jzhuge added a commit to jzhuge/iceberg that referenced this issue Jul 1, 2021
Fixes apache#2774.

Co-authored-by: Wei Liu <liuwei7923bupt@gmail.com>
@jzhuge
Copy link
Member Author

jzhuge commented Jul 6, 2021

@rdblue @rdsr Could you take a look?

Copy link

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Jun 20, 2024
Copy link

github-actions bot commented Jul 4, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant