[BEAM-14020] Adding SchemaTransform, SchemaTransformProvider, TypedSchemaTransformProvider, and PCollectionRowTuple #16958

Merged: 3 commits merged Mar 17, 2022


@laraschmidt (Contributor) commented Feb 26, 2022

Adding SchemaTransform, SchemaTransformProvider, and PCollectionRowTuple. Together they form an interface for schema-aware transforms, which will eventually replace SchemaIO.
R: @TheNeuralBit

Doc: https://s.apache.org/beam-schema-transform



@github-actions bot added the java label Feb 26, 2022
@laraschmidt force-pushed the schema_fix3 branch 2 times, most recently from 804916e to 3c99343, February 26, 2022 00:27
@laraschmidt changed the title from "Adding SchemaTransform, SchemaTransformProvider, and PCollectionRowTuple" to "[BEAM-14020] Adding SchemaTransform, SchemaTransformProvider, and PCollectionRowTuple" Mar 1, 2022
@laraschmidt (Contributor Author):

Run Java PreCommit

@TheNeuralBit self-requested a review March 2, 2022 19:03
*
* // Create an empty PCollectionTuple:
* Pipeline p = ...;
* PCollectionTuple pcs2 = PCollectionTuple.empty(p);
Member:

nit: looks like you need to s/PCollectionTuple/PCollectionRowTuple/

It's too bad there's so much code duplicated from PCollectionTuple, but I can't think of a way to structure this that avoids it. Maybe @kennknowles has an idea (but I know he dislikes inheritance, so maybe he prefers it this way :P)

Contributor Author:

Yeah, I wanted that too; it seems like it should be easy to do. But I think unless we template the underlying class, there's no real way to make this work. We could maybe make a PCollectionTupleBase and then have PCollectionTuple extend it parameterized with Object and PCollectionRowTuple extend it parameterized with Row. But then we'd end up with Object instead of ?. I'm not actually sure how well that would work. We could give it a try, but it still seems kind of messy.
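A rough, plain-Java sketch of the "template the underlying class" idea, so it compiles without Beam. Every name here (TaggedTupleBase, ObjectTuple, RowTuple, SimpleRow) is hypothetical: SimpleRow stands in for Beam's Row, and the stored elements stand in for PCollections. The self-referential type parameter lets and() return the concrete subclass, so each subclass only supplies its element type and a factory method.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Beam's Row (not the real class).
final class SimpleRow {
  final String value;

  SimpleRow(String value) {
    this.value = value;
  }
}

// Hypothetical shared base: an immutable, insertion-ordered map from tag to element.
// T is the element type; SelfT is the concrete subclass, so and() can return it.
abstract class TaggedTupleBase<T, SelfT extends TaggedTupleBase<T, SelfT>> {
  private final Map<String, T> elements;

  protected TaggedTupleBase(Map<String, T> elements) {
    this.elements = Collections.unmodifiableMap(new LinkedHashMap<>(elements));
  }

  // Each subclass builds a new instance of its own type.
  protected abstract SelfT create(Map<String, T> elements);

  // Returns a new tuple with the extra tag; the receiver is left unchanged.
  public SelfT and(String tag, T element) {
    if (elements.containsKey(tag)) {
      throw new IllegalArgumentException("Duplicate tag: " + tag);
    }
    Map<String, T> copy = new LinkedHashMap<>(elements);
    copy.put(tag, element);
    return create(copy);
  }

  public boolean has(String tag) {
    return elements.containsKey(tag);
  }

  public T get(String tag) {
    if (!elements.containsKey(tag)) {
      throw new IllegalArgumentException("Unknown tag: " + tag);
    }
    return elements.get(tag);
  }

  public Map<String, T> getAll() {
    return elements;
  }
}

// Untyped variant: the element type degrades to Object.
final class ObjectTuple extends TaggedTupleBase<Object, ObjectTuple> {
  private ObjectTuple(Map<String, Object> elements) {
    super(elements);
  }

  static ObjectTuple empty() {
    return new ObjectTuple(Collections.emptyMap());
  }

  @Override
  protected ObjectTuple create(Map<String, Object> elements) {
    return new ObjectTuple(elements);
  }
}

// Row-only variant: the compiler rejects elements that are not SimpleRow.
final class RowTuple extends TaggedTupleBase<SimpleRow, RowTuple> {
  private RowTuple(Map<String, SimpleRow> elements) {
    super(elements);
  }

  static RowTuple empty() {
    return new RowTuple(Collections.emptyMap());
  }

  @Override
  protected RowTuple create(Map<String, SimpleRow> elements) {
    return new RowTuple(elements);
  }
}
```

This moves the shared logic into one base class, but the untyped variant still exposes Object rather than a wildcard, which is the messiness the comment anticipates. The real PCollectionTuple is also keyed by typed TupleTags rather than strings, which this simplified sketch does not model.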

@laraschmidt changed the title from "[BEAM-14020] Adding SchemaTransform, SchemaTransformProvider, and PCollectionRowTuple" to "[BEAM-14020] Adding SchemaTransform, SchemaTransformProvider, TypedSchemaTransformProvider, and PCollectionRowTuple" Mar 7, 2022
@laraschmidt force-pushed the schema_fix3 branch 2 times, most recently from fbc00ff to ad1014d, March 8, 2022 21:22
@TheNeuralBit (Member) left a comment:

LGTM, just one minor nit. Thank you!

@TheNeuralBit (Member):

Run Java PreCommit

List<Row> inputs = toRows(Arrays.asList(3, -42, 77), intSchema);

PCollection<Row> mainInput = pipeline.apply(Create.of(inputs));
PCollection<Row> secondInput = pipeline.apply(Create.of(inputs));
Member:

It looks like this is causing an actual failure in Java PreCommit:

java.lang.IllegalStateException: Pipeline update will not be possible because the following transforms do not have stable unique names: Create.Values.

Conflicting instances:
- name=Create.Values:
    - Create.Values
    - Create.Values

You can fix it adding a name when you call apply(): pipeline.apply(<name>, <transform>).
	at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:619)
	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
	at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:399)
	at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:335)
	at org.apache.beam.sdk.values.PCollectionRowTupleTest.testComposePCollectionRowTuple(PCollectionRowTupleTest.java:101)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
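The failure comes from both Create.of(...) applications receiving the same default name, Create.Values. A simplified, hypothetical model of what the check reports (not Beam's implementation, and UniqueNameModel is an invented name) is just "any name used by more than one applied transform is a conflict":

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Beam code) of the stable-unique-names check:
// any name used by more than one applied transform is reported as a conflict.
final class UniqueNameModel {
  private UniqueNameModel() {}

  static List<String> conflicts(List<String> appliedNames) {
    // Count how many times each name was used, preserving first-seen order.
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String name : appliedNames) {
      counts.merge(name, 1, Integer::sum);
    }
    List<String> duplicated = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > 1) {
        duplicated.add(entry.getKey());
      }
    }
    return duplicated;
  }
}
```

In the test itself, the fix the error message suggests is to pass an explicit name as the first argument, for example `pipeline.apply("MainInput", Create.of(inputs))` and `pipeline.apply("SecondInput", Create.of(inputs))`; the names "MainInput" and "SecondInput" are arbitrary choices.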

@TheNeuralBit (Member):

Run Java_Examples_Dataflow PreCommit

@laraschmidt (Contributor Author):

Run Java_Examples_Dataflow PreCommit

@TheNeuralBit (Member):

Run Java PreCommit

@laraschmidt (Contributor Author):

Run Java PreCommit


public static final Schema intSchema = Schema.of(Field.of("int", FieldType.INT32));
public static final Schema stringSchema = Schema.of(Field.of("str", FieldType.STRING));
public static final Schema boolSchema = Schema.of(Field.of("str", FieldType.BOOLEAN));
Member:

Checkstyle is now failing in Java PreCommit; it wants these constants to be ALL_CAPS.
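Checkstyle's ConstantName check applies to static final fields, and its default format is `^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$`. Beam's own checkstyle configuration may tune this, so treat the pattern below as an assumption. The sketch (ConstantNames is a hypothetical helper) simply tests candidate names against it:

```java
import java.util.regex.Pattern;

// Sketch assuming Checkstyle's default ConstantName format; Beam's config may differ.
final class ConstantNames {
  private static final Pattern CONSTANT_NAME =
      Pattern.compile("^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$");

  private ConstantNames() {}

  // True if the name is upper case with underscore-separated words.
  static boolean isValidConstantName(String name) {
    return CONSTANT_NAME.matcher(name).matches();
  }
}
```

Under that rule, intSchema, stringSchema, and boolSchema would become INT_SCHEMA, STRING_SCHEMA, and BOOL_SCHEMA, with every use in the test updated to match.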
