Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-13647] Use role for Go worker binary. #16729

Merged
merged 4 commits into from
Feb 8, 2022
Merged

Conversation

lostluck
Copy link
Contributor

@lostluck lostluck commented Feb 4, 2022

Move away from older version of Go Worker Binary, in favour of a dedicated RoleUrn.

  1. Adds "beam:artifact:role:go_worker_binary:v1" urn to proto.
  2. On pipeline execution, updates the default Go env in the pipeline proto with the worker binary, as appropriate:
  • Continue to use file references for universal runners
  • Start using URL types pointing to GCS staging location for Dataflow.
  1. Retain legacy logic in case of old SDKs using newer SDK container for some reason.
  2. Newer SDKs can still use older SDK containers as there's no simple mechanism at present to add additional dependencies to the Go environment. Therefore there's often only 1 dependency and it will always be the worker. However, from 2.37.0 on, the role will be usable and expected.

Validated "old container" behavior manually against Spark and Dataflow, with the pipelines successfully executing with each.

Minor breaking change to the dataflow lib, unexporting the StageFile function, and also have it return the SHA256 hash for the binary. dataflowlib was never intended to be used outside of the dataflow runner code anyway.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status Build Status Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status ---
XLang Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Feb 4, 2022

Codecov Report

Merging #16729 (17e7890) into master (730b941) will increase coverage by 28.17%.
The diff coverage is 46.15%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #16729       +/-   ##
===========================================
+ Coverage   46.49%   74.67%   +28.17%     
===========================================
  Files         202      654      +452     
  Lines       19940    82133    +62193     
===========================================
+ Hits         9271    61331    +52060     
- Misses       9679    19812    +10133     
  Partials      990      990               
Impacted Files Coverage Δ
sdks/go/pkg/beam/artifact/materialize.go 48.94% <0.00%> (ø)
...o/pkg/beam/runners/dataflow/dataflowlib/execute.go 0.00% <0.00%> (ø)
.../go/pkg/beam/runners/dataflow/dataflowlib/stage.go 0.00% <0.00%> (ø)
sdks/go/pkg/beam/core/runtime/graphx/translate.go 33.49% <100.00%> (+5.12%) ⬆️
sdks/python/apache_beam/io/mongodbio.py 92.85% <0.00%> (ø)
sdks/python/apache_beam/internal/__init__.py 100.00% <0.00%> (ø)
...on/apache_beam/portability/api/metrics_pb2_grpc.py 0.00% <0.00%> (ø)
sdks/python/apache_beam/utils/annotations.py 100.00% <0.00%> (ø)
...ache_beam/io/gcp/datastore/v1new/query_splitter.py 93.75% <0.00%> (ø)
sdks/python/apache_beam/internal/dill_pickler.py 86.33% <0.00%> (ø)
... and 448 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 730b941...17e7890. Read the comment docs.

@lostluck
Copy link
Contributor Author

lostluck commented Feb 4, 2022

R: @ihji

@lostluck
Copy link
Contributor Author

lostluck commented Feb 4, 2022

Run Go Flink ValidatesRunner

@lostluck
Copy link
Contributor Author

lostluck commented Feb 4, 2022

Run Go Spark ValidatesRunner

@lostluck
Copy link
Contributor Author

lostluck commented Feb 4, 2022

Run Go Samza ValidatesRunner

Copy link
Contributor

@ihji ihji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM. Just one minor comment.

sdks/go/container/boot.go Outdated Show resolved Hide resolved
@ihji
Copy link
Contributor

ihji commented Feb 7, 2022

Please fix codecov failure before merging it.

@lostluck
Copy link
Contributor Author

lostluck commented Feb 7, 2022

Please fix codecov failure before merging it.

Note: All this code is covered by the integration tests, otherwise it wouldn't work. Adding test coverage to boot.go would require a complete refactor out of the single main method. I'll see what I can do about the other code though, in particular the binary substitution.

@lostluck
Copy link
Contributor Author

lostluck commented Feb 8, 2022

Added tests to the core changes in the graphx package. The other packages changes require various bits of file IO, without which, the SDK wouldn't function at all. They would need to be refactored a bit, or have specialized file-io based tests to validate them, which is a bit much for this change.

@lostluck
Copy link
Contributor Author

lostluck commented Feb 8, 2022

Run Java PreCommit

@lostluck lostluck merged commit 56e5c2c into apache:master Feb 8, 2022
Naireen pushed a commit to Naireen/beam that referenced this pull request Feb 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants