Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Envoy 1.13.0 #2318

Merged
merged 2 commits into from
Apr 28, 2020
Merged

Upgrade to Envoy 1.13.0 #2318

merged 2 commits into from
Apr 28, 2020

Conversation

LukeShu
Copy link
Contributor

@LukeShu LukeShu commented Feb 14, 2020

Description

This upgrades our copy of Envoy from 1.12.2+a few patches to 1.13.0+a few patches. The specific commit updated to is currently the tip of the lukeshu/1.13 branch in github.com/datawire/envoy.

Envoy builds, everything generated from Envoy builds. I've pushed the Envoy build as
quay.io/datawire/ambassador-base:envoy-7.4f6ae4e71699d1c7ee1a6dd8be55e65bfd085b3d.opt which will be downloaded for others building from this PR so they won't have to build Envoy themselves.

It probably makes the most sense to review this commit-by-commit. The reviewer should also verify that there has been no loss of fidelity to the Datawire patches from envoy.git tag ambassador-1.0.0 to lukeshu/1.13.

Testing

It builds.

Todos

I have not (yet):

  • make check-envoy
  • make test KAT_RUN_MODE=envoy
  • Verified whether apro.git needs adjustments

@LukeShu
Copy link
Contributor Author

LukeShu commented Feb 18, 2020

I got make check-envoy to pass:

out of 550 tests: 550 tests pass.

It's taking 211GB, but when I had it on a 250GB drive, about half of the tests refused to run (and 1 failed), because There were tests whose specified size is too big.. The passing run came from a 400GB drive.

$ sudo du -h --max-depth 0 /var/lib/docker/volumes/envoy-build
206G	/var/lib/docker/volumes/envoy-build

$ df -h /var/lib/docker
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        400G  211G  190G  53% /var/lib/docker

@LukeShu
Copy link
Contributor Author

LukeShu commented Feb 18, 2020

It looks like KAT passing with this is blocked by https://github.com/datawire/apro/issues/713 ?

@LukeShu
Copy link
Contributor Author

LukeShu commented Feb 20, 2020

WTAF. How is CI passing? It shouldn't be passing.

Edit: Oh, because of gold, and CI running without KAT_RUN_MODE=envoy.

@LukeShu LukeShu changed the title Upgrade to Envoy 1.13.0 [WIP] Upgrade to Envoy 1.13.0 Feb 20, 2020
@LukeShu LukeShu force-pushed the lukeshu/envoy-1.13 branch 2 times, most recently from 4522ac4 to 9dc2069 Compare February 21, 2020 21:10
@alphashr
Copy link
Contributor

Getting 2 test failures on both master and also on lukeshu/envoy-1.13 running on Kubernaut which I think is completely unrelated to Envoy upgrade:

  • FAILED python/tests/test_ambassador.py::main[ConsulTest] - AssertionError: http://consultest/consultest_consul/: expected status code 503, got 200 instead with error None
  • FAILED python/tests/test_scout.py::test_scout - AssertionError: test failed

@LukeShu
Copy link
Contributor Author

LukeShu commented Apr 22, 2020

Getting 2 test failures on both master and also on lukeshu/envoy-1.13 running on Kubernaut which I think is completely unrelated to Envoy upgrade:

* FAILED python/tests/test_ambassador.py::main[ConsulTest] - AssertionError: http://consultest/consultest_consul/: expected status code 503, got 200 instead with error None

* FAILED python/tests/test_scout.py::test_scout - AssertionError: test failed

Both of those are tests that can't be run multiple times in a row, because they don't clean up after themselves.

@LukeShu
Copy link
Contributor Author

LukeShu commented Apr 22, 2020

Assuming you ran the tests with KAT_RUN_MODE=envoy after merging master, looks good to me!

@alphashr alphashr requested a review from kflynn April 22, 2020 18:07
@alphashr alphashr changed the title [WIP] Upgrade to Envoy 1.13.0 Upgrade to Envoy 1.13.0 Apr 22, 2020
@alphashr
Copy link
Contributor

Envoy tests report:

Three tests failed:

1. //test/extensions/filters/common/ratelimit:ratelimit_impl_test           FAILED in 2.0s
  /root/.cache/bazel/_bazel_root/4e47b77d977b209382cf04b3ae32963e/execroot/envoy/bazel-out/k8-dbg/testlogs/test/extensions/filters/common/ratelimit/ratelimit_impl_test/test.log


2. //test/integration:ratelimit_integration_test                            FAILED in 14.5s
  /root/.cache/bazel/_bazel_root/4e47b77d977b209382cf04b3ae32963e/execroot/envoy/bazel-out/k8-dbg/testlogs/test/integration/ratelimit_integration_test/test.log


3. //test/integration:vhds_integration_test                                 FAILED in 7.8s
  /root/.cache/bazel/_bazel_root/4e47b77d977b209382cf04b3ae32963e/execroot/envoy/bazel-out/k8-dbg/testlogs/test/integration/vhds_integration_test/test.log

ratelimit_integration_test and ratelimit_impl_test is expected to break because of the hard coded legacy code that we already have for ratelimit. It is expected to be fixed when we migrate to Envoy v1.14.

Ran only vhds_integration_test for the second time: PASSED

root@92ce1f18c3ce:~/envoy# time bazel test --test_env=ENVOY_IP_TEST_VERSIONS=v4only //test/integration:vhds_integration_test; 

INFO: 243 processes: 243 linux-sandbox.
INFO: Build completed successfully, 249 total actions
//test/integration:vhds_integration_test                                 PASSED in 3.4s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 249 total actions

real    1m30.908s
user    0m0.011s
sys     0m0.037s

@LukeShu
Copy link
Contributor Author

LukeShu commented Apr 23, 2020

Agree on the RLS tests. The VirtualHosts Discovery Service test flake concerns me a bit. I don't suppose you still have the log file referred to by the failing run?

@alphashr
Copy link
Contributor

Agree on the RLS tests. The VirtualHosts Discovery Service test flake concerns me a bit. I don't suppose you still have the log file referred to by the failing run?

Because it passed on the 2nd tun, I didn't grab the log failure file offline.
This test flakiness is seen [1], [2] and [3]: failure on thread sanitizer run

Copy link
Member

@kflynn kflynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable overall? I do have some questions though...

go.mod Outdated
Comment on lines 5 to 7
// `go mod tidy` is going to try to remove `github.com/cncf/udpa`--don't let it!
// That will break `make generate`. Keep the github.com/cncf/udpa version
// in-sync with the github.com/cncf/udpa/go version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WTAF. This feels like something that'll burn us repeatedly. 🙁

python/ambassador/envoy/v2/v2listener.py Show resolved Hide resolved
cxx/envoy.mk Show resolved Hide resolved
go.mod Outdated
// `go mod tidy` is going to try to remove `github.com/cncf/udpa`--don't let it!
// That will break `make generate`. Keep the github.com/cncf/udpa version
// in-sync with the github.com/cncf/udpa/go version.
// If you're editing this file, there's a few things you should know:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we keep that comment about go mod tidy? Or is it no longer relevant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LukeShu any opinion on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the comment is still applicable. It seems that it was dropped in the 8918d35 merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe a (and I can't believe I'm saying this, after last week's rants against replace) a replace line could effectively pin it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I tried that; it didn't work right 🤷

@LukeShu
Copy link
Contributor Author

LukeShu commented Apr 23, 2020

And also between the fixed RLS patch and the bump to google.golang.org/grpc from master, this is due for another make generate.

…ci-skip]

This does all the work of upgrading, except for running `make generate`,
which is left for a separate commit for code-review purposes.

Authored-by: Luke Shumaker <lukeshu@datawire.io>
Authored-by: Shahriar Rostami <srdsecond@gmail.com>
kflynn
kflynn previously approved these changes Apr 27, 2020
Copy link
Member

@kflynn kflynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hit it! 🙂 😱 🙂

@kflynn kflynn dismissed their stale review April 27, 2020 16:18

Realized that there's a release question.

@kflynn
Copy link
Member

kflynn commented Apr 27, 2020

I just revoked my approval not because of a sudden issue with the PR, but because I need to resolve a question around its release vehicle. Hold one. :)

Copy link
Member

@kflynn kflynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK -- hit it!

@alphashr alphashr merged commit 6be8ee9 into master Apr 28, 2020
@alphashr alphashr deleted the lukeshu/envoy-1.13 branch April 28, 2020 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants