ci: track bazel remote cache related flakiness #4407

Closed
lizan opened this issue Sep 12, 2018 · 4 comments

@lizan
Member

lizan commented Sep 12, 2018

Description:
bazelbuild/bazel#5908

This seems to have happened a couple of times. The most recent failure:
https://circleci.com/gh/envoyproxy/envoy/94262

This manifests as an ar link error, e.g.

ERROR: /build/tmp/_bazel_bazel/400406edc57d332f0b9b805d2b8e33a1/external/envoy/test/exe/BUILD:36:1: Linking of rule '@envoy//test/exe:signals_test_lib' failed (Exit 1): ar failed: error executing command 
  (cd /build/tmp/_bazel_bazel/400406edc57d332f0b9b805d2b8e33a1/execroot/envoy && \
  exec env - \
    HOME=/tmp/fake_home \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
    PYTHONUSERBASE=/tmp/fake_home \
  /usr/bin/ar @bazel-out/k8-dbg/bin/external/envoy/test/exe/libsignals_test_lib.lo-2.params)

/usr/bin/ar: bazel-out/k8-dbg/bin/external/envoy/test/exe/libsignals_test_lib.lo: File format not recognized
INFO: Elapsed time: 2609.640s, Critical Path: 234.37s
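
For anyone trying to reproduce this locally, here is a minimal sketch of one way to sanity-check a suspect output file. It assumes that the "File format not recognized" message means the `.lo` file on disk is not a well-formed `ar` archive (for example, truncated or partially written); the thread does not confirm that. The helper name `looks_like_ar_archive` is hypothetical. It only checks the 8-byte magic that every `ar` archive (or GNU thin archive) starts with.

```python
import os
import tempfile

AR_MAGIC = b"!<arch>\n"    # global header of a regular Unix ar archive
THIN_MAGIC = b"!<thin>\n"  # global header of a GNU thin archive


def looks_like_ar_archive(path):
    """Return True if the file begins with an ar (or thin ar) magic string.

    Hypothetical helper: this checks only the header, so a file that is
    corrupted further in would still pass.
    """
    with open(path, "rb") as f:
        head = f.read(len(AR_MAGIC))
    return head in (AR_MAGIC, THIN_MAGIC)


def write_temp(data):
    """Write bytes to a fresh temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    good = write_temp(AR_MAGIC)  # an empty but valid archive header
    bad = write_temp(b"")        # e.g. a zero-length, interrupted write
    print(looks_like_ar_archive(good), looks_like_ar_archive(bad))
    os.remove(good)
    os.remove(bad)
```

Running such a check against the file named in the error (here `bazel-out/k8-dbg/bin/external/envoy/test/exe/libsignals_test_lib.lo`) would at least tell a truncated or empty file apart from one that merely has unexpected members.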
@dnoe
Contributor

dnoe commented Sep 12, 2018

This seems like a pretty clear-cut case: these failures are cache related.

@zuercher
Member

Looking at bazelbuild/bazel#4558, it seems to imply that using the same flags with different compilers or versions of compilers might cause this problem.

That said, in #4404 I experimented with printing compiler package info, and it seems that we only ever use different compilers with different combinations of bazel flags (e.g. release is gcc/-c opt, ipv6 and api are clang/-c fastbuild, asan is clang/-c dbg/--config=clang-asan, tsan is clang/-c dbg/--config=clang-tsan).

The gcc/clang version always comes from our build image, so it's stable. (Unless someone's writing to the build cache from outside CI.)
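
To illustrate the concern, here is a toy model of a content-addressed action cache. It is a sketch only and not Bazel's actual key computation; `action_key` and the digest strings are made up for illustration. The point is that if the key is derived from the command line and declared inputs, but the compiler binary itself is not among those inputs, two machines with different compilers at the same path would compute the same key and could overwrite or read each other's outputs.

```python
import hashlib


def action_key(argv, input_digests):
    """Toy cache key: hash of the command line plus declared input digests.

    Hypothetical model for illustration; real build systems hash more
    than this.
    """
    h = hashlib.sha256()
    for arg in argv:
        h.update(arg.encode() + b"\0")
    for path in sorted(input_digests):
        h.update(path.encode() + b"\0" + input_digests[path].encode() + b"\0")
    return h.hexdigest()


argv = ["/usr/bin/gcc", "-c", "foo.cc", "-o", "foo.o"]
sources = {"foo.cc": "digest-of-foo-cc"}  # placeholder digest

# Two hosts, same flags and sources, different compiler builds at /usr/bin/gcc.
key_host_a = action_key(argv, sources)
key_host_b = action_key(argv, sources)
print(key_host_a == key_host_b)  # the compiler is invisible to the key

# Declaring the compiler binary as an input separates the two entries.
key_a = action_key(argv, {**sources, "/usr/bin/gcc": "digest-of-gcc-build-1"})
key_b = action_key(argv, {**sources, "/usr/bin/gcc": "digest-of-gcc-build-2"})
print(key_a == key_b)
```

Under this model, a fixed build image where every flag combination maps to exactly one compiler should not collide, which matches the observation above.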

AndresGuedez added a commit to AndresGuedez/envoy that referenced this issue Sep 17, 2018
Remote caching issue is being tracked in envoyproxy#4407.

Signed-off-by: Andres Guedez <aguedez@google.com>
@zuercher
Member

zuercher commented Sep 17, 2018

@lizan I posted this in the maintainers Slack channel, but is it possible that when the build completes, the bazel server process is killed abruptly while still writing to the cache? If so, would using --batch help? (Or, I guess, the modern way is bazel shutdown.)

@lizan
Member Author

lizan commented Sep 17, 2018

@zuercher OK let's try bazel shutdown and if it doesn't work then we can disable the cache.
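
A minimal sketch of what "try bazel shutdown" could look like in a CI wrapper, assuming the hypothesis from the previous comment: that the server is killed while still writing to the cache. Whether a clean shutdown actually prevents the corruption is unconfirmed here. The function name `build_then_shutdown` and the example target pattern are hypothetical; `bazel build` and `bazel shutdown` are the only real commands used.

```python
import subprocess


def build_then_shutdown(build_args, run=subprocess.call):
    """Run `bazel build <args>`, then always ask the server to shut down.

    Returns the exit code of the build. `run` is injectable so the
    wrapper can be exercised without bazel installed.
    """
    try:
        return run(["bazel", "build", *build_args])
    finally:
        # Request a clean server exit before the CI container is torn
        # down, instead of letting the process be killed mid-write.
        run(["bazel", "shutdown"])


if __name__ == "__main__":
    # Hypothetical target pattern, for illustration only.
    raise SystemExit(build_then_shutdown(["//test/..."]))
```

The `finally` clause means the shutdown is attempted even when the build fails, so a failed job does not leave a server behind either.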

lizan added a commit that referenced this issue Sep 29, 2018
Description:
The remote cache seems to be causing problems frequently; disable it to see if that helps.

Risk Level: Low
Testing: CI
Docs Changes:
Release Notes:
Fixes #4407

Signed-off-by: Lizan Zhou <lizan@tetrate.io>
aa-stripe pushed a commit to aa-stripe/envoy that referenced this issue Oct 11, 2018
Description:
The remote cache seems to be causing problems frequently; disable it to see if that helps.

Risk Level: Low
Testing: CI
Docs Changes:
Release Notes:
Fixes envoyproxy#4407

Signed-off-by: Lizan Zhou <lizan@tetrate.io>
Signed-off-by: Aaltan Ahmad <aa@stripe.com>