[ci] Ubuntu version mismatch between machines causes false cache hits #14695

Closed
milesdai opened this issue Sep 1, 2022 · 1 comment · Fixed by #14697
Labels
Component:CI Continuous Integration (Azure Pipelines & Co.) Priority:P0 Priority: critical

Comments


milesdai commented Sep 1, 2022

The recent bump to Ubuntu 20.04 in #14432 caused a glibc version mismatch (probably among many other differences) between the Azure runners and the self-hosted runners. The remote cache cannot differentiate builds from these two platforms, so it reports cache hits where there should be none. As a result, our self-hosted runners (Ubuntu 18.04) are pulling cached artifacts that were built on the Azure runners (Ubuntu 20.04).

This is a known issue mentioned in the Remote Caching documentation and is tracked in bazelbuild/bazel#4558.

The immediate solution is to shard the cache by including the OS version in the build actions, similar to what's described here. In the short term, we need to prioritize upgrading the GCP nodes and the FPGA runner to Ubuntu 20.04. In the long term, making the entire build process hermetic will solve this problem entirely.
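One way the cache sharding could look is sketched below. This is only an illustration, not the change made in the fixing PR: the variable name `CI_OS_VERSION` and both helper functions are hypothetical, and the sketch assumes that a value passed through Bazel's `--action_env` flag ends up in the action key of the affected actions. Actions built in the exec/host configuration may need separate handling.

```python
# Sketch: derive a per-OS cache shard key from /etc/os-release and turn it
# into a Bazel flag. CI_OS_VERSION is a made-up name used for illustration.

def parse_os_release(text):
    """Parse the KEY=VALUE lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def cache_shard_flag(os_release_text, env_name="CI_OS_VERSION"):
    """Build an --action_env flag that differs between OS releases."""
    fields = parse_os_release(os_release_text)
    os_id = fields.get("ID", "unknown")
    version = fields.get("VERSION_ID", "unknown")
    return f"--action_env={env_name}={os_id}-{version}"


def flag_for_this_machine(path="/etc/os-release"):
    """Read the local os-release file and return the flag for this runner."""
    with open(path) as f:
        return cache_shard_flag(f.read())
```

A CI script could append the result of `flag_for_this_machine()` to its `bazel build` and `bazel test` invocations, so that an Ubuntu 18.04 runner and an Ubuntu 20.04 runner no longer produce identical action keys.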

milesdai added the Priority:P0 (Priority: critical) and Component:CI (Continuous Integration (Azure Pipelines & Co.)) labels Sep 1, 2022
milesdai self-assigned this Sep 1, 2022

milesdai commented Sep 1, 2022

Example output from a failing ROM E2E Test:

Testing //.../rom/e2e:e2e_bootstrap_entry_fpga_cw310; 0s local
==================== Test output for //sw/device/silicon_creator/rom/e2e:e2e_bootstrap_entry_fpga_cw310:
Invoking test: sw/host/tests/rom/e2e_bootstrap_entry/e2e_bootstrap_entry --cw310-uarts=/dev/ttyACM_CW310_1,/dev/ttyACM_CW310_0 --rcfile= --logging=info --interface=cw310 --rom-kind=rom --bitstream=external/bitstreams/cache/7cbb64c42451e388246c37f46252db1fbd8cd550/lowrisc_systems_chip_earlgrey_cw310_0.1.bit.splice --bootstrap=sw/device/silicon_creator/rom/e2e/e2e_bootstrap_entry_prog_fpga_cw310.bin
sw/host/tests/rom/e2e_bootstrap_entry/e2e_bootstrap_entry: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.29' not found (required by sw/host/tests/rom/e2e_bootstrap_entry/e2e_bootstrap_entry)
sw/host/tests/rom/e2e_bootstrap_entry/e2e_bootstrap_entry: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by sw/host/tests/rom/e2e_bootstrap_entry/e2e_bootstrap_entry)
sw/host/tests/rom/e2e_bootstrap_entry/e2e_bootstrap_entry: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by sw/host/tests/rom/e2e_bootstrap_entry/e2e_bootstrap_entry)
================================================================================
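
The loader errors above name the glibc symbol versions the cached binary needs. Ubuntu 18.04 ships glibc 2.27 and Ubuntu 20.04 ships glibc 2.31, so a binary linked on 20.04 can require versions that the 18.04 runner does not have. The following is a minimal sketch of how such a log could be checked automatically; the function names are hypothetical and the regular expression only targets the message format shown in this output.

```python
import re

# Matches the dynamic loader message: version `GLIBC_2.29' not found
GLIBC_RE = re.compile(r"version `GLIBC_(\d+)\.(\d+)' not found")


def missing_glibc_versions(log):
    """Return the sorted, de-duplicated (major, minor) versions in the log."""
    return sorted({(int(a), int(b)) for a, b in GLIBC_RE.findall(log)})


def min_glibc_needed(log):
    """Return the newest glibc version the binary asked for, or None."""
    versions = missing_glibc_versions(log)
    return max(versions) if versions else None


def runner_is_too_old(log, runner_glibc):
    """True if the runner's glibc (major, minor) is older than required."""
    needed = min_glibc_needed(log)
    return needed is not None and runner_glibc < needed
```

For the output above, `min_glibc_needed` reports glibc 2.29, which is newer than the 2.27 available on the Ubuntu 18.04 self-hosted runners.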
