Skymeld crashes the build by overusing system resources. #20302

Closed
layus opened this issue Nov 23, 2023 · 10 comments
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: bug untriaged

Comments

@layus
Contributor

layus commented Nov 23, 2023

Description of the bug:

Enabling skymeld (which happened while testing 7.0.0, where it is the new default) led to many crashes, because normal actions now compete with repository actions for system resources. As our repository actions are resource-intensive (disk, CPU and network), many of these actions crash when the system is starved of resources.

This is mainly because there is no way to declare the amount of resources a repository action needs. We used to address that with --loading_phase_threads=10, but skymeld makes this option irrelevant.

Is there another way to configure the expected resource usage of repository actions? Right now we probably want to disable skymeld so that failures in repository rules do not affect normal rules.

Which category does this issue belong to?

Core, External Dependency

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Not trivial: you need repository rules that use enough resources to push the available system resources below what normal actions need to run.

Which operating system are you running Bazel on?

Ubuntu

What is the output of bazel info release?

7.0.0rc2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

It's kind of a feature of skymeld.

Have you found anything relevant by searching the web?

Skymeld discussion: #14057

Any other information, logs, or outputs that you want to share?

No response

@layus
Contributor Author

layus commented Nov 23, 2023

I should probably mention that this issue arises from using rules_nixpkgs, where repository rules are (ab)used to build a whole tree of subpackages.

@layus
Contributor Author

layus commented Nov 23, 2023

/cc @joeleba as the main dev behind skymeld, AFAICT

@joeleba
Member

joeleba commented Nov 23, 2023

Hi. What's the nature of this crash? OOM-ing? If that's the case then you can try increasing the max heap size. https://bazel.build/advanced/performance/memory
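For illustration only: raising the Bazel server's maximum heap is normally done with the `--host_jvm_args` startup option, as described on the linked memory page. The `8g` value below is a hypothetical placeholder, not a recommendation. Note that this only enlarges the Bazel server JVM heap; it does not give more memory to the processes that actions or repository rules spawn.

```bazelrc
# .bazelrc sketch: raise the Bazel server JVM's max heap.
# 8g is a hypothetical value; size it to your machine.
startup --host_jvm_args=-Xmx8g
```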

Other than that, I think the right thing to do in this scenario is to disable Skymeld, unfortunately. At least until we can have a mechanism to automatically tell bazel to "slow down" when there's resource constraints.

For your convenience: you can disable Skymeld by passing build --noexperimental_merged_skyframe_analysis_execution in your bazelrc file.
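As a minimal sketch, the corresponding `.bazelrc` entry is a single line that applies the flag to every `build` invocation (and to commands that inherit `build` options, such as `test`):

```bazelrc
# .bazelrc: turn off Skymeld (merged analysis and execution), which is on by default in Bazel 7.
build --noexperimental_merged_skyframe_analysis_execution
```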

@layus
Contributor Author

layus commented Nov 23, 2023

Thanks @joeleba,

The crash comes from the actions themselves, not from the JVM running out of memory. And indeed, I have reluctantly added build --noexperimental_merged_skyframe_analysis_execution to my bazelrc file, which feels bad because skymeld would help performance a lot in the cases where our repository actions do not consume many resources.

@sgowroji sgowroji added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Nov 27, 2023
@layus layus closed this as not planned Jan 29, 2024
@GorshkovNikita

Hi! We are experiencing the same issue (we also use rules_nixpkgs) after upgrading to Bazel 7. Adding --noexperimental_merged_skyframe_analysis_execution doesn't seem to work in our case, because we have so many repository actions that they exhaust system resources by themselves. Is there a way to make --loading_phase_threads work in Bazel 7.1.2?

@fmeum
Copy link
Collaborator

fmeum commented Jun 4, 2024

@GorshkovNikita This could be #21803

@layus
Contributor Author

layus commented Jun 4, 2024

@GorshkovNikita Is --loading_phase_threads somehow broken in 7.1.2? I cannot find any issue related to that.

--noexperimental_merged_skyframe_analysis_execution combined with --loading_phase_threads is the solution we are currently using.
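A sketch of that combination as `.bazelrc` lines, assuming the `--loading_phase_threads=10` value mentioned at the top of this issue (tune it for your machine). In this setup the thread cap was what kept the resource-heavy repository rules from running too many at once, which only works once Skymeld is off.

```bazelrc
# .bazelrc sketch: disable Skymeld so repository rules no longer compete with build actions.
build --noexperimental_merged_skyframe_analysis_execution
# Cap loading-phase parallelism; 10 is the value used earlier in this thread.
build --loading_phase_threads=10
```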

@GorshkovNikita

> @GorshkovNikita This could be #21803

Thanks! I can confirm that the problem is gone when using Bazel 7.0.2.

@fmeum
Collaborator

fmeum commented Jun 4, 2024

@GorshkovNikita Do you mean 7.2.0?

@GorshkovNikita

> @GorshkovNikita Do you mean 7.2.0?

No, I tried it with Bazel 7.0.2, as in issue #21803.
We had Bazel 6 before, upgraded straight to 7.1.2, and ran into this problem.
