Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote_download_minimal: downloaded inputs shouldn't be deleted after the build #12855

Closed
Tracked by #12665
brentleyjones opened this issue Jan 19, 2021 · 9 comments
Closed
Tracked by #12665
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@brentleyjones
Copy link
Contributor

Description of the problem / feature request:

When using --remote_download_minimal, downloaded inputs shouldn't be deleted after the build.

Feature requests: what underlying problem are you trying to solve with this feature?

Currently, when bazel needs to download inputs for a locally run action while using --remote_download_minimal, it deletes these inputs after the build:

/**
* Delete any input files that have been fetched from the remote cache during the build. This is
* so that Bazel's view of the output base is identical with the output base after a build i.e.
* files that Bazel thinks exist only remotely actually do.
*/
private void deleteDownloadedInputs() throws IOException {
if (actionInputFetcher == null) {
return;
}
IOException deletionFailure = null;
for (Path file : actionInputFetcher.downloadedFiles()) {
try {
file.delete();
} catch (IOException e) {
logger.atSevere().withCause(e).log(
"Failed to delete remote output '%s' from the output base.", file);
deletionFailure = e;
}
}
if (deletionFailure != null) {
throw deletionFailure;
}
}

This makes incremental compilation less efficient.

For example, if we set a bundling rule to run locally, bazel will download all of the various artifacts that need to be bundled together, and if we incrementally only change a single artifact in that bundle, bazel will have to re-download all of the artifacts, even though they didn't change since the last build. The fact that disk_cache doesn't work with remote builds ensure that this is a slow download as well.

What operating system are you running Bazel on?

macOS 10.15.7

What's the output of bazel info release?

release 4.0.0rc10

Have you found anything relevant by searching the web?

@coeuvre coeuvre self-assigned this Jan 20, 2021
@coeuvre coeuvre added P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request labels Jan 20, 2021
bazel-io pushed a commit that referenced this issue Mar 16, 2021
Before this change, when remote build without bytes is enabled, intermediate outputs which are also inputs to local actions will be downloaded for local execution. After build finished, these downloaded files are deleted to keep Bazel's view of the output base identical with the output base i.e. files that Bazel thinks exist only remotely actually do. However, these intermediate outputs maybe used later in which case Bazel need to download them again which is a performance issue. #12855

This change fix the issue by removing the code used to delete downloaded files. Bazel should be able to take the local file as the source of truth if it exists (otherwise, it is a bug).

This is also an essential step to implement separation of downloads. #12665

PiperOrigin-RevId: 363131672
@coeuvre
Copy link
Member

coeuvre commented Mar 16, 2021

Fixed by f0c32de.

@brentleyjones
Copy link
Contributor Author

Thank you!

@brentleyjones
Copy link
Contributor Author

brentleyjones commented Mar 17, 2021

@coeuvre One "benefit" of this issue, that I've come to like since filing the bug, is that it allows large working sets to happen on space constrained machines (ones that the build would fail on before reaching the end, where one could call bazel clean). I wonder if we want this to be configurable in some way? And if so, could it be scoped (so we could apply it via a tag, and thus use --modify_execution_info)?

@coeuvre
Copy link
Member

coeuvre commented Mar 18, 2021

I don't think unconditionally delete downloaded intermediate outputs is a good way to manage the content of output base. bazel clean is a valid work around in this case since you will hit remote cache next time running bazel build and only download necessary files.

One way I could image is that we can collect the usages of these downloaded output files and delete those which are not used after a certain period of time.

@brentleyjones
Copy link
Contributor Author

For my (new) specific use case, the build will fail if it doesn't delete the intermediate outputs during the build, since the local machine doesn't have enough space to have all of them at once. If it deletes as it goes the build succeeds. I'm not saying that my use case warrants a change, but this might count as a breaking change because of that?

@coeuvre
Copy link
Member

coeuvre commented Mar 19, 2021

deleteDownloadedInputs was called after a build command. I don't think we were able to delete intermediate outputs during the build.

@brentleyjones
Copy link
Contributor Author

Hmm. I was seeing that the local inputs were deleted during the build. For example, we run tests locally, so the tests themselves are being downloaded, but then deleted right after each one is run (at least it looks that way).

@brentleyjones
Copy link
Contributor Author

brentleyjones commented Aug 26, 2021

I'm running into an issue on HEAD (though it probably existed since this was fixed), where even though the inputs are no longer deleted, they aren't used on subsequent runs, causing them to be downloaded again. If I generate the inputs locally, so they aren't downloaded for the local action, they aren't downloaded on subsequent runs.

Of note for this, the inputs being downloaded are outputs (though not all the outputs) from previous actions in the graph.

@coeuvre
Copy link
Member

coeuvre commented Aug 27, 2021

Can you please create a new issue and maybe include a repro there?

glukasiknuro pushed a commit to glukasiknuro/bazel that referenced this issue Sep 28, 2021
Before this change, when remote build without bytes is enabled, intermediate outputs which are also inputs to local actions will be downloaded for local execution. After build finished, these downloaded files are deleted to keep Bazel's view of the output base identical with the output base i.e. files that Bazel thinks exist only remotely actually do. However, these intermediate outputs maybe used later in which case Bazel need to download them again which is a performance issue. bazelbuild#12855

This change fix the issue by removing the code used to delete downloaded files. Bazel should be able to take the local file as the source of truth if it exists (otherwise, it is a bug).

This is also an essential step to implement separation of downloads. bazelbuild#12665

PiperOrigin-RevId: 363131672
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request
Projects
None yet
Development

No branches or pull requests

2 participants