-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Centralize PathMapper
construction
#18098
Conversation
64387c0
to
51031f1
Compare
@gregestren @aranguyen This is the promised refactoring. I couldn't find a way to split this into more manageable parts, but at least it removes quite a bit of code. :-) |
Test failures are caused by bazelbuild/continuous-integration#1582 and not this PR. |
I pushed an additional commit that makes the jdeps rewriting independent of the actual |
34e9d6b
to
36d5222
Compare
@justinhorvitz I noticed your recent work on removing fields from |
36d5222
to
1eb1d03
Compare
Thanks for drawing the connection to my recent work. The |
Sounds good, then I will not make any further changes to this PR for now. As we probably want to reference another flag value in the future, at that point I would look into passing in the values rather than storing them in the action (the new value will be a list, so subclassing probably wouldn't help). |
02c4d87
to
9ec5aaf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big and interesting change.
I really like the idea that you're able to late-instantiate PathMapper
now. I'm impressed you got that to work so cleanly.
The main thing I'm wary of is locking further down to a centralized registration fro which actions trigger this. I think we ultimately have to allow actions to declare their participation, and do so in their own code.
I don't mind an experimental flag that can optionally opt in actions. But I'd still like the pattern of a generic --output_paths=strip
or --output_paths=map
flag, where whatever actions are known to be compatible with that choose to opt themselves in (or maybe one day be the few that opt themselves out).
Can we still preserve that for the native actions?
ImmutableList.Builder<String> arguments = ImmutableList.builder(); | ||
for (CommandLineAndParamFileInfo pair : getCommandLines()) { | ||
arguments.addAll(pair.commandLine.arguments(/*artifactExpander=*/ null, stripPaths)); | ||
arguments.addAll(pair.commandLine.arguments(/*artifactExpander=*/ null, PathMapper.NOOP)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no longer any path where allArguments()
or its callers apply path stripping? What's the implication of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overloads that do not accept a non-trivial ArtifactExpander
are only used during analysis, not during execution (they couldn't possibly handle tree artifacts correctly). I think that path mapping should only apply to execution, otherwise it could break Starlark analysis tests or cause aquery output to vary based on the path stripping setting. That is why I removed the PathStripper
arguments, which as an added benefit gets rid of a number of overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some spelunking shows me that alArguments(PathMapper)
was first added to add aquery support, to support shell-based integration testing: 9ad0a73
That's the non-open sourced test we'e talked about. It relies on aquery output to reveal command lines and check them against path stripping. There's nothing secret in that test so I'm happy to share the contents (recall the only reason it's internal is because it runs on the internal executor).
It's quite possible we don't need to rely on aquery for this kind of test. Can you think of a good alternative, especially with all this change? (again, I know it's hard to evaluate without seeing the test itself).
If we don't have a quick alternative, I'd probably want to keep in the allArguments()
overloads so we don't break the existing test coverage.
Or we could fast-track collaborating on better testing.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the test, see https://gist.github.com/gregestren/cfcdd55a7620b6c6ce3d5e40f633ce88.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to aquery
could be to run a bazel clean
in the beginning of the test and then use --subcommands
. What do you think about that? Mid-term aquery
won't work as PathMapper
s could use the hashes of input files to form paths, but aquery
doesn't have access to this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely recall --subcommands
having problems seeing the same paths as the actual action. Maybe that's not an issue with your latest refactorings? I think this is another case of trying to follow which callers call which variations of arguments expansion.
We could also consider a Java integration test that intercepts the logic just before it goes to an actual executor. That might require a mock executor to work generically. But it could offer universal testing regardless of executor. Then use shell tests for black-box execution coverage, i.e "expect build B works and don't care how the logic is implemented".
What would the story for full Bazel test support look like? We've been leaning toward excluding local execution from path mapping because it's not fundamentally compatible with the feature. But some variations of local execution are compatible, and I believe your intention is that your code naturally works with those implementations? Or would it be easier to support in-test RBE calls than I imagine? Or should we make a generic mock executor? I worry a mock would be complicated since part of "build success" involves knowing Java, etc. compilation runs correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from details there's also process:
I'm not thrilled about it, but we could drop existing testing for the purpose of landing your refactoring. Then build new testing as a hopefully quick followup. Most of @aranguyen 's current testing is running standard CI with a custom Bazel that uses path mapping. She can still do that even if direct tests for this feature are disabled.
src/main/java/com/google/devtools/build/lib/actions/PathMapper.java
Outdated
Show resolved
Hide resolved
OutputPathsMode effectiveOutputPathsMode = getEffectiveOutputPathsMode(outputPathsMode, | ||
mnemonic); | ||
if (effectiveOutputPathsMode == OutputPathsMode.STRIP) { | ||
fingerprint.addString(StrippingPathMapper.GUID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this GUID specifically? Isn't the PathMappers interface more general than just StrippingPathMapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more general, but in order to map output path modes to actual instances, we still need to reference them here. I could see adding another layer of indirection once we get more modes, but felt that this would have been too much at this time given that we only really have one.
@@ -112,8 +110,8 @@ public class SpawnAction extends AbstractAction implements CommandAction { | |||
|
|||
private final ResourceSetOrBuilder resourceSetOrBuilder; | |||
private final ImmutableMap<String, String> executionInfo; | |||
private final Consumer<Pair<ActionExecutionContext, List<SpawnResult>>> resultConsumer; | |||
private final boolean stripOutputPaths; | |||
private final ResultConsumer resultConsumer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this refactoring for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consumer now has to accept a third argument, the PathMapper
. There is no ready-made Triple
interface, so I crafted my own @FunctionalInterface
- that's all there is to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will go away with the resolution of merge conflicts as @justinhorvitz has gone rid of it.
* <p>Since we don't currently have a proper Starlark API for this, we hard-code support for | ||
* specific actions we're interested in. | ||
* | ||
* <p>The difference between this and {@link PathMapper#mapCustomStarlarkArgs} is this triggers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still relevant or no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially yes. Now that the opt-in logic has moved to PathMapper
, it is in the same place as the logic that maps custom Starlark args, which may make this comment less necessary.
args.accept(CommandLineItem.expandToCommandLine(object)); | ||
} | ||
}; | ||
final boolean isJavaAction = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for the purpose of handling --javacopts
, right?
I realize there are some specific action references in this file. Over time I think we want to aim for eliminating such references in preference for embedding them in the action definition itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think that this one will remain the only one that we need to handle specially: Starlarkified native rules already use custom location expansion logic that in large parts doesn't go through ctx.expand_locations
. I am working on a pure Starlark alternative that would emit an Args
objects rather than strings, which would allow the regular structured command line path mapping to apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment to clarify that?
Oh, this is big enough where I'm unsure of the testing story. We have some integration tests in Google, which are Google-internal because they're hooked up to the Google executor. When we're comfortable with this we should run this PR by them to check for bugs. We don't have any testing for just the command-line rewriting (independent of executor), do we? This could be a good opportunity to add some. Would you be interested? I can work with you on that too. |
I am interested in adding tests. My follow-up PR includes integration tests for path stripping in Bazel, but I agree that more fine-grained tests would be helpful too. Do you already have a concept for such a test in mind? |
src/main/java/com/google/devtools/build/lib/analysis/actions/CustomCommandLine.java
Show resolved
Hide resolved
- Put path stripping assertion logic into a helper library. - Open-source Java and in-memory jdeps tests - Don't yet open-source Android tests (takes more setup to work in Bazel). All Bazel tests disabled until #18098 (and followup) is merged. That's because we need a compatible executor for path mapping to work. Partially fixes #19525. PiperOrigin-RevId: 570518512 Change-Id: Ic8ee9718b93150df3cb071d9e5e977ca6244f382
This provides a simple and backwards compatible way for actions to experiment with path mapping: * rules authors can add `supports-path-mapping` to the `execution_requirements` of a Starlark action * end users can use tags and `--modify_execution_info` to opt individual targets or actions in and out of path mapping. Also includes StrippingPathMapperTest, which wasn't included in the merge of bazelbuild#18098, with an additional test for a Starlark rule that opts in via the mechanisms described above.
This provides a simple and backwards compatible way for actions to experiment with path mapping: * rules authors can add `supports-path-mapping` to the `execution_requirements` of a Starlark action * end users can use tags and `--modify_execution_info` to opt individual targets or actions in and out of path mapping. Also includes StrippingPathMapperTest, which wasn't included in the merge of bazelbuild#18098, with an additional test for a Starlark rule that opts in via the mechanisms described above.
This provides a simple and backwards compatible way for actions to experiment with path mapping: * rules authors can add `supports-path-mapping` to the `execution_requirements` of a Starlark action * end users can use tags and `--modify_execution_info` to opt individual targets or actions in and out of path mapping. Also includes StrippingPathMapperTest, which wasn't included in the merge of bazelbuild#18098, with an additional test for a Starlark rule that opts in via the mechanisms described above.
This provides a simple and backwards compatible way for actions to experiment with path mapping: * rules authors can add `supports-path-mapping` to the `execution_requirements` of a Starlark action * end users can use tags and `--modify_execution_info` to opt individual targets or actions in and out of path mapping. Also includes StrippingPathMapperTest, which wasn't included in the merge of bazelbuild#18098, with an additional test for a Starlark rule that opts in via the mechanisms described above.
This provides a simple and backwards compatible way for actions to experiment with path mapping: * rules authors can add `supports-path-mapping` to the `execution_requirements` of a Starlark action * end users can use tags and `--modify_execution_info` to opt individual targets or actions in and out of path mapping. Also includes StrippingPathMapperTest, which wasn't included in the merge of bazelbuild#18098, with an additional test for a Starlark rule that opts in via the mechanisms described above.
@fmeum I observed another interesting case where bazel/src/main/java/com/google/devtools/build/lib/analysis/actions/StrippingPathMapper.java Line 243 in 1c45f49
--experimental_output_path=off . Any idea on why this could be the case. I'm seeing this for Javac action
|
Could you explain this in more detail? Does the commandline differ between |
This provides a simple and backwards compatible way for actions to experiment with path mapping: * rules authors can add `supports-path-mapping` to the `execution_requirements` of a Starlark action * end users can use tags and `--modify_execution_info` to opt individual targets or actions in and out of path mapping. Also includes `StrippingPathMapperTest`, which wasn't included in the merge of #18098, with an additional test for a Starlark rule that opts in via the mechanisms described above. Closes #19526 Work towards #6526 Closes #19717. PiperOrigin-RevId: 571934863 Change-Id: Ia94101d10d07599c20a42c958417349bb37d4682
Yes that is the exact situation. Here is an example comparison of the commandlines between the twos builds with binaries having This particular action failed during reduced classpath which prompted me to look at the replies from the internal executor and they came out looking different, causing |
@aranguyen I added test coverage for this case to #18155 (both in I am available at 8:30 AM EST and later, just send an invite and we can debug this together. |
@fmeum thanks. Let me take a look at your test in #18155 . It's very possible that this is something internally as you mentioned because the PathMapper in the spawn is also used later to prepare the input mappings for the executor. Let me tripple check the internal side of thing and will report back. If not already done so, would you be able to see if the list of Deps.dependencies is same for both cases here? So let's hold on the video chat while I verify things internally first. |
I pushed a new version of the latest commit that ensures that the library with effective path mapping |
@fmeum I finally understand what is going on. Here is the summary:
|
Thanks for the investigation and detailed explanation, I think I now understand the problem.
Could we fix this type of issue by making sure that I added some debugging output to |
We do have this code here to map paths to the original but I am not how it ties with
sgtm. Thank you and I'll update on my side as well. |
@aranguyen I needed to add a4e1d7b#diff-738efdd95b5eedf99d16e391201cb00e284b0ccf3542506f6c88cb01ffb96830 (a new commit in #18155) to ensure that all |
@fmeum What is this additional |
@aranguyen The earlier call only applies if the reduced classpath spawn succeeds. If it doesn't, the output deps proto is overwritten with the result of the full spawn, which wasn't rewritten correctly without this change. |
I was able to resolve the issue by making the logic in getReduceClasspath to not be sensitive to the mixed paths. Like I said in earlier comment. I still see mixed paths for full class path case. |
* Strips the configuration prefix from an output artifact's exec path. | ||
*/ | ||
private static PathFragment strip(PathFragment execPath) { | ||
return execPath.subFragment(0, 1).getRelative(execPath.subFragment(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified this to
private static PathFragment strip(PathFragment execPath) {
String config = "/" + execPath.subFragment(1, 2).getPathString();
String mappedPath = execPath.getPathString().replace(config, "");
return PathFragment.create(mappedPath);
}
to take care of paths in the form blaze-out/k8-fasbuilds/bin/foo/blaze-out/k8-fasbuilds/genfiles/bar
. The existing one will make the path look like blaze-out/bin/foo/blaze-out/k8-fasbuilds/genfiles/bar
. Let me know if you have any concern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No concern, but I would be very interested in learning where paths like this show up.
// and "-", followed by another slash. This would miss substrings like | ||
// "bazel-out/k8-fastbuild". But those don't represent actual outputs (all outputs would have | ||
// to have names beneath that path). So we're not trying to replace those. | ||
return Pattern.compile(outputRoot + "/[\\w_-]+/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've clarified pretty well what's going on with the .jdeps concern. We found out that Google's Java header compiler runs some actions that where the outputs can embed the paths of the inputs. It unfortunately does this in a way that's really hard to capture and post-process on the Bazel side: JavaCompileAction.createFullOutputDeps isn't powerful enough to capture it (the mapping can be present in paths that aren't part of the action's direct inputs, but rather its transitive inputs). With the failing instance I'm aware of, it turns out it was triggering bazel/src/main/java/com/google/devtools/build/lib/analysis/actions/StrippingPathMapper.java Lines 262 to 263 in 0a2aac4
It disabled path mapping for that Java compilation when it wasn't actually necessary to disable it. The whole failure mode is pretty obscure. I think we can heavily minimize it's likelihood by a) fixing the above TODO (if all Java actions are consistently mapped this problem doesn't occur) The only really problematic use case is when we have Java actions of the form: |
I looked into this a bit more and finally understood the root cause better. The Turbine behavior is pretty well explained in google/turbine@f9f2dec: The paths of transitive dependencies are embedded in the header jars so that compiling against the direct deps is sufficient absent annotation processors. I was able to reproduce failures due to this by compiling Bazel itself with path mapping enabled and stricter checks in I think that this can be fixed by taking the additional classpath entries into account even if they aren't inputs to the action. I submitted a fix in #19872 that allows me to compile Bazel itself without issues. |
The new
PathMappers
helper class is introduced to be the canonical place for actions to createPathMapper
instances from. As a result, the particular kind ofPathMapper
can now be switched out without having to change individual actions.This requires a larger refactoring of how
PathMapper
instances are created and passed around, which results in a number of additional improvements:PathMapper
instances are now no longer retained inAction
, which improves the memory footprint of path mapping. Instances are created only when anAction
is converted into aSpawn
for execution, with one exception: For testing purposes,aquery
causes the creation of aPathMapper
that is applied to the command lines it reports.PathMapper
viaSpawn#getPathMapper
, participating executors automatically use the same instance that was used to map paths in the action's command lines, which ensures consistency.PathMapper
instance, thus ensuring incremental correctness when the value of the--experimental_output_path_mode
flag or the implementation of a particularPathMapper
changes.PathMapper
s are no longer passed into builders for structured command lines, but are supplied when expanding command lines. In addition to not retaining them, this also reduces the number of places in which participating actions have to pass inPathMapper
s.Apart from completely pure refactorings, two more interesting changes were necessary to prevent
PathMapper
instances from being constructed during analysis:--javacopts
), a new function onPathMapper
now allows mapping arbitray string-valued vector args. This is only relevant for native rules as Starlark rules can perform location expansion manually and emit a structured command line. A helper function for this may be provided in the future.