Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

./pants --changed .. does not work for deleted targets #382

Closed
jsirois opened this issue Jul 23, 2014 · 12 comments
Closed

./pants --changed .. does not work for deleted targets #382

jsirois opened this issue Jul 23, 2014 · 12 comments

Comments

@jsirois
Copy link
Contributor

jsirois commented Jul 23, 2014

The WhatChanged task has 2 modes:

  1. list changed files
  2. list the targets owning the changed files (what targets were affected by the change)

Mode 2 uses the data from mode 1 and for each changed file looks for the target that owns the file. This check is done in the current workspace and so no target can ever own deleted files from 1. It seems the task either needs to back up to the prior change safely, check, then roll forward - scary, or else have a way to run payload source globbing against path names without real fs path checks. In otherwords globs would have to run in a predicate mode, "do you select this path" as opposed to the current mode of select all matching paths from the filesystem.

@tejal29
Copy link
Contributor

tejal29 commented Jul 23, 2014

I had some initial groundwork done for this in the Twitter Internal ticket. Will add you as a watcher there.

@areitz
Copy link
Contributor

areitz commented Mar 11, 2015

I agree with John's analysis. Mode #1 (using the --changed-files argument) works fine in the face of deleted files. It is mode #2 which is the issue. In more detail, this line of code:

https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/core/tasks/what_changed.py#L61

Does two things: generates a list of changed files (which works fine in the case of deleted files), and for each changed file, tries to map it to a target via self._mapper.target_addresses_for_source(changedfile). Digging into this one, pants goes into _find_owners():

https://github.com/pantsbuild/pants/blob/master/src/python/pants/base/lazy_source_mapper.py#L72

Where in a loop, it is analyzing the path of the changed file, and trying to find the nearest BUILD file. Once a candidate BUILD file has been selected, it calls target.sources_relative_to_buildroot():

https://github.com/pantsbuild/pants/blob/master/src/python/pants/base/lazy_source_mapper.py#L100

This is the method that examines the actual files on disk, and maps them to a target.

As John stated, we can either attempt some git-fu to allow target.sources_relative_to_buildroot() to run in the previous SHA (before the file was deleted), or we can create some new machinery to thread the actual changed file path into the code that is looking at the sources that are attached to a target.

@areitz
Copy link
Contributor

areitz commented Mar 11, 2015

I just thought of an interesting hack. What if, pants tracked all of the deleted files separately. Then, before this call:

https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/core/tasks/what_changed.py#L61

pants could touch all deleted files back into existence, run the check, and on the other side, re-delete them again. It would be doing some unnecessary work (adding 0-byte files & then removing them), but it would avoid changing the guts of lazy_source_mapper.py.

@jsirois
Copy link
Contributor Author

jsirois commented Mar 12, 2015

I don't find that hack any less scary than the hack I described of rolling back, checking and rolling forward. The predicate mode is currently defeated only by pants/base/validation.py:13 afaict which turns a Fileset into a list, throwing away any predicate mode that might be added to Fileset.

@baroquebobcat
Copy link
Contributor

I think if we made globs lazy, we could do this more easily. If globs / target sources were lazy, then targets would have a list of the globs they use for looking up files. We could then compare those globs to the git changed file list to derive the targets that had changed.

@baroquebobcat
Copy link
Contributor

Another issue with the deleted / changed files detection is that if a BUILD file has a target removed, what changed doesn't know to check for dependees of the targets that no longer exist to ensure they are still valid.

@fkorotkov
Copy link
Contributor

Seems we have 2 different cases:

  1. Source/Resource files was changed/removed
  2. BUILD file was changed/removed

For 1 I like Nick's proposal. Make globs/rglobs lazy and use globs in LazySourceMapper instead of mapping from existing files to targets.

For 2 it seems we need to walk the build graph anyway. If a BUILD file was change we should automatically mark all targets in it as changed and walk the graph to find targets that have dependencies on targets that were removed from the BUILD file. We should add them to ChangeCalculator#_directly_changed_targets.

Changes that needs to be implemented:

  1. Make globs lazy
  2. Use globs in LazySourceMapper(Ask a glob if it claims a changed file)
  3. Respect removals in BUILD files(Check build graph for absent dependencies to the BUILD file)

@areitz
Copy link
Contributor

areitz commented Aug 7, 2015

Does change #3 require walking the entire build graph?

@fkorotkov
Copy link
Contributor

@areitz only in case of changes to BUILD files. I don't see other way to find targets that depend on a removed one.

@fkorotkov
Copy link
Contributor

Addressed 1 and 2 in #1944

@stuhood stuhood changed the title ./pants goal changed does not work for deleted files ./pants goal changed does not work for deleted targets Mar 28, 2018
@stuhood stuhood changed the title ./pants goal changed does not work for deleted targets ./pants --changed .. does not work for deleted targets Mar 28, 2018
@stuhood
Copy link
Sponsor Member

stuhood commented Mar 28, 2018

I've updated the title to account for the current state of the world. This is somewhat mitigated by a workaround that forces the equivalent of ./pants list :: in order to compute --changed-include-dependees. This has the effect of detecting dependents of deleted targets and failing, but without --changed-include-dependees, no changes will be detected at all.

A holistic solution to this problem is likely possible in the new engine (and perhaps not even that hard), by adding an explicit mode of operation that applies a filesystem overlay for the length of a request... that would allow us to delta the graph and... do something with it. TBD.

stuhood pushed a commit that referenced this issue Mar 28, 2018
### Problem

#5579 broke detection of deleted targets.

### Solution

As described in #382 (mega classic!), it might be possible to more deeply integrate change detection into the v2 engine itself to compute a delta on the graph. But for now we defer a deeper solution and simply ensure that we fail for deleted targets by transitively expanding targets. Adds a test to cover the behaviour.

### Result

Due to fully hydrating targets, this represents a linear performance regression from #5579: the runtime of `--changed-include-dependees=transitive` for a small set of roots used to be slightly lower than the runtime for `./pants list ::`, because the operation that occurred "on the entire graph" was computing `HydratedStructs`, rather than computing `TransitiveHydratedTargets`.

The impact for exactly that step is constant, and fairly high:
```
# before:
  DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =Collection.of(Struct)) completed.
  DEBUG] computed 1 nodes in 1.709688 seconds. there are 8858 total nodes.

# after:
  DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =TransitiveHydratedTargets) completed.
  DEBUG] computed 1 nodes in 2.989497 seconds. there are 15916 total nodes.
```
... but the impact on overall runtime is dependent on the count of targets that are transitively affected, because for all affected targets, we're going to need to compute transitive roots anyway. So for the example change from #5579 which affects 567 targets:
```
time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l
# before:
  real	0m4.877s
  user	0m4.081s
  sys	0m1.068s

# after
  real	0m5.294s
  user	0m4.487s
  sys	0m1.142s
```
For a change impacting only 14 targets the difference is slightly more pronounced:
```
$ time ./pants --changed-diffspec=f35e1e6fb1cdf45fcb5080cfe567bdbae8060125^..f35e1e6 --changed-include-dependees=transitive list | wc -l
# before:
  real	0m4.279s
  user	0m3.376s
  sys	0m1.011s

# after:
  real	0m4.954s
  user	0m4.284s
  sys	0m1.120s
```
wisechengyi pushed a commit to wisechengyi/pants that referenced this issue Mar 31, 2018
As described in pantsbuild#382 (mega classic!), it might be possible to more deeply integrate change detection into the v2 engine itself to compute a delta on the graph. But for now we defer a deeper solution and simply ensure that we fail for deleted targets by transitively expanding targets. Adds a test to cover the behaviour.

Due to fully hydrating targets, this represents a linear performance regression from pantsbuild#5579: the runtime of `--changed-include-dependees=transitive` for a small set of roots used to be slightly lower than the runtime for `./pants list ::`, because the operation that occurred "on the entire graph" was computing `HydratedStructs`, rather than computing `TransitiveHydratedTargets`.

The impact for exactly that step is constant, and fairly high:
```
  DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =Collection.of(Struct)) completed.
  DEBUG] computed 1 nodes in 1.709688 seconds. there are 8858 total nodes.

  DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =TransitiveHydratedTargets) completed.
  DEBUG] computed 1 nodes in 2.989497 seconds. there are 15916 total nodes.
```
... but the impact on overall runtime is dependent on the count of targets that are transitively affected, because for all affected targets, we're going to need to compute transitive roots anyway. So for the example change from pantsbuild#5579 which affects 567 targets:
```
time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l
  real	0m4.877s
  user	0m4.081s
  sys	0m1.068s

  real	0m5.294s
  user	0m4.487s
  sys	0m1.142s
```
For a change impacting only 14 targets the difference is slightly more pronounced:
```
$ time ./pants --changed-diffspec=f35e1e6fb1cdf45fcb5080cfe567bdbae8060125^..f35e1e6 --changed-include-dependees=transitive list | wc -l
  real	0m4.279s
  user	0m3.376s
  sys	0m1.011s

  real	0m4.954s
  user	0m4.284s
  sys	0m1.120s
```
@stuhood
Copy link
Sponsor Member

stuhood commented May 3, 2019

So, after #7581, I've come to the conclusion that using --changed-include-dependees=transitive makes this a non-issue, due to that codepath now being eager, and due to test coverage on that fact.

While we would not report a deleted target in changed, we would successfully detect any dependents of a deleted target and error for them. And I think that that is all that can be reasonably expected for this mode of operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants