Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The trace for an error in a context with a lot of dependents is redundant #3695

Closed
stuhood opened this issue Jul 19, 2016 · 4 comments · Fixed by #6013
Closed

The trace for an error in a context with a lot of dependents is redundant #3695

stuhood opened this issue Jul 19, 2016 · 4 comments · Fixed by #6013
Assignees
Milestone

Comments

@stuhood
Copy link
Sponsor Member

stuhood commented Jul 19, 2016

The Graph.trace for a cycle detected in a context where the failed/cyclic Node has a lot of dependents is very redundant. See @ericzundel 's response to my question on https://rbcommons.com/s/twitter/r/4087/ for an example.

To repro, introduce a failure inside of a graph, and then list a target spec that covers a bunch of things that depend on it... ie: break guava in 3rdparty, and then attempt to run ./pants list ::.

Most likely what we want to do is invert the trace: rather than rendering downward from the roots, we should discover all failed nodes, and then, for each node, choose some number of relevant paths (or just one: the shortest...?) to the root to render.

@baroquebobcat
Copy link
Contributor

I think that's a good idea. Then it becomes a sorting problem to present the most useful error paths.

@stuhood
Copy link
Sponsor Member Author

stuhood commented Mar 22, 2017

This is now related to #4358... because SelectTransitive is used to walk target dependencies, cycle detection messages are now pushed back to where they are in the v1 engine (ie, only during sort_targets). Removing from the milestone.

@stuhood stuhood removed this from the v2 engine: daemon-by-default beta milestone Mar 22, 2017
@stuhood stuhood changed the title The trace for a cycle detected in a context with a lot of dependents is redundant The trace for an error in a context with a lot of dependents is redundant Apr 28, 2017
@stuhood stuhood added this to the Daemon Beta milestone Dec 16, 2017
@stuhood stuhood assigned caveness and unassigned caveness Dec 20, 2017
@stuhood
Copy link
Sponsor Member Author

stuhood commented Dec 20, 2017

Looked into this a bit with @caveness , and determined that since switching to Futures, we now eagerly cancel all other work after the first failure, which results in the effect we want here, with the exception of one small issue:

Cancelled nodes have no EntryStateField value, so they render as <none>. It cleans up the trace considerably to treat cancelled nodes as successful. Will tackle this tonight.

@stuhood stuhood self-assigned this Dec 20, 2017
kwlzn pushed a commit that referenced this issue Dec 27, 2017
Problem
#3695 initially described a solution to a problem experienced in the original "slow-failing" implementation of the engine: if there were multiple paths between a root and a failure, we would render all of them, which was hugely redundant.

But since that ticket was opened, we switched to a fast-failing implementation via the switch to Futures. When one dependency of a Node fails, the rest are effectively cancelled, since Futures are pull based. This (should) have the effect of rendering exactly one failure path per root: the first failure that is encountered while computing that root.

But it was still the case that we were printing redundant/useless information: cancelled dependencies were being rendered, which in the case of things like ./pants list :: resulted in a lot of noise.

Solution
Since a cancelled node is never the source of a failure, don't render them.

Result
The added test no longer renders a <None> entry, like:

Computing Select(<pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A)
  Computing Task(<class 'pants_test.engine.test_engine.A'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A)
    Computing Task(<class 'pants_test.engine.test_engine.D'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =D)
      <None>
    Computing Task(<function nested_raise at 0xEEEEEEEEE>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =C)
      Throw(An exception for B)
        Traceback (most recent call last):
          File LOCATION-INFO, in extern_invoke_runnable
            val = runnable(*args)
          File LOCATION-INFO, in nested_raise
            fn_raises(x)
          File LOCATION-INFO, in fn_raises
            raise Exception('An exception for {}'.format(type(x).__name__))
        Exception: An exception for B
Fixes #3695.
baroquebobcat pushed a commit to twitter/pants that referenced this issue Dec 28, 2017
Problem
pantsbuild#3695 initially described a solution to a problem experienced in the original "slow-failing" implementation of the engine: if there were multiple paths between a root and a failure, we would render all of them, which was hugely redundant.

But since that ticket was opened, we switched to a fast-failing implementation via the switch to Futures. When one dependency of a Node fails, the rest are effectively cancelled, since Futures are pull based. This (should) have the effect of rendering exactly one failure path per root: the first failure that is encountered while computing that root.

But it was still the case that we were printing redundant/useless information: cancelled dependencies were being rendered, which in the case of things like ./pants list :: resulted in a lot of noise.

Solution
Since a cancelled node is never the source of a failure, don't render them.

Result
The added test no longer renders a <None> entry, like:

Computing Select(<pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A)
  Computing Task(<class 'pants_test.engine.test_engine.A'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A)
    Computing Task(<class 'pants_test.engine.test_engine.D'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =D)
      <None>
    Computing Task(<function nested_raise at 0xEEEEEEEEE>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =C)
      Throw(An exception for B)
        Traceback (most recent call last):
          File LOCATION-INFO, in extern_invoke_runnable
            val = runnable(*args)
          File LOCATION-INFO, in nested_raise
            fn_raises(x)
          File LOCATION-INFO, in fn_raises
            raise Exception('An exception for {}'.format(type(x).__name__))
        Exception: An exception for B
Fixes pantsbuild#3695.
@stuhood
Copy link
Sponsor Member Author

stuhood commented Mar 16, 2018

Reopening this post #5605 because it repros for the cycle detection triggered by #4358.

@stuhood stuhood reopened this Mar 16, 2018
stuhood pushed a commit that referenced this issue Mar 19, 2018
### Problem

While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). 

Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). 

### Solution

* Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515).
* Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533).

### Result

Cycles should be detected much earlier with:
```
Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s):
Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
  Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
    Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget)
      Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget)
        Throw(No source of required dependency: Dep graph contained a cycle.)
          Traceback (no traceback):
            <pants native internals>
          Exception: No source of required dependency: Dep graph contained a cycle.
```
_(more polish needed here: re-opened #3695 to clean up `trace`)_

And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots:
```
# before:
$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m15.297s
user	0m14.603s
sys	0m1.625s

# after:
$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m5.989s
user	0m5.261s
sys	0m1.310s
```

Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
stuhood pushed a commit that referenced this issue Mar 20, 2018
While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533).

Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515).

* Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515).
* Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533).

Cycles should be detected much earlier with:
```
Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s):
Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
  Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
    Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget)
      Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget)
        Throw(No source of required dependency: Dep graph contained a cycle.)
          Traceback (no traceback):
            <pants native internals>
          Exception: No source of required dependency: Dep graph contained a cycle.
```
_(more polish needed here: re-opened #3695 to clean up `trace`)_

And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots:
```
$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m15.297s
user	0m14.603s
sys	0m1.625s

$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m5.989s
user	0m5.261s
sys	0m1.310s
```

Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
stuhood added a commit to twitter/pants that referenced this issue Jul 3, 2018
stuhood pushed a commit that referenced this issue Jul 3, 2018
### Problem

#4558 introduces a new state for a `Node`: it may be `dirty`, which requires that we remember the previous value of a `Node` while simultaneously computing its next value. I began implementing this state machine with our existing `future::Shared` usage, but ran into the need to either hide values in the closure of the running future, or store values atomically outside of the future's value. This complexity represents a classic case for a state machine.

Additionally, we have seen that `future::Shared` has a relatively high cost in terms of complexity and memory (see #5990).

### Solution

In order to make #4558 easier to implement correctly (and to reduce memory usage), implement the storage of a `Node`'s value in the `Graph` via a state machine.

One of the largest complexities of `Shared` is that it needs to elect one of the waiters of the `Shared` future as the one who will actually do the work of `poll`ing the wrapped future. To avoid that complexity, we introduce usage of the `tokio` `Runtime` for _all_ `Node` executions. This allows waiters to stay very simple, and receive the result value via a `oneshot::channel`.

### Result

#4558 should be much easier to implement. I'll likely wait to land this until #4558 is further along. 

This change reduces memory usage about 200MB further than the fix from #5990 (down to about 2.5GB for `./pants list ::` in Twitter's repo).  Unfortunately, in its current form it also represents a performance regression of about 10% for that same usecase. Although I believe I have a handle on why, I'd like to defer fixing that regression until after #4558 is implemented.

This also fixes #3695, as since we now fail slowly, we are able to render multiple distinct failures at once.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants