The trace for an error in a context with a lot of dependents is redundant #3695

stuhood · 2016-07-19T20:24:37Z

The Graph.trace for a cycle detected in a context where the failed/cyclic Node has a lot of dependents is very redundant. See @ericzundel 's response to my question on https://rbcommons.com/s/twitter/r/4087/ for an example.

To repro, introduce a failure inside of a graph, and then list a target spec that covers a bunch of things that depend on it... ie: break guava in 3rdparty, and then attempt to run ./pants list ::.

Most likely what we want to do is invert the trace: rather than rendering downward from the roots, we should discover all failed nodes, and then, for each node, choose some number of relevant paths (or just one: the shortest...?) to the root to render.

The text was updated successfully, but these errors were encountered:

baroquebobcat · 2017-01-25T21:07:39Z

I think that's a good idea. Then it becomes a sorting problem to present the most useful error paths.

stuhood · 2017-03-22T19:28:04Z

This is now related to #4358... because SelectTransitive is used to walk target dependencies, cycle detection messages are now pushed back to where they are in the v1 engine (ie, only during sort_targets). Removing from the milestone.

stuhood · 2017-12-20T21:35:41Z

Looked into this a bit with @caveness , and determined that since switching to Futures, we now eagerly cancel all other work after the first failure, which results in the effect we want here, with the exception of one small issue:

Cancelled nodes have no EntryStateField value, so they render as <none>. It cleans up the trace considerably to treat cancelled nodes as successful. Will tackle this tonight.

Problem #3695 initially described a solution to a problem experienced in the original "slow-failing" implementation of the engine: if there were multiple paths between a root and a failure, we would render all of them, which was hugely redundant. But since that ticket was opened, we switched to a fast-failing implementation via the switch to Futures. When one dependency of a Node fails, the rest are effectively cancelled, since Futures are pull based. This (should) have the effect of rendering exactly one failure path per root: the first failure that is encountered while computing that root. But it was still the case that we were printing redundant/useless information: cancelled dependencies were being rendered, which in the case of things like ./pants list :: resulted in a lot of noise. Solution Since a cancelled node is never the source of a failure, don't render them. Result The added test no longer renders a <None> entry, like: Computing Select(<pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.A'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.D'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =D) <None> Computing Task(<function nested_raise at 0xEEEEEEEEE>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =C) Throw(An exception for B) Traceback (most recent call last): File LOCATION-INFO, in extern_invoke_runnable val = runnable(*args) File LOCATION-INFO, in nested_raise fn_raises(x) File LOCATION-INFO, in fn_raises raise Exception('An exception for {}'.format(type(x).__name__)) Exception: An exception for B Fixes #3695.

Problem pantsbuild#3695 initially described a solution to a problem experienced in the original "slow-failing" implementation of the engine: if there were multiple paths between a root and a failure, we would render all of them, which was hugely redundant. But since that ticket was opened, we switched to a fast-failing implementation via the switch to Futures. When one dependency of a Node fails, the rest are effectively cancelled, since Futures are pull based. This (should) have the effect of rendering exactly one failure path per root: the first failure that is encountered while computing that root. But it was still the case that we were printing redundant/useless information: cancelled dependencies were being rendered, which in the case of things like ./pants list :: resulted in a lot of noise. Solution Since a cancelled node is never the source of a failure, don't render them. Result The added test no longer renders a <None> entry, like: Computing Select(<pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.A'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.D'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =D) <None> Computing Task(<function nested_raise at 0xEEEEEEEEE>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =C) Throw(An exception for B) Traceback (most recent call last): File LOCATION-INFO, in extern_invoke_runnable val = runnable(*args) File LOCATION-INFO, in nested_raise fn_raises(x) File LOCATION-INFO, in fn_raises raise Exception('An exception for {}'.format(type(x).__name__)) Exception: An exception for B Fixes pantsbuild#3695.

stuhood · 2018-03-16T05:16:04Z

Reopening this post #5605 because it repros for the cycle detection triggered by #4358.

### Problem While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). ### Solution * Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515). * Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533). ### Result Cycles should be detected much earlier with: ``` Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s): Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget) Throw(No source of required dependency: Dep graph contained a cycle.) Traceback (no traceback): <pants native internals> Exception: No source of required dependency: Dep graph contained a cycle. ``` _(more polish needed here: re-opened #3695 to clean up `trace`)_ And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots: ``` # before: $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m15.297s user 0m14.603s sys 0m1.625s # after: $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m5.989s user 0m5.261s sys 0m1.310s ``` Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.

While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). * Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515). * Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533). Cycles should be detected much earlier with: ``` Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s): Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget) Throw(No source of required dependency: Dep graph contained a cycle.) Traceback (no traceback): <pants native internals> Exception: No source of required dependency: Dep graph contained a cycle. ``` _(more polish needed here: re-opened #3695 to clean up `trace`)_ And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots: ``` $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m15.297s user 0m14.603s sys 0m1.625s $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m5.989s user 0m5.261s sys 0m1.310s ``` Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.

…y root per distinct failure. Fixes pantsbuild#3695.

### Problem #4558 introduces a new state for a `Node`: it may be `dirty`, which requires that we remember the previous value of a `Node` while simultaneously computing its next value. I began implementing this state machine with our existing `future::Shared` usage, but ran into the need to either hide values in the closure of the running future, or store values atomically outside of the future's value. This complexity represents a classic case for a state machine. Additionally, we have seen that `future::Shared` has a relatively high cost in terms of complexity and memory (see #5990). ### Solution In order to make #4558 easier to implement correctly (and to reduce memory usage), implement the storage of a `Node`'s value in the `Graph` via a state machine. One of the largest complexities of `Shared` is that it needs to elect one of the waiters of the `Shared` future as the one who will actually do the work of `poll`ing the wrapped future. To avoid that complexity, we introduce usage of the `tokio` `Runtime` for _all_ `Node` executions. This allows waiters to stay very simple, and receive the result value via a `oneshot::channel`. ### Result #4558 should be much easier to implement. I'll likely wait to land this until #4558 is further along. This change reduces memory usage about 200MB further than the fix from #5990 (down to about 2.5GB for `./pants list ::` in Twitter's repo). Unfortunately, in its current form it also represents a performance regression of about 10% for that same usecase. Although I believe I have a handle on why, I'd like to defer fixing that regression until after #4558 is implemented. This also fixes #3695, as since we now fail slowly, we are able to render multiple distinct failures at once.

stuhood added the engine label Jul 19, 2016

stuhood mentioned this issue Jul 25, 2016

When failing to find targets in an ignored directory, the error message should reflect the ignorance #3728

Closed

kwlzn added this to the v2 engine adoption milestone Jul 29, 2016

kwlzn added the UX label Jul 29, 2016

stuhood mentioned this issue Oct 31, 2016

Restore trace #4007

Closed

stuhood mentioned this issue Nov 19, 2016

Missing addresses on the command line cause unfriendly error messages in the new engine #3912

Closed

stuhood removed this from the v2 engine: daemon-by-default beta milestone Mar 22, 2017

stuhood mentioned this issue Apr 28, 2017

[engine] undefined names in BUILD files trigger a double trace #4530

Closed

stuhood changed the title ~~The trace for a cycle detected in a context with a lot of dependents is redundant~~ The trace for an error in a context with a lot of dependents is redundant Apr 28, 2017

stuhood added this to the Daemon Beta milestone Dec 16, 2017

stuhood mentioned this issue Dec 19, 2017

Import statements can be banned in BUILD files #5180

Merged

stuhood assigned caveness and unassigned caveness Dec 20, 2017

stuhood self-assigned this Dec 20, 2017

stuhood mentioned this issue Dec 27, 2017

Don't render cancelled nodes in trace #5252

Merged

kwlzn closed this as completed in #5252 Dec 27, 2017

stuhood reopened this Mar 16, 2018

stuhood mentioned this issue Mar 16, 2018

Batch execution of address Specs and remove SelectTransitive #5605

Merged

stuhood mentioned this issue Jun 26, 2018

Switch to a per-entry state machine in Graph #6013

Merged

stuhood added a commit to twitter/pants that referenced this issue Jul 3, 2018

Switch the implementation of tracing to tracing a shortest path to an…

a368f7a

…y root per distinct failure. Fixes pantsbuild#3695.

stuhood closed this as completed in #6013 Jul 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The trace for an error in a context with a lot of dependents is redundant #3695

The trace for an error in a context with a lot of dependents is redundant #3695

stuhood commented Jul 19, 2016 •

edited

Loading

baroquebobcat commented Jan 25, 2017

stuhood commented Mar 22, 2017

stuhood commented Dec 20, 2017

stuhood commented Mar 16, 2018

The trace for an error in a context with a lot of dependents is redundant #3695

The trace for an error in a context with a lot of dependents is redundant #3695

Comments

stuhood commented Jul 19, 2016 • edited Loading

baroquebobcat commented Jan 25, 2017

stuhood commented Mar 22, 2017

stuhood commented Dec 20, 2017

stuhood commented Mar 16, 2018

stuhood commented Jul 19, 2016 •

edited

Loading