Mark nodes "dirty" during invalidation, rather than deleting them #4558

Closed
stuhood opened this issue May 6, 2017 · 10 comments · Fixed by #6059

@stuhood
Sponsor Member

stuhood commented May 6, 2017

After a change event, invalidation currently deletes Nodes from the graph transitively. Rather than deleting them, we should mark Nodes "dirty" instead.

As described in the Bazel/Skyframe design, and in the Adapton design, marking a Node dirty allows it to be "resurrected" without re-running if all of its inputs are subsequently marked clean with the same values they had before.

This is an extremely efficient cache of the previous execution, because resurrection can happen purely in memory.
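
A minimal sketch of the "dirty rather than delete" idea, using illustrative names rather than the engine's actual types:

```rust
// Illustrative only: the real engine's entry states differ.
enum EntryState<V> {
    // Never run (or explicitly cleared): must be computed from scratch.
    NotStarted,
    // Computed, and all inputs are known to be unchanged.
    Clean { value: V },
    // Inputs may have changed; keep the previous value so the entry can be
    // "resurrected" without re-running if its inputs turn out to be unchanged.
    Dirty { previous_value: V },
}

// Invalidation downgrades an entry to Dirty instead of deleting it.
fn invalidate<V>(state: &mut EntryState<V>) {
    let old = std::mem::replace(state, EntryState::NotStarted);
    *state = match old {
        EntryState::Clean { value } => EntryState::Dirty { previous_value: value },
        // Already dirty or never started: nothing more to remember.
        other => other,
    };
}
```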

@stuhood stuhood added the engine label May 6, 2017
@stuhood stuhood added this to the 1.4.0 milestone May 6, 2017
@stuhood
Sponsor Member Author

stuhood commented May 6, 2017

Relates to #3365.

@stuhood
Sponsor Member Author

stuhood commented May 10, 2017

Also relates to #4394... it's possible that implementing this issue would make fixing that one unnecessary.

@baroquebobcat
Contributor

For me, this maps onto how .pants.d workdirs are handled in v1, where running invalidate causes pants to check whether the existing workdirs still end up with the same hashes, which may result in less work being performed depending on the task.

@stuhood
Sponsor Member Author

stuhood commented May 12, 2017

> may result in less work being performed depending on the task.

Actually, we never reuse them currently... they're always cleared if we toggle back to an older version. They're mostly stable for the purposes of symlinks into dist, etc.

@stuhood
Sponsor Member Author

stuhood commented Nov 14, 2017

Assigned to @jsirois

@stuhood
Sponsor Member Author

stuhood commented Nov 29, 2017

@jsirois : One thing I hadn't thought of (and which you may now have thought of): we currently don't account for equality of Node::Output types... a TaskNode outputs core::Value instances, for which we do not currently implement equality (in the form of Rust's Eq). In places where we need both Eq and Hash we convert to a core::Key, which memoizes the core::Value to assign it a unique id... at least in the case of putting a Python value in a hashtable, I think the Key abstraction is worthwhile, because we avoid a very large number of calls into Python.

I think that in the case of un-dirtying a Node, you would only need equality (and not hash)... in that case, you should be able to use externs::call_method(value1, '__eq__', &[value2]) to avoid converting to a Key.

EDIT: An update on this: I added an externs::equals(Value, Value) -> Boolean method, so this should be unblocked... although we might have to remove a few cases where we override __eq__ for performance.
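
For illustration, a hedged sketch (names and signatures below are assumptions, not the engine's API) of how un-dirtying could rely on equality alone, with something like the externs::equals helper above backing the `eq` closure for Python values:

```rust
// Hypothetical helper: a dirtied Node can reuse its previous value if each of
// its dependencies' previous outputs compares equal to its current output.
// Only equality is needed (no hashing), so a Python-level comparison such as
// externs::equals could back `eq` without converting Values to Keys.
fn can_reuse_previous_value<V>(
    previous_dep_outputs: &[V],
    current_dep_outputs: &[V],
    eq: impl Fn(&V, &V) -> bool,
) -> bool {
    previous_dep_outputs.len() == current_dep_outputs.len()
        && previous_dep_outputs
            .iter()
            .zip(current_dep_outputs)
            .all(|(prev, cur)| eq(prev, cur))
}
```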

@jsirois jsirois removed their assignment Dec 6, 2017
@stuhood stuhood modified the milestones: 1.4.x, 1.5.x Jan 12, 2018
@kwlzn
Member

kwlzn commented Feb 3, 2018

  1. I'm slightly worried about pantsd memory utilization with this model (particularly for large repos in e.g. the :: case) - do we have an eviction strategy in place yet for Graph nodes?

  2. We'll probably get this as a side-effect of the new model, but it'd be great to think about optimizing the backing impl here for invalidation performance. As a random datapoint, it currently takes ~30s to invalidate after just touching the finagle root dirs in our monorepo:

```
I0202 21:13:32.313214 18899 scheduler_service.py:55] enqueuing 2 changes for subscription all_files
I0202 21:14:02.581753 18899 scheduler.py:402] invalidated 356056 nodes for: set([u'', u'finagle-internal', u'finagle'])
```

(this may also be exacerbated by #4394, though)

@kwlzn kwlzn added the pantsd label Feb 3, 2018
@stuhood
Sponsor Member Author

stuhood commented Feb 3, 2018

> I'm slightly worried about pantsd memory utilization with this model (particularly for large repos in e.g. the :: case) - do we have an eviction strategy in place yet for Graph nodes?

Since we can't have two copies of the same Node (i.e., with the same NodeKey), I think the worst case is that "all of the nodes that represent the :: case" stay in memory. But you should never end up with 2x that many Nodes in memory unless the entire repository has changed in such a way that all of the NodeKeys change.

@kwlzn
Member

kwlzn commented Feb 3, 2018

> Since we can't have two copies of the same Node (i.e., with the same NodeKey)

ah, gotcha

@stuhood
Sponsor Member Author

stuhood commented Jun 19, 2018

#4394 was less of a solution than we thought, so this is now hindering the efficiency of pantsd in cases where files are created near the root of the repo.

Going to switch to this to see if I can knock it out today.

stuhood pushed a commit that referenced this issue Jun 26, 2018
### Problem

#4558 involves some relatively fidgety logic, but `Graph` was built in-situ and does not have any tests of its own.

### Solution

Extract and genericize `Graph` to allow for adding unit tests of the new behaviour in #4558. Add a very basic test; more useful ones will follow.
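
As a rough sketch of what "genericize" could mean here (the trait and test node below are assumptions for illustration, not the actual code), a `Graph` parameterized over a small node trait can be driven in unit tests by a trivial in-memory node type:

```rust
use std::hash::Hash;

// Assumed trait for illustration: the real abstraction in the codebase differs.
trait Node: Clone + Eq + Hash {
    type Output: Clone + Eq;
    // Compute this node's output from the outputs of its dependencies.
    fn run(&self, deps: &[Self::Output]) -> Self::Output;
}

#[derive(Clone, PartialEq, Eq, Hash)]
struct TestNode(u64);

impl Node for TestNode {
    type Output = u64;
    fn run(&self, deps: &[u64]) -> u64 {
        // A deterministic toy computation that unit tests can assert on.
        self.0 + deps.iter().sum::<u64>()
    }
}

#[test]
fn test_node_runs() {
    assert_eq!(TestNode(2).run(&[1, 1]), 4);
}
```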
@cosmicexplorer cosmicexplorer modified the milestones: Daemon Backlog, 1.9.x Jul 3, 2018
stuhood pushed a commit that referenced this issue Jul 3, 2018
### Problem

#4558 introduces a new state for a `Node`: it may be `dirty`, which requires that we remember the previous value of a `Node` while simultaneously computing its next value. I began implementing this with our existing `future::Shared` usage, but ran into the need either to hide values in the closure of the running future, or to store values atomically outside of the future's value. This complexity represents a classic case for a state machine.

Additionally, we have seen that `future::Shared` has a relatively high cost in terms of complexity and memory (see #5990).

### Solution

In order to make #4558 easier to implement correctly (and to reduce memory usage), implement the storage of a `Node`'s value in the `Graph` via a state machine.

One of the largest complexities of `Shared` is that it needs to elect one of the waiters of the `Shared` future as the one who will actually do the work of `poll`ing the wrapped future. To avoid that complexity, we introduce usage of the `tokio` `Runtime` for _all_ `Node` executions. This allows waiters to stay very simple, and receive the result value via a `oneshot::channel`.
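
A minimal sketch of that waiter scheme, written against modern tokio and async/await rather than the futures 0.1 APIs of the actual change, purely for illustration:

```rust
use tokio::sync::oneshot;

async fn run_node_once(waiters: Vec<oneshot::Sender<String>>) {
    // Spawn the node's work exactly once on the runtime...
    let handle = tokio::spawn(async { "node output".to_string() });
    let result = handle.await.expect("node task panicked");
    // ...then fan the result out to each waiter over its own oneshot channel.
    // No waiter ever has to be "elected" to poll the underlying future, unlike
    // with future::Shared.
    for waiter in waiters {
        let _ = waiter.send(result.clone());
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = oneshot::channel();
    tokio::spawn(run_node_once(vec![tx]));
    println!("{}", rx.await.expect("sender dropped"));
}
```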

### Result

#4558 should be much easier to implement. I'll likely wait to land this until #4558 is further along. 

This change reduces memory usage by about 200MB more than the fix from #5990 (down to about 2.5GB for `./pants list ::` in Twitter's repo). Unfortunately, in its current form it also represents a performance regression of about 10% for that same use case. Although I believe I have a handle on why, I'd like to defer fixing that regression until after #4558 is implemented.

This also fixes #3695: since we now fail slowly, we are able to render multiple distinct failures at once.
stuhood pushed a commit that referenced this issue Jul 6, 2018
…nged (#6059)

### Problem

As described in #4558, we currently completely delete `Node`s from the `Graph` when their inputs have changed.

One concrete case where this is problematic is that all `Snapshots` in the graph end up with a dependency on the `scandir` outputs of all of their parent directories, because we need to expand symlinks recursively from the root when consuming a `Path` (in order to see whether any path component on the way down is a symlink). This means that changes anywhere above a `Snapshot` invalidate that `Snapshot`, and changes at the root of the repo invalidate _all_ `Snapshots` (although 99% of the syscalls they depend on are not invalidated, having no dependencies of their own).

But this case is just one of many cases affected by the current implementation: there are many other times where we re-compute more than we should due to the current `Node` invalidation strategy.

### Solution

Implement node "dirtying", as described on #4558.

There are a few components to this work:

* In addition to being `Entry::clear`ed (which will force a `Node` to re-run), a `Node` may be `Entry::dirty`ed. A "dirty" `Node` is eligible to be "cleaned" if its dependencies have not changed since it was dirtied.
* Each `Node` records a `Generation` value that acts as a proxy for "my output has changed". The `Node` maintains this value locally, and when a `Node` re-runs for any reason (either due to being `dirtied` or `cleared`), it compares its new output value to its old output value to determine whether to increment the `Generation`.
* Each `Node` records the `Generation` values of the dependencies that it used to run, at the point when it runs. When a dirtied `Node` is deciding whether to re-run, it compares the previous generation values of its dependencies to their current generation values: if they are equal, then the `Node` can be "cleaned", ie, its previous value can be used without re-running it (see the sketch after this list).
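
For illustration, a hedged sketch of the generation bookkeeping described in the list above (the type and field names are assumptions, not the engine's real ones):

```rust
// Illustrative types only; the real Graph entries differ.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Generation(u64);

struct CompletedEntry<V: Eq> {
    value: V,
    // Incremented only when a re-run produces a different value.
    generation: Generation,
    // The generations of the dependencies observed when `value` was computed.
    dep_generations: Vec<Generation>,
}

impl<V: Eq> CompletedEntry<V> {
    // A dirtied entry can be "cleaned" (reused without re-running) if the
    // generations of its dependencies are unchanged.
    fn can_clean(&self, current_dep_generations: &[Generation]) -> bool {
        self.dep_generations.as_slice() == current_dep_generations
    }

    // After a re-run, bump the generation only if the output actually changed,
    // so that dependents whose inputs are value-equal can themselves stay clean.
    fn complete_run(&mut self, new_value: V, new_dep_generations: Vec<Generation>) {
        if new_value != self.value {
            self.generation = Generation(self.generation.0 + 1);
        }
        self.value = new_value;
        self.dep_generations = new_dep_generations;
    }
}
```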

This patch also expands the testing of `Graph` to differentiate dirtying a `Node` from clearing it, and confirms that the correct `Nodes` re-run in each of those cases.

### Result

Cleaning all `Nodes` involved in `./pants list ::` after touching `pants.ini` completes 6 times faster than recomputing them from scratch (56 seconds vs 336 seconds in our repository). More gains are likely possible by implementing the performance improvement(s) described on #6013. 

Fixes #4558 and fixes #4394.