Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge subjects and variants into Params, and remove Noop #6170

Merged
merged 13 commits into from
Sep 20, 2018

Conversation

stuhood
Copy link
Sponsor Member

@stuhood stuhood commented Jul 17, 2018

Problem

As described in #5788: @rules need a way to rely on values that are provided at request time, but which do not necessarily participate in the signatures of all @rules between the root and the consuming @rule. This is related to the existing concept of "variants", but requires an implementation that does not introduce variants into Node identities in subgraphs where they are not required.

Additionally, as described a while back in #4304, it should be possible to generate concrete subgraphs by removing ambiguity from the RuleGraph... but ambiguity is currently a "feature" required for composability of @rules that do not know about one another.

Solution

This change merges Variants and "subjects" into Params, and statically determines which Params are required in each subgraph. In order to handle cases where multiple providers of an @rule type dependency are available with different required input Params, the change "monomorphizes" (duplicates) RuleGraph entries per used parameter set. This allows us to remove runtime Noops, because every RuleGraph entry (and thus Node) has exactly one @rule provider for each of its declared dependencies.

Result

Lays groundwork for #4020 and #5788. Fixes #4304 by monomorphizing RuleGraph entries and removing Noop. Fixes #4027 by... deleting that code.

This change does not yet expose any sort of UX for providing more than one Param in a Get or root request, but it was already way too large, so I've opened #6478 for followup.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is getting a lot simpler :) Thanks!

A few questions from having a look through (definitely not close review, as it's WIP :))...

/// as sorted string tuples.
/// Params represent a TypeId->Key map.
///
/// For efficiency and hashability, they're stored as sorted Keys (with distinct TypeIds), and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description made me assume this would be a Cow<BTreeSet<Key>> rather than an Arc<Vec<Key>>...

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to use a BTreeSet, we'd still need to verify unique typeids, so would need a wrapper around Key to redefine Ord in terms of typeids.

It's possible that the Arc is overkill here though (since Key is so small).

impl Params {
pub fn new(mut params: Vec<Key>) -> Params {
params.sort_by(|l, r| l.type_id().cmp(r.type_id()));
params.dedup_by(|l, r| l.type_id() == r.type_id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like it should be an error for this line to actually do anything?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Will add.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

entries: edges.entries_for(&select_key),
}
pub fn new(product: TypeConstraint, params: Params, edges: &rule_graph::RuleEdges) -> Select {
Self::new_with_selector(selectors::Select::without_variant(product), params, edges)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan for Select to disappear in favour of just a TypeConstraint (or maybe a wrapper-type around it), and for tasks_add_select_variant to disappear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now that without_variant doesn't contrast with anything, this feels like a weird API...

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed: I'll clean these up today.

use std::io;

use core::{Function, Key, TypeConstraint, TypeId, Value, ANY_TYPE};
use externs;
use selectors::{Get, Select};
use tasks::{Intrinsic, Task, Tasks};

type ParamTypes = BTreeSet<TypeId>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the big difference between this and the Params?

Copy link
Sponsor Member Author

@stuhood stuhood Jul 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key difference is Params contains Keys (ie, actual python values). But also, in theory performance is less critical in this case because the vast majority of Entries should be created and destroyed during RuleGraph initialization.

@stuhood stuhood force-pushed the stuhood/subsingliant branch 2 times, most recently from 5378da5 to 438e607 Compare July 25, 2018 04:49
@stuhood stuhood changed the title WIP: Merge subjects and variants into Params Merge subjects and variants into Params, and remove Noop Jul 25, 2018
@stuhood
Copy link
Sponsor Member Author

stuhood commented Jul 25, 2018

This is now reviewable.

It contains a ton of TODOs that I'll either open issues for or resolve, and still has some (trivially, I think) failing tests. But it shouldn't need any fundamental changes.

The Compute used parameters... and Add "AggregationRule"... commits are highly overlapping, and should probably be reviewed together. The rest are relatively independent. Sorry for the big mess!

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are Aggregations actually used? Can you point at an example (which is tested)?

I'm not convinced all of the tests you're deleting should be deleted... Are some of those to be restored?

I think this could do with a decent prosaic description of what "kinds" of rules we support. Would you be able to put one together (or does one already exist)?

I also have some outstanding comments from my first pass which I think still stand.

entries: edges.entries_for(&select_key),
}
pub fn new(product: TypeConstraint, params: Params, edges: &rule_graph::RuleEdges) -> Select {
Self::new_with_selector(selectors::Select::without_variant(product), params, edges)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now that without_variant doesn't contrast with anything, this feels like a weird API...

///
fn gen_nodes(&self, context: &Context) -> Vec<NodeFuture<Value>> {
/// TODO: This could take `self` by value and avoid cloning.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was planning to leave this TODO here long enough to land this, and then inline this method into Select, as the TODO near the end of the method suggests. Reasonable?

.core
.rule_graph
.edges_for_inner(&self.entry)
.expect("Expected edges to exist for Aggregation.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include a debug output to say what aggregation it is?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. And will try to move most of edges_for_inner(..).expect calls into RuleGraph, as dependency edges not existing would represent a bug there rather than here.

let externs::Get(product, subject) = get;
let entries = context
.map(|externs::Get(product, subject)| {
// TODO: The subject of the get is a new parameter, but params from the context should be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely needs a ticket :)

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I may do this before landing, but will open a ticket otherwise.

@@ -20,24 +24,27 @@ impl UnreachableError {
UnreachableError {
task_rule: task_rule,
diagnostic: Diagnostic {
subject_type: ANY_TYPE,
params: Default::default(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pet peeve: ParamTypes::default() :)

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

return Ok(ConstructGraphResult::Unfulfillable);
}
};
let simplified_entries_only = simplified_entries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplified_entries.keys().cloned().collect()?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

let mut diagnostics = Vec::new();
for available_params in params_powerset {
let available_params = available_params.into_iter().collect();
// If a subset of these parameters is already satisfied, skip. This has the effect of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow this... Is this required? Or an optimisation?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required. This dedupes entries that use subsets of one-another's parameters. IE, if an @rule is satisfiable with an Address and Address+OtherThing, we prefer the subset. This also differentiates between the rule that would use only OtherThing and the rule that would use only Address.

But I think it needs unification with the logic in choose_dependency that does a similar thing... and choose_dependency likely needs some sort of unification with rhs. Will look into what can be deduped here before landing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you follow up on this? Did it turn out no unification could happen, or should there be a ticket to follow up?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed up, and didn't see an obvious unification. It's possible that when I dive into the tests tomorrow I can tease apart a unification, but I'm not optimistic.

return Ok(None);
}
1 => {
combination.push((key, *chosen_entries.last().unwrap()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combination.push((key, *chosen_entries[0]));?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO


// Prefer a Singleton, then a Param, then the non-ambiguous rule with the smallest set of
// input Params.
// TODO: We should likely prefer Rules to Params.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we prefer Rules to Params? In fact; why are these directly comparable at all? They feel like they're separate; what's an @rule signature with an example of a Params and Rules where we need to resolve between the two?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This needs unification with rhs, but: the non-intuitive thing I discovered while implementing this is that in order to keep Node identities small (ie, containing the fewest Params possible), we want to prefer Entries that need fewer parameters. This biases toward avoiding propagating information from callers to callees unless they really need it to be executed (and avoids the case where absolutely anything used anywhere in the graph is potentially a parameter used at the root of the graph: that will continue to be bounded by RootRule).

In short: we would generally prefer a @rule to a Param, because depending on a rule does not affect our identity, while using a Param does (because it comes from the caller).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add this comment as a code comment?

rules_by_kind
.entry(wd.simplified(BTreeSet::new()))
.and_modify(|e| {
// TODO: Param set size isn't sufficient to fully differentiate alternatives: would
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we prefer larger not smaller?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is stale: this code is actually prefering the smaller set. As mentioned above, needs some unification.

Copy link
Sponsor Member Author

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are Aggregations actually used? Can you point at an example (which is tested)?

They were used in (and necessary for) planners.py, but there was at least one more source of ambiguity in those rules (related to #4005), and I couldn't justify adding more features here.

I plan to heavily extend tests/python/pants_test/engine/test_rules.py today to cover use of AggregationRule, and the new graph ambiguity errors.

I'm not convinced all of the tests you're deleting should be deleted... Are some of those to be restored?

All of the deleted tests consumed planners.py, and about 50% of them were already skipped as we deleted expensive features that they depended on. While I think that there was only one additional feature needed in this diff (a resolution to #4005) to support them, the tests in test_rules.py and etc are significantly easier to maintain, while providing more useful examples of behaviour.

If we feel strongly that we should keep them, I think I'd need to dive in on #4005 to allow planners.py to continue to use the weird ducktyping it is using currently. But I believe that that would actually be redundant with #4535, which proposes to use a more principled approach to extension of a union type.

I think this could do with a decent prosaic description of what "kinds" of rules we support. Would you be able to put one together (or does one already exist)?

That exists in https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/README.md : I will definitely extend it here to account for AggregationRule.

I also fully expect to be able to remove SingletonRule as the UX for Params is fleshed out (since a singleton is simply a zero-Param @rule).

/// as sorted string tuples.
/// Params represent a TypeId->Key map.
///
/// For efficiency and hashability, they're stored as sorted Keys (with distinct TypeIds), and
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to use a BTreeSet, we'd still need to verify unique typeids, so would need a wrapper around Key to redefine Ord in terms of typeids.

It's possible that the Arc is overkill here though (since Key is so small).

impl Params {
pub fn new(mut params: Vec<Key>) -> Params {
params.sort_by(|l, r| l.type_id().cmp(r.type_id()));
params.dedup_by(|l, r| l.type_id() == r.type_id());
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Will add.

entries: edges.entries_for(&select_key),
}
pub fn new(product: TypeConstraint, params: Params, edges: &rule_graph::RuleEdges) -> Select {
Self::new_with_selector(selectors::Select::without_variant(product), params, edges)
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed: I'll clean these up today.

///
fn gen_nodes(&self, context: &Context) -> Vec<NodeFuture<Value>> {
/// TODO: This could take `self` by value and avoid cloning.
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was planning to leave this TODO here long enough to land this, and then inline this method into Select, as the TODO near the end of the method suggests. Reasonable?

.core
.rule_graph
.edges_for_inner(&self.entry)
.expect("Expected edges to exist for Aggregation.");
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. And will try to move most of edges_for_inner(..).expect calls into RuleGraph, as dependency edges not existing would represent a bug there rather than here.

let mut diagnostics = Vec::new();
for available_params in params_powerset {
let available_params = available_params.into_iter().collect();
// If a subset of these parameters is already satisfied, skip. This has the effect of
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required. This dedupes entries that use subsets of one-another's parameters. IE, if an @rule is satisfiable with an Address and Address+OtherThing, we prefer the subset. This also differentiates between the rule that would use only OtherThing and the rule that would use only Address.

But I think it needs unification with the logic in choose_dependency that does a similar thing... and choose_dependency likely needs some sort of unification with rhs. Will look into what can be deduped here before landing.

return Ok(None);
}
1 => {
combination.push((key, *chosen_entries.last().unwrap()));
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


// Prefer a Singleton, then a Param, then the non-ambiguous rule with the smallest set of
// input Params.
// TODO: We should likely prefer Rules to Params.
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This needs unification with rhs, but: the non-intuitive thing I discovered while implementing this is that in order to keep Node identities small (ie, containing the fewest Params possible), we want to prefer Entries that need fewer parameters. This biases toward avoiding propagating information from callers to callees unless they really need it to be executed (and avoids the case where absolutely anything used anywhere in the graph is potentially a parameter used at the root of the graph: that will continue to be bounded by RootRule).

In short: we would generally prefer a @rule to a Param, because depending on a rule does not affect our identity, while using a Param does (because it comes from the caller).

rules_by_kind
.entry(wd.simplified(BTreeSet::new()))
.and_modify(|e| {
// TODO: Param set size isn't sufficient to fully differentiate alternatives: would
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is stale: this code is actually prefering the smaller set. As mentioned above, needs some unification.

@@ -517,9 +974,9 @@ impl RuleGraph {
}

pub fn find_root_edges(&self, subject_type: TypeId, select: Select) -> Option<RuleEdges> {
// TODO return Result instead
// TODO: Support more than one root parameter... needs some API work.
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will open a followup ticket for multiple-parameter-Get UX.

Copy link
Contributor

@baroquebobcat baroquebobcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass at this. Most of my comments are for small nitpicks.

I like this change quite a bit. I think that, conceptually, param is a much easier thing to explain than subject and variant. And, I think param aligns better with what those things were used for.

"""A rule that receives the results of all other rules for a product to aggregate them.

An AggregationRule supports composability of @rules by allowing additional rules to be installed
to provide some type without removing or otherwise modifying the installed rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of to provide some type, maybe that provide the same type?

// The Entry was satisfiable without waiting for any additional nodes to be satisfied. The result
// contains copies of the input Entry for each set subset of the parameters that satisfy it.
Fulfilled(Vec<EntryWithDeps>),
// The Entry was not satifiable with installed rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sp satisfiable

pub fn sub_graph(&self, subject_type: TypeId, product_type: &TypeConstraint) -> RuleGraph {
if let Some(beginning_root) = self.gen_root_entry(subject_type, product_type) {
self._construct_graph(vec![beginning_root])
pub fn sub_graph(&self, subject_type: &TypeId, product_type: &TypeConstraint) -> RuleGraph {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename subject_type to param_type?

Copy link
Contributor

@ity ity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading through this was a learning experience for me! thank you for the absolutely useful changes with a lot of complexity removed. And, would love to see smaller/more digestible changes in the future :)

@stuhood
Copy link
Sponsor Member Author

stuhood commented Sep 10, 2018

I've rebased this and applied review feedback: only the topmost commit contains anything new.

Having thought a bit more about what rule graph extensibility should look like, I decided to rebase AggregationRule out of this branch, and to stash it so that it can be considered in the context of implementing #4535 (which needs something that is shaped slightly differently).

I'd like to clean up the tests and land this tomorrow if reviewers agree.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handful of todos left to go, then let's get this thing merged :D

@stuhood stuhood merged commit c99cb60 into pantsbuild:master Sep 20, 2018
@stuhood stuhood deleted the stuhood/subsingliant branch September 20, 2018 20:37
@stuhood stuhood mentioned this pull request Sep 20, 2018
stuhood pushed a commit that referenced this pull request Apr 2, 2019
### Problem

In #6170, it became possible for a `@rule` to take no `Params`, meaning that its identity did not include any of its context, and it could act as a singleton. Before that change, it had always been the case that a `@rule` would have a "subject" in its identity, necessitating `SingletonRule` in order to avoid creating a new copy of the rule in the graph for each distinct subject.

### Solution

Remove `SingletonRule` in favor of definition of zero-parameter `@rules`, which reduces the total number of concepts involved in the engine. Because `SingletonRule` used to allow a declaration to lie about the exact type it was providing (and give an instance of a subclass instead), usage of `SymbolTable` first needed to be cleaned up to remove subclassing.
stuhood added a commit that referenced this pull request Aug 2, 2019
### Problem

The engine `README.md` had not been updated for the merge of "subjects" and `Variants` into `Params` (#6170).

### Solution

Overhaul the explanations to refer to `Params`, and expand on how `Params` are propagated from dependents to dependencies.

Additionally, simplify and document the codepath in rule graph construction that selects which `@rule` dependencies to use.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants