Add partition operator to filter expressions? #159

sunshowers · 2022-04-17T17:03:44Z

Consider adding a partition() operator to filter expressions, and possibly deprecating the current --partition flag.

This is in principle possible to do, since (assuming a specific state of source code) all partition assignments for tests are completely deterministic. However, the current partition scheme for count: uses mutable state, which is too complicated -- instead, we can simply give each test a numeric ID and simply use modulo N for the count partition operator.

Strictly speaking, this doesn't block the release of filter expressions. However, if we do it before releasing filter expressions, we can ban using --partition and -E simultaneously. If we implement this after the release, we'll have to scan the query to look for partition operators, raising the complexity of the implementation -- so there's some advantages to doing it now.

The text was updated successfully, but these errors were encountered:

sunshowers · 2022-04-17T17:04:09Z

cc @Guiguiprim if you wanted to tackle this.

Guiguiprim · 2022-04-17T17:56:40Z

How would you see this ? I never used partitioning, so I don't know what would be the right API for it.

partition(1/2, set-expr) / partition(1, 2, set-expr) ?

Would this, be a scenario we want to allow:

machine 1: partition(1, 2, test(parse)) + partition(1, 3, test(run))
machine 2: partition(2, 2, test(parse)) + partition(2, 3, test(run))
machine 3: test(check) + partition(3, 3 ,test(run))

Guiguiprim · 2022-04-17T18:08:55Z

For a far, it seems to me that the numeric ID strategy might not always works. Worst case the filter happen to eliminate all the odd tests...?

sunshowers · 2022-04-17T18:35:42Z

I was just thinking of adding partition(count:1/2) & test(...).

See https://nexte.st/book/partitioning.html for how partitioning works.

sunshowers · 2022-04-17T22:01:06Z

Would this, be a scenario we want to allow:

I think we should let people do this if we like -- it's up to them to exhaustively run all the tests in complicated situations. The more common approach will be to simply do -E 'partition(count:1/2)' etc.

For a far, it seems to me that the numeric ID strategy might not always works. Worst case the filter happen to eliminate all the odd tests...?

Hmm, I'm not sure what you mean. From my view the point of partitioning is to not run the tests that aren't in the right bucket -- this seems like a good use case for filter expressions.

Guiguiprim · 2022-04-18T10:39:28Z

I mean:

In the current state of things, the filtering pipeline is: first apply user filter then partition filtered tests. This ensures the N buckets have the same sizes (+-1).
Using an ID associated to the test for partitioning would be the same as: first partition tests then apply user filter on the current partition. At this level, the user filter can be seen as a pseudo-random function, so we would get N buckets of random sizes. Worst case, one of them contains all the tests.

Am I wrong ?

That's also why I saw it like partition(count:1/2, inner-set), so that the partition could update it state only on tests accepted by inner-set.

It's a syntax choice I guess: partition(count:1/2) & test(xx) could behave like partition(count:1/2, test(xx)), partition(count:1/2) would be like partition(count:1/2, all()). But that open the question: What does partition(count:1/2) - test(xx) means ? Should it be rejected during compilation ?

Use an index maintained by the caller rather than state maintained by the callee -- this removes mutable state and also makes it easier to implement a partition() operator in filter expressions (#159). A TODO is to remove TestFilterBuilder and PartitionerBuilder, neither of which are required any more.

@Guiguiprim

This reverts commit 14d7791. As @Guiguiprim pointed out in #159, we want the partition operator to apply after others.

sunshowers · 2022-04-18T16:49:28Z

Ahhh yes you're absolutely right. Sorry about misunderstanding you earlier -- you're absolutely right that partition operations must happen after all other filters are applied.

I think we shouldn't implement a partition() operator in that case. Thank you so much for showing so much care.

@Guiguiprim

… end As @Guiguiprim pointed out in #159, partition-based filtering must happen after all other kinds.

sunshowers · 2022-04-18T16:55:06Z

I added a comment to try and explain this -- hopefully that'll prevent future me from being confused!

sunshowers · 2022-04-18T17:10:37Z

I guess we could support hash-based partitioning though, maybe at some point in the future.

sunshowers added the A-filter-expressions Support for filter expressions label Apr 17, 2022

sunshowers mentioned this issue Apr 17, 2022

Tracking issue for filter expressions #158

Closed

3 tasks

sunshowers added a commit that referenced this issue Apr 18, 2022

Revert "[nextest-runner] make count-based partitioning simpler"

95e2f29

This reverts commit 14d7791. As @Guiguiprim pointed out in #159, we want the partition operator to apply after others.

sunshowers closed this as completed Apr 18, 2022

sunshowers added a commit that referenced this issue Apr 18, 2022

[nextest-runner] explain why partition-based filtering must be at the…

c047716

… end As @Guiguiprim pointed out in #159, partition-based filtering must happen after all other kinds.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add partition operator to filter expressions? #159

Add partition operator to filter expressions? #159

sunshowers commented Apr 17, 2022 •

edited

Loading

sunshowers commented Apr 17, 2022

Guiguiprim commented Apr 17, 2022

Guiguiprim commented Apr 17, 2022

sunshowers commented Apr 17, 2022

sunshowers commented Apr 17, 2022

Guiguiprim commented Apr 18, 2022

sunshowers commented Apr 18, 2022

sunshowers commented Apr 18, 2022

sunshowers commented Apr 18, 2022

Add partition operator to filter expressions? #159

Add partition operator to filter expressions? #159

Comments

sunshowers commented Apr 17, 2022 • edited Loading

sunshowers commented Apr 17, 2022

Guiguiprim commented Apr 17, 2022

Guiguiprim commented Apr 17, 2022

sunshowers commented Apr 17, 2022

sunshowers commented Apr 17, 2022

Guiguiprim commented Apr 18, 2022

sunshowers commented Apr 18, 2022

sunshowers commented Apr 18, 2022

sunshowers commented Apr 18, 2022

sunshowers commented Apr 17, 2022 •

edited

Loading