Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partition operator to filter expressions? #159

Closed
Tracked by #158
sunshowers opened this issue Apr 17, 2022 · 9 comments
Closed
Tracked by #158

Add partition operator to filter expressions? #159

sunshowers opened this issue Apr 17, 2022 · 9 comments
Labels
A-filter-expressions Support for filter expressions

Comments

@sunshowers
Copy link
Member

sunshowers commented Apr 17, 2022

Consider adding a partition() operator to filter expressions, and possibly deprecating the current --partition flag.

This is in principle possible to do, since (assuming a specific state of source code) all partition assignments for tests are completely deterministic. However, the current partition scheme for count: uses mutable state, which is too complicated -- instead, we can simply give each test a numeric ID and simply use modulo N for the count partition operator.

Strictly speaking, this doesn't block the release of filter expressions. However, if we do it before releasing filter expressions, we can ban using --partition and -E simultaneously. If we implement this after the release, we'll have to scan the query to look for partition operators, raising the complexity of the implementation -- so there's some advantages to doing it now.

@sunshowers sunshowers added the A-filter-expressions Support for filter expressions label Apr 17, 2022
@sunshowers
Copy link
Member Author

cc @Guiguiprim if you wanted to tackle this.

@Guiguiprim
Copy link
Contributor

How would you see this ? I never used partitioning, so I don't know what would be the right API for it.

partition(1/2, set-expr) / partition(1, 2, set-expr) ?

Would this, be a scenario we want to allow:

  • machine 1: partition(1, 2, test(parse)) + partition(1, 3, test(run))
  • machine 2: partition(2, 2, test(parse)) + partition(2, 3, test(run))
  • machine 3: test(check) + partition(3, 3 ,test(run))

@Guiguiprim
Copy link
Contributor

For a far, it seems to me that the numeric ID strategy might not always works. Worst case the filter happen to eliminate all the odd tests...?

@sunshowers
Copy link
Member Author

I was just thinking of adding partition(count:1/2) & test(...).

See https://nexte.st/book/partitioning.html for how partitioning works.

@sunshowers
Copy link
Member Author

Would this, be a scenario we want to allow:

I think we should let people do this if we like -- it's up to them to exhaustively run all the tests in complicated situations. The more common approach will be to simply do -E 'partition(count:1/2)' etc.

For a far, it seems to me that the numeric ID strategy might not always works. Worst case the filter happen to eliminate all the odd tests...?

Hmm, I'm not sure what you mean. From my view the point of partitioning is to not run the tests that aren't in the right bucket -- this seems like a good use case for filter expressions.

@Guiguiprim
Copy link
Contributor

I mean:

  • In the current state of things, the filtering pipeline is: first apply user filter then partition filtered tests. This ensures the N buckets have the same sizes (+-1).
  • Using an ID associated to the test for partitioning would be the same as: first partition tests then apply user filter on the current partition. At this level, the user filter can be seen as a pseudo-random function, so we would get N buckets of random sizes. Worst case, one of them contains all the tests.

Am I wrong ?

That's also why I saw it like partition(count:1/2, inner-set), so that the partition could update it state only on tests accepted by inner-set.

It's a syntax choice I guess: partition(count:1/2) & test(xx) could behave like partition(count:1/2, test(xx)), partition(count:1/2) would be like partition(count:1/2, all()). But that open the question: What does partition(count:1/2) - test(xx) means ? Should it be rejected during compilation ?

sunshowers added a commit that referenced this issue Apr 18, 2022
Use an index maintained by the caller rather than state maintained by
the callee -- this removes mutable state and also makes it easier to
implement a partition() operator in filter expressions (#159).

A TODO is to remove TestFilterBuilder and PartitionerBuilder, neither of
which are required any more.
sunshowers added a commit that referenced this issue Apr 18, 2022
This reverts commit 14d7791. As
@Guiguiprim pointed out in #159, we want the partition operator to apply
after others.
@sunshowers
Copy link
Member Author

Ahhh yes you're absolutely right. Sorry about misunderstanding you earlier -- you're absolutely right that partition operations must happen after all other filters are applied.

I think we shouldn't implement a partition() operator in that case. Thank you so much for showing so much care.

sunshowers added a commit that referenced this issue Apr 18, 2022
… end

As @Guiguiprim pointed out in #159, partition-based filtering must
happen after all other kinds.
@sunshowers
Copy link
Member Author

I added a comment to try and explain this -- hopefully that'll prevent future me from being confused!

@sunshowers
Copy link
Member Author

I guess we could support hash-based partitioning though, maybe at some point in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-filter-expressions Support for filter expressions
Projects
None yet
Development

No branches or pull requests

2 participants