Mark flaky NixOS tests #216828

Open
roberth opened this issue Feb 17, 2023 · 2 comments

Comments

@roberth
Member

roberth commented Feb 17, 2023

Describe the bug

We have become accustomed to some tests failing every now and then.
Making sweeping changes is harder than necessary because we can't trust the outcomes of tests.

Flaky tests are a nuisance to everyone who invests good work to get changes through, and they have the side effect of wearing down the motivation to be painstaking.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Make a mass-rebuild change or a change to the test driver
  2. Some tests fail inexplicably. What now?

Expected behavior

Flaky tests can be marked as such, so that they can still be fixed and don't bog anyone down.

See also earlier short discussion:

Some requirements for the marking solution:

  • Hydra puts the flaky tests in a separate attrset for easy recognition
  • marking / unmarking is a one-line change; no file moving churn
  • evaluation does not suffer: no attribute set filtering, but rather mapAttrs to set some tests to null (and only on Hydra)
  • pkgs.nixosTests attrset works the same regardless of flakiness status
  • pkgs.nixosTests prints a warning when evaluating or running a known-flaky test
  • update the manual; change what was added in nixos/doc: Add Developing the Test Driver #216660

@vcunat
Member

vcunat commented Feb 28, 2023

I'd simply mark them as broken, so they won't get built by default. I mean, what good is it to make Hydra run them all the time when we don't do anything with the result?

@roberth
Member Author

roberth commented Feb 28, 2023

> what good is it to make Hydra run them all the time when we don't do anything with the result?

We could do something with the result. After a sufficient number of successes we may consider a test stable again. This can happen when a non-deterministic bug is fixed in the actual program, and the test code was ok all along.
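
A minimal sketch of what "a sufficient number of successes" could mean in practice, assuming the run history of a test is available as a list of pass/fail booleans. The function names and the threshold of 20 are hypothetical placeholders, not something Hydra or the test driver provides.

```python
def trailing_successes(results):
    """Count consecutive passes at the end of a run history (True = pass)."""
    count = 0
    for passed in reversed(results):
        if not passed:
            break
        count += 1
    return count


def looks_stable(results, required=20):
    """Return True once the most recent `required` runs have all passed.

    `required` is an arbitrary placeholder threshold, not a proposed value.
    """
    return trailing_successes(results) >= required
```

Counting only the trailing streak means a single new failure resets the count, which is the conservative choice for a test that is already known to be flaky.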

By introducing another type of marking, we could run only those tests that have been worked on since they were marked as flaky.
As a compute cost optimization, we could then choose not to run the tests that remain marked as flaky. That removes the last edge (flaky -> regular) of the state machine:

regular -> flaky        -- discovered to be flaky
regular -> stage        -- discovered flaky with obvious fix (but not quite certain)
stage -> flaky          -- test is still flaky and not worked on anymore
stage -> regular        -- test has run successfully many times
flaky -> stage          -- test has been improved
flaky -> regular        -- non-deterministic bug has been fixed without changing the test
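
The edges listed above can be written down as a small transition table. This is only an illustrative sketch: the `Marking` enum, `TRANSITIONS`, `can_move`, and `without_edge` are hypothetical names, and nothing like this exists in nixpkgs or Hydra today.

```python
from enum import Enum


class Marking(Enum):
    """Hypothetical marking states, named after the list above."""
    REGULAR = "regular"
    STAGE = "stage"
    FLAKY = "flaky"


# Allowed edges, each annotated with the reason given in the list above.
TRANSITIONS = {
    (Marking.REGULAR, Marking.FLAKY): "discovered to be flaky",
    (Marking.REGULAR, Marking.STAGE): "discovered flaky with obvious fix",
    (Marking.STAGE, Marking.FLAKY): "still flaky and not worked on anymore",
    (Marking.STAGE, Marking.REGULAR): "has run successfully many times",
    (Marking.FLAKY, Marking.STAGE): "test has been improved",
    (Marking.FLAKY, Marking.REGULAR): "bug fixed without changing the test",
}


def can_move(src, dst, transitions=TRANSITIONS):
    """Return True if the marking may change from src to dst."""
    return (src, dst) in transitions


def without_edge(transitions, src, dst):
    """Return a copy of the table with one edge removed."""
    return {edge: why for edge, why in transitions.items() if edge != (src, dst)}
```

For example, `without_edge(TRANSITIONS, Marking.FLAKY, Marking.REGULAR)` models the cheaper variant in which flaky tests are never run, so they can only return to regular by passing through stage.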

I think compute should be cheap enough to let us make use of the flaky -> regular edge, possibly even for tests that fail by timing out.

I don't really like the name I came up with. Stage could be trial or recovery or something.
