Mark flaky NixOS tests #216828

roberth · 2023-02-17T17:17:56Z

Describe the bug

We have become accustomed to some tests failing every now and then.
Making sweeping changes is harder than necessary because we can't trust the outcomes of tests.

Flaky tests are a nuisance to everyone who invests good work to get changes through, and they have the side effect of wearing down the motivation to be painstaking.

Steps To Reproduce

Steps to reproduce the behavior:

1. Make a mass-rebuild change or a change to the test driver
2. Some tests fail inexplicably. What now?

Expected behavior

Flaky tests can be marked as such, so that they can still be fixed and don't bog anyone down.

Additional context

nixos/doc: Add Developing the Test Driver #216660

vcunat · 2023-02-28T10:54:45Z

I'd simply mark them as broken, so they won't get built by default. I mean, what good is it to make Hydra run them all the time when we don't do anything with the result?

roberth · 2023-02-28T12:06:21Z

> what good is it to make Hydra run them all the time when we don't do anything with the result?

We could do something with the result. After a sufficient number of successes we may consider a test stable again. This can happen when a non-deterministic bug is fixed in the actual program, and the test code was ok all along.
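To give a feel for what "a sufficient number of successes" could mean, here is a rough sketch. It assumes runs are independent and that a flaky test fails with some fixed probability per run; both are simplifications, and the function name and thresholds are made up for illustration, not taken from any existing tooling. It computes how many consecutive passes are needed before a failure rate at or above a chosen bound becomes unlikely.

```python
import math


def required_successes(max_failure_rate: float, significance: float = 0.05) -> int:
    """Smallest n such that (1 - max_failure_rate) ** n <= significance.

    Hypothetical helper: if a test really failed with probability
    max_failure_rate per run (independent runs assumed), then seeing n
    passes in a row would happen with probability at most `significance`.
    """
    if not 0.0 < max_failure_rate < 1.0:
        raise ValueError("max_failure_rate must be strictly between 0 and 1")
    if not 0.0 < significance < 1.0:
        raise ValueError("significance must be strictly between 0 and 1")
    # Both logarithms are negative, so the ratio is positive.
    return math.ceil(math.log(significance) / math.log(1.0 - max_failure_rate))
```

Under these assumptions, `required_successes(0.05)` comes out at 59 runs, and tightening the bound to a 1% failure rate raises it to 299, which hints at the compute cost of trusting a test again.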

By introducing another type of marking, we could run only those flaky tests that have been worked on since they were marked.
As a compute cost optimization we could then choose not to run the tests that are still marked as plain flaky. That removes the last edge (flaky -> regular) of the state machine:

regular -> flaky        -- discovered to be flaky
regular -> stage        -- discovered flaky with obvious fix (but not quite certain)
stage -> flaky          -- test is still flaky and not worked on anymore
stage -> regular        -- test has run successfully many times
flaky -> stage          -- test has been improved
flaky -> regular        -- non-deterministic bug has been fixed without changing the test

I think compute should be cheap enough to let us make use of the flaky -> regular edge, possibly even for tests that time out.
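The edges above can be written down as a small transition table. This is only a sketch of the idea in Python; the names `Marking`, `can_transition` and `should_run` are hypothetical and do not correspond to anything in nixpkgs or Hydra. The `allow_flaky_to_regular` switch models the compute optimization: with it turned off, the flaky -> regular edge disappears and plain flaky tests no longer need to be run.

```python
from enum import Enum


class Marking(Enum):
    REGULAR = "regular"
    FLAKY = "flaky"
    STAGE = "stage"  # placeholder name; could be "trial" or "recovery"


# Allowed transitions, copied from the list in this comment.
TRANSITIONS = {
    (Marking.REGULAR, Marking.FLAKY): "discovered to be flaky",
    (Marking.REGULAR, Marking.STAGE): "discovered flaky with obvious fix",
    (Marking.STAGE, Marking.FLAKY): "still flaky and not worked on anymore",
    (Marking.STAGE, Marking.REGULAR): "has run successfully many times",
    (Marking.FLAKY, Marking.STAGE): "test has been improved",
    (Marking.FLAKY, Marking.REGULAR): "bug fixed without changing the test",
}


def can_transition(src: Marking, dst: Marking, allow_flaky_to_regular: bool = True) -> bool:
    """Check whether src -> dst is an edge of the state machine."""
    if not allow_flaky_to_regular and (src, dst) == (Marking.FLAKY, Marking.REGULAR):
        return False
    return (src, dst) in TRANSITIONS


def should_run(marking: Marking, allow_flaky_to_regular: bool = True) -> bool:
    """A flaky test is only worth running if its successes can promote it."""
    return marking is not Marking.FLAKY or allow_flaky_to_regular
```

Keeping the table as plain data makes it easy to rename the intermediate state or drop an edge later without touching the logic.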

I don't really like the name I came up with. Stage could be trial or recovery or something.

roberth added the labels 0.kind: bug, 0.kind: enhancement, 6.topic: nixos, 6.topic: developer experience, 6.topic: testing on Feb 17, 2023
