
[DO NOT MERGE] Try nextest #11260

Closed
wants to merge 24 commits into from

Conversation

alvicsam
Contributor

@alvicsam alvicsam commented Apr 21, 2022

Try to run cargo nextest instead of cargo test in the test-linux-stable job.

Part of https://github.com/paritytech/ci_cd/issues/334
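A minimal sketch of what the swap could look like, assuming the job boils down to a shell script and reusing the flag set quoted later in this thread (the actual diff is not shown here, so treat every flag as an assumption):

```sh
# Sketch only: run the suite through cargo-nextest instead of cargo test.
# The flags mirror the cargo test invocation quoted later in this conversation.
cargo install cargo-nextest --locked   # skip if the CI image already ships it
cargo nextest run --workspace --locked --release \
    --features runtime-benchmarks \
    --manifest-path ./bin/node/cli/Cargo.toml
```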

@alvicsam alvicsam added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Apr 21, 2022
@alvicsam alvicsam requested a review from a team as a code owner April 21, 2022 14:39
@github-actions github-actions bot added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Apr 21, 2022
@ggwpez
Member

ggwpez commented Apr 21, 2022

Nice that the CI team is trying this out; I have been using this for the speedup!
The base command is cargo nextest run instead of cargo nextest.
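A quick illustration of the difference (hedged: the exact output without a subcommand depends on the nextest version):

```sh
cargo nextest          # no default subcommand; prints usage and exits
cargo nextest run      # actually builds and runs the test suite
```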

@alvicsam
Contributor Author

Thx for the hint :)

@alvicsam
Contributor Author

@ggwpez do you use it locally? Strangely enough, tests don't work with nextest in our CI.

@ggwpez
Member

ggwpez commented Apr 26, 2022

I just tried it on Substrate master and it works fine with cargo nextest run --workspace --locked --release --verbose --features runtime-benchmarks --manifest-path ./bin/node/cli/Cargo.toml.

Two things:

  • The tests currently fail even with the normal cargo test without nextest, so something is wrong.
  • Only the pallet-ui tests are failing, and these need a specific rust compiler version. Maybe something from the other steps that you removed changed this? (See the toolchain sketch after the sample output below.)

It should start printing something like this:

Starting 3839 tests across 286 binaries (8 skipped)
        PASS [   0.089s]                         substrate-test-utils::basic basic_test_with_args
        PASS [   0.092s]                         substrate-test-utils::basic basic_test
        PASS [   0.094s]                                        beefy-gadget gossip::tests::note_same_round_twice
        PASS [   0.096s]                                        beefy-gadget gossip::tests::note_and_drop_round_works
        PASS [   0.100s]                                        beefy-gadget gossip::tests::known_votes_insert_remove
        PASS [   0.008s]                                        beefy-gadget round::tests::vote_threshold
        PASS [   0.005s]                                        beefy-gadget tests::beefy_protocol_name
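If the pallet-ui failures really are toolchain-related, one way to check is to run just those UI tests under the pinned compiler. A hedged sketch; the package name, test target, and toolchain version below are assumptions for illustration, not taken from this PR:

```sh
# Assumption: the pallet UI tests compare rustc diagnostics (trybuild-style),
# so they only pass on the toolchain whose output the fixtures were written for.
rustup toolchain install 1.60.0                                   # hypothetical pin
cargo +1.60.0 nextest run -p frame-support-test --test pallet_ui  # names assumed
```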

@alvicsam
Contributor Author

Thanks! I'll try to fix it and run with the options you provided.

@alvicsam
Contributor Author

node-cli tests continue to fail with nextest

@ggwpez
Member

ggwpez commented Apr 28, 2022

> node-cli tests continue to fail with nextest

They are quite CPU-intensive; we could increase the timeouts there.
Can you check the CPU usage of the runner?

@alvicsam
Contributor Author

I checked the CPU during the run; when node-cli started, it wasn't used heavily.
The nextest docs say the timeout is 60 seconds by default, but I'm not sure that's why they fail. The strange thing is that the failing tests are not always the same: they vary, but all of them belong to node-cli.

These tests usually fail:

  • node-cli::running_the_node_and_interrupt running_the_node_works_and_can_be_interrupted
  • node-cli::running_the_node_and_interrupt running_two_nodes_with_the_same_ws_port_should_work
  • node-cli::purge_chain_works purge_chain_works
  • node-cli::temp_base_path_works temp_base_path_works
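If the 60-second default were the cause, nextest's slow-test threshold can be raised; a sketch, assuming the installed nextest version accepts the flag in this form (note that in early nextest, exceeding the threshold only marks a test SLOW rather than terminating it):

```sh
# Raise the slow-test threshold from the 60 s default to 5 minutes.
cargo nextest run --workspace --slow-timeout 300s
```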

@alvicsam
Contributor Author

I added --retries=3 and it seems that it helped
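For context, this makes nextest re-run each failing test before reporting it, so a test that passes on a later attempt is reported as flaky instead of failed:

```sh
# Re-run each failing test up to 3 times; a pass on any attempt
# counts as a flaky pass rather than a failure.
cargo nextest run --workspace --retries 3
```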

@TriplEight
Contributor

> I added --retries=3 and it seems that it helped

This would mean that there are flaky tests. But AFAIK that was not the case, and we didn't have to repeat the tests with the usual cargo test. Please try with a timeout; we have quite a few long system/module tests.

@bkchr
Member

bkchr commented Apr 29, 2022

The mentioned tests are probably not written to be parallelizable, because they run a full node. However, I think we probably could fix this.

@ggwpez
Member

ggwpez commented Apr 29, 2022

I re-ran it on my machine and it now also fails, so these tests are somehow brittle.
It works when I reduce the number of test threads from 32 to 12, so it's definitely something to do with parallelization.
The CPU utilization easily hits 100% for me though, so something could be getting resource-starved.
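The workaround described above, as a command (the thread count is the one from this comment; the flag is nextest's standard way to cap parallelism):

```sh
# Cap nextest's parallelism so the full-node tests stop starving each other.
cargo nextest run --workspace --test-threads 12
```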

@ggwpez
Member

ggwpez commented May 5, 2022

I think we need to fix the node tests first, or what are you trying now?

@alvicsam
Contributor Author

alvicsam commented May 5, 2022

Until they are fixed, we can run them in a different job as a workaround. I was checking how much time it would take.
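A sketch of that split, assuming cargo's package-selection flags work under nextest as they do under cargo test (the package name node-cli comes from the failing-test list above):

```sh
# Job 1: everything except the brittle full-node tests, fully parallel.
cargo nextest run --workspace --exclude node-cli
# Job 2: the node-cli integration tests on their own, serialized.
cargo nextest run -p node-cli --test-threads 1
```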

@alvicsam
Contributor Author

alvicsam commented Jun 2, 2022

Closing the PR: there were several big changes in CI, so it's easier to start from scratch.
