go-linux-race is flaky #2741

Closed
RaduBerinde opened this issue Jul 13, 2023 · 9 comments

@RaduBerinde
Member

RaduBerinde commented Jul 13, 2023

Failure on go-linux-race:
https://github.com/cockroachdb/pebble/actions/runs/5546996852/jobs/10128026245?pr=2724

Looks like a timeout after ~15 minutes, though not at the level of go test, which has a 20-minute timeout. In any case, we can probably do much better here by skipping some tests in race mode (in particular lint).

?   	github.com/cockroachdb/pebble/internal/ackseq	[no test files]
?   	github.com/cockroachdb/pebble/internal/bytealloc	[no test files]
?   	github.com/cockroachdb/pebble/internal/crc	[no test files]
?   	github.com/cockroachdb/pebble/internal/datatest	[no test files]
?   	github.com/cockroachdb/pebble/internal/errorfs	[no test files]
?   	github.com/cockroachdb/pebble/internal/humanize	[no test files]
?   	github.com/cockroachdb/pebble/internal/invariants	[no test files]
ok  	github.com/cockroachdb/pebble	286.984s
ok  	github.com/cockroachdb/pebble/bloom	0.257s
ok  	github.com/cockroachdb/pebble/cmd/pebble	0.028s
ok  	github.com/cockroachdb/pebble/internal/arenaskl	0.566s
ok  	github.com/cockroachdb/pebble/internal/base	0.111s
ok  	github.com/cockroachdb/pebble/internal/batchskl	0.140s
ok  	github.com/cockroachdb/pebble/internal/cache	6.992s
ok  	github.com/cockroachdb/pebble/internal/fastrand	0.020s [no tests to run]
ok  	github.com/cockroachdb/pebble/internal/intern	0.028s
ok  	github.com/cockroachdb/pebble/internal/keyspan	1.458s
ok  	github.com/cockroachdb/pebble/internal/lint	208.754s
?   	github.com/cockroachdb/pebble/internal/manual	[no test files]
ok  	github.com/cockroachdb/pebble/internal/manifest	150.480s
?   	github.com/cockroachdb/pebble/internal/pacertoy/pebble	[no test files]
?   	github.com/cockroachdb/pebble/internal/pacertoy/rocksdb	[no test files]
?   	github.com/cockroachdb/pebble/internal/private	[no test files]
?   	github.com/cockroachdb/pebble/internal/rangedel	[no test files]
?   	github.com/cockroachdb/pebble/internal/rate	[no test files]
ok  	github.com/cockroachdb/pebble/internal/metamorphic	124.957s
ok  	github.com/cockroachdb/pebble/internal/metamorphic/crossversion	0.058s
ok  	github.com/cockroachdb/pebble/internal/mkbench	2.227s
ok  	github.com/cockroachdb/pebble/internal/randvar	9.418s
ok  	github.com/cockroachdb/pebble/internal/rangekey	2.221s
ok  	github.com/cockroachdb/pebble/internal/rawalloc	0.055s [no tests to run]
ok  	github.com/cockroachdb/pebble/internal/testkeys	1.304s
?   	github.com/cockroachdb/pebble/objstorage	[no test files]
ok  	github.com/cockroachdb/pebble/metamorphic	17.318s
ok  	github.com/cockroachdb/pebble/objstorage/objstorageprovider	0.261s
ok  	github.com/cockroachdb/pebble/objstorage/objstorageprovider/objiotracing	0.030s
?   	github.com/cockroachdb/pebble/objstorage/shared	[no test files]
?   	github.com/cockroachdb/pebble/rangekey	[no test files]
make: *** [Makefile:26: test] Terminated
Error: Process completed with exit code 143.
@RaduBerinde
Member Author

Not sure here why we get a SIGTERM. The default timeout for CI jobs is 360 minutes.

@RaduBerinde
Member Author

The CI status page shows a not very helpful message: "The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled."

@RaduBerinde
Member Author

Closing this as there's nothing actionable left.

@nicktrav
Contributor

> Not sure here why we get a SIGTERM

We've seen this previously when we exhausted the resource limits for the job (container?). This is poorly documented on the GH side, so it's hard to say definitively if this was the issue. Just throwing it out there in case this keeps happening ...

@RaduBerinde
Member Author

> resource limits

Is there any documentation anywhere about these limits? What would cause termination, I'm guessing RAM?

According to this, Linux runners have 7GB of RAM. I will investigate whether any tests get close to that.

@nicktrav
Contributor

There's some context on #2159 that might help.

One suggestion that came out of that was that we could look into larger runners, which are now available to our account. Maybe worth revisiting.

@RaduBerinde
Member Author

Locally, I see only about 2GB max resident during make testrace.
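
The thread does not say how the peak resident size was measured. One possible way to get a comparable number on a Unix system is to run the command as a child process and read its resource usage afterwards, as in the sketch below. The helper name `maxRSS` is an assumption made for illustration. Note that the value is reported in kilobytes on Linux (bytes on macOS), and that it reflects the largest single process among the child and the descendants it waited for, not the sum across processes running in parallel.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// maxRSS is a hypothetical helper: it runs the command to completion and
// returns the peak resident set size reported by the kernel for it.
// Units are platform-dependent (kilobytes on Linux, bytes on macOS).
func maxRSS(name string, args ...string) (int64, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	runErr := cmd.Run()
	if cmd.ProcessState == nil {
		// The process never started, so there is no usage to report.
		return 0, runErr
	}
	ru, ok := cmd.ProcessState.SysUsage().(*syscall.Rusage)
	if !ok {
		return 0, fmt.Errorf("resource usage not available on this platform")
	}
	return int64(ru.Maxrss), runErr
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: maxrss <command> [args...]")
		return
	}
	rss, err := maxRSS(os.Args[1], os.Args[2:]...)
	fmt.Printf("max RSS: %d (kilobytes on Linux)\n", rss)
	if err != nil {
		fmt.Fprintln(os.Stderr, "command error:", err)
		os.Exit(1)
	}
}
```

Because the figure is a per-process maximum, a low local reading does not rule out a higher combined footprint on a CI runner where several test binaries run concurrently.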

@RaduBerinde
Member Author

Let me reopen while I keep investigating a bit more.

RaduBerinde reopened this Jul 17, 2023
RaduBerinde added a commit to RaduBerinde/pebble that referenced this issue Jul 17, 2023
Reduce the max number of shards and skip the random block size tests
under race. This significantly reduces the runtime and memory
footprint of the test.

Informs: cockroachdb#2741
RaduBerinde added a commit that referenced this issue Jul 18, 2023
Reduce the max number of shards and skip the random block size tests
under race. This significantly reduces the runtime and memory
footprint of the test.

Informs: #2741
@RaduBerinde
Member Author

Reduced the scope of objstorage/objstorageprovider/sharedcache test in race mode; it makes various random choices which could bring up memory usage in a fraction of cases. The logs above suggest that this package was running at the time. I am comfortable closing this now.
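
As a rough illustration of the kind of change described above, a test can pick smaller parameters when the race detector is on. The sketch below is not the code from the referenced commit: the `raceEnabled` constant, the `paramsFor` helper, and the shard counts are all placeholders. In a real package, a constant like `raceEnabled` is typically defined in a pair of files guarded by `//go:build race` and `//go:build !race`, since the Go toolchain sets the `race` build tag when `-race` is passed.

```go
package main

import "fmt"

// raceEnabled is a hypothetical stand-in. Real code would define it as
// true in a file with "//go:build race" and false in one with
// "//go:build !race".
const raceEnabled = false

// sharedCacheTestParams holds the knobs the test randomizes over.
type sharedCacheTestParams struct {
	maxShards        int  // upper bound for the randomly chosen shard count
	randomBlockSizes bool // whether to run the random block size cases
}

// paramsFor returns reduced parameters under race. The numbers are
// illustrative placeholders, not the values used in Pebble.
func paramsFor(race bool) sharedCacheTestParams {
	if race {
		return sharedCacheTestParams{maxShards: 8, randomBlockSizes: false}
	}
	return sharedCacheTestParams{maxShards: 32, randomBlockSizes: true}
}

func main() {
	p := paramsFor(raceEnabled)
	fmt.Printf("maxShards=%d randomBlockSizes=%t\n", p.maxShards, p.randomBlockSizes)
}
```

Inside an actual test, the same flag could guard a `t.Skip` call for the cases that are too expensive under race.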
