Fixing flaky tests #3563
Conversation
Compared 99176a5 to b7e6156
// if both components are healthy - return c.mod's health, so we can have a stable Health.Message.
if leastHealthy.Health == component.HealthTypeHealthy {
	return c.mod.CurrentHealth()
}
return leastHealthy
There is a race without this: both components report healthy, so the timestamp determines which one is least healthy. This makes the health message non-deterministic.
I think it's safe to pick the parent component's health message if both are healthy.
An alternative would be to combine their health messages.
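Both options above can be sketched with a minimal, hypothetical health type (the real agent uses `component.Health`/`component.HealthTypeHealthy`; the struct and function names here are illustrative only):

```go
package main

import "fmt"

// Health is a hypothetical stand-in for the agent's component health type.
type Health struct {
	Healthy bool
	Message string
}

// leastHealthy mirrors the fix in the diff: when both parent and child are
// healthy, prefer the parent's health so the message does not depend on
// which health record has the newer timestamp.
func leastHealthy(parent, child Health) Health {
	if !child.Healthy {
		return child
	}
	if !parent.Healthy {
		return parent
	}
	return parent // both healthy: deterministic, stable message
}

// combined sketches the alternative mentioned in the review: merge both
// messages instead of picking one.
func combined(parent, child Health) Health {
	return Health{
		Healthy: parent.Healthy && child.Healthy,
		Message: fmt.Sprintf("%s; %s", parent.Message, child.Message),
	}
}

func main() {
	p := Health{Healthy: true, Message: "module loaded"}
	c := Health{Healthy: true, Message: "component running"}
	fmt.Println(leastHealthy(p, c).Message) // parent's message wins
	fmt.Println(combined(p, c).Message)     // both messages joined
}
```

Either variant removes the timestamp from the decision, which is what makes the test assertion stable.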
Will leave this to @erikbaranowski
We talked offline and I'm good to merge this as is for now. A higher-level look at component health management might be needed to fix this more elegantly, but the path forward will require some more thinking.
@@ -11,7 +11,7 @@ import (
 var backoffRetry = backoff.Config{
 	MinBackoff: 10 * time.Millisecond,
 	MaxBackoff: 1 * time.Second,
-	MaxRetries: 5,
+	MaxRetries: 10,
I was intermittently getting the following in CI:
--- FAIL: Test_serviceManager (1.28s)
--- FAIL: Test_serviceManager/can_run_service_binary (0.28s)
eventually.go:48:
Error Trace: /Users/runner/work/agent/agent/cmd/grafana-agent-service/service_test.go:40
/Users/runner/work/agent/agent/cmd/grafana-agent-service/eventually.go:68
/Users/runner/work/agent/agent/cmd/grafana-agent-service/eventually.go:33
/Users/runner/work/agent/agent/cmd/grafana-agent-service/eventually.go:21
/Users/runner/work/agent/agent/cmd/grafana-agent-service/service_test.go:38
Error: Received unexpected error:
Post "http://127.0.0.1:49178/echo/response": dial tcp 127.0.0.1:49178: connect: connection refused
FAIL
FAIL github.com/grafana/agent/cmd/grafana-agent-service 1.496s
Couldn't reproduce this locally. The macOS CI agents are potentially slow (they often are), so we need to allow more time.
require.True(
	t,
	strings.HasPrefix(s, prefix),
	"expected '%v' to have '%v' prefix",
Adds a more useful error message.
Looks good to me; will let Erik make the call on the healthiness.
LGTM
* Fixing test race condition
* Fixing test race condition
* Fixing other test race condition
* Better fix for race condition #1
* Attempt to reduce test flakyness
PR Description

Fixes a few flaky tests:

* component/loki/source/file/file_test.go had a race condition between clean up and the component creating a file.
* component/module/file/file_test.go had an issue where the health message was not deterministic, leading to test flakiness.
* Test_serviceManager/can_run_service_binary fails to dial a local service, potentially not enough time to start up? Increased the backoff time.