Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash on startup after the HRDA addition #76

Merged
merged 3 commits into from
Aug 2, 2020

Conversation

mumoshu
Copy link
Collaborator

@mumoshu mumoshu commented Jul 29, 2020

This is a follow-up for #66.

The reconciler for the new HorizontalRunnerDeploymentAutoscaler had a terrible flaw that prevented the controller to fail launching due to an error like:

indexer conflict: map[field:.metadata.controller:{}]

This fixes that, while adding integration_test.go to verify its actually fixed and prevent regression in the future.

This is a follow-up for #66.

The reconciler for the new HorizontalRunnerDeploymentAutoscaler had a terrible flaw that prevented the controller to fail launching due to an error like:

```
indexer conflict: map[field:.metadata.controller:{}]
```

This fixes that, while adding `integration_test.go` to verify its actually fixed and prevent regression in the future.
Recorder: mgr.GetEventRecorderFor("runnerdeployment-controller"),
}
err = deploymentsController.SetupWithManager(mgr)
Expect(err).NotTo(HaveOccurred(), "failed to setup controller")
Copy link
Collaborator Author

@mumoshu mumoshu Jul 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The essential difference between other tests and this integration_test is that this one registers all the reconcilers, which is helpful to find any bug related to the whole controller startup.

Copy link
Contributor

@summerwind summerwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!
The test did not pass in my environment. I need time to dig deeper on this... 🤔

Running Suite: Controller Suite
===============================
Random Seed: 1596114499
Will run 3 of 3 specs

••
------------------------------
• Failure [5.553 seconds]
Inside of a new namespace
/Users/moto/dev/src/github.com/summerwind/actions-runner-controller/controllers/integration_test.go:96
  when no existing resources exist
  /Users/moto/dev/src/github.com/summerwind/actions-runner-controller/controllers/integration_test.go:100
    should create and scale runners [It]
    /Users/moto/dev/src/github.com/summerwind/actions-runner-controller/controllers/integration_test.go:102

    Timed out after 5.002s.
    Expected
        <int>: 2
    to be equivalent to
        <int>: 1

    /Users/moto/dev/src/github.com/summerwind/actions-runner-controller/controllers/integration_test.go:249
------------------------------
2020-07-30T22:08:28.309+0900	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
2020-07-30T22:08:28.309+0900	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
2020-07-30T22:08:28.409+0900	ERROR	controller-runtime.metrics	failed to register metric	{"name": "workqueue_depth", "queue": "runnerreplicaset", "error": "duplicate metrics collector registration attempted"}
github.com/go-logr/zapr.(*zapLogger).Error
	/Users/moto/.go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/metrics.registerWorkqueueMetric
	/Users/moto/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/metrics/workqueue.go:37
sigs.k8s.io/controller-runtime/pkg/metrics.workqueueMetricsProvider.NewDepthMetric
	/Users/moto/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/metrics/workqueue.go:50
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics
	/Users/moto/.go/pkg/mod/k8s.io/client-go@v0.0.0-20190918160344-1fbdaa4c8d90/util/workqueue/metrics.go:243
k8s.io/client-go/util/workqueue.NewNamed
	/Users/moto/.go/pkg/mod/k8s.io/client-go@v0.0.0-20190918160344-1fbdaa4c8d90/util/workqueue/queue.go:44
k8s.io/client-go/util/workqueue.newDelayingQueue
	/Users/moto/.go/pkg/mod/k8s.io/client-go@v0.0.0-20190918160344-1fbdaa4c8d90/util/workqueue/delaying_queue.go:47
k8s.io/client-go/util/workqueue.NewNamedDelayingQueue
	/Users/moto/.go/pkg/mod/k8s.io/client-go@v0.0.0-20190918160344-1fbdaa4c8d90/util/workqueue/delaying_queue.go:42
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue
	/Users/moto/.go/pkg/mod/k8s.io/client-go@v0.0.0-20190918160344-1fbdaa4c8d90/util/workqueue/rate_limiting_queue.go:46
sigs.k8s.io/controller-runtime/pkg/controller.New.func1
	/Users/moto/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/controller/controller.go:90
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/Users/moto/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:151
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startLeaderElectionRunnables.func1
	/Users/moto/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/manager/internal.go:477
2020-07-30T22:08:28.409+0900	ERROR	controller-runtime.metrics	failed to register metric	{"name": "workqueue_depth", "queue": "runnerdeployment", "error": "duplicate metrics collector registration attempted"}
...

@@ -130,25 +130,8 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(req ctrl.Request) (ctrl
func (r *HorizontalRunnerAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager) error {
r.Recorder = mgr.GetEventRecorderFor("runnerdeployment-controller")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this controller name should be changed as well 😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Fixed in 50487bb

@summerwind
Copy link
Contributor

@mumoshu I still haven't been able to do a successful integration test. Are you sure that the integration tests pass when you run make test? In my environment, AutoScaler doesn't seem to be working properly due to a failure to access the fake server.

2020-08-01T19:37:17.446+0900	ERROR	Could not compute replicas	{"horizontalrunnerautoscaler": "testns-ieyoh/example-runnerdeploy", "error": "Get http://127.0.0.1:54042/repos/foo/bar/actions/runs: dial tcp 127.0.0.1:54042: connect: connection refused"}

@summerwind
Copy link
Contributor

@mumoshu
Copy link
Collaborator Author

mumoshu commented Aug 2, 2020

@summerwind Thanks for the rewview! Yeah it worked on my machine. I'll follow your comments and investigate/fix it asap

@mumoshu
Copy link
Collaborator Author

mumoshu commented Aug 2, 2020

@summerwind I could easily reproduce the error. I'm not sure why I thought it was working, but it should now be fixed anyway 🙏

I've also enhanced the integration test to cover the scale-up scenario. I will probably cover scale-down scenario as well, in another PR.

Copy link
Contributor

@summerwind summerwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've confirmed that the test will be a success! Thanks for the fix!
LGTM 👍

@summerwind summerwind merged commit 3818e58 into master Aug 2, 2020
@summerwind summerwind deleted the fix-crash-on-startup branch August 2, 2020 04:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants