Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add grpc client-side reconnection #4598

Merged
merged 3 commits into from
Jun 24, 2021
Merged

Add grpc client-side reconnection #4598

merged 3 commits into from
Jun 24, 2021

Conversation

aledbf
Copy link
Member

@aledbf aledbf commented Jun 24, 2021

In case of connection errors (pod restart), reconnect.

Example error:

{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","error":"invalid ref: rpc error: code = Unavailable desc = error reading from server: read tcp 10.0.128.131:55298-\u003e172.20.175.142:8080: read: no route to host:\n    github.com/gitpod-io/gitpod/registry-facade/pkg/registry.(*RemoteSpecProvider).GetSpec\n        github.com/gitpod-io/gitpod/registry-facade/pkg/registry/imagecfg.go:55\n  - invalid 

How to test:

  • start a workspace and close it.
  • kill ws-manager pod
  • open a new workspace.

Right now we only see errors in the log (and "Pulling container image …")

@codecov
Copy link

codecov bot commented Jun 24, 2021

Codecov Report

Merging #4598 (633b4ab) into main (b6eaadb) will decrease coverage by 20.47%.
The diff coverage is 0.00%.

❗ Current head 633b4ab differs from pull request most recent head 0b6923c. Consider uploading reports for the commit 0b6923c to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##             main    #4598       +/-   ##
===========================================
- Coverage   36.15%   15.68%   -20.48%     
===========================================
  Files          14        9        -5     
  Lines        3864     1760     -2104     
===========================================
- Hits         1397      276     -1121     
+ Misses       2349     1459      -890     
+ Partials      118       25       -93     
Flag Coverage Δ
components-blobserve-app 28.40% <ø> (?)
components-registry-facade-app 11.52% <0.00%> (?)
components-registry-facade-lib 11.52% <0.00%> (?)
components-ws-manager-app ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
components/registry-facade/pkg/registry/blob.go 0.00% <ø> (ø)
...omponents/registry-facade/pkg/registry/imagecfg.go 0.00% <ø> (ø)
...onents/registry-facade/pkg/registry/layersource.go 28.80% <ø> (ø)
...omponents/registry-facade/pkg/registry/manifest.go 9.85% <ø> (ø)
...omponents/registry-facade/pkg/registry/registry.go 0.00% <0.00%> (ø)
components/ws-manager/pkg/manager/annotations.go
components/ws-manager/pkg/manager/probe.go
components/ws-manager/pkg/manager/create.go
...-manager/pkg/manager/internal/workpool/workpool.go
components/ws-manager/pkg/manager/manager_ee.go
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6eaadb...0b6923c. Read the comment docs.

@aledbf aledbf force-pushed the aledbf/registryreconnect branch 5 times, most recently from 633b4ab to 457b01b Compare June 24, 2021 05:39
@aledbf aledbf requested a review from csweichel June 24, 2021 05:39
@aledbf aledbf changed the title [registry-facade] Add client-side reconnection Add grpc client-side reconnection Jun 24, 2021
@aledbf aledbf marked this pull request as ready for review June 24, 2021 06:06
//
// Note: the retry policy only has an effect if ws-manager is run with the GRPC_GO_RETRY=on env var set.
// see https://github.com/grpc/grpc-go/blob/506b7730668b5a13465224b0d8133f974a3f843d/dialoptions.go#L522-L524
var retryPolicy = `{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this is fine because

  • we never had GRPC_GO_RETRY set, or because
  • some other option is taking over the functionality this config provided?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some other option is taking over the functionality this config provided?

The changes in the PR introduce three changes:

  • grpc.KeepaliveEnforcementPolicy

Allow more frequent pings. Default is 5 minutes https://github.com/grpc/grpc-go/blob/87eb5b7502493f758e76c4d09430c0049a81a557/internal/transport/defaults.go#L40
This implies that more frequent pings result in an error HTTP/2 error code: ENHANCE_YOUR_CALM Received Goaway too_many_pings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we never had GRPC_GO_RETRY set, or because

Good point. This env variable was configured only in ws-manager. Now is present in all components with grpc

@aledbf aledbf merged commit 007c937 into main Jun 24, 2021
@aledbf aledbf deleted the aledbf/registryreconnect branch June 24, 2021 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants