-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a simple network prober to the activator. #3257
Add a simple network prober to the activator. #3257
Conversation
/test pull-knative-serving-unit-tests |
334506c
to
fd054ea
Compare
/test pull-knative-serving-integration-tests |
3dec44f
to
d73c81f
Compare
/lgtm |
d73c81f
to
7d95873
Compare
7d95873
to
d071d17
Compare
If this works, let's check it in because I feel like this is part of #813 which is work we signed up for in 0.4.0. Thanks for the fix! |
/lgtm |
@tcnghia I'm worried that this is potentially disruptive to a key path. After 0.4.0 cuts, I'll check this in enabled by default and we can discuss cherry picking it into 0.4.x at the WG meeting after it's baked at least a little in master. |
d071d17
to
4b868e7
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mattmoor The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
4b868e7
to
f58c04c
Compare
@markusthoemmes The probe endpoints are in 0.4 and unused. However, this would affect a conversation about whether to cherry-pick this (I added this to the #networking agenda on Thursday). |
f58c04c
to
0a6f316
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, RFAL.
/hold cancel |
http.CanonicalHeaderKey(network.ProbeHeaderName): []string{"true"}, | ||
}, | ||
} | ||
settings := wait.Backoff{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this invalidates the comment above where maxRetries
says that the sum will evaluate to 1 minute. My math shows you'll be getting 37.151 seconds (or 48397, depending on how wait internals are done), but still not a minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment was written for the other code path. We can clean that up when we remove the other path?
0a6f316
to
dfb4c44
Compare
/test pull-knative-serving-integration-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/test pull-knative-serving-integration-tests |
/lgtm |
/test pull-knative-serving-integration-tests |
pkg/activator/handler/handler.go
Outdated
a.Logger.Errorw("Pod probe failed", zap.Error(err)) | ||
return false, nil | ||
} | ||
if probeResp.StatusCode != http.StatusOK { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to do similar to the retry roundtripper where we only continue on 503? I worry that retrying on any !ok response will surface e.g. 500 with a 1min delay on activation rather than instant fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to only continue on 503, and below to only declare success on http.StatusOK
.
dfb4c44
to
dfcdcc8
Compare
dfcdcc8
to
da4a5f3
Compare
/test pull-knative-serving-upgrade-tests |
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use. Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes). This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero. Fixes: knative#3239 Fixes: knative#2856
da4a5f3
to
b5dbff6
Compare
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/retest |
1 similar comment
/retest |
When the flag
-enable-network-probing
is passed the activator will replace its retring transport logic with a simple network probe based on #3256 with a similar number of retries to what the retrying transport was previously configured to use. Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use
t.Parallel()
since it will parallelize the two times this waits for a scale-to-zero.Fixes: #3239
Fixes: #2856