Recreate as CHP proxy pod's deployment strategy #1401
Conversation
Nice catch! Looks good to me modulo the two nits.
Knock knock
Race condition
Who's there?
😂
Using a rolling update by default on the proxy pod is a mistake on our part, because of the JupyterHub / CHP proxy interaction. JupyterHub assumes in check_routes / add_route etc. that it is speaking to one specific CHP proxy server, but different servers can respond if we make an upgrade while the proxy pod is doing a rolling update. For example, consider a hub pod doing a recreate upgrade and a proxy pod doing a rolling upgrade. The new hub pod could become ready before the proxy pod, start speaking with the old proxy pod, and then at a crucial point switch to speaking with the new pod. If that switch happens at the wrong time, the hub may fail to get responses about user pods it has just verified to be around, and those pods are then deleted. So, this commit hopes to fix a sneaky bug where user pods are deleted during upgrades in which the proxy pod is also updated!
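As the PR title suggests, the fix is to set the proxy Deployment's strategy to Recreate, so that only one CHP instance ever exists at a time. A minimal sketch of what that looks like in a Kubernetes Deployment manifest (the names, labels, and image tag here are illustrative, not the chart's actual templates):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: proxy            # illustrative name, not the chart's template
spec:
  replicas: 1
  strategy:
    type: Recreate       # terminate the old CHP pod before starting the
                         # new one, so the hub never sees two proxies
  selector:
    matchLabels:
      component: proxy
  template:
    metadata:
      labels:
        component: proxy
    spec:
      containers:
        - name: chp
          image: jupyterhub/configurable-http-proxy:4.5.3   # example tag
```

The trade-off is a short window of proxy downtime during the upgrade, which is what causes the transient errors mentioned below.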
902a032 to 4bb76c7
Upgrades from the previous state to this one would fail without this fix for the issue caused by removing the fix in jupyterhub#1401.
Bugfix for proxy upgrade strategy PR #1401
After this change, it is typical to see these kinds of errors from the hub during startup until the proxy becomes ready again.
Hi guys, have you solved this question? I get tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20001 milliseconds. I have met the same problem, and it has bothered me for several days. Please give me some suggestions. Waiting for a response.
Yes, use the latest version, 0.10.6, of the Helm chart.
Note that with the Traefik proxy, which would store its state in a key-value store, this may not be a problem, but we don't yet use the Traefik proxy.