hub pod's readinessProbe too strict #1936

Closed
chrisbulgaria opened this issue Dec 8, 2020 · 7 comments · Fixed by #1941

@chrisbulgaria

Bug description

Using a custom authenticator for JupyterHub (on Kubernetes) which takes some time (> 10 seconds) to do the authentication. After upgrading from helm chart version 0.9.1 (where everything was working) to 0.10.6, the redirect to /hub/ after login times out after 1 minute, resulting in HTTP 503 (logs below).
A refresh of the browser (repeating the call to /hub) then works without problems.
It seems related to the time the authenticator takes: with 5 seconds everything is fine, with more than 10 seconds it is not.

Expected behaviour

The call to /hub succeeds (and triggers the spawn).

Actual behaviour

HTTP 503

How to reproduce

Using the custom authenticator below reproduces the issue (imports and an illustrative class name are filled in to make the snippet runnable):

    import time

    from jupyterhub.auth import Authenticator

    class SlowDummyAuthenticator(Authenticator):
        def authenticate(self, handler, data):
            print("enter authenticate")
            time.sleep(10)  # simulate a slow, blocking authentication backend
            return {'name': 'testuser'}
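
As an aside (not part of the original report): time.sleep blocks JupyterHub's single-threaded event loop, so the hub cannot answer /hub/health while authenticate runs, which is consistent with the gap in the hub log below. JupyterHub also accepts a coroutine here, so a non-blocking sketch of the same delay (again with an illustrative class name) would be:

    import asyncio

    from jupyterhub.auth import Authenticator

    class SlowAsyncAuthenticator(Authenticator):  # illustrative name
        async def authenticate(self, handler, data):
            # await yields to the event loop, so /hub/health keeps
            # responding while the slow authentication is in flight
            await asyncio.sleep(10)
            return {'name': 'testuser'}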

Your personal set up

JupyterHub (helm chart 0.10.6) on Kubernetes (EKS). On 0.9.1 it works fine.

proxy

16:42:45.269 [ConfigProxy] info: 200 GET /api/routes
16:43:43.427 [ConfigProxy] info: 200 GET /api/routes
16:43:43.437 [ConfigProxy] info: 200 GET /api/routes
16:43:51.066 [ConfigProxy] debug: PROXY WEB /hub/ to http://hub:8081
16:43:51.092 [ConfigProxy] debug: PROXY WEB /hub/login?next=%2Fhub%2F to http://hub:8081
16:44:01.129 [ConfigProxy] debug: PROXY WEB /hub/ to http://hub:8081
16:44:43.443 [ConfigProxy] info: 200 GET /api/routes
16:45:01.130 [ConfigProxy] error: 503 GET /hub/ socket hang up
16:45:01.437 [ConfigProxy] debug: PROXY WEB /favicon.ico to http://hub:8081
16:45:43.443 [ConfigProxy] info: 200 GET /api/routes
16:46:43.453 [ConfigProxy] info: 200 GET /api/routes

Hub

[D 2020-12-08 16:43:50.526 JupyterHub log:181] 200 GET /hub/health (@xxxx) 0.73ms
[I 2020-12-08 16:43:51.070 JupyterHub log:181] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@x) 0.90ms
enter authenticate
[D 2020-12-08 16:44:01.108 JupyterHub base:519] Setting cookie jupyterhub-session-id: {'httponly': True}
[D 2020-12-08 16:44:01.109 JupyterHub base:523] Setting cookie for schaper: jupyterhub-hub-login
[D 2020-12-08 16:44:01.109 JupyterHub base:519] Setting cookie jupyterhub-hub-login: {'httponly': True, 'path': '/hub/'}
[I 2020-12-08 16:44:01.110 JupyterHub base:755] User logged in: testuser
[I 2020-12-08 16:44:01.110 JupyterHub log:181] 302 GET /hub/login?next=%2Fhub%2F -> /hub/ (@x) 10015.43ms
[D 2020-12-08 16:44:01.114 JupyterHub log:181] 200 GET /hub/health (@x) 2.52ms
[D 2020-12-08 16:44:01.116 JupyterHub log:181] 200 GET /hub/health (@x) 3.94ms
[D 2020-12-08 16:44:01.116 JupyterHub log:181] 200 GET /hub/health (@x) 4.06ms
...
[D 2020-12-08 16:44:34.527 JupyterHub log:181] 200 GET /hub/health (@x) 0.66ms
[D 2020-12-08 16:44:36.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.69ms
[D 2020-12-08 16:44:38.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.65ms
[D 2020-12-08 16:44:40.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.64ms
[D 2020-12-08 16:44:42.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.61ms
[D 2020-12-08 16:44:43.439 JupyterHub proxy:788] Proxy: Fetching GET http://proxy-api:8001/api/routes
[I 2020-12-08 16:44:43.444 JupyterHub proxy:320] Checking routes
[D 2020-12-08 16:44:44.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.75ms
[D 2020-12-08 16:44:46.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.81ms
...
[D 2020-12-08 16:45:00.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.82ms
[D 2020-12-08 16:45:01.133 JupyterHub pages:654] No template for 503
[I 2020-12-08 16:45:01.169 JupyterHub log:181] 200 GET /hub/error/503?url=%2Fhub%2F (@x) 35.98ms
[D 2020-12-08 16:45:01.441 JupyterHub log:181] 200 GET /favicon.ico (@x) 0.98ms





@welcome

welcome bot commented Dec 8, 2020

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@consideRatio
Member

consideRatio commented Dec 8, 2020

Upgrade with --set hub.networkPolicy.enabled=false or configure it in the config you pass. If this still persists, I assume you have an issue with JupyterHub itself rather than with this Helm chart deploying JupyterHub. If it goes away, I assume you had some custom logic that required network access to the hub pod directly instead of going through the proxy, and that this was what failed.
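
For reference, the same setting expressed in a values file passed to helm upgrade instead of via --set:

    hub:
      networkPolicy:
        enabled: false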

Btw, did you reproduce this with the dummy authenticator you exemplified, or do you believe it would reproduce with that authenticator?

@chrisbulgaria
Author

Thanks for the very quick reply! It's not the network policy; that's the first thing I disabled in the upgrade. And yes, it reproduces with the dummy authenticator, and changing the sleep to 5 seconds makes it work.
When I try to manually connect from the proxy to the hub while a long authentication is ongoing, I see the same thing: the connection hangs and times out after a minute. Shortly before and after, everything works (see the check below).
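
One way to run that manual check from inside the cluster (a sketch: the service name hub and port 8081 follow the chart's defaults, the curl image is just a convenient client, and <namespace> is a placeholder):

    kubectl -n <namespace> run curl-test --rm -it --restart=Never \
      --image=curlimages/curl --command -- curl -m 5 http://hub:8081/hub/api

While the hub pod is marked unready, the hub Service has no endpoints, so the request hangs and times out instead of returning JSON.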

@chrisbulgaria
Author

As further info: it really is the upgrade that caused this behaviour, not something else that happened by chance in our environment at the same time. Just downgrading back to 0.9.1 (and undoing the minimal changes we needed in values to support 0.10.6: disabling the network policy and the different place for the imagePullSecret) => works. Upgrading again => doesn't work. All with the dummy authenticator above.
Looking further... but any hint about which change could relate to this would be appreciated.

@chrisbulgaria
Author

:) now I know what it is. It is actually the helm chart: the readiness probe. It fails while the blocking authenticator is running; with 3 failures at a 2-second period, 3 × 2 = 6 seconds pass until the probe gives up and the pod is marked unready, which then causes the subsequent issues. That's why 5 seconds works and 10 seconds doesn't. So I have a workaround, at least for me: change the readiness probe (a sketch follows below).
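
A minimal sketch of such a workaround on 0.10.6, which does not expose the probe in its values. It patches the live deployment directly, so a later helm upgrade will revert it; the deployment name hub and the container index assume the chart's defaults, and failureThreshold 30 is an illustrative value:

    kubectl -n <namespace> patch deployment hub --type=json -p '[
      {"op": "replace",
       "path": "/spec/template/spec/containers/0/readinessProbe/failureThreshold",
       "value": 30}
    ]'

The proper fix is tracked in #1941 (milestone 0.11.0).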

@consideRatio consideRatio transferred this issue from jupyterhub/jupyterhub Dec 10, 2020

@consideRatio
Member

Ah!!!! Nice catch isolating this!!! Thank you for your work on this @chrisbulgaria! This issue is related: #1732.

@consideRatio consideRatio changed the title Version upgrade (helm 0.9.1 ->0.10.6) - custom authenticator taking > 10 sec leads to subsequent failure hub pod's readinessProbe too strict Dec 11, 2020
@consideRatio consideRatio added this to the 0.11.0 milestone Dec 12, 2020