JupyterHub CrashLoopBackOff #493

Closed
kdubovikov opened this issue Feb 10, 2018 · 15 comments · Fixed by #1422

@kdubovikov commented Feb 10, 2018

I am trying to spin up JupyterHub using Helm. All resources start successfully, but after a short time the hub pod enters CrashLoopBackOff.

Installation was performed using the following command:

helm install jupyterhub/jupyterhub --version=v0.6 --name=jupyterhub --namespace jupyterhub -f ./jupyterhub/config.yaml --timeout=1000000

I've also tested version 0.5 and got the same results.

Logs:

$ kubectl logs po/hub-56d985bfb8-vb6pl --namespace jupyterhub
[I 2018-02-10 08:21:27.439 JupyterHub app:830] Loading cookie_secret from env[JPY_COOKIE_SECRET]
[W 2018-02-10 08:21:27.673 JupyterHub app:955] No admin users, admin interface will be unavailable.
[W 2018-02-10 08:21:27.673 JupyterHub app:956] Add any administrative users to `c.Authenticator.admin_users` in config.
[I 2018-02-10 08:21:27.673 JupyterHub app:983] Not using whitelist. Any authenticated user will be allowed.
[I 2018-02-10 08:21:28.025 JupyterHub app:1528] Hub API listening on http://0.0.0.0:8081/hub/
[I 2018-02-10 08:21:28.026 JupyterHub app:1538] Not starting proxy
[I 2018-02-10 08:21:28.026 JupyterHub app:1544] Starting managed service cull-idle
[I 2018-02-10 08:21:28.026 JupyterHub service:266] Starting service 'cull-idle': ['/usr/local/bin/cull_idle_servers.py', '--timeout=3600', '--cull-every=600', '--url=http://127.0.0.1:8081/hub/api']
[I 2018-02-10 08:21:28.053 JupyterHub service:109] Spawning /usr/local/bin/cull_idle_servers.py --timeout=3600 --cull-every=600 --url=http://127.0.0.1:8081/hub/api
[I 2018-02-10 08:21:28.263 JupyterHub log:122] 200 GET /hub/api/users (cull-idle@127.0.0.1) 25.95ms
[E 2018-02-10 08:21:48.064 JupyterHub app:1623]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/app.py", line 1621, in launch_instance_async
        yield self.start()
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/app.py", line 1569, in start
        yield self.proxy.check_routes(self.users, self._service_map)
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/proxy.py", line 294, in check_routes
        routes = yield self.get_all_routes()
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/proxy.py", line 589, in get_all_routes
        resp = yield self.api_request('', client=client)
    tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20000 milliseconds

Namespace status:

kubectl get all --namespace dljupyterhub   

NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/hub     1         1         1            0           22m
deploy/proxy   1         1         1            1           22m

NAME                  DESIRED   CURRENT   READY     AGE
rs/hub-5479595c8d     1         1         0         22m
rs/proxy-6fbf784dbd   1         1         1         22m

NAME                        READY     STATUS             RESTARTS   AGE
po/hub-5479595c8d-7qhzb     0/1       CrashLoopBackOff   7          22m
po/proxy-6fbf784dbd-pt5q6   2/2       Running            0          22m

NAME                               TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
svc/glusterfs-dynamic-hub-db-dir   ClusterIP      10.104.97.71    <none>        1/TCP                        15m
svc/hub                            ClusterIP      10.106.231.99   <none>        8081/TCP                     22m
svc/proxy-api                      ClusterIP      10.104.175.74   <none>        8001/TCP                     22m
svc/proxy-http                     ClusterIP      10.106.46.200   <none>        8000/TCP                     22m
svc/proxy-public                   LoadBalancer   10.109.72.153   <pending>     80:32500/TCP,443:31790/TCP   22m

Contents of config.yaml

hub:
  cookieSecret: "aaa"
proxy:
  secretToken: "bbb"
singleuser:
  storage:
    capacity: 2Gi
    dynamic:
      storageClass: gluster-heketi
ingress:
  enabled: true
  hosts:
    - host1
@yuvipanda (Collaborator) commented:

Heya @kdubovikov! Thanks for filing this issue!

It looks like the hub pod cannot reach the proxy pod. Are pod networking and kube-proxy working properly? I suspect this is an OpenStack or bare-metal setup. Are other services on the cluster working fine? Does https://scanner.heptio.com/ find any issues?
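
A couple of quick checks along these lines (a sketch; the kube-proxy label and the busybox image tag are assumptions, adjust for your cluster):

kubectl get pods --namespace kube-system -l k8s-app=kube-proxy
kubectl run dns-test -it --rm --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default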

@kdubovikov (Author) commented:

Hey @yuvipanda, thanks for the response. All other services are working fine (we also run GlusterFS). I've run the tests and no issues were found:

Ran 125 of 710 Specs in 3156.548 seconds
SUCCESS! -- 125 Passed | 0 Failed | 0 Pending | 585 Skipped PASS

Also, I am able to run JupyterHub with KubeSpawner outside of the cluster without issues.

@yuvipanda (Collaborator) commented:

Hmm, in that case I'm at a loss about what is going on :(

@willingc (Collaborator) commented:

Ping @minrk. Any thoughts?

Are you still seeing this issue @kdubovikov?

@minrk (Member) commented Feb 28, 2018

It does seem like a networking problem, but I'm not sure what the best way to debug it would be. You could edit the hub command to run while true; do sleep 10; done, then kubectl exec into the hub pod and see whether you can communicate with the proxy via curl or similar.

You could also try communicating with the proxy from another context (e.g. outside the cluster, another pod, etc.) to be sure that the proxy pod is accepting connections.
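
For example, you could port-forward the proxy pod's API port to your workstation and hit it from outside the cluster (a sketch; the pod and namespace names are placeholders, and without the auth token the proxy API is expected to answer 403, which still proves it accepts connections):

kubectl port-forward <proxy-pod> 8001:8001 --namespace <namespace>
curl -i http://localhost:8001/api/routes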

Do you have any NetworkPolicy config on the cluster?
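
NetworkPolicy objects are namespaced, so a quick way to check everywhere (a sketch):

kubectl get networkpolicies --all-namespaces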

@kdubovikov (Author) commented:

@minrk, I think no NetworkPolicy is present. The cluster was set up using kubeadm. Could you clarify where I need to change the hub command?

@minrk (Member) commented Mar 2, 2018

You can edit the jupyterhub command with:

kubectl edit deployment hub

and change the command that looks like:

      - command:
        - jupyterhub
        - --config
        - /srv/jupyterhub_config.py
        - --upgrade-db

to

      - command:
        - sh
        - -c
        - while true; do sleep 10; done

This will create a new hub pod with the new command, which you can kubectl exec -it into.
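
From inside that pod you can then probe the proxy API service directly. A rough sketch (the service name and port come from the kubectl get all output above; the image may lack curl, so this uses python3 with requests, assuming it is available; an immediate response such as 403 without a token means the network path works, while a hang ending in a timeout reproduces the HTTP 599 from the hub logs):

kubectl exec -it <hub-pod> --namespace <namespace> -- /bin/bash
# inside the pod:
python3 -c "import requests; print(requests.get('http://proxy-api:8001/api/routes', timeout=5).status_code)"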

@yuandongfang commented:

Did you find a solution? I also have this problem; can anyone help? Thanks to all of you. Here is the kubectl describe output for my hub pod:

Name:           hub-86d676cf88-jw8ws
Namespace:      jupyterhubtest
Node:           192.168.0.5/192.168.0.5
Start Time:     Tue, 10 Apr 2018 11:34:12 +0800
Labels:         app=jupyterhub
                component=hub
                heritage=Tiller
                name=hub
                pod-template-hash=4282327944
                release=jupyterhubfork8s
Status:         Running
IP:             172.18.0.26
Controllers:    ReplicaSet/hub-86d676cf88
Containers:
  hub-container:
    Container ID:   docker://550c1ae33d73c965a87a50bd87f2b87fcafa498f3b4a7e59b807828ef15cea63
    Image:          jupyterhub/k8s-hub:4b122ad
    Image ID:       docker-pullable://jupyterhub/k8s-hub@sha256:b1fb9dd9eec9a9aab583addd8f03fd035494681ac224cfaa55126de442eeecd3
    Port:           8081/TCP
    Command:
      jupyterhub
      --config
      /srv/jupyterhub_config.py
    Requests:
      cpu:     200m
      memory:  512Mi
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 10 Apr 2018 12:06:03 +0800
      Finished:     Tue, 10 Apr 2018 12:06:04 +0800
    Ready:          False
    Restart Count:  11
    Volume Mounts:
      /etc/jupyterhub/config/ from config (rw)
      /etc/jupyterhub/secret/ from secret (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hub-token-lwmrd (ro)
    Environment Variables:
      SINGLEUSER_IMAGE:        jupyterhub/k8s-singleuser-sample:5d060de
      JPY_COOKIE_SECRET:       <set to the key 'hub.cookie-secret' in secret 'hub-secret'>
      POD_NAMESPACE:           jupyterhubtest (v1:metadata.namespace)
      CONFIGPROXY_AUTH_TOKEN:  <set to the key 'proxy.token' in secret 'hub-secret'>
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  config:
    Type:  ConfigMap (a volume populated by a ConfigMap)
    Name:  hub-config
  secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hub-secret
  hub-token-lwmrd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hub-token-lwmrd
QoS Class:      Burstable
Tolerations:
Events:
  FirstSeen LastSeen Count From SubObjectPath Type Reason Message
  32m 32m 1 {default-scheduler } Normal Scheduled Successfully assigned hub-86d676cf88-jw8ws to 192.168.0.5
  32m 32m 1 {kubelet 192.168.0.5} Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "config"
  32m 32m 1 {kubelet 192.168.0.5} Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "secret"
  32m 32m 1 {kubelet 192.168.0.5} Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "hub-token-lwmrd"
  32m 32m 1 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Pulling pulling image "jupyterhub/k8s-hub:4b122ad"
  32m 32m 1 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Pulled Successfully pulled image "jupyterhub/k8s-hub:4b122ad"
  32m 31m 4 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Created Created container
  32m 31m 4 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Started Started container
  32m 31m 3 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Pulled Container image "jupyterhub/k8s-hub:4b122ad" already present on machine
  32m 17m 67 {kubelet 192.168.0.5} spec.containers{hub-container} Warning BackOff Back-off restarting failed container
  32m 2m 135 {kubelet 192.168.0.5} Warning FailedSync Error syncing pod

@ryanlovett (Collaborator) commented:

I'm seeing this on GKE. We were running v0.6 and tried to upgrade to the latest chart. After some helm failures I reverted to v0.6 but ran into this. I've tried deleting the pods and deployments. I'll do some debugging.

@ryanlovett (Collaborator) commented Jul 9, 2018

There's no curl or wget in the pod. With python3 and requests I can confirm the tornado.curl_httpclient.CurlError: the proxy-api endpoint times out, while proxy-public and proxy-http are responsive.

The cluster has:

addonsConfig:
  networkPolicyConfig:
    disabled: true
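
Something else worth checking at this point (a sketch; <namespace> is a placeholder): if the proxy-api Service's selector no longer matches the running proxy pod, the Service will have no endpoints and requests to it will fail even though the proxy itself is healthy.

kubectl describe service proxy-api --namespace <namespace>
kubectl get endpoints proxy-api --namespace <namespace>
kubectl get pods --namespace <namespace> --show-labels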

@ryanlovett (Collaborator) commented:

The proxy-api object was referencing a newer version of the helm chart -- one that I had previously tried to upgrade to. I deleted the proxy-api object, then reran my CI to do a helm upgrade and now everything is working.
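
For reference, a sketch of that recovery (release, namespace, and chart version are placeholders; in this case the upgrade was driven by CI):

kubectl delete service proxy-api --namespace <namespace>
helm upgrade <release-name> jupyterhub/jupyterhub --version <chart-version> -f config.yaml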

@ryanlovett (Collaborator) commented:

I ran into this again on Azure after a helm upgrade. Unlike last time, I couldn't access any of the service endpoints. I have a feeling this occurrence is due to the infrastructure and not z2jh, but I thought I'd leave a trail marker.

@consideRatio (Member) commented:

Hmmm, @ryanlovett wrote:

The proxy-api object was referencing a newer version of the helm chart -- one that I had previously tried to upgrade to. I deleted the proxy-api object, then reran my CI to do a helm upgrade and now everything is working.

Does this mean that our proxy pod did not restart as it should have, or that it persisted some faulty state that needed to be refreshed? Any ideas on what state was outdated?

@ryanlovett, we have now released 0.7.0; any feedback on upgrading to it would be very relevant. If you do, just make sure to follow the upgrade instructions in the changelog.md file.

@diegodorgam commented:

Any thoughts on this?

@consideRatio (Member) commented Sep 30, 2019

I found that these errors happen when the hub and proxy get updated at the same time. The hub will crash if it fails to communicate with the proxy, but it only recognizes the failure about 20 seconds later, and by that time the hub can appear functional. The next time we bump the JupyterHub version we will get jupyterhub/jupyterhub#2750, which will keep the hub pod reported as unavailable until it actually works reliably.

Perhaps we bump it along with #1422, or earlier.
