
Proxy startup and configuration is required for init_spawners, right? #2749

Closed
consideRatio opened this issue Sep 26, 2019 · 9 comments

@consideRatio
Member

consideRatio commented Sep 26, 2019

I had a faulty assumption; this is not an issue.


If the hub starts up with state about running user pods, their individual Spawner objects will be initialized again (init_spawners) during JupyterHub startup. As part of this, they are probed for a sign of life, and if they fail to respond they are deleted. This user lifesign probe relies on the proxy being available and configured. And here is the crux: can we be confident we have configured the proxy by then? I don't think so. That happens in check_routes, which is called after init_spawners, unless a configurable amount of time passes, in which case it is called earlier, at the end of the start function...

That was about configuring the proxy before the init_spawners verifications run, but what if the proxy isn't even started? Well, then it's good that it gets started by the start function, which can run after the configurable timeout is reached...

The JupyterHub startup phase

Related

Issue about users being deleted when they shouldn't have been: jupyterhub/zero-to-jupyterhub-k8s#1370

@consideRatio consideRatio changed the title Proxy startup and configuration before init_spawners - right? Proxy startup and configuration is required for init_spawners, right? Sep 26, 2019
@consideRatio
Member Author

After #2750 the structure described here will change for the better, but I think there is still something to consider: what dynamics are caused by the init_spawners_timeout duration in conjunction with the patience of the checks triggered by init_spawners, which I think is represented by http_timeout in the Spawner base class?

@minrk
Member

minrk commented Sep 26, 2019

This is the key misunderstanding:

This user lifesign probe relies on the proxy being available and configured

The Hub probing servers in init_spawners does not involve the proxy at all.

init_spawners exclusively verifies the Hub's internal state about which spawners are running and where. Only after init_spawners completes is check_routes called, which reconciles the internal state of the Hub with the proxy.

The startup phase:

  • await initialize
    • await init_spawners (initializes spawner state, proxy is not consulted)
  • await start
    • await check_routes <- this is the only time the proxy is involved, where hub state is reconciled with the proxy

In JupyterHub 1.0, init_spawners is guaranteed to be complete before the proxy is consulted (or even started, in the default case where the Hub starts the proxy).

#2721 complicated this a couple days ago by allowing init_spawners to be incomplete when that first check_routes is called. Any Spawners that are still waiting on a check are in a 'pending' state, which means whatever their state in the proxy is, will not be modified. To deal with this, check_routes is called again as soon as init_spawners completes.
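The ordering and the pending-check behavior described above can be sketched as a toy model. This is not JupyterHub's actual code; only the names init_spawners, check_routes, and init_spawners_timeout are borrowed, and the bodies are stand-ins:

```python
import asyncio

events = []  # record the order things happen in

async def init_spawners():
    events.append("init_spawners:start")
    await asyncio.sleep(0.2)          # stand-in for probing user servers
    events.append("init_spawners:done")

async def check_routes():
    events.append("check_routes")     # stand-in: reconcile hub state with proxy

async def startup(init_spawners_timeout=0.05):
    init_task = asyncio.ensure_future(init_spawners())
    # wait at most init_spawners_timeout before proceeding to start()
    done, pending = await asyncio.wait({init_task}, timeout=init_spawners_timeout)
    await check_routes()              # first call; pending spawners left alone
    if pending:
        await init_task               # let the remaining checks finish...
        await check_routes()          # ...then reconcile with the proxy again

asyncio.run(startup())
```

When init_spawners finishes within the timeout, `pending` is empty and check_routes runs only once; otherwise the first check_routes leaves the pending spawners' routes untouched and a second reconciliation happens once the checks complete.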

[That] was about configuring the proxy before the init_spawners verifications run

where was this? It doesn't really make sense to do that.

@consideRatio
Member Author

init_spawners exclusively verifies the Hub's internal state about which spawners are running and where.

I believe that init_spawners awaits many calls to check_spawner which will delete users that fail to respond properly.


check_spawner requires the proxy to be configured, as it runs checks against the user servers and kills them if they fail to respond; if there is no routing available for these checks to succeed, that's a problem. This can be seen in the code below.

jupyterhub/jupyterhub/app.py

Lines 1940 to 1951 in 5b13f96

self.log.debug(
    "Verifying that %s is running at %s", spawner._log_name, url
)
try:
    await user._wait_up(spawner)
except TimeoutError:
    self.log.error(
        "%s does not appear to be running at %s, shutting it down.",
        spawner._log_name,
        url,
    )
    status = -1

The calls to check_spawner that depend on the proxy being configured are awaited within init_spawners here.

jupyterhub/jupyterhub/app.py

Lines 1974 to 2000 in 5b13f96

# parallelize checks for running Spawners
check_futures = []
for orm_user in db.query(orm.User):
    user = self.users[orm_user]
    self.log.debug("Loading state for %s from db", user.name)
    for name, orm_spawner in user.orm_spawners.items():
        if orm_spawner.server is not None:
            # spawner should be running
            # instantiate Spawner wrapper and check if it's still alive
            spawner = user.spawners[name]
            # signal that check is pending to avoid race conditions
            spawner._check_pending = True
            f = asyncio.ensure_future(check_spawner(user, name, spawner))
            check_futures.append(f)
TOTAL_USERS.set(len(self.users))
# it's important that we get here before the first await
# so that we know all spawners are instantiated and in the check-pending state
# await checks after submitting them all
if check_futures:
    self.log.debug(
        "Awaiting checks for %i possibly-running spawners", len(check_futures)
    )
    await gen.multi(check_futures)
db.commit()


So, if for example init_spawners were awaited on a system where both the proxy and the hub restart at the same time, the proxy would have lost its state, and the hub would remember users, initialize them, later kill them, and only then configure the proxy.

@minrk
Member

minrk commented Sep 26, 2019

I believe that init_spawners awaits many calls to check_spawner which will delete users that fail to respond properly.

Yes, and rightly so. The proxy is irrelevant, though, because the Hub always talks to spawners directly. No internal component of JupyterHub ever communicates via the proxy. Thinking about what the check is for: this is the check of what servers are running, in order to determine what the proxy should do. The proxy cannot be a requirement for determining what the proxy's routes should be.

if for example init_spawners were to be awaited on a system where both the proxy and hub restarts at the same time, the proxy will have lost its state, the hub will remember users and initialize them and later kill them, and then configure the proxy.

This is what happens very often with jupyterhub upgrades and no users are deleted, because it goes like this:

  • proxy restarts, state is empty
  • hub restarts, starts polling user servers
  • (init_spawners) the user servers that respond are loaded as 'running', the ones that don't are loaded as 'stopped'
  • (check_routes) the proxy table is retrieved
  • any routes for servers that are running are added if missing in the proxy table (if the proxy also restarted, this will be everything)
  • any routes for servers that are not responsive are removed if they are present in the proxy table (this can only happen if the hub restarted and the proxy did not)

@minrk
Member

minrk commented Sep 26, 2019

Take the default jupyterhub configuration, where the Hub starts the proxy with c.JupyterHub.cleanup_servers = False: the proxy is not started until after init_spawners is complete. If this didn't work, we would have had a problem a long time ago.

@consideRatio
Member Author

@minrk ah hmmm, so the hub, from within app.py/init_spawners, invoking check_spawner, which invokes user.py/_wait_up, speaks directly to the user server and does not require any routing from the proxy?

Ah...

async def _wait_up(self, spawner):
    """Wait for a server to finish starting.

    Shuts the server down if it doesn't respond within
    spawner.http_timeout.
    """
    server = spawner.server
    key = self.settings.get('internal_ssl_key')
    cert = self.settings.get('internal_ssl_cert')
    ca = self.settings.get('internal_ssl_ca')
    ssl_context = make_ssl_context(key, cert, cafile=ca)
    try:
        resp = await server.wait_up(
            http=True, timeout=spawner.http_timeout, ssl_context=ssl_context
        )

@minrk
Member

minrk commented Sep 26, 2019

@consideRatio discussions like this sound like a great occasion for improving architecture docs! We currently have this overview, but it may not be clear who talks directly to whom and when.

This diagram, for instance, attempts to communicate that we have:

  • proxy talks to the hub and notebooks
  • notebooks talk to the hub
  • hub talks to notebooks (missing!)

Critical points in JupyterHub architecture:

  • One of the main tasks of the Hub is to ensure the Proxy is routing requests to the right places
  • The proxy is exclusively for external communication. JupyterHub never uses the proxy for internal communication.
  • The proxy being down or restarting or slow does not affect JupyterHub's internal function, except for when it is checking the proxy's state itself.

What happens when the Hub is checking if a server is alive? (occurs at startup and as the last stage of spawner start)

  • Spawner determines URL (e.g. http://host:port) where the server is running
  • Hub connects directly to this URL
  • If the Hub successfully connects to this URL, the proxy should route /user/name to this URL
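Those three steps boil down to a check-then-route decision, sketched here with hypothetical stand-ins (FakeProxy and the probe coroutines are illustrative, not JupyterHub APIs):

```python
import asyncio

class FakeProxy:
    """Stand-in for the proxy's route table."""
    def __init__(self):
        self.routes = {}
    async def add_route(self, routespec, target):
        self.routes[routespec] = target

async def add_route_if_alive(proxy, routespec, url, probe):
    # the Hub connects directly to `url`; the proxy is not consulted
    if await probe(url):
        await proxy.add_route(routespec, url)
        return True
    return False  # dead server: nothing is added to the proxy

async def main():
    proxy = FakeProxy()
    async def up(url): return True      # pretend this server responded
    async def down(url): return False   # pretend this one did not
    await add_route_if_alive(proxy, "/user/a/", "http://h:1", up)
    await add_route_if_alive(proxy, "/user/b/", "http://h:2", down)
    return proxy.routes

routes = asyncio.run(main())
```

Only the server that answered the direct probe ends up with a route, which is exactly why the proxy cannot be a prerequisite for the check.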

@minrk
Member

minrk commented Sep 26, 2019

speaks directly to the user server, and does not require any routing from the proxy?

Yes, exactly! This is the condition that is required before the Hub will add the route to the proxy. We can't require that it be in the proxy before we decide if it should be added to the proxy.

@rkdarst
Contributor

rkdarst commented Sep 27, 2019

@consideRatio discussions like this sound like a great occasion for improving architecture docs! We currently have this overview, but it may not be clear who talks directly to whom and when.

#2726 was my attempt at something more: roughly what I learned, combined with the technical overview plus everything else. At some point I could make a pass at improving the technical overview too, but how should these two pages relate?

For another "JupyterHub for sysadmins" talk I made a more detailed architecture diagram, here (https://docs.google.com/presentation/d/1Izs1EJJLqNUCqnEblatc59CznnKNz0uA-33T6g6bMf8). I've been meaning to submit it to the JH docs for a while. What do you think? (I now know of a few problems that need fixing in it...)
