
[WIP] Add progress indicators for jupyterhub 0.9 #86

Merged: 2 commits merged into jupyterhub:master from the progress branch on Jul 24, 2019

Conversation

@rkdarst (Contributor) commented May 20, 2018

This was my attempt at adding progress indicators for JupyterHub 0.9 (jupyterhub/jupyterhub#1771; batchspawner #81). However, it just doesn't work. In fact, I find no indication that this .progress method is being used at all, which makes me think there is something fundamentally wrong with what I am doing (if I put an exception in it, it is never raised; the base Spawner .progress method is always used instead).

There is still some junk in here: it's artificially slowed down to make it last longer. I've also tried using async_generator instead of gen.coroutine. It doesn't do anything useful yet, since I was just trying to get any sign of it working first, but it could integrate with the spawner's estimated start times and so on before this is ready to go.

Any ideas? Any known implementations in other spawners yet?

@minrk ?

@@ -353,6 +358,26 @@ def stop(self, now=False):
self.job_id, self.current_ip, self.port)
)

@gen.coroutine
minrk (Member) commented on this diff:

This should look like:

@async_generator
async def progress(self):
    while True:
        ...
        await yield_({...})

This can't be a tornado coroutine because tornado coroutines are themselves generators, so they can't tell the difference between a yield that should be treated as an async item to emit and a yield that is actually a suspension point for awaiting.
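
For concreteness, here is a minimal sketch of what such a progress method could look like for a batch spawner, using the async_generator package as suggested above. The job-state attributes (job_is_pending, job_is_running) and the percentages are hypothetical stand-ins, not batchspawner's real API:

    import asyncio
    from async_generator import async_generator, yield_

    class ProgressSketch:
        """Hypothetical spawner-like class; only illustrates the generator shape."""
        job_is_pending = True
        job_is_running = False

        @async_generator
        async def progress(self):
            while True:
                if self.job_is_running:
                    await yield_({'progress': 50, 'message': 'Cluster job started'})
                    return  # the iterator must stop once there is nothing more to report
                elif self.job_is_pending:
                    await yield_({'progress': 10, 'message': 'Cluster job queued'})
                await asyncio.sleep(1)  # hand control back so other coroutines can run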

@@ -353,6 +358,26 @@ def stop(self, now=False):
self.job_id, self.current_ip, self.port)
)

@gen.coroutine
def progress(self):
while True:
minrk (Member) commented on this diff:

This iterator needs to be able to stop at some point.

@rkdarst (Contributor, Author) commented May 20, 2018

@minrk, thanks for the reviews. I had tried the async_generator approach before, and have now updated it.

But something is still fundamentally wrong. Notice there is a raise in there, yet when running I never see that exception happen.

  • Editing other things in the batchspawner.py file I am using does have a noticeable effect.
  • I do see the default implementation's 50% "Spawning server..." message when the server is slow enough.
  • This has always been the case for as long as I have been trying to add .progress.

UPDATE: It's obvious once I think about it deeply enough. I am using ProfileSpawner, and it needs to pass .progress() through to the wrapped spawner.

Also, the tests now fail on python<3.5, but I'll deal with that later.

@rkdarst (Contributor, Author) commented May 21, 2018

OK, with jupyterhub/wrapspawner#21, this minimally works now!

  • Fixed the problems above
  • Does the loop need to await yield_(gen.sleep(.1))? (And gen.sleep is tornado, which is probably wrong, I guess.)
  • Preferably, I would like to maintain compatibility with python<3.5 (HPC is rather slow to change...). I will go and find out what JupyterHub did and copy that here.
  • Does the progress indicator only start being shown after .start() returns? With my current implementation, it always begins at 66%, never less: I see a 10-15 second delay, then it jumps straight to 66%. The batchspawner .start() method takes quite a while, because the job has to start running and the spawner has to find the node and IP before it returns.

@minrk, any ideas? I'd also be happy for a critical review of this whole thing and the wrapspawner change, since I am a bit at the limits of my expertise.

@rkdarst force-pushed the progress branch 2 times, most recently from 8126666 to 4561139 on May 21, 2018 19:17
@mbmilligan (Member) commented:

Cool, I do think this would be a useful feature to have. Since this is added in JupyterHub 0.9, there's no point trying to support it for python<3.5 - your version gate looks correct. Once the basic concept is working, we should think about how to expose this to child classes in an easy-to-override way. For example, we could define the interface such that a child class can have:

  def current_progress(self):
      return self.state_to_progress_dict()

With that in place, the base class implementation would call that overridable method to get the actual dict to yield, and child-class authors only have to add a single function if they don't like the generic implementation.
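
A rough sketch of how that could fit together (hypothetical names and messages throughout; this is the proposed shape, not an agreed-upon API):

    import asyncio
    from async_generator import async_generator, yield_

    class BatchSpawnerBaseSketch:
        def current_progress(self):
            # generic default; child classes override this single method
            return {'message': 'Waiting for cluster job to start'}

        @async_generator
        async def progress(self):
            # the base class owns the loop and just asks the hook what to yield
            while True:
                await yield_(self.current_progress())
                await asyncio.sleep(1)

    class SlurmSpawnerSketch(BatchSpawnerBaseSketch):
        def current_progress(self):
            # a child class reports its own state without touching progress()
            return {'message': 'Job is pending in the Slurm queue'}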

As an aside, from a user-interaction perspective, I would omit the numeric progress field from the generic implementation. The only actual information we have is whether the job has been queued or is already running, and assigning an arbitrary progress percentage to that is misleading. I would also put those message texts in configurable traits, so site admins can customize them to match their language/style guide/etc.

@rkdarst (Contributor, Author) commented May 21, 2018

The new version works on python 3.3+ by using only the old syntax and conditioning the progress method on the python version. A bit hackish, but effective.

The tests pass, but only because they don't cover progress yet: it fails in real life for a reason I am about to comment on. Unless someone smarter than me knows the answer, this may all be for nothing.

# only supports Python 3.5+. It requires asyncio coroutines.
if sys.version_info >= (3, 5):
@async_generator
@asyncio.coroutine
rkdarst (Contributor, Author) commented on this diff:

Combining these two decorators doesn't seem to work, on python 3.6 at least. Is there any way to make async_generator work on python 3.5/3.6 without using syntax that is incompatible with python<3.5?

Another contributor commented:

I wonder whether yield_from_ should be used instead of yield from yield_ (ref).

rkdarst (Contributor, Author) replied:

No luck... I have a feeling it's something about the interaction of @async_generator and @asyncio.coroutine, but my experience is limited...

This is the actual traceback:

    HTTPServerRequest(protocol='http', host='jupyter.triton.aalto.fi', method='GET', uri='/dev/hub/api/users/darstr1/server/progress', version='HTTP/1.1', remote_ip='::ffff:127.0.0.1')
    Traceback (most recent call last):
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/tornado/web.py", line 1543, in _execute
        result = yield result
      File "/share/apps/jupyterhub/dev/jupyterhub/jupyterhub/apihandlers/users.py", line 496, in get
        async for event in events:
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 271, in __anext__
        return await self._do_it(self._it.__next__)
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 290, in _do_it
        return await ANextIter(self._it, start_fn, *args)
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 199, in __next__
        return self._invoke(self._it.__next__)
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 209, in _invoke
        result = fn(*args)
      File "/share/apps/jupyterhub/dev/jupyterhub/jupyterhub/utils.py", line 480, in iterate_until
        await yield_(item_future.result())
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 271, in __anext__
        return await self._do_it(self._it.__next__)
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 290, in _do_it
        return await ANextIter(self._it, start_fn, *args)
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 197, in __next__
        return self._invoke(first_fn, *first_args)
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_impl.py", line 209, in _invoke
        result = fn(*args)
      File "/share/apps/jupyterhub/dev/jupyterhub/jupyterhub/spawner.py", line 733, in _generate_progress
        await yield_(event)
      File "/share/apps/jupyterhub/dev/miniconda/lib/python3.6/site-packages/async_generator/_util.py", line 14, in __aexit__
        await self._aiter.aclose()
    AttributeError: 'generator' object has no attribute 'aclose'

A Member replied:

Python 3.3 hit end-of-support last year, so I think new releases of everything should drop support for 3.3, if that helps the syntax support issues.

yield from yield_({event})

should be the right syntax for 3.4, I think.

@rkdarst (Contributor, Author) commented May 21, 2018

@mbmilligan, I agree about the hook for child spawners. As for the numerical progress percentage, this is the way it works in JupyterHub, and as far as I know some number is required. So I made up something reasonable and relatively meaningless.

The biggest current issue seems to be that progress indicators don't start until .start() has returned, which for batchspawner is the longest part of the process (waiting for the cluster job to start). I guess #58 might help with this.

@mbmilligan (Member) commented:

@rkdarst I'm going by the docstring for the progress function, which claims that if the numeric progress is omitted, the message will simply be displayed:
https://github.com/jupyterhub/jupyterhub/blob/b8b57843a6e2f8a908d9112cea84c18005b116d1/jupyterhub/spawner.py#L730-L738
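
If that docstring is accurate, a generic implementation could yield message-only events. A hedged sketch of the event shape (the wording here is made up, not batchspawner's):

    from async_generator import async_generator, yield_

    class MessageOnlySketch:
        @async_generator
        async def progress(self):
            # message-only event; per the docstring above, the text should
            # still be displayed even without a numeric 'progress' key
            await yield_({'message': 'Cluster job submitted, waiting for it to start'})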

Regarding not rendering anything until start() returns - that seems like it would be an issue for any non-local spawner trying to use this feature, since start() can't return until the ip/port of the connection are known. I'll dig a bit into the JupyterHub code and investigate whether that's really how it works now.

@minrk (Member) commented May 24, 2018

It should be able to retrieve progress while start is pending (that's how I'm testing locally), but this does require that start is properly async.
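
A sketch of the distinction (hypothetical helper names; not batchspawner's actual start implementation): progress events can only interleave with a pending start() if start() awaits instead of blocking the event loop.

    import asyncio

    class AsyncStartSketch:
        ip = None
        port = 0

        async def job_is_running(self):
            # hypothetical state check; a real spawner would poll the batch queue
            return True

        async def start(self):
            # awaiting the sleep hands control back to the event loop, so the
            # hub can keep pulling events from progress() in the meantime;
            # time.sleep() here would block everything, including progress
            while not await self.job_is_running():
                await asyncio.sleep(2)
            return self.ip, self.port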

@rkdarst (Contributor, Author) commented Jun 20, 2018

It's been several weeks, and I have taken a lot of time to learn about async, async_generator, and so on, so I can hopefully use more precise vocabulary now.

  • async_generator does not work with tornado coroutines.

  • I can make the progress bar work with async def progress, @async_generator, and one of your earlier suggestions. But this is source-level incompatible with python 3.4.

  • I tried swapping @async_generator / async def for @async_generator / @asyncio.coroutine / def, and await yield_() for yield from yield_(), and it does not work:

    AttributeError: 'generator' object has no attribute '__await__'

I see at least one person asking the same question on Stack Overflow, but no answer.

The chances that the mistake is mine are smaller than before (but still significant), so it looks like we might have to wait until we drop python 3.4 to put this into use.

- Uses JupyterHub 0.9 feature, no effect for <0.9.
- Closes: jupyterhub#81
@mbmilligan merged commit ffd3f93 into jupyterhub:master on Jul 24, 2019
mbmilligan added a commit that referenced this pull request on Jul 24, 2019
@rkdarst deleted the progress branch on September 6, 2019 08:43