Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus metrics blocks tornado main thread #123

Closed
dleen opened this issue Apr 6, 2022 · 4 comments
Closed

Prometheus metrics blocks tornado main thread #123

dleen opened this issue Apr 6, 2022 · 4 comments
Labels

Comments

@dleen
Copy link
Contributor

dleen commented Apr 6, 2022

Description

A bug was reported in jtpio/jupyterlab-system-monitor#87 about the UI lagging with several kernels running. The issue was traced to the system monitor extension as disabling that extension while keeping the same load on the system made the UI issue go away.

Reproduce

Create multiple notebooks with contents:

import time

i = 0
while True:
    print(f"i={i}")
    i += 1
    time.sleep(1)
Run 4+ kernels all executing this cell.

Open a terminal and (hopefully your key repeat speed is high enough) hold down a character e.g. "x" to get continuous input into the terminal. This should be very smooth, you should see characters appearing rapidly and without pause.

Now relaunch the server with --ResourceUseDisplay.track_cpu_percent=True.

Repeat the process. While holding down a key in the terminal you will notice frequent lags and pauses.

Expected behavior

The UI does not lag with the extension enabled.

Problem

The API handler does the right thing by running the call to psutil on a separate thread: https://github.com/jupyter-server/jupyter-resource-usage/blob/master/jupyter_resource_usage/api.py#L66

However the prometheus metrics uses a different implementation (why?) and does the same expensive operation on the main tornado thread which blocks other calls: https://github.com/jupyter-server/jupyter-resource-usage/blob/master/jupyter_resource_usage/metrics.py#L40

You can prove this is the root cause by simply disabling this and the following lines: https://github.com/jupyter-server/jupyter-resource-usage/blob/master/jupyter_resource_usage/server_extension.py#L22

When this callback is removed the UI no longer lags every second.

@dleen dleen added the bug label Apr 6, 2022
@welcome
Copy link

welcome bot commented Apr 6, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

dleen added a commit to dleen/jupyter-resource-usage that referenced this issue Apr 6, 2022
@dleen
Copy link
Contributor Author

dleen commented Apr 6, 2022

Posted PR #124 which obviously doesn't solve the underlying issue but at least lets users who aren't using prometheus to avoid the issue.

A good solution is to also run psutil on a separate thread for Prometheus.

The best solution is two merge the code for getting metrics. For example you could have the API handler use the most recent entry in the prometheus list of metrics

@dleen dleen changed the title Prometheus metrics block tornado main thread Prometheus metrics blocks tornado main thread Apr 7, 2022
@possiblyMikeB
Copy link

possiblyMikeB commented Aug 14, 2022

This turned out to be an issue for my deployments. To remedy the problem we ended up completely removing the Prometheus callback by commenting out the appropriate section in server_extension.py

The periodic Prometheus handler caused the introduction of "skipping" & "lag" while using jupyter-server-proxy to connect to a VNC server via a proxied websocket; making it completely unusable.

@daschnerm
Copy link

We've encountered this issue as well. Maybe we can get @dleen's PR merged for now, until there is a better solution.

dleen added a commit to dleen/jupyter-resource-usage that referenced this issue Aug 22, 2022
dleen added a commit to dleen/jupyter-resource-usage that referenced this issue Aug 22, 2022
dleen added a commit to dleen/jupyter-resource-usage that referenced this issue Aug 22, 2022
jtpio pushed a commit to dleen/jupyter-resource-usage that referenced this issue Aug 23, 2022
kevin-bates added a commit that referenced this issue Aug 23, 2022
* Allow users to opt out of prometheus metrics

Poor mans fix for #123

* Update README.md

Co-authored-by: Kevin Bates <kbates4@gmail.com>

* Lint README.md

Co-authored-by: Kevin Bates <kbates4@gmail.com>
Co-authored-by: Jeremy Tuloup <jeremy.tuloup@gmail.com>
@dleen dleen closed this as completed Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants