workers/gthread: Remove locks + one event queue + general cleanup #3157

Open: sylt wants to merge 1 commit into master

@sylt commented Feb 17, 2024

The main purpose is to remove complexity from gthread by:

  • Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread.

  • Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes Gunicorn gthread deadlock #3146 (although there are more minimal ways of doing it).
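
To illustrate the first bullet, here is a minimal sketch of how a worker thread might hand a callback to the main thread through the selector. It is not the PR's PollableMethodCaller: the class name PollableCallQueue, its methods, and the use of socket.socketpair() as the wake-up channel are assumptions made for illustration.

```python
import queue
import selectors
import socket
import threading


class PollableCallQueue:
    """Hypothetical stand-in for the PR's PollableMethodCaller."""

    def __init__(self):
        # A socket pair gives the selector something readable to wake up on.
        self._reader, self._writer = socket.socketpair()
        self._reader.setblocking(False)
        self._calls = queue.SimpleQueue()

    def fileno(self):
        # Lets the object itself be registered with a selector.
        return self._reader.fileno()

    def post(self, func, *args):
        """Called from worker threads: queue a call, then wake the main loop."""
        self._calls.put((func, args))
        self._writer.send(b"\0")

    def run_pending(self):
        """Called on the main thread once the selector reports readability."""
        try:
            self._reader.recv(4096)  # drain the wake-up bytes
        except BlockingIOError:
            pass
        while True:
            try:
                func, args = self._calls.get_nowait()
            except queue.Empty:
                break
            func(*args)

    def close(self):
        self._reader.close()
        self._writer.close()


def demo():
    """Post one callback from a worker thread and run it on the calling thread."""
    selector = selectors.DefaultSelector()
    calls = PollableCallQueue()
    selector.register(calls, selectors.EVENT_READ, calls.run_pending)
    ran_on = []

    worker = threading.Thread(
        target=calls.post,
        args=(lambda: ran_on.append(threading.get_ident()),),
    )
    worker.start()

    while not ran_on:
        for key, _events in selector.select(timeout=1.0):
            key.data()  # invokes calls.run_pending on this thread

    worker.join()
    selector.unregister(calls)
    selector.close()
    calls.close()
    return ran_on
```

Because only the thread running the event loop executes the posted callbacks, state such as the keep-alive list and the poller can be modified there without a lock.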

There are other more minor things as well:

  • Renaming some variables, e.g. self._keep to self.keepalived_conns.
  • Removing self-explanatory comments (those that say what the code does, not why).
  • No longer toggling the connection socket between blocking and non-blocking mode.

Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.

@sylt force-pushed the gthread-cleanup branch 2 times, most recently from 17e7abf to 04243f3 (February 18, 2024 17:24)
@benoitc self-assigned this Mar 26, 2024

@javiertejero left a comment

I tried ab -n1000 -c100 http://0.0.0.0:8080/ with my standalone script (as explained here #3147) and it does not work well, so rejecting from my side

@sylt (Author) commented Jul 2, 2024

If it is so, then that doesn't sound good @javiertejero -- I would have thought the issue to be fixed! I can't reproduce any hanging behavior with while ab -n1000 -c100 http://0.0.0.0:8080/; do :; done running for a few minutes with the suggested pull request. However, on master I have no problem getting the error.

I'm testing the standalone app using the options:

    options = {
        'bind': '%s:%s' % ('127.0.0.1', '8080'),
        'workers': 1,
        'worker_class': 'gthread',
        'threads': 3,
        'worker_connections': 4,
    }

Note that I tested this on Linux. Are you running macOS?

@javiertejero commented Jul 2, 2024

@sylt oh yes, apologies, I was testing with MacOS

when trying on linux it works just fine as you mentioned with a huge concurrency :)

however in MacOS I see this with "just" 100 concurrent clients

 ab -n100 -c100 http://0.0.0.0:8080/
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 0.0.0.0 (be patient)...apr_socket_recv: Connection reset by peer (54)
Total of 1 requests completed

on the server I see this:

[2024-07-02 17:09:20 +0200] [45416] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/.../gunicorn/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/.../gunicorn/gunicorn/workers/gthread.py", line 115, in init_process
    super().init_process()
  File "/.../gunicorn/gunicorn/workers/base.py", line 142, in init_process
    self.run()
  File "/.../gunicorn/gunicorn/workers/gthread.py", line 205, in run
    self.set_accept_enabled(new_connections_still_accepted)
  File "/.../gunicorn/gunicorn/workers/gthread.py", line 133, in set_accept_enabled
    method(sock, event, self.accept)
  File ".../.pyenv/versions/3.11.7/lib/python3.11/selectors.py", line 261, in modify
    key = self.register(fileobj, events, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.pyenv/versions/3.11.7/lib/python3.11/selectors.py", line 518, in register
    key = super().register(fileobj, events, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.pyenv/versions/3.11.7/lib/python3.11/selectors.py", line 236, in register
    raise ValueError("Invalid events: {!r}".format(events))
ValueError: Invalid events: 0

UPDATE: with less concurrency like 10 it works fine

let me know if I can help further with this

@sylt (Author) commented Jul 3, 2024

Thanks for confirming @javiertejero! Based on your stack trace (thank you very much!) I think I have corrected the error. If you are able to test the latest patch set, I would be more than grateful!

The error was in my usage of DefaultSelector().register, which behaves differently depending on whether one is running Linux (which uses EpollSelector) or macOS (which uses KqueueSelector). The former accepts an event mask of 0 (meaning no events), while the latter doesn't.

The fix is to try and not be smart, but simply use DefaultSelector().unregister when we're not interested in accepting new connections any more :) This way, we won't rely on any implementation specific behavior. Perhaps even the code got a bit clearer...
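
A minimal sketch of that approach, assuming CPython's selectors module. The helper names are hypothetical and not taken from the patch. One detail worth noting: register() itself rejects an empty event mask on the built-in selectors, and going by the traceback above, the platform difference surfaces through modify(), which on KqueueSelector falls back to register().

```python
import selectors
import socket


def rejects_empty_event_mask(selector, sock):
    """Return True if register() refuses an event mask of 0."""
    try:
        selector.register(sock, 0)
    except ValueError:
        return True
    # Only reached if the selector accepted the empty mask.
    selector.unregister(sock)
    return False


def set_accept_enabled(selector, listener, on_accept, enabled):
    """Start or stop watching the listening socket for new connections.

    Instead of switching the event mask to 0, the listener is removed
    from the selector entirely, which avoids relying on how a particular
    selector implementation treats an empty mask.
    """
    if enabled:
        selector.register(listener, selectors.EVENT_READ, on_accept)
    else:
        selector.unregister(listener)
```

This version assumes the caller never asks for the state the listener is already in; calling it twice with the same value raises KeyError from the selector.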

@pajod (Contributor) commented Jul 3, 2024

@sylt I don't think we are allowed to call DefaultSelector.unregister twice in a row (1. bottom of loop 2. immediately after)

@sylt (Author) commented Jul 3, 2024

@pajod Ah, yes, you're right! We could get in that situation if the server is terminated right at the time when it's very busy. I've tried to address it in the latest patch. Thanks!
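
One way to avoid the double call is to remember whether the listener is currently registered. A minimal sketch, assuming CPython's selectors module, where unregister() raises KeyError for a file object that is not registered. The AcceptToggle class is hypothetical and is not the code from the patch.

```python
import selectors
import socket


class AcceptToggle:
    """Hypothetical helper that tracks whether the listener is registered."""

    def __init__(self, selector, listener, on_accept):
        self.selector = selector
        self.listener = listener
        self.on_accept = on_accept
        self.enabled = False

    def set_enabled(self, enabled):
        if enabled == self.enabled:
            return  # already in the requested state, nothing to do
        if enabled:
            self.selector.register(
                self.listener, selectors.EVENT_READ, self.on_accept
            )
        else:
            self.selector.unregister(self.listener)
        self.enabled = enabled
```

With this guard, disabling accept at the bottom of the loop and again during shutdown is harmless.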

@pajod (Contributor) commented Aug 14, 2024

Sorry, does not merge cleanly because of my #3189 - fix is to apply those changes like this:

diff --git a/gunicorn/workers/gthread.py b/gunicorn/workers/gthread.py
index 49946d77..196759b8 100644
--- a/gunicorn/workers/gthread.py
+++ b/gunicorn/workers/gthread.py
@@ -1 +0,0 @@
-# -*- coding: utf-8 -
@@ -34 +33 @@ from ..http import wsgi
-class TConn(object):
+class TConn:
@@ -145 +144 @@ class ThreadWorker(base.Worker):
-        except EnvironmentError as e:
+        except OSError as e:
@@ -264 +263 @@ class ThreadWorker(base.Worker):
-        except EnvironmentError as e:
+        except OSError as e:
@@ -318 +317 @@ class ThreadWorker(base.Worker):
-        except EnvironmentError:
+        except OSError:
@@ -329 +328 @@ class ThreadWorker(base.Worker):
-                except EnvironmentError:
+                except OSError:

The main purpose is to remove complexity from gthread by:

* Removing the lock for handling self._keep and self.poller. This is
  possible since we now do all such manipulation on the main thread
  instead. When a connection is done, it posts a callback through the
  PollableMethodCaller which gets executed on the main thread.

* Having a single event queue (self.poller), as opposed to also
  managing a set of futures. This fixes benoitc#3146 (although there are
  more minimal ways of doing it).

There are other more minor things as well:

* Renaming some variables, e.g. self._keep to self.keepalived_conns.
* Remove self-explanatory comments (what the code does, not why).
* Just decide that socket is blocking.
* Use time.monotonic() for timeouts in gthread.

Some complexity has been added to the shutdown sequence, but hopefully
for good reason: it's to make sure that all already accepted
connections are served within the grace period.
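
A sketch of how a grace-period deadline could be tracked with time.monotonic(), which is not affected by system clock updates. The Deadline class and its injectable clock parameter are hypothetical and are not names from the patch.

```python
import time


class Deadline:
    """Hypothetical helper: a point in time on the monotonic clock."""

    def __init__(self, seconds, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + seconds

    def remaining(self):
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        return self._clock() >= self._expires_at
```

During shutdown, remaining() could be passed as the timeout to the selector's select() call, so that the loop keeps serving already accepted connections until the grace period runs out.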

@javiertejero commented

> I think I have corrected the error

Sorry for the delay. I just tested on macOS again and it works now. Thanks, @sylt!

Successfully merging this pull request may close these issues.

Gunicorn gthread deadlock
5 participants