Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undertow worker pool thread leak caused by ajax fallback requests #409

Closed
wants to merge 2 commits into from

Conversation

kajism
Copy link
Contributor

@kajism kajism commented Sep 19, 2022

We were experiencing Undertow worker pool starvation in production. It was caused by ajax requests waiting on never timeouting core.async/<!! channel read. It works better after replacing core.async channel by promise with deref timeout.

Replaces never timeouting core.async channel read by promise with timeout.
@ptaoussanis ptaoussanis self-requested a review September 19, 2022 10:14
@ptaoussanis ptaoussanis self-assigned this Sep 19, 2022
@ptaoussanis
Copy link
Member

@kajism Hi Karel, thanks for pinging about this- and for the PR. Will add some specific comments shortly.

Copy link
Member

@ptaoussanis ptaoussanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments, thanks!

@@ -1,8 +1,8 @@
(ns taoensso.sente.server-adapters.undertow
"Sente server adapter for ring-undertow-adapter."
"Sente server adapter for ring-undertow-adapter.
Modified to avoid core.async and use promise directly and with read timeout"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the docstring change here.
The mention of core.async seems like an implementation detail, and the read timeout we can better document elsewhere (e.g. in the constructor).

@@ -13,6 +13,8 @@
WebSocketConnectionCallback
WebSocketProtocolHandshakeHandler]))

(def ajax-response-timeout-ms (* 60 1000))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest instead adding this as an option to get-sch-adapter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly: please add a docstring to get-sch-adapter to document the 2x new options:

  • :ajax-response-timeout-ms (and mention its default)
  • :ajax-response-timeout-val (and mention its default)

get-sch-adapter should accept zero args (use default opts), or a single map arg of opts.

Ideally the PR would have 2x commits:

  • 1st commit to introduce the new options, but keep the old default behaviour (non-breaking).
  • 2nd commit to change the default, which we can mark as potentially breaking.

(when on-close (on-close ch false nil))
(reset! open?_ false)
(async/close! ch))
(when @open?_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use (when (compare-and-set! open?_ false) ...) here instead

@@ -40,24 +42,28 @@
(read! [this])
(close! [this]))

(deftype SenteUndertowAjaxChannel [ch open?_ on-close]
(deftype SenteUndertowAjaxChannel [promised-response open?_ on-close]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match the surrounding code style, please suffix promised-response with an _ to indicate a dereffable.

(read! [this] (async/<!! ch))
(send! [this msg] (deliver promised-response msg))
(read! [this]
(let [resp (deref promised-response ajax-response-timeout-ms :lp-timed-out)]
Copy link
Member

@ptaoussanis ptaoussanis Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two changes requested here:

  1. For compatibility with the old behaviour, please make the timeout optional.
  2. Suggest the timeout value be configurable via get-sch-adapter option. Maybe a default value of :undertow/ajax-response-timeout? (ajax-response-timeout to better match the :ajax-response-timeout-ms option name, and the undertow prefix to help clarify that this is an adapter-specific keyword).

@kajism
Copy link
Contributor Author

kajism commented Sep 20, 2022

@ptaoussanis Hello Peter, I have commited the backward compatible suggested changes and will verify it in production.

@ptaoussanis
Copy link
Member

ptaoussanis commented Sep 20, 2022

@kajism Thanks! Content of your PR has been merged on 2022-09-20-pr-409 branch and pushed to [com.taoensso/sente "1.18.0-SNAPSHOT"] on Clojars.

I don't use Undertow, so it'd be helpful to get your confirmation that this version:

  1. Solves the problem
  2. Still works correctly otherwise

Once I've got your confirmation, I'll merge with master.

Cheers

@ptaoussanis ptaoussanis added this to the v1.18 milestone Sep 20, 2022
@kajism
Copy link
Contributor Author

kajism commented Sep 21, 2022

@ptaoussanis Peter, there seems to be a problem on line 44. I don't see a 3 arity version of clojure.core/deliver ...

@kajism
Copy link
Contributor Author

kajism commented Sep 21, 2022

@ptaoussanis We have moved to Undertow from http-kit after experiencing huge memory leaks in allocated direct memory buffers. After a few days direct memory went up to the max heap size (4GB) and then some short videos on our page stopped playing. The problem is that direct memory is not allocated on the heap, so there was 4GB of heap + 4GB in those buffers.

At first, it was hard to find the the problem, because almost everybody talks only about the heap when speaking about JVM memory. But in the system the number of memory was almost 2 times higher. Also jconsole don't show this memory. It can be found using JMX beans (java.nio:name=direct,type=BufferPool MemoryUsed).

We are sending quite huge ws messages (sometimes more than 1MB), so I was suspecting this may be the reason. After move to Undertow this problem disappeared. In our case, Undertow allocates only up to 90MB in direct memory buffers. But then we experienced the worker thread starvation...

I have investigated this further and created this http-kit issue: http-kit/http-kit#496

@ptaoussanis
Copy link
Member

Peter, there seems to be a problem on line 44. I don't see a 3 arity version of clojure.core/deliver ...

@kajism Apologies, fixed. And updated SNAPSHOT.

@kajism
Copy link
Contributor Author

kajism commented Sep 21, 2022

Peter, there seems to be a problem on line 44. I don't see a 3 arity version of clojure.core/deliver ...

@kajism Apologies, fixed. And updated SNAPSHOT.

Thanks. Locally works fine. Will be deployed in a few days and will confirm then.

@kajism
Copy link
Contributor Author

kajism commented Sep 29, 2022

Works fine. Thanks!

@ptaoussanis
Copy link
Member

Great, appreciate the confirmation 👍 Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants