
Consumer Timeout Configuration Per Queue #4099

Closed
kycrow32 opened this issue Feb 3, 2022 · 6 comments

@kycrow32

kycrow32 commented Feb 3, 2022

References: #2990, #3033

The introduction of consumer_timeout in RabbitMQ 3.8.15 was a breaking change in a patch release, requiring global configuration to re-enable functionality for consumers that were expected to execute long-running jobs.

RabbitMQ should support consumer timeout settings at the queue level.

This would allow RabbitMQ to provide the same protection against most faulty consumers while letting long-running consumers continue to function correctly.
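For illustration, a per-queue override could be exposed as an optional queue argument. The sketch below uses pika (Python); the `x-consumer-timeout` argument name, queue name, and timeout value are assumptions used only to show the shape of the request, not a feature RabbitMQ supported when this issue was opened.

```python
# Sketch only: declares a queue with a hypothetical per-queue consumer
# timeout override. "x-consumer-timeout" is the requested feature,
# not something the broker honoured at the time of this issue.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.queue_declare(
    queue="long-running-jobs",
    durable=True,
    arguments={
        # Allow consumers of this queue up to 6 hours (in ms) to ack,
        # while every other queue keeps the global default.
        "x-consumer-timeout": 6 * 60 * 60 * 1000,
    },
)
```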

@michaelklishin
Member

Protecting against "some" consumers when the risk is running out of disk space is not particularly useful.

99% of consumers are not affected by the timeout (which now defaults to 30 minutes), so the goal was to have an "upper bound", not a whole slew of features, most of which won't be used. So this won't be a priority any time soon.

@EmielBruijntjes

@michaelklishin

I disagree. The ability to have messages redelivered when a connection is lost (for example because of a crash) is why people use queueing software, as is the ability to ask for the next message when you're done with the previous one. This is exactly what a queueing platform should provide.

I understand from your arguments in some of the discussions here on GitHub that for most real-world use cases 30 minutes is a reasonable indication that a consumer is broken. But not for all production environments. This is therefore also a discussion about what the conceptually right AMQP API to offer is. There are many scenarios where 30 minutes is way too early. Think of applications that do (for example) background processing, run batch jobs or analyze data. Such jobs can take hours if not days, and they rely on the queueing platform to redeliver the message in case of a crash, and to hand out the next piece of data once the previous one has been processed. The requirement to ack a message within 30 minutes as a system-wide default is wrong.

Besides that, if there should be such a limit (and I think there should not be: a broken "user space" application that forgets to ack messages is not a RabbitMQ problem, and hence does not need to be fixed in the RabbitMQ codebase, see also: separation of concerns) - but if the RabbitMQ team insists on adding this limit, it should definitely be possible to set and override it at the application level, for example per connection, per channel or, even better, per consumer. In many organizations the people who install, run and update RabbitMQ and the teams who write the software are not always the same, and changing a system-wide config file is difficult. A solution on the AMQP protocol level therefore makes a lot of sense.
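To make the suggestion concrete: AMQP 0-9-1 already allows arbitrary consumer arguments on basic.consume, so a per-consumer override could in principle be passed there. A minimal sketch with pika, assuming a hypothetical `x-consumer-timeout` consumer argument that the broker does not honour today; the queue name, callback and 24-hour value are likewise illustrative.

```python
# Sketch only: a hypothetical per-consumer timeout override passed as a
# consumer argument on basic.consume.
import pika

def handle_job(ch, method, properties, body):
    ...  # long-running work, then ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(
    queue="batch-jobs",
    on_message_callback=handle_job,
    auto_ack=False,
    arguments={"x-consumer-timeout": 24 * 60 * 60 * 1000},  # 24 hours, in ms
)
channel.start_consuming()
```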

The 'wontfix' label and 'issue closed' response do not seem to be the right reaction. This is a very logical and very normal feature request.

@kycrow32
Author

kycrow32 commented Mar 7, 2022

@EmielBruijntjes
Thank you for that excellent write up.

@michaelklishin
Please reconsider this feature request.

@Victor-N-Suadicani

@michaelklishin

I just ran into a situation where I would like to configure this per consumer in a production environment. We have some fast RPCs but also some very long processing jobs. Having the consumer_timeout be global is not ideal in this situation. It would be really great if this could be configured for each consumer.

@lukebakken
Collaborator

lukebakken commented Aug 3, 2022

We have some fast RPCs but also some very long processing jobs.

Set the limit higher than any expected job duration.

If that isn't feasible, consider why you think you need to keep a message "checked out" for so long. This is one alternative (sketched in code after the list):

  • A worker creates its own "work in progress" queue with a message TTL and DLX settings so that if the work does not complete or the worker crashes, the message will eventually be re-queued for work
  • Your long processing job workers should use basic.get for a message
  • The worker publishes the message to a dedicated "in progress" queue, and acks the message from the job queue
  • The worker does its work. When complete, it clears the in-progress queue and fetches another job via basic.get
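A rough sketch of this pattern with pika (Python). The queue names, the 24-hour TTL, and the use of the default exchange for dead-lettering are assumptions for illustration; each worker owns its own in-progress queue, so purging it only affects that worker's parked job.

```python
# Sketch of the "work in progress" queue workaround described above.
# Names, TTL value and dead-letter routing are illustrative assumptions.
import pika

WORKER_ID = "worker-1"
WIP_QUEUE = f"jobs.in-progress.{WORKER_ID}"   # one in-progress queue per worker

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.queue_declare(queue="jobs", durable=True)
channel.queue_declare(
    queue=WIP_QUEUE,
    durable=True,
    arguments={
        "x-message-ttl": 24 * 60 * 60 * 1000,   # give up after 24 hours
        "x-dead-letter-exchange": "",            # default exchange...
        "x-dead-letter-routing-key": "jobs",     # ...routes expired work back to "jobs"
    },
)

def do_long_running_work(body: bytes) -> None:
    ...  # the actual multi-hour job goes here

while True:
    method, properties, body = channel.basic_get(queue="jobs", auto_ack=False)
    if method is None:
        break  # no work available right now

    # Park a copy in the in-progress queue, then ack the original so the
    # consumer timeout no longer applies to it.
    channel.basic_publish(exchange="", routing_key=WIP_QUEUE, body=body)
    channel.basic_ack(delivery_tag=method.delivery_tag)

    do_long_running_work(body)

    # Work finished: clear the parked copy so the TTL/DLX path never fires.
    channel.queue_purge(queue=WIP_QUEUE)
```

Note the small window between the publish and the ack: a crash there can leave two copies of the job, so the work itself should be idempotent.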

@michaelklishin
Member

If 30 minutes is not enough, set the limit higher or disable it. You make it sound like there is no solution but there is and there always has been.
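For completeness, a sketch of what raising the limit looks like: in recent releases the global timeout is set in milliseconds via `consumer_timeout` in rabbitmq.conf (deactivating it entirely requires advanced.config). The 6-hour value below is only an example.

```
# rabbitmq.conf (illustrative value): raise the delivery acknowledgement
# timeout from the 30-minute default to 6 hours
consumer_timeout = 21600000
```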

The argument that the timeout should be able to survive a connection loss is a strawman: you conveniently ignore the fact that connection loss requeues all unacknowledged deliveries and thus resets the timeout.

There is interest from some members of our team in making this a per-vhost setting.
Making this per-consumer means significant changes to acknowledgement tracking and our team has significantly more important things to worry about, like the schema database replacement that would affect every single user.

Please stop claiming that this is a critical issue without a solution. You can take on the risk of stuck consumers and set the timeout much higher or even disable it. Most users do not need delivery processing times of more than 30 minutes and are not affected at all.

I can see how having different values for virtual hosts would make sense, so that will be investigated. Reworking all acknowledgement tracking for the 1% of users who need or would use it sounds like a very suboptimal use of our small team's time.

I also wonder why no one has taken the time to sponsor or contribute this feature if this is such a big deal. My answer is: it is not big enough.

@rabbitmq locked and limited conversation to collaborators on Aug 4, 2022