
Consumer Timeout Configuration Per Queue #4099

Closed
kycrow32 opened this issue Feb 3, 2022 · 6 comments

@kycrow32

kycrow32 commented Feb 3, 2022

References: #2990, #3033

The introduction of consumer_timeout in RabbitMQ 3.8.15 was a breaking change in a patch release, requiring global configuration to re-enable functionality for consumers that were expected to execute long-running jobs.

RabbitMQ should support consumer timeout settings at the queue level.

This would allow RabbitMQ to provide the same protection against most faulty consumers while letting long-running consumers continue to function correctly.
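For illustration, a per-queue override could be exposed as an optional queue argument. The sketch below uses pika (Python); the `x-consumer-timeout` argument name, queue name, and timeout value are assumptions used only to show the shape of the request, not a feature RabbitMQ supported when this issue was opened.

```python
# Sketch only: declares a queue with a hypothetical per-queue consumer
# timeout override. "x-consumer-timeout" is the requested feature,
# not something the broker honoured at the time of this issue.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.queue_declare(
    queue="long-running-jobs",
    durable=True,
    arguments={
        # Allow consumers of this queue up to 6 hours (in ms) to ack,
        # while every other queue keeps the global default.
        "x-consumer-timeout": 6 * 60 * 60 * 1000,
    },
)
```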

@michaelklishin
Member

Protecting against "some" consumers when the risk is running out of disk space is not particularly useful.

99% of consumers are not affected by the timeout (which now defaults to 30 minutes), so the goal was to have an "upper bound", not a whole slew of features, most of which won't be used. So this won't be a priority any time soon.

@EmielBruijntjes

@michaelklishin

I disagree. The ability to have messages redelivered when a connection is lost (for example because of a crash) is why people use queueing software, as is the ability to ask for the next message when you're done with the previous one. This is exactly what a queueing platform should provide.

I understand from your arguments in some of the discussions here on GitHub that for most real-world use cases 30 minutes is a reasonable indication that a consumer is broken. But not for all production environments. This is therefore also a discussion about what the conceptually right AMQP API to offer is. There are many scenarios where 30 minutes is way too early. Think of applications that do (for example) background processing, run batch jobs or analyze data. Such jobs can take hours if not days, and they rely on the queueing platform to redeliver the message in case of a crash, and to hand out the next piece of data once the previous one has been processed. The requirement to ack a message within 30 minutes as a system-wide default is wrong.

Besides that, if there should be such a limit (and I think there should not be: a broken "user space" application that forgets to ack messages is not a RabbitMQ problem, and hence does not need to be fixed in the RabbitMQ codebase, see also: separation of concerns) - but if the RabbitMQ team insists on adding this limit, it should definitely be possible to set and override it at the application level, for example per connection, per channel or, even better, per consumer. In many organizations the people who install, run and update RabbitMQ and the teams who write the software are not always the same, and changing a system-wide config file is difficult. A solution on the AMQP protocol level therefore makes a lot of sense.
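To make the suggestion concrete: AMQP 0-9-1 already allows arbitrary consumer arguments on basic.consume, so a per-consumer override could in principle be passed there. A minimal sketch with pika, assuming a hypothetical `x-consumer-timeout` consumer argument that the broker does not honour today; the queue name, callback and 24-hour value are likewise illustrative.

```python
# Sketch only: a hypothetical per-consumer timeout override passed as a
# consumer argument on basic.consume.
import pika

def handle_job(ch, method, properties, body):
    ...  # long-running work, then ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(
    queue="batch-jobs",
    on_message_callback=handle_job,
    auto_ack=False,
    arguments={"x-consumer-timeout": 24 * 60 * 60 * 1000},  # 24 hours, in ms
)
channel.start_consuming()
```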

The 'wontfix' label and 'issue closed' response do not seem to be the right reaction. This is a very logical and very normal feature request.

@kycrow32
Author

kycrow32 commented Mar 7, 2022

@EmielBruijntjes
Thank you for that excellent write up.

@michaelklishin
Please reconsider this feature request.

@Victor-N-Suadicani

@michaelklishin

I just ran into a situation where I would like to configure this per consumer in a production environment. We have some fast RPCs but also some very long processing jobs. Having the consumer_timeout be global is not ideal in this situation. It would be really great if this could be configured for each consumer.

@lukebakken
Collaborator

lukebakken commented Aug 3, 2022

We have some fast RPCs but also some very long processing jobs.

Set the limit higher than any expected job duration.

If that isn't feasible, consider why you think you need to keep a message "checked out" for so long. This is one alternative (sketched in code after the list):

  • A worker creates its own "work in progress" queue with a message TTL and DLX settings so that if the work does not complete or the worker crashes, the message will eventually be re-queued for work
  • Your long processing job workers should use basic.get for a message
  • The worker publishes the message to a dedicated "in progress" queue, and acks the message from the job queue
  • The worker does its work. When complete, it clears the in-progress queue and fetches another job via basic.get
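A rough sketch of this pattern with pika (Python). The queue names, the 24-hour TTL, and the use of the default exchange for dead-lettering are assumptions for illustration; each worker owns its own in-progress queue, so purging it only affects that worker's parked job.

```python
# Sketch of the "work in progress" queue workaround described above.
# Names, TTL value and dead-letter routing are illustrative assumptions.
import pika

WORKER_ID = "worker-1"
WIP_QUEUE = f"jobs.in-progress.{WORKER_ID}"   # one in-progress queue per worker

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.queue_declare(queue="jobs", durable=True)
channel.queue_declare(
    queue=WIP_QUEUE,
    durable=True,
    arguments={
        "x-message-ttl": 24 * 60 * 60 * 1000,   # give up after 24 hours
        "x-dead-letter-exchange": "",            # default exchange...
        "x-dead-letter-routing-key": "jobs",     # ...routes expired work back to "jobs"
    },
)

def do_long_running_work(body: bytes) -> None:
    ...  # the actual multi-hour job goes here

while True:
    method, properties, body = channel.basic_get(queue="jobs", auto_ack=False)
    if method is None:
        break  # no work available right now

    # Park a copy in the in-progress queue, then ack the original so the
    # consumer timeout no longer applies to it.
    channel.basic_publish(exchange="", routing_key=WIP_QUEUE, body=body)
    channel.basic_ack(delivery_tag=method.delivery_tag)

    do_long_running_work(body)

    # Work finished: clear the parked copy so the TTL/DLX path never fires.
    channel.queue_purge(queue=WIP_QUEUE)
```

Note the small window between the publish and the ack: a crash there can leave two copies of the job, so the work itself should be idempotent.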

@michaelklishin
Member

If 30 minutes is not enough, set the limit higher or disable it. You make it sound like there is no solution but there is and there always has been.
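For completeness, a sketch of what raising the limit looks like: in recent releases the global timeout is set in milliseconds via `consumer_timeout` in rabbitmq.conf (deactivating it entirely requires advanced.config). The 6-hour value below is only an example.

```
# rabbitmq.conf (illustrative value): raise the delivery acknowledgement
# timeout from the 30-minute default to 6 hours
consumer_timeout = 21600000
```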

The argument that the timeout should be able to survive a connection loss is a strawman: you conveniently ignore the fact that connection loss requeues all unacknowledged deliveries and thus resets the timeout.

There is interest from some members of our team in making this a per-vhost setting.
Making this per-consumer means significant changes to acknowledgement tracking and our team has significantly more important things to worry about, like the schema database replacement that would affect every single user.

Please stop claiming that this is a critical issue without a solution. You can take on the risk of stuck consumers and set the timeout much higher or even disable it. Most users do not need delivery processing times of more than 30 minutes and are not affected at all.

I can see how having different values for virtual hosts would make sense, so that will be investigated. Reworking all acknowledgement tracking for the 1% of users who need or would use it sounds like a very suboptimal use of our small team's time.

I also wonder why no one has taken the time to sponsor or contribute this feature if this is such a big deal. My answer is: it is not big enough.

@rabbitmq locked and limited conversation to collaborators on Aug 4, 2022