Consumer.consume() stops returning messages #970
Comments
// The consumer (or a few in the group) will loop here without fetchers this way until one of them is killed to trigger a rebalance.
After some further digging into librdkafka's open issues, this appears to be exactly the problem observed in confluentinc/librdkafka#2933. I see it is part of milestone 1.5.2. Do you have an expected release date for the fix, and for when it will be bundled into confluent-kafka-python?
Sorry for missing this issue, and thanks for the great report! The v1.6.0 release is due this week.
Description
Observed: a consumer (sometimes two) within a group of 50 consumers stops receiving messages from its assigned partitions on consume().
Temporary fix: terminating a consumer (which restarts, triggering a rebalance) reassigns the partitions to another consumer, and the partition lag resolves because the new consumer has no trouble receiving messages from the newly assigned partitions. The time between occurrences is not deterministic: it happens most often on an event that rolls all consumers, and occasionally on a random rebalance.
How to reproduce
I was initially unsure how to reproduce this and have been working to find the concrete case that triggers the behavior. It seemed to occur when multiple rebalances were triggered in quick succession, and occasionally on a single rebalance. Upon further investigation, it can be reproduced easily by triggering a rebalance while a consumer is processing a record. The record is processed, and the consumer then attempts to commit the message for a partition it is no longer assigned. This prevents the fetchers from being activated for the newly assigned partitions, creating a nil loop in which partition lag builds even though the partitions have an assigned consumer.
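The sequence above suggests a workaround: track the current assignment via rebalance callbacks and drop commits for partitions that were revoked while a record was in flight. Below is a minimal sketch of that guard; all names are illustrative, not from the original report. Plain (topic, partition) tuples are used so the logic is self-contained, whereas with confluent_kafka the on_assign/on_revoke callbacks receive TopicPartition objects and would be passed to Consumer.subscribe().

```python
# Hypothetical guard against committing offsets for revoked partitions,
# the suspected trigger of the stalled fetchers described above.

assigned = set()  # (topic, partition) pairs currently owned by this consumer

def on_assign(consumer, partitions):
    # Rebalance callback: record the partitions handed to this consumer.
    for p in partitions:
        assigned.add((p.topic, p.partition))

def on_revoke(consumer, partitions):
    # Rebalance callback: forget partitions taken away from this consumer.
    for p in partitions:
        assigned.discard((p.topic, p.partition))

def filter_committable(offsets):
    """Drop (topic, partition, offset) entries this consumer no longer owns."""
    return [(t, p, o) for (t, p, o) in offsets if (t, p) in assigned]

# Example: partition ("orders", 1) is revoked while its record is in flight.
assigned.update({("orders", 0), ("orders", 1)})
assigned.discard(("orders", 1))
safe = filter_committable([("orders", 0, 42), ("orders", 1, 7)])
# safe == [("orders", 0, 42)]: only the still-assigned partition is committed
```

Committing only `safe` avoids the commit-to-unassigned-partition step that the reproduction above identifies as the trigger.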
Checklist
Please provide the following information:
- confluent-kafka-python version (confluent_kafka.version()): 1.5.0
- librdkafka version (confluent_kafka.libversion()): 1.5.0
- Apache Kafka broker version: 2.2.1
- Operating system: Linux, running Python 3.6.2-stretch
Consumer config:

```python
{
    'bootstrap.servers': 'my_boostrap_servers',
    'group.id': 'my_unique_group',
    'auto.offset.reset': 'earliest',
    'max.poll.interval.ms': 300000,
    'enable.auto.commit': False,
    'client.id': str(uuid.uuid4()),
    'max.partition.fetch.bytes': 104857600,
    'debug': 'cgrp'
}
```
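Since 'enable.auto.commit' is False in this configuration, each message must be committed manually after processing. A rough sketch of that loop follows; StubConsumer is a hypothetical in-memory stand-in (not part of confluent_kafka) that mimics only the poll()/commit() calls used, so the control flow runs without a broker. A real confluent_kafka.Consumer exposes the same poll(timeout=...) and commit(message=..., asynchronous=...) surface.

```python
class StubConsumer:
    """In-memory stand-in mimicking the poll()/commit() surface used below."""
    def __init__(self, records):
        self._records = list(records)
        self.committed = []

    def poll(self, timeout=1.0):
        # Return the next record, or None when nothing is available.
        return self._records.pop(0) if self._records else None

    def commit(self, message=None, asynchronous=False):
        # Record what was committed so the loop's behavior is observable.
        self.committed.append(message)

def run_loop(consumer, max_iterations):
    """Manual-commit consume loop: process a message, then commit it."""
    processed = []
    for _ in range(max_iterations):
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        processed.append(msg)  # application-level processing goes here
        consumer.commit(message=msg, asynchronous=False)  # manual commit
    return processed

consumer = StubConsumer(["a", "b"])
out = run_loop(consumer, 3)
# out == ["a", "b"]; consumer.committed == ["a", "b"]
```

Committing synchronously after each message, as here, is the pattern under which the rebalance-during-processing reproduction above occurs.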
CGRP logs follow in the next message.