We need this feature to scale. Here's a valid case: Event Hubs limits the number of concurrent receivers per partition per consumer group. With more than 5 concurrent jobs in a Spark app, we get the exception: "Exceeded the maximum number of allowed receivers per partition in a consumer group which is 5."
Since we can't add more concurrent jobs, the way to go faster is to create more consumer groups and run multiple Spark apps, each reading from a different consumer group on specific partition(s). For example, an Event Hub with 32 partitions and 1 default consumer group, read by 5 concurrent Spark jobs, would be much slower than 16 consumer groups, each assigned 2 partitions and read by its own Spark app. That gives 16 Spark apps with 5 concurrent jobs each, 80 concurrent jobs in total. 16x faster!
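To make the proposal concrete, here is a minimal sketch of the partition-to-consumer-group plan described above. The helper name `plan_consumer_groups` and the `cg-N` group names are hypothetical, just to illustrate the layout; the 5-receiver cap is the documented Event Hubs limit per partition per consumer group.

```python
MAX_RECEIVERS_PER_PARTITION_PER_GROUP = 5  # Event Hubs service limit

def plan_consumer_groups(num_partitions, num_groups, jobs_per_app):
    """Assign partitions evenly to consumer groups, one Spark app per group.

    Returns the partition assignment per (hypothetical) consumer group name
    and the total number of concurrent jobs across all apps.
    """
    assert num_partitions % num_groups == 0, "partitions must divide evenly"
    assert jobs_per_app <= MAX_RECEIVERS_PER_PARTITION_PER_GROUP, \
        "would exceed the per-partition receiver limit"
    per_group = num_partitions // num_groups
    plan = {
        f"cg-{g}": list(range(g * per_group, (g + 1) * per_group))
        for g in range(num_groups)
    }
    return plan, num_groups * jobs_per_app

# 32 partitions split across 16 consumer groups, 5 concurrent jobs per app:
plan, total_jobs = plan_consumer_groups(32, 16, 5)
print(total_jobs)  # 80 concurrent jobs vs. 5 with a single consumer group
```

Each entry in `plan` would map to one Spark app configured with that consumer group, so the per-group receiver limit is never exceeded while aggregate concurrency scales with the number of groups.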
follow up on #220