How to generate alert on error in event hub connector #503

Closed
guruvonline opened this issue Apr 16, 2020 · 3 comments

Comments

@guruvonline

Hi,
We are using the Event Hubs connector to read events from Event Hubs (EH) in Azure Databricks. Occasionally we see a ReceivedDisconnectedException that is not transient: our jobs cannot read any data after the exception. We have to restart the Spark job before it starts reading and processing data again.

We have configured email alerts on job failure. Since multiple streaming queries are running in our job, a failure in the EH connector doesn't fail the job, so we don't get an alert.

Is there a recommended way to get an alert when there is an exception in the connector?

@nyaghma
Contributor

nyaghma commented Aug 7, 2020

@guruvonline Are these multiple streaming queries using the same consumer group? If so, that is why your job gets ReceivedDisconnectedException: we expect a unique consumer group for each reader, as discussed on the FAQ page.
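
The one-consumer-group-per-reader advice can be sketched as follows, assuming PySpark and the connector's dictionary-style options. The helper `build_reader_confs`, the query names, and the consumer group names are hypothetical. The `eventhubs.connectionString` and `eventhubs.consumerGroup` keys are assumed to be the option names for your connector version, so check them against its documentation.

```python
def build_reader_confs(connection_string, consumer_groups):
    """Build one options dict per reader, each with its own consumer group.

    consumer_groups maps a (hypothetical) query name to the consumer group
    that query should read with.
    """
    seen = {}   # consumer group -> query name that already claimed it
    confs = {}
    for query_name, group in consumer_groups.items():
        if group in seen:
            raise ValueError(
                f"consumer group {group!r} is shared by "
                f"{seen[group]!r} and {query_name!r}"
            )
        seen[group] = query_name
        confs[query_name] = {
            # Assumed option keys; some connector versions also expect the
            # connection string to be encrypted before it is passed in.
            "eventhubs.connectionString": connection_string,
            "eventhubs.consumerGroup": group,
        }
    return confs


confs = build_reader_confs(
    "<connection-string>",
    {"ingest": "cg-ingest", "audit": "cg-audit"},
)
```

Each dict would then be handed to its own reader, for example `spark.readStream.format("eventhubs").options(**confs["ingest"]).load()`. Failing fast on a duplicate group is a design choice here: it surfaces the misconfiguration at startup instead of as receiver disconnects at runtime.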

@guruvonline
Author

No, the streaming queries are not reading from the same Event Hub. The queries form a processing pipeline: the first reads from EH, processes the events, and writes to a sink (a Delta Lake table); the next query/stage reads from the first sink and writes to its own sink, and so on.

@nyaghma
Contributor

nyaghma commented Jun 24, 2021

The ReceivedDisconnectedException is expected when a new epoch receiver instance is created for the same partition-consumerGroup combination. This can happen in several situations; one of them is when the job has a low locality level and tasks for the same partition in different batches are dispatched to different executor nodes.
The exception only indicates that a new receiver has been created for a partition-consumerGroup combination, and the job should continue reading events afterwards as long as only one task at a time is trying to read from that combination. However, when multiple tasks compete to create receivers, the job may stop making progress on reading events. You can find more information in this document.
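
On the original question of alerting when one of several streaming queries fails, one possible approach (a sketch, not something recommended in this thread) is to poll the queries from the driver and raise as soon as any of them reports an exception, so that the job itself fails and the existing job-failure email fires. The functions `find_failed_queries` and `watch_queries` are hypothetical names. The code only assumes that each query object exposes `name` and `exception()`, as PySpark's `StreamingQuery` does.

```python
import time


def find_failed_queries(queries):
    """Return (name, exception) pairs for queries that stopped with an error."""
    failed = []
    for query in queries:
        error = query.exception()  # None unless the query terminated with an error
        if error is not None:
            failed.append((query.name, error))
    return failed


def watch_queries(queries, poll_seconds=30, max_polls=None):
    """Poll the queries and raise as soon as one of them has failed.

    Raising from the driver is meant to fail the surrounding job, which in
    turn triggers the job-failure alert. max_polls exists only so the loop
    can be bounded, for example in tests.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        failed = find_failed_queries(queries)
        if failed:
            details = "; ".join(f"{name}: {error}" for name, error in failed)
            raise RuntimeError(f"streaming query failed: {details}")
        polls += 1
        time.sleep(poll_seconds)
```

In a job, `watch_queries` would be called with the handles returned by each `writeStream...start()` call, or with `spark.streams.active`. This only helps when the error actually terminates a query. A query that stalls without failing, as described above for competing receivers, would need a separate check on its progress, for example on `query.lastProgress`.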

@nyaghma nyaghma closed this as completed Jun 30, 2021