ZHA Watchdog heartbeat timeout after upgrading to latest HA, making ZHA invalid and all devices unavailable #115927

Open
HomeAssistantPim opened this issue Apr 21, 2024 · 11 comments

Comments

@HomeAssistantPim

The problem

After upgrading to 2024.4.3, ZHA crashes frequently.
I don't know what changed in the upgrade, but this is a serious regression and I'm considering a rollback.
The log shows:

2024-04-21 06:49:02.345 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: TimeoutError()
2024-04-21 06:49:05.557 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: ConnectionResetError('Failed to transmit ASH frame after 4 retries')
2024-04-21 06:49:05.564 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-04-21 06:49:07.324 WARNING (bellows.thread_0) [homeassistant.util.executor] Thread[SyncWorker_0] is still running at shutdown: File "/usr/local/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.12/site-packages/serial/serialposix.py", line 673, in flush
    termios.tcdrain(self.fd)
2024-04-21 06:49:08.150 WARNING (bellows.thread_0) [homeassistant.util.executor] Thread[SyncWorker_0] is still running at shutdown: File "/usr/local/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.12/site-packages/serial/serialposix.py", line 673, in flush
    termios.tcdrain(self.fd)
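
For anyone who wants to check their own log for the same pattern, here is a minimal sketch that scans log lines for the three-message sequence shown above (watchdog timeout, lost serial connection, NCP failed state). The function name `find_failures` and the choice of matching on these three substrings are assumptions made for illustration; they are not part of bellows or Home Assistant.

```python
import re

# Substrings copied from the log excerpt above, in the order they appeared.
FAILURE_SIGNATURE = (
    "Watchdog heartbeat timeout",
    "Lost serial connection",
    "NCP entered failed state",
)

# Matches the leading "YYYY-MM-DD HH:MM:SS.mmm LEVEL " of a log line.
LINE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\w+) ")


def find_failures(lines):
    """Return the timestamp of the watchdog line of each complete sequence."""
    hits = []
    step = 0            # how many signature messages have been seen so far
    started_at = None   # timestamp of the most recent watchdog timeout
    for line in lines:
        match = LINE_PREFIX.match(line)
        if match is None:
            continue  # traceback continuation lines carry no timestamp
        if FAILURE_SIGNATURE[0] in line:
            # A new watchdog timeout always (re)starts the sequence.
            started_at = match.group(1)
            step = 1
        elif step and FAILURE_SIGNATURE[step] in line:
            step += 1
            if step == len(FAILURE_SIGNATURE):
                hits.append(started_at)
                step = 0
    return hits
```

Reading the file with `open("home-assistant.log", encoding="utf-8")` and passing the file object to `find_failures` would list each time the sequence occurred. The log path is an assumption and depends on the installation.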

Unfortunately, the only fix I have found is to restart HA.
I will look for a way to reload ZHA through an automation, but haven't found one yet.
I will keep searching, as this bug makes all my Zigbee devices, and therefore HA, unusable.
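
One possible way to reload the integration without restarting HA is to call the `homeassistant.reload_config_entry` service through the REST API. The sketch below only builds the request; the base URL, the long-lived access token, and the ZHA config entry ID are placeholders you would need to supply, and the helper name `build_reload_request` is hypothetical. Nothing in this thread confirms that a reload recovers from this particular failure.

```python
import json
import urllib.request


def build_reload_request(base_url, token, entry_id):
    """Build a POST request that asks Home Assistant to reload one config entry.

    base_url, token and entry_id are placeholders supplied by the caller.
    """
    url = f"{base_url.rstrip('/')}/api/services/homeassistant/reload_config_entry"
    body = json.dumps({"entry_id": entry_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_reload_request(...), timeout=10)`. Keeping the request construction separate from the network call makes it easy to inspect the URL and payload before running it against a live instance.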

What version of Home Assistant Core has the issue?

2024.4.3

What was the last working version of Home Assistant Core?

2024.3.3

What type of installation are you running?

Home Assistant Supervised

Integration causing the issue

Zigbee Home Automation

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

See the log excerpt under "The problem" above.

Additional information

No response

@home-assistant

Hey there @dmulcahey, @Adminiuga, @puddly, @TheJulianJES, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

@puddly
Contributor

puddly commented Apr 21, 2024

Please fill out the issue template completely. Include diagnostics information and a full debug log.

@HomeAssistantPim
Author

@puddly I'm not sure what you mean by diagnostics information, but attached is the debug log captured during initialisation.
What stands out to me are the entries stating:
2024-04-21 20:15:02.104 DEBUG (MainThread) [zigpy.quirks] Fail because input cluster mismatch on at least one endpoint

home-assistant.log

@HomeAssistantPim
Author

I downgraded to 2024.3.3 and so far so good.
Will keep you posted.

@HomeAssistantPim
Author

@puddly my HA has been running well since the downgrade.
The log shows no watchdog timeouts and no initialisation problems.
I noticed a pull request targeting 2024.4.0 that would set new IDs for all Zigbee devices.
It seems related, as the log I posted earlier mentions a mismatch in endpoint IDs. Could this change have caused the problem?

@dmulcahey
Contributor

What PR are you talking about?

@HomeAssistantPim
Author

This one, although I'm not sure whether it was actually merged into any 2024.4.x release:
#112459

@dmulcahey
Contributor

Not merged and unrelated

@HomeAssistantPim
Author

HomeAssistantPim commented Apr 23, 2024

OK, so I have downgraded to 2024.3.3 and ZHA has had no issues since.
No watchdog timeouts and no endpoint mismatch messages like those in the log I shared from 2024.4.3.
The only thing left I could do is upgrade to 2024.4.3 again to see whether the issues recur.

@mdeletr2

I also have the same problem after the last update.
ZHA stops and keeps trying to reconfigure...
If I restart HA it works again. This has happened at least 3 times since the last update.

@halukanlar

halukanlar commented Apr 26, 2024

I also had a problem after updating. Restart solved it for now. We'll see if it lasts...
