Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZHA] Integration randomly stops working, sits in 'initialising' state. (still) #107490

Closed
thefunkygibbon opened this issue Jan 7, 2024 · 49 comments

Comments

@thefunkygibbon
Copy link

The problem

As per previous issue (#105445) I am experiencing my ZHA randomly becoming completely unresponsive and seeing that the integration is sitting "initialising"

What version of Home Assistant Core has the issue?

core-2024.1.2

What was the last working version of Home Assistant Core?

core-2024.1.1

What type of installation are you running?

Home Assistant Container

Integration causing the issue

ZHA

Link to integration documentation on our website

No response

Diagnostics information

config_entry-zha-5fb366dc2478313fb3cb2b29c52254af.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

[home-assistant_zha_2024-01-07T19-41-14.250Z.log.zip](https://github.com/home-assistant/core/files/13854715/home-assistant_zha_2024-01-07T19-41-14.250Z.log.zip)

Additional information

No response

@home-assistant
Copy link

home-assistant bot commented Jan 7, 2024

Hey there @dmulcahey, @Adminiuga, @puddly, @TheJulianJES, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation
zha source
(message by IssueLinks)

@kaciker
Copy link

kaciker commented Jan 8, 2024

Same problem here

image

@cdalexndr
Copy link

cdalexndr commented Jan 8, 2024

Encountered same issue on 2023.12.4
error_log-3.txt

Tried to reconfigure network, reboot, now integration is not showing "initializing" but no zigbee device is working.
Had to re-pair every zigbee device to work again. (possibly due to network reconfig I did when stuck initializing?)

Diagnostics:
zha-b1738f4b724427bec34bcc396a7b0ff4-Zigbee Coordinator-e8e28ba646e99c7f83d95fb82f306659.json.txt
config_entry-zha-b1738f4b724427bec34bcc396a7b0ff4.json.txt

@brylee123
Copy link

brylee123 commented Jan 8, 2024

@cdalexndr you can do a full system reboot Settings > System > Hardware > Advanced options> Reboot system. Or less recommended: unplug it and replug it back in. You do not need to repair (I have not needed to). After rebooting, ZHA should work again. For me, it works anywhere between 3 hours and 24 hours before it needs another reboot.


Here are my logs, I am experiencing the same issue.
home-assistant_2024-01-08T14-00-17.880Z.log

I am running Core 2023.1.2, Supervisor 2023.12.0, OS 11.3, on a Raspberry Pi 4.
(HUSBZB-1) HubZ Smart Home Controller - Standard Com Port, s/n: 1160046D - Silicon Labs


This issue is probably a duplicate of this: #105506

@asayler
Copy link

asayler commented Jan 9, 2024

I'm experiencing this issue as well (after seeing the same issue start in the 2023.12.x releases as documented in #105445 and related tickets). ZHA was stable for me through 2023.11.3, and has not worked well since. I avoided the 2023.12.x releases all together due to these bugs, but updated to 2024.1.1 earlier this week. ZHA worked for a few days, but as of this afternoon, has started falling into the "Initializing..." As of now, I can't get it to recover, even with a full system reboot and power off. I'm using an HA Yellow with the built-in Zigbee radio.

Here are the logs with debugging enabled since the last reboot. ZHA never comes online and stays stuck in initializing:

home-assistant_2024-01-09T06-09-23.858Z.log

I'll likely need to revert to 2023.11.3 again, but I'm not sure how long I can stay on that old of a version. I don't suppose there's been any discussion of reverting ZHA to the 2023.11.3 code base until these issues can be resolved?

@harvindhillon
Copy link

I can confirm the issues of ZHA instability are still present even on the latest. I had a stable 2024.1.2 for a few days but since yesterday ZHA just randomly restarted twice, 6 hours apart, :( the 2nd time the system was unstable for an hour before recovering.

Will try to see if I can manage to capture logs

@tjerkw
Copy link

tjerkw commented Jan 9, 2024

Same issue here.

@joelevi
Copy link

joelevi commented Jan 10, 2024

Same issue here.

Same thing here:

I run HAOS on an NUC, have the SkyConnect connected via USB extension cable (like you're supposed to), got the 2.4 update to the Silicon Labs Multiprotocol to 2.4.0, things started breaking... hours later I updated to 2.4.1... still broken... another few hours 2.4.2 was pushed and I upgraded.

Since 2.4.2 it's been up a day, then randomly the ZHA integration goes back to "Failed Setup Will Retry"

What's worse, I'm running both Zigbee AND Thread on the SkyConnect... so BOTH type of devices (85 of them) are broken... including lights.

Wife Acceptance Factor is dropping rapidly.

How do I downgrade back to 2.3.2??

@itsSaad
Copy link

itsSaad commented Jan 10, 2024

Had the same issue, Although my error is specifically:

async_initialize: all attempts have failed: [TimeoutError(), TimeoutError(), TimeoutError(), TimeoutError()]

Downgrading to 2023.12.4 seems to have caused less problems after I restarted the zigbee router based devices.

@mmccool
Copy link

mmccool commented Jan 11, 2024

Wife Acceptance Factor is dropping rapidly.

Know what you mean. Home automations (including things we have come to rely on) being broken for months is not winning me any points. I had to revert a bunch of things to failsafe mode and find workarounds for a bunch of other things. Overall this is causing me a significant amount of work and effort.

@puddly
Copy link
Contributor

puddly commented Jan 11, 2024

@cdalexndr: upgrade to 2024.1.2. There were may bugs fixed between 2023.12.4 and then.

Multi-PAN has issues independent of ZHA, some of which will be addressed in an update scheduled for release very soon. If you're having reloads and using multi-PAN, this isn't a ZHA issue, nor something ZHA can fix. Be aware that multi-PAN is still in the experimental phase (though improving) so if you need stability, I strongly suggest using separate sticks for Zigbee and Thread (or using an external Thread border router).

@thefunkygibbon
Copy link
Author

so was there anything obvious in my logs? do you want/need me to do anything to help get to the bottom of this?

were the libraries which were changed (reverted) in 1.1 changed back again in 1.2 or something?

@puddly
Copy link
Contributor

puddly commented Jan 11, 2024

2024.1.1 to 2024.1.2 was a very tiny change and there would be no difference between how the two behave network-wise.

What exact coordinator are you using?

@harvindhillon
Copy link

harvindhillon commented Jan 11, 2024

For me the issues started in the 2023.12.X releases. I am with Sonoff Dongle P

Even now I see random reboots, example:
image

Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1098, in request_callback_rsp await self.request(request, timeout=timeout, **response_params) File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1052, in request self._uart.send(frame) ^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'send'
This is the indicator that re-init has been done

Just a minute before, a lot of timeouts

`Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1098, in request_callback_rsp
    await self.request(request, timeout=timeout, **response_params)
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1059, in request
    response = await response_future
               ^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1097, in request_callback_rsp
    async with async_timeout.timeout(timeout):
  File "/usr/local/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError`

@Trabbi1999
Copy link

Can confirm issues since 2023.12.x aswell. The Skyconnect seems to be crashing or loosing the connection, my logs from the multiprotocoll integration are full of some messages like trying to conncet with baudrate X while its trying different baudrates before it stops overall.

Physically reconnecting the Skyconnect and starting the integration again fixes the issue temporarly. It mostly crashes at 2-3AM in the night. With 2024.0 it was stable for a couple of days, now we are back to 24h.

@thefunkygibbon
Copy link
Author

my coordinator as it is currently is a sonoff brigge flashed with tasmota. unlike others here.

@puddly
Copy link
Contributor

puddly commented Jan 11, 2024

Let me set one up to test. I've been running my home network on a Silvercrest gateway without issues for the past day so perhaps it's something specific to the Sonoff.

@asayler
Copy link

asayler commented Jan 11, 2024 via email

@puddly
Copy link
Contributor

puddly commented Jan 11, 2024

There were a lot of changes between 2023.12.4 and 2024.1.0 so please try the latest version. If you still have issues, post a debug log of the integration reload.

Multi-PAN issues are not related to ZHA unless downgrading solves the problem. Keep in mind that ZHA prior to 2023.12.0 did not notify you when your coordinator was offline or unresponsive so it's very possible that you're not actually seeing any new issues that were not present in the past.

@asayler
Copy link

asayler commented Jan 12, 2024 via email

@dmulcahey
Copy link
Contributor

I tested 2024.1.2 and that's where I had the most recent issues on my HA Yellow. The logs above are from that version. Just noting that the issues started in 2023.12.x. Prior to that, ZHA was rock solid. Ever since, it's been very flaky.

On Thu, Jan 11, 2024 at 3:20 PM puddly @.> wrote: There were a lot of changes between 2023.12.4 and 2024.1.0 so please try the latest version. If you still have issues, post a debug log of the integration reload. Multi-PAN issues are not related to ZHA unless downgrading solves the problem. Keep in mind that ZHA prior to 2023.12.0 did not notify you when your coordinator was offline or unresponsive so it's very possible that you're not actually seeing any new issues that were not present in the past. — Reply to this email directly, view it on GitHub <#107490 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACLHPPTUJ3JJSRDNTVZWRDYOBQRPAVCNFSM6AAAAABBQRKL4GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBYGA3DAMRSGQ . You are receiving this because you are subscribed to this thread.Message ID: @.>

If this is the case are you willing to try something drastic to help identify this? Would you be willing to try running the most recent version with all other integrations disabled? Just for a bit to see if the stability issue goes away?

@asayler
Copy link

asayler commented Jan 12, 2024 via email

@dmulcahey
Copy link
Contributor

It’s worth a shot and I completely understand the impact this would have. No worries either way.

@EuleMitKeule
Copy link
Contributor

EuleMitKeule commented Jan 13, 2024

I have the same problem, I have to reboot 2-3 times for ZHA to startup correctly. I am using a Sonoff Zigbee 3.0 dongle.
I had this occur before updating to 2024.1, but since then the Home Assistant frontend crashes completely after 1-2 minutes whenever ZHA does not start. I am not 100% sure that these two issues are related though.

These are the related log messages:

2024-01-13 20:07:45.709 DEBUG (MainThread) [homeassistant.components.zha] ZHA storage file does not exist or was already removed
2024-01-13 20:07:50.431 ERROR (bellows.thread_0) [bellows.uart] CRC error in frame b'9c2791907e' (b'9190' != b'140c')
2024-01-13 20:07:56.889 DEBUG (MainThread) [homeassistant.components.zha] Failed to set up ZHA
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 74, in command
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/__init__.py", line 163, in async_setup_entry
    zha_gateway = await ZHAGateway.async_from_config(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 193, in async_from_config
    await instance.async_initialize()
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 211, in async_initialize
    await app.startup(auto_form=True)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 226, in startup
    await self.initialize(auto_form=auto_form)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 142, in initialize
    await self.load_network_info(load_devices=False)
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 251, in load_network_info
    (nwk,) = await ezsp.getNodeId()
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 215, in _command
    return await self._protocol.command(name, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 73, in command
    async with asyncio_timeout(EZSP_CMD_TIMEOUT):
  File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

@mortezaadi
Copy link

Same problem

2024-01-14 21:28:18.852 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running') 2024-01-14 21:28:18.853 WARNING (MainThread) [zigpy.application] Watchdog failure Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 665, in _watchdog_loop await self.watchdog_feed() File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 647, in watchdog_feed await self._watchdog_feed() File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 999, in _watchdog_feed (res,) = await self._ezsp.readCounters() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 212, in _command raise EzspError("EZSP is not running") bellows.exception.EzspError: EZSP is not running 2024-01-14 21:30:38.990 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1047, in websocket_get_configuration zha_gateway = get_zha_gateway(hass) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway raise ValueError("No gateway object exists") ValueError: No gateway object exists 2024-01-14 21:30:39.125 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1140, in websocket_get_network_settings backup = async_get_active_network_settings(hass) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/api.py", line 43, in async_get_active_network_settings app = get_zha_gateway(hass).application_controller ^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway raise ValueError("No gateway object exists") ValueError: No gateway object exists 2024-01-14 21:30:45.635 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1047, in websocket_get_configuration zha_gateway = get_zha_gateway(hass) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway raise ValueError("No gateway object exists") ValueError: No gateway object exists 2024-01-14 21:30:45.641 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1140, in websocket_get_network_settings backup = async_get_active_network_settings(hass) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/api.py", line 43, in async_get_active_network_settings app = get_zha_gateway(hass).application_controller ^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway raise ValueError("No gateway object exists") ValueError: No gateway object exists

@thefunkygibbon
Copy link
Author

whatever was changed in 2024.1.3 has made it even worse. what was once a week has happened about 4 times in 2 days

@thefunkygibbon
Copy link
Author

thefunkygibbon commented Jan 15, 2024

for what its worth, here is what it is saying mostly in the debug logs (other than the other ZHA log entries saying that DELIVERY_FAILED errors (understandably)
(it's just occured again!)

2024-01-14 23:20:46.009 ERROR (MainThread) [zigpy.zcl] [0xA01A:1:0x0b04] Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/init.py", line 411, in reply
return await self._endpoint.reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 278, in reply
return await self.device.reply(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 483, in reply
return await self.request(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 317, in request
await send_request
File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 833, in request
await self.send_packet(
File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 868, in send_packet
status, _ = await self._ezsp.sendUnicast(
^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'sendUnicast'

@harvindhillon
Copy link

My network went on a restarting spree today, a lot of errors and timeouts. Devices dropped off but started rejoining when I lauched add devices. Apologies I cant pin point when things started to hit the fan but trying to fix it I had turn on debug a few times. Hopefully something jumps out
home-assistant.log
home-assistant_zha_2024-01-15T22-01-51.716Z.log
home-assistant_zha_2024-01-15T21-27-11.054Z.log

@puddly
Copy link
Contributor

puddly commented Jan 16, 2024

There were no ZHA or library changes between 2024.1.2 and 2024.1.3 so I think the problem you're having is just randomly manifesting. The repeated restart issue will be fixed by #107963.

  • If you are using multi-PAN, this isn't a ZHA issue: please head to Multiprotocol Add-On Crashing addons#3408.
  • If you arent using multi-PAN and are using a Silicon Labs radio (SkyConnect, Sonoff v2, etc.), can you describe your host hardware (e.g. Pi 4, old server, Green)? I believe this issue is affecting predominantly slower hardware, as the radio resets only when the host stops communicating with it for 10+ seconds (ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT). This is only possible if you are having high CPU usage and/or some other integration/addon is slowing HA down enough to cause this.

@harvindhillon I don't believe your issue is related to this one. Your log is littered with MAC_CHANNEL_ACCESS_FAILURE (the radio refusing to transmit because it's too noisy). Similarly:

2024-01-15 21:53:08.918 WARNING (MainThread) [zigpy.application] Zigbee channel 11 utilization is 88.24%!
2024-01-15 21:53:08.918 WARNING (MainThread) [zigpy.application] If you are having problems joining new devices, are missing sensor updates, or have issues keeping devices joined, ensure your coordinator is away from interference sources such as USB 3.0 devices, SSDs, WiFi routers, etc.

@asayler
Copy link

asayler commented Jan 16, 2024

Thanks, @puddly. To your hardware question:

I'm using the built-in Silicon Labs radio on an HA Yellow with a 4GB CM4 RPi driving it. It's in the normal Zigbee-only mode (not multiprotocol). My processor usage hovers around 5%-10%, so it's not like the system is over loaded (although I'm not sure how many things are bound by single core speed that those multi-core usage percentages may not reflect.)

Screenshot_20240116-115940

I attached logs above and in the previous iteration of this ticket. I have reverted back to 2023.11.3 which was the last stable version of ZHA prior to this run of issues that started in 2023.12. I can give the latest brain a try again later this week if you need more logs. I did update the radio firmware recently, and haven't tested that against the latest releases yet.

@mmccool
Copy link

mmccool commented Jan 16, 2024

Hmmm... as I said in my description, I am using Multi-PAN. So is my problem NOT a ZHA problem? I'm not using SkyConnect however, but Home Assistant Yellow, so home-assistant/addons#3408 is not an exact match either.

@mmccool
Copy link

mmccool commented Jan 16, 2024

As it happens - I do have a SkyConnect dongle. So I could go one way and set up a separate radio for Matter. As Zigbee is much more important to me right now, I would also be willing to disable Matter/Multi-PAN and just use ZHA if I can figure out how to do it without breaking anything.

@puddly
Copy link
Contributor

puddly commented Jan 16, 2024

so home-assistant/addons#3408 is not an exact match either.

The firmware is identical for both so it's very likely the same issue.

if I can figure out how to do it without breaking anything.

There are documented steps here: https://yellow.home-assistant.io/guides/disable-multiprotocol/. It will migrate your network back to Zigbee-only.

Once that's done, plug in the SkyConnect, install the OpenThread Border Router addon: it'll flash your SkyConnect with Thread firmware. You can then push your preferred dataset to the same border router from the Thread configuration and replicate your multi-PAN setup with two stable radios.

@harvindhillon
Copy link

harvindhillon commented Jan 16, 2024

Thanks @puddly, the energy scan warning is a weird one cause it is not always and I have my dongle on a 5m powered USB2 cable in the middle of my house, far from any radios. It fluctuates from as low as 15 to as high as 88. It is usually hovering around 76 though

    "energy_scan": {
      "11": 72.54901960784314,

20240116_214440
That's my dongle, nothing around it. ZHA Channel 11 as wifi channel 1 is free

Apologise for the logs as I dont remember the timestamps but the integration was restarting multiple times on its own and was trying to capture that.

There are errors that I've not seen before

Logger: homeassistant
Source: /usr/src/homeassistant/homeassistant/runner.py:145
First occurred: 5:51:38 PM (77 occurrences)
Last logged: 6:30:23 PM
Error doing job: Task exception was never retrieved

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1098, in request_callback_rsp
    await self.request(request, timeout=timeout, **response_params)
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1052, in request
    self._uart.send(frame)
    ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'

@mmccool
Copy link

mmccool commented Jan 17, 2024

so home-assistant/addons#3408 is not an exact match either.

The firmware is identical for both so it's very likely the same issue.

if I can figure out how to do it without breaking anything.

There are documented steps here: https://yellow.home-assistant.io/guides/disable-multiprotocol/. It will migrate your network back to Zigbee-only.

Once that's done, plug in the SkyConnect, install the OpenThread Border Router addon: it'll flash your SkyConnect with Thread firmware. You can then push your preferred dataset to the same border router from the Thread configuration and replicate your multi-PAN setup with two stable radios.

So I disabled Multi-PAN and my system is working now. Yay! Going to try setting up a separate radio for Thread next as suggested. Thanks!

@brylee123
Copy link

Any solution for non-SkyConnect users in the works? ZHA is now reinitializing 5 minutes after reboot which takes about 10 minutes each time so it's basically unusable. It used to only have this issue once or twice a day.

@thefunkygibbon
Copy link
Author

thefunkygibbon commented Jan 18, 2024

is there a way of restarting HA as part of an automation? ie if detecting that the integration isn't available (or a device isn't) then restart HA completely? (I don't seem to be able to restart the integration manually even when i have access to the gui because it needs to be "up" for it to be able to restart... so i'm guessing it wouldnt work with restarting the integration itself.... although I dont mind trying).
for a way of getting around this issue for now as I'm out of the house for a few days and my wife and family are at home, their life is going to be marred with frustration whilst i'm away when it inevitably craps out multiple times during the day and night and i'm not around to sort it.

thanks

@brylee123
Copy link

is there a way of restarting HA as part of an automation? ie if detecting that the integration isn't available (or a device isn't) then restart HA completely? (I don't seem to be able to restart the integration manually even when i have access to the gui because it needs to be "up" for it to be able to restart... so i'm guessing it wouldnt work with restarting the integration itself.... although I dont mind trying).
for a way of getting around this issue for now as I'm out of the house for a few days and my wife and family are at home, their life is going to be marred with frustration whilst i'm away when it inevitably craps out multiple times during the day and night and i'm not around to sort it.

thanks

I've set up an automation:

Trigger: one of my plugged in zigbee devices become unavailable
Condition: check all my plugged in zigbee devices for unavailable state
Action: HA reboot core

But I've since disabled it because ZHA keeps reinitializing every 5 minutes and it takes 10 to boot up again.

@thefunkygibbon
Copy link
Author

is there a way of restarting HA as part of an automation? ie if detecting that the integration isn't available (or a device isn't) then restart HA completely? (I don't seem to be able to restart the integration manually even when i have access to the gui because it needs to be "up" for it to be able to restart... so i'm guessing it wouldnt work with restarting the integration itself.... although I dont mind trying).
for a way of getting around this issue for now as I'm out of the house for a few days and my wife and family are at home, their life is going to be marred with frustration whilst i'm away when it inevitably craps out multiple times during the day and night and i'm not around to sort it.
thanks

I've set up an automation:

Trigger: one of my plugged in zigbee devices become unavailable Condition: check all my plugged in zigbee devices for unavailable state Action: HA reboot core

But I've since disabled it because ZHA keeps reinitializing every 5 minutes and it takes 10 to boot up again.

great. ok i've just created that. mine doesnt take that long (at all) to restart HA (more like a minute max). still a PITA but at least hopefully it shouldn't affect the family too much at home. thanks for the tips

@MichaelMichaelMichaelMichaelMichael

There are documented steps here: https://yellow.home-assistant.io/guides/disable-multiprotocol/. It will migrate your network back to Zigbee-only.

Once that's done, plug in the SkyConnect, install the OpenThread Border Router addon: it'll flash your SkyConnect with Thread firmware. You can then push your preferred dataset to the same border router from the Thread configuration and replicate your multi-PAN setup with two stable radios.

Thank you very much - my Zigbee is working again without too much of a hassle.

@SanderKo85
Copy link

Have also a HomeAssistant Yellow, out of the box. Nothing added. Had also to disable the Multiprocotol.

@ddeconin-gh
Copy link

Hi, I'm using a SLB-06N coordinator (ezsp) over ethernet. When ZHA goes into its 'initializing' state, the dongle is still up and running and I can reach its admin console perfectly over HTTPS, so the issue is clearly on the ZHA side. A reboot of HA (without touching the dongle at all) fixes the issue.
What logs can I add to help diagnosing this?

@puddly
Copy link
Contributor

puddly commented Mar 6, 2024

@ddeconin-gh Please enable ZHA debug logging, reload the integration, and leave debug logging on until ZHA enters the "initializing" state. After it recovers, disable debug logging, ZIP the log, and upload it.

@issue-triage-workflows
Copy link

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@greenkiwi
Copy link

I've been experiencing this issue consistently of late.

I'll see if I can get debug logs.

@jeppesens
Copy link

It has been a constant issue for my setup as well. Recently everything got worse and had a non-responsive Zigbee network. I bought a second Skyconnect, disabled multi protocol, and I believe that solved the issues I had for both things, haven’t had a ZHA stuck in initializing since!

@greenkiwi
Copy link

I just had it fail again.
image
image
image

I'm not sure the best way to pull the logs off the instance.

At any rate, here are some of the error logs that I'm seeing in the logs

2024-06-21 07:27:38.632 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Connection to the radio was lost: <NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81>

DEBUG (MainThread) [homeassistant.components.zha.core.discovery] 'sensor' platform -> 'DeviceCounterSensor' using counter groups[counters] counter group[ezsp_counters] counter[ASH_OVERRUN_ERROR]

2024-06-21 08:26:39.923 DEBUG (MainThread) [homeassistant.components.zha.core.discovery] 'sensor' platform -> 'DeviceCounterSensor' using counter groups[counters] counter group[ezsp_counters] counter[ASH_FRAMING_ERROR]

2024-06-21 08:26:39.923 DEBUG (MainThread) [homeassistant.components.zha.core.discovery] 'sensor' platform -> 'DeviceCounterSensor' using counter groups[counters] counter group[ezsp_counters] counter[ASH_OVERFLOW_ERROR]

# command to get different errors
grep -i error zha.log|grep -v "Connection to the radio was lost" |grep -v "counter\[ASH_OVERRUN_ERROR\]" |grep -v "ASH_FRAMING_ERROR"|grep -v "ASH_OVERFLOW_ERROR"

Some logs with initialization in it

2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0001]: Performing cluster handler specific initialization: ['battery_voltage', 'battery_percentage_remaining']
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0001]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0000]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0000]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x1000]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x1000]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: initializing cached cluster handler attributes: ['current_file_version']
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0008]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0008]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0006]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0006]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: initializing cached cluster handler attributes: ['current_file_version']
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0005]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0005]: finished cluster handler initialization
2024-06-21 08:26:39.936 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0300]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0300]: finished cluster handler initialization
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0003]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0001]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0000]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x1000]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0008]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0006]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0019]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0005]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xBCA0:1:0x0300]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xBCA0](TS004F): completed initialization
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x4785](TS0044): started initialization
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:ZDO](TS0044): 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0000]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0000]: finished cluster handler initialization
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0001]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0001]: initializing uncached cluster handler attributes: ['battery_voltage', 'battery_percentage_remaining'] - from cache[True]
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0001]: Performing cluster handler specific initialization: ['battery_voltage', 'battery_percentage_remaining']
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0001]: finished cluster handler initialization
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: initializing cached cluster handler attributes: ['current_file_version']
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: finished cluster handler initialization
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: initializing cached cluster handler attributes: ['current_file_version']
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: finished cluster handler initialization
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0000]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0001]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:1:0x0019]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:2:0x0001]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.937 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:2:0x0001]: initializing uncached cluster handler attributes: ['battery_voltage', 'battery_percentage_remaining'] - from cache[True]
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:2:0x0001]: Performing cluster handler specific initialization: ['battery_voltage', 'battery_percentage_remaining']
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:2:0x0001]: finished cluster handler initialization
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:2:0x0001]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:3:0x0001]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:3:0x0001]: initializing uncached cluster handler attributes: ['battery_voltage', 'battery_percentage_remaining'] - from cache[True]
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:3:0x0001]: Performing cluster handler specific initialization: ['battery_voltage', 'battery_percentage_remaining']
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:3:0x0001]: finished cluster handler initialization
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:3:0x0001]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:4:0x0001]: initializing cluster handler: from_cache: True
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:4:0x0001]: initializing uncached cluster handler attributes: ['battery_voltage', 'battery_percentage_remaining'] - from cache[True]
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:4:0x0001]: Performing cluster handler specific initialization: ['battery_voltage', 'battery_percentage_remaining']
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:4:0x0001]: finished cluster handler initialization
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x4785:4:0x0001]: 'async_initialize' stage succeeded
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x4785](TS0044): completed initialization
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0000](EZSP): started initialization
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0x0000:ZDO](EZSP): 'async_initialize' stage succeeded
2024-06-21 08:26:39.938 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0000](EZSP): completed initialization

@puddly
Copy link
Contributor

puddly commented Jun 22, 2024

Connection to the radio was lost: <NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81> means that something prevented ZHA from responding to your radio for about six seconds and the radio cut off the connection. This is usually a custom integration that blocks the Core event loop.

@issue-triage-workflows
Copy link

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@issue-triage-workflows issue-triage-workflows bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests