SDK Loop/trip on CHIP Error 0x00000021: End of TLV #32493

Open
marcelveldt opened this issue Mar 7, 2024 · 0 comments
Labels: bug, darwin, needs triage

Comments

Contributor

marcelveldt commented Mar 7, 2024

Reproduction steps

What is supposed to happen if the connection to a device is lost right at the moment a (wildcard) Read subscription request is being made?

We got multiple reports of Nanoleaf devices not being available in our ecosystem (Home Assistant), Apple Home, or both. Digging into the issue, I noticed that for these devices setting up the subscription just never returned. It was stuck there without an exception or error, just nothing, waiting forever.

It took me quite a bit of wrestling to finally reproduce the issue, but I managed to do so. The device communication seems to get disturbed with a "CHIP Error 0x00000021: End of TLV" during the transmission of the attributes from the Read request; it is then restored but crashes again, over and over. The call to do the read never returns (technically, it never completed), but there is also no timeout. I left it for hours and hours and it never exits this state.
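As a stopgap on the caller side, the hang described above could be bounded with a timeout. This is only a minimal sketch, assuming the subscription setup is exposed as an awaitable; `read_with_timeout` and `start_read` are hypothetical names for illustration and not part of the SDK's Python bindings.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def read_with_timeout(
    start_read: Callable[[], Awaitable[T]],  # hypothetical: starts the Read/subscribe call
    timeout_s: float = 60.0,
) -> T:
    """Await the setup call, but give up after timeout_s seconds instead of waiting forever."""
    try:
        return await asyncio.wait_for(start_read(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Surface the stall to the caller so it can log, retry, or mark the node unavailable.
        raise TimeoutError(
            f"subscription setup did not complete within {timeout_s}s"
        ) from None
```

This does not fix the underlying retry loop; it only makes the stalled call visible to the application instead of blocking indefinitely.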

So basically it's already triggering the auto-resubscribe logic while the initial subscription has not yet been set up. In fact, it's not even past the point where we assign the callback functions for the various events.

In my opinion we are dealing with some special case here. Look at the log I shared below: there's a very distinct pattern in the loop of retries. Is this a device issue, or perhaps a Thread-level issue? Fragmentation, maybe?

Also, maybe I'm wrong about this, but to me the subscription should not auto-resubscribe yet at this stage; it should throw an exception saying that setting up the subscription failed. It should only start doing auto-resubscribes once the initial read request has succeeded.
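The behaviour proposed above can be sketched as a small state check. This is an illustration of the suggested semantics only, not how the SDK is implemented; `SubscriptionState`, `SubscriptionSetupError`, and the method names are all hypothetical.

```python
class SubscriptionSetupError(Exception):
    """Hypothetical error: the initial (priming) read never completed."""


class SubscriptionState:
    """Tracks whether the initial read succeeded, to decide how errors are handled."""

    def __init__(self) -> None:
        self.established = False

    def on_priming_complete(self) -> None:
        # The initial read request succeeded; from here on, resubscribing is allowed.
        self.established = True

    def on_error(self, err: Exception) -> str:
        if not self.established:
            # Still setting up: report the failure to the caller instead of retrying.
            raise SubscriptionSetupError("setting up the subscription failed") from err
        # Subscription was live before the error: auto-resubscribe is appropriate.
        return "resubscribe"
```

With this split, an error such as "End of TLV" during the first report would reach the caller as an exception, while the same error on an established subscription would still trigger the resubscribe path.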

Also, to complete the info: this issue seems to be triggered when the device has somewhat bad reception or picks a border router with a bad Wi-Fi connection. At least that is my theory. I could reproduce it by wrapping a lightbulb in tin foil to disturb its communication, but also when I had a somewhat distant Apple HomePod Mini that picked the wrong access point and so had a flaky Wi-Fi signal.

Bug prevalence

We have a few reports now from production setups

GitHub hash of the SDK that was being used

v1.2.0.1 (181b0cb)

Platform

python

Platform Version(s)

No response

Anything else?

As reproducing is so hard and we have no idea whether only Nanoleaf devices are affected, or perhaps only in combination with particular border routers, we decided to add some very visible logging to our project so we can track issue reports by users in a more structured manner: home-assistant-libs/python-matter-server#623

I captured some logging; the start of the log is the Read request (from the Python C bindings):

qrghqkYg.txt

@marcelveldt marcelveldt added bug Something isn't working needs triage labels Mar 7, 2024
@github-actions github-actions bot added the darwin label Mar 7, 2024
Projects
Status: Todo