chip tool starts resolve too soon #8413

Closed
doublemis1 opened this issue Jul 15, 2021 · 7 comments
Labels: stale (Stale issue or PR), V1.X

Comments

@doublemis1
Contributor

Problem

With the ble-thread command, chip tool runs the whole commissioning flow and then resolves the IP address of the commissioned node.
Chip tool sends the resolve request immediately after the callback from enabling the Thread network, so the accessory may still be attaching to the network while the controller tries to resolve its IP address. If a previous commissioning session used the same fabric ID and node ID, the controller can get the wrong IP address.

Proposed Solution

With the current implementation of resolve and chip tool, chip tool should wait 2-3 seconds between enabling the Thread network and resolving the IP address.
It would also be helpful if the user could define the node ID in chip tool.

@bzbarsky-apple
Contributor

> The chip tool should wait 2-3 seconds between enabling the thread network and resolving the IP address.

Shouldn't the accessory side not respond until it's executed the command and joined the network? How is this expected to work per spec?

> It will be helpful if, in the chip tool, the user can define the node id.

This part we should definitely do. Filed #8436

@Damian-Nordic
Contributor

Damian-Nordic commented Jul 16, 2021

@bzbarsky-apple The thing is that Thread devices don't respond on their own. On startup they register their service with the OpenThread Border Router (OTBR) using the SRP protocol. OTBR then monitors the registered services and updates an mDNS daemon accordingly (be it mDNS Responder or Avahi), which in turn acts as the mDNS server. This architecture introduces some delay, and if node IDs are reused, components such as the mDNS client or server may still return cached entries. I think our tests should either make sure that old DNS entries are cleaned up before factory-resetting a device (for example, by handling the RemoveFabric command properly), or use random node IDs.
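
To make the cached-entry problem concrete, here is a minimal toy model, not the actual SRP, OTBR, or mDNS code. The `ToyDnsCache` class, its methods, the `(fabric_id, node_id)` key, and the addresses are all hypothetical names chosen for illustration. It only shows why a reused ID pair can resolve to an old address while a fresh node ID cannot.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cache key: (fabric_id, node_id). Real mDNS instance names
# are not modeled here.
Key = Tuple[int, int]


class ToyDnsCache:
    """Toy stand-in for any caching layer between controller and device."""

    def __init__(self) -> None:
        self._records: Dict[Key, str] = {}

    def register(self, fabric_id: int, node_id: int, address: str) -> None:
        # A new registration overwrites whatever was stored for this key.
        self._records[(fabric_id, node_id)] = address

    def remove(self, fabric_id: int, node_id: int) -> None:
        # Models cleaning up the old entry before a factory reset.
        self._records.pop((fabric_id, node_id), None)

    def resolve(self, fabric_id: int, node_id: int) -> Optional[str]:
        return self._records.get((fabric_id, node_id))


cache = ToyDnsCache()

# First commissioning session: the device's address ends up in the cache.
cache.register(fabric_id=1, node_id=0x1234, address="fd00::1")

# The device is factory-reset and commissioned again with the same IDs,
# but its new registration has not propagated yet. Resolving right away
# returns the entry left over from the previous session.
stale = cache.resolve(fabric_id=1, node_id=0x1234)

# With a node ID that was never used before there is nothing stale to hit,
# so the controller sees "not found" instead of a wrong address.
fresh = cache.resolve(fabric_id=1, node_id=0x5678)
```

In this model both remedies suggested above work: calling `remove` before the reset clears the old entry, and picking an unused node ID avoids the key collision entirely.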

@bzbarsky-apple
Contributor

@Damian-Nordic Thank you for the context!

I changed chip-tool to use random node ids by default in #8450. That does not affect CI (purposefully), but in CI this should not be a problem, I hope....

If that's not enough for our purposes, we should see what else we should do to ensure unique node ids or the DNS entry cleanup you suggest. In particular, we might need to do something with chip-device-ctrl, which will just use whatever node id you tell it to and hence can lead to reuse...

I do think the right thing here is to ensure node id uniqueness, not add sleep() workarounds for lack thereof.
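
As an illustration of the random-node-ID approach, the sketch below picks a node ID uniformly at random. It is not the code from #8450. The function name `random_node_id` is hypothetical, and the bounds are an assumption about the operational node ID range that should be checked against the Matter specification before being relied on.

```python
import secrets

# Assumed bounds for operational node IDs (verify against the Matter spec).
MIN_NODE_ID = 0x0000_0000_0000_0001
MAX_NODE_ID = 0xFFFF_FFEF_FFFF_FFFF


def random_node_id() -> int:
    """Return a node ID drawn uniformly from [MIN_NODE_ID, MAX_NODE_ID]."""
    span = MAX_NODE_ID - MIN_NODE_ID + 1
    return MIN_NODE_ID + secrets.randbelow(span)
```

With a 64-bit range, two test runs are very unlikely to pick the same ID, which is what keeps an old cached DNS entry from matching a newly commissioned node.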

@stale

stale bot commented Jan 26, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale Stale issue or PR label Jan 26, 2022
@stale stale bot removed the stale Stale issue or PR label Jan 26, 2022
@bzbarsky-apple bzbarsky-apple removed the stale Stale issue or PR label Aug 26, 2022
@bzbarsky-apple
Contributor

This has been fixed by adding retries to operational discovery, which gives the new node time to come up.
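
The general shape of such a retry loop can be sketched as follows. This is not the SDK's operational discovery code. `resolve_with_retries`, its parameters, and the `resolve_once` callback are hypothetical, and the attempt count and delays are arbitrary example values.

```python
import time
from typing import Callable, Optional


def resolve_with_retries(
    resolve_once: Callable[[], Optional[str]],
    attempts: int = 5,
    initial_delay_s: float = 0.5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call resolve_once until it returns an address or attempts run out.

    resolve_once is a caller-supplied function (hypothetical here) that
    returns the node's address, or None if the node was not found yet.
    """
    delay = initial_delay_s
    for attempt in range(attempts):
        address = resolve_once()
        if address is not None:
            return address
        # Wait before the next try, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= backoff
    return None
```

Unlike a fixed 2-3 second sleep, this returns as soon as the node is found and only waits when a lookup fails. Retrying covers the case where no record exists yet; on its own it does not distinguish a stale record from a fresh one, which is why unique node IDs still matter.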
