Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK Quality Test] :Unpairing function is failed with src/app/CommandSender.cpp:328: CHIP Error 0x00000032 #34056

Closed
Saravana-kr22 opened this issue Jun 24, 2024 · 5 comments
Labels

Comments

@Saravana-kr22
Copy link

Description:
I have used python controller to run a stress test on my RPI sample dut (all-clusters-app) on Pair-Unpair test scenario for 10,000 iterations. The test passed in iteration 384 without any error but the iteration 385 is Failed to commission the dut during the Remove Current Fabric step with the below traceback error

Traceback (most recent call last):
File "/Users/chaitanyajandhyala/code/matter-qa/src/matter_qa/library/helper_libs/matter_testing_support.py", line 1214, in unpair_dut
await controller.UnpairDevice(node_id)
File "/Users/chaitanyajandhyala/code/matter-qa/src/.venv/lib/python3.10/site-packages/chip/ChipDeviceCtrl.py", line 621, in UnpairDevice
return await asyncio.futures.wrap_future(ctx.future)
chip.exceptions.ChipStackError: src/app/CommandSender.cpp:343: CHIP Error 0x00000032: Timeout

GitHub hash of the SDK that was being used: c829a2c

Platform
controller - python controller running on mac,
DUT - all-cluster-app running on raspi
Provisioned using ble-wifi

Platform Version(s)
RPI-4, 8GB RAM

Test Scenario

Pair/Provision the DUT (All-cluster-app running on raspi) using on-network
Read device information after the device is provisioned
sleep for 2 secs
Unpair the DUT (i.e. removing the fabric)
factory reset the DUT
repeat step1 - 6 for 10,000 iterations
Script reference: https://github.com/CHIP-Specifications/matter-qa/blob/main/src/matter_qa/scripts/reliability_scripts/TC_Pair.py

Reference log:
Example of Passed iteration details:

The iteration number 384 is passed, below log files for the same
TH log: log_2024-06-22_20-06-14.log
DUT log: Dut_log_384_2024-06-22T20_06_46_741537.log
TCP_DUMP: controller_tcpdump_2024-06-22_20-06-14.zip

Example of Failed iteration details:

The iteration number 385 is failed with this error below the log file
TH log: log_2024-06-22_20-07-00.log
DUT log: Dut_log_385_2024-06-22T20_07_37_259801.log
TCP_DUMP:controller_tcpdump_2024-06-22_20-07-00.zip

@bzbarsky-apple
Copy link
Contributor

From the DUT log in the success case:

[1719112000.586334][32496:32496] CHIP:EM: <<< [E:22812r S:18741 M:61018202 (Ack:232649930)] (S) Msg TX to 1:000000000001B669 [0000] [UDP:[fe80::4d1:a2a8:62b3:f1b8%wlan0]:54845] --- Type 0001:09 (IM:InvokeCommandResponse)
[1719112000.586532][32496:32496] CHIP:DMG: Command response sender moving to [AllInvokeR]
[1719112000.662908][32496:32496] CHIP:EM: >>> [E:22812r S:18741 M:232649932 (Ack:61018202)] (S) Msg RX from 1:000000000001B669 [0000] --- Type 0000:10 (SecureChannel:StandaloneAck)
[1719112006.817004][32496:32496] CHIP:ZCL: Emitting ShutDown event
...
[1719112006.819563][32496:32496] CHIP:DL: Inet Layer shutdown

So the DUT sends the response, then shuts down about 6 seconds later. That gives the response enough time to actually be sent to the client.

For the failure case:

[1719112057.315852][32559:32559] CHIP:EM: <<< [E:22833r S:23754 M:262271351 (Ack:259598923)] (S) Msg TX to 1:000000000001B669 [0000] [UDP:[fe80::4d1:a2a8:62b3:f1b8%wlan0]:54845] --- Type 0001:09 (IM:InvokeCommandResponse)
[1719112057.316202][32559:32559] CHIP:DMG: Command response sender moving to [AllInvokeR]
[1719112057.316278][32559:32559] CHIP:ZCL: Emitting ShutDown event
[1719112057.318753][32559:32559] CHIP:DL: Inet Layer shutdown

So in this case the DUT shuts down less than half a millisecond after trying to send the response, almost certainly before the actual packet has had a chance to be sent.

Note that response to RemoveFabric is best-effort, not guaranteed, per spec, so this DUT behavior is perfectly spec-compliant. But it could do a better job at waiting before shutting down. Why does it shut down immediately, exactly? What kills that process?

@bzbarsky-apple
Copy link
Contributor

@Saravana-kr22 Please see question above?

@cjandhyala
Copy link

@bzbarsky-apple script doesn't do anything on the DUT , controller in the script sends 'RemoveFabric' CMD and waits for the response. In this scenario controller didn't receive response from DUT for that CMD

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Jun 24, 2024

@cjandhyala It didn't receive a responses because the DUT shut down before it managed to set the response (which by the way is allowed by the spec for RemoveFabric). Why did the DUT shut down? See the "Inet Layer shutdown" message and so on.

@woody-apple
Copy link
Contributor

Closing issues that aren't in top X, or listed as such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants