Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align Shelly sleeping devices timeout with non-sleeping #118969

Draft
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

thecode
Copy link
Member

@thecode thecode commented Jun 6, 2024

Proposed change

Currently we allowed a Shelly sleeping device to miss a single update due to the long update period the device use.
For non sleeping devices we allowed 2 missing updates (and even after that we don't mark them offline unless we can't reach them).
This makes sleeping devices to be marked as offline very easy, Gen1 devices are UDP based and can easily miss some updates.
Even under near perfect conditions UDP has an expected loss factor, adding Wi-Fi means it will be sometime more frequent.

I did a test using a Shelly Motion Gen1:
I left the device in a closed box so it doesn't detect any events, I put it near an AP so it was around -50dBm the whole week and it was not marked as unavailable, however moving it to a far point which reduced the RSSI to -70dBm made it being marked unavailable 2 times that week. -70dBm is considered to be excellent to good but it still lost some messages, probably due to collisions.

This change align all devices to use the same timeout.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@home-assistant
Copy link

home-assistant bot commented Jun 6, 2024

Hey there @balloob, @bieniu, @chemelli74, @bdraco, mind taking a look at this pull request as it has been labeled with an integration (shelly) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of shelly can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign shelly Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@chemelli74
Copy link
Contributor

The only draw back with this change si that it may take way too long to mark a device as unavailable.
If it's something important, linked to security ( motion, door, flood ), this change may compromise the result.

@thecode thecode removed this from the 2024.6.1 milestone Jun 7, 2024
@thecode thecode marked this pull request as draft June 7, 2024 10:46
@bdraco
Copy link
Member

bdraco commented Jun 7, 2024

It's definitely expected that UDP-based protocol will drop packets over Wi-Fi. I expect that is one of the reasons Shelly moved away from COAP with Gen2/Gen3 devices.

This a difficult problem as its never going to be as stable as we want it to be as there are no guarantees with UDP. All the potential options here are compromises.

It's reasonable never to fix this if the consensus is that the device is more important to be shown as unavailable after a single packet drop instead of reflecting an old state. It's always hard to know the correct action here because there will be opinions on both sides of this.

As an alternative, we could clearly state in our documentation that it's normal for Gen1 sleepy devices to experience periodic unavailability under standard conditions (nobody will ever have a perfect network), particularly when updates are occasionally dropped. That would allow us to close the linked issues and point them to the docs instead. i.e. wontfix

@KruseLuds
Copy link

I have to reiterate this was listed above as a fix for

#119002

...and that 119002 is referring to a GEN 2 device, however the above statement seems to imply this (118969) may be closed without any change being made. Please note that the above link I have supplied is to an issue that is NEW which came up due to a change to the Shelly integration and needs to be fixed as these Shelly Motion 2 units are used for an alarm system as well as a smart home for disabled older folks that require them to work properly instead of just becoming "unavailable" on a constant basis (which never used to happen).

So if this (118969) is closed without making any changes, 119002 which is referenced above as being fixed by this - and was posted with logs - should stay open and 119002 still needs to be worked upon as soon as possible.

Thank you -

@smarthomefamilyverrips
Copy link

smarthomefamilyverrips commented Jun 9, 2024

It's definitely expected that UDP-based protocol will drop packets over Wi-Fi. I expect that is one of the reasons Shelly moved away from COAP with Gen2/Gen3 devices.

This a difficult problem as its never going to be as stable as we want it to be as there are no guarantees with UDP. All the potential options here are compromises.

It's reasonable never to fix this if the consensus is that the device is more important to be shown as unavailable after a single packet drop instead of reflecting an old state. It's always hard to know the correct action here because there will be opinions on both sides of this.

As an alternative, we could clearly state in our documentation that it's normal for Gen1 sleepy devices to experience periodic unavailability under standard conditions (nobody will ever have a perfect network), particularly when updates are occasionally dropped. That would allow us to close the linked issues and point them to the docs instead. i.e. wontfix

I not understand this is already going on since 2024.5 #116948 and mentioned in few other issues as well with #119002 as the latest it already several times is implied that it are connecting problems, and even now is suggested to simple ignore this and write it of as won't fix. This issue NOT WAS THERE AND NEVER HAVE BEEN THERE BEFORE 2024.5 so something did change to make this happen, and the focus should be on finding out what this is to my opinion. And yes I understand most of you do this out of hobby and for free, what is highly appreciated so I apologize for my frustrated reaction, but I hope it also is understandable if all the time is assumed this issue is not caused by the integration (or other changes within 2024.5 causing the shelly intgration to have this behavior) while it not existed before update 2024.5

EDIT: also it not is logical that this "becoming unavailable" only happen after a restart of HA and that ONE time reloading the device within the Shelly integration solves this for as long there is no other HA restart. If it was network related it also would occur on a more regular basis and not only at a HA restart.

I am no code writer and my knowledge is far from expert level, but I would say, make the Shelly integration reload all it's devices by itself for example 1 minute after restart of HA is completed would solve this issue also. Just thinking out loud as this is what I am doing manually now each time after I did do a HA restart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants