Purge causes recorder to stop writing to the DB until HA is restarted (Auto purge happens at 4:12am) #117263
Comments
It sounds like you have an integration that is filling the database with state changes so quickly that the system cannot keep up. Try enabling debug logging for |
Hi, I enabled debug logging for I did just now install Home Assistant Container on a separate machine using the same configuration and noticed I could not reproduce the issue by running a manual purge. This could mean the issue is somehow caused by the installation environment. After doing a manual purge on the Container version the database size has been reduced to 620 MB, nice. When doing a purge on my original system with debug logging enabled for
|
That sounds like a corrupt index. You might try running an integrity check (https://www.sqlite.org/pragma.html#pragma_integrity_check) on the database and reindexing anything that comes up (https://www.sqlite.org/lang_reindex.html). It could also indicate a problem with your storage medium. |
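A minimal sketch of the suggested check, using Python's built-in sqlite3 module. It assumes Home Assistant is stopped while it runs and that the path passed in points at a copy or backup of the database; the function name check_and_reindex is made up for illustration. REINDEX only rebuilds indexes, so it will not repair other kinds of corruption.

```python
import sqlite3


def check_and_reindex(db_path: str) -> list[str]:
    """Run PRAGMA integrity_check and REINDEX if anything is reported.

    Returns the rows reported by integrity_check (["ok"] when healthy).
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
        if rows != ["ok"]:
            # REINDEX without arguments rebuilds every index in the database.
            conn.execute("REINDEX")
            conn.commit()
        return rows
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical usage: run from the configuration directory with HA stopped.
    for line in check_and_reindex("home-assistant_v2.db"):
        print(line)
```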
It is pretty much certainly an issue with my installation of HA Core and not with the database or storage medium. I installed HA Container on the same machine and now the issue is gone. Any suggestions why the issue only occurs with HA Core? |
Do you have a different version of SQLite on the core install? |
No, because SQLite (the CLI tool) is not part of the Core dependencies. The Recorder page mentions Home Assistant uses SQLAlchemy for database access, but that module is not installed manually when installing Core. I expect HA installs this itself and also keeps it up to date. I don't know how I would compare the version numbers for this module. |
You can find the version under Settings > System > Repairs > three-dot menu > System information. |
Thanks @bdraco. According to the system information my Core install does indeed use a newer SQLite version than the Container install (3.45.3 > 3.44.2). It also uses a slightly newer Python version (3.12.3 > 3.12.2) for a slightly lower HA Core version (core-2024.5.2 < core-2024.5.3). I can try and see what happens if I equalize the Core install to the Container install. Do you know how I would downgrade the SQLite version? If you're interested I attach the system information outputs for the Core and Container installs: |
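As a rough cross-check of the system information page, the same numbers can be read from Python directly. This sketch assumes the recorder reaches SQLite through Python's built-in sqlite3 module, so it has to be run with the same interpreter (for a Core install, the same virtual environment) that runs Home Assistant. The helper names are made up for illustration.

```python
import sqlite3
import sys


def runtime_versions() -> dict[str, str]:
    """Versions of the interpreter and of the SQLite library linked into it."""
    return {
        "python": sys.version.split()[0],
        # Version of the SQLite library itself, not of the sqlite3 CLI tool.
        "sqlite": sqlite3.sqlite_version,
    }


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '3.45.3' into (3, 45, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    print(runtime_versions())
    # The two SQLite versions reported in this thread:
    print(version_tuple("3.45.3") > version_tuple("3.44.2"))
```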
This is also happening for me now, a Core install. I recently upgraded Ubuntu Server to 24.04. Also on Python 3.12.3. The SQLite3 library version installed is 3.45.1-1ubuntu2. It gets locked up every morning at the programmed time (4:12?) and if I manually call the
I installed the command-line |
I'm assuming this line:
I also got logs that look the same once debugging / logging level debug are enabled:
|
Partial thread dump via
Not sure what the two |
I stopped HA and opened the database with the command-line So, I think I had some entry in the
I just deleted the relevant events:
And verified that no unreferenced
Calling |
Unfortunately my instance went back to getting stuck at purge on the second day after that. At this point my next step is probably to add debug logging at various points of the above stack trace the next time this reproduces (I cleaned the database manually again this time). |
We found the problem. Some old databases have Unfortunately SQLite does not support dropping a foreign key constraint (see To fix this, the whole states table needs to be recreated using the steps here: https://www.sqlite.org/lang_altertable.html
|
|
So we need to write a rebuild_table function for SQLite to fix this.
Going to go with the 12-step rebuild process. It's slower but less risky. |
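For readers unfamiliar with the rebuild procedure from the SQLite ALTER TABLE documentation, here is a toy sketch of it in Python. It is not the recorder's rebuild_table implementation: the parent and child tables, their columns, and the function names are hypothetical stand-ins, and the view-related step is skipped because the toy schema has no views. The connection is assumed to be opened with isolation_level=None so that BEGIN and COMMIT can be issued explicitly.

```python
import sqlite3


def rebuild_without_fk(conn: sqlite3.Connection) -> None:
    """Rebuild the toy `child` table without its foreign key on `parent`."""
    conn.execute("PRAGMA foreign_keys=OFF")  # step 1 (a no-op inside a transaction)
    conn.execute("BEGIN")  # step 2
    try:
        # Step 3: remember the indexes that belong to the old table.
        index_sql = [
            row[0]
            for row in conn.execute(
                "SELECT sql FROM sqlite_master "
                "WHERE type = 'index' AND tbl_name = 'child' AND sql IS NOT NULL"
            )
        ]
        # Step 4: create the new table with the desired schema (no FOREIGN KEY).
        conn.execute(
            "CREATE TABLE new_child ("
            "id INTEGER PRIMARY KEY, parent_id INTEGER, value TEXT)"
        )
        # Step 5: copy the data across.
        conn.execute(
            "INSERT INTO new_child (id, parent_id, value) "
            "SELECT id, parent_id, value FROM child"
        )
        conn.execute("DROP TABLE child")  # step 6 (also drops its indexes)
        conn.execute("ALTER TABLE new_child RENAME TO child")  # step 7
        for sql in index_sql:  # step 8: recreate the remembered indexes
            conn.execute(sql)
        # Step 10: verify that no foreign key violations were introduced.
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        if violations:
            raise RuntimeError(f"foreign key violations: {violations}")
        conn.execute("COMMIT")  # step 11
    except Exception:
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.execute("PRAGMA foreign_keys=ON")  # step 12


def demo() -> list:
    """Build the toy schema, rebuild `child`, and return its remaining FKs."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES parent (id),
            value TEXT
        );
        CREATE INDEX ix_child_parent_id ON child (parent_id);
        INSERT INTO parent VALUES (1);
        INSERT INTO child VALUES (1, 1, 'a');
        """
    )
    rebuild_without_fk(conn)
    return conn.execute("PRAGMA foreign_key_list(child)").fetchall()
```

Because the whole table is copied, the database temporarily needs room for a second copy of it, which is why the rebuild is slower than an in-place change.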
Will be fixed in 2024.8.0 |
cool, |
I'm not sure this is the actual problem in my case.
I can't see any foreign key there... Edit: removed spurious extra copied text. |
Please check again, as it looks like that is not the right database file unless you are running a 2-3 year old version of HA. |
This instance was installed originally more than two years ago and has been migrated to the latest version each time via
Isn't this the right database file? What other name should I look for? |
It sure looks like it's been changed recently. It's very strange that it is missing all the foreign keys. I'm not sure what's going on with your system. events should look like this:
|
What is the easiest way to check/fix this? |
That’s not something that can be solved in this issue. Please jump on discord and ask for help in |
2024.7.2 resolved issue for me. |
same here. I saw that last night there was no gap in my history |
For what it is worth, 2024.7.2 seems to be working now and fixed the issue (unless it was something else I did). I was one of the people with the issue without any of the integrations known to cause it. I worked around it by rolling back to 2024.6.4, as my database would have grown too big otherwise. While I normally have the recorder set to keep 4 days of records, I did a manual recorder:purge to keep only 1 day and repacked the DB, which brought its size down from 8 GB to about 3 GB (in fact the recorder was probably only storing about 3 days of data at this time). This took a bit less than half an hour. Of course this was all on 2024.6.4. Then I upgraded to 2024.7.2 and again manually did a recorder:purge keeping only 1 day, which seemed to complete without issues and without hanging the recorder. So I assume the issue is fixed. It finished very quickly, I assume because not much was purged since I had only just done it. In a few days, when it has run overnight and purged more, I will know for sure, but it looks good. Hope that is useful. |
For me this issue started with 2024.7.2. The recorder has now hung two nights in a row at 4:12. No useful information in the logs, except that when I restart HA I see the following:
Anything in particular I can take a look at? Looks like the database state table rebuild failed (which might have been due to not having enough disk space for it), and it was never restarted. I have now executed the migration manually and then ran the recorder purge service manually, which completed correctly this time 🎉 Now running a repack, as the database file grew by a couple of GB because of the rebuild 😄 The repack removed those additional GB and some extra 👍🏻 |
Great work. Update to homeassistant 2024.7.2 fixed the purge problem. |
Confirmed: updating from 2024.7.1 to 2024.7.2 fixed the issue for me. Recording no longer breaks at about 4:00. |
If you still have a problem, please verify you have enough disk space (you need ~2x the size of the database free -- maybe a bit less but 2x is safest), and restart again. If that doesn't solve the problem, please download and post your logs https://my.home-assistant.io/redirect/logs |
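The 2x guideline above can be checked with a few lines of Python. This is only a sketch: the helper names and the default factor of 2.0 are illustrative and simply mirror the advice in this comment.

```python
import os
import shutil


def enough_free_space(db_size: int, free_bytes: int, factor: float = 2.0) -> bool:
    """True if the free space is at least `factor` times the database size."""
    return free_bytes >= factor * db_size


def has_room_for_rebuild(db_path: str, factor: float = 2.0) -> bool:
    """Compare the database file size with the free space on its filesystem."""
    db_size = os.path.getsize(db_path)
    directory = os.path.dirname(os.path.abspath(db_path))
    return enough_free_space(db_size, shutil.disk_usage(directory).free, factor)


if __name__ == "__main__":
    # Hypothetical usage: run from the configuration directory.
    print(has_room_for_rebuild("home-assistant_v2.db"))
```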
So, my DB file is 16.0 GB and, even after removing all the local backup files, I only have 24 GB free. This doesn't seem to have been a problem with previous monthly purges though, so why is it suddenly a problem now? I manually ran:
And it seems to be the cause of the graph flatlining. It seems to be doing a lot of disk reads:
The average appears to be over 50 MB/s, which should take less than 6 minutes to go through 16 GB, but it's been doing this for over 5.5 hrs now. Any idea what's going on?
I'm not seeing anything in the logs related to the recorder, but here it is: |
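The estimate in the comment above is easy to reproduce. The sketch below assumes decimal units (1 GB = 1000 MB) and a single sequential pass over the file; the function name is made up for illustration.

```python
def sequential_read_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to read `size_gb` once at a steady `rate_mb_per_s`."""
    return size_gb * 1000 / rate_mb_per_s / 60


if __name__ == "__main__":
    # 16 GB at 50 MB/s is 320 seconds, a little over five minutes.
    print(round(sequential_read_minutes(16, 50), 1))
```

Hours of sustained reads at that rate therefore amount to far more data than the file contains, so the same pages are presumably being read many times over. The thread does not establish why that happens for this particular report.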
24 GB is likely enough. It should not be a problem, but it is always better to have 2x the database size in case a table rebuild fails. I don't see any table rebuilds in your logs, so either the table rebuild already happened, or there is something wrong with your database that prevents it from getting to that point. You can check to see if the table rebuild already happened by manually running the following:
% sqlite3 home-assistant_v2.db
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> .schema states
CREATE TABLE IF NOT EXISTS "states" (
"state_id" INTEGER NOT NULL ,
"entity_id" CHARACTER(0) NULL ,
"state" VARCHAR(255) NULL ,
"attributes" CHARACTER(0) NULL ,
"event_id" SMALLINT NULL ,
"last_changed" CHARACTER(0) NULL ,
"last_changed_ts" DOUBLE NULL ,
"last_updated" CHARACTER(0) NULL ,
"last_updated_ts" DOUBLE NULL ,
"old_state_id" INTEGER NULL ,
"attributes_id" INTEGER NULL ,
"context_id" CHARACTER(0) NULL ,
"context_user_id" CHARACTER(0) NULL ,
"context_parent_id" CHARACTER(0) NULL ,
"origin_idx" SMALLINT NULL ,
"context_id_bin" BLOB NULL ,
"context_user_id_bin" BLOB NULL ,
"context_parent_id_bin" BLOB NULL ,
"metadata_id" INTEGER NULL , last_reported_ts FLOAT,
PRIMARY KEY ("state_id"),
FOREIGN KEY("old_state_id") REFERENCES "states" ("state_id") ON UPDATE RESTRICT ON DELETE RESTRICT,
FOREIGN KEY("attributes_id") REFERENCES "state_attributes" ("attributes_id") ON UPDATE RESTRICT ON DELETE RESTRICT,
FOREIGN KEY("metadata_id") REFERENCES "states_meta" ("metadata_id") ON UPDATE RESTRICT ON DELETE RESTRICT
);
CREATE INDEX "ix_states_attributes_id" ON "states" ("attributes_id");
CREATE INDEX "ix_states_context_id_bin" ON "states" ("context_id_bin");
CREATE INDEX "ix_states_last_updated_ts" ON "states" ("last_updated_ts");
CREATE INDEX "ix_states_metadata_id_last_updated_ts" ON "states" ("metadata_id", "last_updated_ts");
CREATE INDEX "ix_states_old_state_id" ON "states" ("old_state_id");
sqlite>
If you only see the 3 FOREIGN KEYs, the problem one has been removed and the rebuild is complete. If it's still there, something else is wrong, and you'll need to enable |
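The same check can be scripted instead of reading the .schema output by eye. This is a sketch using Python's built-in sqlite3 module and SQLite's PRAGMA foreign_key_list; the function name foreign_keys is made up for illustration, and it should be pointed at a copy of the database or run while Home Assistant is stopped.

```python
import sqlite3


def foreign_keys(conn: sqlite3.Connection, table: str = "states") -> list[tuple]:
    """Return (column, referenced_table, referenced_column) for each FK."""
    # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA foreign_key_list({quoted})").fetchall()
    # Row layout: id, seq, table, from, to, on_update, on_delete, match
    return sorted((row[3], row[2], row[4]) for row in rows)


if __name__ == "__main__":
    # Hypothetical usage: run from the configuration directory.
    connection = sqlite3.connect("home-assistant_v2.db")
    try:
        for fk in foreign_keys(connection):
            print(fk)
    finally:
        connection.close()
```

Against the schema shown above this would list three entries, for old_state_id, attributes_id and metadata_id.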
I only see 2 FOREIGN KEYs:
|
It looks like you don't have this issue. There should be a FK on |
Then why does the recorder purge cause my graphs to flatline? |
While the symptoms are the same, since the problematic foreign key is not there, it's a different problem which will require a different solution. Please start a fresh issue report with debug logs for
# Example configuration.yaml entry
logger:
  default: info
  logs:
    sqlalchemy: debug
|
2024.7.2 has certainly solved the "recorder locking-up" problem, but something is still not right in my system. The home-assistant_v2.db file grew from 3.2GB to 3.8GB after the upgrade, but has been static in size over the last 5 days. I have the following in configuration.yaml: |
Hey, I don't really know what to do; I'm having the same problem as above. The recorder stops every day at 4:30 and I have to reboot. I updated to 2024.7.2 and it's the same. When I look in SQLite, the temp table for events exists, so it seems the migration did not complete. When I save the DB and try to do the migration externally, it fails copying the data from the old table to the new one because of "unique constraints". I really do not want to lose all of my energy data. I'm super annoyed that it was all working fine until a recent update and now I'm spending hours trying to fix it. Any ideas of where I might go for help? I tried the custom component but it didn't do anything (I assume it thinks the migration was completed). |
I also have the problem, just like jasonwfrancis1 |
As an added data point, I have been having this issue on MariaDB. Upgrading didn't fix it. |
I managed to fix it myself: I basically migrated to MariaDB and used a good tutorial to migrate all of the data. There are a few gaps from while I was doing the whole process, but I think I'm back up and running properly. I just kicked off a purge, and it's running... fingers crossed. |
Hi @bdraco. After some time, my database keeps growing and I don't know how to reduce it. I don't know if your corrections work correctly. |
This problem is solved in 2024.7.2
Workaround: Disabling nightly auto purge will prevent the issue from occurring (this is not a long term solution)
Be sure to re-enable auto-purge after installing 2024.7.2 or your database will grow without bounds, and your system will eventually run out of disk space or become sluggish.
Cause: #117263 (comment)
Solution: #120779
The problem
Every night at around 4:10 the histories for all entities stop. This has been happening since at least April 9th. I updated Home Assistant to 2024.4.1 on April 5th, but I can't say for sure if this issue started directly afterwards. A restart of Home Assistant allows recording again but does not restore the history missed since ~4:10. I suspect it has something to do with the Recorder auto purge at 4:12 because the same symptoms happen when the purge is run manually.
I don't think the manual or automatic purge is currently able to finish because the (SQLite) database seems way too large (>6GB) for my configured purge_keep_days of 7.
If I run recorder.purge from the web UI, the same symptoms occur as during the night. By looking at the mtime it is clear home-assistant_v2.db does not get written to anymore. htop shows HA using 100% of one CPU core continuously and iotop shows HA reading from disk at ~400MB/s continuously. This went on for at least 25 minutes before I stopped the process.
The logs show nothing unusual happening around 4:12. When I run recorder.purge from the web UI with verbose logging enabled the logs just show:
When HA is stopped using SIGTERM the shutdown takes a long time and it is clear from the logs it is waiting for a Recorder task:
See the rest of the relevant messages during shutdown below.
What version of Home Assistant Core has the issue?
core-2024.5.2
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Core
Integration causing the issue
Recorder
Link to integration documentation on our website
https://www.home-assistant.io/integrations/recorder/#service-purge
Diagnostics information
No response
Example YAML snippet
Anything in the logs that might be useful for us?
Additional information
I thought maybe my database could be corrupted, so with HA shut down I ran
mv home-assistant_v2.db home-assistant_v2_old.db; sqlite3 home-assistant_v2_old.db ".recover" | sqlite3 home-assistant_v2.db
and then tried to run a purge again. Unfortunately the problem was not resolved. My database did shrink by about 1.5 GB.