Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ApiServerCommunicationError #220

Closed
mariusberget92 opened this issue Mar 29, 2022 · 15 comments · Fixed by #352
Closed

[BUG] ApiServerCommunicationError #220

mariusberget92 opened this issue Mar 29, 2022 · 15 comments · Fixed by #352
Labels
bug Something isn't working waiting for response

Comments

@mariusberget92
Copy link

Describe the bug
Followed the manual installation documentation and did everything as per the docs. I do see the web app from the browser, so that works fine and is setup to run at start as a service.

When trying to run the collector though; I get this error:

INFO[0000] Verifying required tools                      type=metrics
INFO[0000] Executing command: smartctl --scan -j         type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sdh  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sda  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sdb  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sdc  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sdd  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sde  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sdf  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Executing command: smartctl --info -j /dev/sdg  type=metrics
INFO[0000] Generating WWN                                type=metrics
INFO[0000] Sending detected devices to API, for filtering & validation  type=metrics
ERRO[0000] An error occurred while retrieving filtered devices  type=metrics
2022/03/29 19:34:48 ERROR: ApiServerCommunicationError: "An error occurred while retrieving filtered devices"

I double checked that the port was open (6363 as I configured it to use). I can do an empty POST request using curl and that gives me back the website source code as a response. Don't know if I did something wrong or something. Thanks in advance.

Log Files
Debug file attached.
collector.log

@mariusberget92 mariusberget92 added the bug Something isn't working label Mar 29, 2022
@SolomonSklash
Copy link

I'm having this exact issue a well.

@mariusberget92
Copy link
Author

I'm having this exact issue a well.

Yea, I even tried the default port and it didn't work either.

@AnalogJ
Copy link
Owner

AnalogJ commented Apr 28, 2022

Looks like I may need to add some more debug level logging to the collector when it pushes data to the api server. The logs don't really specify if the error was due to communication issue or a failed response from the api server.

Do you have any logs from the api server? can you determine if the api server received a post request from the collector around the same time?

@mariusberget92
Copy link
Author

Looks like I may need to add some more debug level logging to the collector when it pushes data to the api server. The logs don't really specify if the error was due to communication issue or a failed response from the api server.

Do you have any logs from the api server? can you determine if the api server received a post request from the collector around the same time?

I have no idea. I could test that if I got some guidance. 😂 But I know the website runs on the specified port since I can access it. It's just empty without any devices since there is an API communication error.

AnalogJ added a commit that referenced this issue May 2, 2022
@sillmnvg
Copy link

Got the same issue when I tried to spin it up on my server (also used my own port instead of the default, switched back to 8080 in an attempt to troubleshoot)

@AnalogJ
Copy link
Owner

AnalogJ commented May 31, 2022

can you paste your config file/environmental variables/docker-compose file? @sillmnvg @mariusberget92

@mariusberget92
Copy link
Author

scrutiny:
    image: ghcr.io/analogj/scrutiny:master-omnibus
    container_name: scrutiny
    cap_add:
      - SYS_RAWIO
    volumes:
      - /run/udev:/run/udev:ro
      - /home/anoneemo/docker/scrutiny/config:/opt/scrutiny/config
      - /home/anoneemo/docker/scrutiny/influxdb:/opt/scrutiny/influxdb
    ports:
      - '8080:8080'
      - '8086:8086'
    devices:
      - '/dev/sda'
      - '/dev/sdb'
      - '/dev/sdc'
      - '/dev/sdd'
      - '/dev/sde'
      - '/dev/sdf'
      - '/dev/sdg'
      - '/dev/sdh'
    restart: always

@AnalogJ
Copy link
Owner

AnalogJ commented May 31, 2022

do you have a config file in /home/anoneemo/docker/scrutiny/config ?

@mariusberget92
Copy link
Author

do you have a config file in /home/anoneemo/docker/scrutiny/config ?

No config. only a database file.

@AnalogJ
Copy link
Owner

AnalogJ commented May 31, 2022

Can you enable DEBUG mode and log files using one of the two methods below

docker run command:


-e DEBUG=true \
-e COLLECTOR_LOG_FILE= /opt/scrutiny/config/collector.log \
-e SCRUTINY_LOG_FILE=/opt/scrutiny/config/web.log \

docker-compose:

environment:
    DEBUG: true
    COLLECTOR_LOG_FILE: /opt/scrutiny/config/collector.log
    SCRUTINY_LOG_FILE: /opt/scrutiny/config/web.log

the log files will be written to the mount path of your config directory on your host. Just attach them to this issue.

@mariusberget92
Copy link
Author

Oh snap, sorry, been on autopilot when replying. I switched to docker-compose, and that works (the master-omnibus tag atleast).

Can you enable DEBUG mode and log files using one of the two methods below

@sillmnvg will need to help out there since it works on my machine. 😆

@AnalogJ
Copy link
Owner

AnalogJ commented May 31, 2022

hm. ok in that case I'll close this issue.

@sillmnvg if you're still running into this problem, please comment/reopen this issue :)

@AnalogJ AnalogJ closed this as completed May 31, 2022
@KF5JWC
Copy link
Contributor

KF5JWC commented Aug 2, 2022

I've just run into this!

Restarting scrutiny (the [scrutiny+influxdb] pod in my case) fixed this issue


For posterity I am leaving my logs.

TL;DR: database is locked (5) (SQLITE_BUSY)

Client/Collector:

time="2022-08-01T18:55:02-05:00" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics
time="2022-08-01T18:55:07-05:00" level=error msg="An error occurred while retrieving filtered devices" type=metrics
2022/08/01 18:55:07 ERROR: ApiServerCommunicationError: "An error occurred while retrieving filtered devices"

Server:

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[1.007ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.13","2022-08-01 18:55:07.13",NULL,"","nvme1n1","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.509ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.131","2022-08-01 18:55:07.131",NULL,"","nvme0","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.174ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.132","2022-08-01 18:55:07.132",NULL,"","nvme1","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.225ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.132","2022-08-01 18:55:07.132",NULL,"","sata1","","","","","","","6.0 Gb/s","","",7200,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.213ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.133","2022-08-01 18:55:07.133",NULL,"","sata2","","","","","","","6.0 Gb/s","","",7200,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.189ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.133","2022-08-01 18:55:07.133",NULL,"","sata3","","","","","","","6.0 Gb/s","","",5425,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.219ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.134","2022-08-01 18:55:07.134",NULL,"","sata4","","","","","","","6.0 Gb/s","","",5425,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.215ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.134","2022-08-01 18:55:07.134",NULL,"","nvme0n1","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`
time="2022-08-01T18:55:07-05:00" level=error msg="An error occurred while registering devices [database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY)]"
time="2022-08-01T18:55:07-05:00" level=error msg="10.2.2.1 - scrutiny-5894d5497b-xpj94 [01/Aug/2022:18:55:07 -0500] \"POST /api/devices/register\" 500 17 \"\" \"Go-http-client/2.0\" (6ms)" clientIP=10.2.2.1 hostname=scrutiny-5894d5497b-xpj94 latency=6 method=POST path=/api/devices/register referer= respLength=17 statusCode=500 userAgent=Go-http-client/2.0

@KF5JWC
Copy link
Contributor

KF5JWC commented Aug 2, 2022

I can sorta reproduce this. It waffled between 2 database errors (lock, nested transaction).

I have 3 clients/collectors. They all share a NTP server, and they all run at the top of the minute (for now).

It looks like clients are not "safely" served concurrently:

SQL logic error: cannot start a transaction within a transaction (1)

Including this because it may be relevant for the environment I am running in:

2022/08/01 18:19:27 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:33 SLOW SQL >= 200ms
[1300.125ms] [rows:11] SELECT * FROM `devices`

@KF5JWC
Copy link
Contributor

KF5JWC commented Aug 2, 2022

Spacing each of them out by 2 minutes works fine.

time="2022-08-01T22:00:01-05:00" level=info msg="10.2.2.1 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:00:01 -0500] \"POST /api/devices/register\" 200 1655 \"\" \"Go-http-client/2.0\" (16ms)" clientIP=10.2.2.1 hostname=scrutiny-5894d5497b-mz2ml latency=16 method=POST ...
time="2022-08-01T22:00:02-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:00:02 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (148ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=148 method=POST ...
time="2022-08-01T22:00:02-05:00" level=info msg="10.2.2.1 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:00:02 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (98ms)" clientIP=10.2.2.1 hostname=scrutiny-5894d5497b-mz2ml latency=98 method=POST ...
time="2022-08-01T22:00:02-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:00:02 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (87ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=87 method=POST ...
time="2022-08-01T22:02:01-05:00" level=info msg="10.2.2.1 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:02:01 -0500] \"POST /api/devices/register\" 200 1141 \"\" \"Go-http-client/2.0\" (3ms)" clientIP=10.2.2.1 hostname=scrutiny-5894d5497b-mz2ml latency=3 method=POST ...
time="2022-08-01T22:02:04-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:02:04 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (1927ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=1927 method=POST ...
time="2022-08-01T22:02:06-05:00" level=info msg="10.2.2.1 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:02:06 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (1743ms)" clientIP=10.2.2.1 hostname=scrutiny-5894d5497b-mz2ml latency=1743 method=POST ...
time="2022-08-01T22:04:02-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:02 -0500] \"POST /api/devices/register\" 200 4384 \"\" \"Go-http-client/2.0\" (141ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=141 method=POST ...
time="2022-08-01T22:04:04-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:04 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (1634ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=1634 method=POST ...
time="2022-08-01T22:04:06-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:06 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (1904ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=1904 method=POST ...
time="2022-08-01T22:04:06-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:06 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (56ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=56 method=POST ...
time="2022-08-01T22:04:07-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:07 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (203ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=203 method=POST ...
time="2022-08-01T22:04:07-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:07 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (108ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=108 method=POST ...
time="2022-08-01T22:04:07-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:07 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (69ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=69 method=POST ...
time="2022-08-01T22:04:09-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:09 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (1558ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=1558 method=POST ...
time="2022-08-01T22:04:11-05:00" level=info msg="1.2.3.4 - scrutiny-5894d5497b-mz2ml [01/Aug/2022:22:04:11 -0500] \"POST /apidevice/abcdef/smart\" 200 16 \"\" \"Go-http-client/2.0\" (1596ms)" clientIP=1.2.3.4 hostname=scrutiny-5894d5497b-mz2ml latency=1596 method=POST ...

I think it might be notable that there are multiple requests which have a latency >1500ms, which may be leading to the SQLITE_BUSY error/timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working waiting for response
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants