Add health check for SMT engine disk storage #141

Closed
ddaspit opened this issue Nov 22, 2023 · 7 comments · Fixed by sillsdev/serval#269 or #156

@ddaspit
Contributor

ddaspit commented Nov 22, 2023

There is no health check for low disk storage. If the disk storage gets too low, then SMT engines will be negatively impacted. We can use DiskStorageHealthCheck from the AspNetCore.HealthChecks.System library.

@johnml1135
Collaborator

This is already handled through Prometheus alerts. It has been working as of this week (the email was improperly configured before that). Errors arrive as emails and look like this:
[image: example alert email]

This needs no further action at this time.

@ddaspit
Contributor Author

ddaspit commented Nov 27, 2023

I would like to keep this open. It doesn't seem right to me that we return a "healthy" status from the health check endpoint when SMT suggestions do not work because of low or no disk space.

@ddaspit ddaspit reopened this Nov 27, 2023
@johnml1135
Collaborator

One option is to report "degraded" whenever we are getting gRPC errors (which we were) and hold that status for 30 minutes or so.
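
The "degraded for a while after errors" idea can be sketched as a small latch. This is a language-neutral illustration in Python, not Serval or Machine code: the class name, the `record_error` hook, and the injectable clock are all hypothetical, and the 30-minute hold is taken from the suggestion above.

```python
import time
from typing import Callable, Optional


class ErrorLatchHealthCheck:
    """Reports "Degraded" for a hold period after the most recent recorded error."""

    def __init__(
        self,
        hold_seconds: float = 30 * 60,  # "30 minutes or so"
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self._hold_seconds = hold_seconds
        self._clock = clock
        self._last_error_at: Optional[float] = None

    def record_error(self) -> None:
        # Hypothetical hook: call this wherever a gRPC call to the engine fails.
        self._last_error_at = self._clock()

    def status(self) -> str:
        if self._last_error_at is None:
            return "Healthy"
        elapsed = self._clock() - self._last_error_at
        return "Degraded" if elapsed < self._hold_seconds else "Healthy"
```

Each new error restarts the hold period, so the status only returns to "Healthy" after a full quiet window.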

@ddaspit
Contributor Author

ddaspit commented Nov 27, 2023

It would be easier to just check the disk space, especially since there is already a health check library that can do it.
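
For comparison, a direct free-space check is only a few lines. The sketch below is a Python illustration of the logic, not the `DiskStorageHealthCheck` API itself; the function name and the threshold parameter are assumptions.

```python
import shutil


def disk_status(path: str, min_free_bytes: int) -> str:
    """Return "Healthy" if the filesystem holding `path` has enough free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return "Healthy" if usage.free >= min_free_bytes else "Unhealthy"
```

The check needs no error history: it reports the problem as soon as free space drops below the threshold, before any request fails.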

@johnml1135
Collaborator

Ok - it will be a redundant check, but I can see the value.

@Enkidu93
Collaborator

In order to use this health check, we have to reference a specific drive by string (e.g. "C://"). How will this work in containers? Is this workable, @johnml1135?

@johnml1135
Collaborator

I am assuming that the main paths /var/lib/serval and /var/lib/machine would need to be monitored. They are defined in appsettings.json:

  "DataFile": {
    "FilesDirectory": "/var/lib/serval/files"
  }

and

  "SmtTransferEngine": {
    "EnginesDir": "/var/lib/machine/engines"
  }
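
One way to monitor by path instead of by drive letter is to read the two directories from the settings and query the filesystem that holds each one. The following is a rough Python sketch of that idea under stated assumptions: the helper names, the result keys, and the threshold are hypothetical, the JSON text simply mirrors the two settings quoted above, and whether `DiskStorageHealthCheck` accepts arbitrary paths is not confirmed here.

```python
import json
import os
import shutil
from typing import Callable, Dict

# Hypothetical config text mirroring the two settings quoted above.
APPSETTINGS = """
{
  "DataFile": {"FilesDirectory": "/var/lib/serval/files"},
  "SmtTransferEngine": {"EnginesDir": "/var/lib/machine/engines"}
}
"""


def monitored_dirs(settings: dict) -> Dict[str, str]:
    """Pick out the directories whose disk space matters."""
    return {
        "files": settings["DataFile"]["FilesDirectory"],
        "engines": settings["SmtTransferEngine"]["EnginesDir"],
    }


def nearest_existing(path: str) -> str:
    """Walk up to the closest existing ancestor (the directory may not exist yet)."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def free_bytes(path: str) -> int:
    """Free space on the filesystem (for example a mounted volume) holding `path`."""
    return shutil.disk_usage(nearest_existing(path)).free


def check_all(
    settings: dict,
    min_free_bytes: int,
    probe: Callable[[str], int] = free_bytes,  # injectable for testing
) -> Dict[str, str]:
    """Report a status per monitored directory."""
    return {
        name: "Healthy" if probe(path) >= min_free_bytes else "Unhealthy"
        for name, path in monitored_dirs(settings).items()
    }


if __name__ == "__main__":
    print(check_all(json.loads(APPSETTINGS), min_free_bytes=1024**3))
```

Because the query resolves to whatever filesystem holds the path, the same code works whether the directory sits on the container's root filesystem or on a separately mounted volume.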

@johnml1135 johnml1135 modified the milestones: Serval API 1.1, Serval API 1.2 Jan 3, 2024
@johnml1135 johnml1135 assigned johnml1135 and unassigned Enkidu93 Jan 5, 2024
This was referenced Jan 11, 2024
Projects
Status: ✅ Done