Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add facultative forking mode or systemd-notify at mimir start #8220

Closed
raspbeguy opened this issue May 30, 2024 · 8 comments · Fixed by #8541 or #8555
Closed

Add facultative forking mode or systemd-notify at mimir start #8220

raspbeguy opened this issue May 30, 2024 · 8 comments · Fixed by #8541 or #8555
Labels
area/build enhancement New feature or request

Comments

@raspbeguy
Copy link

raspbeguy commented May 30, 2024

Is your feature request related to a problem? Please describe.

When (re)starting Mimir with systemd, the systemctl command is always successful even in case of faulty config or external problem (no space left, not enough ram, etc) because mimir service is in simple mode, which means that systemd job is only to start the process without waiting for it to fork or sending a notification to systemd that initialization went well. This is problematic because when restarting mimir on a large amount of instances with Ansible for instance, the playbook run should show an error instead showing all good.

Describe the solution you'd like

Any of those 2 solutions should work:

  • Have a Mimir setting to make it fork after initialization (config parsed, sockets opened, s3 connection opened, etc...) that would make possible to use type=forking systemd service.
  • Implement systemd-notify protocol to announce to systemd that it's ready, in that case we could use a type=notify systemd service.

Describe alternatives you've considered

Use a ExecStartPost directive in the service containing a sleep then check for the process health, or do that in the Ansible playbook instead. This workaround isn't ideal because time-based sanity checks are not the more reliable in this context in my opinion.

@aknuds1 aknuds1 added the enhancement New feature or request label May 30, 2024
@dimitarvdimitrov
Copy link
Contributor

Mimir exposes it's readiness via HTTP on GET /ready. If Mimir is ready then, it will return status code 200. Is it possible to use that somehow in ExecStartPost?

@raspbeguy
Copy link
Author

That is something to think about.

However my first thought about it is that the port used by Mimir to expose its API depends of the configuration, so the script invoked by ExecStartPost should start by reading Mimir conf to get that port. Doable but kind of overcomplicated compared to add a go equivalent of sd_notify() call in the code.

@raspbeguy
Copy link
Author

This dirty workaround seems to work (it needs curl and yq):

Write a script /opt/check_mimir_start.sh:

#!/bin/sh

MIMIR_PID=$1

until curl http://127.0.0.1:$(yq -r '.server.http_listen_port // 9009' /etc/mimir/config.yml)/ready
do
  if ! kill -0 $MIMIR_PID
  then
    exit 1
  fi
done

Then override mimir.service:

[Service]
ExecStartPost=/opt/check_mimir_start.sh $MAINPID

This is a really dirty hack, as it reminds me more about old sysvinit logic than what a clean systemd service should look about.

@dimitarvdimitrov
Copy link
Contributor

Have a Mimir setting to make it for after initialization (config parsed, sockets opened, s3 connection opened, etc...) that would make possible to use type=forking systemd service.

I didn't quite get this part. Can you elaborate?

Implement systemd-notify protocol to announce to systemd that it's ready, in that case we could use a type=notify systemd service.

how difficult is it to add support for systemd notifications? Is this something you have experience with?

also cc @wilfriedroset @bubu11e since you folks been working on systemd

@wilfriedroset
Copy link
Collaborator

This is a good point to discuss and a nice improvement.
My understanding is that systemd drop-in gives flexibility to define override the behavior of the service. So if you want a tailor-made solution this is the best thing to do.

As you pointed creating a one size fits all solution might be more difficult.

However my first thought about it is that the port used by Mimir to expose its API depends of the configuration

This is indeed true and this might make it difficult to find a suitable solution for everyone. curl might be around in most deployment but not yq. So it adds dependencies.

Implement systemd-notify protocol to announce to systemd that it's ready, in that case we could use a type=notify systemd service.

I'm not that familiar with systemd-notify so I can't tell how difficult it is to implement.
@raspbeguy do you have a rought idea regarding the difficulty or what should be done either in the definition of the service or mimir itself?

on our end we are quite happy with the simple service as it allow us to tailor it to our need and orchestrate the rollout of mimir with ansible. Basically, systemd does only the stop/start of the process, ansible does the rest.
This might be extra work but we manage to have a laser cut management of mimir.

@raspbeguy
Copy link
Author

@dimitarvdimitrov I meant "make it fork", sorry for the typo. I edited the original post.

@wilfriedroset As I understand it, systemd-notify is really minimalist, it's all about writing once in a unix socket. Here is the systemd sd_notify() function documentation : https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html

On the service file, I guess all it would take is replacing Type=simple with Type=notify.

@dimitarvdimitrov
Copy link
Contributor

doing the notification seems trivial. There is also this simple library that can do it https://github.com/okzk/sdnotify

I think it makes sense to support sd_notify in Mimir. This is the place which handles the HTTP /ready endpoint. Since sd_notify is event-based, we can poll the state of services every 5 seconds and report when they are ready. From the looks of it, we can always attempt this notification without having to add a configuration option to explicitly enable it.

I'd defer changing from simple to notify to @wilfriedroset

@wilfriedroset
Copy link
Collaborator

I will see with my team how we can contribute on this.
cc @pbailhache

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build enhancement New feature or request
Projects
None yet
4 participants