make lf-app container shutdown properly #1421

megahirt · 2022-06-06T19:14:26Z

Change the docker-compose file to use a separate shell script that properly handles SIGTERM. This results in a clean and fast shutdown of the lf-app container.

I also looked into making the lf-next-app container shutdown properly and cleanly, however it looks like the way that SvelteKit is being run (with node-adapter?) doesn't contain code to specifically handle SIGTERM which is required for a docker clean shutdown.

References:

Other examples of running SvelteKit inside Docker have it running nginx infront of SveltKit (nginx provides proper SIGTERM handling). I'm not exactly sure the best way to make our next-app implementation properly handle SIGTERM. I really like the suggestions in the above referenced article. I wondered if wrapping the SvelteKit node start command in a shell script that handles SIGTERM would be an easy solution. Thoughts @longrunningprocess ?

Fixes # (issue)

Type of Change

chore

Screenshots

Testing on your branch

Please describe how to test and/or verify your changes. Provide instructions so we can reproduce. Please also provide relevant test data as necessary. These instructions will be used for QA testing below.

Test A
Test B

Checklist

I have performed a self-review of my own code
I have reviewed the title/description of this PR which will be used as the squashed PR commit message
I have commented my code, particularly in hard-to-understand areas
I have added tests that prove my fix is effective or that my feature works

qa.languageforge.org testing

Reviewers: add/replace your name below and check the box to sign-off/attest the feature works as expected on qa.languageforge.org

Reviewer1 (YYYY-MM-DD HH:MM)
Reviewer2 (YYYY-MM-DD HH:MM)

Change the docker-compose file to use a separate shell script that properly handles SIGTERM. This results in a clean and fast shutdown of the lf-app container.

github-actions · 2022-06-06T19:16:35Z

Unit Test Results

373 tests 373 ✔️ 12s ⏱️
    1 suites     0 💤
    1 files     0 ❌

Results for commit 559339e.

♻️ This comment has been updated with latest results.

rmunn · 2022-06-10T06:13:57Z

Tried using this approach in the LfMerge container, and ran into the problem that lfmerge spends almost all of its time in inotifywait, which doesn't respond to SIGTERM. So trapping the signal in the Bash script won't work for lfmerge. The solution for lfmerge is probably going to be to use dumb-init or tini, and that's probably the solution for the next-app container as well.

LfMerge version 2.0.123 has an update to use tini as its PID 1 process, which will allow it to respond to SIGTERM and shut down fast as well. Might as well include it in this PR.

rmunn · 2022-06-10T08:52:55Z

I've created LfMerge version 2.0.123 with tini as the PID 1 process, and verified that it does indeed shut down quickly when doing docker container stop lfmerge. Running docker-compose down appears to take a little longer, apparently because docker-compose waits to shut down some containers until other containers that they depend on have shut down. But the lfmerge container is, at least, no longer one of the bottlenecks slowing down docker-compose down.

So I've included a commit in this PR that updates the lfmerge container to 2.0.123.

Needed so init process will run.

rmunn

Unfortunately, this doesn't work. Shell scripts are pretty simple-minded and do not pass on signals to their child processes; instead, when the shell script receives SIGTERM it waits for the current command to finish before processing the signal and exiting. But since the current command is apache2-foreground, that doesn't ever exit, so the shell script never gets to handle SIGTERM. And therefore apache2-foreground never receives SIGTERM either.

See https://github.com/sillsdev/LfMerge/pull/275/files for how we'll need to handle this: install tini in the Dockerfile, and set it as the entrypoint. Also, the -g option to tini is probably required, as that sends the received signal to all the process group, which means that not only will the run-with-wait.sh script receive SIGTERM, but so will any child processes it launched, like apache2-foreground.

rmunn · 2022-06-10T09:17:13Z

BTW, I verified that it doesn't work by running time docker container stop lf-app and seeing that it took 10 seconds, whereas the lfmerge container using tini took less than 1 second to run time docker container stop lfmerge.

Note the `-g` option, necessary so tini will forward signals to entire process *groups*. Without the `-g`, apache2-foreground never receives the SIGTERM signal and doesn't know it should exit.

Now we have tini as PID 1, which works.

rmunn · 2022-06-10T09:36:35Z

With my latest commit, this PR works. But now that it's mostly code that I've written, I think someone else needs to review it.

megahirt · 2022-06-11T14:05:23Z

@rmunn hmm, this is frustrating since "it works for me" TM.

Here's a screenshot of my original commit working. I deleted the lf-app and rebuilt it on this commit (not shown).

It takes 0.0 seconds to shutdown. The two that take 10 seconds are lfmerge and lf-next-app. I wonder why it's not working for you? Did you make clean and rebuild the image?

I guess it's disappointing that we are getting different results - but I don't want to get hung up on something small like this.

I would like to have a standard way make all our docker containers behave on shutdown. I think tini is a legit solution to this problem.

@rmunn wrote:

Unfortunately, this doesn't work. Shell scripts are pretty simple-minded and do not pass on signals to their child processes; instead, when the shell script receives SIGTERM it waits for the current command to finish before processing the signal and exiting. But since the current command is apache2-foreground, that doesn't ever exit, so the shell script never gets to handle SIGTERM. And therefore apache2-foreground never receives SIGTERM either.

Not sure I agree with this conclusion. My shell script explicitly traps SIGTERM and exits immediately. It doesn't wait for apache-foreground. If I didn't have the trap statement, then yes, it would wait indefinitely which is why docker kills it after 10 seconds.

docker/app/Dockerfile

rmunn · 2022-06-12T01:13:09Z

@rmunn wrote:

Unfortunately, this doesn't work. Shell scripts are pretty simple-minded and do not pass on signals to their child processes; instead, when the shell script receives SIGTERM it waits for the current command to finish before processing the signal and exiting. But since the current command is apache2-foreground, that doesn't ever exit, so the shell script never gets to handle SIGTERM. And therefore apache2-foreground never receives SIGTERM either.

Not sure I agree with this conclusion. My shell script explicitly traps SIGTERM and exits immediately. It doesn't wait for apache-foreground. If I didn't have the trap statement, then yes, it would wait indefinitely which is why docker kills it after 10 seconds.

https://stackoverflow.com/questions/26858344/trap-not-working-in-shell-script shows a minimal reproduction script that can be used to test it, and also quotes the Bash manual that says, "If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes."

I suspect the reason you saw it working is because you ran dc down very soon after running dc up, so that the Bash script was still in the /wait step when dc down sent it SIGTERM. If you try that again, but this time wait 30-40 seconds before running dc down, I believe you'll see it take 10 seconds to kill the lf-app process as I did. (To be certain that it's in the state that will take 10 seconds, you can run docker exec lf-app ps aux. If you see several apache2-foreground processes, then it's ready to run dc down and observe that it takes 10 seconds.)

Anyway, I'll submit a new commit or two that moves the installation of tini into the base-php image.

rmunn · 2022-06-13T09:44:32Z

Have now moved tini into the base-php image. I'll wait to actually run the build-and-publish-base-php.yml workflow until this PR is approved, in case we want to make more changes. Once this PR is approved, I'll run that workflow so we have a new base PHP image to work from, and then I'll merge this into develop. (Until the base-php workflow runs, merging this into develop will fail becuase it will be building from a base image that doesn't have tini in it.)

megahirt

So I approve of these changes however GH won't let me approve my own PR. Go ahead and merge Robin.

make lf-app container shutdown properly

5d7fe5e

Change the docker-compose file to use a separate shell script that properly handles SIGTERM. This results in a clean and fast shutdown of the lf-app container.

megahirt self-assigned this Jun 7, 2022

Update LfMerge container to shut down fast as well

229c99a

LfMerge version 2.0.123 has an update to use tini as its PID 1 process, which will allow it to respond to SIGTERM and shut down fast as well. Might as well include it in this PR.

Set executable bit on run-with-wait.sh script

1962b05

Needed so init process will run.

rmunn previously requested changes Jun 10, 2022

View reviewed changes

Use tini as PID 1 to handle SIGTERM

fd035e4

Note the `-g` option, necessary so tini will forward signals to entire process *groups*. Without the `-g`, apache2-foreground never receives the SIGTERM signal and doesn't know it should exit.

rmunn requested a review from longrunningprocess June 10, 2022 09:36

megahirt commented Jun 11, 2022

View reviewed changes

docker/app/Dockerfile Outdated Show resolved Hide resolved

Move tini into base-php image

559339e

megahirt commented Jun 13, 2022

View reviewed changes

rmunn approved these changes Jun 14, 2022

View reviewed changes

rmunn merged commit 222020d into develop Jun 15, 2022

rmunn deleted the chore/dockerProperShutdown branch June 15, 2022 03:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make lf-app container shutdown properly #1421

make lf-app container shutdown properly #1421

megahirt commented Jun 6, 2022

github-actions bot commented Jun 6, 2022 •

edited

Loading

rmunn commented Jun 10, 2022

rmunn commented Jun 10, 2022

rmunn left a comment

rmunn commented Jun 10, 2022

rmunn commented Jun 10, 2022

megahirt commented Jun 11, 2022

rmunn commented Jun 12, 2022

rmunn commented Jun 13, 2022

megahirt left a comment

make lf-app container shutdown properly #1421

make lf-app container shutdown properly #1421

Conversation

megahirt commented Jun 6, 2022

Type of Change

Screenshots

Testing on your branch

Checklist

qa.languageforge.org testing

github-actions bot commented Jun 6, 2022 • edited Loading

Unit Test Results

rmunn commented Jun 10, 2022

rmunn commented Jun 10, 2022

rmunn left a comment

Choose a reason for hiding this comment

rmunn commented Jun 10, 2022

rmunn commented Jun 10, 2022

megahirt commented Jun 11, 2022

rmunn commented Jun 12, 2022

rmunn commented Jun 13, 2022

megahirt left a comment

Choose a reason for hiding this comment

github-actions bot commented Jun 6, 2022 •

edited

Loading