Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make lf-app container shutdown properly #1421

Merged
merged 5 commits into from
Jun 15, 2022
Merged

Conversation

megahirt
Copy link
Collaborator

@megahirt megahirt commented Jun 6, 2022

Change the docker-compose file to use a separate shell script that properly handles SIGTERM. This results in a clean and fast shutdown of the lf-app container.

I also looked into making the lf-next-app container shutdown properly and cleanly, however it looks like the way that SvelteKit is being run (with node-adapter?) doesn't contain code to specifically handle SIGTERM which is required for a docker clean shutdown.

References:

Other examples of running SvelteKit inside Docker have it running nginx infront of SveltKit (nginx provides proper SIGTERM handling). I'm not exactly sure the best way to make our next-app implementation properly handle SIGTERM. I really like the suggestions in the above referenced article. I wondered if wrapping the SvelteKit node start command in a shell script that handles SIGTERM would be an easy solution. Thoughts @longrunningprocess ?

Fixes # (issue)

Type of Change

  • chore

Screenshots

Testing on your branch

Please describe how to test and/or verify your changes. Provide instructions so we can reproduce. Please also provide relevant test data as necessary. These instructions will be used for QA testing below.

  • Test A
  • Test B

Checklist

  • I have performed a self-review of my own code
  • I have reviewed the title/description of this PR which will be used as the squashed PR commit message
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works

qa.languageforge.org testing

Reviewers: add/replace your name below and check the box to sign-off/attest the feature works as expected on qa.languageforge.org

  • Reviewer1 (YYYY-MM-DD HH:MM)
  • Reviewer2 (YYYY-MM-DD HH:MM)

Change the docker-compose file to use a separate shell script that properly handles SIGTERM.  This results in a clean and fast shutdown of the lf-app container.
@github-actions
Copy link

github-actions bot commented Jun 6, 2022

Unit Test Results

373 tests   373 ✔️  12s ⏱️
    1 suites      0 💤
    1 files        0

Results for commit 559339e.

♻️ This comment has been updated with latest results.

@megahirt megahirt self-assigned this Jun 7, 2022
@rmunn
Copy link
Collaborator

rmunn commented Jun 10, 2022

Tried using this approach in the LfMerge container, and ran into the problem that lfmerge spends almost all of its time in inotifywait, which doesn't respond to SIGTERM. So trapping the signal in the Bash script won't work for lfmerge. The solution for lfmerge is probably going to be to use dumb-init or tini, and that's probably the solution for the next-app container as well.

LfMerge version 2.0.123 has an update to use tini as its PID 1 process,
which will allow it to respond to SIGTERM and shut down fast as well.
Might as well include it in this PR.
@rmunn
Copy link
Collaborator

rmunn commented Jun 10, 2022

I've created LfMerge version 2.0.123 with tini as the PID 1 process, and verified that it does indeed shut down quickly when doing docker container stop lfmerge. Running docker-compose down appears to take a little longer, apparently because docker-compose waits to shut down some containers until other containers that they depend on have shut down. But the lfmerge container is, at least, no longer one of the bottlenecks slowing down docker-compose down.

So I've included a commit in this PR that updates the lfmerge container to 2.0.123.

Needed so init process will run.
rmunn
rmunn previously requested changes Jun 10, 2022
Copy link
Collaborator

@rmunn rmunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this doesn't work. Shell scripts are pretty simple-minded and do not pass on signals to their child processes; instead, when the shell script receives SIGTERM it waits for the current command to finish before processing the signal and exiting. But since the current command is apache2-foreground, that doesn't ever exit, so the shell script never gets to handle SIGTERM. And therefore apache2-foreground never receives SIGTERM either.

See https://github.com/sillsdev/LfMerge/pull/275/files for how we'll need to handle this: install tini in the Dockerfile, and set it as the entrypoint. Also, the -g option to tini is probably required, as that sends the received signal to all the process group, which means that not only will the run-with-wait.sh script receive SIGTERM, but so will any child processes it launched, like apache2-foreground.

@rmunn
Copy link
Collaborator

rmunn commented Jun 10, 2022

BTW, I verified that it doesn't work by running time docker container stop lf-app and seeing that it took 10 seconds, whereas the lfmerge container using tini took less than 1 second to run time docker container stop lfmerge.

Note the `-g` option, necessary so tini will forward signals to entire
process *groups*. Without the `-g`, apache2-foreground never receives
the SIGTERM signal and doesn't know it should exit.
@rmunn rmunn dismissed their stale review June 10, 2022 09:35

Now we have tini as PID 1, which works.

@rmunn
Copy link
Collaborator

rmunn commented Jun 10, 2022

With my latest commit, this PR works. But now that it's mostly code that I've written, I think someone else needs to review it.

@megahirt
Copy link
Collaborator Author

@rmunn hmm, this is frustrating since "it works for me" TM.

Here's a screenshot of my original commit working. I deleted the lf-app and rebuilt it on this commit (not shown).

image

It takes 0.0 seconds to shutdown. The two that take 10 seconds are lfmerge and lf-next-app. I wonder why it's not working for you? Did you make clean and rebuild the image?

I guess it's disappointing that we are getting different results - but I don't want to get hung up on something small like this.

I would like to have a standard way make all our docker containers behave on shutdown. I think tini is a legit solution to this problem.

@rmunn wrote:

Unfortunately, this doesn't work. Shell scripts are pretty simple-minded and do not pass on signals to their child processes; instead, when the shell script receives SIGTERM it waits for the current command to finish before processing the signal and exiting. But since the current command is apache2-foreground, that doesn't ever exit, so the shell script never gets to handle SIGTERM. And therefore apache2-foreground never receives SIGTERM either.

Not sure I agree with this conclusion. My shell script explicitly traps SIGTERM and exits immediately. It doesn't wait for apache-foreground. If I didn't have the trap statement, then yes, it would wait indefinitely which is why docker kills it after 10 seconds.

docker/app/Dockerfile Outdated Show resolved Hide resolved
@rmunn
Copy link
Collaborator

rmunn commented Jun 12, 2022

@rmunn wrote:

Unfortunately, this doesn't work. Shell scripts are pretty simple-minded and do not pass on signals to their child processes; instead, when the shell script receives SIGTERM it waits for the current command to finish before processing the signal and exiting. But since the current command is apache2-foreground, that doesn't ever exit, so the shell script never gets to handle SIGTERM. And therefore apache2-foreground never receives SIGTERM either.

Not sure I agree with this conclusion. My shell script explicitly traps SIGTERM and exits immediately. It doesn't wait for apache-foreground. If I didn't have the trap statement, then yes, it would wait indefinitely which is why docker kills it after 10 seconds.

https://stackoverflow.com/questions/26858344/trap-not-working-in-shell-script shows a minimal reproduction script that can be used to test it, and also quotes the Bash manual that says, "If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes."

I suspect the reason you saw it working is because you ran dc down very soon after running dc up, so that the Bash script was still in the /wait step when dc down sent it SIGTERM. If you try that again, but this time wait 30-40 seconds before running dc down, I believe you'll see it take 10 seconds to kill the lf-app process as I did. (To be certain that it's in the state that will take 10 seconds, you can run docker exec lf-app ps aux. If you see several apache2-foreground processes, then it's ready to run dc down and observe that it takes 10 seconds.)

Anyway, I'll submit a new commit or two that moves the installation of tini into the base-php image.

@rmunn
Copy link
Collaborator

rmunn commented Jun 13, 2022

Have now moved tini into the base-php image. I'll wait to actually run the build-and-publish-base-php.yml workflow until this PR is approved, in case we want to make more changes. Once this PR is approved, I'll run that workflow so we have a new base PHP image to work from, and then I'll merge this into develop. (Until the base-php workflow runs, merging this into develop will fail becuase it will be building from a base image that doesn't have tini in it.)

Copy link
Collaborator Author

@megahirt megahirt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I approve of these changes however GH won't let me approve my own PR. Go ahead and merge Robin.

@rmunn rmunn merged commit 222020d into develop Jun 15, 2022
@rmunn rmunn deleted the chore/dockerProperShutdown branch June 15, 2022 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants