Hypnotoad / Prefork dump cores on small values of accepts and disabled keep-alive #1449
Comments
When is this a problem in the real world? |
When your app does something memory-intensive but not very often, so you lower In my case setting |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
The problem seems too obscure for us to fix in Mojolicious. |
We see 100s of similar core dumps a week in production with |
We see the same issue in our production. At this point, our conclusion is that when That Perl resets the signal handlers before
    $SIG{QUIT} = sub {};
    kill 'QUIT', $$;
    print "After first quit\n";
    END {
        kill 'QUIT', $$;
        print "After second quit\n";
    }
Running this program will only cause the first It seems that it is only handlers that are reset. If I'm not familiar with the innards of the perl-interpreter, but it seems plausible that it will have to reset the signal handlers before exiting, leaving a small window open for a race between exiting and receiving a |
localizing the signal handler in the |
I can't see that localizing the signal handler in the AFAICT the potential fix with the $SIG{QUIT} = 'IGNORE'; in between I would submit a pull request for this, but I haven't yet wrapped my head around how to write a test to ensure that it actually works. |
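To make the idea of ignoring SIGQUIT before exit concrete, here is a minimal, self-contained sketch. It is not the code discussed in this thread and it does not touch Mojolicious: the helper name `run_in_child` and the way the stray signal is simulated (the child signals itself) are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code in a forked child and return the raw
# wait status ($?) so the caller can tell whether a signal killed it.
sub run_in_child {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $code->();
        exit 0;    # normal Perl exit path, as in a worker that is done
    }
    waitpid($pid, 0);
    return $?;
}

# The child switches SIGQUIT to IGNORE and then receives a "late" SIGQUIT.
my $ignore_status = run_in_child(sub {
    $SIG{QUIT} = 'IGNORE';    # from here on the kernel discards SIGQUIT
    kill 'QUIT', $$;          # stand-in for a stray signal from the manager
});

printf "signal=%d exit=%d\n", $ignore_status & 127, $ignore_status >> 8;
```

Because the signal is ignored at the moment it arrives, the child should exit normally and the parent should see a wait status of 0. Whether this closes every window in a real prefork worker is exactly the open question raised above, so treat it as an experiment rather than a verified fix.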
@kraih Please consider reopening. We have Here is a maybe less obscure repro which dumps core for me about 50% of the time: Server (using strace in case core dumps are disabled):
Client:
The trick is doing enough work ( |
It turns out the Prefork signal handlers are My test server then becomes:
which never core dumps for me. I still think a change is warranted so that all users do not have to know about such a workaround. |
This bit us hard. Eating up huge amounts of disk space once we lowered |
There's a possible fix, unfortunately nobody has reviewed it yet. |
FWIW, this simple workaround has completely solved the problem for us (this is also what I discussed above):
The PR #1883 ensures that the stray signals never happen and the workaround is no longer needed. |
Steps to reproduce the behavior
This is a TL;DR version; scroll down and see the attached archive for code and more details.
I run a simple synchronous server
with a command
and a client in a second terminal to generate some load.
Expected behavior
I expect each worker process to exit normally after it served one request.
Actual behavior
Eventually a core dump of a worker process is generated.
gdb shows "Program terminated with signal SIGQUIT, Quit."
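For anyone who wants to spot these deaths without opening gdb, the wait status a parent gets from waitpid already says whether a child was killed by a signal and whether it dumped core. The sketch below is only an illustration: `describe_wait_status` is a hypothetical helper, not part of Mojolicious, and the signal number 3 for SIGQUIT assumes Linux.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw wait status (the value of $? after
# waitpid) into a short human-readable description.
sub describe_wait_status {
    my ($raw) = @_;
    my $signal = $raw & 127;            # low 7 bits: terminating signal
    my $core   = $raw & 128 ? 1 : 0;    # bit 7: a core dump was produced
    my $code   = $raw >> 8;             # high byte: exit code
    return "exited with code $code" unless $signal;
    return "killed by signal $signal" . ($core ? ' (core dumped)' : '');
}

print describe_wait_status(0), "\n";          # exited with code 0
print describe_wait_status(3 | 128), "\n";    # killed by signal 3 (core dumped)
```

A manager process could log this string for every reaped worker, which would make the frequency of the problem visible in ordinary logs.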
Setting up a clean environment
I use a Linux box, plenv to compile a recent Perl version with debug symbols, and cpanm to install Mojolicious locally.
Then make sure core dumps end up in the current working directory (you may want to back up the previous value, but the change won't persist across system boots). This step isn't strictly necessary, but client.pl and the gdb commands here assume it's done.
Start server
and client from another terminal.
It usually takes up to 15 minutes to reproduce on my Intel i5-2500K box. After the client terminates there should be 1 to 3 core.<pid> files in the current working directory.
If a core dump was generated by e.g. a process with pid 33156, its backtrace can be inspected with the following command:
To observe signals-related behaviour I run server via strace.
It generates a lot of traces/trace.<pid> files, which could take up some disk space.
Dumps and traces
According to the backtraces, processes usually terminate inside the Perl_pp_exit or perl_destruct functions. In the syscall traces of dumped processes, SIGQUIT arrives some microseconds after the SIGQUIT handler is restored to SIG_DFL (the default one).
I assume there's a race condition caused by the time gap between Perl resetting the signal handlers and the process terminating.
Potential fix
See server-fixed.pl in the attached archive. This seems to do the trick, but a race condition still remains (AFAIK a signal may arrive between rt_sigaction setting the SIGQUIT handler to SIG_DFL and rt_sigprocmask blocking the process from receiving SIGQUIT).
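The ordering concern above can be explored with a small sketch that blocks SIGQUIT first, while the handler is still whatever it was, and only then exits. This is not the contents of server-fixed.pl: the helper name `block_sigquit` is hypothetical, the stray signal is simulated by the child signalling itself, and whether this fully closes the window in a real worker is an untested assumption.

```perl
use strict;
use warnings;
use POSIX qw(SIG_BLOCK SIGQUIT sigprocmask);

# Hypothetical helper: block SIGQUIT for the current process and return
# the previous signal mask. A blocked signal stays pending instead of
# being delivered, regardless of the handler installed for it.
sub block_sigquit {
    my $new = POSIX::SigSet->new(SIGQUIT);
    my $old = POSIX::SigSet->new;
    sigprocmask(SIG_BLOCK, $new, $old) or die "sigprocmask failed: $!";
    return $old;
}

my $worker_pid = fork();
die "fork failed: $!" unless defined $worker_pid;
if ($worker_pid == 0) {
    block_sigquit();    # done before any handler can be reset
    kill 'QUIT', $$;    # a stray SIGQUIT now only becomes pending
    exit 0;             # the pending signal is discarded with the process
}
waitpid($worker_pid, 0);
my $worker_status = $?;
printf "signal=%d exit=%d\n", $worker_status & 127, $worker_status >> 8;
```

With the signal blocked before the exit path starts, the later reset of the handler to SIG_DFL no longer matters in this toy setup, and the parent should see a clean exit. In a real server the difficulty is finding a point to block the signal that is guaranteed to run before Perl restores the handlers.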