Hanging in ARCH_FORK with CPUPROFILE #701

Open
GoogleCodeExporter opened this issue Jul 23, 2015 · 2 comments

Comments

@GoogleCodeExporter

There are two ways I have been able to reproduce the problem.
The first occurs at random, over spans of time, when running in release 
mode.
The second seems to occur every time I run internal tools linked against 
libprofiler with gdb/cgdb.
I have been unable to generate a simplified reproducer that can be shared.

What steps will reproduce the problem?
1. compile code in debug mode, linked against libprofiler.so
2. run executable in cgdb
3. wait
4. interrupt execution and observe that:
    a. all but one thread are waiting in poll, epoll, pthread_cond_wait, etc.
    b. one thread is stuck in a fork system call, on the ARCH_FORK line
    c. CPU is at 100%

What is the expected output? What do you see instead?
The program is expected to finish normally.
The program hangs 'forever' in a call to fork(), on the ARCH_FORK() macro, with 
$rax = -ERESTARTNOINTR.

What version of the product are you using? On what operating system?
2.2.1 / 2.4
RHEL6

Please provide any additional information below.
I have a quick (non-complete) fix (attached) for this using pthread_atfork and 
pthread_sigmask to block SIGPROF before a fork and then re-enable it 
afterwards. From my testing, this always prevents the hanging issue.

I have communicated my fix with Developer Services at my job and they have 
indicated that it would be preferred if this solution could be patched into the 
gperftools source code.

While this is probably sufficient for the use case at my job, it feels 
incomplete for the purposes of patching into the gperftools codebase.

Original issue reported on code.google.com by Sam.J.Ja...@gmail.com on 20 Jul 2015 at 5:18

Attachments:

@GoogleCodeExporter

Thanks for the bug report.

I would like to understand it a bit more. It's great that blocking SIGPROF 
during fork helps your case, but I'm really curious why not blocking it causes 
fork to spin. Is that because the signal always triggers during fork? But then 
how is that possible?

Can you please submit a test program that causes this behavior? Or maybe 
elaborate more on your findings?

Original comment by alkondratenko on 21 Jul 2015 at 2:44

@GoogleCodeExporter

The signal does not always trigger during fork when run in release mode. 
However, as far as I can tell it does always trigger with GDB/CGDB.

From my understanding, the kernel handles this error code by re-attempting the 
interrupted syscall (reset $rax and move the instruction pointer back). Why 
this gets trapped in a spin is beyond me though.

As I mentioned, I have as-of-yet been unable to create a reproducer case, but I 
will keep looking into it.

Original comment by Sam.J.Ja...@gmail.com on 21 Jul 2015 at 3:10
