OSAL timer tests may crash on 32-bit Linux #83

Closed
skliper opened this issue Sep 30, 2019 · 6 comments

skliper commented Sep 30, 2019

I have seen differences between test runs of the same branch:
sometimes the timer-related test programs run OK,
and other times they die with a SIGSEGV.

Test results need to be robust and repeatable. I suspect
that these tests are sensitive to some condition that is
not being adequately controlled on the test targets.

I am making the initial assumption that this is going to
require an update to the test scripts for OSAL, but debugging
is going to require some tinkering inside OSAL to find out
what is going on.

So this bug is being filed in both OSAL and TEST.

skliper commented Sep 30, 2019

Imported from trac issue 60. Created by glimes on 2015-06-23T11:29:04, last modified: 2015-09-08T12:53:25

skliper self-assigned this Sep 30, 2019
skliper added the bug label Sep 30, 2019
skliper commented Sep 30, 2019

Trac comment by glimes on 2015-06-23 11:30:17:

See TEST ticket [cfs_test:17] with the same title.

skliper commented Sep 30, 2019

Trac comment by glimes on 2015-06-23 11:55:38:

We have five programs that are sometimes failing:

  • osal_timer_UT
  • timer-test
  • bin-sem-test
  • bin-sem-timeout-test
  • queue-timeout-test

I will gather information about each from an offline test VM,
using the "unbuffer" utility (from the "expect" package) to
ensure we capture the entire output, and "strace" to see which
system calls are being executed.

Yes, this is a note-taking trac ticket, as I expect
it to get a bit strange.

skliper commented Sep 30, 2019

Trac comment by glimes on 2015-06-23 12:34:04:

osal_timer_UT:

The program is run as root on a VM providing a single 32-bit x86 core with a bare-bones Debian 8.1 Linux installation (expect and strace installed for debugging).

The primary output file from the program (ut_ostimer_log.txt) contains its normal end-of-test summary, including the final listing of individual test results grouped by result type.

The program's stdout contained only the BSP line about building the initial directories. ''We probably should not treat empty .out files from /unit-tests/ programs as an error.''

The program's stderr was entirely empty. ''This is OK, as we expect the test runner to add text to it indicating the exit status, a timeout, or the signal that killed the program.''

The output from strace for the final moments of the program, showing the behavior of the parent and child processes, is:

{{{
29968 0.001014 <... rt_sigsuspend resumed> ) = ? ERESTARTNOHAND (To be restarted if no handler)
29968 0.000014 --- SIGRT_31 {si_signo=SIGRT_31, si_code=SI_TIMER, si_timerid=0xf, si_overrun=0, si_value={int=0, ptr=0}} ---
29968 0.000015 clock_gettime(CLOCK_REALTIME, {21056, 783607771}) = 0
29968 0.000022 timer_settime(0xf, 0, {it_interval={0, 500000000}, it_value={0, 1000000}}, NULL) = 0
29968 0.000015 sigreturn() (mask ~[INT ILL ABRT BUS FPE KILL SEGV STOP RTMIN RT_1]) = -1 EINTR (Interrupted system call)
29968 0.000021 rt_sigsuspend([], 8 <unfinished ...>

29969 0.000011 <... nanosleep resumed> 0xb7772308) = 0
29969 0.000000 timer_delete(0xf) = 0
29969 0.000040 write(3, "OSAL Unit Test Output File for o"..., 4096) = 4096
29969 0.000044 write(3, "g [PASSED]\n #26 Nominal [P"..., 2125) = 2125
29969 0.000022 close(3) = 0
29969 0.000094 munmap(0xb7774000, 4096) = 0
29969 0.000031 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xb776f734} ---
29969 0.002851 +++ killed by SIGSEGV (core dumped) +++

29968 0.000019 +++ killed by SIGSEGV (core dumped) +++
}}}

Blank lines added at the thread switches.

This shows the last time the child receives a tick. Aside from the odd timer_settime() call, this looks normal. ''Odd because interval timers are intended to provide periodic ticks without having to be re-established.''
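For reference, a minimal sketch (not OSAL code) of how a POSIX interval timer is normally used: it is armed with a single timer_settime() call whose non-zero it_interval reloads it, and each tick is consumed in sigsuspend(), much like the rt_sigsuspend() loop visible in the trace. The helper name count_periodic_ticks(), the choice of SIGRTMIN and the 1 ms first expiry are assumptions for illustration; on glibc older than 2.34, link with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Incremented by the signal handler on every timer expiry. */
static volatile sig_atomic_t g_ticks = 0;

static void on_tick(int signo)
{
    (void)signo;
    g_ticks++;
}

/* Hypothetical helper: arm a POSIX interval timer exactly once and wait
 * for `wanted` expiries (period_ms must be > 0). Returns the number of
 * ticks seen, or -1 on error. Assumes a single-threaded caller. */
int count_periodic_ticks(int wanted, long period_ms)
{
    struct sigaction sa, old_sa;
    struct sigevent sev;
    struct itimerspec its;
    struct timespec no_wait = {0, 0};
    sigset_t block, old_mask, wait_mask;
    timer_t tid;
    int result = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, &old_sa) != 0)
        return -1;

    /* Keep the tick signal blocked except while inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGRTMIN);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGRTMIN);

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGRTMIN;
    if (timer_create(CLOCK_REALTIME, &sev, &tid) == 0) {
        /* First expiry after 1 ms, then every period_ms. The non-zero
         * it_interval reloads the timer, so no re-arming is needed. */
        memset(&its, 0, sizeof its);
        its.it_value.tv_nsec = 1000000L;
        its.it_interval.tv_sec = period_ms / 1000;
        its.it_interval.tv_nsec = (period_ms % 1000) * 1000000L;

        g_ticks = 0;
        if (timer_settime(tid, 0, &its, NULL) == 0) {
            while (g_ticks < wanted)
                sigsuspend(&wait_mask); /* returns after one handled tick */
            result = (int)g_ticks;
        }
        timer_delete(tid);
    }

    /* Discard any tick still pending, then restore the caller's state. */
    while (sigtimedwait(&block, NULL, &no_wait) > 0)
        ;
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGRTMIN, &old_sa, NULL);
    return result;
}
```

For example, count_periodic_ticks(5, 10) should return 5 after roughly 40 ms without the timer ever being re-armed.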

The parent is shown deleting the timer, flushing the output to the test log, and closing the test log. This confirms that the test program reaches the UT_os_teardown() function, which is at the end of the UT_timertest_task() procedure.

The interesting thing to me is that after the munmap call, ''both'' processes die from a SIGSEGV.
This suggests that a pointer to a resource used by both is being kept after that resource
has been released, to the point of the library actually unmapping the address range.
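SEGV_MAPERR in the trace means the faulting address was not mapped at all, which is what a pointer used after its page has been munmap()ed looks like. A hypothetical, self-contained sketch (Linux/glibc assumed; the helper fault_code_after_munmap() is illustrative and not taken from OSAL) that reproduces that si_code:

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_env;
static volatile sig_atomic_t g_si_code = 0;

/* Record why the fault happened, then jump back out of the bad access. */
static void on_segv(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    g_si_code = info->si_code;
    siglongjmp(g_env, 1);
}

/* Hypothetical helper: map one page, unmap it, then read through the
 * stale pointer. Returns the si_code of the resulting SIGSEGV, 0 if no
 * fault occurred, or -1 if the setup failed. */
int fault_code_after_munmap(void)
{
    struct sigaction sa, old_sa;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p;
    void *mem;
    int result;

    mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    p = mem;
    p[0] = 'x';                   /* the page is valid here */
    if (munmap(mem, page) != 0)   /* ...and released here */
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0)
        return -1;

    g_si_code = 0;
    if (sigsetjmp(g_env, 1) == 0)
        (void)p[0];               /* use after munmap: faults */

    result = (int)g_si_code;
    sigaction(SIGSEGV, &old_sa, NULL);
    return result;
}
```

Called on its own, fault_code_after_munmap() should return SEGV_MAPERR. It only shows what this class of bug looks like in a trace; it does not identify which pointer is stale in the test programs.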

I will need to load a core dump into a debugger to see what is crashing here, but that's going to be a bit painful: this is not crashing on my dev machine, and my test VMs don't have GDB available. I suspect misuse of the FILE* used for the UT log output.

skliper commented Sep 30, 2019

Trac comment by glimes on 2015-06-23 12:45:45:

This seemed nonsensical, so I installed gdb on the test target.

Yep. This SIGSEGV is being delivered while we are in the gcov_exit() function.

We need to disable gcov during Bamboo-driven builds.

skliper commented Sep 30, 2019

Trac comment by glimes on 2015-09-08 12:53:25:

Three months with gcov disabled and all is well.

skliper closed this as completed Sep 30, 2019
skliper removed their assignment Sep 30, 2019
skliper added this to the osal-4.2 milestone Sep 30, 2019