Detecting GLIBC version (DTLS SIGSEGV). #1322

InverseRE opened this issue Sep 28, 2020 · 3 comments

@InverseRE

The issue was reported before:
#914
#1267
#1170

But it seems that none of the proposed solutions was implemented.

The main problem is that LeakSan can misdetect the glibc version: it detects 2.19 when the real glibc is version 2.25 or higher.
That causes a SIGSEGV later in the ScanRangeForPointers function from lsan_common.cc, because the DTLS range is invalid.
The condition in the DTLS_on_tls_get_addr() function can yield a false positive: sometimes ((tls_beg % 4096) == sizeof(Glibc_2_19_tls_header)) evaluates to true for pointers from glibc 2.25.
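
To make the false positive concrete, here is a minimal sketch of the page-offset test quoted above. The struct layout (two pointer-sized words) and the names `tls_header_sketch` and `looks_like_glibc_2_19` are assumptions for illustration, not the sanitizer's actual definitions. The point is only that the test looks at the address alone, so any block that happens to sit at that offset within a page passes, whatever glibc version produced it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the header type named in the issue.
 * Assumption: two pointer-sized fields (16 bytes on a 64-bit target). */
struct tls_header_sketch {
    uintptr_t size;
    uintptr_t start;
};

enum { SKETCH_PAGE_SIZE = 4096 };

/* Returns 1 when tls_beg sits exactly sizeof(header) bytes past a page
 * boundary, which is the condition quoted in the issue. Nothing here
 * consults the real glibc version. */
int looks_like_glibc_2_19(uintptr_t tls_beg) {
    return (tls_beg % SKETCH_PAGE_SIZE) == sizeof(struct tls_header_sketch);
}
```

With this sketch, an address such as `3 * 4096 + sizeof(struct tls_header_sketch)` is classified as "glibc 2.19 layout" even if it came from a newer glibc, which matches the misdetection described above.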

I kindly ask you to do something to avoid these annoying crashes. LeakSan's own code suggests using gnu_get_libc_version() to limit the supported glibc versions :)
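
As an illustration of the suggestion above, the following sketch reads the runtime glibc version with `gnu_get_libc_version()` and compares it against 2.25. The helper names (`parse_glibc_version`, `version_at_least`, `runtime_glibc_is_2_25_or_newer`) are hypothetical, and this is application-level code: it is not how the sanitizer runtime is written, and a real fix inside the runtime may have constraints this sketch ignores.

```c
#include <stdio.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Parses a "MAJOR.MINOR" prefix; returns 1 on success, 0 otherwise. */
int parse_glibc_version(const char *s, int *major, int *minor) {
    return s != NULL && sscanf(s, "%d.%d", major, minor) == 2;
}

/* Returns 1 if major.minor >= want_major.want_minor, else 0. */
int version_at_least(int major, int minor, int want_major, int want_minor) {
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Returns 1 if the running glibc is 2.25 or newer, 0 if it is older,
 * and -1 if the version is unavailable (for example, not glibc). */
int runtime_glibc_is_2_25_or_newer(void) {
#ifdef __GLIBC__
    int major, minor;
    if (!parse_glibc_version(gnu_get_libc_version(), &major, &minor))
        return -1;
    return version_at_least(major, minor, 2, 25);
#else
    return -1;
#endif
}
```

A check of this kind asks glibc directly instead of inferring the version from a pointer's page offset, so it cannot be fooled by where a block happens to be allocated.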

@InverseRE
Author

InverseRE commented Sep 28, 2020

As a temporary workaround, we disable DTLS interception with ASAN_OPTIONS=intercept_tls_get_addr=0, so that LeakSan still produces reports for other storage types.
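
Where setting an environment variable is awkward (for instance, test binaries started by a harness), the same option can be compiled into the program. A minimal sketch, assuming the ASan runtime's `__asan_default_options` hook is available in the toolchain in use; options given in the `ASAN_OPTIONS` environment variable still take precedence over this default.

```c
/* Default ASan options baked into the binary. The ASan runtime calls
 * this hook at startup if the program defines it. In a build without
 * ASan it is just an ordinary, unused function. */
const char *__asan_default_options(void) {
    /* Same workaround as above: do not intercept __tls_get_addr. */
    return "intercept_tls_get_addr=0";
}
```

Several options can be combined in the returned string by separating them with colons, in the same form as the environment variable.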

jktjkt added a commit to CESNET/CzechLight-dependencies that referenced this issue Oct 24, 2020
I was trying to update to the new sysrepo, and that included bumping
buildroot. That, however, resulted in too new `fmt`, which was not
compatible with the version of `spdlog` that we're bundling.

Solve that by using the systemwide spdlog, and while we're at it, do
that for pybind11 as well. I don't think we're ever changing it.

Also try to switch to the systemwide Boost. That's only 1.69 on Fedora
32 (which is older than we have today), but the good thing is that
Fedora 33 has Boost 1.73. Let's see what breaks.

I tried to do this on legacy sysrepo, but there are some issues with
libev behavior, so I gave up and I'm interested in getting that thing
working with the current stack.

The old LSAN suppressions are not needed anymore, but we have a
wonderful new crash instead, this time in sysrepo's test_modules, and
*only* when running for the first time and with empty repository state.
It segfaults at exit:

 Tracer caught signal 11: addr=0x0 pc=0x4e9218 sp=0x7f9ad95f6d10
 ==25238==LeakSanitizer has encountered a fatal error.
 ==25238==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
 ==25238==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

...and the backtrace (which is rather hard to obtain because LSAN
doesn't really work under gdb, so the trick is to run it as
`ASAN_OPTIONS=sleep_before_dying=666`, and then attach via `gdb -p
$(pidof test_modules)`, yay!) looks like this:

 (gdb) bt
 #0  __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffc7d3e90a0, rem=rem@entry=0x7ffc7d3e90a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:79
 #1  0x00007fb5d5114157 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffc7d3e90a0, remaining=remaining@entry=0x7ffc7d3e90a0) at nanosleep.c:27
 #2  0x00007fb5d511408e in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
 #3  0x00000000004c587b in __asan::AsanDie() ()
 #4  0x00000000004dd56e in __sanitizer::Die() ()
 #5  0x00000000004ec1eb in __lsan::CheckForLeaks() ()
 #6  0x00000000004ec219 in __lsan::DoLeakCheck() ()
 #7  0x00007fb5d50853a7 in __run_exit_handlers (status=0, listp=0x7fb5d5209578 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
 #8  0x00007fb5d5085550 in __GI_exit (status=<optimized out>) at exit.c:139
 #9  0x00007fb5d506d049 in __libc_start_main (main=0x4f3650 <main>, argc=1, argv=0x7ffc7d3e92f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc7d3e92e8) at ../csu/libc-start.c:342
 #10 0x000000000041c6ce in _start ()

A TL;DR version is that this is *probably* related to how ASAN guesses
glibc versions, and running it via
`ASAN_OPTIONS=verbosity=2:intercept_tls_get_addr=1` prevents that
segfault.

That was very yummy to debug, of course.

Change-Id: Idf19d5f3ed4ed838dcb89a2dc624aa918900ac7b
Bug: google/sanitizers#1322
gnomesysadmins pushed a commit to GNOME/gjs that referenced this issue Feb 4, 2021
It appears that under particular circumstances, ASan will crash on exit
due to a bug where it mis-detects the glibc version.

See: google/sanitizers#1322
blueboxd pushed a commit to blueboxd/chromium-legacy that referenced this issue May 18, 2021
google/sanitizers#1322

According to that github issue, lsan will mistakenly detect "2.19 for
the real glibc of version 2.25 or higher."

On our xenial test bots, /lib/x86_64-linux-gnu/libc.so.6 is 2.23. On
our bionic test bots, libc is 2.27. So when we switched our asan/lsan
tests to bionic, we started hitting strange lsan errors:

Tracer caught signal 11: addr=0x330004f3 pc=0x55af5266644a sp=0x7f405e956d40
==28674==LeakSanitizer has encountered a fatal error.
eg: https://chromium-swarm.appspot.com/task?id=53745e69a9525810

It appears that error is the same reported in the github issue, as
disabling the DTLS check prevents those same LSan errors.

Though this doesn't entirely explain why the failures are transient
when nothing about the bots seemingly changes. eg:
https://ci.chromium.org/p/chromium/builders/ci/Linux%20ASan%20LSan%20Tests%20%281%29
Why does components_unittests fail in builds 89913 - 89916, but is
fine before and after?

Bug: 1200574
Change-Id: Id8ffed3bd50d648a1147de50bb4fd2f001c01d12
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2888418
Reviewed-by: Thomas Anderson <thomasanderson@chromium.org>
Reviewed-by: Garrett Beaty <gbeaty@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@google.com>
Owners-Override: Garrett Beaty <gbeaty@chromium.org>
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Cr-Commit-Position: refs/heads/master@{#883698}
@boris-kolpackov

Seeing that this is affecting multiple projects and that no fix seems to be forthcoming, perhaps this test should be disabled by default, at least if using GLIBC?

glandium added a commit to glandium/git-cinnabar that referenced this issue Oct 1, 2021
See google/sanitizers#1322. It might be
the cause of the intermittent asan failures on CI.
Trott pushed a commit to nodejs/node that referenced this issue May 15, 2022
PR-URL: #43085
Refs: google/sanitizers#1322
Refs: #43082
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
BethGriggs pushed a commit to nodejs/node that referenced this issue May 16, 2022
PR-URL: #43085
Refs: google/sanitizers#1322
Refs: #43082
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
juanarbol pushed a commit to nodejs/node that referenced this issue May 31, 2022
PR-URL: #43085
Refs: google/sanitizers#1322
Refs: #43082
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
danielleadams pushed a commit to nodejs/node that referenced this issue Jun 27, 2022
PR-URL: #43085
Refs: google/sanitizers#1322
Refs: #43082
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
targos pushed a commit to nodejs/node that referenced this issue Jul 12, 2022
PR-URL: #43085
Refs: google/sanitizers#1322
Refs: #43082
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
targos pushed a commit to nodejs/node that referenced this issue Jul 31, 2022
PR-URL: #43085
Refs: google/sanitizers#1322
Refs: #43082
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
guangwong pushed a commit to noslate-project/node that referenced this issue Oct 10, 2022
PR-URL: nodejs/node#43085
Refs: google/sanitizers#1322
Refs: nodejs/node#43082
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
dkoutsou pushed a commit to cvm-project/cvm-project that referenced this issue Oct 19, 2022
kleisauke added a commit to kleisauke/libvips that referenced this issue Nov 29, 2022
It might be the cause of the intermittent ASan failures on CI.

See: google/sanitizers#1322
jcupitt pushed a commit to libvips/libvips that referenced this issue Nov 30, 2022
It might be the cause of the intermittent ASan failures on CI.

See: google/sanitizers#1322
pstorz added a commit to bareos/bareos that referenced this issue Mar 15, 2023
Set `ASAN_OPTIONS=intercept_tls_get_addr=0` to avoid problems with
sanitizers running in containers.

see: google/sanitizers#1322
pstorz added a commit to bareos/bareos that referenced this issue Mar 15, 2023
Set `ASAN_OPTIONS=intercept_tls_get_addr=0` to avoid problems with
sanitizers running in containers.

see: google/sanitizers#1322
(cherry picked from commit 30a263a)
sebsura pushed a commit to sebsura/bareos that referenced this issue Mar 30, 2023
Set `ASAN_OPTIONS=intercept_tls_get_addr=0` to avoid problems with
sanitizers running in containers.

see: google/sanitizers#1322
@thurstond
Contributor

"Fix tls_get_addr handling for glibc >=2.25" (https://reviews.llvm.org/D147459) landed in LLVM upstream last Friday, which should fix the issue.

github-merge-queue bot pushed a commit to eic/EICrecon that referenced this issue Oct 5, 2023
### Briefly, what does this PR introduce?
Potentially resolves this, per
google/sanitizers#1322...

### What kind of change does this PR introduce?
- [x] Bug fix (issue #1043)
- [ ] New feature (issue #__)
- [ ] Documentation update
- [ ] Other: __

### Please check if this PR fulfills the following:
- [ ] Tests for the changes have been added
- [ ] Documentation has been added / updated
- [ ] Changes have been communicated to collaborators

### Does this PR introduce breaking changes? What changes might users
need to make to their code?
No.

### Does this PR change default behavior?
No.