Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cfi tests failing on 32-bit arm linux #43502

Open
rovka opened this issue Nov 27, 2019 · 12 comments
Open

cfi tests failing on 32-bit arm linux #43502

rovka opened this issue Nov 27, 2019 · 12 comments
Labels
bugzilla Issues migrated from bugzilla compiler-rt:cfi Control Flow Integrity

Comments

@rovka
Copy link
Collaborator

rovka commented Nov 27, 2019

Bugzilla Link 44157
Version unspecified
OS Linux
Blocks #51489
Attachments Output of cfi tests failures
CC @zatrazz,@zmodem,@slacka,@tstellar

Extended Description

We get the following failures:
cfi-devirt-lld-armhf :: cross-dso/simple-fail.cpp
cfi-devirt-lld-armhf :: cross-dso/simple-pass.cpp
cfi-devirt-lld-armhf :: cross-dso/stats.cpp
cfi-devirt-lld-thinlto-armhf :: cross-dso/simple-fail.cpp
cfi-devirt-lld-thinlto-armhf :: cross-dso/simple-pass.cpp
cfi-devirt-lld-thinlto-armhf :: cross-dso/stats.cpp
cfi-standalone-lld-armhf :: cross-dso/simple-fail.cpp
cfi-standalone-lld-armhf :: cross-dso/simple-pass.cpp
cfi-standalone-lld-armhf :: cross-dso/stats.cpp
cfi-standalone-lld-thinlto-armhf :: cross-dso/simple-fail.cpp
cfi-standalone-lld-thinlto-armhf :: cross-dso/simple-pass.cpp
cfi-standalone-lld-thinlto-armhf :: cross-dso/stats.cpp
More details in the attached log.

@rovka
Copy link
Collaborator Author

rovka commented Jan 20, 2020

The issue reproduces when running test-release.sh on master. I've been looking at stats.cpp.

The issue that we're hitting is an invalid instruction while running __cfi_check. This looks like a Thumb function (it's disassembled that way by gdb, and it's also after a $t symbol in an objdump of the executable). However, we're hitting it in ARM mode ('p $cpsr & 0x20' returns 0). So we fetch 4 bytes instead of 2 for the first instruction (which is just a 'push {r7, lr}') and then things go boom as we try to fetch from the middle of the following 4-byte instruction.

I've been stepping through to see how we end up here, and it seems __cfi_check is called from __cfi_slowpath, which is also a Thumb function according to the objdump. We're running that one in ARM mode too, although it seems to have only 4-byte instructions so I'm guessing it's just a coincidence that it doesn't blow up.

__cfi_slowpath is called from _dl_runtime_resolve by doing a 'bx r12', where r12 contains the value returned by _dl_fixup: 0x2a01eff0. I think _dl_fixup should have returned an address with the 0 bit set to 1, since as I mentioned earlier this is a Thumb function according to objdump. I'm not sure how or why this is happening though.

@rovka
Copy link
Collaborator Author

rovka commented Jan 20, 2020

@zatrazz
Copy link
Member

zatrazz commented Jan 20, 2020

Hi Diana,

The glibc _dl_fixup handle thumb symbols similar than arm ones, it is up to static linker to setup both symbol symbol value and DT_SYMTAB base to the expected values.

On _dl_fixup, global symbols address on PLT calls will be set as:

67 const ElfW(Sym) *const symtab
68 = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
[...]
73 const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
[...]
112 result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
113 version, ELF_RTYPE_CLASS_PLT, flags, NULL);
[...]
126 value = DL_FIXUP_MAKE_VALUE (result,
127 SYMBOL_ADDRESS (result, sym, false));
[...]
138 value = elf_machine_plt_value (l, reloc, value);
[...]
148 return elf_machine_fixup_plt (l, result, refsym, sym, reloc, rel_addr, value);

(assuming that the symbol is not an IFUNC and LD_BIND_NOW is not set).

The arm 'elf_machine_plt_value' is a nop operation (it just returns the 'value' input) and 'elf_machine_fixup_plt' just write down the 'value' on 'rel_addr':

275 static inline Elf32_Addr
276 elf_machine_fixup_plt (struct link_map *map, lookup_t t,
277 const ElfW(Sym) *refsym, const ElfW(Sym) *sym,
278 const Elf32_Rel *reloc,
279 Elf32_Addr *reloc_addr, Elf32_Addr value)
280 {
281 return reloc_addr = value;
282 }
283
284 /
Return the final value of a plt relocation. */
285 static inline Elf32_Addr
286 elf_machine_plt_value (struct link_map *map, const Elf32_Rel *reloc,
287 Elf32_Addr value)
288 {
289 return value;
290 }

Also arm uses the generic DL_FIXUP_MAKE_VALUE definition, which does:

23 #define DL_FIXUP_MAKE_VALUE(map, addr) (addr)

So the _dl_fixup returned value for this case will be essentially:

const ElfW(Sym) *const symtab = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
version, ELF_RTYPE_CLASS_PLT, flags, NULL);
return result->l_addr + sym->st_value;

So, for instance with a simple example:

--
#include

class Foo
{
public:
void foo (void)
{
std::cout << func << std::endl;
}
};

extern "C" {

void foo_c (void)
{
class Foo foo;
foo.foo ();
}

}

$ c++ -O0 -g -fno-stack-protector -mthumb -fpic -c -o libfoo-thumb.o libfoo.cc
$ c++ -mthumb -shared -Wl,-soname,libfoo-thumb.so -o libfoo-thumb.so libfoo-thumb.o
$ readelf -a
[...]
20: 00000779 22 FUNC GLOBAL DEFAULT 12 foo_c
[...]

This should be the value of 'sym->st_value' on _dl_fixup which will be added on the mmap memory region of the PT_LOAD that represent the text segment of the symbol obtained from _dl_lookup_symbol_x.

What I can't answer is why 'readelf -s' and 'objdump --sym' is showing different results for the same shared library:

--
$ readelf -s libfoo-thumb.so | grep foo_c
20: 00000779 22 FUNC GLOBAL DEFAULT 12 foo_c
72: 00000779 22 FUNC GLOBAL DEFAULT 12 foo_c
$ objdump --syms --special-syms libfoo-thumb.so | grep foo_c
00000778 g F .text 00000016 foo_c

Could you check if readelf shows the same st_value output for __cfi_slowpath symbol?

@rovka
Copy link
Collaborator Author

rovka commented Jan 21, 2020

Oh, interesting, it seems readelf and objdump show different things. I think objdump always shows the "real" address, without the mode bit set, but readelf shows the mode bit as well. With readelf you can see that __cfi_check has the thumb bit set, but cfi_slowpath doesn't.

readelf -a stats.cpp.tmp | grep cfi_slowpath

0x1eff0 <__cfi_slowpath>: 0x8001848f
69: 0001eff0 308 FUNC GLOBAL DEFAULT 15 __cfi_slowpath
2371: 0001eff0 308 FUNC GLOBAL DEFAULT 15 __cfi_slowpath

objdump --syms stats.cpp.tmp | grep cfi_slowpath
0001eff0 g F .text 00000134 __cfi_slowpath


readelf -a stats.cpp.tmp | grep "cfi_check"
0x24000 <__cfi_check>: 0x808408b0
79: 00024001 204 FUNC GLOBAL DEFAULT 15 __cfi_check
2388: 00024001 204 FUNC GLOBAL DEFAULT 15 __cfi_check

objdump --syms stats.cpp.tmp | grep cfi_check
00024000 g F .text 000000cc __cfi_check

@rovka
Copy link
Collaborator Author

rovka commented Jan 21, 2020

Also, here are the commands for building stats.cpp.tmp:

clang -fuse-ld=lld -flto -fsanitize=cfi -fwhole-program-vtables --driver-mode=g++ -fsanitize-cfi-cross-dso -fvisibility=default -DSHARED_LIB -fPIC -g -fsanitize-stats -shared -o stats.cpp.tmp.so llvm-project/compiler-rt/test/cfi/cross-dso/stats.cpp

clang -fuse-ld=lld -flto -fsanitize=cfi -fwhole-program-vtables --driver-mode=g++ -fsanitize-cfi-cross-dso -fvisibility=default -g -fsanitize-stats -o stats.cpp.tmp llvm-project/compiler-rt/test/cfi/cross-dso/stats.cpp stats.cpp.tmp.so

So there's no explicit -mthumb there. I'm not sure whether the cfi functions are expected to be thumb or arm in this case.

@zmodem
Copy link
Collaborator

zmodem commented Feb 25, 2020

Is this a regression from 9 -> 10, and if so would it be possible to bisect it?

Otherwise we probably shouldn't block on this.

@rovka
Copy link
Collaborator Author

rovka commented Feb 26, 2020

IIRC this wasn't in 9.0.0, but it showed up in 9.0.1, so that's probably the branch that we should bisect. I'd need to check again with 9.0.0 though since I don't remember whether we had any infrastructure changes between 9.0.0 and 9.0.1. Unfortunately I haven't had much time to look into it. In any case, we can unblock.

@zmodem
Copy link
Collaborator

zmodem commented Feb 27, 2020

IIRC this wasn't in 9.0.0, but it showed up in 9.0.1, so that's probably the
branch that we should bisect. I'd need to check again with 9.0.0 though
since I don't remember whether we had any infrastructure changes between
9.0.0 and 9.0.1. Unfortunately I haven't had much time to look into it. In
any case, we can unblock.

Okay, unblocking.

@rovka
Copy link
Collaborator Author

rovka commented Aug 27, 2020

Seeing a new cfi failure in 11.0.0-rc2 : cfi-devirt-lld-armhf :: bad-cast.cpp.

@rovka
Copy link
Collaborator Author

rovka commented Sep 16, 2021

bad-cast.cpp is not failing anymore, but as of 13.0.0-rc3 we're having a failure in multiple-inheritance.cpp

@tstellar
Copy link
Collaborator

mentioned in issue #51489

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@asl asl added this to the LLVM 13.0.1 release milestone Dec 12, 2021
@tstellar
Copy link
Collaborator

The deadline for requesting fixes for the release has passed. This bug is being removed from the LLVM 13.0.1 release milestone. If you have a fix or think this bug is important enough to block the release, please explain why in a comment and add the bug back to the LLVM 13.0.1 release milestone.

@tstellar tstellar removed this from the LLVM 13.0.1 release milestone Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla compiler-rt:cfi Control Flow Integrity
Projects
None yet
Development

No branches or pull requests

5 participants