Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does panicking in a Drop impl cause SIGILL? #81895

Closed
camelid opened this issue Feb 8, 2021 · 23 comments
Closed

Why does panicking in a Drop impl cause SIGILL? #81895

camelid opened this issue Feb 8, 2021 · 23 comments
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@camelid
Copy link
Member

camelid commented Feb 8, 2021

Why does this code trigger a SIGILL when run? I think what's happening is the
first element is dropped, which causes a panic, and then the second element is
dropped, which causes another panic. But why does that cause SIGILL?

My understanding from #53814 is that this is expected behavior (not a bug), but
I'm curious why it happens. I would think if anything it would trigger a
SIGABRT.

code

(playground)

struct Foo(Vec<u8>);

impl Drop for Foo {
    fn drop(&mut self) {
        panic!()
    }
}

fn main() {
    let foos = vec![Foo(vec![]), Foo(vec![1, 2, 3])];
}

stderr

   Compiling playground v0.0.1 (/playground)
warning: unused variable: `foos`
  --> src/main.rs:10:9
   |
10 |     let foos = vec![Foo(vec![]), Foo(vec![1, 2, 3])];
   |         ^^^^ help: if this is intentional, prefix it with an underscore: `_foos`
   |
   = note: `#[warn(unused_variables)]` on by default

warning: 1 warning emitted

    Finished dev [unoptimized + debuginfo] target(s) in 0.83s
     Running `target/debug/playground`
thread 'main' panicked at 'explicit panic', src/main.rs:5:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'explicit panic', src/main.rs:5:9
stack backtrace:
   0:     0x561934642290 - std::backtrace_rs::backtrace::libunwind::trace::h04d12fdcddff82aa
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:100:5
   1:     0x561934642290 - std::backtrace_rs::backtrace::trace_unsynchronized::h1459b974b6fbe5e1
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x561934642290 - std::sys_common::backtrace::_print_fmt::h9b8396a669123d95
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x561934642290 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he009dcaaa75eed60
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x56193465b18c - core::fmt::write::h77b4746b0dea1dd3
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/fmt/mod.rs:1078:17
   5:     0x5619346407a2 - std::io::Write::write_fmt::heb7e50902e98831c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/io/mod.rs:1518:15
   6:     0x5619346443f5 - std::sys_common::backtrace::_print::h2d880c9e69a21be9
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x5619346443f5 - std::sys_common::backtrace::print::h5f02b1bb49f36879
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x5619346443f5 - std::panicking::default_hook::{{closure}}::h658e288a7a809b29
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:208:50
   9:     0x561934644098 - std::panicking::default_hook::hb52d73f0da9a4bb8
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:227:9
  10:     0x561934644b31 - std::panicking::rust_panic_with_hook::hfe7e1c684e3e6462
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:593:17
  11:     0x56193462e40a - std::panicking::begin_panic::{{closure}}::h614b3cbb40c6ac97
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:522:9
  12:     0x56193462e1d8 - std::sys_common::backtrace::__rust_end_short_backtrace::h80519f17256a9add
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:141:18
  13:     0x56193462e347 - std::panicking::begin_panic::h3645bdd6626a0ee3
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:521:12
  14:     0x56193463102d - <playground::Foo as core::ops::drop::Drop>::drop::hce9ef5afe53ec370
                               at /playground/src/main.rs:5:9
  15:     0x56193462e682 - core::ptr::drop_in_place::h26876705d1b946dc
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ptr/mod.rs:175:1
  16:     0x56193462e780 - core::ptr::drop_in_place::h4b57af26293eab52
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ptr/mod.rs:175:1
  17:     0x561934630ca2 - <alloc::vec::Vec<T> as core::ops::drop::Drop>::drop::hb6788eaf6b5dc8d0
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/vec.rs:2595:13
  18:     0x56193462e812 - core::ptr::drop_in_place::h77d15d17e0480110
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ptr/mod.rs:175:1
  19:     0x561934631126 - playground::main::h09e49a94267467fc
                               at /playground/src/main.rs:11:1
  20:     0x56193462e61b - core::ops::function::FnOnce::call_once::h5041d89337c06587
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ops/function.rs:227:5
  21:     0x56193462e20e - std::sys_common::backtrace::__rust_begin_short_backtrace::hf3c8a86706865b3d
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:125:18
  22:     0x56193462e2b1 - std::rt::lang_start::{{closure}}::hb426d5912cbdfebe
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/rt.rs:66:18
  23:     0x561934644f57 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h57e2a071d427b24c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ops/function.rs:259:13
  24:     0x561934644f57 - std::panicking::try::do_call::h81cbbe0c3b30a28e
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:381:40
  25:     0x561934644f57 - std::panicking::try::hbeeb95b4e1f0a876
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:345:19
  26:     0x561934644f57 - std::panic::catch_unwind::h59c48ccb40a0bf20
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panic.rs:396:14
  27:     0x561934644f57 - std::rt::lang_start_internal::ha53ab63f88fee728
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/rt.rs:51:25
  28:     0x56193462e287 - std::rt::lang_start::hb78caef74d29b421
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/rt.rs:65:5
  29:     0x5619346311aa - main
  30:     0x7f173d6a50b3 - __libc_start_main
  31:     0x56193462e09e - _start
  32:                0x0 - <unknown>
thread panicked while panicking. aborting.
timeout: the monitored command dumped core
/playground/tools/entrypoint.sh: line 11:     7 Illegal instruction     timeout --signal=KILL ${timeout} "$@"
@camelid camelid added A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 8, 2021
@jonas-schievink
Copy link
Contributor

It calls intrinsics::abort(), which LLVM turns into a ub2 instruction, which is illegal, thus SIGILL

@camelid
Copy link
Member Author

camelid commented Feb 8, 2021

Oh... I think I might've seen a thread about that somewhere. Can we change it somehow so LLVM doesn't turn it into ub2?

@nagisa
Copy link
Member

nagisa commented Feb 9, 2021

There's a message ("thread panicked while panicking. aborting.") )in the output that points out the cause and what will be done. With that in mind does it really matter that it aborts with a SIGILL as opposed to some other obscure signal (SIGSEGV, SIGBUS or similar)?

SIGABRT could be gotten with a syscall, but that's a maintenance burden and more code to fill instruction caches with.

@camelid
Copy link
Member Author

camelid commented Feb 9, 2021

I guess when I see SIGILL it makes me think there's memory corruption or something. Maybe I don't understand the meaning of SIGILL :)

@jyn514
Copy link
Member

jyn514 commented Feb 9, 2021

man 7 signal says

   Signal dispositions
       Each signal has a current disposition, which determines how the process behaves when it is delivered the signal.

       The entries in the "Action" column of the tables below specify the default disposition for each signal, as follows:

       Core   Default action is to terminate the process and dump core (see core(5)).


   Standard signals
       Linux supports the standard signals listed below.  Several signal numbers are architecture-dependent, as indicated in the "Value" column.  (Where three values are
       given,  the  first  one  is  usually valid for alpha and sparc, the middle one for x86, arm, and most other architectures, and the last one for mips.  (Values for
       parisc are not shown; see the Linux kernel source for signal numbering on that architecture.)  A dash (-) denotes that a signal is absent on the corresponding ar‐
       chitecture.

       First the signals described in the original POSIX.1-1990 standard.

       Signal     Value     Action   Comment
       ──────────────────────────────────────────────────────────────────────
SIGILL        4       Core    Illegal Instruction

So SIGILL means "to execute ud2 or some other illegal instruction", and has the same effect in practice as SIGABRT.

@jyn514
Copy link
Member

jyn514 commented Feb 9, 2021

Also consider that abort() requires at least an operating system, while ud2 works even on embedded systems.

@ojeda
Copy link
Contributor

ojeda commented Feb 9, 2021

I guess when I see SIGILL it makes me think there's memory corruption or something. Maybe I don't understand the meaning of SIGILL :)

Why do you think so? Perhaps due to their bad reputation? :) Note that it is not a SIGSEGV or SIGBUS. More generally, signals do not even necessarily mean something bad has happened.

@camelid
Copy link
Member Author

camelid commented Feb 10, 2021

I guess when I hear "illegal" it makes me think there's a memory issue. It's not very rational :)

@jgarvin
Copy link

jgarvin commented Feb 21, 2021

I ended up finding this thread because I was confused by seeing SIGILL because I also assumed that if I was getting it that it meant I had a memory management problem. The whole idea with using Rust instead of C++ is not seeing SIGBUS, SIGSEGV, SIGILL etc anymore so I assumed I had made a mistake ;) But a panic during a panic is not technically unsafe (though it is guaranteed to end your program). I'm not sure if there is anything that could be done to make this clearer to the user. At least it seems it should be documented somewhere that comes up earlier in google results than this github issue. Maybe in the nomicon.

@nagisa
Copy link
Member

nagisa commented Feb 21, 2021

We could add additional blurb to the

thread panicked while panicking. aborting.

I guess? Like "thread panicked while panicking. forcing termination via CPU trap". If people search for what CPU trap means they get wikipedia as first hit.

@camelid
Copy link
Member Author

camelid commented Feb 21, 2021

Or:

thread panicked while panicking. aborting via CPU trap.

Do you get a SIGILL on all architectures, or just x86-64?

@nagisa
Copy link
Member

nagisa commented Feb 21, 2021

You are not guaranteed to get SIGILL everywhere, no. Some targets may even end up calling an abort() as they don't actually have a good way to force a trap, but those are exceedingly rare.

@camelid
Copy link
Member Author

camelid commented Feb 21, 2021

If it doesn't always use a trap, then it's probably more confusing to add "aborting via CPU trap". Maybe we could just add documentation to the reference about this?

@nagisa
Copy link
Member

nagisa commented Feb 21, 2021

It does almost always use a trap of some sort, though. It will do so for the 99.manynines% of the market share. Targets supported by Rust that do not use a CPU trap to abort this way are pretty much just msp430 right now AFAIK.

I'm not sure the reference would be the right place to describe what is effectively an implementation detail of the standard library.

@ojeda
Copy link
Contributor

ojeda commented Feb 22, 2021

There is another behavior: for WebAssembly, the trap will end up as a WebAssembly.RuntimeError in JS land.

I think the root of the confusion mostly comes from stating aborting in the error message, since it is not actually aborting but trapping. A user reading "abort" is likely to expect an abnormal termination of the program, yes, but not something as "bad" or as "strange" as an illegal instruction or an infinite loop; which as we have seen can indeed be read as something being deeply broken.

Thus I think changing that word to "trapping" would alleviate the problem, regardless of whether we add extra information to explain what happened [*]. It would be more correct, more specific (easier to search for) and hints the user something is going on, different from the usual "abort" they may be accustomed to from std::process::abort(), from C / POSIX's abort() or from the usual Aborted (core dumped) etc. messages.

Related: I'd argue core::intrinsics::abort() should also be renamed core::intrinsics::trap(), which reflects better what it does, matches the usual lingo and makes it less prone to be confused with the usual std::process::abort(), etc. For similar reasons I sent #81530 in an attempt to avoid potential confusion with an abort vs. unreachable instance in wasm.

[*] If we wanted to be even more user-friendly, we could add a custom message per arch that explains what the trap actually does for that target, e.g.:

thread panicked while panicking. trapping (in x86_64, via illegal instruction).

@thomcc
Copy link
Member

thomcc commented Feb 22, 2021

SystemZ maps the trap to an infinite loop

Hmm, doesn't SystemZ have multithreading? If so, this seems invalid. core::intrinsics::abort is used in some cases where it indicates cross-thread memory corruption may occur (for example, in sync::Arc on refcount overflow)...

@ojeda
Copy link
Contributor

ojeda commented Feb 22, 2021

Ah, it does create a loop to itself plus two bytes, so I guess it is done as a way to jump into an illegal instruction (or perhaps a misaligned address). Edited, thanks!

@ojeda
Copy link
Contributor

ojeda commented Feb 22, 2021

From a quick test under user QEMU, that jump in SystemZ ends up as a segmentation fault, rather than an illegal instruction. AArch64, instead, gives SIGTRAP rather than SIGILL, I guess since brk is documented as a debug breakpoint.

So if those behaviors are faithful to what happens natively, it seems worth to give a custom message even when considering only Linux targets, e.g.:

thread panicked while panicking. trapping (arm64 Linux will raise SIGTRAP).

@camelid
Copy link
Member Author

camelid commented Feb 22, 2021

We could just be more vague and say:

thread panicked while panicking. crashing via trap or other means.

@jyn514
Copy link
Member

jyn514 commented Mar 14, 2021

I think this is a duplicate of #40230, or the other way around.

@darthdeus
Copy link

One thing that might be worth noting is that this causes problems with tests that panic! in Drop, since the test runner aborts and you don't even get a summary of successful/failed tests, it just stops as is.

An easy way to reproduce this is to have something that needs to be modified otherwise it panics in Drop, and then have a failing assert in the middle of the test. The failing assert causes a panic, and as drop() gets called and panics again the whole thing crashes and the test suite stops - or at least that's what seems to happen.


The case that I'm running into is that I have a tree structure that has nodes that will panic unless they're traversed. That way I don't have to explicitly assert that everything that I need accessed was during the test, and the test will simply fail if the algorithm doesn't traverse in the correct way. But this only works if such test runs to completion and the nodes are dropped normally. Any failing assert before causes a double panic.

@jyn514
Copy link
Member

jyn514 commented Sep 21, 2021

@darthdeus #87323

@jyn514
Copy link
Member

jyn514 commented Apr 26, 2023

I'm going to close this since the behavior was documented in #85377 and we're not planning to change it.

@jyn514 jyn514 closed this as completed Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants