Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in std::sync::mpmc::waker::SyncWaker::notify #113726

Closed
arthurprs opened this issue Jul 15, 2023 · 8 comments · Fixed by #113861
Closed

Segfault in std::sync::mpmc::waker::SyncWaker::notify #113726

arthurprs opened this issue Jul 15, 2023 · 8 comments · Fixed by #113861
Labels
A-atomic Area: Atomics, barriers, and sync primitives C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@arthurprs
Copy link
Contributor

arthurprs commented Jul 15, 2023

I have code that resembles the following in my application. I'm trying to write a reproduction code, but so far no luck.

let (tx, rx) = mpsc::sync_channel(64);
let tx = Arc::new(tx);

// Rx thread
loop {
  match rx.recv_timeout(wake_up_interval) {
      Err(sync::mpsc::RecvTimeoutError::Timeout) | Ok(()) => (),
      Err(sync::mpsc::RecvTimeoutError::Disconnected) => unreachable!(),
  }
  ...
}

// Tx thread(s)
tx.try_send(()); // this segfaults

I expected to see this happen:

That it works reliably

Instead, this happened:

Very rarely I get a segfault, but when I do, gdb always shows the following stack trace

core::sync::atomic::AtomicUsize::fetch_sub () at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/core/src/sync/atomic.rs:2540
2540	                unsafe { atomic_sub(self.v.get(), val, order) }
#0  core::sync::atomic::AtomicUsize::fetch_sub () at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/core/src/sync/atomic.rs:2540
#1  alloc::sync::{impl#27}::drop<std::sync::mpmc::context::Inner> (self=0x7fffabbfa708) at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/alloc/src/sync.rs:1862
#2  0x000055555560d59a in core::ptr::drop_in_place<alloc::sync::Arc<std::sync::mpmc::context::Inner>> () at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/core/src/ptr/mod.rs:497
#3  0x000055555560bc8a in core::ptr::drop_in_place<std::sync::mpmc::context::Context> () at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/core/src/ptr/mod.rs:497
#4  0x000055555560b8ba in core::ptr::drop_in_place<std::sync::mpmc::waker::Entry> () at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/core/src/ptr/mod.rs:497
#5  0x000055555560d9c5 in core::ptr::drop_in_place<core::option::Option<std::sync::mpmc::waker::Entry>> () at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/core/src/ptr/mod.rs:497
#6  0x000055555583298f in std::sync::mpmc::waker::SyncWaker::notify (self=0x7fff901ca8c0) at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/std/src/sync/mpmc/waker.rs:170
#7  0x00005555555f9869 in std::sync::mpmc::array::Channel<()>::write<()> (self=0x7fff901ca780, token=0x7fffabbfa8f0, msg=()) at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/std/src/sync/mpmc/array.rs:210
#8  0x00005555555fa472 in std::sync::mpmc::array::Channel<()>::try_send<()> (self=0x7fff901ca780, msg=()) at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/std/src/sync/mpmc/array.rs:309
#9  0x0000555555682fd2 in std::sync::mpmc::Sender<()>::try_send<()> (self=0x7fff900fca80, msg=()) at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/std/src/sync/mpmc/mod.rs:111
#10 0x000055555589963e in std::sync::mpsc::SyncSender<()>::try_send<()> (self=0x7fff900fca80, t=()) at /rustc/ad963232d9b987d66a6f8e6ec4141f672b8b9900/library/std/src/sync/mpsc/mod.rs:739

Meta

rustc --version --verbose:

rustc 1.73.0-nightly (ad963232d 2023-07-14)
binary: rustc
commit-hash: ad963232d9b987d66a6f8e6ec4141f672b8b9900
commit-date: 2023-07-14
host: x86_64-unknown-linux-gnu
release: 1.73.0-nightly
LLVM version: 16.0.5

I could also see it with rustc 1.72.0-nightly (f7ca9df69 2023-06-24)

@arthurprs arthurprs added the C-bug Category: This is a bug. label Jul 15, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 15, 2023
@jyn514 jyn514 added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue. A-atomic Area: Atomics, barriers, and sync primitives labels Jul 15, 2023
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 15, 2023
@jyn514
Copy link
Member

jyn514 commented Jul 15, 2023

relevant code

unsafe { atomic_sub(self.v.get(), val, order) }

@Noratrieb Noratrieb removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 15, 2023
@Noratrieb
Copy link
Member

cc @ibraheemdev @taiki-e

@ibraheemdev
Copy link
Member

@arthurprs does your code contain any unsafe? Waker access is always guarded behind a mutex and the backtrace here actually points to Arc::drop. It seems unlikely that this is a bug in the standard library.

@arthurprs
Copy link
Contributor Author

Yes there are a few instances of unsafe, all guarded by debug assertions and such and this is running in debug mode. But I guess it's possible...

My understanding is that this happens during drop of Entry after a call to try_select here (which has a struct Context { Arc<Inner> } inside, which is what eventually segfaults)

inner.try_select();

Which theoretically was put there by a waiting receiver.

@taiki-e
Copy link
Member

taiki-e commented Jul 15, 2023

Does this problem exist in both std mpsc and crossbeam? Or only in std mpsc?

If the latter, the current std mpsc's try_select appears to be based on crossbeam 0.5.2 or 0.5.3 that is affected by compiler or std or OS bug related to TLS access, so changing it may fix the problem. crossbeam-rs/crossbeam#802

@arthurprs
Copy link
Contributor Author

arthurprs commented Jul 15, 2023

@taiki-e I left my "repro loop" running for like 1h, and it doesn't seem to fail with crossbeam, only std mpsc (within a minute).

Looking at the code I can't tell how the TLS bug would lead to a segfault, but I'm not familiar with the codebase.

@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

If I understand, a MCVE would help. @arthurprs am I correct in assuming the codebase is not publicly available? Also, to clarify, you are not reporting a regression, so there's no "good rustc" that doesn't reproduce the issue, correct?

@rustbot label -I-prioritize +P-high +E-needs-mcve

@rustbot rustbot added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 19, 2023
@bors bors closed this as completed in 6b53175 Jul 20, 2023
@arthurprs
Copy link
Contributor Author

@apiraino that's correct. I tried, but couldn't successfully write a MCVE. What I was using to "reproduce" was calling the app test suite in a bash loop.

github-actions bot pushed a commit to rust-lang/miri that referenced this issue Jul 22, 2023
Avoid tls access while iterating through mpsc thread entries

Upstream fix: crossbeam-rs/crossbeam#802. Possibly fixes rust-lang/rust#113726.
thomcc pushed a commit to tcdi/postgrestd that referenced this issue Oct 17, 2023
Avoid tls access while iterating through mpsc thread entries

Upstream fix: crossbeam-rs/crossbeam#802. Possibly fixes rust-lang/rust#113726.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: Atomics, barriers, and sync primitives C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants