Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Interrupted falling out of thread crash #11665

Merged
merged 6 commits into from
Oct 16, 2024

Conversation

roberth
Copy link
Member

@roberth roberth commented Oct 10, 2024

Motivation

Context

What I missed is that threads will crash the process if they end in an uncaught exception.
Thank you @lf- for pinging me about this bug!

This does not need a backport unless 2.25 is released first somehow.

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@roberth roberth force-pushed the fix-Interrupted-falling-out-of-thread branch from 0f8de62 to b6d157d Compare October 10, 2024 09:00
@roberth roberth added the bug label Oct 10, 2024
@roberth roberth changed the title Fix interrupted falling out of thread Fix Interrupted falling out of thread crash Oct 10, 2024
@roberth roberth force-pushed the fix-Interrupted-falling-out-of-thread branch from b6d157d to a88068f Compare October 10, 2024 09:09
@@ -1504,7 +1505,7 @@ void LocalDerivationGoal::startDaemon()

daemonThread = std::thread([this, store]() {

while (true) {
loopUntilInterrupted([this, store]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really clear to me why we don't just call checkInterrupt() in the while loop.

The issue of having an uncaught exception is orthogonal to the interruption issue. It's better to have an appropriate try { ... } catch (...) { ... } at top-level in the thread.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an alternate fix, 31fb9a4

See also my accidental quote reply #11665 (comment)

return Continue;
});
// We've been interrupted or there's no more work to do; either way we
// must return now.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this create the possibility that if there is an interrupt, the ThreadPool will return successfully? I.e. pool.process() will return without an error, even though not all of the work items have been executed. And then (for instance) the entire Nix command could succeed if the calling thread doesn't check for interrupts.

@edolstra
Copy link
Member

Hm, it's not clear to me where ThreadPool could have an uncaught Interrupted exception. It catches all exceptions from the work item (try { w(); } catch (...) { exc = std::current_exception(); }) and I don't see where else an Interrupted could be thrown.

Same with the daemonThread, though that definitely has an uncaught exception issue if accept() fails.

@roberth roberth force-pushed the fix-Interrupted-falling-out-of-thread branch from a88068f to 7c337e4 Compare October 10, 2024 11:30
@roberth
Copy link
Member Author

roberth commented Oct 10, 2024

Hm, it's not clear to me where ThreadPool could have an uncaught Interrupted exception. It catches all exceptions from the work item (try { w(); } catch (...) { exc = std::current_exception(); }) and I don't see where else an Interrupted could be thrown.

Same with the daemonThread, though that definitely has an uncaught exception issue if accept() fails.

It would be thrown by ignoreExceptionExceptInterrupt. I guess that's not so obvious, having two layers of weirdness in the name. I think we should change ignoreException -> reportException, ExceptInterrupt -> RethrowInterrupt.

@roberth
Copy link
Member Author

roberth commented Oct 10, 2024

Also the ExceptInterrupt or RethrowInterrupt part is a bit suggestive of the wrong assumptions, given that it's currently often used in contexts where Interrupt isn't caught. That part - while otherwise harmless - is only relevant when catching any of its superclasses or ..., so realistically: std::exception or nix::BaseError.

So we might prefer to "inline" its behavior into the catch clauses where it matters, and rely on reviews or perhaps linting to keep overly broad catch clauses out of the code. Wdyt?

@lf-
Copy link
Member

lf- commented Oct 10, 2024

So we might prefer to "inline" its behavior into the catch clauses where it matters, and rely on reviews or perhaps linting to keep overly broad catch clauses out of the code. Wdyt?

Commentary from the Lix side: we have nuked the most egregious catch in nix (which is std::exception falling out of commands not being an immediate crash with core dump) and caused it to call std::terminate which then prints a stack trace and the exception type id in a custom handler. In general these catch clauses are evil and severely impede debugging: if you have an unknown exception in the daemon you are just screwed, because there's a pile of other exceptions thrown in normal program operation so breakpoints on __cxa_throw or so hit dozens of false positives.

(granted printing a stack trace still isn't useful because it's not the stack trace of the throw so it doesn't make the one in main much less bad but it does make every other less broad one less evil; see the library cpptrace for one possible solution to that)

You might want to do similarly.

@roberth roberth force-pushed the fix-Interrupted-falling-out-of-thread branch from 7c337e4 to 47e3e7a Compare October 13, 2024 11:20
@roberth
Copy link
Member Author

roberth commented Oct 13, 2024

I've got rid of the loopUntilInterrupted HOF, which I only ended up using in one case, where it wasn't clear that Interrupted could even be raised.

This is a nice and short diff now.

I think @lf- is making a good point, and I'd focus on removing/replacing bad catch clauses in future fixes and cleanups.
I'll follow this PR up with renames that clarify the behavior of these so-called ignore functions, but keep this diff small.

@edolstra
Copy link
Member

@roberth Looks like you need to add an #include "signals.hh".

Otherwise, if checkInterrupt() in any of the supported store operations
would catch onto a user interrupt, the exception would bubble to the thread
start and be handled by std::terminate(): a crash.
Introduced in 8f6b347 without explanation.

Throwing anything that's not that is a programming mistake that we don't want
to ignore silently. A crash would be ok, because that means we/they can fix
the offending throw.
@roberth roberth force-pushed the fix-Interrupted-falling-out-of-thread branch 2 times, most recently from 7c337e4 to fd8a4a8 Compare October 16, 2024 15:57
src/libutil/thread-pool.cc Outdated Show resolved Hide resolved
Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com>
@roberth roberth merged commit f51974d into NixOS:master Oct 16, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants