Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1828066 - At shutdown block with a timeout and bail out if that fails. #2461

Merged
merged 2 commits into from
May 10, 2023

Conversation

badboy
Copy link
Member

@badboy badboy commented May 3, 2023

No description provided.

@badboy badboy requested a review from chutten May 3, 2023 12:50
@badboy
Copy link
Member Author

badboy commented May 3, 2023

What happens in the bug?

Crash report: https://crash-stats.mozilla.org/report/index/47e42dbc-f99c-474f-8154-bbf470230413#allthreads

main thread hangs in glean_core::dispatcher::global::block_on_queue() during shutdown (code).

thread glean.init hangs in dispatcher::flush_init() (code)
Assumption: some task scheduled a ping?

thread glean.dispatcher triggered handle_client_active(), thus submitting a ping synchronously.
This hangs in std::sys::windows::fs::rename().

thread glean.upload is waiting for viaduct to make a request (why?)

This fix

We only briefly wait for the dispatcher queue to empty. If it currently is running a task that takes too long, we just skip any further shutdown cleanup.

Guarantees?
We only wait up to 5s on the dispatcher.
If the dispatcher unblocks quicker we continue.
However we then shut down the dispatcher, which blocks again. In theory there's a window where new tasks get put on the dispatcher and could block for longer.
We also keep the blocking on the uploader shutdown (up to 30s), as well as persisting ping lifetime data (itself I/O, so could block).
So really we do catch this one instance where somehow a ping submission is in progress when shutting down, but taking long.

The 5s timeout is currently hardcoded. Should we make this compile-time/runtime configurable?

glean-core/src/dispatcher/mod.rs Show resolved Hide resolved
glean-core/src/dispatcher/mod.rs Outdated Show resolved Hide resolved
glean-core/src/lib.rs Show resolved Hide resolved
@badboy badboy force-pushed the 1828066/timeout-on-shutdown branch 2 times, most recently from 968d39d to 3288f48 Compare May 5, 2023 09:44
@badboy badboy marked this pull request as ready for review May 5, 2023 09:48
@badboy badboy requested a review from a team as a code owner May 5, 2023 09:48
@badboy badboy force-pushed the 1828066/timeout-on-shutdown branch from 3288f48 to a389965 Compare May 5, 2023 09:49
@badboy badboy requested a review from chutten May 9, 2023 09:09
type: timing_distribution
time_unit: millisecond
description: |
Time waited for the dispatcher to unblock during shutdown.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've recently learned the benefit of knowing timeouts within wait timing distributions, perhaps include a note that we expect most samples to be less than the default timeout of 10s

@badboy badboy force-pushed the 1828066/timeout-on-shutdown branch from a389965 to 4bae59a Compare May 10, 2023 08:38
@badboy badboy enabled auto-merge (rebase) May 10, 2023 08:41
@badboy badboy merged commit ce37542 into main May 10, 2023
@badboy badboy deleted the 1828066/timeout-on-shutdown branch May 10, 2023 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants