Move fork context management to rust #5521
Conversation
It's not immediately apparent to me how this leads to being able to run background threads, but explicit fine-grained locking is probably good regardless, and I'm sure it's an important step :)
@@ -401,6 +401,14 @@ def _kill(self, kill_sig):
    if self.pid:
      os.kill(self.pid, kill_sig)

  def _noop_fork_context(self, func):
When is this the correct thing to use?
Inlined and moved this docstring into the daemonize pydoc.
src/python/pants/engine/scheduler.py (Outdated)

  def visualize_to_dir(self):
    return self._native.visualize_to_dir

  def to_keys(self, subjects):
    return list(self._to_key(subject) for subject in subjects)

  def pre_fork(self):
    self._native.lib.scheduler_pre_fork(self._scheduler)

  def with_fork_context(self, func):
Can you add a quick pydoc explaining what this is and how it should be used?
I'll refer to the rust docs on the topic (in lib.rs).
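For readers following along, here is a minimal sketch of the fork-context idea, not the actual lib.rs code: the struct and lock names below are invented for illustration. The point is that every lock a forked child must not inherit in a held state gets acquired around the caller's function and released afterward.

use std::sync::Mutex;

// Hypothetical types and field names, for illustration only.
struct ForkContext {
  logging_lock: Mutex<()>,
  graph_lock: Mutex<()>,
}

impl ForkContext {
  // Acquire every guarded resource, run the caller's function (which may fork),
  // and release the guards once it returns.
  fn with_fork_context<F, T>(&self, f: F) -> T
  where
    F: FnOnce() -> T,
  {
    let _logging = self.logging_lock.lock().unwrap();
    let _graph = self.graph_lock.lock().unwrap();
    f()
  }
}

fn main() {
  let ctx = ForkContext {
    logging_lock: Mutex::new(()),
    graph_lock: Mutex::new(()),
  };
  assert_eq!(ctx.with_fork_context(|| 42), 42);
}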
    # Perform the double fork under the fork_context. Three outcomes are possible after the double
    # fork: we're either the original process, the double-fork parent, or the double-fork child.
    # These are represented by parent_or_child being None, True, or False, respectively.
Maybe this could use a tuple(is_original, is_parent) or something more enum-y, rather than a tri-state boolean?

  if parent_or_child:
    ...
  else:
    ...

doesn't read fantastically...
I ache for actual enums... sigh.
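Purely to illustrate the enum-shaped alternative being wished for here (the Python code in this PR keeps the None/True/False encoding), a sketch in Rust: the names are hypothetical and the example assumes the libc crate.

extern crate libc;

use std::io;

// Hypothetical names: the three outcomes of a double fork as explicit variants.
enum ForkOutcome {
  Original,         // the process that initiated the double fork
  DoubleForkParent, // the intermediate child, which normally exits promptly
  DoubleForkChild,  // the re-parented grandchild that becomes the daemon
}

fn double_fork() -> io::Result<ForkOutcome> {
  match unsafe { libc::fork() } {
    -1 => Err(io::Error::last_os_error()),
    0 => {
      // First child: become a session leader, then fork again so that the
      // grandchild can never reacquire a controlling terminal.
      if unsafe { libc::setsid() } == -1 {
        return Err(io::Error::last_os_error());
      }
      match unsafe { libc::fork() } {
        -1 => Err(io::Error::last_os_error()),
        0 => Ok(ForkOutcome::DoubleForkChild),
        _ => Ok(ForkOutcome::DoubleForkParent),
      }
    }
    _ => Ok(ForkOutcome::Original),
  }
}

fn main() -> io::Result<()> {
  // A match over named variants replaces the `if parent_or_child:` tri-state check.
  match double_fork()? {
    ForkOutcome::Original => println!("still the original process"),
    ForkOutcome::DoubleForkParent => std::process::exit(0),
    ForkOutcome::DoubleForkChild => println!("running as the daemonized child"),
  }
  Ok(())
}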
@@ -37,20 +37,16 @@ def _launch_thread(f):
  def _extend_lease(self):
    while 1:
      # Use the fork lock to ensure this thread isn't cloned via fork while holding the graph lock.
      with self.fork_lock:
        self._logger.debug('Extending leases')
Can I have my logging back please? :)
Whoops. Yep.
  ///
  /// Run a function while the pool is shut down, and restore the pool after it completes.
  ///
  pub fn with_shutdown<F, T>(&self, f: F) -> T
Could we do away with the lock entirely by making with_shutdown take &mut self? (I can believe this is impractical, but it would be nice if possible...)
I don't think so, no... we have a reference to the pool via an Arc, and getting a mutable reference into that would require either cloning or something potentially panicky.
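A generic sketch of that constraint (not the engine's actual types): once the pool is shared through an Arc, a &mut self method is only reachable when no other handle exists, so the alternatives are cloning or a panic-prone unwrap; an internal lock avoids both.

use std::sync::Arc;

struct Pool;

impl Pool {
  // If with_shutdown took &mut self, callers would need exclusive access to the Pool.
  fn with_shutdown_mut(&mut self) {}
}

fn main() {
  let mut pool = Arc::new(Pool);
  let other_handle = Arc::clone(&pool);

  // While another handle is alive, Arc::get_mut refuses to hand out &mut Pool,
  // so there is no safe, non-panicky way to reach a &mut self method through the Arc.
  assert!(Arc::get_mut(&mut pool).is_none());

  // Exclusive access only becomes available once every other handle is dropped.
  drop(other_handle);
  Arc::get_mut(&mut pool).unwrap().with_shutdown_mut();
}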
@@ -190,6 +201,8 @@ pub fn unsafe_call(func: &Function, args: &[Value]) -> Value {
/////////////////////////////////////////////////////////////////////////////////////////

lazy_static! {
  // NB: Unfortunately, it's not currently possible to merge these locks, because mutating
Nice comment :)
Just out of curiosity, is running everything to see if the behavior remains the same the best way to test something like this?
@@ -448,32 +456,43 @@ def daemonize(self, pre_fork_opts=None, post_fork_parent_opts=None, post_fork_ch
    daemons. Having a disparate umask from pre-vs-post fork causes files written in each phase to
    differ in their permissions without good reason - in this case, we want to inherit the umask.
    """
    fork_context = fork_context or self._noop_fork_context

    def double_fork():
Is it worth explaining in this context why double forking is necessary?
It's explained in the comment above.
I missed that! Sorry!
Yea, basically. Kris has added a lot of tests to cover daemon use cases, so we can be pretty confident that nothing is fundamentally broken.
Looks great :) Thanks!
I'm going to hold onto this branch, but the direction we're headed in will no longer require forking.
I believe that this is related to #6356, so I'm re-opening it in order to push a rebase and resume progress.
This rebased version of the patch has the same "shape" as the old version (…).
…_context, which acquires all relevant resources.
I didn't see a clear way to do "composition of a bunch of resettable objects" without running into object safety, so planning to land this as is.
  self.fs_pool.reset();

  fn with_shutdown(&self, f: &mut FnMut() -> ()) {
    // TODO: Although we have a Resettable<CpuPool>, we do not shut it down, because our caller
    // will (and attempting to shut things down twice guarantees a deadlock because Resettable is
Rather than deadlock, you'd actually panic: RwLock panics on reentrance.
Thanks!
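A tiny stand-in for the type, just to show the hazard under discussion; this sketch makes no claim about the real Resettable API in the engine crate.

use std::sync::RwLock;

// Illustrative stand-in: with_reset holds a write lock for the duration of
// the caller's shutdown-and-restart logic.
struct Resettable<T> {
  inner: RwLock<T>,
}

impl<T> Resettable<T> {
  fn new(t: T) -> Resettable<T> {
    Resettable { inner: RwLock::new(t) }
  }

  fn with_reset<F>(&self, f: F)
  where
    F: FnOnce(),
  {
    let _guard = self.inner.write().unwrap();
    f();
  }
}

fn main() {
  let pool = Resettable::new(());
  pool.with_reset(|| {
    // Calling pool.with_reset(..) again here would re-acquire the same RwLock on
    // the same thread, which std documents as either a deadlock or a panic; that
    // is why shutting the same resource down twice has to be avoided.
  });
}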
Problem

As described in #6356, we currently suspect that there are cases where resources within the engine are being used during a fork. The python-side fork_lock attempts to approximate a bunch of other locks which it would be more accurate to directly acquire instead.

Solution

Move "fork context" management to rust, and execute our double fork for DaemonPantsRunner inside the scheduler's fork context. This acquires all existing locks, which removes the need for a fork_lock that would approximate those locks. It also has the benefit that we can eagerly re-start the scheduler's CpuPool.

Result

It should be easier to add additional threads and locks on the rust side, without worrying that we have acquired the fork_lock in enough places.

A series of replays of our internal benchmarks no longer reproduce the hang described in #6356, so this likely fixes #6356.