As we debug broker resiliency for the system instance, it may be useful to be able to force a broker peer to "panic" and restart its subtree, as though its parent had crashed, either for testing or to trigger recovery in a hung instance.
Problem: a misbehaving node may need to be administratively
detached from the flux instance.
Define a keepalive message type of KEEPALIVE_DISCONNECT that can
be sent by parent to child to force a disconnect. Upon receiving
this message, the child disconnects the socket, purges the parent
RPC tracker, and marks the connection offline so future RPCs fail
with EHOSTUNREACH.
Add an RPC overlay.disconnect-subtree that takes a rank argument,
so that a system administrator can initiate teardown of a problem
node.
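On the parent side, the handler for overlay.disconnect-subtree mainly has to map the rank argument to a connected child and send it KEEPALIVE_DISCONNECT. The sketch below is hypothetical throughout: `struct child`, `struct overlay`, `disconnect_subtree()`, the restriction to direct children, and the errno choices (EINVAL, ENOENT, EHOSTUNREACH) are illustrative assumptions, and the actual message send is reduced to recording the type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical keepalive types; the real flux-core enum may differ. */
enum keepalive_type {
    KEEPALIVE_STATUS = 0,
    KEEPALIVE_HEARTBEAT = 1,
    KEEPALIVE_DISCONNECT = 2,
};

/* Hypothetical record for one directly connected child broker. */
struct child {
    int rank;
    int online;
    int last_sent;  /* last keepalive type sent, or -1 if none */
};

struct overlay {
    struct child *children;
    size_t nchildren;
};

static struct child *child_lookup (struct overlay *ov, int rank)
{
    for (size_t i = 0; i < ov->nchildren; i++)
        if (ov->children[i].rank == rank)
            return &ov->children[i];
    return NULL;
}

/* Stand-in for sending a keepalive control message down to a child. */
static void keepalive_send (struct child *c, enum keepalive_type type)
{
    c->last_sent = type;
}

/* Hypothetical body of the overlay.disconnect-subtree handler.
 * Returns 0 on success, or -1 with errno set. */
static int disconnect_subtree (struct overlay *ov, int rank)
{
    struct child *c;

    assert (ov != NULL);
    if (rank < 0) {
        errno = EINVAL;
        return -1;
    }
    if (!(c = child_lookup (ov, rank))) {
        errno = ENOENT;         /* not a direct child of this broker */
        return -1;
    }
    if (!c->online) {
        errno = EHOSTUNREACH;   /* nothing connected to disconnect */
        return -1;
    }
    keepalive_send (c, KEEPALIVE_DISCONNECT);
    return 0;
}
```

Limiting the request to direct children keeps the teardown with the broker that owns the child's socket; disconnecting a rank deeper in the tree would require forwarding the request to its parent, which this sketch does not model.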
Fixes flux-framework#3805
This may be useful in conjunction with #2797.