Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Warn on low connectivity. #3408

Merged
merged 3 commits into from
Jul 5, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 30 additions & 3 deletions node/network/gossip-support/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,12 @@ const LOG_TARGET: &str = "parachain::gossip-support";
// since the last authority discovery resolution failure.
const BACKOFF_DURATION: Duration = Duration::from_secs(5);

/// Duration after which we consider low connectivity a problem.
///
/// Especially at startup low connectivity is expected (authority discovery cache needs to be
/// populated). After one minute of low connectivity a warning seems to be justified.
const STARTUP_PERIOD: Duration = Duration::from_secs(60);
eskimor marked this conversation as resolved.
Show resolved Hide resolved

/// The Gossip Support subsystem.
pub struct GossipSupport {
keystore: SyncCryptoStorePtr,
Expand All @@ -64,6 +70,11 @@ struct State {
// at least a third of authorities the last time.
// `None` otherwise.
last_failure: Option<Instant>,

/// First time we did not reach our connectivity treshold.
eskimor marked this conversation as resolved.
Show resolved Hide resolved
///
/// Will be cleared once we reached it.
first_failure: Option<Instant>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the reason for this value. Especially as you reset it below, meaning this isn't always the first failure.

Wouldn't just a start: Instant be enough to check that we are above the threshold and start with the printing of the warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first failure in a sequence of failures. So the first time we did not connect to enough validators of potentially many attempts. This is most important at startup, but I kept it more generic: Whenever we are not connected to at least 2/3rds of all validators for at least a minute, a warning is printed. I'll clarify the comment.

drahnr marked this conversation as resolved.
Show resolved Hide resolved
}

impl GossipSupport {
Expand Down Expand Up @@ -313,10 +324,26 @@ impl State {

// issue another request for the same session
// if at least a third of the authorities were not resolved
self.last_failure = if failures >= num / 3 {
Some(Instant::now())
if failures >= num / 3 {
let timestamp = Instant::now();
match self.first_failure {
None => self.first_failure = Some(timestamp),
Some(first) if first.elapsed() >= STARTUP_PERIOD => {
tracing::warn!(
eskimor marked this conversation as resolved.
Show resolved Hide resolved
target: LOG_TARGET,
connected = ?(num - failures),
target = ?num,
"Low connectivity"
);

}
Some(_) => {}
}

self.last_failure = Some(timestamp);
} else {
None
self.last_failure = None;
self.first_failure = None;
};

Ok(())
Expand Down