This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Validators finality sometimes stalling on new sessions #923

Closed
ddorgan opened this issue Mar 21, 2020 · 6 comments
Assignees
Labels
U1-asap No need to stop dead in your tracks, however issue should be addressed as soon as possible.

Comments

@ddorgan
Member

ddorgan commented Mar 21, 2020

Some validators occasionally experience finality stalls at the start of a new session. These nodes are less well connected than the validators that do not show the issue, but they should still have enough sentry nodes to operate.

I've posted the logs in chat (Parity access only).

@ddorgan added the U1-asap label Mar 21, 2020
@arkpar
Member

arkpar commented Mar 31, 2020

I don't see any syncing issues in the logs. @andresilva Could you check what's going on with finality?

@andresilva
Contributor

The issue seems to be that sentries don't give their reserved peers preferential treatment when gossiping. Peer selection is essentially random, so the sentry might not select one of its reserved peers, and eventually that peer starts falling behind because it doesn't get the latest vote data from anywhere else. The fewer sentry nodes a validator has, the more likely this is to happen, which is why we're seeing it more often on the AWS region validators.

We rolled out a hacky fix to our AWS nodes to see if this was indeed the cause, and it seems to have worked so far: we are no longer observing these hourly stalls.
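
To illustrate the failure mode described in this comment, here is a minimal Python sketch. It is not the actual Substrate gossip code: the peer names, the fan-out of 4, and both selection functions are hypothetical. It only compares how often a single reserved peer is skipped under uniformly random selection versus a policy that always includes reserved peers.

```python
import random


def pick_random(peers, k, rng):
    """Pick k gossip targets uniformly at random, ignoring reserved status."""
    return set(rng.sample(sorted(peers), k))


def pick_reserved_first(peers, reserved, k, rng):
    """Always include connected reserved peers, then fill the rest at random."""
    chosen = set(reserved) & set(peers)
    others = sorted(set(peers) - chosen)
    fill = max(0, k - len(chosen))
    chosen |= set(rng.sample(others, min(fill, len(others))))
    return chosen


def miss_rate(select, rounds, rng):
    """Fraction of gossip rounds in which the validator was not selected."""
    misses = sum(1 for _ in range(rounds) if "validator" not in select(rng))
    return misses / rounds


# Hypothetical sentry with 24 ordinary peers and one reserved validator.
peers = {f"peer-{i}" for i in range(24)} | {"validator"}
rng = random.Random(0)  # fixed seed so repeated runs behave the same

random_rate = miss_rate(lambda r: pick_random(peers, 4, r), 1000, rng)
reserved_rate = miss_rate(
    lambda r: pick_reserved_first(peers, {"validator"}, 4, r), 1000, rng
)
```

With 25 peers and a fan-out of 4, uniform selection is expected to skip the validator in 21/25 = 84% of rounds, while the reserved-first policy never skips it. A validator behind only one or two such sentries can therefore go several rounds without fresh vote data.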

@arkpar closed this as completed Apr 9, 2020
@ddorgan
Member Author

ddorgan commented Apr 9, 2020

@arkpar is this fixed in master? We still have nodes alerting quite frequently on this and would like to test the fix.

@andresilva
Contributor

Yes, master should have the proper fix that I mentioned above. Keep in mind that the --sentry flag now takes a list of validator node addresses, so it's no longer necessary to add them as reserved nodes.
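
As a rough illustration of the flag change, the following Python sketch assembles the arguments a sentry might be started with. Only the --sentry flag itself comes from this thread. The helper name, the multiaddr format check, the assumption that addresses follow the flag as separate arguments, and the placeholder address are all hypothetical, so check the node's --help output for the exact syntax.

```python
def sentry_args(validator_addrs):
    """Build hypothetical CLI arguments for a sentry guarding the given validators."""
    if not validator_addrs:
        raise ValueError("a sentry needs at least one validator address")
    for addr in validator_addrs:
        # Assumption: each address is a multiaddr that carries the validator's peer ID.
        if not addr.startswith("/") or "/p2p/" not in addr:
            raise ValueError(f"expected a multiaddr with a /p2p/ component: {addr}")
    # Assumption: the addresses are passed as separate values after the flag.
    return ["--sentry", *validator_addrs]


# Placeholder address; the IP, port, and peer ID are made up for illustration.
args = sentry_args(["/ip4/10.0.0.5/tcp/30333/p2p/VALIDATOR_PEER_ID"])
```

Under these assumptions the validators no longer need to be listed separately as reserved nodes, since the sentry learns about them from the flag.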

@ddorgan
Member Author

ddorgan commented Apr 9, 2020

Perfect thanks!

HCastano added a commit to HCastano/polkadot that referenced this issue Apr 27, 2021
f95cc7a5 Merge branch 'master' into hc-add-wococo-support
a13ee0bc Bump Substrate (paritytech#939)
f8680cbf jsonrpsee alpha6 (paritytech#938)
6163bcbf reonnect to failed client in on-demand relay background task (paritytech#936)
14e82bea Do not spawn additional task for on-demand relays (paritytech#933)
b1557b88 Relay at least one header for every source chain session (paritytech#923)
9420649c Remove deprecated Runtime Header APIs (paritytech#932)
9627011e Update README.md (paritytech#931)
7b736b9c Truncate output in logs. (paritytech#930)
faad06e3 Make sure that relayers have dates in logs. (paritytech#927)
07734535 Update dump-logs script. (paritytech#928)
efe215e4 RustFmt
02522249 Fix test
48b41d82 Add support for relaying headers between Rococo and Wococo
f16b6b41 Add CLI support for initializing the Wococo<>Rococo bridge
8c6e6443 Add more Wococo boilerplate code
c2d56b2e Add pruning to bechmarks & update weights. (paritytech#918)
a30c51dc Add properties to Chain Spec (paritytech#917)
28d3ed2f Add Wococo primitives crate
d691c73e Fix issue with on-demand headers relay not starting (paritytech#921)
8ee55c1e Fix image publishing. (paritytech#922)
f51fb59d Prefix in relay loops logs (paritytech#920)

git-subtree-dir: bridges
git-subtree-split: f95cc7a57b48948f17d33f5be3ea01c752deba94
HCastano added a commit that referenced this issue Apr 29, 2021
801c99f3 Add Wococo<>Rococo Header Relayer (#925)
21f49051 Remove Westend<>Rococo header sync (#940)
06235f16 do not panic if pallet is not yet initialized (#937)
a13ee0bc Bump Substrate (#939)
f8680cbf jsonrpsee alpha6 (#938)
6163bcbf reonnect to failed client in on-demand relay background task (#936)
14e82bea Do not spawn additional task for on-demand relays (#933)
b1557b88 Relay at least one header for every source chain session (#923)
9420649c Remove deprecated Runtime Header APIs (#932)
9627011e Update README.md (#931)
7b736b9c Truncate output in logs. (#930)
faad06e3 Make sure that relayers have dates in logs. (#927)
07734535 Update dump-logs script. (#928)
c2d56b2e Add pruning to bechmarks & update weights. (#918)
a30c51dc Add properties to Chain Spec (#917)
d691c73e Fix issue with on-demand headers relay not starting (#921)
8ee55c1e Fix image publishing. (#922)
f51fb59d Prefix in relay loops logs (#920)

git-subtree-dir: bridges
git-subtree-split: 801c99f3de0fa4d0b61e4e065fa30817179368ea
tomusdrw added a commit that referenced this issue May 3, 2021
f43c92430 Fix account derivation in CLI (#952)
9ac07e733 Add backbone configuration of cargo-spellcheck (#924)
2761c3fef Message dispatch support multiple instances (#942)
801c99f3d Add Wococo<>Rococo Header Relayer (#925)
21f490514 Remove Westend<>Rococo header sync (#940)
06235f162 do not panic if pallet is not yet initialized (#937)
a13ee0bc3 Bump Substrate (#939)
f8680cbfc jsonrpsee alpha6 (#938)
6163bcbf4 reonnect to failed client in on-demand relay background task (#936)
14e82bea3 Do not spawn additional task for on-demand relays (#933)
b1557b882 Relay at least one header for every source chain session (#923)
9420649c1 Remove deprecated Runtime Header APIs (#932)
9627011e1 Update README.md (#931)
7b736b9cc Truncate output in logs. (#930)
faad06e39 Make sure that relayers have dates in logs. (#927)
077345351 Update dump-logs script. (#928)
c2d56b2e9 Add pruning to bechmarks & update weights. (#918)
a30c51dc9 Add properties to Chain Spec (#917)
d691c73e9 Fix issue with on-demand headers relay not starting (#921)
8ee55c1e1 Fix image publishing. (#922)
f51fb59d0 Prefix in relay loops logs (#920)

git-subtree-dir: bridges
git-subtree-split: f43c924301c227d29ec161f6815d9bac458a211d
HCastano added a commit that referenced this issue May 4, 2021
b2099c5c Bump Substrate to `b094edaf` (#958)
3f037094 Bump endowment amounts on Rialto and Millau (#957)
b21fd07c Bump Substrate WASM builder (#947)
30ccd07c Bump Substrate to `ec180313` (#955)
a7422ab1 Upgrade to GitHub-native Dependabot (#945)
ed20ef34 Move pallet-bridge-dispatch types to primitives (#948)
2070c4d6 Endow accounts and add `bridgeIds` to chainspec. (#951)
f43c9243 Fix account derivation in CLI (#952)
9ac07e73 Add backbone configuration of cargo-spellcheck (#924)
2761c3fe Message dispatch support multiple instances (#942)
801c99f3 Add Wococo<>Rococo Header Relayer (#925)
21f49051 Remove Westend<>Rococo header sync (#940)
06235f16 do not panic if pallet is not yet initialized (#937)
a13ee0bc Bump Substrate (#939)
f8680cbf jsonrpsee alpha6 (#938)
6163bcbf reonnect to failed client in on-demand relay background task (#936)
14e82bea Do not spawn additional task for on-demand relays (#933)
b1557b88 Relay at least one header for every source chain session (#923)
9420649c Remove deprecated Runtime Header APIs (#932)
9627011e Update README.md (#931)
7b736b9c Truncate output in logs. (#930)
faad06e3 Make sure that relayers have dates in logs. (#927)
07734535 Update dump-logs script. (#928)
c2d56b2e Add pruning to bechmarks & update weights. (#918)
a30c51dc Add properties to Chain Spec (#917)
d691c73e Fix issue with on-demand headers relay not starting (#921)
8ee55c1e Fix image publishing. (#922)
f51fb59d Prefix in relay loops logs (#920)

git-subtree-dir: bridges
git-subtree-split: b2099c5c0baf569e2ec7228507b6e4f3972143cc