Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Prevent dispute-coordinator from doing any work before the initial node sync is complete. #6682

Closed
wants to merge 1 commit into from

Conversation

tdimitrov
Copy link
Contributor

@tdimitrov tdimitrov commented Feb 7, 2023

On the first ActiveLeavesUpdate the subsystem queries the runtime to obtain RollingSessionWindow. This often leads to errors because the first leaf update generated by overseer is either the genesis block (when the local database is empty) or the last seen block before the node was stopped. This often leads to NotSupported errors when querying the runtime api.

The mitigation is to pass a SyncOracle instance when constructing dispute coordinator and don't do any work until the full sync is complete.

Related to paritytech/polkadot-sdk#793

…node sync is complete.

On the first `ActiveLeavesUpdate` the subsystem queries the runtime to
obtain `RollingSessionWindow`. This often leads to errors because the
first leaf update generated by overseer is either the genesis block
(when the local database is empty) or the last seen block before the
node was stopped. This often leads to `NotSupported` errors when
querying the runtime api.

The mitigation is to pass a `SyncOracle` instance when constructing
`dispute coordinator` and don't do any work until the full sync is
complete.
@tdimitrov tdimitrov force-pushed the tsv-dispute-coord-wait-for-sync branch from a601ff8 to 0a1cd7f Compare February 7, 2023 14:52
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable
Logs: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/2361367

loop {
match ctx.recv().await? {
FromOrchestra::Signal(OverseerSignal::Conclude) => return Ok(None),
FromOrchestra::Signal(OverseerSignal::ActiveLeaves(update)) => {
if sync_oracle.is_major_syncing() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could be racy:

  1. We receive the signal while still syncing, but we catch up quickly afterwards.
  2. is_major_syncing returns false, although we were syncing when receiving that update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: This PR is old and I have forgotten to close it in time.

Yes, it is. I was relying on the fact that we don't get ActiveLeaves during major sync (unless there is a reorg) so this should be the initial leaf.

But then I quickly realized this is a bad assumption and thought about putting a flag in ActiveLeavesUpdate indicating if we are in initial major sync.

Then @ordian filed #6694 and we decided to fix the problem more generally by blocking signals in overseer during initial major sync.

The result is #6689 which is still work in progress.

@tdimitrov
Copy link
Contributor Author

Closing this in favor of #6689

@tdimitrov tdimitrov closed this Feb 13, 2023
@bkchr bkchr deleted the tsv-dispute-coord-wait-for-sync branch February 13, 2023 21:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants