Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

At startup, wait until all previously-observed commits have been sent to the consensus handler. #19344

Merged
merged 1 commit into from
Sep 14, 2024

Conversation

mystenmark
Copy link
Contributor

@mystenmark mystenmark commented Sep 12, 2024

This is necessary for proper crash recovery with data quarantining, as we will need to wait until all previously-built checkpoints have been rebuilt before starting CheckpointExecutor

Copy link

vercel bot commented Sep 12, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 14, 2024 0:10am
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Sep 14, 2024 0:10am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Sep 14, 2024 0:10am
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Sep 14, 2024 0:10am

Copy link
Member

@mwtian mwtian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

) {
let highest_handled_commit = self.highest_handled_commit();
assert!(
highest_observed_commit_at_startup >= highest_handled_commit,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be the case if we want to revert and recreate latest consensus commits, but somehow doesn't need to revert state on the Sui side. But it seems unlikely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm confused how that would be possible, but we can always fix the assert later if that comes up

if highest_handled >= highest_observed_commit_at_startup {
return;
}
rx.changed().await.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe exit the loop instead of panic in case of shutdown order issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think hitting the unwrap is impossible - this method has a &self reference, so the watch::Sender cannot possibly be dropped while we are in this method.

consensus/core/src/commit_consumer.rs Outdated Show resolved Hide resolved

assert!(
*commit == 0,
"highest_known_commit_at_startup can only be set once"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/highest_known_commit_at_startup/highest_observed_commit_at_startup

@@ -203,6 +212,9 @@ where
let store_path = context.parameters.db_path.as_path().to_str().unwrap();
let store = Arc::new(RocksDBStore::new(store_path));
let dag_state = Arc::new(RwLock::new(DagState::new(context.clone(), store.clone())));

let highest_known_commit_at_startup = dag_state.read().last_commit_index();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, optional: highest_observed_commit, and maybe set it closer to its usage?

@mystenmark mystenmark enabled auto-merge (squash) September 14, 2024 00:09
@mystenmark mystenmark merged commit 494f252 into main Sep 14, 2024
48 checks passed
@mystenmark mystenmark deleted the mlogan-consensus-info branch September 14, 2024 00:29
suiwombat pushed a commit that referenced this pull request Sep 16, 2024
… to the consensus handler. (#19344)

This is necessary for proper crash recovery with data quarantining, as
we will need to wait until all previously-built checkpoints have been
rebuilt before starting CheckpointExecutor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants