Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Better progress reporting for stage checkpoints #2982

Merged
merged 13 commits into from
Jun 5, 2023
Merged

Conversation

gakonst
Copy link
Member

@gakonst gakonst commented Jun 5, 2023

Main benefit is that progress reporting is now normalized for all stages by the cost unit for that stage. This allows better node CLI UX, % completion metrics etc. Next step would be better performance by normalizing the load on db commits.

We introduced two new metrics for sync: reth_sync_entities_processed and reth_sync_entities_total. By dividing them, we can get a percentage progress for each stage.

  • Headersprocessed is the number of downloaded headers, total is the number of total headers since genesis with respect to the known remote tip.
  • Bodies – reported as a block number, no change.
  • SenderRecoveryprocessed is the number of entries in the TxSenders table, total is the number of entries in the Transactions table.
  • TotalDifficulty – reported as a block number, no change.
  • (Account|Storage)Hashingprocessed is the number of entries in the Hashed(Account|Storage) table, total is the number of entries in the Plain(Account|Storage)State table.
  • Index(Account|Storage)Historyprocessed is the number of walked changesets, total is the number of total changesets in the (Account|Storage)ChangeSet table.
  • Merkleprocessed is the number of walked hashed accounts/storages, total is the number of entries in the HashedAccount and HashedStorage tables.
  • Executionprocessed is the amount of gas executed, total is the total amount of gas from headers.
  • TransactionLookupprocessed is the number of entries in the TxHashNumber table, total is the number of entries in the Transactions table.
  • Finish – N/A

We also need to be careful with the situation when we do an initial sync using Pipeline, then keep up with the network using BlockchainTree, then do a Pipeline sync again, because we don't go through stages during the BlockchainTree phase, hence no updates to checkpoint progresses is done beyond block numbers.
Currently, the solution is for stages to be able to detect the validity of checkpoint according to block range with which it was last ran. If it's invalid, we recalculate the whole checkpoint progress or the missing part, comparing the block ranges.

@gakonst gakonst changed the title Feat/checkpoints Better progress reporting for stage checkpoints Jun 5, 2023
@gakonst gakonst changed the title Better progress reporting for stage checkpoints feat: Better progress reporting for stage checkpoints Jun 5, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jun 5, 2023

Codecov Report

Merging #2982 (32c0618) into main (6fab79b) will increase coverage by 0.13%.
The diff coverage is 85.06%.

@@            Coverage Diff             @@
##             main    #2982      +/-   ##
==========================================
+ Coverage   71.04%   71.17%   +0.13%     
==========================================
  Files         519      519              
  Lines       67318    68077     +759     
==========================================
+ Hits        47824    48457     +633     
- Misses      19494    19620     +126     
Flag Coverage Δ
integration-tests 17.08% <0.84%> (-0.19%) ⬇️
unit-tests 66.13% <84.85%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
bin/reth/src/args/stage_args.rs 0.00% <ø> (ø)
bin/reth/src/stage/drop.rs 2.46% <0.00%> (-0.57%) ⬇️
bin/reth/src/stage/run.rs 1.43% <0.00%> (-0.07%) ⬇️
crates/interfaces/src/db.rs 100.00% <ø> (ø)
crates/stages/src/stage.rs 94.23% <ø> (ø)
crates/storage/db/src/abstraction/mock.rs 0.00% <0.00%> (ø)
crates/storage/db/src/tables/mod.rs 0.00% <ø> (ø)
crates/trie/src/progress.rs 0.00% <ø> (ø)
crates/stages/src/pipeline/sync_metrics.rs 72.41% <12.50%> (-15.09%) ⬇️
crates/stages/src/pipeline/mod.rs 90.50% <77.77%> (-0.02%) ⬇️
... and 15 more

... and 9 files with indirect coverage changes

Copy link
Member

@onbjerg onbjerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, next step is allowing stages to report progress in between commits, e.g. the execution/header/bodies stages could say how much they have processed every x seconds and not only when they commit (since these take a long time between commits)

}
}

// The function proceeds as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rewrite as doc comment so it shows up in tooltips etc

}
}

// The function proceeds as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shekhirin pls do in followup

@onbjerg onbjerg added C-enhancement New feature or request A-staged-sync Related to staged sync (pipelines and stages) labels Jun 5, 2023
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we/can we remove the SyncStageProgress table, since the mid-block checkpointing info is now stored in SyncStage?

Copy link
Collaborator

@shekhirin shekhirin Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, the Merkle stage still uses it for the execute checkpoints. We couldn't merge it into the SyncStage straight away because it would force all the structs depending on StageCheckpoint to be Clone. It's not a breaking DB change to do it later because adding a new enum variant to StageUnitCheckpoint is backward-compatible.

Merged via the queue into main with commit 0890074 Jun 5, 2023
@gakonst gakonst deleted the feat/checkpoints branch June 5, 2023 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-staged-sync Related to staged sync (pipelines and stages) C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants