Healing Mechanism for Flat Database in Besu #5319

Merged
merged 49 commits into from
Jun 13, 2023

@matkt matkt commented Apr 7, 2023

PR description

WARNING: Experimental Feature
This functionality is currently experimental and its stability cannot be guaranteed.
We have opened this feature to allow users to help us test it in order to transition it
to a non-experimental state as quickly as possible. Therefore, please use it only for testing purposes.

Sync time

I ran several tests; this is the time I needed to sync on an m6A.2xlarge:

Finished worldstate snapsync with nodes 322101439 (healed=8281097) duration 14:17:0,69

For non-developers:

This feature aims to improve the performance of your node by enabling faster block processing time. It achieves this by reducing the number of disk accesses during block processing. The cost of this feature is a slightly larger database size and a slightly longer sync time (an additional 2 hours on m6.2xlarge).

Regarding storage, the increase varies depending on your database. However, with the current size of the world state, we anticipate a maximum increase of 55 GB.

Account flat db : 10 GiB (ACCOUNT_INFO_STATE)
Storage flat db : 43 GiB (ACCOUNT_STORAGE_STORAGE)

Regarding sync time, we are also working on other optimizations to reduce the overall sync time, so this will be quickly offset.

Because block processing will be faster, the chance of missing attestations should drop further, improving your validator's performance.

You need to run a Bonsai Besu node with this flag --Xsnapsync-synchronizer-flat-db-healing-enabled=true and resync the worldstate.

To do that, you can either delete your database and resync from scratch, or call the following RPC endpoint on an already synced Besu node. This RPC call triggers a resync of the worldstate only, without downloading all the blocks again, so the sync is faster. Sometimes you need to restart Besu after the call for the resync to actually start.

curl --location --request POST 'http://localhost:8545' \
--header 'Content-Type: application/json' \
--data-raw '{
    "jsonrpc": "2.0",
    "method": "debug_resyncWorldState",
    "params": [],
    "id": 1
}' 
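
The same call can be scripted. The sketch below is only an illustration of the curl command above: the method name and URL come from this description, while the helper names `build_resync_request` and `trigger_resync` are made up for the example, and the shape of the response is not assumed.

```python
import json
import urllib.request


def build_resync_request(url="http://localhost:8545"):
    """Build (but do not send) the JSON-RPC request that triggers a worldstate resync."""
    payload = {
        "jsonrpc": "2.0",
        "method": "debug_resyncWorldState",
        "params": [],
        "id": 1,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_resync(url="http://localhost:8545", timeout=10):
    """Send the request to a running node and return the decoded JSON-RPC response."""
    with urllib.request.urlopen(build_resync_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `trigger_resync()` requires a node with the RPC endpoint reachable; as noted above, a restart may still be needed afterwards.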

For developers:

Why?

Besu uses a RocksDB-based database to store the Ethereum state both as a Merkle tree and as a flat database. The flat database contains the leaf nodes of the Merkle tree, allowing direct access through accountHash or slotHash without the need to traverse the tree.
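
To make the difference concrete, here is a toy sketch (not Besu code) of the two access paths. Everything in it is an assumption for illustration: SHA-256 stands in for the real key hash, `ToyTrie` is an uncompressed nibble trie, and a plain dict plays the flat database. A real Merkle Patricia trie is much shallower thanks to path compression, but a lookup still needs several node reads where the flat database needs one.

```python
import hashlib


def key_hash(key: bytes) -> bytes:
    # Stand-in for accountHash / slotHash (illustrative hash choice only)
    return hashlib.sha256(key).digest()


def nibbles(data: bytes):
    # Yield the 4-bit halves of every byte, high nibble first
    for byte in data:
        yield byte >> 4
        yield byte & 0x0F


class ToyTrie:
    """Uncompressed nibble trie that counts how many nodes each lookup touches."""

    def __init__(self):
        self.root = {}
        self.node_reads = 0

    def put(self, hashed_key: bytes, value: bytes) -> None:
        node = self.root
        for nibble in nibbles(hashed_key):
            node = node.setdefault(nibble, {})
        node["value"] = value

    def get(self, hashed_key: bytes):
        node = self.root
        for nibble in nibbles(hashed_key):
            self.node_reads += 1  # one node read per level walked
            node = node.get(nibble)
            if node is None:
                return None
        return node.get("value")


trie = ToyTrie()
flat_db = {}

hashed = key_hash(b"account-1")
trie.put(hashed, b"account-data")
flat_db[hashed] = b"account-data"  # the flat database stores the leaf directly

from_trie = trie.get(hashed)   # walks one node per nibble of the hash
from_flat = flat_db[hashed]    # single keyed read
```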

During the snapsync process, Besu switches pivot blocks multiple times, resulting in a mix of several blocks in the state. While there exists a healing step to correct the tree (similar to fastsync), there was previously no mechanism to heal the flat database. Consequently, it was necessary to clear and rebuild the flat database post-sync, which significantly impacted SLOAD performance.

The proposed pull request introduces a feature that allows healing of the flat database by streaming the flat database data and validating it by generating a proof from the trie structure. If the proof is found to be invalid, the code traverses the trie to fix the invalid range. To optimize the process and avoid checking the entire flat database, the PR includes enhancements such as tracking the accounts that need to be repaired during SnapSync. By implementing these optimizations, the PR aims to significantly reduce the time and resources required for repairing the flat database.

Healing of the flat database runs at the end of SnapSync or Checkpoint Sync. This process is expected to take approximately 2 to 3 hours on mainnet and is designed to improve block processing time by reducing the number of database reads for SLOAD. This repair should make block processing more efficient and improve responsiveness for some RPC calls.

You should see log lines like the following during the healing step:

{"@timestamp":"2023-05-16T17:44:00,413","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 91.31%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:45:00,413","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 92.25%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:46:09,351","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 93.23%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:47:09,351","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 94.18%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:48:55,268","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 94.33%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:49:55,306","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 95.39%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:50:55,306","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 96.54%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:51:55,306","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 97.41%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:52:55,314","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 98.38%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:53:55,316","level":"INFO","thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)","class":"SnapsyncMetricsManager","message":"Worldstate flat database healing progress: 99.57%, Peer count: 25","throwable":""}
{"@timestamp":"2023-05-16T17:54:31,685","level":"INFO","thread":"EthScheduler-Services-36 (requestCompleteTask)","class":"SnapsyncMetricsManager","message":"Finished worldstate snapsync with nodes 312311304 (healed=570567) duration 6:59:24,983.","throwable":""}
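
If you want to follow the progress from a script, the percentage can be pulled out of these JSON log lines. This is a small sketch that assumes the JSON log layout and the message wording shown above; `healing_progress` is a hypothetical helper, not something shipped with Besu.

```python
import json
import re

# Matches the progress message format shown in the log sample above
PROGRESS_PATTERN = re.compile(
    r"Worldstate flat database healing progress: (\d+(?:\.\d+)?)%"
)


def healing_progress(log_line: str):
    """Return the healing progress as a float, or None for any other line."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    match = PROGRESS_PATTERN.search(entry.get("message", ""))
    return float(match.group(1)) if match else None


sample = (
    '{"@timestamp":"2023-05-16T17:44:00,413","level":"INFO",'
    '"thread":"EthScheduler-Services-30 (batchHealAndPersistFlatAccountData)",'
    '"class":"SnapsyncMetricsManager",'
    '"message":"Worldstate flat database healing progress: 91.31%, Peer count: 25",'
    '"throwable":""}'
)
print(healing_progress(sample))  # prints 91.31
```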

Tested

  • Snapsync with healing
  • Stop and resume snapsync
  • Check the entire state with Bela
  • Test auto heal with flat db mode downgrade

Healing Mechanism for Flat Database in Besu

The purpose of the healing mechanism is to ensure a complete flat database after the sync process and eliminate the need for fallbacks. This documentation outlines the steps involved in healing the flat database and improving SLOAD performance.

Healing Process

The healing process for the flat database involves the following steps:

  1. Tree Healing: Before healing the flat database, the Merkle tree is healed using the existing process, ensuring the tree accurately represents the state of Ethereum.

  2. Flat Database Verification: After the tree healing process, the flat database is verified for validity by traversing it in ranges and comparing the data with the Merkle tree. The purpose is to identify any inconsistencies between the flat database and the tree.

    • Define a range within the flat database to be verified.
    • Retrieve the corresponding data from the flat database within the specified range.
    • Generate a range proof for the retrieved data using the Merkle tree.
  3. Healing the Flat Database: If any inconsistencies are found during the verification process, the flat database needs to be corrected. To achieve this, the following steps are performed:

    • Identify the range within the flat database that contains incorrect data.
    • Traverse the corresponding range in the Merkle tree to locate the correct leaf nodes.
    • Replace the incorrect leaf nodes in the flat database with the correct ones obtained from the Merkle tree.
    • Repeat this process for all identified inconsistent ranges within the flat database.
  4. Completing the Healing Process: Once all identified inconsistent ranges have been corrected, the healing process for the flat database is complete. The flat database now accurately reflects the state of Ethereum after the sync process.
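
The four steps above can be sketched as a toy loop. This is not the Besu implementation: hashed keys are shortened to small integers, `trie_leaves` is a plain dict standing in for the already healed Merkle tree, and `range_digest` is a simple hash comparison standing in for real range-proof verification. All names are hypothetical.

```python
import hashlib


def range_digest(entries: dict) -> bytes:
    """Stand-in for range-proof verification: hash the sorted (key, value) pairs."""
    digest = hashlib.sha256()
    for key, value in sorted(entries.items()):
        digest.update(key.to_bytes(32, "big"))
        digest.update(len(value).to_bytes(4, "big") + value)
    return digest.digest()


def entries_in_range(store: dict, start: int, end: int) -> dict:
    """Return the entries whose key falls in [start, end)."""
    return {k: v for k, v in store.items() if start <= k < end}


def heal_flat_db(flat_db: dict, trie_leaves: dict, boundaries: list) -> list:
    """Verify flat_db range by range against trie_leaves and repair bad ranges.

    Returns the list of (start, end) ranges that had to be healed.
    """
    healed = []
    for start, end in zip(boundaries, boundaries[1:]):
        flat_range = entries_in_range(flat_db, start, end)
        trie_range = entries_in_range(trie_leaves, start, end)
        if range_digest(flat_range) == range_digest(trie_range):
            continue  # range is consistent, nothing to do
        # Drop stale or orphaned entries, then copy the correct leaves
        for key in flat_range:
            del flat_db[key]
        flat_db.update(trie_range)
        healed.append((start, end))
    return healed


trie_leaves = {1: b"a", 70: b"b", 130: b"c", 200: b"d"}
# Flat database left over from several pivot blocks: one stale value,
# one missing leaf (130) and one orphan (131)
flat_db = {1: b"a", 70: b"stale", 131: b"orphan", 200: b"d"}
healed_ranges = heal_flat_db(flat_db, trie_leaves, [0, 64, 128, 192, 256])
```

Only the two inconsistent ranges are rewritten; the ranges that already match the tree are left untouched, which is the point of verifying before healing.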

Performance Improvements

The healing mechanism for the flat database provides significant performance improvements, particularly for SLOAD operations and READ ZERO operations. With a complete and accurate flat database, the need for fallbacks to the Merkle tree is eliminated.

Previously, when data was not present in the flat database, Besu had to fall back to the Merkle tree for each SLOAD operation, resulting in multiple database accesses. This fallback was unnecessary and had a significant impact on performance. Similarly, READ ZERO operations incurred redundant fallbacks.

By healing the flat database and ensuring its completeness, the number of fallbacks to the Merkle tree is reduced to 0, resulting in improved SLOAD performance. READ ZERO operations also benefit from the elimination of unnecessary fallbacks.
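
A toy model of the read path shows why completeness matters. It is a sketch under assumptions, not Besu code: `SlotReader` and its fields are hypothetical, and the rule that a miss in a complete flat database can be answered as zero without touching the tree is my reading of the READ ZERO case described above.

```python
ZERO = b"\x00"


class SlotReader:
    """Toy storage reader that counts how often it falls back to the trie."""

    def __init__(self, flat_db: dict, trie_leaves: dict, flat_db_is_complete: bool):
        self.flat_db = flat_db
        self.trie_leaves = trie_leaves
        self.flat_db_is_complete = flat_db_is_complete
        self.trie_fallbacks = 0

    def read_slot(self, slot_hash: str) -> bytes:
        value = self.flat_db.get(slot_hash)
        if value is not None:
            return value
        if self.flat_db_is_complete:
            # A miss in a complete flat database means the slot is empty
            return ZERO
        # Incomplete flat database: a miss could be "empty" or "not copied yet",
        # so the only safe answer comes from the trie
        self.trie_fallbacks += 1
        return self.trie_leaves.get(slot_hash, ZERO)


trie_leaves = {"slot-a": b"\x2a"}

unhealed = SlotReader({}, trie_leaves, flat_db_is_complete=False)
unhealed_values = [unhealed.read_slot("slot-a"), unhealed.read_slot("slot-empty")]

healed = SlotReader(dict(trie_leaves), trie_leaves, flat_db_is_complete=True)
healed_values = [healed.read_slot("slot-a"), healed.read_slot("slot-empty")]
```

Both readers return the same values, but the unhealed one goes to the trie for every read, including the read of an empty slot, while the healed one never does.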

Range Proof Explanation:
A range proof is a cryptographic proof that allows verification of a range of leaf nodes in a Merkle trie. It demonstrates that a specific range of leaf nodes is part of the trie and that their hashes contribute to the root hash.

With a range proof, a verifier can reconstruct the paths from the root to the specified leaf nodes and calculate the root hash. If the calculated root hash matches the actual root hash, the range is considered valid, indicating that the corresponding data in the flat database is correct.
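
The idea can be illustrated on a much simpler structure than the Ethereum trie. The sketch below uses a binary Merkle tree with a power-of-two number of leaves and SHA-256, both chosen only to keep the example short; the real trie is a hexary Merkle Patricia trie and its proofs look different. The proof contains just the sibling hashes at the edges of the range, and the verifier recomputes the root from the claimed leaves plus those hashes.

```python
import hashlib


def leaf_hash(leaf: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves: list) -> list:
    """Return all tree levels, from leaf hashes up to the single root hash."""
    levels = [[leaf_hash(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([node_hash(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def range_proof(levels: list, lo: int, hi: int) -> list:
    """Collect, per level, the sibling hashes just outside leaves lo..hi (inclusive)."""
    proof = []
    for level in levels[:-1]:
        needed = {}
        if lo % 2 == 1:  # left edge is a right child: its left sibling is needed
            needed[lo - 1] = level[lo - 1]
        if hi % 2 == 0:  # right edge is a left child: its right sibling is needed
            needed[hi + 1] = level[hi + 1]
        proof.append(needed)
        lo //= 2
        hi //= 2
    return proof


def verify_range(root: bytes, lo: int, leaves_in_range: list, proof: list) -> bool:
    """Recompute the root from the claimed leaves and the proof, then compare."""
    nodes = {lo + i: leaf_hash(leaf) for i, leaf in enumerate(leaves_in_range)}
    for needed in proof:
        nodes.update(needed)
        # Pair every even-indexed node with its right neighbour
        nodes = {i // 2: node_hash(nodes[i], nodes[i + 1]) for i in sorted(nodes) if i % 2 == 0}
    return nodes.get(0) == root


leaves = [bytes([i]) for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = range_proof(levels, 2, 5)
```

In the healing context, the leaves would come from the flat database and the proof from the tree: if `verify_range` fails for a range, that range of the flat database is the one to repair.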

Block processing performance:
We observe a 20% improvement (274 ms instead of 344 ms at the 50th percentile) on the node running the flat database feature compared to the node running the current main branch.
image

Additionally, outliers (99th and 100th percentiles) improve significantly, which will help the attestation performance affected by such outliers.

image image

CPU profiling:
The improvement is pretty clear when checking the profiling of both nodes, especially on the SLOAD operation

Without this PR (current main)
image

With this PR
image

Fixed Issue(s)

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

github-actions bot commented Apr 7, 2023

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I have considered running ./gradlew acceptanceTestNonMainnet locally if my PR affects non-mainnet modules.
  • I thought about the changelog and included a changelog update if required.

matkt added 10 commits April 7, 2023 11:37
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt added performance mainnet TeamChupa GH issues worked on by Chupacabara Team bonsai labels May 2, 2023
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt force-pushed the feature/try-fill-flat-db branch from a13fece to 35af584 Compare May 17, 2023 15:09
matkt added 5 commits May 17, 2023 17:12
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt force-pushed the feature/try-fill-flat-db branch from a1a1bb3 to a3173ee Compare May 19, 2023 20:34
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
    return false;
  }

  public abstract Stream<SnapDataRequest> getChildRequests(
      final SnapWorldDownloadState downloadState,
      final WorldStateStorage worldStateStorage,
-     final SnapSyncState snapSyncState);
+     final SnapSyncProcessState snapSyncState);

Check notice

Code scanning / CodeQL

Useless parameter

The parameter 'snapSyncState' is never used.
}
}

public HashSet<Bytes> getAccountsToBeRepaired() {

Check failure

Code scanning / CodeQL

Inconsistent synchronization of getter and setter

This get method is unsynchronized, but the corresponding set method is synchronized.
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt force-pushed the feature/try-fill-flat-db branch from 7413945 to c2c5214 Compare May 29, 2023 14:26
matkt added 7 commits May 29, 2023 16:38
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt requested a review from garyschulte June 3, 2023 06:40

@garyschulte garyschulte left a comment

Lets 🚢

garyschulte and others added 5 commits June 9, 2023 16:24
Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt enabled auto-merge (squash) June 13, 2023 12:57
@matkt matkt disabled auto-merge June 13, 2023 12:59
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt enabled auto-merge (squash) June 13, 2023 13:53
@matkt matkt merged commit 180c751 into hyperledger:main Jun 13, 2023
pinges added a commit to pinges/besu that referenced this pull request Jun 19, 2023
pinges added a commit to pinges/besu that referenced this pull request Jun 19, 2023
This reverts commit 180c751.

Signed-off-by: Stefan <stefan.pingel@consensys.net>
davidkngo pushed a commit to liquichain/besu that referenced this pull request Jun 28, 2023
davidkngo pushed a commit to liquichain/besu that referenced this pull request Jul 21, 2023
davidkngo added a commit to liquichain/besu that referenced this pull request Jul 21, 2023
elenduuche pushed a commit to elenduuche/besu that referenced this pull request Aug 16, 2023
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023