-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix trie by asking peers #4312
fix trie by asking peers #4312
Conversation
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Hi folks, I'm using Besu v22.7.1 with Nimbus v22.8.1 on Goerli/Prater and ran into an issue that @garyschulte recommended I provide here, so you can gauge whether or not this PR would fix it. Yesterday my Besu node ran into a critical error and it stopped importing new blocks:
At which point, Nimbus started printing the following over and over again and stopped syncing:
This was on one of my Does this look like it would be caught by this PR and resolved autonomously, or is there something else to do here? |
do you still have the error? if so could you run this PR to see if it fixes the problem ? |
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logger name needs to be corrected, and I think we can do better than ifPresent
for that Optional. Other than that, just comments.
return maybeFallbackNodeFinder; | ||
} | ||
|
||
public void addFallbackNodeFinder(final Optional<PeerTrieNodeFinder> maybeFallbackNodeFinder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I saw this was called addFallbackNodeFinder, I expected it to be adding to a collection. This is not the case as there is only a single member to use for fallback. perhaps setFallbackNodeFiner
or useFallbackNodeFinder
would be more intuitive.
...ain/java/org/hyperledger/besu/ethereum/eth/sync/worldstate/WorldStatePeerTrieNodeFinder.java
Outdated
Show resolved
Hide resolved
...ain/java/org/hyperledger/besu/ethereum/eth/sync/worldstate/WorldStatePeerTrieNodeFinder.java
Outdated
Show resolved
Hide resolved
msg.capability, new DefaultMessage(peerConnection, response)); | ||
} else | ||
snapProtocolManager.ifPresent( | ||
protocolManager -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also wrap this in a capability check to make sure snap manager can handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure to understand this one ? modifying this test class or another part ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this test class. It will exercise the capabilities collection checking.
if (response.containsKey(nodeHash)) { | ||
LOG.trace("Found node {} with getNodeData request", nodeHash); | ||
return Optional.of(response.get(nodeHash)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else case isn't handled, I think it might warrant a TRACE statement just like the if case.
if (nodeValue != null && Hash.hash(nodeValue).equals(nodeHash)) { | ||
LOG.trace("Found node {} with getTrieNode request", nodeHash); | ||
return Optional.of(nodeValue); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same suggestion here, to better track our assumptions.
ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/DefaultSynchronizer.java
Show resolved
Hide resolved
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@@ -101,6 +101,8 @@ private DefaultBlockchain( | |||
chainHeadTransactionCount = chainHeadBody.getTransactions().size(); | |||
chainHeadOmmerCount = chainHeadBody.getOmmers().size(); | |||
|
|||
System.out.println("head " + getChainHead().getHeight() + " " + getChainHead().getHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably meant to use a Logger or remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for test forgot to remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Yes, I've actually had pretty consistent failures on Prater using 22.7.1 (and later a 22.7.2 snapshot from a few days ago). It's always related to a Here's another example:
I will build an image with this PR and try it out. |
Unfortunately, this PR didn't seem to fix my problem - it seems to have made it considerably worse; I can barely keep Besu synced for an hour before it breaks with constant log messages such as:
It looks like RocksDB is constantly timing out. Perhaps there's a setting somewhere that is just overly aggressively tuned? CC @garyschulte, did that change between 22.7.1 and this PR? I've attached a full log file here for reference so you can see the stream of errors... hopefully there is some insight to be gleaned: |
* fix trie by asking peers Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
* fix trie by asking peers Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
* fix trie by asking peers Signed-off-by: Karim TAAM <karim.t2am@gmail.com> Signed-off-by: garyschulte <garyschulte@gmail.com>
* fix trie by asking peers Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM karim.t2am@gmail.com
PR description
we may have an inconsistency in the worldstate trie. If this happens even if the cases are very rare and complex to reproduce this PR can help resolve the issue. Indeed if we detect a problem in the trie we can fix it directly by asking our peers for the correct version of the node
We start by asking with the GetNodeData request (which will soon be deprecated but which is still compatible and more practical) and if the response is empty we try with GetTrieNode
Fixed Issue(s)
Documentation
doc-change-required
label to this PR ifupdates are required.
Changelog