Remove "dead" leaves #432
Comments
I hope this can be done before Beyond the End of the Century 😅 |
Maybe :P (I propose that you google this term :P) |
One of our testnet RPC nodes crashed again due to this, and it looks like the only solution is to purge the db and resync. |
I think it could be a good idea to generalize the leaves pruning feature to Substrate and not limit it to Cumulus. The idea is to allow the user to pragmatically set the lifetime of chain leaves in order to optionally prune the leaves that are assumed to be "dead". This pruning strategy can be enabled wherever it is most appropriate, for example to solve the currently observed behavior in Cumulus, where new blocks are never accepted because we've saturated the max leaves per height limit (currently 32). For the pruning criteria I see three options:
Pruning criteria using lifetime. When a new leaf is added to the chain, we start a timer that limits the leaf's lifespan. The timer for a leaf 'L' is stopped if a child 'C' is added to 'L'; in this case the timer is restarted for the new leaf 'C'. If the lifetime for a leaf 'L' is over, then we are going to remove all its parents down to one of the following nodes
|
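The lifetime criterion described above can be sketched roughly as follows. This is only an illustration, not Substrate's or Cumulus's API: `LeafLifetimes`, `BlockId`, `on_import` and `expire` are hypothetical names, time is a plain logical counter instead of a real timer, and the sketch only reports which leaves have timed out. It does not walk back through the expired leaf's ancestors, since that part of the proposal is not spelled out here.

```rust
use std::collections::HashMap;

/// Hypothetical block identifier; a real node would use the block hash.
type BlockId = u64;

/// Tracks an expiry deadline for every current leaf (illustrative only).
pub struct LeafLifetimes {
    /// How long a leaf may stay a leaf before it is considered dead.
    lifetime: u64,
    /// leaf -> logical time at which its lifetime is over
    deadlines: HashMap<BlockId, u64>,
}

impl LeafLifetimes {
    pub fn new(lifetime: u64) -> Self {
        Self { lifetime, deadlines: HashMap::new() }
    }

    /// Registers a newly imported block at logical time `now`.
    /// If the parent was a leaf, its timer is stopped; a fresh timer
    /// is started for the new block, which is now a leaf.
    pub fn on_import(&mut self, block: BlockId, parent: Option<BlockId>, now: u64) {
        if let Some(p) = parent {
            self.deadlines.remove(&p);
        }
        self.deadlines.insert(block, now + self.lifetime);
    }

    /// Removes and returns every leaf whose lifetime is over at `now`,
    /// ordered by deadline (oldest first).
    pub fn expire(&mut self, now: u64) -> Vec<BlockId> {
        let mut dead: Vec<(u64, BlockId)> = self
            .deadlines
            .iter()
            .filter(|(_, deadline)| **deadline <= now)
            .map(|(id, deadline)| (*deadline, *id))
            .collect();
        dead.sort();
        for (_, id) in &dead {
            self.deadlines.remove(id);
        }
        dead.into_iter().map(|(_, id)| id).collect()
    }
}
```

With a lifetime of 10, a leaf imported at time 5 and never extended would be returned by `expire` from time 15 onwards, while a leaf that gets a child before then is simply dropped from tracking.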
@davxy For the scenario described at the start of the issue, where we have reached the maximum allowed leaves at a given height, the timer-based solution you propose would be the same as just constantly deleting the oldest stale leaf, right? (because those will time out first) |
I would first start concentrating on Cumulus aka Parachains. For normal chains it cannot really happen that easily that you build too many blocks on the same level (yes, it is still possible, but much more unlikely than for Parachains). We have once seen this error on a Polkadot test net, because the chain selection rule had gone apeshit, but that was a bug. This issue is also not really that much about leaves, more about blocks on the same height. I just used the naming as it is being used in Substrate. |
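To make the "blocks on the same height" framing concrete, here is a minimal sketch of a per-height limit that evicts the oldest block on a level once the level is full. Everything in it is an assumption for illustration: `LevelLimiter`, `import` and `count_at` are made-up names, not the Cumulus implementation, and a real node would also have to make sure it never evicts a block on the finalized or best chain.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical block identifier; a real node would use the block hash.
type BlockId = u64;

/// Keeps at most `max_per_level` blocks per height (illustrative only).
pub struct LevelLimiter {
    max_per_level: usize,
    /// height -> blocks at that height, in import order (oldest first)
    levels: BTreeMap<u64, VecDeque<BlockId>>,
}

impl LevelLimiter {
    pub fn new(max_per_level: usize) -> Self {
        assert!(max_per_level > 0, "limit must be at least 1");
        Self { max_per_level, levels: BTreeMap::new() }
    }

    /// Records `block` at `height`. If the level is already full, the
    /// oldest block on that level is evicted and returned so the caller
    /// can remove it from the backend before importing the new one.
    pub fn import(&mut self, height: u64, block: BlockId) -> Option<BlockId> {
        let level = self.levels.entry(height).or_default();
        let evicted = if level.len() >= self.max_per_level {
            level.pop_front()
        } else {
            None
        };
        level.push_back(block);
        evicted
    }

    /// Number of blocks currently tracked at `height`.
    pub fn count_at(&self, height: u64) -> usize {
        self.levels.get(&height).map_or(0, |level| level.len())
    }
}
```

With the limit of 32 mentioned earlier in the thread, the 33rd block imported at a given height would push out the first one instead of failing the import.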
Equilibrium hit the same issue |
@mn13 could you share how you fixed it? Facing the same issue also on Moonbase Alpha atm. |
Resync or you can also try to call |
We have applied v0.9.37, which includes this fix, and the problem happened again. "revert" is not working at all and I have a separate issue open for that. Has anyone managed to fix this and keep the data without purging the chain?
|
|
Please see answers below: 1: We had this issue twice in the past week. We updated one of the collators to v0.9.37 on Tuesday, which solved the problem temporarily, and then we updated the rest of the collators to v0.9.37 too. 2: I have the log for you; how would you like me to send it? 3: I believe we do use the latest, please see it here |
There has been a slight change, I have deleted the DB for one of the nodes, waited for it to resync with the other nodes, then I found the following error message in the logs: Block import error: Potential long-range attack: block not in finalized chain. After that this node started producing blocks and lost connection to all other 3 collators. So now I deleted the DB of the other 3 collators and they are syncing now with the working node. Hopefully this will work now. |
Wherever you like, I have no preferences |
@davxy I have mailed a download link across. |
Looking at your logs I can't see any message from the monitor that should keep the number of blocks per level within the limit (it is using debug messages with target In particular, when you start the node you should see something like: Then instead of seeing the overflow error you should see something like: I can't spot these messages in your logs. Can you try to start the node and check that at least |
When proposing new PoV blocks with a collator, it can happen that we are "stuck" as the relay chain for whatever reason doesn't include our blocks. After X blocks the collator will start to fail importing new blocks, because we have imported too many leaves.
Using paritytech/substrate#8533 we will be able to delete "dead" leaves.
We will require some criteria to decide whether a leaf should be seen as "dead":