sidechain jams pending extrinsics (one extrinsic nonce skipped?) #1594

brenzi opened this issue Mar 27, 2024 · 5 comments
brenzi commented Mar 27, 2024

Since running against the polkadot-1.6.0 para (or since running SBliff legacy), we observe that after a few minutes or hours of operation the validateer can't send extrinsics to the para anymore.

Checking pending extrinsics on the validateer's rpc node (on the same machine) shows that pending extrinsics are stacking up. The other nodes in the network don't see these extrinsics (or have banned them?).

  1. run validateer and wait until no more block confirmation events appear onchain
    • last sidechain.FinalizedSidechainBlock: sidechain: 75,441, integritee: 7950
    • last enclaveBridge.ProcessedParentchainBlock: 7946
  2. verify there are pending extrinsics on the validateer's rpc node
  3. restart rpc node
  4. restart SCV
    • first attempt:
      • first xt gets included (register QE collateral)
      • second xt (register TCBinfo) fails with "Transaction is outdated"
    • in the same block we get
      • 472x enclaveBridge.ProcessedParentchainBlock (7948..8419)
      • 1x sidechain.FinalizedSidechainBlock (#75521): only one was valid. Many failed with sidechain.AncestorNumberMismatch, which makes sense because all of them tried ancestor 75381. Failures: 75461..81041 in steps of 20
  5. restart SCV again
    • smooth start.
      • first enclaveBridge.ProcessedParentchainBlock: 8420
      • first sidechain.FinalizedSidechainBlock: 81061

The fact that restarting the SCV triggers the avalanche tells me that all jammed extrinsics had "future" status, and that the next expected nonce only came in after the restarted SCV fetched the latest nonce fresh from the chain.
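
To make the "future" hypothesis concrete, here is a toy model of a per-account queue keyed by nonce. It is only a sketch of the idea, not the Substrate transaction pool: `ToyPool` and its methods are hypothetical names, and the real pool has more states and rules than these three.

```python
class ToyPool:
    """Toy per-account queue: an extrinsic is only 'ready' once every
    lower nonce, starting at the nonce the chain expects, is present."""

    def __init__(self, chain_nonce):
        self.expected = chain_nonce  # next nonce that may become ready
        self.ready = []              # nonces eligible for inclusion, in order
        self.future = set()          # nonces waiting for a gap to be filled

    def submit(self, nonce):
        if nonce < self.expected:
            return "outdated"        # nonce already used or already queued as ready
        self.future.add(nonce)
        self._promote()
        return "ready" if nonce in self.ready else "future"

    def _promote(self):
        # Move a contiguous run of nonces from 'future' to 'ready'.
        while self.expected in self.future:
            self.future.remove(self.expected)
            self.ready.append(self.expected)
            self.expected += 1


pool = ToyPool(chain_nonce=3834)
for n in (3835, 3836, 3837):
    print(n, pool.submit(n))     # all stay 'future' while 3834 is missing
print(3834, pool.submit(3834))   # filling the gap promotes the whole run
print(pool.ready)
```

In this model a single missing nonce holds back everything behind it, and supplying it releases the whole run at once, which is the avalanche pattern described above.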

possible workaround:

  • upon each parentchain block import, refresh the nonce. But that may still not be acceptable because we can't afford to miss a single extrinsic (it could be an unshield)
@brenzi brenzi changed the title sidechain jams pending extrinsics sidechain jams pending extrinsics (one extrinsic nonce skipped?) Mar 27, 2024

brenzi commented Mar 27, 2024

it happened again. this time I was fast enough to collect logs:

last parachain block that has events: 8531

  • @ 8531 enclaveBridge.confirmProcessedParentchainBlock 8527 (nonce: e53b: 3833)
  • @ 8529 sidechain.confirmImportedSidechainBlock ancestorMismatch! (nonce e13b: 3832) SCV sends (ancestor: 81,761, candidate 81,841) but in that block, ancestor should be: 81,821
  • @ 8529 enclaveBridge... (nonce dd3b: 3831)
  • @ 8528 last successful sidechain.confirmImportedSidechainBlock ancestorMismatch! SCV sends (ancestor: 81,761, candidate 81,821). nonce: d53b
  • @ 8528 enclaveBridge... (nonce d93b)
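
The hex values next to the nonces above look like SCALE compact-encoded integers (little-endian, with the two lowest bits of the first byte selecting the length mode). A minimal sketch for decoding them, assuming that encoding; `decode_compact` is a hypothetical helper name and the big-integer mode is deliberately left out:

```python
def decode_compact(hex_str):
    """Decode a SCALE compact integer given as a hex string (no 0x prefix needed)."""
    data = bytes.fromhex(hex_str.removeprefix("0x"))
    if not data:
        raise ValueError("empty input")
    mode = data[0] & 0b11
    width = {0: 1, 1: 2, 2: 4}.get(mode)
    if width is None:
        raise ValueError("big-integer mode is not handled in this sketch")
    if len(data) < width:
        raise ValueError("input too short for its length mode")
    # The value sits in the upper bits; drop the two mode bits.
    return int.from_bytes(data[:width], "little") >> 2


for raw in ("e53b", "e13b", "dd3b", "d93b", "d53b"):
    print(raw, decode_compact(raw))
```

Decoding `e53b`, `e13b` and `dd3b` this way gives 3833, 3832 and 3831, matching the nonces noted in the list.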

enclave account @ 8710

system.account: FrameSystemAccountInfo
{
  nonce: 3,834
  consumers: 0
  providers: 1
  sufficients: 0
  data: {
    free: 29,089,793,572,770
    reserved: 0
    frozen: 0
    flags: 170,141,183,460,469,231,731,687,303,715,884,105,728
  }
}


brenzi commented Mar 28, 2024

Analyzing the pending extrinsics pool on the validateer's node:

conclusion:

  • no extrinsic has been lost. The next xt up for inclusion is 3834, and this one is present in the mempool. But why do the collators not include it?


brenzi commented Mar 28, 2024

manually injecting the next xt on the collator works:
curl http://localhost:9933 -X POST -H "Content-Type: application/json" --data '{"method":"author_submitExtrinsic","params":["0x35038400d5af4118cfbe99a94626d14cd004c41eb05b4e98f559f432dfc5a5f2c8392584000fdbab1c43040b4b03751a3e4decde03e0969bb1c5e1c7d1e95788376a5935362c7bc5f682c5ba04c651e29085652135e6605716c5dfb7c315295d166039f80500e93b003601496e636f676e69746565546573746e6574303030303030303030303030303033d3cada17c76f6207e0c6b7e16c2d59ee0966bccd1ae654bd23f72096f50bbca65021000034a62e525e9f4cd4e2d2f7ca524e43af543cc9dac3a8e85457a63a581b4174fc"],"id":1,"jsonrpc":"2.0"}'

It seems there is no problem with the extrinsic or the validateer. It's a peering/gossiping problem.

Manually submitted the 3 xt's which were pending in paseo-rpc-0. This didn't trigger an avalanche of all the remaining xt's.

curl -sS -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "system_peers"}' http://localhost:9933/
{"jsonrpc":"2.0","result":[{"peerId":"12D3KooWFkUQi6u5a2gQssgWQ5PtZWNcfVdP2Va4SqJcS35LRNFR","roles":"AUTHORITY","bestHash":"0xb086b9a7798c376c63ce9f0acc52ace67e5b4bff4ef48a36a5347f70f729b7cb","bestNumber":14266},{"peerId":"12D3KooWEoz4SJBU18MgmRLaQjnm571WsZUThBZnL2vpc6biSPfu","roles":"AUTHORITY","bestHash":"0xb086b9a7798c376c63ce9f0acc52ace67e5b4bff4ef48a36a5347f7

The validateer's node is peered with the authorities (but currently not with the rpc nodes). The authority has 4 peers (so we are fully connected there). So the node has a connection to the authorities but still doesn't broadcast its pending extrinsics.

If I ask iro-collator-02 what the enclave's next nonce would be:
curl http://localhost:9933 -X POST -H "Content-Type: application/json" --data '{"method":"system_accountNextIndex","params":["5Gtt7oCnD3aZPUMx5B2QrXAt5nfoi4LfBSG8xeTMLmDq9qJN"],"id":1,"jsonrpc":"2.0"}'
it tells me 3837 now, which is in line with the last included xt.
Asking the validateer's node the same question returns the same value (although that node has an xt with that nonce in its pending extrinsics).
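
The per-node checks above can be scripted so that several nodes are compared in one go. This is a rough sketch using only the Python standard library and the RPC methods `system_accountNextIndex` and `author_pendingExtrinsics`; the helper names (`rpc_payload`, `rpc_call`, `node_views`) are hypothetical, and the URLs you pass in are whatever endpoints you can reach.

```python
import json
from urllib import request


def rpc_payload(method, params=None, req_id=1):
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params or []}


def rpc_call(url, method, params=None, timeout=10):
    """POST one JSON-RPC request over HTTP and return its 'result' field."""
    body = json.dumps(rpc_payload(method, params)).encode()
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["result"]


def node_views(urls, account, call=rpc_call):
    """For each node, collect the account's next index and the pending pool size."""
    views = {}
    for url in urls:
        views[url] = {
            "next_index": call(url, "system_accountNextIndex", [account]),
            "pending": len(call(url, "author_pendingExtrinsics")),
        }
    return views


if __name__ == "__main__":
    # Assumes a node is listening on the same local port as in the curl calls above.
    account = "5Gtt7oCnD3aZPUMx5B2QrXAt5nfoi4LfBSG8xeTMLmDq9qJN"
    print(node_views(["http://localhost:9933"], account))
```

A node whose pending count keeps growing while its `next_index` agrees with the collators would match the picture described here: the extrinsics exist locally but are not being propagated.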


brenzi commented Mar 28, 2024


brenzi commented Jun 12, 2024

workaround: restart authority collator every 5min

proper solution, hopefully: paritytech/polkadot-sdk#1202
