
Errors "error flushing netlink socket" after upgrade to v0.9.41 #671

Open
Jony321 opened this issue Apr 6, 2023 · 3 comments
Comments


Jony321 commented Apr 6, 2023

Hi!

After updating from v0.9.40 to v0.9.41, the errors below appeared in the container logs, and the validator disappeared from the telemetry page (https://telemetry.polkadot.io/). Unfortunately, I didn't look at the logs before updating, and rolling back to v0.9.40 produces the same errors.
I have no idea why this happened or how to restore the validator. The same version update was performed on a test server and completed without errors.
Please help me solve the problem.

Role of the node: validator.
Startup type: Docker CE (23.0.3)
Polkadot Image: parity/polkadot:v0.9.41

Command-line options:

        "Args": [
            "--chain",
            "kusama",
            "--name",
            "[OMMITED]",
            "--validator",
            "--database",
            "rocksdb",
            "--public-addr",
            "/ip4/[OMMITED]/tcp/30333",
            "--base-path",
            "/data/kusama/data",
            "--telemetry-url",
            "wss://telemetry-backend.w3f.community/submit 1",
            "--telemetry-url",
            "wss://telemetry.polkadot.io/submit/ 0",
            "--wasm-execution=Compiled",
            "--unsafe-rpc-external",
            "--unsafe-ws-external"

Node information:
OS: Ubuntu 20.04.4 LTS
Kernel: 5.4.0-135-generic
MemTotal: 32880944 kB
SwapTotal: 1999868 kB
CPU Threads: 16
CPU Frequency: 2000 MHz

Logs (repeated many times)

2023-04-06 14:18:07 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 if watch returned an error: rtnetlink socket closed    
2023-04-06 14:18:07 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 netlink socket stream shut down    
2023-04-06 14:18:07 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 if watch returned an error: rtnetlink socket closed    
2023-04-06 14:18:07 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 netlink socket stream shut down    
2023-04-06 14:18:07 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 if watch returned an error: rtnetlink socket closed    
2023-04-06 14:18:07 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 netlink socket stream shut down    
2023-04-06 14:18:07 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 if watch returned an error: rtnetlink socket closed    
2023-04-06 14:18:07 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }    
2023-04-06 14:18:07 netlink socket stream shut down

altonen commented Apr 6, 2023

I think this originates from libp2p: libp2p/rust-libp2p#3390

I'm not sure we can do anything about it. I've noticed the same issue when I start the node and kill it with SIGINT right after startup: it enters an annoying loop that is hard to get out of. It looks like something in if-watch doesn't want to be interrupted while it's starting. So far I've killed the process with kill -s SEGV, but I wouldn't recommend that because I don't know how well Substrate tolerates it.
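
For reference, a minimal sketch of the last-resort workaround described above (the binary name polkadot is an assumption, and, as noted, sending SEGV is not recommended):

    # Find the PID of the stuck node process (assumes the binary is called "polkadot")
    pidof polkadot
    # Force-terminate it with SIGSEGV, as described above (last resort only)
    kill -s SEGV <PID>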

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023

mrq1911 commented Nov 12, 2023

The same issue appeared randomly for me while syncing the rococo chain.

polkadot 1.3.0-7c9fd83805c

It starts spamming the log every time I start the service.

polkadot --chain=rococo --database=paritydb --pruning=1000 --rpc-external --rpc-port 19911 --rpc-cors all

I guess it was because the process was not shut down correctly after hitting the disk space threshold.

Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 💻 Target environment: gnu
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 💻 CPU: AMD Ryzen 5 3600X 6-Core Processor
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 💻 CPU cores: 6
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 💻 Memory: 32029MB
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 💻 Kernel: 5.4.0-135-generic
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 💻 Linux distribution: Ubuntu 20.04.5 LTS
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 💻 Virtual machine: no
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 📦 Highest known block at #3132023
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 〽️ Prometheus exporter started at 127.0.0.1:9615
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Running JSON-RPC server: addr=0.0.0.0:19911, allowed origins=["*"]
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 🏁 CPU score: 1.37 GiBs
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 🏁 Memory score: 14.93 GiBs
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 🏁 Disk score (seq. writes): 1.16 GiBs
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 🏁 Disk score (rand. writes): 594.60 MiBs
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 🥩 BEEFY gadget waiting for BEEFY pallet to become available...
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Available space 1022MiB for path `/root/.local/share/polkadot/chains/rococo_v2_2/paritydb/full` dropped below threshold: 1024MiB , terminating...
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Protocol controllers receiver stream has returned `None`. Ignore this error if the node is shutting down.
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 netlink socket stream shut down
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 if watch returned an error: rtnetlink socket closed
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 netlink socket stream shut down
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 if watch returned an error: rtnetlink socket closed
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 netlink socket stream shut down
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 if watch returned an error: rtnetlink socket closed
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 netlink socket stream shut down
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 if watch returned an error: rtnetlink socket closed
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 netlink socket stream shut down
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 if watch returned an error: rtnetlink socket closed
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 netlink socket stream shut down
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 if watch returned an error: rtnetlink socket closed
Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
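
As a quick check of the disk-space theory, the free space on the database path from the "Available space ... dropped below threshold" line above can be inspected with df (standard Linux tooling; the path is taken from the log):

    # Show free space on the filesystem holding the rococo database
    df -h /root/.local/share/polkadot/chains/rococo_v2_2/paritydb/full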


altonen commented Nov 12, 2023

This is your main problem:

Nov 12 10:25:05 rococo-relaychain rococo.service[831960]: 2023-11-12 10:25:05 Available space 1022MiB for path /root/.local/share/polkadot/chains/rococo_v2_2/paritydb/full dropped below threshold: 1024MiB , terminating...

which causes the node to shut down. Disabling mDNS with --no-mdns should make the if-watch prints disappear.
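
For example, the invocation from the previous comment with mDNS disabled would look like this (a sketch; only --no-mdns is added, all other flags are unchanged):

    polkadot --chain=rococo --database=paritydb --pruning=1000 --rpc-external --rpc-port 19911 --rpc-cors all --no-mdns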
