This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Thread 'tokio-runtime-worker' panicked at 'Receiver::next_message called after None' #6871

Open
fgimenez opened this issue Aug 11, 2020 · 6 comments
Labels
J2-unconfirmed Issue might be valid, but it’s not yet known.

Comments

@fgimenez

A node running v0.8.22 eventually crashes with the error in the issue title. The full log is at https://paste.ubuntu.com/p/KNd7cHfRZ6/

@tomaka tomaka transferred this issue from paritytech/polkadot Aug 11, 2020
@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label Aug 11, 2020
@mxinden
Contributor

mxinden commented Aug 11, 2020

8: sp_consensus::import_queue::buffered_link::BufferedLinkReceiver<B>::poll_actions

panicked at 'Receiver::next_message called after None'

At first glance, BufferedLinkReceiver should either take a fused receiver or never poll again once poll_next has returned Poll::Ready(None).

let msg = if let Poll::Ready(Some(msg)) = Stream::poll_next(Pin::new(&mut self.rx), cx) {
    msg
} else {
    break
};

I am not familiar enough with the code to judge whether it should be impossible for the receiver to return Poll::Ready(None).
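
To illustrate the "fused receiver" idea, here is a minimal std-only sketch. It is not the Substrate code: StrictReceiver is a hypothetical stand-in for a receiver that panics when polled after it has reported the end of the stream, and Fused is a hand-written wrapper that remembers termination. With the real futures crate, StreamExt::fuse should give a comparable guarantee.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for a receiver that, like the one in the backtrace,
/// panics if it is polled again after it has returned `Poll::Ready(None)`.
pub struct StrictReceiver {
    queue: VecDeque<u32>,
    sender_dropped: bool,
    terminated: bool,
}

impl StrictReceiver {
    pub fn new(items: Vec<u32>, sender_dropped: bool) -> Self {
        StrictReceiver { queue: items.into(), sender_dropped, terminated: false }
    }

    pub fn poll_next(&mut self) -> Poll<Option<u32>> {
        assert!(!self.terminated, "Receiver::next_message called after None");
        match self.queue.pop_front() {
            Some(item) => Poll::Ready(Some(item)),
            // Queue is empty and the sending side is gone: end of stream.
            None if self.sender_dropped => {
                self.terminated = true;
                Poll::Ready(None)
            }
            // Queue is empty but more items may still arrive.
            None => Poll::Pending,
        }
    }
}

/// Fused wrapper: once the inner receiver has reported the end of the stream,
/// it is never polled again and `Poll::Ready(None)` is returned directly.
pub struct Fused {
    inner: StrictReceiver,
    done: bool,
}

impl Fused {
    pub fn new(inner: StrictReceiver) -> Self {
        Fused { inner, done: false }
    }

    pub fn poll_next(&mut self) -> Poll<Option<u32>> {
        if self.done {
            return Poll::Ready(None);
        }
        let res = self.inner.poll_next();
        if let Poll::Ready(None) = res {
            self.done = true;
        }
        res
    }

    pub fn is_terminated(&self) -> bool {
        self.done
    }
}

/// Same loop shape as the quoted snippet: take messages until the receiver
/// returns anything other than `Poll::Ready(Some(_))`.
pub fn drain(rx: &mut Fused) -> Vec<u32> {
    let mut out = Vec::new();
    while let Poll::Ready(Some(msg)) = rx.poll_next() {
        out.push(msg);
    }
    out
}
```

Calling drain a second time on a terminated Fused is harmless, whereas doing the same directly on StrictReceiver would hit the assertion. The alternative mentioned above, never polling again after Poll::Ready(None), amounts to keeping the same done flag in the caller.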

@tomaka
Contributor

tomaka commented Aug 11, 2020

What is worrisome is that this can only happen if the block import task shuts down, and this shutdown is not supposed to happen.

@tomaka
Contributor

tomaka commented Aug 11, 2020

To give some context about this code: the BasicQueue::new function accepts an impl sp_core::traits::SpawnNamed as a parameter, which it uses to spawn the actual import queue, which then runs in the background.

The BasicQueue struct completely hides any detail of this background task. It is BasicQueue that holds the channels that communicate with the background queue. The only way to send work to the queue is to use methods of the BasicQueue, and the only way for poll_actions to be reached is through BasicQueue as well.

The background task is designed to gracefully shut down if one of the channels is closed. Again, these channels are fields in the BasicQueue. If a channel is closed, that means the BasicQueue has been dropped, and poll_actions shouldn't be reachable.

Consequently, the only ways this issue can happen are:

  • If the background task panics, but we would see a backtrace I suppose?
  • If the executor drops the future for no reason, but I don't see why it would do that?
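
The first scenario can be sketched with the standard library alone. This is an illustration under assumptions, not the BasicQueue implementation: run_worker is a hypothetical helper, a thread stands in for the spawned background task, and std::sync::mpsc stands in for the channels. It shows that a panicking worker drops its sending half, so the other side sees the channel close even though nothing was shut down on purpose.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a hypothetical background worker that owns the sending half of a
/// channel. Returns the messages received and whether the worker panicked.
pub fn run_worker(should_panic: bool) -> (Vec<u32>, bool) {
    let (tx, rx) = mpsc::channel::<u32>();

    let handle = thread::spawn(move || {
        tx.send(1).unwrap();
        if should_panic {
            // Unwinding drops `tx`, which closes the channel.
            panic!("simulated background task failure");
        }
        tx.send(2).unwrap();
        // `tx` is dropped here on normal completion as well.
    });

    // The iterator ends once every sender is gone, whether the worker
    // finished normally or panicked.
    let received: Vec<u32> = rx.iter().collect();

    // `join` returns `Err` if the worker thread panicked.
    let panicked = handle.join().is_err();
    (received, panicked)
}
```

From the receiving side alone, the two cases are indistinguishable: the stream simply ends. Only the join result, or a backtrace in the logs, tells them apart.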

@arkpar
Member

arkpar commented Aug 11, 2020

It could be that the process was shut down by the OOM killer.
@fgimenez is there a record in /var/log/messages around that time?

@fgimenez
Author

@arkpar no messages around that time. In general, the memory usage looks pretty sane compared to the total memory of the node (16 GB). The attached image shows the memory usage over the last 6 hours; the crashes always happen below 2 GiB.
[Image: Screenshot_2020-08-11_17-31-38]

@tomaka
Contributor

tomaka commented Aug 12, 2020

We merged #6876, which doesn't fix the root cause but will change the error to a different, possibly more useful, one the next time this happens.


4 participants