Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kademlia: Records republication exceeds inbound stream limit #3236

Closed
shamil-gadelshin opened this issue Dec 13, 2022 · 8 comments · Fixed by #3287
Closed

kademlia: Records republication exceeds inbound stream limit #3236

shamil-gadelshin opened this issue Dec 13, 2022 · 8 comments · Fixed by #3287

Comments

@shamil-gadelshin
Copy link
Contributor

Summary

We have a feature in Kademlia for provider records republication which is controlled by provider_publication_interval configuration parameter.

However, if we have multiple provider records (several hundred) that should be republished we could experience "inbound stream issues" because Kademlia has a hard-coded limit (MAX_NUM_INBOUND_SUBSTREAMS = 32).

Some of the messages relate to "reusing of the inbound streams" and seem OK. Other relate to dropped streams.

Currently, we don't have means to limit rates of the "republication" requests because it's a built-in protocol feature. Am I wrong here?

Expected behaviour

I expect an internal Kademlia process to work within Kademlia's limits or have a setting to control its rate.

Actual behaviour

Peers exceed other peers` inbound stream limits and produce warnings and actually drop republication messages.

Debug Output

2022-12-13T12:54:30.637379Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream.    
2022-12-13T12:54:30.637470Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream.    
2022-12-13T12:54:30.637557Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream.    
2022-12-13T12:54:30.637644Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream.    
2022-12-13T12:54:30.637730Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream.    
2022-12-13T12:54:30.637818Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream.    
2022-12-13T12:54:30.637909Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream.    
2022-12-13T12:54:30.659954Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    
2022-12-13T12:54:30.660070Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    
2022-12-13T12:54:30.660169Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    
2022-12-13T12:54:30.660261Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    
2022-12-13T12:54:30.660356Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    
2022-12-13T12:54:30.660446Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    
2022-12-13T12:54:30.660537Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    
2022-12-13T12:54:30.660628Z  WARN libp2p_kad::handler: New inbound substream to PeerId("12D3KooWFUrJybJKR4EkeXY3WPknWJ25RtZU4XoXivCTvQsqFHCw") exceeds inbound substream limit. Removed older substream waiting to be reused.    

Possible Solution

Version

v0.50.0

Would you like to work on fixing this bug?

No

@mxinden
Copy link
Member

mxinden commented Dec 19, 2022

Thanks for the detailed write-up. The above is correct and worth fixing.

As a first step I suggest limiting the libp2p-kad ConnectionHandler to open no more than 32 outbound substreams, thus no longer exceeding the limit on the other end. requests can be buffered on the outbound side, I doubt this has a large memory footprint given that Kademlia requests are lightweight.

Long term, I would like for the libp2p-kad ConnectionHandler to enforce backpressue to the NetworkBehaviour implementation, thus no longer having to buffer outbound requests that would exceed the limit.

@mxinden
Copy link
Member

mxinden commented Dec 19, 2022

Also referencing #3078 here for the "long term" solution.

@mxinden mxinden changed the title Kademlia. Provider records republication seems spamming other peers. kademlia: Provider records republication exceeds inbound stream limit Dec 19, 2022
@mxinden mxinden changed the title kademlia: Provider records republication exceeds inbound stream limit kademlia: Records republication exceeds inbound stream limit Dec 19, 2022
@mxinden
Copy link
Member

mxinden commented Dec 26, 2022

Expanding on the above two steps:

  1. Buffer requests in the outbound ConnectionHandler.

    • Have no more than 32 requests in-flight.
    • Sender no longer overwhelming the receivers.
    • Requests are small, thus likely not leading to excessive memory usage in the outbound ConnectionHandlers.
  2. Enforce backpressure from the outbound ConnectionHandler to the NetworkBehaviour implementation.

    • Option for backpressure could be a mechanism on top of NetworkBehaviourAction::NotifyHandler.
    • Option for backpressure could be separate bounded channel from NetworkBehaviour to each ConnectionHandler. (preference)
    • Request scheduling can make informed decisions based on backpressure / utilization of each connection.

Long term this would lead to a third step:

  1. Drop arbitrary magic number 32 and enforce backpressure via stream limit of multiplexer.
    • Multiplexer (i.e. QUIC) should enforce backpressure on the number of streams.
    • Each Kademlia request should use it's own stream.
    • Thus backpressure is enforced from inbound ConnectionHandler to outbound ConnectionHandler.

@nazar-pc
Copy link
Contributor

Sender no longer overwhelming the receivers.

This can still lead to inbound streams being dropped on receiver unless we implement stream reuse as well IIRC.

@mxinden
Copy link
Member

mxinden commented Dec 27, 2022

Yes, receivers might still drop inbound streams due to reaching the magic number 32, given that operations across streams are not ordered. E.g. the following could happen:

  1. A has 32 streams open to B.
  2. A closes an existing stream, thus having 31 streams open.
  3. A opens a new stream, thus having 32 streams open.
  4. B receives the new stream, before the signal to close the existing stream, thus exceeding the limit.

In my eyes we should test whether this happens often in the wild and whether it thus needs a fix (like stream reuse).

@nazar-pc
Copy link
Contributor

Okay, I'll start with outbound buffering and we'll go from there.

@nazar-pc
Copy link
Contributor

Second attempt from #3287 seems to work fine from my limited testing with our testnet so far.

@mergify mergify bot closed this as completed in #3287 Jan 24, 2023
mergify bot pushed a commit that referenced this issue Jan 24, 2023
Limit number of active outbound streams to not exceed configured number of streams.

Resolves #3236.
@mxinden
Copy link
Member

mxinden commented Jan 24, 2023

This pull request was closed by #3287. While #3287 does only do the first step, namely moving backpressure from the receiver to the sender ConnectionHandler, I think it is still fine to close here and further track the overarching effort in #3078.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants