
feat(dcutr): keep connection alive while we are using it #3960

Merged (19 commits, Jun 4, 2023)

Conversation

tcoratger (Contributor)

Description

Similar to #3876, we now compute connection_keep_alive based on whether we are still using the connection, applied to the dcutr protocol.

Related: #3844.

Notes & open questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger (Contributor) left a comment

Thanks for tackling this!

The tests are failing because the handler has nothing to do, right after we established the connection.

We can fix this by passing in "to be done work" in the constructor of the handler. Check the event handler for established connections in the associated behaviour. We are queuing an event there using NotifyHandler (ToSwarm::NotifyHandler { .. }). Instead of passing that event to the handler via NotifyHandler, we should instead pass it in via the constructor of the handler.

That way, it immediately has something to do on startup and returns KeepAlive::Yes. Does that make sense?
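In sketch form (using illustrative stand-in types, not the real libp2p-dcutr API), the suggestion amounts to a handler whose connection_keep_alive is driven by its event queue, and whose constructor can seed that queue so there is work to do immediately after the connection is established:

```rust
use std::collections::VecDeque;

// Illustrative stand-ins for the real libp2p types.
#[derive(Debug, PartialEq)]
enum KeepAlive {
    Yes,
    No,
}

#[derive(Debug)]
enum Command {
    Connect, // simplified: the real command carries observed addresses
}

struct Handler {
    queued_events: VecDeque<Command>,
}

impl Handler {
    // Seeding the queue in the constructor means the handler has work
    // to do immediately on startup, before any NotifyHandler arrives.
    fn new(initial: Option<Command>) -> Self {
        let mut queued_events = VecDeque::new();
        if let Some(cmd) = initial {
            queued_events.push_back(cmd);
        }
        Handler { queued_events }
    }

    // Keep the connection alive for as long as there is pending work.
    fn connection_keep_alive(&self) -> KeepAlive {
        if self.queued_events.is_empty() {
            KeepAlive::No
        } else {
            KeepAlive::Yes
        }
    }
}

fn main() {
    let idle = Handler::new(None);
    assert_eq!(idle.connection_keep_alive(), KeepAlive::No);

    let busy = Handler::new(Some(Command::Connect));
    assert_eq!(busy.connection_keep_alive(), KeepAlive::Yes);
}
```

With this model, a handler created without initial work would be torn down immediately, which is exactly the failure mode the tests were hitting.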

@tcoratger (Contributor, Author)

> Thanks for tackling this!
>
> The tests are failing because the handler has nothing to do, right after we established the connection.
>
> We can fix this by passing in "to be done work" in the constructor of the handler. Check the event handler for established connections in the associated behaviour. We are queuing an event there using NotifyHandler (ToSwarm::NotifyHandler { .. }). Instead of passing that event to the handler via NotifyHandler, we should instead pass it in via the constructor of the handler.
>
> That way, it immediately has something to do on startup and returns KeepAlive::Yes. Does that make sense?

@thomaseizinger Yes, I get your point here; you mean replacing

self.queued_events.extend([
    ToSwarm::NotifyHandler {
        peer_id,
        handler: NotifyHandler::One(connection_id),
        event: Either::Left(handler::relayed::Command::Connect {
            obs_addrs: self.observed_addreses(),
        }),
    },
    ToSwarm::GenerateEvent(Event::InitiatedDirectConnectionUpgrade {
        remote_peer_id: peer_id,
        local_relayed_addr: match connected_point {
            ConnectedPoint::Listener { local_addr, .. } => local_addr.clone(),
            ConnectedPoint::Dialer { .. } => unreachable!("Due to outer if."),
        },
    }),
]);

by something like (roughly):

handler::relayed::Handler::new(connected_point, true);
self.queued_events.extend([
    ToSwarm::GenerateEvent(Event::InitiatedDirectConnectionUpgrade {
        remote_peer_id: peer_id,
        local_relayed_addr: match connected_point {
            ConnectedPoint::Listener { local_addr, .. } => local_addr.clone(),
            ConnectedPoint::Dialer { .. } => unreachable!("Due to outer if."),
        },
    }),
]);

where true is a boolean value representing "to be done work". Then I add this condition to connection_keep_alive to trigger KeepAlive::Yes just after the connection is established, right?

@thomaseizinger (Contributor) commented May 18, 2023

Almost! Instead of adding an extra boolean, have a look at what the ConnectionHandler does when it receives the message from the behaviour:

fn on_behaviour_event(&mut self, event: Self::FromBehaviour) {
    match event {
        Command::Connect { obs_addrs } => {
            self.queued_events
                .push_back(ConnectionHandlerEvent::OutboundSubstreamRequest {
                    protocol: SubstreamProtocol::new(
                        protocol::outbound::Upgrade::new(obs_addrs),
                        (),
                    ),
                });
        }
        Command::AcceptInboundConnect {
            inbound_connect,
            obs_addrs,
        } => {
            if self
                .inbound_connect
                .replace(inbound_connect.accept(obs_addrs).boxed())
                .is_some()
            {
                log::warn!(
                    "New inbound connect stream while still upgrading previous one. \
                     Replacing previous with new.",
                );
            }
        }
        Command::UpgradeFinishedDontKeepAlive => {
            self.keep_alive = KeepAlive::No;
        }
    }
}

It adds an event to queued_events. If this queue is not empty, you are already returning KeepAlive::Yes in this PR! Thus, if you extend the constructor to take in the event itself and directly add it to the queue, all should be fine :)

Plus, this should allow us to remove the Connect message, making the implementation overall simpler!

@thomaseizinger (Contributor)

> Plus, this should allow us to remove the Connect message, making the implementation overall simpler!

Unfortunately we can't because we still need it to trigger more attempts.

@thomaseizinger (Contributor)

We do something similar in the relay itself already:

fn handle_established_inbound_connection(
    &mut self,
    connection_id: ConnectionId,
    peer: PeerId,
    local_addr: &Multiaddr,
    remote_addr: &Multiaddr,
) -> Result<THandler<Self>, ConnectionDenied> {
    if local_addr.is_relayed() {
        return Ok(Either::Right(dummy::ConnectionHandler));
    }

    let mut handler = Handler::new(self.local_peer_id, peer, remote_addr.clone());
    if let Some(event) = self.pending_handler_commands.remove(&connection_id) {
        handler.on_behaviour_event(event)
    }

    Ok(Either::Left(handler))
}

@thomaseizinger (Contributor) commented May 19, 2023

@tcoratger Did the comment above clear things up on how to proceed here?

@tcoratger (Contributor, Author)

> @tcoratger Did the comment above clear things up on how to proceed here?

@thomaseizinger Thank you for your explanations. I understand a little better now, but unfortunately I don't yet have a full picture of how these functions nest together; I need to discover the library in more detail (I'm quite new to this).

  1. I understand what must happen: when connection_keep_alive is called right after the connection is set up, self.queued_events must not be empty, otherwise it returns KeepAlive::No, which would close the connection when it shouldn't.

  2. Since with ToSwarm::NotifyHandler the event is already pushed to the handler, I still wonder why that basic implementation doesn't work: I guess it's because it's just a notification, so the event hasn't yet been pushed into the queue by on_behaviour_event when connection_keep_alive runs; it reads an empty queue and therefore terminates the connection.

  3. Based on this, I understand your approach of integrating the event directly into the constructor of the handler. I propose the following (or similar) approach:

// constructor with default values
let mut handler = handler::relayed::Handler::new(connected_point.clone());

// push the event (another approach could be adopted here)
handler.on_behaviour_event(handler::relayed::Command::Connect {
    obs_addrs: self.observed_addresses(),
});

self.queued_events.extend([
    ToSwarm::GenerateEvent(Event::InitiatedDirectConnectionUpgrade {
        remote_peer_id: peer_id,
        local_relayed_addr: match connected_point {
            ConnectedPoint::Listener { local_addr, .. } => local_addr.clone(),
            ConnectedPoint::Dialer { .. } => unreachable!("Due to outer if."),
        },
    }),
]);

But with this approach, I don't understand how to hand over the handler I've created so that it ends up receiving the events transmitted from the behaviour afterwards. It's not exactly the same as what handle_established_inbound_connection does for the relay (as I read it), because there the handler is returned by the function and then used by the swarm.

In summary, I have trouble seeing how to link the behaviour implementation with what happens in the handler afterwards, probably because I lack experience with the library (that's why I try to pick topics that are not too complicated, to slowly familiarize myself with the codebase :)).
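As a rough model of that missing link (all names here are illustrative stand-ins, not the real swarm signatures): the behaviour does not "push" the handler anywhere. It returns the handler from handle_established_*_connection, and the swarm then uses that very handler for the connection. Feeding it the command before returning has the same effect as NotifyHandler, just earlier:

```rust
use std::collections::VecDeque;

// Illustrative model (not the real libp2p signatures) of how a behaviour
// hands "to be done work" to a handler it constructs itself.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    Connect,
}

struct Handler {
    queued_events: VecDeque<Command>,
}

impl Handler {
    fn new() -> Self {
        Handler {
            queued_events: VecDeque::new(),
        }
    }

    // Same entry point the behaviour would otherwise reach via NotifyHandler.
    fn on_behaviour_event(&mut self, cmd: Command) {
        self.queued_events.push_back(cmd);
    }
}

struct Behaviour;

impl Behaviour {
    // The swarm calls this when a connection is established and uses the
    // *returned* handler for that connection; no separate push is needed.
    fn handle_established_outbound_connection(&mut self) -> Handler {
        let mut handler = Handler::new();
        handler.on_behaviour_event(Command::Connect);
        handler
    }
}

fn main() {
    let mut behaviour = Behaviour;
    let handler = behaviour.handle_established_outbound_connection();
    // The handler starts life with work already queued.
    assert_eq!(handler.queued_events.front(), Some(&Command::Connect));
}
```

Because the swarm owns the handler from the moment it is returned, anything queued in the constructor path is visible to the very first connection_keep_alive call.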

@thomaseizinger (Contributor)

No worries at all, the life-cycles are tricky to understand!

I'll try to push some follow-up patches so you can see what I mean :)

@thomaseizinger (Contributor)

@mxinden I pushed some follow-up commits here. The trickiest bit is that we need to keep the connection alive between the upgrade attempts. However, the ProbeLab data showed that multiple upgrade attempts don't really increase the success rate.

We could simplify this a bit more if we remove the upgrade attempts. What do you think?

@thomaseizinger (Contributor)

> @mxinden I pushed some follow-up commits here. The trickiest bit is that we need to keep the connection alive between the upgrade attempts. However, the ProbeLab data showed that multiple upgrade attempts don't really increase the success rate.
>
> We could simplify this a bit more if we remove the upgrade attempts. What do you think?

Another thing to discuss: Do we really need the events that notify us about a dcutr attempt? Those add code to the communication between handler and behaviour that is otherwise not needed.

@mxinden (Member) commented May 29, 2023

> Another thing to discuss: Do we really need the events that notify us about a dcutr attempt? Those add code to the communication between handler and behaviour that is otherwise not needed.

Initially I added them with the goal of exposing corresponding Prometheus metrics in libp2p-metrics. I think that is still a worthy goal.

@mxinden (Member) commented May 29, 2023

> @mxinden I pushed some follow-up commits here. The trickiest bit is that we need to keep the connection alive between the upgrade attempts. However, the ProbeLab data showed that multiple upgrade attempts don't really increase the success rate.
>
> We could simplify this a bit more if we remove the upgrade attempts. What do you think?

I would expect that we will re-introduce the retry logic if we decide to implement libp2p/specs#487. Thus my preference, unless it adds a significant amount of complexity, is to keep the retry logic.

@mxinden (Member) left a comment

Thank you for the work here.

Shall we land #3982 first?

Review thread on the diff lines:

    remote_peer_id: event_source,
    remote_relayed_addr: remote_addr,

A Member commented:

Why no longer provide the remote_relayed_addr with the RemoteInitiatedDirectConnectionUpgrade event? That would remove the need for explicit state tracking in the NetworkBehaviour implementation and instead store the state close to its source, namely the connection.

A Member commented:

Friendly ping. Am I missing something? This should eliminate the necessity for maintaining the state of Behaviour::remote_relayed_addr, correct?

A Contributor replied:

Yes-ish. The original idea was to not pass data back and forth. The behaviour learns about the address first so it is kind of redundant to pass it to the handler only to then pass it to the behaviour again.

On the flip side, not needing the hashmap removes a few error cases and simplifies the behaviour, so I think it is a good idea.

@tcoratger Mind taking a look at this?

The idea would be to remove the peer_addresses hashmap and instead pass the address in the InboundConnectRequest enum from the handler to the behaviour.
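A minimal sketch of that idea (with a stand-in Multiaddr type, not the real one): the remote address becomes a field of the event the handler emits, so the behaviour reads it from the event itself instead of looking it up in a per-connection hashmap:

```rust
// Stand-in for the real multiaddr type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Multiaddr(String);

// The address travels with the event from handler to behaviour, so the
// behaviour no longer needs to track it per connection.
#[derive(Debug)]
enum HandlerEvent {
    InboundConnectRequest { remote_addr: Multiaddr },
}

fn main() {
    let event = HandlerEvent::InboundConnectRequest {
        remote_addr: Multiaddr("/ip4/203.0.113.1/tcp/4001".into()),
    };

    // The behaviour destructures the event and gets the address directly.
    match event {
        HandlerEvent::InboundConnectRequest { remote_addr } => {
            assert_eq!(remote_addr.0, "/ip4/203.0.113.1/tcp/4001");
        }
    }
}
```

Storing the state in the event rather than a map trades a small amount of per-event payload for the removal of map insert/remove bookkeeping and its failure cases.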

The PR author replied:

@thomaseizinger @mxinden Done. I hope I have understood everything correctly.

As I understand it, the peers_addresses hashmap of the Behaviour structure was used to list peer addresses. It was therefore updated whenever a connection was established or closed.

To replace this and avoid some bug-prone cases, we simplify the behaviour and place the logic in the handler instead. So I started by completely removing peers_addresses, which is no longer useful. Then I added the remote_addr inside InboundConnectRequest to pass the address information from the handler, closer to the source of the connection.

Don't hesitate if there's something tricky that I didn't catch.

@mergify (bot) commented May 31, 2023

This pull request has merge conflicts. Could you please resolve them @tcoratger? 🙏

@mxinden (Member) commented May 31, 2023

The merging of #3982 introduced a tricky merge conflict here. @tcoratger I took a go at resolving it. Mind giving the latest merge commit a review?

@tcoratger (Contributor, Author)

> The merging of #3982 introduced a tricky merge conflict here. @tcoratger I took a go at resolving it. Mind giving the latest merge commit a review?

@mxinden Yes, right. If I understand your merge correctly:

  • You added the handler in handle_established_inbound_connection via its constructor and pushed the corresponding event to queued_events, mimicking the previous pattern inside the match for establishing the connection.
  • You put a direct_connections call in both handle_established_inbound_connection and handle_established_outbound_connection to record direct (non-relayed) connections.

It sounds good to me. As I didn't write the most technical part of this PR, maybe @thomaseizinger can take a look to check that everything is compatible with what you implemented?

@mergify (bot) commented Jun 1, 2023

This pull request has merge conflicts. Could you please resolve them @tcoratger? 🙏

@mxinden (Member) left a comment

Looks good to me overall. Nice to see these simplifications!


@mxinden (Member) left a comment

Looks good to me. Thank you for the follow-ups! @thomaseizinger let us know if you feel strongly about moving away from the const.

@thomaseizinger (Contributor) left a comment

I think it would be cleaner for the const to be private to the behaviour, but this is already a nice improvement overall, so let's get it in!

@mergify mergify bot merged commit a4450d4 into libp2p:master Jun 4, 2023
@tcoratger tcoratger deleted the keepalive-dcutr branch June 4, 2023 12:19