periodically reconnect to bootstrap peers (supervise bootstrap connections) #416

btc · 2014-12-08T07:45:55Z

context
The network relies on bootstrap peers to help new nodes get acquainted. When bootstrap peers go down (eg. for maintenance), other nodes on the network lose these important connections. When bootstrap peers finally rejoin, other nodes must restart in order to reconnect. It would be great for nodes to supervise their bootstrap connections... periodically reconnecting to bootstraps whose connections have been lost.

concern
Right now, connection initialization is performed with route.Connect(ctx, peer). Is this the preferred method going forward? The alternative is to use the Network's n.DialPeer(ctx, peer) directly.

If route.Connect is preferred and the Ping action/side-effect is important, should we be concerned about misuse of the network.DialPeer method?

@jbenet

The text was updated successfully, but these errors were encountered:

whyrusleeping · 2014-12-08T10:29:02Z

The Ping is important to keep the DHTs routing table up to date. But for reconnects, it probably doesnt matter much

btc · 2014-12-08T10:38:44Z

The Ping is important to keep the DHTs routing table up to date.

So I suppose it is critical/required/necessary for the first connection to be performed using Routing, yeah?

whyrusleeping · 2014-12-08T16:43:18Z

Yeah, although, Im unhappy with the way that all works... if you want to slightly rearrange things, i would be fully supportive.

jbenet · 2014-12-08T19:37:45Z

So I suppose it is critical/required/necessary for the first connection to be performed using Routing, yeah?

In reality, bootstrap is a routing problem, not a network problem. The underlying network (net, swarm, etc) doesn't actually care about having connections open. (it could live with a model of ephemeral connections for example). It's the routing system of choice (in our case, Kademlia routing) that wants bootstrapping. (A different routing system might use a completely static topology known a priori).

So the Bootstrap process is Routing system dependent. But note that it shouldn't be part of the routing system interface per-se. Bootstrapping is implementation dependent. Kademlia routing only needs peer addresses ((peer.ID, ma.Multiaddr) pairs for now, but once Multiaddr supports ipfs addresses, just ma.Multiaddr). This means that the *IpfsDHT should be initialized with its bootstrap info:

dht, err := dht.NewDHT(..., bootstrapPeers)

And, the bootstrap/reconnection alg should be something like this.

// when starting up (in core.go, somewhere)
go dht.Boostrap()

// in dht somewhere
var minConnThreshold = 1 // later set to something like len(bootstrapPeers)/2

// when routing table connections fall below some threshold
// (note, right now we don't get this event exactly. we expel nodes from the table
// when they fail to respond (and we keep pinging for liveness).
func (dht *IpfsDHT) onRoutingTableExpulsion() {
  if len(routingTablesSize()) < minConnThreshold {
    go dht.Bootstrap()
  }
}

func (dht *IpfsDHT) Bootstrap() {
  ps := dht.randomSubsetOfBootstrapPeers()
  for _, p := range ps {
    n.DialPeer(p)
  }
}

NB: peers don't need to (and in the long run shouldnt) keep active connections to the bootstrap nodes always. This makes bootstrap nodes central points of failure. while the network is small, that's fine, but as the network increases in size and is up all the time, we want bootstrap connections only to help introduce nodes to the network at the beginning.

It's becoming clearer and clearer that the DHT and the RoutingTable are actually two separate (layered) entities. In the future we should separate these. And that there's two parts to routing: ContentRouting and PeerRouting. Kademlia happens to implement both.

Perhaps a good way to proceed is:

separate out the routing table + PeerRouting part of the dht into something like KademliaRouting
make IpfsDHT uses a KademliaRouting, and layers the DHT part, storing data.

But, this is an "after alpha" thing. We should push out what's here now, with all of this together. We can patch in the bootstrapping as discussed above.

whyrusleeping · 2014-12-08T19:48:13Z

As we move towards larger networks, we can avoid the central point of failure problem by having each node keep track of its own set of 'trusted peers' that it has had long, successful relationships with, and can use those nodes to bootstrap.

As to moving the RoutingTable out of the DHT (or at least separating them) Ive been hoping we would do that for some time now, I have other services ive wanted to write (mesh messaging among them) that would benefit greatly from having access to the RoutingTable's NearestPeer methods

jbenet · 2014-12-08T19:51:43Z

As to moving the RoutingTable out of the DHT (or at least separating them) Ive been hoping we would do that for some time now, I have other services ive wanted to write (mesh messaging among them) that would benefit greatly from having access to the RoutingTable's NearestPeer methods

yeah ran into the same thing

btc · 2014-12-08T20:44:31Z

But, this is an "after alpha" thing. We should push out what's here now, with all of this together. We can patch in the bootstrapping as discussed above.

To avoid adding complexity that will make it harder to change the DHT when we decide to get around to this, I'm going to implement a simple stop-gap solution in userspace (core) using public DHT methods.

It's basically initConnections with some supervision.

Impl'd here: 00a1a12

Let me know if any objections to this short-term approach.

jbenet · 2014-12-08T22:01:11Z

@maybebtc okay, the userland-supervision-for-now approach sgtm. beautifully small, too.

btc · 2014-12-08T22:59:09Z

note that if the list of bootstrap peers changes at any point, the list used here becomes stale.

Adding the repo to the core may address this staleness.

Perhaps the repo can expose a Config() method that fetches the config from disk?

jbenet · 2014-12-08T23:04:45Z

@maybebtc maybe we should move Config into the Repo entirely. and construct the Repo on init.

btc · 2014-12-08T23:17:43Z

@maybebtc maybe we should move Config into the Repo entirely. and construct the Repo on init.

Indeed.

How about I open a PR for supervision with this known limitation (stale entries).

And right after, begin the repo transition right. I should be able to finish the repo changes in a few hours, but it would be nice to land supervision and move it out of the way.

I'm fixing this concern ATM:

if len(activeConnections) < threshold {
  // connect to a randomly chosen _subset_ of peers
}

jbenet · 2014-12-08T23:18:34Z

How about I open a PR for supervision with this known limitation (stale entries).
And right after, begin the repo transition right.

SGTM

btc · 2014-12-09T00:13:53Z

#422

btc added the kind/bug A bug in existing code (including security flaws) label Dec 8, 2014

btc mentioned this issue Dec 8, 2014

refactor(peer) some _minor_ safety features #419

Merged

btc self-assigned this Dec 8, 2014

btc added the status/in-progress In progress label Dec 8, 2014

btc mentioned this issue Dec 9, 2014

feat(core) supervise bootstrap connections #422

Merged

btc closed this as completed in #422 Dec 9, 2014

btc removed the status/in-progress In progress label Dec 9, 2014

btc removed their assignment Mar 30, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

periodically reconnect to bootstrap peers (supervise bootstrap connections) #416

periodically reconnect to bootstrap peers (supervise bootstrap connections) #416

btc commented Dec 8, 2014

whyrusleeping commented Dec 8, 2014

btc commented Dec 8, 2014

whyrusleeping commented Dec 8, 2014

jbenet commented Dec 8, 2014

whyrusleeping commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 9, 2014

periodically reconnect to bootstrap peers (supervise bootstrap connections) #416

periodically reconnect to bootstrap peers (supervise bootstrap connections) #416

Comments

btc commented Dec 8, 2014

whyrusleeping commented Dec 8, 2014

btc commented Dec 8, 2014

whyrusleeping commented Dec 8, 2014

jbenet commented Dec 8, 2014

whyrusleeping commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 8, 2014

jbenet commented Dec 8, 2014

btc commented Dec 9, 2014