Commit

kad-dht/: Recommend new values for Provider Record Republish and Expiration (#451)

Recommend new values for provider record republish and expiration (22h/48h) based on request-for-measurement 17 results.

Co-authored-by: Marcin Rataj <lidel@lidel.org>
yiannisbot and lidel authored Dec 12, 2022
1 parent cfcf023 commit 9a646c0
Showing 1 changed file with 76 additions and 13 deletions.
89 changes: 76 additions & 13 deletions kad-dht/README.md
@@ -2,7 +2,7 @@

| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|----------------|--------|-----------------|
| 3A | Recommendation | Active | r1, 2021-10-30 |
| 3A | Recommendation | Active | r2, 2022-12-09 |

Authors: [@raulk], [@jhiesey], [@mxinden]

@@ -75,14 +75,22 @@ nodes, unrestricted nodes should operate in _server mode_ and restricted nodes,
e.g. those with intermittent availability, high latency, low bandwidth, low
CPU/RAM/Storage, etc., should operate in _client mode_.

As an example, running the libp2p Kademlia protocol on top of the Internet,
publicly routable nodes, e.g. servers in a datacenter, might operate in _server
As an example, publicly routable nodes running the libp2p Kademlia protocol,
e.g. servers in a datacenter, should operate in _server
mode_ and non-publicly routable nodes, e.g. laptops behind a NAT and firewall,
might operate in _client mode_. The concrete factors used to classify nodes into
should operate in _client mode_. The concrete factors used to classify nodes into
_clients_ and _servers_ depend on the characteristics of the network topology
and the properties of the Kademlia DHT . Factors to take into account are e.g.
and the properties of the Kademlia DHT. Factors to take into account are e.g.
network size, replication factor and republishing period.

For instance, setting the replication factor to a low value would require more
reliable peers, whereas a higher replication factor could allow for less
reliable peers at the cost of more overhead. Ultimately, peers that act as
servers should help the network (i.e., provide positive utility in terms of
availability, reachability, bandwidth). A node that slows down network
operations (e.g., because it is unreachable or overloaded) for the majority of
the requests it receives should instead operate as a client node.
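
To make this concrete, the sketch below shows one way a node could pick its
mode at startup using the `Mode` option of the Go implementation
(`go-libp2p-kad-dht`). The profile fields, thresholds and exact import paths
are illustrative assumptions, not part of this specification.

```go
package dhtmode

import (
	"context"

	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
)

// nodeProfile captures the kind of factors discussed above. The fields and
// thresholds are illustrative assumptions, not part of this specification.
type nodeProfile struct {
	PubliclyReachable bool    // e.g. confirmed via dial-back / AutoNAT
	UptimeHours       float64 // expected session length
	SpareBandwidth    bool    // able to absorb DHT server traffic
}

// chooseMode selects server mode only when the node is likely to provide
// positive utility to the network; otherwise it stays a client.
func chooseMode(p nodeProfile) dht.ModeOpt {
	if p.PubliclyReachable && p.UptimeHours >= 6 && p.SpareBandwidth {
		return dht.ModeServer
	}
	return dht.ModeClient
}

// newDHT wires the chosen mode into the DHT constructor. dht.ModeAuto could be
// used instead to let the implementation switch based on observed reachability.
func newDHT(ctx context.Context, h host.Host, p nodeProfile) (*dht.IpfsDHT, error) {
	return dht.New(ctx, h, dht.Mode(chooseMode(p)))
}
```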

Nodes, both those operating in _client_ and _server mode_, add another node to
their routing table if and only if that node operates in _server mode_. This
distinction allows restricted nodes to utilize the DHT, i.e. query the DHT,
@@ -228,7 +236,7 @@ Then we loop:
becomes the new best peer (`Pb`).
2. If the new value loses, we add the current peer to `Po`.
2. If successful with or without a value, the response will contain the
closest nodes the peer knows to the key `Key`. Add them to the candidate
closest nodes the peer knows to the `Key`. Add them to the candidate
list `Pn`, except for those that have already been queried.
3. If an error or timeout occurs, discard it.
4. Go to 1.
@@ -256,7 +264,7 @@ type Validator interface {
```

`Validate()` should be a pure function that reports the validity of a record. It
may validate a cryptographic signature, or else. It is called on two occasions:
may validate a cryptographic signature, or similar. It is called on two occasions:

1. To validate values retrieved in a `GET_VALUE` query.
2. To validate values received in a `PUT_VALUE` query before storing them in the
@@ -268,23 +276,76 @@ heuristic of the value to make the decision.

### Content provider advertisement and discovery

Nodes must keep track of which nodes advertise that they provide a given key
(CID). These provider advertisements should expire, by default, after 24 hours.
These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
There are two things at play with regard to provider record (and therefore
content) liveness and reachability:

1. Content needs to be reachable, despite peer churn.
2. Nodes that store and serve provider records should not serve records for
   stale content, i.e., content that the original provider does not wish to
   make available anymore.

The following two parameters help cover both of these cases.

1. **Provider Record Republish Interval:** The content provider needs to make
   sure that the nodes chosen to store the provider record are still online
   when clients ask for the record. In order to guarantee this despite peer
   churn, content providers periodically republish the records they want to
   provide. The particular value of the Republish Interval is network-specific
   and depends on several parameters, such as peer reliability and churn.

   - For the IPFS network it is currently set to **22 hours**.

2. **Provider Record Expiration Interval:** The network should only serve
   records for content that content providers are still interested in
   providing. In other words, nodes should not keep records for content that
   content providers have stopped providing (i.e., stale records). In order to
   guarantee this, provider records should _expire_ after some interval, i.e.,
   nodes should stop serving those records, unless the content provider has
   republished the provider record. Again, the specific setting depends on the
   characteristics of the network.

   - In the IPFS DHT the Expiration Interval is set to **48 hours**.

The values chosen for those parameters should be subject to continuous monitoring
and investigation. Ultimately, the values of those parameters should balance
the tradeoff between provider record liveness (due to node churn) and traffic overhead
(to republish records).
The latest parameters are based on the comprehensive study published
in [provider-record-measurements].
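
A rough sketch of how a content provider and a DHT server node might apply
these two intervals is shown below; the type and function names are
illustrative, and only the 22-hour and 48-hour values come from the
recommendation above.

```go
package providerrecords

import "time"

// Values recommended above for the IPFS DHT; other networks may choose
// different values based on their churn and traffic characteristics.
const (
	RepublishInterval  = 22 * time.Hour // provider re-announces its records
	ExpirationInterval = 48 * time.Hour // servers drop records older than this
)

// storedRecord is a provider record as kept by a DHT server node (illustrative).
type storedRecord struct {
	ProviderID string
	ReceivedAt time.Time
}

// expired reports whether a stored record should no longer be served, i.e.
// the provider has not republished it within the Expiration Interval.
func (r storedRecord) expired(now time.Time) bool {
	return now.Sub(r.ReceivedAt) > ExpirationInterval
}

// republishLoop runs on the content provider: every Republish Interval it
// re-announces all keys it still wants to provide, so that the records
// survive churn among the nodes currently storing them.
func republishLoop(done <-chan struct{}, keys func() [][]byte, provide func(key []byte)) {
	ticker := time.NewTicker(RepublishInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for _, k := range keys() {
				provide(k) // ADD_PROVIDER to the closest peers (see below)
			}
		case <-done:
			return
		}
	}
}
```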

Provider records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
messages.

It is also worth noting that the keys for provider records are multihashes. This
is because:

- Provider records are used as a rendezvous point for all the parties who have
advertised that they store some piece of content.
- The same multihash can be in different CIDs (e.g. CIDv0 vs CIDv1 of a SHA-256 dag-pb object,
or the same multihash but with different codecs such as dag-pb vs raw).
- Therefore, the rendezvous point should converge on the minimal thing everyone agrees on,
which is the multihash, not the CID.
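
As an illustration, the sketch below (using the `go-cid` and `go-multihash`
libraries; exact import paths and codec constants may differ between versions)
builds a CIDv0 and two CIDv1s around the same multihash and shows that they
all share the same provider-record key:

```go
package main

import (
	"bytes"
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	// Hash some content with SHA-256; this multihash is the rendezvous key
	// under which provider records are stored in the DHT.
	h, err := mh.Sum([]byte("hello world"), mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	v0 := cid.NewCidV0(h)                    // CIDv0 (implicitly dag-pb)
	v1pb := cid.NewCidV1(cid.DagProtobuf, h) // CIDv1, same codec
	v1raw := cid.NewCidV1(cid.Raw, h)        // CIDv1, different codec

	// The three CIDs render differently but wrap the identical multihash, so
	// providers advertising any of them meet at the same provider record.
	fmt.Println(v0, v1pb, v1raw)
	fmt.Println(bytes.Equal(v0.Hash(), v1pb.Hash()))  // true
	fmt.Println(bytes.Equal(v0.Hash(), v1raw.Hash())) // true
}
```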

#### Content provider advertisement

When the local node wants to indicate that it provides the value for a given
key, the DHT finds the closest peers to the key using the `FIND_NODE` RPC (see
key, the DHT finds the (`k` = 20) closest peers to the key using the `FIND_NODE` RPC (see
[peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with
its own `PeerInfo` to each of these peers.
its own `PeerInfo` to each of these peers. The study in
[provider-record-measurements] found that a replication factor of `k` = 20 is
a good setting, although continuous monitoring and investigation may change
this recommendation in the future.
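
A sketch of this advertisement flow is given below, with the lookup and RPC
steps abstracted behind caller-supplied functions; all names here are
illustrative, not normative.

```go
package provideflow

import "context"

// peerID and addrInfo are simplified stand-ins for libp2p's peer.ID and
// peer.AddrInfo; the lookup and RPC operations are supplied by the caller.
type peerID string

type addrInfo struct {
	ID    peerID
	Addrs []string
}

// provide advertises the given key (a multihash): it locates the k = 20
// closest DHT servers with an iterative FIND_NODE lookup and sends each of
// them an ADD_PROVIDER RPC carrying the local node's own addrInfo.
func provide(
	ctx context.Context,
	key []byte,
	self addrInfo,
	findClosest func(ctx context.Context, key []byte, k int) ([]addrInfo, error),
	addProvider func(ctx context.Context, to addrInfo, key []byte, rec addrInfo) error,
) error {
	const k = 20 // replication factor recommended above

	closest, err := findClosest(ctx, key, k)
	if err != nil {
		return err
	}
	for _, p := range closest {
		// Best effort: individual failures are tolerated, since the record is
		// replicated across the remaining peers and republished periodically.
		_ = addProvider(ctx, p, key, self)
	}
	return nil
}
```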

Each peer that receives the `ADD_PROVIDER` RPC should validate that the received
`PeerInfo` matches the sender's `peerID`, and if it does, that peer should store
the `PeerInfo` in its datastore. Implementations may choose to not store the
addresses of the providing peer e.g. to reduce the amount of required storage or
to prevent storing potentially outdated address information.
to prevent storing potentially outdated address information. Implementations
that choose to keep the network addresses (i.e., the `multiaddr`s) of the
providing peer should do so only for the period of time during which they are
confident that peers' network addresses do not change after the provider
record has been (re-)published. As with the previous constants, this period
depends on the network's characteristics; a safe value is the Routing Table
Refresh Interval, which the kubo IPFS implementation sets to 30 minutes. After
that period, peers return only the provider's `peerID`, in order to avoid
pointing to stale network addresses (i.e., the case where the peer has moved
to a new network address).
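
On the receiving side, the validation and the limited address-retention window
described above could look roughly as follows; this is only a sketch, the
30-minute constant mirrors kubo's Routing Table Refresh Interval mentioned
above, and all other names are illustrative.

```go
package addprovider

import (
	"errors"
	"time"
)

// addrRetention mirrors the Routing Table Refresh Interval used by kubo; after
// this window only the provider's peer ID is returned, not its addresses.
const addrRetention = 30 * time.Minute

type peerID string

type peerInfo struct {
	ID    peerID
	Addrs []string
}

type storedProvider struct {
	Info       peerInfo
	ReceivedAt time.Time
}

// handleAddProvider validates that the advertised PeerInfo matches the sender
// of the RPC before storing it under the given key.
func handleAddProvider(sender peerID, rec peerInfo, now time.Time,
	store map[string][]storedProvider, key string) error {
	if rec.ID != sender {
		return errors.New("ADD_PROVIDER: PeerInfo does not match sender")
	}
	store[key] = append(store[key], storedProvider{Info: rec, ReceivedAt: now})
	return nil
}

// providerInfo returns what is served for a stored record: the full addresses
// while they are still considered fresh, and only the peer ID afterwards.
func providerInfo(p storedProvider, now time.Time) peerInfo {
	if now.Sub(p.ReceivedAt) <= addrRetention {
		return p.Info
	}
	return peerInfo{ID: p.Info.ID} // addresses may be stale, omit them
}
```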

#### Content provider discovery

@@ -470,3 +531,5 @@ multiaddrs are stored in the node's peerbook.
[ping]: https://github.com/libp2p/specs/issues/183

[go-libp2p-xor]: https://github.com/libp2p/go-libp2p-xor

[provider-record-measurements]: https://github.com/protocol/network-measurements/blob/master/results/rfm17-provider-record-liveness.md
