This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Light clients shouldn't insert themselves in the DHT #3303

Closed
tomaka opened this issue Aug 5, 2019 · 32 comments
Labels
I3-bug The node fails to follow expected behavior. Z1-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder

Comments

@tomaka
Contributor

tomaka commented Aug 5, 2019

EDIT: (see description below)

@tomaka tomaka added the I3-bug The node fails to follow expected behavior. label Aug 5, 2019
@expenses
Contributor

This seems fairly simple, but it's not super clear where a client adds itself to kademlia.

    let mut kademlia = Kademlia::new(local_id.clone(), store);
    for (peer_id, addr) in &user_defined {
        kademlia.add_address(peer_id, addr.clone());
    }
Here?

@tomaka
Contributor Author

tomaka commented Oct 24, 2019

No. Nodes insert you in their buckets when you connect to them. That code is in libp2p-kad.

This issue is IMO extremely complex to tackle and requires some research efforts.

@expenses
Contributor

Oh ok, good to know! Do you think you could mark this issue as Q5-substantial or above?

@tomaka tomaka added the Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase. label Oct 24, 2019
@tomaka
Contributor Author

tomaka commented Oct 24, 2019

To give some context.

We use the DHT in order to discover nodes to connect to.

Right now, full and light nodes are both present in the DHT. If the ratio of light nodes to full nodes is high enough, a random walk through the DHT to discover nodes might turn up only light nodes.

This is bad, because what we need is connections to full nodes in order to function properly. Light nodes are purely parasites at the moment.
Similarly, light nodes also need to connect to full nodes in order to function. A light node connecting to another light node is totally useless, and they can't do anything with that connection.

We also have no way to know whether a node is full or light when we find it in the DHT. We have to connect to it and ask.

In order to solve that problem, the solution that this issue title implies is that only full nodes are present in the DHT. Both light nodes and full nodes would find only full nodes to connect to through the DHT. Light nodes would also therefore never connect to each other directly.
There could, however, be other solutions to this problem.
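
To put rough numbers on the discovery concern above, here is a small sketch. It assumes each discovered node is an independent, uniformly random pick among all nodes in the DHT, which is a simplification of how a real Kademlia random walk behaves. The function name and the node counts are made up for illustration.

```rust
/// Probability that `samples` independent, uniformly random picks from the
/// DHT are all light nodes, i.e. that discovery finds no full node at all.
///
/// Assumption of this sketch: picks are independent and uniform, which a
/// real random walk only approximates.
fn prob_only_light(full: u64, light: u64, samples: u32) -> f64 {
    assert!(full + light > 0, "the DHT must contain at least one node");
    let light_fraction = light as f64 / (full + light) as f64;
    // Every one of the `samples` picks has to land on a light node.
    light_fraction.powi(samples as i32)
}
```

Under these assumptions, with 1 full node for every 99 light nodes, `prob_only_light(1, 99, 20)` is roughly 0.82: twenty discovered peers would most likely all be light nodes.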

cc @infinity0 @burdges

@tomaka
Contributor Author

tomaka commented Oct 24, 2019

This issue isn't urgent at the moment because the ratio of light nodes per full node on the existing networks is something like 1 to 100. Nobody uses light clients at the moment, except for the occasional person checking whether they work.

However if we start advertising light clients, or releasing a UI containing a light client for example, then this issue will need to be tackled first.

@expenses
Contributor

So I suppose a solution would be to allow nodes to send a message saying that they shouldn't be added into the DHT?

@burdges

burdges commented Oct 24, 2019

What are the key and record for this DHT? If our key or record includes a cryptographic key, then we could ask that DHT entries be signed by their key, so nodes must explicitly ask for inclusion.

We're maybe worried about adversarial spam DHT entries eventually, which makes everything harder. We could however privilege buckets whose key played some on-chain role and randomly drop others when the DHT came under excessive load, but.. We have many roles for the relay chain already with some that sound tricky to recognize, ala fishermen. And parachain specific roles make this much worse. If this becomes our approach then we could still punt on classifying the roles for quite a while.

We cannot easily recognize a "full node" in such an adversarial setting. I could find some tricks like using H(KEX(bucket_holder_key,bucket_maker_key) || time) to identify some chain state the bucket maker must tell the bucket holder that the bucket holder should already know and can verify as correct.

Anyways, my first question is simply: How far will simply asking that the DHT entries be signed go? Even if we ask for nothing about the signing key?

@tomaka
Contributor Author

tomaka commented Oct 24, 2019

> What are the key and record for this DHT?

> Anyways, my first question is simply: How far will simply asking that the DHT entries be signed go? Even if we ask for nothing about the signing key?

We're not using the key/value system of Kademlia, but only the FIND_PEER messages.

The keys are therefore the identities of the nodes, and there's no associated value.
When a node connects to us, we (try to) insert it in our buckets. That's the only situation where we add an entry to the buckets.
Therefore the key is already signed through the encryption handshake.
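
To illustrate the behaviour described above, here is a minimal sketch of a routing table that inserts peers when they connect. This is not the libp2p-kad code: `RoutingTable`, `on_peer_connected` and `BUCKET_SIZE` are names made up for this example, the bucket size of 20 is an assumed value, and identities are plain `u64`s instead of real peer identities derived from public keys.

```rust
/// Maximum number of entries per bucket (assumed value for this sketch).
const BUCKET_SIZE: usize = 20;

/// Simplified routing table keyed by node identity, with no associated value.
struct RoutingTable {
    local_id: u64,
    /// `buckets[i]` holds peers whose XOR distance to us has its highest
    /// set bit at position `i`.
    buckets: Vec<Vec<u64>>,
}

impl RoutingTable {
    fn new(local_id: u64) -> Self {
        RoutingTable { local_id, buckets: vec![Vec::new(); 64] }
    }

    /// Bucket for a peer, or `None` if the peer is ourselves.
    fn bucket_index(&self, peer: u64) -> Option<usize> {
        let distance = self.local_id ^ peer;
        if distance == 0 {
            None
        } else {
            Some(63 - distance.leading_zeros() as usize)
        }
    }

    /// Called whenever a peer connects to us. The peer does not ask for
    /// this; the insertion is attempted by the node that accepted the
    /// connection. Returns `true` if the peer was newly inserted.
    fn on_peer_connected(&mut self, peer: u64) -> bool {
        let index = match self.bucket_index(peer) {
            Some(index) => index,
            None => return false,
        };
        let bucket = &mut self.buckets[index];
        // The insertion is only an attempt: it fails for known peers and
        // for full buckets.
        if bucket.contains(&peer) || bucket.len() >= BUCKET_SIZE {
            return false;
        }
        bucket.push(peer);
        true
    }
}
```

Note that nothing in `on_peer_connected` looks at whether the peer is a full or a light node, which is the behaviour this issue is about.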

@burdges

burdges commented Oct 24, 2019

I see. You want nodes to make a claim about their roles or desired roles when introducing themselves initially then?

@tomaka
Contributor Author

tomaka commented Oct 24, 2019

> You want nodes to make a claim about their roles or desired roles when introducing themselves initially then?

Yes, that's one possibility.
But that would solve only light nodes vs full nodes, and it might have implications on how the DHT behaves. For example, if nodes sometimes report that they are light and sometimes full, then the record storage system might not work properly, and that could make it possible to hide validator identities (we're storing validator identities in the DHT).

I also feel like there should be a way to extend this mechanism for nodes belonging/collating for parachains for example. I would therefore put this issue in the "DHT research" bucket.

@infinity0
Contributor

I thought w3f/parity wrote its own version of libp2p in Rust. Isn't this simply a case of pinging the author of that and asking them to provide a knob to do what this issue requires (i.e. for certain nodes to not add themselves to the DHT)?

@burdges

burdges commented Oct 24, 2019

We change rust-libp2p as we see fit :) which makes @tomaka the relevant author.

@infinity0
Contributor

Ultimately the DHT needs an access policy that prevents light clients from adding themselves even if they tried to.

@tomaka
Contributor Author

tomaka commented Oct 25, 2019

The first problem is that it's not as trivial as you seem to think. Adding a handshake saying whether we are full or light ties Kademlia to Substrate and adds lots of additional roundtrips compared to right now.

Also, any modification to the libp2p-kad code would obviously violate the specs.
We could totally create our own Kademlia-like protocol that does things differently, but considering that we have so many open issues concerning Kademlia, we should probably plan a bit more ahead about what to do here.

@infinity0
Contributor

@tomaka I meant a local-only option, at least for the time being. A light client knows it's a light client and can just omit the DHT step.

@tomaka
Contributor Author

tomaka commented Oct 25, 2019

Nodes don't choose to insert themselves in the DHT, they get immediately inserted by others when they open a connection.

@burdges

burdges commented Oct 25, 2019

I agree we should do this in a "compelling" and "correct" way. :) At the libp2p layer, we should ideally provide Protocol Labs at least some solid technical reasons to follow our lead because our Go implementation should actually use go-libp2p, so Protocol Labs accepting the Go team's PRs helps us.

@mxinden
Contributor

mxinden commented May 14, 2020

Further discussion of restricting peers in the Kademlia routing table happens in libp2p/rust-libp2p#1560. The restriction ability in combination with the identify protocol could solve this issue.
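
One way to picture the combination mentioned above is the following sketch: the node that accepted the connection looks at what the peer reported about itself and only then decides whether the peer may enter the routing table. This is not the rust-libp2p API. `IdentifyInfo`, `may_enter_routing_table` and the protocol strings are hypothetical names used only for this example.

```rust
/// Hypothetical protocol name, used only in this sketch.
const KAD_PROTOCOL: &str = "/example/kad";

/// What a peer reports about itself through an identify-style exchange
/// (reduced here to the list of protocols it says it supports).
struct IdentifyInfo {
    protocols: Vec<String>,
}

/// Restriction applied before inserting a newly connected peer: only peers
/// that report support for the Kademlia protocol are allowed into the
/// routing table.
fn may_enter_routing_table(info: &IdentifyInfo) -> bool {
    info.protocols.iter().any(|p| p.as_str() == KAD_PROTOCOL)
}
```

With such a filter, a peer that does not list the Kademlia protocol can still connect, but is never handed out to others through the DHT.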

@tomaka
Contributor Author

tomaka commented Jul 29, 2020

We didn't consider this issue when doing #6549, but the foundations should now be in place.

@dvc94ch
Contributor

dvc94ch commented Sep 26, 2020

substrate already supports multiple dht's. would it be possible to have light clients insert themselves in a different dht, thus allowing them to still connect to each other?

@burdges

burdges commented Sep 26, 2020

I suppose but it sounds non-scalable if you mean true unaffiliated light clients. We'll already want this for more structured stuff like parachains.

@dvc94ch
Contributor

dvc94ch commented Sep 26, 2020

According to [0] it is possible to achieve subsecond lookups with a median latency of 200ms in a kademlia dht with 9.5mio nodes. How many nodes is it reasonable to expect?

Maybe we can deal with >9.5mio nodes when we get there?

@burdges

burdges commented Sep 26, 2020

There are over 1 billion Visa cards in the world, many affiliated with some phone, and like 30 million Visa merchants. ;) We want nodes to play specific roles in chains or in layer two systems, and different roles obtain different evidence from chains, so the term "light client" alone makes little sense.

We're working on techniques for one chain's full nodes to track another chain, especially for parachain nodes to track the relay chain, and for validators assigned to a parachain to talk to collators from that parachain. We do ask parachain collators to be full nodes of the relay chain right now, which limits things considerably.

@infinity0
Contributor

> allowing them to still connect to each other?

Why?

There's much more incentive to attack the Polkadot DHT than the bittorrent DHTs. The main thing we're concerned about here is spam - we don't want the validator address book to get spammed, which prevents people from finding validators.

@dvc94ch
Contributor

dvc94ch commented Sep 28, 2020

That's why I suggested adding them in a separate dht. Practical applications require some way for mobile or web applications that use substrate, to be able to communicate with each other. So while you might build something like youtube on substrate, to handle micro payments to content producers, it's not a good idea for content producers to try to insert their content into a transaction. So in that example, they could add their content to ipfs and publish the cid on chain and users could query the chain for the content they want. If this particular example is a good fit for blockchains in general, I don't know, but obviously it's very limiting in terms of what you can do if you only have thin clients.

@dvc94ch
Contributor

dvc94ch commented Sep 28, 2020

To add some context, substrate exposes the NetworkService and a generic request response protocol as a public api. Maybe it's only intended to be used in very specific ways by polkadot. In that case it should be documented as a polkadot api that should not be used by other applications.

@tomaka
Contributor Author

tomaka commented Sep 29, 2020

A big concern with light clients is that they might briefly connect to the network just to send a transaction, then disappear. We don't want this type of ephemeral node to play a role in the DHT.

The problem at the moment is that as soon as we receive an incoming connection from a node of the same chain, we add this node to our local k-buckets.

While it is not a huge problem, as the node will eventually be removed from its k-bucket when we realize it is unreachable, it is still somewhat of a pollution.

I don't think we should have code that says if remote.light_client { insert_in_dht = false; }; rather, nodes should probably indicate whether they should be inserted in the DHT. As mentioned, it's not an actual attack vector but an optimization, and it is therefore ok to rely on the goodwill of nodes.

@tomaka
Contributor Author

tomaka commented May 3, 2021

Update on the issue: after #6549, the only thing left to do is for light clients to no longer advertise support for the Kademlia protocol.
In other words, light clients can emit Kademlia requests, but not receive them.
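
As a rough sketch of that split, assuming a node simply leaves the Kademlia protocol out of the list it announces to peers: `KadMode`, the function names and the protocol strings below are hypothetical and are not the Substrate or libp2p API.

```rust
/// Hypothetical protocol names, used only in this sketch.
const KAD_PROTOCOL: &str = "/example/kad";
const SYNC_PROTOCOL: &str = "/example/sync";

/// Whether a node serves Kademlia requests or only issues them.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum KadMode {
    /// Full node: advertises Kademlia and answers inbound requests.
    Server,
    /// Light client: never advertises Kademlia, only sends requests.
    Client,
}

/// Protocols a node announces to its peers. A client-mode node omits the
/// Kademlia protocol, so peers have no reason to add it to their buckets.
fn advertised_protocols(mode: KadMode) -> Vec<&'static str> {
    let mut protocols = vec![SYNC_PROTOCOL];
    if mode == KadMode::Server {
        protocols.push(KAD_PROTOCOL);
    }
    protocols
}

/// Both modes may emit Kademlia requests to discover full nodes.
fn can_send_requests(_mode: KadMode) -> bool {
    true
}

/// Only server-mode nodes receive and answer Kademlia requests.
fn accepts_inbound_requests(mode: KadMode) -> bool {
    mode == KadMode::Server
}
```

A light client would run with `KadMode::Client`: it can still walk the DHT to find full nodes, but it never shows up as a result of someone else's walk.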

Marking as easy, might need some minor changes in libp2p.

@tomaka tomaka added Z1-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder and removed Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase. labels May 3, 2021
@mxinden
Contributor

mxinden commented May 3, 2021

> Marking as easy, might need some minor changes in libp2p.

Cross-referencing libp2p's Kademlia client mode here libp2p/rust-libp2p#2025 (comment).

@stale

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@tomaka
Contributor Author

tomaka commented Jul 8, 2021

This is implemented for the listening side, however the light client still advertises support for Kademlia.

@stale stale bot removed the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 8, 2021
@tomaka
Contributor Author

tomaka commented Apr 26, 2022

Light client support has been removed from Substrate altogether, so this is irrelevant.

@tomaka tomaka closed this as completed Apr 26, 2022

6 participants