Using content features instead of hashing for content addressable networks #442

systemshift · 2020-11-14T11:32:55Z

Okay,

So I have been thinking for some time now about a different way to implement a content addressable network, and I have written a post to explain the details.
I also started working on it last month; currently it is only a bare-bones repo.

I would love to have some feedback.

Open question

Currently, all nodes have to agree on the same weights in order to get the same results. Obviously this will be a huge problem, since nodes with different weights would compute different features for the same content.
For now, I think a workaround would be to have a central point of deployment: a huge crawler builds the weights, and then they get deployed as a file along with the binary. One issue with this approach is that it will not be compatible across versions, so a node running IPFS 1.1.2 will have different weights than nodes running 1.1.3, and so on.
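To make the version problem concrete, here is a minimal sketch of one way nodes could at least detect the mismatch: tag every feature vector with a fingerprint of the weights that produced it. This is only an assumption for illustration, not something the repo does; `weights_fingerprint`, `comparable`, and the byte strings standing in for weight files are all hypothetical.

```python
import hashlib


def weights_fingerprint(weights_bytes: bytes) -> str:
    """Return a short SHA-256 digest identifying one exact set of weights."""
    return hashlib.sha256(weights_bytes).hexdigest()[:16]


def comparable(tag_a: str, tag_b: str) -> bool:
    """Feature vectors are only comparable if identical weights produced them."""
    return tag_a == tag_b


# Hypothetical stand-ins for the weight files shipped with two releases.
weights_v1 = b"weights shipped with release A"
weights_v2 = b"weights shipped with release B"

tag_v1 = weights_fingerprint(weights_v1)
tag_v2 = weights_fingerprint(weights_v2)

print(comparable(tag_v1, weights_fingerprint(weights_v1)))  # same weights
print(comparable(tag_v1, tag_v2))                           # different weights
```

This does not solve the compatibility issue, it only lets a node refuse to compare vectors built from different weights instead of silently returning wrong results.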

Siamese neural networks might be a starting point to look for a solution. The architecture is two neural networks run in parallel that share their weights, so both get updated whenever one of them updates its weights. But this architecture is also meant to be run on a local machine, not a distributed system, so it's possible this is a dead end.

word2vec approach
One idea that I've been thinking about is inspired by the word2vec paper: each node creates a vector space representing its local files. So rather than a hash lookup table, a vector is used and compared against a high-dimensional 'lookup box'.
This one needs some more discussion; it is possible I am mistaken.
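A rough sketch of what replacing the hash lookup with a vector comparison could look like, assuming each file already has a feature vector. The three-dimensional vectors, the file names, and the choice of cosine similarity are all made up for illustration; a real system would get the vectors from the shared weights.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(index, query, k=1):
    """Return the names of the k entries whose vectors are most similar to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(kv[1], query), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical local index: file name -> content feature vector.
local_index = {
    "cats.txt": [0.9, 0.1, 0.0],
    "dogs.txt": [0.8, 0.3, 0.1],
    "taxes.pdf": [0.0, 0.2, 0.9],
}

print(nearest(local_index, [1.0, 0.0, 0.0], k=2))
```

Unlike a hash lookup, the query does not need to match any stored vector exactly; the closest content wins, which is the whole point of using features.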

content aware network and performance
If each node has a representation of its neighbouring nodes' data, I believe this could be akin to something like an "IPFS map" for nodes. So rather than having a 'blind' gossip protocol, each node can forward the request in the 'warmer' direction in the network.
I don't know what this would look like, but possibly each node keeps a compressed representation of its neighbouring nodes with good enough information to indicate whether the request is getting closer to or farther from the target.
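One possible shape for this, purely as a sketch: each node summarises a neighbour as the centroid of that neighbour's feature vectors and forwards a request to the neighbour whose summary is most similar to the query. The centroid as the compressed representation, the peer names, and the two-dimensional vectors are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Compress a neighbour's feature vectors into a single mean vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def next_hop(neighbour_summaries, query):
    """Pick the 'warmest' neighbour: the one whose summary best matches the query."""
    return max(neighbour_summaries,
               key=lambda peer: cosine(neighbour_summaries[peer], query))


# Hypothetical neighbours, each summarised by the centroid of its file vectors.
neighbours = {
    "peer-A": centroid([[0.9, 0.1], [0.7, 0.3]]),
    "peer-B": centroid([[0.1, 0.9], [0.2, 0.8]]),
}

print(next_hop(neighbours, [1.0, 0.0]))
```

A single centroid loses a lot of information when a neighbour stores very different kinds of content, so this is only the simplest version of the idea.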

possible domain specific use case
I have been following the progress of the Earth BioGenome Project, a very ambitious project to catalogue the genomes of life on Earth. This is something that could be stored on IPFS, given that the work is distributed globally between different labs & universities.
A separate DHT network and IPFS fork could be used for this project as a way to address the global dataset and search for specific genome sequences without downloading entire files, since genome files are enormous.
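As a toy illustration of searching by content features here, a node could publish a small k-mer profile of each genome file, and a query sequence could be checked against the profiles instead of the files themselves. Using k-mers as the feature, the value k=3, and the tiny made-up sequences are my assumptions for the sketch; real genome indexing would need much longer k-mers and a more compact structure.

```python
from collections import Counter


def kmer_profile(sequence, k=3):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def shared_kmer_fraction(query, profile, k=3):
    """Fraction of the query's distinct k-mers that also appear in the profile."""
    query_kmers = kmer_profile(query, k)
    if not query_kmers:
        return 0.0
    hits = sum(1 for kmer in query_kmers if kmer in profile)
    return hits / len(query_kmers)


# Made-up sequences standing in for enormous genome files.
genome_a = "ATGCGTACGTTAGC"
genome_b = "TTTTTTTTTTTTTT"
profiles = {
    "genome_a": kmer_profile(genome_a),
    "genome_b": kmer_profile(genome_b),
}

for name, profile in profiles.items():
    print(name, shared_kmer_fraction("GTACG", profile))
```

Only the profiles travel over the network in this sketch; the full file would be fetched only from the node whose profile scores well.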

random note
I have been reading this article about graph neural networks.
I cannot write a coherent statement yet since I don't have a very clear thought, but reading about graphs and attention connections, I find it hard to avoid thinking about how this could be applied to a network application. How it would be useful, I do not know.

systemshift · 2020-11-14T11:36:15Z

Oops, wrong default label, should have been discussion.

systemshift added the need/triage (Needs initial labeling and prioritization) label Nov 14, 2020
