This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

Using content features instead of hashing for content addressable networks #442

Open
systemshift opened this issue Nov 14, 2020 · 1 comment
Labels
need/triage Needs initial labeling and prioritization

Comments

@systemshift

Okay,

So I have been thinking about a different way to implement a content addressable network for some time now, and I have written a post explaining the details.
I started working on it last month; so far there is only a bare-bones repo.

I would love to have some feedback.

Open question

Currently, all nodes have to agree on the same weights in order to get the same results, which will obviously be a huge problem.
For now, I think a workaround would be to have a central point of deployment: a large crawler builds the weights, which are then shipped as a file along with the binary. One issue with this approach is that the weights will not be compatible across versions, so a user running IPFS 1.1.2 will have different weights than nodes running 1.1.3, and so on.
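To make the version-compatibility problem concrete, here is a minimal Python sketch, assuming each advertised feature vector is tagged with a fingerprint (a SHA-256 digest) of the weights file that produced it. `FeatureRecord`, `weights_fingerprint` and `comparable` are hypothetical names, not part of IPFS; the point is only that nodes on different weight versions would need to detect the mismatch rather than silently compare incompatible vectors.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRecord:
    """A content feature vector as it might be advertised to other nodes (hypothetical format)."""
    weights_id: str  # fingerprint of the weights that produced `vector`
    vector: tuple    # the feature vector itself
    cid: str         # hypothetical identifier of the content the vector describes


def weights_fingerprint(weights_bytes: bytes) -> str:
    """Return a hex SHA-256 digest identifying one exact set of model weights."""
    return hashlib.sha256(weights_bytes).hexdigest()


def comparable(a: FeatureRecord, b: FeatureRecord) -> bool:
    """Vectors are only meaningful to compare if both came from identical weights."""
    return a.weights_id == b.weights_id


# Usage: weights shipped with two different releases get different fingerprints.
w112 = weights_fingerprint(b"weights shipped with 1.1.2")
w113 = weights_fingerprint(b"weights shipped with 1.1.3")
a = FeatureRecord(w112, (0.1, 0.2), "cid-a")
c = FeatureRecord(w113, (0.1, 0.2), "cid-c")
print(comparable(a, c))  # False: same vector, but produced by different weights
```

With a fingerprint check like this, a node could at least refuse (or re-encode) vectors from another release instead of returning wrong matches.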

Siamese neural networks might be a good place to start looking for a solution. The architecture consists of two identical networks run in parallel that share the same weights, so any update to the weights applies to both. However, this architecture is also meant to run on a single local machine, not a distributed system, so it's possibly a dead end.
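The weight-sharing property can be sketched in a few lines of plain Python. This is a toy one-layer encoder chosen for illustration (`SharedEncoder` and `siamese_distance` are hypothetical names), not a proposed model: both branches call the same object, so they necessarily read the same weights.

```python
import math


class SharedEncoder:
    """A tiny one-layer encoder. A Siamese network runs the *same* instance on both inputs."""

    def __init__(self, weights):
        self.weights = weights  # output_dim rows, each of length input_dim

    def encode(self, x):
        # tanh(W @ x), written out without numpy
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


def siamese_distance(encoder, x1, x2):
    """Encode both inputs with the shared weights and return the Euclidean distance."""
    a = encoder.encode(x1)
    b = encoder.encode(x2)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


enc = SharedEncoder([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]])
print(siamese_distance(enc, [1.0, 0.0, 2.0], [1.0, 0.0, 2.0]))  # 0.0
```

Because both branches read `enc.weights`, updating them changes both branches at once. On one machine that synchronisation is free; across many independent nodes it is exactly the part that would have to be solved.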

word2vec approach
One idea I've been thinking about is inspired by the word2vec paper: each node creates a vector space representing its local files. So rather than a hash lookup table, a vector is used to compare content against a high-dimensional 'lookup box'.
This one needs more discussion; it is possible I am mistaken.
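A rough illustration of "vector lookup instead of hash lookup": a local index that returns the stored file whose feature vector is most similar to the query. As an assumption, and to keep the sketch self-contained, it uses character-trigram count vectors with cosine similarity in place of learned word2vec-style embeddings; `FeatureIndex` and its methods are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Stand-in content features: counts of overlapping 3-character substrings."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


class FeatureIndex:
    """A node's local 'lookup box': file name -> feature vector."""

    def __init__(self):
        self.entries = {}

    def add(self, name: str, text: str) -> None:
        self.entries[name] = trigram_vector(text)

    def nearest(self, query: str, k: int = 1) -> list:
        """Return the k stored names most similar to the query (no exact match needed)."""
        q = trigram_vector(query)
        ranked = sorted(self.entries.items(), key=lambda kv: cosine(q, kv[1]), reverse=True)
        return [name for name, _ in ranked[:k]]


index = FeatureIndex()
index.add("a.txt", "content addressable networks route by hash")
index.add("b.txt", "genome sequencing of eukaryotic species")
print(index.nearest("routing in content addressable networks"))  # ['a.txt']
```

Unlike a hash table, a query that is merely similar to stored content still finds it; the cost is that lookup becomes a similarity search rather than an exact key match.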

content aware network and performance
If each node has a representation of its neighbouring nodes' data, I believe this could act as something like an "IPFS map" for nodes. So rather than relying on a 'blind' gossip protocol, each node could forward a request in the 'warmer' direction of the network.
I don't know yet what this would look like, but possibly each node keeps a compressed representation of its neighbouring nodes, with just enough information to indicate whether a request is getting closer to or farther from its target.
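One possible reading of "forward to the warmer direction" is greedy routing on content similarity. The sketch below assumes each node's compressed representation is simply the centroid of its local content vectors, and that a request moves to whichever neighbour's summary is most similar to the query, stopping when no neighbour is warmer than the current node. `centroid`, `greedy_route` and `max_hops` are hypothetical, not part of any IPFS protocol.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Compress a node's content vectors into one summary vector (assumed representation)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def greedy_route(start, query, summaries, neighbours, max_hops=16):
    """summaries: node -> summary vector; neighbours: node -> list of adjacent nodes."""
    path = [start]
    current = start
    for _ in range(max_hops):
        best = max(neighbours[current], key=lambda n: cosine(query, summaries[n]), default=None)
        if best is None or cosine(query, summaries[best]) <= cosine(query, summaries[current]):
            break  # no neighbour is "warmer" than where we are
        current = best
        path.append(current)
    return path


summaries = {"A": [1.0, 0.0], "B": [0.7, 0.7], "C": [0.0, 1.0]}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(greedy_route("A", [0.0, 1.0], summaries, neighbours))  # ['A', 'B', 'C']
```

Greedy forwarding like this can stall at a local maximum, so a real design would probably need a fallback (for example, reverting to the existing DHT lookup) when the path stops short of the content.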

possible domain specific use case
I have been following the progress of the Earth BioGenome Project, a very ambitious project to catalogue the genomes of all life forms. This is something that could be stored on IPFS, given that the work is distributed globally across different labs and universities.
A separate DHT network and IPFS fork could be used for this project to address the global dataset and search it for specific genome sequences without downloading entire files, since genome files are enormous.
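Searching for a sequence without downloading the whole file suggests exchanging compact content summaries. As a swapped-in technique that is not from the post, here is a minimal MinHash sketch over k-mers (the general approach used by genome-comparison tools such as Mash): each sequence is reduced to a fixed list of minimum hash values, and the fraction of matching entries estimates the Jaccard similarity of the two k-mer sets. Function names, `k=5` and `num_hashes=64` are illustrative assumptions; real tools use much larger k and sketch sizes.

```python
import hashlib


def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k (k-mers) in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """Deterministic 64-bit hash of a k-mer; each seed acts as a different hash function."""
    digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_sketch(seq: str, k: int = 5, num_hashes: int = 64) -> list:
    """Keep only the minimum hash per hash function: a fixed-size summary of the sequence."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, s) for km in ks) for s in range(num_hashes)]


def estimate_jaccard(sketch_a: list, sketch_b: list) -> float:
    """Fraction of matching minimums estimates the Jaccard similarity of the k-mer sets."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


query = minhash_sketch("ACGTACGTTAGCCGATAGGCTTACG")
stored = minhash_sketch("ACGTACGTTAGCCGATAGGCTTACG")
print(estimate_jaccard(query, stored))  # 1.0
```

Nodes could then advertise sketches of a few dozen integers per file instead of the files themselves, and only fetch a genome once its sketch looks similar enough to the query.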

random note
I have been reading this article about graph neural networks.
I cannot yet write a coherent statement, since my thoughts are not very clear, but reading about graphs and attention connections, I find it hard not to think about how this could be applied to a network application. How this would be useful, I do not know.

@systemshift systemshift added the need/triage Needs initial labeling and prioritization label Nov 14, 2020
@systemshift
Author

Oops, wrong default label; it should have been discussion.
