double hashing implementation #1
base: master
Conversation
noot commented Aug 30, 2022
- calculate and use sha256 hash of content multihash before advertising/discovering
- run golangci-lint and fix lint errors
…into noot/double-hash
…keyspace dep to fork
Cool, this is a good first chunk,
Great progress! This all looks good to me!
Concerning the isHashed variable, I guess that setting it to true or false will depend on the migration that we choose to go with.
Did you already run some test cases with isHashed = true in lookup.go, or any test case that (1) provides some content to the DHT using double hashing and (2) retrieves this content (from another peer)?
lookup.go
Outdated
// TODO: I can break the interface! return []peer.ID
lookupRes, err := dht.runLookupWithFollowup(ctx, key,
const isHashed = false
How can the host determine whether the requested key is hashed or not?
the recipient of the query will receive Message_FIND_NODE, Message_GET_VALUE, or Message_GET_PROVIDERS (see handlers.go). for the first two, the key is not hashed, but for Message_GET_PROVIDERS the key will be hashed.
query.go
Outdated
q := &query{
	id:  uuid.New(),
	key: target,
	key: target, // TODO change this if hashed?
I think this doesn't need to be changed: if isHashed, target should already be the hash
routing.go
Outdated
@@ -288,7 +288,8 @@ func (dht *IpfsDHT) getValues(ctx context.Context, key string, stopQuery chan st
go func() {
	defer close(valCh)
	defer close(lookupResCh)
	lookupRes, err := dht.runLookupWithFollowup(ctx, key,
	const isHashed = false
Why is isHashed set to false here?
we decided on hashing only the key for the AddProviders/GetProviders path, as it's what's used specifically in IPFS to find content providers. Get/Put keys are only used for IPNS as far as I know
@guillaumemichel yes, the existing unit tests that use
@@ -777,10 +777,14 @@ func (dht *FullRT) Provide(ctx context.Context, key cid.Cid, brdcst bool) (err e
	return fmt.Errorf("invalid cid: undefined")
}
keyMH := key.Hash()
logger.Debugw("providing", "cid", key, "mh", internal.LoggableProviderRecordBytes(keyMH))
mhHash := internal.Sha256Multihash(keyMH)
do you think it might be a good idea to add a salt for this hash? that way there's another layer of protection that could be used by the provider.
but i can see the other side: if we want to make this easily reproducible, it would be better to leave out the additional step (-:
If required, we could add a constant salt (e.g. to specify the purpose of the hash).
However, the secret to access the content is already the CID; adding a random salt means that knowledge of both the CID and the salt is required to find the content. If we add a random salt here, it would have to be shared along with the CID, so it wouldn't add much security (in other words, it would only make the secret longer).
We should consider adding a constant salt (or prefix) to the spec and here - so that the derived hashes are in a known specific domain that doesn't easily have collisions with other things.
lookup.go
Outdated
// TODO: I can break the interface! return []peer.ID
lookupRes, err := dht.runLookupWithFollowup(ctx, key,
const isHashed = false
will we add a check to see if it's hashed later on?
yeah, it's checked in query.go line 152
actually, this got removed since the double hash is now a multihash.Multihash
func(ctx context.Context, p peer.ID) ([]*peer.AddrInfo, error) {
	// For DHT query command
	routing.PublishQueryEvent(ctx, &routing.QueryEvent{
		Type: routing.SendingQuery,
		ID:   p,
	})

	provs, closest, err := dht.protoMessenger.GetProviders(ctx, p, key)
	provs, closest, err := dht.protoMessenger.GetProviders(ctx, p, multihash.Multihash(mhHash[:])) // TODO: this isn't a multihash anymore, change GetProvider param type instead?
is it worth having a multihash code for these? doublehashed-sha256 or similar, and encode the derived hash as a multihash?
yep, updated to use DBL_SHA2_256
internal/utils.go
Outdated
type Hash [32]byte

func Sha256Multihash(mh multihash.Multihash) Hash {
	return sha256.Sum256(mh)
This should be a derived hash, so e.g. add a custom prefix like sha256.Sum256(append([]byte("CR_DOUBLEHASH"), mh...))
updated!
@@ -777,10 +777,14 @@ func (dht *FullRT) Provide(ctx context.Context, key cid.Cid, brdcst bool) (err e
	return fmt.Errorf("invalid cid: undefined")
}
keyMH := key.Hash()
logger.Debugw("providing", "cid", key, "mh", internal.LoggableProviderRecordBytes(keyMH))
mhHash := internal.Sha256Multihash(keyMH)
We should consider adding a constant salt (or prefix) to the spec and here - so that the derived hashes are in a known specific domain that doesn't easily have collisions with other things.
internal/utils.go
Outdated
func Sha256Multihash(mh multihash.Multihash) multihash.Multihash {
	prefix := []byte("CR_DOUBLEHASH")
	mh, err := multihash.Sum(append(prefix, mh...), multihash.DBL_SHA2_256, 30)
is there a reason we're saying this has a length of 30?
yeah, since the multihash automatically prepends the code and length of the data, the returned multihash is of length 32. If I set the length to 32 (which I initially tried), the returned multihash is of length 34, which then causes a panic in routingTable.NearestPeers when it calls CommonPrefixLen to compare the multihash with the local peer ID, since the lengths are mismatched.
the other workaround I thought of is slicing off the first 2 bytes of the multihash / only using the digest, but I'm not sure if that would have unintended consequences. any thoughts on which method would be better?
func Encode(buf []byte, code uint64) ([]byte, error) {
// FUTURE: this function always causes heap allocs... but when used, this value is almost always going to be appended to another buffer (either as part of CID creation, or etc) -- should this whole function be rethought and alternatives offered?
newBuf := make([]byte, varint.UvarintSize(code)+varint.UvarintSize(uint64(len(buf)))+len(buf))
n := varint.PutUvarint(newBuf, code)
n += varint.PutUvarint(newBuf[n:], uint64(len(buf)))
copy(newBuf[n:], buf)
return newBuf, nil
}
The Encode() function basically takes the digest and prepends the code and length, returning something of the format [code, length, digest]; this prefix takes 2 bytes here.
We don't want the digest to start with a constant, as this digest determines the location of the provider records in the DHT. Having a common prefix (the code) on all digests would mean that all double-hashed provider records are allocated to a restricted region of the keyspace. We don't need to have the code in the output. We could update the code as follows:
mh, err := multihash.Sum(append(prefix, mh...), multihash.DBL_SHA2_256, KEYSIZE)
if err != nil {
// this shouldn't ever happen
panic(err)
}
// don't return the code prefix, only return the last KEYSIZE bytes
return mh[len(mh)-KEYSIZE:]
KEYSIZE should be a constant value defining the DHT key size. Ideally this constant already exists somewhere in the DHT implementation, or it is passed down by kubo. Its value should be 32 for now, but if the DHT key size were to change, this piece of code shouldn't need to be updated; updating the constant should be enough.
@guillaumemichel that works, but the issue is that the function now returns a multihash.Multihash that isn't actually a multihash (since the code is gone). the purpose, from what I remember, of using multihash.Sum was that we could mark the hash as being of type DBL_SHA2_256, whereas previously it was just using sha256 (which is effectively the same as doing multihash.Sum then taking the last 32 bytes). should I just change the function back to do a sha256 sum? because I agree we shouldn't have all the hashes share the same first 2 bytes.
Yes, it should be fine to change the function back to sha256
fullrt/dht.go
Outdated
// add self locally
dht.ProviderManager.AddProvider(ctx, keyMH, peer.AddrInfo{ID: dht.h.ID()})
err = dht.ProviderManager.AddProvider(ctx, keyMH, peer.AddrInfo{ID: dht.h.ID()})
Shouldn't it be mhHash instead of keyMH?
yep, updated!
@noot we can now probably get rid of this abomination in go-libp2p-kbucket. IIUC it is the result of a preimage bruteforce to find a key that would fit every single small bucket. It isn't required anymore with double hashing, as one doesn't request the preimage of a kad key, but the kad key itself. We can simply forge random keys that fit inside the bucket instead of taking the ones resulting from the brute force.