Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go.mod: use github.com/holiman/bloomfilter/v2 #22044

Merged
merged 3 commits into from
Jan 12, 2021
Merged

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Dec 20, 2020

This PR switches bloom filter, from the (archived) steakknife implementation a much much better (faster + less false positives) one.

Two orders of magnitude lower false-positive-rate achived here: #21724 (comment) .

@holiman
Copy link
Contributor Author

holiman commented Dec 20, 2020

Running a fast-sync with this PR on bench03 now, against master on bench04

https://geth-bench.ethdevops.io/d/Jpk-Be5Wk/dual-geth?orgId=1&from=1608478900273&to=now

@holiman
Copy link
Contributor Author

holiman commented Dec 21, 2020

The first run didn't actually show a lot of changes in the bloom misses. My second commit added a fourth key to the bloom.
This time, the misses went down considerably:

Screenshot_2020-12-21 Dual Geth - Grafana(1)

The master had a head start on state download, and was ahead the whole time. Towards the end though, this PR started to catch up:

Screenshot_2020-12-21 Dual Geth - Grafana(2)

@fjl fjl changed the title deps: use improved bloom filter implementation go.mod: use github.com/holiman/bloomfilter/v2 Dec 23, 2020
@holiman
Copy link
Contributor Author

holiman commented Dec 27, 2020

I started the --snapshot feature, which also uses bloomfilter. The differences in the faults have been growing over the last couple of days.
Screenshot_2020-12-27 Dual Geth - Grafana

Note: the snapshot blooms are replaced as is, I have not changed the number of keys in this PR.

m := float64(b.bloom.M())

return math.Pow(1.0-math.Exp((-k)*(n+0.5)/(m-1)), k)
func hashToUint64(h common.Hash) uint64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It mixes all the bits in a single uint64, is it ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the intent, yes. However, it's really not changed from before. The previous code took a []byte as input, and called

func (f syncBloomHasher) Sum64() uint64                     { return binary.BigEndian.Uint64(f) }

And this method here is exactly that same binary.BigEndian.Uint64 method, but adapted for a fixed-size array input.

@@ -171,11 +157,11 @@ func (b *SyncBloom) Close() error {
}

// Add inserts a new trie node hash into the bloom filter.
func (b *SyncBloom) Add(hash []byte) {
func (b *SyncBloom) Add(hash common.Hash) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure this []byte to common.Hash change is a good idea. The old code throughout either used some existing []byte value, or if it has a hash, it did hash[:]. In both these cases, there's no copy involved as we're jut using the same backing memory area.

With this refactor however, you are making the parameter an array from potentially a []byte, which always incurs a full copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SyncBloom.Add always takes a common.Hash without conversion, afaict

@@ -110,12 +97,11 @@ func (b *SyncBloom) init(database ethdb.Iteratee) {
// If the database entry is a trie node, add it to the bloom
key := it.Key()
if len(key) == common.HashLength {
b.bloom.Add(syncBloomHasher(key))
b.bloom.AddHash(binary.BigEndian.Uint64(key))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for calling binary.Bigendian to manually convert instead of common.BytesToHash like other occurances? Seems a bit dangerous as the two methods could go out of sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syncBloomHasher was replaced by it's Sum64, which is binary.BigEndian.Uint64

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point here was that we have an explicit manual calculation and an implicit built-in calculation now, and it seems dangerous to have two means to arrive to the same number, as eventually one will get modified without the other and things will bork.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a misunderstanding here.

However, we could keep b.bloom.Add(syncBloomHasher(key)) and redefine syncBloomHasher as binary.BigEndian.Uint64 ?

But I think you are conflating two different things, the built-in calculation has nothing to do with this

b.bloom.Add(syncBloomHasher(hash))
} else if ok, hash := rawdb.IsCodeKey(key); ok {
// If the database entry is a contract code, add it to the bloom
b.bloom.AddHash(binary.BigEndian.Uint64(hash))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for calling binary.Bigendian to manually convert instead of common.BytesToHash like other occurances? Seems a bit dangerous as the two methods could go out of sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you mean the hashToUint64 ?

Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@karalabe karalabe added this to the 1.10.0 milestone Jan 12, 2021
@holiman holiman merged commit 93a89b2 into ethereum:master Jan 12, 2021
bulgakovk pushed a commit to bulgakovk/go-ethereum that referenced this pull request Jan 26, 2021
* deps: use improved bloom filter implementation

* eth/handler, trie: use 4 keys for syncbloom + minor fixes

* eth/protocols, trie: revert change on syncbloom method signature
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants