Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(lib/trie): refactor encoding and hash related code in trie package #2077

Merged
merged 50 commits into from
Dec 16, 2021

Conversation

qdm12
Copy link
Contributor

@qdm12 qdm12 commented Nov 29, 2021

πŸ’β€β™‚οΈ don't be too scared by the deltas, it is mostly test and mock code being moved in subpackages

Changes

  • Split code in subpackages
  • Add tests
  • Replace existing tests with more in-depth tests
  • Self reviewed
  • Ran gossamer without error

Notes

  • This does not change behavior really, it's just moving things around to see more clearly

Branched out PRs

Tests

go test -race ./lib/trie/...

Issues

Primary Reviewer

@codecov
Copy link

codecov bot commented Nov 29, 2021

Codecov Report

Merging #2077 (61baed1) into development (d096d44) will increase coverage by 0.19%.
The diff coverage is 91.93%.

Impacted file tree graph

@@               Coverage Diff               @@
##           development    #2077      +/-   ##
===============================================
+ Coverage        61.35%   61.55%   +0.19%     
===============================================
  Files              202      213      +11     
  Lines            27355    27390      +35     
===============================================
+ Hits             16785    16860      +75     
+ Misses            8694     8665      -29     
+ Partials          1876     1865      -11     
Flag Coverage Ξ”
unit-tests 61.55% <91.93%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Ξ”
lib/trie/print.go 80.00% <71.42%> (-4.32%) ⬇️
internal/trie/node/leaf_encode.go 80.26% <80.26%> (ΓΈ)
lib/trie/database.go 49.40% <82.45%> (ΓΈ)
internal/trie/node/branch_encode.go 84.84% <84.84%> (ΓΈ)
internal/trie/node/hash.go 85.71% <85.71%> (ΓΈ)
lib/trie/lookup.go 73.91% <88.88%> (ΓΈ)
lib/trie/trie.go 91.62% <95.87%> (-0.61%) ⬇️
internal/trie/codec/nibbles.go 100.00% <100.00%> (ΓΈ)
internal/trie/node/branch.go 100.00% <100.00%> (ΓΈ)
internal/trie/node/children.go 100.00% <100.00%> (ΓΈ)
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data
Powered by Codecov. Last update d096d44...61baed1. Read the comment docs.

// ScaleEncodeHash hashes the node (blake2b sum on encoded value)
// and then SCALE encodes it. This is used to encode children
// nodes of branches.
func (b *Branch) ScaleEncodeHash() (encoding []byte, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would assume Encode functions take an io.Writer.

Suggested change
func (b *Branch) ScaleEncodeHash() (encoding []byte, err error) {
func (b *Branch) ScaleMarshalHash() (encoding []byte, err error) {

Copy link
Contributor Author

@qdm12 qdm12 Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually if I do that that will be an extra copy (buffer's Write() implementation is a copy), since there is no scale encoder writing to a io.Writer directly. Maybe do you feel like adding such encoder in the scale package? That would allow to use a writer directly now. Or let me know I can do eventually do it quickly tomorrow sometime.

As in I guess I'm pre-optimizing this although we have another elephant size memory problem, but this part of the code seems to be quite cpu and memory intense still so it shouldn't hurt to save a copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue a while ago for this. #1897

lib/trie/branch/branch.go Outdated Show resolved Hide resolved
lib/trie/decode/byte.go Outdated Show resolved Hide resolved
lib/trie/decode/byte.go Outdated Show resolved Hide resolved

// KeyLEToNibbles converts a Little Endian byte slice into nibbles.
// It assumes bytes are already in Little Endian and does not rearrange nibbles.
func KeyLEToNibbles(in []byte) (nibbles []byte) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there should be some sort of Key type, and this function should just be a receiver function on that type.
Key.Nibbles() ([]byte).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started working on it in a branch, but it's a lot more changes than I expected. I will pause it for now but taking note to do it at a later time eventually.

lib/trie/leaf/copy.go Outdated Show resolved Hide resolved
Copy link
Contributor

@noot noot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like the idea of separating node (branch and leaf) types into separate packages, imo they should stay unexported as they don't need to show up in the API docs. node should be used only internally to the trie package imo

@qdm12
Copy link
Contributor Author

qdm12 commented Dec 2, 2021

imo they should stay unexported as they don't need to show up in the API docs

I can move them in the internal directory such as internal/trie/node so they won't show in docs/as API outside this project. I think we also all agree both branch and leaf packages should be moved together in the node subpackage as well. However we do need subpackages, the trie package is way too dense, so we can't go back to have everything in a single package and have unexported things.

@qdm12 qdm12 force-pushed the qdm12-trie-node-subpkg-base branch from c1f22e9 to 30e2318 Compare December 7, 2021 09:27
@qdm12
Copy link
Contributor Author

qdm12 commented Dec 7, 2021

@noot @timwu20

  1. I merged the lib/trie/leaf + lib/trie/branch + lib/trie/encode + lib/trie/decode package in lib/trie/node

  2. I moved lib/trie/node in internal/trie/node to not expose the node API

  3. I added an interface alias in lib/trie

    import "github.com/ChainSafe/gossamer/internal/trie/node"
    
    // Node is node in the trie and can be a leaf or a branch.
    type Node node.Node

    So that developers importing the gossamer repository can implement the interface if they want to for whatever reason.

Let me know if that fits what you wanted πŸ˜‰ Thanks!

@qdm12 qdm12 force-pushed the qdm12-trie-node-subpkg-base branch 2 times, most recently from f46613f to be205de Compare December 8, 2021 15:50
Copy link
Contributor

@noot noot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I don't know how I feel about moving everything into separate packages, I feel like it's more confusing but if you think it's clearer then happy to review :p

@qdm12
Copy link
Contributor Author

qdm12 commented Dec 8, 2021

tbh I don't know how I feel about moving everything into separate packages, I feel like it's more confusing but if you think it's clearer then happy to review :p

It does make sense to me to split it since node/leaf/branch can be split apart from the trie / database / generations / pruning operations. I'm opposed to have many nested micro packages, but here we really had some heavy weight trie package 😸 I'm working now off this branch and it feels quite nicer I think.

@qdm12 qdm12 force-pushed the qdm12-trie-node-subpkg-base branch from be205de to 6f38613 Compare December 9, 2021 16:24
lib/trie/decode.go Outdated Show resolved Hide resolved
Copy link
Contributor

@kishansagathiya kishansagathiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky since I don't have a clear view on what has changed and what has moved. I have tried to comment wherever I see problems, but I have also ran quickly with some parts and avoided some comments since I don't think it is a good idea to add more changes in this PR.

  • I don't like that some smaller parts have their own files, for example dirty, key, generation, value, copy, etc.
  • To me, methods of same struct being together(in the same file) makes more sense than same methods of similar structs.
  • It might be possible to just merge Leaf and Branch struct. It might require some gymnastic like using an interface for methods that are different or functions as values. But if we can do that, that could make some things clearer. (Not as part of this PR)

Though I have made some smaller comments, we should (I realize that now) not make more changes in this PR to keep things simpler.

internal/trie/node/decode.go Outdated Show resolved Hide resolved
internal/trie/node/decode_test.go Show resolved Hide resolved
internal/trie/node/leaf_encode.go Outdated Show resolved Hide resolved
internal/trie/node/types.go Show resolved Hide resolved
@qdm12
Copy link
Contributor Author

qdm12 commented Dec 14, 2021

@kishansagathiya

This is tricky since I don't have a clear view on what has changed and what has moved.

Agreed. There was unfortunately no way to split the changes really due to the file movements.
On the other hand:

  • I added deeper tests with higher coverage in all the internal/trie packages
  • Existing higher level tests (i.e. in lib/trie) still pass

So I am confident no bug was introduced. But I agree it's hard to review.

avoided some comments since I don't think it is a good idea to add more changes in this PR.
Though I have made some smaller comments, we should (I realize that now) not make more changes in this PR to keep things simpler.

Feel free to comment, I already have a few PRs pointing to this PR, adding a few more is fine too. I'll rebase them on development once this one merges.
I'm also taking note of things I can't do right now as side branches, I will eventually create an issue with each suggestion.

I don't like that some smaller parts have their own files, for example dirty, key, generation, value, copy
To me, methods of same struct being together(in the same file) makes more than same methods of similar structs.

Agreed; however the alternative is to have only a leaf.go and branch.go files which would be in the 1000+ lines, and that's hard to navigate in.
That is why I took the approach to split by 'action' (i.e. encoding, decoding, generation etc.) rather than by type. That would definitely be improved by the suggestion you made below:

It might be possible to just merge Leaf and Branch struct

Definitely. I'll explore this once the memory issue is fixed. Added this to my list of suggestions.

internal/trie/node/decode.go Outdated Show resolved Hide resolved
@qdm12 qdm12 requested review from EclesioMeloJunior and removed request for arijitAD and omar391 December 15, 2021 13:05
internal/trie/node/branch.go Outdated Show resolved Hide resolved
Copy link
Contributor

@noot noot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just minor comments!

@qdm12 qdm12 force-pushed the qdm12-trie-node-subpkg-base branch from de4b2a8 to 61baed1 Compare December 16, 2021 13:53
@qdm12 qdm12 merged commit a3ae30b into development Dec 16, 2021
@qdm12 qdm12 deleted the qdm12-trie-node-subpkg-base branch December 16, 2021 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants