Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix malformed CAR panics and excessive memory usage #312

Merged
merged 37 commits into from
Jul 6, 2022
Merged

Commits on Jun 30, 2022

  1. Configuration menu
    Copy the full SHA
    4257e9c View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    6c256c2 View commit details
    Browse the repository at this point in the history
  3. test: add fuzzing of NewCarReader

    Jorropo authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    3143968 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    c133105 View commit details
    Browse the repository at this point in the history
  5. test: v2 add fuzzing to BlockReader

    Jorropo authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    e36135f View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    8004dff View commit details
    Browse the repository at this point in the history
  7. test: v2 add fuzzing to Reader

    Jorropo authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    f5b91b9 View commit details
    Browse the repository at this point in the history
  8. fix: v2 don't allocate indexes too big

    Jorropo authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    6dc2ea1 View commit details
    Browse the repository at this point in the history
  9. Configuration menu
    Copy the full SHA
    67ff54f View commit details
    Browse the repository at this point in the history
  10. test: v2 add fuzzing of the index

    Jorropo authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    4e24d90 View commit details
    Browse the repository at this point in the history
  11. ci: add fuzzing on CI

    Jorropo authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    2eea288 View commit details
    Browse the repository at this point in the history
  12. feat: Refactor indexes to put storage considerations on consumers

    There is no way I can make a safe implementation of the parser by slurping
    thing into memory, indexes people use are just too big.
    
    So I made a new API which force consumers to manage that.
    They can choose to use a bytes.Reader, *os.File, mmaped thing, ...
    Jorropo authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    f8735e6 View commit details
    Browse the repository at this point in the history
  13. fix index comparisons

    rvagg authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    6e4e208 View commit details
    Browse the repository at this point in the history
  14. Configuration menu
    Copy the full SHA
    3eabc2d View commit details
    Browse the repository at this point in the history
  15. fix: staticcheck catches

    rvagg authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    04a85e7 View commit details
    Browse the repository at this point in the history
  16. Configuration menu
    Copy the full SHA
    77de9fe View commit details
    Browse the repository at this point in the history
  17. Configuration menu
    Copy the full SHA
    dfbe3f7 View commit details
    Browse the repository at this point in the history
  18. feat: MaxAllowedSectionSize default to 32M

    rvagg authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    3940bf5 View commit details
    Browse the repository at this point in the history
  19. feat: MaxAllowed{Header,Section}Size option

    rvagg authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    ec34902 View commit details
    Browse the repository at this point in the history
  20. Configuration menu
    Copy the full SHA
    3c9f1d2 View commit details
    Browse the repository at this point in the history
  21. fix: tighter constraint of singleWidthIndex width, add index recommen…

    …tation docs
    rvagg authored and masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    3a1b4e8 View commit details
    Browse the repository at this point in the history
  22. Fix testutil assertion logic and update index generation tests

    Update index generation tests to assert indices are identical.
    
    Fix minor typo in the test utility name and a bug where the check was
    not using both index instances to assert they are identical.
    Also refactor the use of lock in favour of wait group for better
    readability of the assertion logic.
    masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    8a5b330 View commit details
    Browse the repository at this point in the history
  23. Use a fix code as the multihash code for CarIndexSorted

    Previous changes added the `ForEach` interface to `Index` type which
    enables iteration through the index by multihash and offset. However,
    not all index types contain enough information to construct the full
    multihash. Namely, `CarIndexSorted` only stores the digest portion of
    the multihashes.
    
    In order to implement `ForEach` for this index type correctly uses the
    `uint64` max value as the code in the multihash and document the
    behaviour where the iterations over this index type should not rely on
    the returned code. Note that the max value is used as a code that
    doesn't match any existing multicodec.Code to avoid misleading users.
    masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    fd7281b View commit details
    Browse the repository at this point in the history
  24. Remove support for ForEach enumeration from car-index-sorted

    This index type does not store enough information to satisfy `ForEach`.
    It only contains the digest of mulithashes and not their code. Instead
    of some partial functionality simply return an error when
    `ForEach` is called on this function type. Because, there is no valid
    use for this index type and the user should ber regenerating the index
    to the newer `car-multihash-index-sorted` anyway.
    
    Update tests to include samples of both types and assert IO operations
    and index generation for both formats.
    masih committed Jun 30, 2022
    Configuration menu
    Copy the full SHA
    e6d416c View commit details
    Browse the repository at this point in the history

Commits on Jul 1, 2022

  1. Configuration menu
    Copy the full SHA
    2539ce2 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    708b0a2 View commit details
    Browse the repository at this point in the history
  3. test: add fuzzing for reader#Inspect

    Jorropo authored and rvagg committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    a36603e View commit details
    Browse the repository at this point in the history
  4. Use streaming APIs to verify the hash of blocks in CAR Inspect

    `go-cid` exposes `Sum` API that facilitates calculation of the CID from
    `[]byte` payload. `go-multihash` now exposes `SumStream` which can
    calculate digest from `io.Reader` as well as `[]byte`. But,
    unfortunately the equivalent API does not exist in `go-cid`.
    
    To avoid copying the entire block into memory, implement CID calculation
    using the streaming multihash sum during inspection of CAR payload.
    masih committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    965f1f3 View commit details
    Browse the repository at this point in the history

Commits on Jul 2, 2022

  1. Use consistent CID mismatch error in Inspect and BlockReader.Next

    This reverts the earlier changes to get the message consistent. Note,
    the CID we expect is the one in the CAR payload, not the calculated CID
    for the block.
    masih committed Jul 2, 2022
    Configuration menu
    Copy the full SHA
    a274e75 View commit details
    Browse the repository at this point in the history
  2. Benchmark Reader.Inspect with and without hash validation

    Benchmark the `Reader.Inspect` with and without hash validation using a
    randomly generated CARv2 file of size 10 MiB.
    
    Results from running the benchmark in parallel locally on MacOS
    `Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz` repeated 10 times:
    
    ```
    Reader_InspectWithBlockValidation-8       5.30ms ±48%
    Reader_InspectWithoutBlockValidation-8     231µs ±42%
    
    name                                    speed
    Reader_InspectWithBlockValidation-8     2.08GB/s ±35%
    Reader_InspectWithoutBlockValidation-8  46.8GB/s ±32%
    
    name                                    alloc/op
    Reader_InspectWithBlockValidation-8       10.7MB ± 0%
    Reader_InspectWithoutBlockValidation-8    60.7kB ± 0%
    
    name                                    allocs/op
    Reader_InspectWithBlockValidation-8        4.54k ± 0%
    Reader_InspectWithoutBlockValidation-8     2.29k ± 0%
    ```
    masih committed Jul 2, 2022
    Configuration menu
    Copy the full SHA
    641c0f8 View commit details
    Browse the repository at this point in the history

Commits on Jul 4, 2022

  1. Drop repeated package name from CarStats

    Cosmetic refactor to rename `car.CarStats` to `car.Stats`, which looks
    more fluent when using the API.
    masih committed Jul 4, 2022
    Configuration menu
    Copy the full SHA
    8696a19 View commit details
    Browse the repository at this point in the history
  2. Return error when section length is invalid varint

    Return potential error when reading section error as varint. Add test to
    verify the error is indeed returned.
    
    Use `errors.New` instead of `fmt.Errorf` when no formatting is needed in
    error message.
    masih committed Jul 4, 2022
    Configuration menu
    Copy the full SHA
    bed1297 View commit details
    Browse the repository at this point in the history
  3. Fix fuzz CI job

    masih authored and Jorropo committed Jul 4, 2022
    Configuration menu
    Copy the full SHA
    a41506a View commit details
    Browse the repository at this point in the history

Commits on Jul 5, 2022

  1. Revert changes to index.Index while keeping most of security fixes

    Revert the changes to `index.Index` interface such that it is the same
    as the current go-car main branch head.
    
    Reverting the changes, however, means that unmarshalling untrusted
    indices is indeed dangerous and should not be done on untrusted files.
    
    Note, the `carv2.Reader` APIs are changed to return errors as well as
    readers when getting `DataReader` and `IndexReader`. This is to
    accommodate issues detected by fuzz testing while removing boiler plate
    code in internal IO reader conversion. This is a breaking change to the
    current API but should be straight forward to roll out.
    
    Remove index fuzz tests and change inspection to only read the index
    codec instead of reading the entire index.
    masih authored and Jorropo committed Jul 5, 2022
    Configuration menu
    Copy the full SHA
    a971f7c View commit details
    Browse the repository at this point in the history

Commits on Jul 6, 2022

  1. Revert changes to insertionindex

    Revert changes to serialization of `insertionindex` postponed until the
    streaming index work stream.
    masih committed Jul 6, 2022
    Configuration menu
    Copy the full SHA
    dec4ca1 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    80bb0d5 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    d68cd32 View commit details
    Browse the repository at this point in the history