Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix malformed CAR panics and excessive memory usage #312

Merged
merged 37 commits into from
Jul 6, 2022
Merged

Conversation

masih
Copy link
Member

@masih masih commented Jul 6, 2022

Fix an issue where malformed CARv1 files cause a panic by introducing a maximum header and section size in CARv1 as an option.

Add fuzz tests across the repo

Jorropo and others added 30 commits June 30, 2022 10:44
There is no way I can make a safe implementation of the parser by slurping
thing into memory, indexes people use are just too big.

So I made a new API which force consumers to manage that.
They can choose to use a bytes.Reader, *os.File, mmaped thing, ...
Update index generation tests to assert indices are identical.

Fix minor typo in the test utility name and a bug where the check was
not using both index instances to assert they are identical.
Also refactor the use of lock in favour of wait group for better
readability of the assertion logic.
Previous changes added the `ForEach` interface to `Index` type which
enables iteration through the index by multihash and offset. However,
not all index types contain enough information to construct the full
multihash. Namely, `CarIndexSorted` only stores the digest portion of
the multihashes.

In order to implement `ForEach` for this index type correctly uses the
`uint64` max value as the code in the multihash and document the
behaviour where the iterations over this index type should not rely on
the returned code. Note that the max value is used as a code that
doesn't match any existing multicodec.Code to avoid misleading users.
This index type does not store enough information to satisfy `ForEach`.
It only contains the digest of mulithashes and not their code. Instead
of some partial functionality simply return an error when
`ForEach` is called on this function type. Because, there is no valid
use for this index type and the user should ber regenerating the index
to the newer `car-multihash-index-sorted` anyway.

Update tests to include samples of both types and assert IO operations
and index generation for both formats.
`go-cid` exposes `Sum` API that facilitates calculation of the CID from
`[]byte` payload. `go-multihash` now exposes `SumStream` which can
calculate digest from `io.Reader` as well as `[]byte`. But,
unfortunately the equivalent API does not exist in `go-cid`.

To avoid copying the entire block into memory, implement CID calculation
using the streaming multihash sum during inspection of CAR payload.
This reverts the earlier changes to get the message consistent. Note,
the CID we expect is the one in the CAR payload, not the calculated CID
for the block.
Benchmark the `Reader.Inspect` with and without hash validation using a
randomly generated CARv2 file of size 10 MiB.

Results from running the benchmark in parallel locally on MacOS
`Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz` repeated 10 times:

```
Reader_InspectWithBlockValidation-8       5.30ms ±48%
Reader_InspectWithoutBlockValidation-8     231µs ±42%

name                                    speed
Reader_InspectWithBlockValidation-8     2.08GB/s ±35%
Reader_InspectWithoutBlockValidation-8  46.8GB/s ±32%

name                                    alloc/op
Reader_InspectWithBlockValidation-8       10.7MB ± 0%
Reader_InspectWithoutBlockValidation-8    60.7kB ± 0%

name                                    allocs/op
Reader_InspectWithBlockValidation-8        4.54k ± 0%
Reader_InspectWithoutBlockValidation-8     2.29k ± 0%
```
masih and others added 7 commits July 4, 2022 09:16
Cosmetic refactor to rename `car.CarStats` to `car.Stats`, which looks
more fluent when using the API.
Return potential error when reading section error as varint. Add test to
verify the error is indeed returned.

Use `errors.New` instead of `fmt.Errorf` when no formatting is needed in
error message.
Revert the changes to `index.Index` interface such that it is the same
as the current go-car main branch head.

Reverting the changes, however, means that unmarshalling untrusted
indices is indeed dangerous and should not be done on untrusted files.

Note, the `carv2.Reader` APIs are changed to return errors as well as
readers when getting `DataReader` and `IndexReader`. This is to
accommodate issues detected by fuzz testing while removing boiler plate
code in internal IO reader conversion. This is a breaking change to the
current API but should be straight forward to roll out.

Remove index fuzz tests and change inspection to only read the index
codec instead of reading the entire index.
Revert changes to serialization of `insertionindex` postponed until the
streaming index work stream.
Copy link
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🤞

@github-actions
Copy link
Contributor

github-actions bot commented Jul 6, 2022

Suggested version: v0.4.0
Comparing to: v0.3.3 (diff)

Changes in go.mod file(s):

diff --git a/go.mod b/go.mod
index f63811b..1d50e22 100644
--- a/go.mod
+++ b/go.mod
@@ -2,16 +2,58 @@ module github.com/ipld/go-car
 
 require (
 	github.com/ipfs/go-block-format v0.0.3
-	github.com/ipfs/go-cid v0.0.7
-	github.com/ipfs/go-datastore v0.5.1 // indirect
-	github.com/ipfs/go-ipfs-blockstore v1.1.2 // indirect
+	github.com/ipfs/go-cid v0.1.0
 	github.com/ipfs/go-ipld-cbor v0.0.5
 	github.com/ipfs/go-ipld-format v0.2.0
 	github.com/ipfs/go-merkledag v0.5.1
-	github.com/ipld/go-codec-dagpb v1.3.0
-	github.com/ipld/go-ipld-prime v0.14.3-0.20211207234443-319145880958
+	github.com/ipld/go-codec-dagpb v1.3.1
+	github.com/ipld/go-ipld-prime v0.16.0
 	github.com/multiformats/go-multihash v0.1.0
 	github.com/stretchr/testify v1.7.0
 )
 
-go 1.16
+require (
+	github.com/davecgh/go-spew v1.1.1 // indirect
+	github.com/gogo/protobuf v1.3.2 // indirect
+	github.com/google/uuid v1.2.0 // indirect
+	github.com/hashicorp/golang-lru v0.5.4 // indirect
+	github.com/ipfs/bbloom v0.0.4 // indirect
+	github.com/ipfs/go-blockservice v0.2.1 // indirect
+	github.com/ipfs/go-datastore v0.5.1 // indirect
+	github.com/ipfs/go-ipfs-blockstore v1.1.2 // indirect
+	github.com/ipfs/go-ipfs-ds-help v1.1.0 // indirect
+	github.com/ipfs/go-ipfs-exchange-interface v0.1.0 // indirect
+	github.com/ipfs/go-ipfs-exchange-offline v0.1.1 // indirect
+	github.com/ipfs/go-ipfs-util v0.0.2 // indirect
+	github.com/ipfs/go-ipld-legacy v0.1.0 // indirect
+	github.com/ipfs/go-log v1.0.5 // indirect
+	github.com/ipfs/go-log/v2 v2.3.0 // indirect
+	github.com/ipfs/go-metrics-interface v0.0.1 // indirect
+	github.com/ipfs/go-verifcid v0.0.1 // indirect
+	github.com/jbenet/goprocess v0.1.4 // indirect
+	github.com/klauspost/cpuid/v2 v2.0.9 // indirect
+	github.com/mattn/go-isatty v0.0.13 // indirect
+	github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 // indirect
+	github.com/minio/sha256-simd v1.0.0 // indirect
+	github.com/mr-tron/base58 v1.2.0 // indirect
+	github.com/multiformats/go-base32 v0.0.3 // indirect
+	github.com/multiformats/go-base36 v0.1.0 // indirect
+	github.com/multiformats/go-multibase v0.0.3 // indirect
+	github.com/multiformats/go-varint v0.0.6 // indirect
+	github.com/opentracing/opentracing-go v1.2.0 // indirect
+	github.com/pmezard/go-difflib v1.0.0 // indirect
+	github.com/polydawn/refmt v0.0.0-20201211092308-30ac6d18308e // indirect
+	github.com/spaolacci/murmur3 v1.1.0 // indirect
+	github.com/whyrusleeping/cbor-gen v0.0.0-20200123233031-1cdf64d27158 // indirect
+	go.uber.org/atomic v1.7.0 // indirect
+	go.uber.org/multierr v1.6.0 // indirect
+	go.uber.org/zap v1.16.0 // indirect
+	golang.org/x/crypto v0.0.0-20210506145944-38f3c27a63bf // indirect
+	golang.org/x/sys v0.0.0-20210426080607-c94f62235c83 // indirect
+	golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
+	google.golang.org/protobuf v1.27.1 // indirect
+	gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c // indirect
+	lukechampine.com/blake3 v1.1.6 // indirect
+)
+
+go 1.17diff --git a/cmd/go.mod b/cmd/go.mod
index 0d933ec..a7b9580 100644
--- a/cmd/go.mod
+++ b/cmd/go.mod
@@ -1,20 +1,74 @@
 module github.com/ipld/go-car/cmd
 
-go 1.16
+go 1.17
 
 require (
 	github.com/dustin/go-humanize v1.0.0
 	github.com/ipfs/go-block-format v0.0.3
 	github.com/ipfs/go-cid v0.1.0
-	github.com/ipfs/go-ipfs-blockstore v1.0.3
-	github.com/ipfs/go-unixfsnode v1.1.4-0.20211105121048-b9b6e9dc571e
-	github.com/ipld/go-car v0.3.2-0.20211001222544-c93f5367a75c
-	github.com/ipld/go-car/v2 v2.1.0
-	github.com/ipld/go-codec-dagpb v1.3.0
-	github.com/ipld/go-ipld-prime v0.12.4-0.20211014180653-3ba656a3bc6b
-	github.com/multiformats/go-multicodec v0.3.1-0.20210902112759-1539a079fd61
-	github.com/multiformats/go-multihash v0.0.17-0.20211012170226-654b06d7912a
+	github.com/ipfs/go-ipfs-blockstore v1.1.2
+	github.com/ipfs/go-unixfsnode v1.4.0
+	github.com/ipld/go-car v0.3.3
+	github.com/ipld/go-car/v2 v2.1.2-0.20220121091436-7e10f104f452
+	github.com/ipld/go-codec-dagpb v1.3.1
+	github.com/ipld/go-ipld-prime v0.16.1-0.20220324131737-6e219a02ca16
+	github.com/multiformats/go-multicodec v0.4.1
+	github.com/multiformats/go-multihash v0.1.0
 	github.com/multiformats/go-varint v0.0.6
-	github.com/rogpeppe/go-internal v1.8.1-0.20210923151022-86f73c517451
+	github.com/rogpeppe/go-internal v1.8.1
 	github.com/urfave/cli/v2 v2.3.0
 )
+
+require (
+	github.com/Stebalien/go-bitfield v0.0.1 // indirect
+	github.com/cpuguy83/go-md2man/v2 v2.0.0-20190314233015-f79a8a8ca69d // indirect
+	github.com/gogo/protobuf v1.3.2 // indirect
+	github.com/google/uuid v1.2.0 // indirect
+	github.com/hashicorp/golang-lru v0.5.4 // indirect
+	github.com/ipfs/bbloom v0.0.4 // indirect
+	github.com/ipfs/go-bitfield v1.0.0 // indirect
+	github.com/ipfs/go-blockservice v0.2.1 // indirect
+	github.com/ipfs/go-datastore v0.5.1 // indirect
+	github.com/ipfs/go-ipfs-chunker v0.0.1 // indirect
+	github.com/ipfs/go-ipfs-ds-help v1.1.0 // indirect
+	github.com/ipfs/go-ipfs-exchange-interface v0.1.0 // indirect
+	github.com/ipfs/go-ipfs-util v0.0.2 // indirect
+	github.com/ipfs/go-ipld-cbor v0.0.5 // indirect
+	github.com/ipfs/go-ipld-format v0.2.0 // indirect
+	github.com/ipfs/go-ipld-legacy v0.1.0 // indirect
+	github.com/ipfs/go-log v1.0.5 // indirect
+	github.com/ipfs/go-log/v2 v2.3.0 // indirect
+	github.com/ipfs/go-merkledag v0.5.1 // indirect
+	github.com/ipfs/go-metrics-interface v0.0.1 // indirect
+	github.com/ipfs/go-verifcid v0.0.1 // indirect
+	github.com/jbenet/goprocess v0.1.4 // indirect
+	github.com/klauspost/cpuid/v2 v2.0.9 // indirect
+	github.com/libp2p/go-buffer-pool v0.0.2 // indirect
+	github.com/mattn/go-isatty v0.0.13 // indirect
+	github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 // indirect
+	github.com/minio/sha256-simd v1.0.0 // indirect
+	github.com/mr-tron/base58 v1.2.0 // indirect
+	github.com/multiformats/go-base32 v0.0.3 // indirect
+	github.com/multiformats/go-base36 v0.1.0 // indirect
+	github.com/multiformats/go-multibase v0.0.3 // indirect
+	github.com/opentracing/opentracing-go v1.2.0 // indirect
+	github.com/petar/GoLLRB v0.0.0-20210522233825-ae3b015fd3e9 // indirect
+	github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e // indirect
+	github.com/polydawn/refmt v0.0.0-20201211092308-30ac6d18308e // indirect
+	github.com/russross/blackfriday/v2 v2.0.1 // indirect
+	github.com/shurcooL/sanitized_anchor_name v1.0.0 // indirect
+	github.com/spaolacci/murmur3 v1.1.0 // indirect
+	github.com/whyrusleeping/cbor v0.0.0-20171005072247-63513f603b11 // indirect
+	github.com/whyrusleeping/cbor-gen v0.0.0-20200123233031-1cdf64d27158 // indirect
+	github.com/whyrusleeping/chunker v0.0.0-20181014151217-fe64bd25879f // indirect
+	go.uber.org/atomic v1.7.0 // indirect
+	go.uber.org/multierr v1.7.0 // indirect
+	go.uber.org/zap v1.16.0 // indirect
+	golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97 // indirect
+	golang.org/x/exp v0.0.0-20210615023648-acb5c1269671 // indirect
+	golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c // indirect
+	golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
+	google.golang.org/protobuf v1.27.1 // indirect
+	gopkg.in/errgo.v2 v2.1.0 // indirect
+	lukechampine.com/blake3 v1.1.6 // indirect
+)
diff --git a/v2/go.mod b/v2/go.mod
index e3b2043..b2686a7 100644
--- a/v2/go.mod
+++ b/v2/go.mod
@@ -1,24 +1,70 @@
 module github.com/ipld/go-car/v2
 
-go 1.16
+go 1.17
 
 require (
 	github.com/ipfs/go-block-format v0.0.3
 	github.com/ipfs/go-cid v0.1.0
-	github.com/ipfs/go-ipfs-blockstore v1.1.2
+	github.com/ipfs/go-ipfs-blockstore v1.2.0
 	github.com/ipfs/go-ipld-cbor v0.0.5
-	github.com/ipfs/go-ipld-format v0.2.0
-	github.com/ipfs/go-merkledag v0.5.1
-	github.com/ipld/go-codec-dagpb v1.3.0
-	github.com/ipld/go-ipld-prime v0.14.0
+	github.com/ipfs/go-ipld-format v0.3.0
+	github.com/ipfs/go-merkledag v0.6.0
+	github.com/ipfs/go-unixfsnode v1.4.0
+	github.com/ipld/go-codec-dagpb v1.3.1
+	github.com/ipld/go-ipld-prime v0.16.0
 	github.com/ipld/go-ipld-prime/storage/bsadapter v0.0.0-20211210234204-ce2a1c70cd73
-	github.com/multiformats/go-multicodec v0.3.1-0.20210902112759-1539a079fd61
+	github.com/multiformats/go-multicodec v0.5.0
 	github.com/multiformats/go-multihash v0.1.0
 	github.com/multiformats/go-varint v0.0.6
 	github.com/petar/GoLLRB v0.0.0-20210522233825-ae3b015fd3e9
 	github.com/stretchr/testify v1.7.0
 	github.com/whyrusleeping/cbor v0.0.0-20171005072247-63513f603b11
-	golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97 // indirect
 	golang.org/x/exp v0.0.0-20210615023648-acb5c1269671
+)
+
+require (
+	github.com/Stebalien/go-bitfield v0.0.1 // indirect
+	github.com/davecgh/go-spew v1.1.1 // indirect
+	github.com/gogo/protobuf v1.3.2 // indirect
+	github.com/google/uuid v1.2.0 // indirect
+	github.com/hashicorp/golang-lru v0.5.4 // indirect
+	github.com/ipfs/bbloom v0.0.4 // indirect
+	github.com/ipfs/go-bitfield v1.0.0 // indirect
+	github.com/ipfs/go-blockservice v0.3.0 // indirect
+	github.com/ipfs/go-datastore v0.5.0 // indirect
+	github.com/ipfs/go-ipfs-chunker v0.0.1 // indirect
+	github.com/ipfs/go-ipfs-ds-help v1.1.0 // indirect
+	github.com/ipfs/go-ipfs-exchange-interface v0.1.0 // indirect
+	github.com/ipfs/go-ipfs-exchange-offline v0.2.0 // indirect
+	github.com/ipfs/go-ipfs-util v0.0.2 // indirect
+	github.com/ipfs/go-ipld-legacy v0.1.0 // indirect
+	github.com/ipfs/go-log v1.0.5 // indirect
+	github.com/ipfs/go-log/v2 v2.3.0 // indirect
+	github.com/ipfs/go-metrics-interface v0.0.1 // indirect
+	github.com/ipfs/go-verifcid v0.0.1 // indirect
+	github.com/jbenet/goprocess v0.1.4 // indirect
+	github.com/klauspost/cpuid/v2 v2.0.9 // indirect
+	github.com/libp2p/go-buffer-pool v0.0.2 // indirect
+	github.com/mattn/go-isatty v0.0.13 // indirect
+	github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 // indirect
+	github.com/minio/sha256-simd v1.0.0 // indirect
+	github.com/mr-tron/base58 v1.2.0 // indirect
+	github.com/multiformats/go-base32 v0.0.3 // indirect
+	github.com/multiformats/go-base36 v0.1.0 // indirect
+	github.com/multiformats/go-multibase v0.0.3 // indirect
+	github.com/opentracing/opentracing-go v1.2.0 // indirect
+	github.com/pmezard/go-difflib v1.0.0 // indirect
+	github.com/polydawn/refmt v0.0.0-20201211092308-30ac6d18308e // indirect
+	github.com/spaolacci/murmur3 v1.1.0 // indirect
+	github.com/whyrusleeping/cbor-gen v0.0.0-20200123233031-1cdf64d27158 // indirect
+	github.com/whyrusleeping/chunker v0.0.0-20181014151217-fe64bd25879f // indirect
+	go.uber.org/atomic v1.7.0 // indirect
+	go.uber.org/multierr v1.7.0 // indirect
+	go.uber.org/zap v1.16.0 // indirect
+	golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97 // indirect
 	golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c // indirect
+	golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
+	google.golang.org/protobuf v1.27.1 // indirect
+	gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
+	lukechampine.com/blake3 v1.1.6 // indirect
 )

gorelease says:

# github.com/ipld/go-car/util
## compatible changes
MaxAllowedSectionSize: added

# diagnostics
required module github.com/microcosm-cc/bluemonday@v1.0.1 retracted by module author: Retract older versions as only latest is to be depended upon

# summary
Suggested version: v0.4.0

gocompat says:

(empty)

@masih masih merged commit eac6157 into master Jul 6, 2022
@masih masih deleted the security-fixes branch July 6, 2022 09:21
@willscott
Copy link
Member

retroactive 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants