car: CARv2 spec #248

rvagg · 2020-03-10T03:59:42Z

WIP for discussion.

Highlights:

The v1 "header" block becomes a "version" block in v2, a CBOR encoded {version:2}—we can exact match the first 11 bytes to determine if it's a v2 (0x0aa16776657273696f6e02—essentially our CARv2 file magic number), otherwise fallback to decoding as per v1. CARv1 code should read this block and bork on version:2.
New "header" block is the first block after the version block, properly CID-prefixed, contains:
- a deterministic flag
- an array of CID + selector pairs

Deterministic CARs have to exhaustively describe the contained DAGs via the roots + selectors (the selectors can describe boundaries of partial DAGs but they must do so exhaustively). Therefore the CID of the header block for a deterministic CAR should uniquely identify that graphs it contains.

Non-deterministic CARs are lax, anywhere from the deterministic exhaustiveness to just-a-bundle-of-blocks with no roots. Where roots are present, selectors are optional.

A CARv1 would be read as a non-deterministic CARv2 with absent selectors.

vmx · 2020-03-10T12:22:08Z

I had a quick read and not really something to add as I'm not that deep into CAR files. Though it's sad to see that we can't do a [ varint | CID | dag-cbor block ] * that also includes the header. due to not being able to distinguish Version 1.

mikeal · 2020-03-10T19:53:36Z

This is a good way to approach the initial discussion, but I think that the next version of the spec should probably be either a separate file or a separate sub-section so that we can preserve the original spec. (I read the diff wrong, reading the rendered new file is much clearer)

mikeal · 2020-03-10T19:56:20Z

Do we want to try adding the index as a trailer?

warpfork · 2020-03-11T12:03:07Z

block-layer/content-addressable-archives.md

+ * A compressed integer
+ * A "version" block of byte-length described by the preceding compressed integer
+ * A series of CID-prefixed blocks:
+   * A compressed integer prefixing each block indicating the total byte-length of that block, including the length of the encoded CID


I wonder if it would be useful to have an int for the CID length and an int for the content data length, separately.

Figuring out how long a CID is... is a surprisingly nontrivial operation. https://github.com/ipfs/go-cid/pull/95/files recently added support for this, but involves at least three different varint parses (spread across more than just that one repo!) -- and in the meanwhile, you're passing around an overly large byte slice. (How much too large? I dunno. I guess you just take some bytes and wing it...?)

So, considering all that... yeah: I'd advocate for an int to state the CID size separately. While technically, sure, it's "defined" to just put the CID at the front of the content bytes and just report the total size... a separate size marker for CID and data separately seems advantageous: it's simpler, it's more antifragile, and it might even result in more efficient processing since it'll let code spend more time operating with known amounts of bytes.

@warpfork At that stage isn't it simpler to just have proper CBOR:

type DataFrame struct { Cid bytes Data bytes }

I haven’t seen a CBOR parser decode this as fast as we could parse the int based encoding so I’m worried about moving in that direction.

I do like the idea of separating the int lengths. Right now you’ve basically gotta parse the entire CID in order to work through the file. If you have a specific set of CID’s you’re scanning for you could do much faster binary comparison as you scan through the file and avoid full CID parsing.

Thinking this through a bit more combining with #248 (comment) :

Having v2 being a uniform stream of {cidlen}{cid}{payloadlen}{payload} where {cid} ( position 2 ) has a special codec signifying car header as opposed to just dag-cbor would solve all your symmetry issues and be super-fast to check for well-formedness...

ribasushi

Overall decent direction, I like it. Left a number of comments, mostly not being able to deduce a specifc answer from the spec alone ( which I suppose merits fixing... )

A meta-thought I want to surface: the spec is just that - a spec. If we cite limitations in the spec, and then do not implement any of them in the actual parsers, the limitations are nul and void. That is to say - I would be wary labeling too many things as invalid, because for each such case you need to implement the corresponding err("NOPE!") in the corresponding libraries...

If something is icky but harmless - let's just not mention it and move on.

ribasushi · 2020-03-11T13:10:49Z

block-layer/content-addressable-archives.md

 * **Individual block reads**: as the CAR format contains no index information, reads require either a partial scan to discover the location of a required block or an external index must be maintained and referenced for a seek and partial read of that data. See below regarding indexing.
 * **DAG traversal**: without an external index, traversal of a DAG specified by a "root" CID is not possible without dumping all blocks into a more convenient data store or by partial scans to find each block as required, which will likely be too inefficient to be practical.
 * **Modification**: CARs may be appended after initial write as there is no constraint in the Header regarding total length. Care must be taken in appending if a CAR is intended to contain coherent DAG data.

 ### Security and Verifiability

-The roots specified by the Header of a CAR must appear somewhere in its Data section, however there is no requirement that the roots define entire DAGs, nor that all blocks in a CAR must be part of DAGs described by the root CIDs in the Header. Therefore, the roots must not be used alone to determine or differentiate the contents of a CAR.
+A deterministic CAR should be uniquely describable by the CID of its Header block. Two CARs with the same Header CID with a `deterministic` value of `true` should be comprised of identical bytes when formatted according to this specification.


What does this mean in practice? Depth-first, output block on "first-seen"? Or "it depends"? If the latter: we need a more nuanced deterministic marker aside from "true/false"...

We only need to cover these details as they relate to the header. If the header is a collection of [root, selector] pairings, then the deterministic ordering is actually defined by the selector spec.

ribasushi · 2020-03-11T13:16:27Z

block-layer/content-addressable-archives.md

+ * A compressed integer
+ * A "version" block of byte-length described by the preceding compressed integer
+ * A series of CID-prefixed blocks:
+   * A compressed integer prefixing each block indicating the total byte-length of that block, including the length of the encoded CID


@warpfork At that stage isn't it simpler to just have proper CBOR:

type DataFrame struct { Cid bytes Data bytes }

ribasushi · 2020-03-11T13:19:03Z

block-layer/content-addressable-archives.md

+[ varint | cbor block ] [ varint | CID | dag-cbor block ] [ varint | CID | block ] [ varint | CID | block ] …
+```
+
+To maintain backward compatibility, Version 2 retains the lead-in block without CID but _only_ includes a CBOR encoded map containing a single key/value pair: `{ version: 2 }`. Therefore, a decoder may read the first 11 bytes of a CAR and match it against the bytes: `0x0aa16776657273696f6e02` (`0x0a` is a varint representing `10`, followed by 10 bytes of the CBOR encoded form of `{ version: 2 }`).


The need for a leading Version seems like too much magic... Isn't a try-parse-as-v2 or try-parse-as-v1 sufficient here? This also addresses @vmx's ick-factor

I think this was actually my idea so I’ll defend it.

One way we could maintain backwards compatibility is to keep the old v1 style header but only ever encode it as { version: 2 }. To the old v1 parser this looks like a CAR file with no roots. For a v2 parser you don’t need to even parse it, you can check if it’s a v2 CAR file by doing a simple binary comparison.

Step a little further back though: if we accept that v1 files "in the wild" all start with 0x0aa16776657273696f6e02, then the only concern you have is "what will a v1 ( old) parser do when it encounters a ver != 1 file. We want it to croak. Having v2 being a uniform stream of {cidlen}{cid}{payloadlen}{payload} should not possibly be decodeable as valid cbor, hence "problem solved"? Or you are concerned about the possibility of crafting a valid v2 file that "means something" to a v1 parser...?

We want it to croak

There are many different forms of “croak” though 😉

Being unable to even read the header is likely to cause an exception. Reading the header, seeing that the version is beyond the available range, and not having any root information to use in order to parse the file, is a much less aggressive form of croaking. It’s likely to cause the CAR file to simply be ignored rather than causing a noticeable error.

Now, this may not be the desired behavior. We may want a stronger form of croaking, we may want to see clear exceptions, but I’m not sure we do and I’m a little worried about code in the Filecoin ecosystem encountering new CAR files in the future and what the failure mode looks like.

Being unable to even read the header is likely to cause an exception.

Only in clients linked against a sufficiently old go-car/js-car not knowing how to read v != 1

Or are you saying that at this point there is likely a significant number of implementations out there that wrote their own car handlers not using any of our libraries?

I have a high degree of certainty that, by the time we ship this, there will be car handlers we didn’t write in use. For one thing, we don’t have a Rust CAR implementation and there’s Rust stuff happening in the ecosystem.

How well they are written and if they can handle something like this, I have no idea.

I see...
So perhaps we should pivot and ship v2 with only the following changes:

make versioning forwards compatible

potentially fix the cid-length specification / symmetry of data-frames questions

And leave everything else for v3?

I'm not sure we're willing to drop selectors from scope, are we? We already have Go code in go-car that's passing in a Root+Selector (https://github.com/ipld/go-car/blob/master/selectivecar.go#L21-L24) and presumably corresponding code in go-fil-markets that's working this way, so the data is there and we could get ontop of this pretty quick.

ribasushi · 2020-03-11T13:48:17Z

block-layer/content-addressable-archives.md


 ### Data

-Immediately following the Header block, **one or more** IPLD blocks are concatenated to form the _Data_ section of the CAR format. _(Caveat: see [Zero blocks](#zero-blocks) under Unresolved Issues.)_ Each block is encoded into a _Section_ by the concatenation of the following values:
+Immediately following the Header block, **one or more** IPLD blocks are concatenated to form the _Data_ section of the CAR format. It is invalid for a CAR to contain zero blocks after the Header. Each block is encoded into a _Section_ by the concatenation of the following values:


This seems a bit wasteful in the case of non-deterministic files. I have a use-case where multiple car files are given and the second is just a "pinset" ( roots have already been seen in a previous car ). Something like:
( almost done writing up, should have it working correctly by early Thursday, handwavey CLI args ) :
ipfs dag import $( ipld-dagger --stdout=car-fifos ) where the inner invocation returns 2 named pipes, the first being an unordered car stream, the second being just the roots to pin, together being governed by one "suspended GC / pin-pass transaction".

Yes, it doesn't cost too much to just keep around the block-contents of each root, and re-emit them at the end again, but what would be the point? The second car is still incomplete either way...

Additionally keep in mind that car files will be used for patching in the future across the ecosystem, and it is conceivable that a "rollback" will contain no data at all, just the "previous root".

For this use case, I think we should have a header that doesn’t specify any roots, and we should figure out who/what this might break and how to mitigate.

I know that there are parts of the Filecoin ecosystem that assume there is always a single root and will break without it, so one tradeoff of this might be that those CAR files won’t work at all in Filecoin.

there are parts of the Filecoin ecosystem

My understanding was that the FIL ecosystem only deals in "deterministic cars" and anything else is virtually unsupported. Whereas .car as a concept within ipfs & friends ecosystems is way less strict, and is essentially a "better block put interface". Is my approach to this wrong...?

That’s not entirely accurate.

Filecoin clients that use the regular deal flow only use deterministic car files. The offline dealflow, or really any deal that is added to the network without our online dealflow code, supports any CAR file. The retrieval market does not require deterministic CAR files in order to read out data. It does require that there is a single root though.

I was also told that there are other actors in the Filecoin ecosystem reading CAR files that also assume there is always a single root and that multiple roots, in addition to no roots, would be problematic. In this case, I’m talking about developers outside of Protocol Labs building applications and tooling on top of the network.

ribasushi · 2020-03-11T13:55:41Z

block-layer/content-addressable-archives.md


 ### Performance

 Some considerations regarding performance:

-* **Streaming**: the CAR format is ideal for dumping blocks via streaming reads as the Header can be loaded first and minimal state is required for ongoing parsing.
+* **Streaming**: the CAR format is ideal for dumping blocks via streaming reads as the Header can be loaded first and minimal state is required for ongoing parsing. Describing partial graphs with a deterministic CAR may be more difficult as a streaming operation as the selector boundaries may not be known ahead of time.


This seems mis-matched dumping ( implies export from a stateful application into a static .car ) vs streaming reads ( implies reading a static .car ). Also I am still fuzzy on how verification can be guaranteed on a non-exhaustive selector... Anyway: perhaps:

Suggested change

* **Streaming**: the CAR format is ideal for dumping blocks via streaming reads as the Header can be loaded first and minimal state is required for ongoing parsing. Describing partial graphs with a deterministic CAR may be more difficult as a streaming operation as the selector boundaries may not be known ahead of time.

* **Streaming**: the **deterministic** CAR format is ideal for ingesting blocks via streaming reads as the Header can be loaded first and minimal state is required for ongoing parsing. Describing partial graphs with a deterministic CAR may be more difficult as a streaming operation as the selector boundaries may not be known ahead of time.

ribasushi · 2020-03-11T13:58:29Z

block-layer/content-addressable-archives.md


-Deterministic CAR creation is not covered by this specification. However, deterministic generation of a CAR from a given graph is possible and is relied upon by certain uses of the format, most notably, [Filecoin](https://filecoin-project.github.io/specs).
+Where a `deterministic` value of `true` is present in a CAR, the header is able to describe the entirety of the graph represented by the trailing blocks in the _Data_ section. Such a CAR may be reproduced, byte-for-byte, from a block source using the same roots and selectors.


This is insufficiently precise. If I give you a selector saying "root + next level of nodes only" - the car files will not have any leaves, thus you do not describe the entire graph, nor can you fully cryptographically verify such a .car file offline... Or perhaps I am confusing the level of determinism we spoke about on Monday?

"entirety of the graph" is probably in the wrong here

able to describe the entirety of a graph from the roots, limited by the selectors

I don't think this makes any claims about being able to verify it offline. You could do that if you had a * selector but I don't know if there's any way to do that with more advanced selectors. Maybe I should just add text to that effect:

The completeness of such a graph may not be able to be verified without the source graph in the case of a restrictive selector

mikeal · 2020-03-11T17:49:22Z

block-layer/content-addressable-archives.md

+Described as an IPLD Schema:
+
+```ipldsch
+type CarRoot struct {


One thing I want to explore is making the entire CAR header block a completely valid Selector (or perhaps we need to define some notion of a “multi-selector”).

It would be pretty amazing to be able to take a selector block, drop it into a selector engine, and spit out a CAR file with a CID for the header block that matches the selector you passed.

This may not be worth it. For instance, the deterministic boolean would become problematic as it’s unlikely to be there if you just create a selector block.

that would require a selector to ship with a CID embedded in it wouldn't it? I don't think we have that kind of structure in selectors as they are today (afaik!)

rvagg · 2020-03-20T00:54:00Z

@ribasushi

At that stage isn't it simpler to just have proper CBOR:
type DataFrame struct {
  Cid bytes
  Data bytes
}

This is an interesting proposal. There's two paths here -- cbor or dag-cbor. In both cases you you'd end up replacing varints with CBOR's ints describing lengths, prefixed with type data. If we used plain cbor we'd need to specify a smallest-representation rule (as per dag-cbor #236).

For plain cbor you'd only add an extra byte for an average CID (more for long hashes) over using a varint. For dag-cbor you'd get an additional 3 bytes to properly represent it (tag, plus length, plus weird 0x0 lead byte). The data block would depend on how much data there is but I'd expect that you'd only be adding 1 or possibly 2 bytes for average block sizes.

A benefit is that you simplify the decode for new languages. They just need a cbor parser. Another benefit is that we can lean on our existing materials (like schemas) making this more part of the IPLD family in a doc & spec sense rather than such a special snowflake.

The complication for efficient implementations is that you'd want to do this streaming and partial so we'd end up rewriting the basics of major type 2 cbor parsing, which, tbh, isn't too much beyond what we do now, but at least we have easy varint libraries to lean on.

Edit: will also require a type lead-in, list or map, list being the most efficient, so representation tuple but that's an extra byte at the front of each "frame".

warpfork · 2021-02-22T18:17:02Z

Relatedly: there's now also work on a new format called "dar", which I think had a lot of the same concerns as discussed here: https://github.com/anorth/go-dar

mikeal · 2021-02-22T19:24:01Z

I’m still considering all the details, but my initial reaction to DAR is that it’s a better approach. In trying to iterate on the existing CAR format we don’t have a lot of opportunities for efficiency gains.

DAR seems to do a nice job of introducing constraints we don’t have in CAR that then provide space savings and assurances while also adding in a few useful features.

Keeping in mind that we already have the existing CAR format, so use cases that cannot live within the constraints of DAR will still have a format they can rely, I think we may want to put off iterating on CAR in favor of DAR or something similar.

mikeal · 2021-02-22T19:27:59Z

I’ll also add that, given how much we’ve already adopted the existing CAR format w/ recommended constraints (single root only) the utility of an iteration that ads the complexity of detecting and potentially supporting two versions of the CAR format may be worse than just adding an entirely new format. It don’t think we gain anything by having two version of CAR instead of two separate but complimentary formats.

warpfork · 2021-04-11T12:15:14Z

block-layer/content-addressable-archives.md

-All such considerations are deferred to the user of the CAR format and should be documented there as this specification does not inherently support determinism.
+Duplicate blocks are strictly not valid in a deterministic CAR. The first instance of a block discovered by any root + selector combination will be present, all repeated occurances of the same block encountered during further graph traversal are omitted.
+
+### Additional Considerations


I think this may have meant to be an h2?

(I like using the two-line syntax variant for these because it's much more visually distinctive and easier to catch myself in typos like this, fwiw 🙃 )

warpfork · 2021-04-11T12:27:17Z

What can we do with this PR?

My concerns and goals today would be:

get the open PR count in this repo down
document whatever we learned -- by whatever means necessary
bonus/stretch goal: slowly anneal towards understanding what's up in the discussion of {car} vs {carv2} vs {carbs} vs {dar}.
- (figuring out how to land this PR should probably not get hung up on this though; bigger topic / multiple topics.)

I'm not actually sure how to wrangle this one, at a glance. First of all -- are we still actively working on it? (It's not that stale; I'm just on a close-a-thon today.)

Here are some options for moving forward I can think of (but maybe there are also other better ideas? I'm receptive):

Could we extract this content into an exploration report, and merge it thusly?
- (always an options, but might not be fun in this case: there's a good amount of content here, and it's intermixed with other existing content, so this suggests a lot of rewriting.)
Should we merge it almost entirely as-is, but maybe add some disclaimers about how widely implemented the v2 is as of today?
- (I'm not actually sure how widely implemented the v2 is! So someone else's input here will be critical :))
Is this a candidate for the "merge-ignore" change management strategy -- where we "merge" the commits (so they're referencable in git history) but don't actually apply the diffs to the content as it appears in master?
- (this is a very non-committal approach. Doesn't mean we can't resume the work later; if we do, we just cherry-pick the content and keep going. It just results in fewer open branches in the meanwhile.)
... or is this actually still actively in flight, and I should hush myself on pushing for merge?

rvagg · 2021-04-21T07:03:06Z

Not being actively worked on, should probably not be merged, this really is more like an exploration report than anything else. Focus has moved on to ways to augment CARv1 (CARBS, etc.) and entirely different approaches to a CARv2 (i.e. DAR). Nobody is actively exploring improving the existing format in-place and I don't see work on the horizon. Although this PR and discussion does contain some potentially useful tidbits for any future work to learn from.

rvagg · 2021-05-06T00:35:34Z

I just wrote this elsewhere and wanted to record it somewhere more permanent, it clarifies the versioning statements in OP:

The problem we have with iterating on versions and doing any kind of successful differentiation from CARv1 is the instability of the leading bytes. The header in CARv1 is: [varint][DAG-CBOR({roots:[CID,...],version:1})]. And that initial varint is not reliable in any way except that it's a varint. The DAG-CBOR bit following it is a bit more reliable because it starts with something like: MAP(2){STRING(5,"roots"),ARRAY(... and it breaks down from there. So that gives you some form of magic bytes that you'll most likely see, 0xa265726f6f7473 but that's after the varint.
The risk that this presents is that if you want to have clear differentiation then your leading magic bytes need to be clearly distinguishable from anything that suggests CARv1.

In this PR I suggested just making the magic bytes 0x0aa16776657273696f6e02 because it's [varint(10)][DAG-CBOR({version:2})] and the existing CAR readers can read that and bork at "oh, this is version 2, I don't know how to deal with this (it also has missing roots, but we could add in an empty roots array to be safer for decoders that assume it's there). Then version detection is much more straightforward, you could read the first 11 bytes and very safely determine whether you have a v2, then fall back to v1 parsing. What's more, there's a straightforward path to v3, it's just 0x0aa16776657273696f6e03, so just that 11th byte is different, nice and stable and CARv1 decoders will error meaningfully ("Don't know how to decode version 3").

0x0aa16776657273696f6e02 is first 0x0a which is just uint(10) because varints this small are just ints. Then followed by 10 bytes of CBOR:

a1                                                # map(1)
  67                                              #   string(7)
    76657273696f6e                                #     "version"
  02                                              #   uint(2)

Of course we could just abandon version sniffing from leading-bytes, but that's not very multiformats is it!

/cc @raulk

Define the magic CAR v2 prefix proposed in ipld/specs#248 and assert that it remains a valid CAR v1 header, along with other tests that verify the expected length, content and version number. Define basic structs to represent CAR v2 header, along with placeholder for the future "characteristics" bitfield, and the optional padding between CAR v1 dump and index added to the end of a CAR v2 archive. Relates to: - ipld/specs#248 (comment)

raulk · 2021-06-14T10:50:12Z

@rvagg Makes sense to me. Question:

Is it possible for CARv2 and subsequent versions to extend that initial header structure for additional purposes, as long as they preserve the version field. Or are we constraining this initial header and convicting it to ever only being used as a version discriminator, leaving it up to the version itself to specify the header?
- e.g. CARv2 introduces the concept of "characteristics" that express traits about the CAR and the DAG, e.g. whether an index follows, whether the DAG is order depth-first, whether it's deduplicated, etc.

willscott · 2021-06-14T12:00:31Z

My understanding is that the initial magic bytes is a signal of 'this is not a carv1' (and is in fact carv2) - and indicates to expect a carv2 metadata header with the characteristics to follow. If we end up wanting features that need to change that second header in a non-compatible way, we would consider changing the magic bytes at the beginning to indicate we're now on carv3

raulk · 2021-06-14T12:26:33Z

Understood. Comparing the 0x00 approach with the magic number, what the latter buys is a nicely readable error message at the expense of 10 extra bytes, right?

willscott · 2021-06-14T19:15:09Z

Understood. Comparing the 0x00 approach with the magic number, what the latter buys is a nicely readable error message at the expense of 10 extra bytes, right?

and a static 8 byte magic header for mime type detection.

rvagg · 2021-06-15T02:41:57Z

It might be best if we just view the CARv1 header as a minor mistake in that it didn't lead with a version (like we do with everything else multiformats style) but baked it into the block itself, which leaves little wriggle room for changing it around. Sorting order in DAG-CBOR gives us version coming last in most cases unless we want to have longer property names. We could break that rule for this to make version sniffing easier, but it all gets very messy very quickly. So if we just treat <varint length>dag-cbor({version:X}) as our multiformats-style leading magic bytes then we get to do whatever we want after that and can break from version to version (e.g. version 3 could be 0x0aa16776657273696f6e03 etc.). It's an unfortunately long series of magic bytes but we won't be alone and maybe we can be OK with that and write it off as an accident of history?

mikeal · 2021-06-15T19:23:20Z

We’ve considered the header a mistake for some time.

If we’re going to break backwards compat on the header then we could just drop the header and make the format begin with a registered multicodec identifier followed by the cid + block data for the “new header” format. That would be “better” in every respect other than backwards compatibility.

willscott · 2021-06-15T19:30:32Z

We’ve considered the header a mistake for some time.

If we’re going to break backwards compat on the header then we could just drop the header and make the format begin with a registered multicodec identifier followed by the cid + block data for the “new header” format. That would be “better” in every respect other than backwards compatibility.

I think we end up with much of the same complexity that we've spent the last month hashing out here. I'm hesitant to go back to redesign now since that was considered as an option previously, but rejected when we decided to do the least work we could to embed the carv1 as-is with an attached index.

We've gotten to a fairly straight forward fixed-length header prepended to a carv1. We get versioning. we can signal that an index is appended. that's what we need now. The goal was not to envision the format we actually want, which we've explicitly pushed back to carv3.

car: CARv2 spec

75fb268

rvagg requested review from mikeal, vmx, Stebalien, ribasushi, warpfork, whyrusleeping and hannahhoward March 10, 2020 03:59

warpfork reviewed Mar 11, 2020

View reviewed changes

ribasushi reviewed Mar 11, 2020

View reviewed changes

mikeal reviewed Mar 11, 2020

View reviewed changes

fixup! car: CARv2 spec

bc9a4ea

rvagg mentioned this pull request Nov 19, 2020

Make ipfs.dag.export built-in feature of HTTP gateways ipfs/in-web-browsers#170

Closed

anorth mentioned this pull request Apr 6, 2021

IPLD support on Gateways ipfs/in-web-browsers#182

Open

warpfork reviewed Apr 11, 2021

View reviewed changes

This was referenced Jun 3, 2021

fix: check header object and reject invalid headers & versions ipld/js-car#28

Merged

Header error tests ipld/go-car#76

Merged

This was referenced Jun 9, 2021

Implement a draft of CAR v2 writer ipld/go-car#78

Closed

Define CAR v2 fixed prefix and header ipld/go-car#80

Merged

rvagg mentioned this pull request Jun 22, 2021

CARv2 Specification ipld/ipld#107

Merged

rvagg closed this Jun 24, 2021

	* Streaming: the CAR format is ideal for dumping blocks via streaming reads as the Header can be loaded first and minimal state is required for ongoing parsing. Describing partial graphs with a deterministic CAR may be more difficult as a streaming operation as the selector boundaries may not be known ahead of time.
	* Streaming: the deterministic CAR format is ideal for ingesting blocks via streaming reads as the Header can be loaded first and minimal state is required for ongoing parsing. Describing partial graphs with a deterministic CAR may be more difficult as a streaming operation as the selector boundaries may not be known ahead of time.


		Deterministic CAR creation is not covered by this specification. However, deterministic generation of a CAR from a given graph is possible and is relied upon by certain uses of the format, most notably, [Filecoin](https://filecoin-project.github.io/specs).
		Where a `deterministic` value of `true` is present in a CAR, the header is able to describe the entirety of the graph represented by the trailing blocks in the _Data_ section. Such a CAR may be reproduced, byte-for-byte, from a block source using the same roots and selectors.

car: CARv2 spec #248

car: CARv2 spec #248

Conversation

rvagg commented Mar 10, 2020

vmx commented Mar 10, 2020

mikeal commented Mar 10, 2020 • edited Loading

mikeal commented Mar 10, 2020

warpfork Mar 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikeal Mar 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ribasushi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikeal Mar 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ribasushi Mar 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ribasushi Mar 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikeal Mar 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rvagg commented Mar 20, 2020 • edited Loading

warpfork commented Feb 22, 2021

mikeal commented Feb 22, 2021

mikeal commented Feb 22, 2021

Choose a reason for hiding this comment

warpfork commented Apr 11, 2021

rvagg commented Apr 21, 2021

rvagg commented May 6, 2021 • edited Loading

raulk commented Jun 14, 2021

willscott commented Jun 14, 2021

raulk commented Jun 14, 2021 • edited Loading

willscott commented Jun 14, 2021

rvagg commented Jun 15, 2021

mikeal commented Jun 15, 2021 • edited Loading

willscott commented Jun 15, 2021

mikeal commented Mar 10, 2020 •

edited

Loading

warpfork Mar 11, 2020 •

edited

Loading

mikeal Mar 11, 2020 •

edited

Loading

mikeal Mar 11, 2020 •

edited

Loading

ribasushi Mar 11, 2020 •

edited

Loading

ribasushi Mar 11, 2020 •

edited

Loading

mikeal Mar 11, 2020 •

edited

Loading

rvagg commented Mar 20, 2020 •

edited

Loading

rvagg commented May 6, 2021 •

edited

Loading

raulk commented Jun 14, 2021 •

edited

Loading

mikeal commented Jun 15, 2021 •

edited

Loading