Content-Addressable aRchives (CAR)
date | 2025-01-17 |
---|---|
editors | Robin Berjon <robin@berjon.com>, Juan Caballero <bumblefudge@learningproof.xyz> |
abstract | The CAR format offers a serialized representation of a set of content-addressed resources in a single concatenated stream, alongside a header that describes that content. |
Introduction
The CAR format (Content-Addressable aRchives) is used to store a series of content-addressable objects as a sequence of bytes. It packages that stream of objects with a header.
Much of the content of this specification was initially developed as part of the IPLD project. This specification was written in response to community demand for a single, simplified document. Note that a CARv2 specification was later developed to add support for an index trailer, but it saw limited adoption and so was not considered when bringing CAR into DASL.
Parsing CAR
The CAR format comprises a sequence of length-prefixed block data, where the first block in the CAR is the Header encoded as dCBOR42, and the remaining blocks form the Data component of the CAR and are each additionally prefixed with their CIDs ([dcbor42], [cid]). The length prefix of each block in a CAR is encoded as an unsigned, variable-length LEB128 integer ([leb128]). This integer specifies the number of remaining bytes for that block entry, excluding the bytes used to encode the integer, but including the CID for non-header blocks.
|------ Header -------| |------------------- Data -------------------|
[ int | dCBOR42 block ] [ int | CID | block ] [ int | CID | block ] …
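The length prefix can be illustrated with a minimal sketch of unsigned LEB128 encoding and decoding. The function names `encode_leb128` and `decode_leb128` are illustrative and not defined by this specification; the decoder works on an in-memory byte string and an offset rather than a true stream, which is an assumption made to keep the example short.

```python
def encode_leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(low7)         # final byte: high bit clear
            return bytes(out)
        out.append(low7 | 0x80)      # high bit set: more bytes follow


def decode_leb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode the unsigned LEB128 integer at data[pos:].

    Returns (value, position of the first byte after the integer).
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("input ended inside a LEB128 integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7
```

For example, an entry whose CID and block together take 300 bytes would be prefixed with the two bytes 0xAC 0x02, and `decode_leb128(b"\xac\x02")` returns `(300, 2)`.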
The steps to parse a CAR are:
- Accept a byte stream bytes that is consumed with every step that reads from it.
- Run the steps to parse a CAR header with bytes to obtain version and roots.
- Set up array blocks and run these substeps:
  - If bytes is empty, terminate these substeps.
  - Run the steps to parse a CAR block header with bytes to obtain cid and block size.
  - Read block size bytes from bytes and store the result in block.
  - Push an entry onto blocks containing cid, block size, and block.
  - Return to the beginning of these substeps.
- Return version, roots, and blocks.
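The outer loop of these steps can be sketched as a function that only splits a CAR into its length-prefixed entries. It deliberately stops short of decoding the header or separating each CID from its block. The name `split_entries` is hypothetical, and the sketch assumes the whole CAR is held in memory as a byte string rather than consumed as a stream.

```python
def decode_leb128(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, new_pos) for the unsigned LEB128 integer at data[pos:]."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("input ended inside a LEB128 integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def split_entries(data: bytes) -> list[bytes]:
    """Split a CAR byte string into the payloads of its entries.

    The first payload is the encoded header; every later payload is a CID
    immediately followed by the block it addresses.
    """
    entries = []
    pos = 0
    while pos < len(data):               # stop once the input is exhausted
        length, pos = decode_leb128(data, pos)
        if length == 0:
            raise ValueError("entry length is 0")
        end = pos + length
        if end > len(data):
            raise ValueError("entry runs past the end of the input")
        entries.append(data[pos:end])
        pos = end
    return entries
```

For the toy input `b"\x03abc\x05hello"` the function returns `[b"abc", b"hello"]`. A real parser would hand the first payload to the header steps and each later payload to the block header steps.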
The CAR header encodes both a version, which is always 1, and an array of roots, which is a list of CIDs. A CAR can be used to contain one or more DAGs of dCBOR42 content, and the purpose of the roots is to list one or more roots for those DAGs. The array may be empty if you do not care about encoding DAGs.
NOTE: Some implementations expect there to always be at least one root. If you do not wish to indicate a root but have to interoperate with those implementations, you can always use the empty DASL CID \x01\x55\x12\x00 instead.
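To make the header layout concrete, here is a rough sketch that hand-encodes a header for a given list of roots. The names `cbor_head` and `encode_car_header` are hypothetical. The sketch assumes that dCBOR42 writes each CID as CBOR tag 42 around a byte string that starts with a 0x00 byte, as DAG-CBOR does, and that the `roots` key sorts before `version`; [dcbor42] is the authority on both points.

```python
EMPTY_CID = b"\x01\x55\x12\x00"  # the placeholder CID mentioned in the note


def encode_leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(low7)
            return bytes(out)
        out.append(low7 | 0x80)


def cbor_head(major: int, arg: int) -> bytes:
    """Encode a CBOR initial byte plus argument (arguments below 65536 only)."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    if arg < 0x100:
        return bytes([(major << 5) | 24, arg])
    if arg < 0x10000:
        return bytes([(major << 5) | 25]) + arg.to_bytes(2, "big")
    raise ValueError("argument too large for this sketch")


def encode_car_header(roots: list[bytes]) -> bytes:
    """Return the length-prefixed header for the given root CIDs."""
    body = cbor_head(5, 2)                           # map with two entries
    body += cbor_head(3, 5) + b"roots"               # text string key
    body += cbor_head(4, len(roots))                 # array of CIDs
    for cid in roots:
        # Assumption: tag 42, then a byte string holding 0x00 followed by the CID.
        body += cbor_head(6, 42) + cbor_head(2, len(cid) + 1) + b"\x00" + cid
    body += cbor_head(3, 7) + b"version" + cbor_head(0, 1)
    return encode_leb128(len(body)) + body
```

Calling `encode_car_header([EMPTY_CID])` yields a 26-byte header (one length byte plus 25 bytes of CBOR) that names the placeholder as its only root, while `encode_car_header([])` produces a header with an empty roots array.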
The steps to parse a CAR header are:
- Accept a byte stream bytes.
- Read an LEB128 length from bytes.
- If length is 0, throw an error.
- Read length bytes from bytes and decode them as dCBOR42 ([dcbor42]) into object. If object is not a map, throw an error.
- If object does not have a version key entry with integer value 1, throw an error. Otherwise, store version in version.
- If object does not have a roots key entry that is an array, or if that array contains anything other than DASL CIDs, throw an error. Otherwise, store roots in roots.
- Return version and roots.
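The header steps can be sketched as follows. Python has no CBOR decoder in its standard library, so the sketch carries a tiny decoder for just the subset a header needs; it is not a full dCBOR42 implementation and does not enforce deterministic encoding. All names (`Cid`, `take`, `decode_cbor`, `looks_like_dasl_cid`, `parse_car_header`) are hypothetical. The 0x00 byte inside tag 42 and the loose CID shape check (CIDv1, codec 0x55 or 0x71, sha2-256) are assumptions; [dcbor42] and [cid] define the actual rules.

```python
class Cid(bytes):
    """Marks byte strings that arrived wrapped in CBOR tag 42."""


def decode_leb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    value = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("input ended inside a LEB128 integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def take(data: bytes, pos: int, count: int) -> tuple[bytes, int]:
    """Return count bytes starting at pos, or fail if the input is too short."""
    end = pos + count
    if end > len(data):
        raise ValueError("truncated input")
    return data[pos:end], end


def decode_cbor(data: bytes, pos: int = 0):
    """Decode one item from a small CBOR subset; returns (value, new_pos)."""
    head, pos = take(data, pos, 1)
    major, info = head[0] >> 5, head[0] & 0x1F
    if info < 24:
        arg = info
    elif info <= 27:
        raw, pos = take(data, pos, 1 << (info - 24))
        arg = int.from_bytes(raw, "big")
    else:
        raise ValueError("unsupported CBOR additional information")
    if major == 0:                       # unsigned integer
        return arg, pos
    if major in (2, 3):                  # byte string or text string
        raw, pos = take(data, pos, arg)
        return (raw if major == 2 else raw.decode("utf-8")), pos
    if major == 4:                       # array
        items = []
        for _ in range(arg):
            item, pos = decode_cbor(data, pos)
            items.append(item)
        return items, pos
    if major == 5:                       # map; this sketch accepts text keys only
        result = {}
        for _ in range(arg):
            key, pos = decode_cbor(data, pos)
            if not isinstance(key, str):
                raise ValueError("map keys must be text strings")
            value, pos = decode_cbor(data, pos)
            result[key] = value
        return result, pos
    if major == 6 and arg == 42:         # tag 42: a CID (assumed 0x00 prefix)
        payload, pos = decode_cbor(data, pos)
        if type(payload) is not bytes or payload[:1] != b"\x00":
            raise ValueError("tag 42 must wrap a byte string starting with 0x00")
        return Cid(payload[1:]), pos
    raise ValueError("CBOR type outside this sketch's subset")


def looks_like_dasl_cid(cid: bytes) -> bool:
    """Loose shape check only; see [cid] for the actual DASL CID rules."""
    return (len(cid) >= 4 and cid[0] == 0x01 and cid[1] in (0x55, 0x71)
            and cid[2] == 0x12 and cid[3] == len(cid) - 4)


def parse_car_header(data: bytes, pos: int = 0):
    """Return (version, roots, position of the first block entry)."""
    length, pos = decode_leb128(data, pos)
    if length == 0:
        raise ValueError("header length is 0")
    encoded, end = take(data, pos, length)
    obj, used = decode_cbor(encoded)
    if used != length or not isinstance(obj, dict):
        raise ValueError("header is not a single CBOR map")
    version = obj.get("version")
    if type(version) is not int or version != 1:
        raise ValueError("header version must be the integer 1")
    roots = obj.get("roots")
    if not isinstance(roots, list) or not all(
            isinstance(r, Cid) and looks_like_dasl_cid(r) for r in roots):
        raise ValueError("roots must be an array of DASL CIDs")
    return version, roots, end
```

The returned position is where the first block entry begins, so a caller can continue with the block header steps from there.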
After its header, a CAR contains a series of blocks, each of which is prefixed with a small header of its own capturing the block's size and CID.
The steps to parse a CAR block header are:
- Accept a byte stream bytes.
- Read an LEB128 length from bytes.
- If length is 0, throw an error.
- Read a CID ([cid]) from bytes and store it in cid. Note: the length of the CID can be inferred by reading its metadata step by step until the hash size part, which is then used to consume that many bytes from bytes.
- Set CID length to the number of bytes that were required to read the CID.
- Set block size to length minus CID length.
- Return block size and cid.
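A minimal sketch of the block header steps follows. It assumes the CID metadata consists of four unsigned varints in the order version, codec, hash type, hash size, which is the CIDv1 layout; `parse_block_header` is a hypothetical name, and the sketch works on an in-memory byte string and an offset.

```python
def decode_leb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, new_pos) for the unsigned LEB128 integer at data[pos:]."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("input ended inside a LEB128 integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def parse_block_header(data: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Return (block_size, cid, position where the block bytes start)."""
    length, pos = decode_leb128(data, pos)
    if length == 0:
        raise ValueError("block entry length is 0")
    cid_start = pos
    # Skip the version, codec, and hash type; only the hash size is needed
    # to know how many digest bytes follow.
    for _ in range(3):
        _, pos = decode_leb128(data, pos)
    hash_size, pos = decode_leb128(data, pos)
    pos += hash_size
    if pos > len(data):
        raise ValueError("input ended inside a CID")
    cid = data[cid_start:pos]
    cid_length = pos - cid_start
    if cid_length > length:
        raise ValueError("CID is longer than the entry that contains it")
    return length - cid_length, cid, pos
```

A caller would then read `block_size` bytes starting at the returned position to obtain the block itself.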
Additional Considerations
Conformance
A CAR stream must only feature DASL CIDs.
A CAR stream must have CIDs that match the block that follows them. A CAR implementation should verify that CIDs match blocks, though it may delegate verification to other components. (Keep in mind that not verifying at all negates the value of content addressing.)
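As an illustration, checking that a CID matches its block amounts to rehashing the block and comparing digests. The sketch below assumes a 36-byte CIDv1 carrying a sha2-256 digest (multihash code 0x12, size 0x20); `cid_matches_block` is a hypothetical name, and any other CID shape is simply reported as a mismatch.

```python
import hashlib


def cid_matches_block(cid: bytes, block: bytes) -> bool:
    """Check a sha2-256 CIDv1 against the block it claims to address."""
    if len(cid) != 36 or cid[0] != 0x01 or cid[2:4] != b"\x12\x20":
        return False  # not the CID shape this sketch understands
    return hashlib.sha256(block).digest() == cid[4:]
```

The codec byte (`cid[1]`) is not examined here because it does not affect the digest; an implementation that also enforces DASL CID rules would check it separately.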
A CAR stream's stated roots must match CIDs contained in the data. However, implementations frequently operate in a streaming fashion such that they have no way of knowing whether a CAR stream conforms to this requirement before having processed the entire stream. Checking correctness with respect to this requirement may therefore be more readily performed via a warning (at end of processing) or a dedicated validator.
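One way a streaming reader might defer this check, sketched under the assumption that roots and block CIDs are available as byte strings: track which roots are still unseen while blocks go by, then report the leftovers once the stream ends. `unseen_roots` is a hypothetical name.

```python
from typing import Iterable


def unseen_roots(roots: list[bytes], block_cids: Iterable[bytes]) -> list[bytes]:
    """Consume block CIDs in stream order; return roots that never appeared."""
    pending = set(roots)
    for cid in block_cids:
        pending.discard(cid)   # no effect when cid is not a pending root
    # Preserve the header's ordering in the report.
    return [root for root in roots if root in pending]
```

A caller could log a warning when the returned list is non-empty, or treat it as an error in a dedicated validator.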
Determinism
Deterministic CAR creation is not covered by this specification. However, deterministic generation of a CAR from a given graph is possible and is relied upon by certain uses of the format, most notably Filecoin. dCAR may be the topic of a future specification.
Care regarding the ordering of the roots array in the Header and avoidance of duplicate blocks may also be required for strict determinism.
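As a hedged illustration of the duplicate-block point only, a writer could skip any block whose CID it has already emitted. This sketch does not define a deterministic CAR: the caller still has to supply blocks in a reproducible order. `encode_data_section` is a hypothetical name.

```python
from typing import Iterable


def encode_leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(low7)
            return bytes(out)
        out.append(low7 | 0x80)


def encode_data_section(blocks: Iterable[tuple[bytes, bytes]]) -> bytes:
    """Serialize (cid, block) pairs in the given order, dropping repeats."""
    seen = set()
    out = bytearray()
    for cid, block in blocks:
        if cid in seen:
            continue               # the same content was already written
        seen.add(cid)
        # The length prefix covers the CID and the block, not itself.
        out += encode_leb128(len(cid) + len(block)) + cid + block
    return bytes(out)
```

Keeping the first occurrence and dropping later ones is one possible policy; it means the output depends only on the order in which distinct blocks first appear.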
Security & Verifiability
The roots specified by the Header of a CAR should appear somewhere in its Data section; however, there is no requirement that the roots define entire DAGs, nor that all blocks in a CAR must be part of DAGs described by the root CIDs in the Header. Therefore, the roots must not be used alone to determine or differentiate the contents of a CAR.
The CAR format contains no internal means, beyond the blocks and their CIDs, to verify or differentiate contents. Where such a requirement exists, this must be performed externally, such as by creating a digest of the entire CAR (and referring to it using a CID).
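One possible shape for such an external check, assuming the digest is sha2-256 and the CID uses the raw codec (0x55) over the CAR's bytes: hash the archive in chunks and wrap the digest in a CIDv1 prefix. `car_digest_cid` is a hypothetical name, and the choice of codec and hash is an assumption rather than something this specification prescribes.

```python
import hashlib
from typing import Iterable


def car_digest_cid(chunks: Iterable[bytes]) -> bytes:
    """Hash a CAR supplied as byte chunks and return a binary CID for it."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    # CIDv1 prefix: version 1, raw codec, sha2-256, 32-byte digest.
    return b"\x01\x55\x12\x20" + hasher.digest()
```

Because the input is consumed chunk by chunk, the same function works for a file read in fixed-size pieces or for a single in-memory byte string wrapped in a list.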
Appendix: Media Type
The media type for CAR is application/vnd.ipld.car.
The conventional file extension for CAR is .car.
References
- [cid] Robin Berjon & Juan Caballero. Content IDs (CIDs). 2025-01-17. URL: https://dasl.ing/cid.html
- [dcbor42] Robin Berjon & Juan Caballero. Deterministic CBOR with tag 42 (dCBOR42). 2025-01-17. URL: https://dasl.ing/dcbor42.html
- [leb128] Wikipedia. LEB128. Retrieved December 2024. URL: https://en.wikipedia.org/wiki/LEB128