Content-Addressable aRchives (CAR)

date: 2025-01-17
editors: Robin Berjon <robin@berjon.com>, Juan Caballero <bumblefudge@learningproof.xyz>
Abstract

The CAR format offers a serialized representation of a set of content-addressed resources in a single concatenated stream, alongside a header that describes that content.

Introduction

The CAR format (Content-Addressable aRchives) is used to store a series of content-addressable objects as a sequence of bytes. It packages that stream of objects with a header.

Much of the content of this specification was initially developed as part of the IPLD project. This specification responds to demand from the community for a single, simplified document. Note that a CARv2 specification was later developed to add support for an index trailer, but it met with limited adoption and so was not considered when bringing CAR into DASL.

Parsing CAR

The CAR format comprises a sequence of length-prefixed blocks. The first block in the CAR is the Header, encoded as dCBOR42 ([dcbor42]). The remaining blocks form the Data component of the CAR, and each is additionally prefixed with its CID ([cid]). The length prefix of each block is encoded as an unsigned LEB128 variable-length integer ([leb128]). This integer specifies the number of remaining bytes for that block entry, excluding the bytes used to encode the integer itself, but including the CID for non-header blocks.

|------- Header -------| |------------------- Data -------------------|
[ int | DAG-CBOR block ] [ int | CID | block ] [ int | CID | block ] …
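
The length prefix described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function names `encode_leb128`, `decode_leb128`, and `frame` are hypothetical and do not come from this specification.

```python
from typing import Tuple


def encode_leb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("only unsigned integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_leb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated LEB128 integer")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def frame(payload: bytes) -> bytes:
    """Prefix a payload (header, or CID plus block) with its length."""
    return encode_leb128(len(payload)) + payload
```

For example, a 300-byte entry is prefixed with the two bytes `0xAC 0x02`, and those two bytes are not counted in the length.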
      

The steps to parse a CAR are:

  1. Accept a byte stream bytes that is consumed with every step that reads from it.
  2. Run the steps to parse a CAR header with bytes to obtain version and roots.
  3. Set up array blocks and run these substeps:
    1. If bytes is empty, terminate these substeps.
    2. Run the steps to parse a CAR block header with bytes to obtain cid and block size.
    3. Read block size bytes from bytes and store the result in block.
    4. Push an entry onto blocks containing cid, block size, and block.
    5. Return to the beginning of these substeps.
  4. Return version, roots, and blocks.
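
The outer loop of these steps can be sketched at the framing level alone. The sketch below is an assumption-laden illustration: it treats the header and each `CID | block` entry as opaque byte sections, leaving header decoding and CID parsing to other code, and the names `read_leb128` and `iter_sections` are hypothetical.

```python
from typing import Iterator, Tuple


def read_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    value = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_sections(car: bytes) -> Iterator[bytes]:
    """Yield each length-prefixed section of a CAR, in order.

    The first section is the encoded header; every later section is a
    CID immediately followed by its block. This sketch does not check
    that a header is present at all.
    """
    pos = 0
    while pos < len(car):            # "if bytes is empty, terminate"
        length, pos = read_leb128(car, pos)
        if length == 0:
            raise ValueError("section length must not be 0")
        end = pos + length
        if end > len(car):
            raise ValueError("section runs past the end of the stream")
        yield car[pos:end]
        pos = end
```

A caller would decode the first yielded section as the header and split each remaining section into its CID and block.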

The CAR header encodes both a version, which is always 1, and an array of roots, which is a list of CIDs. A CAR can be used to contain one or more DAGs of dCBOR42 content and the purpose of the roots is to list one or more roots for those DAGs. The array may be empty if you do not care about encoding DAGs.

NOTE: Some implementations expect there to always be at least one root. If you do not wish to indicate a root but have to interoperate with those implementations, you can always use the empty DASL CID \x01\x55\x12\x00 instead.

The steps to parse a CAR header are:

  1. Accept a byte stream bytes.
  2. Read an LEB128 length from bytes.
  3. If length is 0, throw an error.
  4. Read length bytes from bytes and decode them as dCBOR42 ([dcbor42]) into object. If object is not a map, throw an error.
  5. If object does not have a version key entry with integer value 1, throw an error. Otherwise, store version in version.
  6. If object does not have a roots key entry that is an array, or if that array contains anything other than DASL CIDs, throw an error. Otherwise, store roots in roots.
  7. Return version and roots.
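
The header steps above can be sketched as follows. This is an illustration under several assumptions, not a conforming dCBOR42 decoder: it handles only the CBOR types a CAR header needs, it does not enforce dCBOR42's determinism rules, it assumes that tag 42 wraps a byte string whose first byte is `0x00`, and it does not check that each root is a well-formed DASL CID. The names `Cid`, `decode_item`, and `parse_car_header` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Cid:
    """Hypothetical wrapper marking bytes that arrived under CBOR tag 42."""
    raw: bytes


def read_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    value = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_item(buf: bytes, pos: int) -> Tuple[Any, int]:
    """Decode one CBOR item from the small subset a CAR header needs."""
    if pos >= len(buf):
        raise ValueError("truncated CBOR")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info
    elif info <= 27:
        width = 1 << (info - 24)  # 1, 2, 4, or 8 argument bytes
        if pos + width > len(buf):
            raise ValueError("truncated CBOR")
        arg = int.from_bytes(buf[pos:pos + width], "big")
        pos += width
    else:
        raise ValueError("indefinite or reserved lengths are rejected")
    if major == 0:  # unsigned integer
        return arg, pos
    if major in (2, 3):  # byte string or text string
        if pos + arg > len(buf):
            raise ValueError("truncated CBOR")
        chunk = buf[pos:pos + arg]
        return (chunk if major == 2 else chunk.decode("utf-8")), pos + arg
    if major == 4:  # array
        items = []
        for _ in range(arg):
            item, pos = decode_item(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:  # map; keys are required to be text strings here
        result = {}
        for _ in range(arg):
            key, pos = decode_item(buf, pos)
            if not isinstance(key, str):
                raise ValueError("map keys must be strings")
            value, pos = decode_item(buf, pos)
            result[key] = value
        return result, pos
    if major == 6 and arg == 42:  # tag 42 wraps a CID (assumed 0x00 prefix)
        inner, pos = decode_item(buf, pos)
        if not isinstance(inner, bytes) or inner[:1] != b"\x00":
            raise ValueError("tag 42 must wrap bytes starting with 0x00")
        return Cid(inner[1:]), pos
    raise ValueError("CBOR type not needed for a CAR header")


def parse_car_header(data: bytes) -> Tuple[int, List[Cid], int]:
    """Return (version, roots, offset of the first data entry)."""
    length, pos = read_leb128(data, 0)
    if length == 0:
        raise ValueError("header length must not be 0")
    end = pos + length
    if end > len(data):
        raise ValueError("truncated header")
    obj, used = decode_item(data[pos:end], 0)
    # Rejecting trailing bytes is a stricter choice made by this sketch.
    if used != length or not isinstance(obj, dict):
        raise ValueError("header must be exactly one CBOR map")
    if obj.get("version") != 1:
        raise ValueError("version must be 1")
    roots = obj.get("roots")
    if not isinstance(roots, list) or not all(isinstance(r, Cid) for r in roots):
        raise ValueError("roots must be an array of CIDs")
    return 1, roots, end
```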

After its header, a CAR contains a series of blocks, each of which is prefixed with a small header of its own capturing the block's size and CID.

The steps to parse a CAR block header are:

  1. Accept a byte stream bytes.
  2. Read an LEB128 length from bytes.
  3. If length is 0, throw an error.
  4. Read a CID ([cid]) from bytes and store it in cid. Note: the length of the CID can be inferred by reading its metadata step by step until the hash size part, which is then used to consume that many bytes from bytes.
  5. Set CID length to the number of bytes that were required to read the CID.
  6. Set block size to length minus CID length.
  7. Return block size and cid.
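
The block header steps above can be sketched as follows. This is a minimal illustration that assumes the CID's metadata fields (version, codec, hash type, hash size) can each be read as an unsigned LEB128 integer; the names `read_cid` and `parse_block_header` are hypothetical.

```python
from typing import Tuple


def read_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    value = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_cid(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Read one binary CID starting at pos; return (cid bytes, next position)."""
    start = pos
    version, pos = read_leb128(data, pos)
    if version != 1:
        raise ValueError("only CID version 1 is expected")
    _codec, pos = read_leb128(data, pos)
    _hash_type, pos = read_leb128(data, pos)
    hash_size, pos = read_leb128(data, pos)
    end = pos + hash_size            # the digest is hash_size bytes long
    if end > len(data):
        raise ValueError("truncated CID")
    return data[start:end], end


def parse_block_header(data: bytes, pos: int) -> Tuple[bytes, int, int]:
    """Return (cid, block size, offset where the block's bytes begin)."""
    length, pos = read_leb128(data, pos)
    if length == 0:
        raise ValueError("block entry length must not be 0")
    cid, after_cid = read_cid(data, pos)
    cid_length = after_cid - pos
    block_size = length - cid_length
    if block_size < 0:
        raise ValueError("entry length is shorter than its CID")
    return cid, block_size, after_cid
```

The caller then reads `block_size` bytes starting at the returned offset to obtain the block.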

Additional Considerations

Conformance

A CAR stream must only feature DASL CIDs.

A CAR stream must have CIDs that match the block that follows them. A CAR implementation should verify that CIDs match blocks, though it may delegate verification to other components. (Keep in mind that not verifying at all negates the value of content addressing.)
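
Verifying that a CID matches its block can be sketched as follows. The sketch assumes a 36-byte binary CID laid out as version `0x01`, one codec byte, hash type `0x12` (SHA-256), hash size `0x20`, then the 32-byte digest; the function name `cid_matches_block` is hypothetical, and the empty CID mentioned earlier is deliberately treated as a mismatch.

```python
import hashlib

SHA2_256 = 0x12   # hash type byte for SHA-256
DIGEST_SIZE = 32  # SHA-256 digests are 32 bytes


def cid_matches_block(cid: bytes, block: bytes) -> bool:
    """Return True if the CID's SHA-256 digest equals the block's digest."""
    if len(cid) != 4 + DIGEST_SIZE:
        return False
    if cid[0] != 0x01 or cid[2] != SHA2_256 or cid[3] != DIGEST_SIZE:
        return False
    return hashlib.sha256(block).digest() == cid[4:]
```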

A CAR stream's stated roots must match CIDs contained in the data. However, implementations frequently operate in a streaming fashion such that they have no way of knowing whether a CAR stream conforms to this requirement before having processed the entire stream. Checking correctness with respect to this requirement may therefore be more readily performed via a warning (at end of processing) or a dedicated validator.
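
An end-of-stream check of this kind can be sketched as follows, assuming CIDs are compared as raw bytes. The function name `find_missing_roots` is hypothetical; a streaming reader could feed it the CIDs it encounters and emit a warning if the result is non-empty.

```python
from typing import Iterable, List


def find_missing_roots(roots: List[bytes], block_cids: Iterable[bytes]) -> List[bytes]:
    """Return the roots that never appeared among the streamed block CIDs."""
    pending = set(roots)
    for cid in block_cids:       # consumed once, so a generator works too
        pending.discard(cid)
    # Preserve the order in which the roots were declared.
    return [root for root in roots if root in pending]
```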

Determinism

Deterministic CAR creation is not covered by this specification. However, deterministic generation of a CAR from a given graph is possible and is relied upon by certain uses of the format, most notably Filecoin. A deterministic CAR format (dCAR) may be the topic of a future specification.

Care regarding the ordering of the roots array in the Header and avoidance of duplicate blocks may also be required for strict determinism.

Security & Verifiability

The roots specified by the Header of a CAR should appear somewhere in its Data section; however, there is no requirement that the roots define entire DAGs, nor that all blocks in a CAR be part of DAGs described by the root CIDs in the Header. Therefore, the roots must not be used alone to determine or differentiate the contents of a CAR.

The CAR format contains no internal means, beyond the blocks and their CIDs, to verify or differentiate contents. Where such a requirement exists, this must be performed externally, such as by creating a digest of the entire CAR (and referring to it using a CID).
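
One way to produce such an external digest is sketched below. It assumes the CID uses the raw codec (`0x55`) with SHA-256 and is rendered as lowercase base32 with a `b` prefix; the function name `car_digest_cid` is hypothetical.

```python
import base64
import hashlib


def car_digest_cid(car_bytes: bytes) -> str:
    """Return a string CID (raw codec, SHA-256) covering an entire CAR."""
    digest = hashlib.sha256(car_bytes).digest()
    # Binary CID: version 1, raw codec, SHA-256, digest length, digest.
    cid = bytes([0x01, 0x55, 0x12, len(digest)]) + digest
    # Lowercase base32 without padding, prefixed with "b".
    encoded = base64.b32encode(cid).decode("ascii").lower().rstrip("=")
    return "b" + encoded
```

Two CARs that differ in any byte, including block order or header contents, yield different CIDs under this scheme.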

Appendix: Media Type

The media type for CAR is application/vnd.ipld.car.

The conventional file extension for CAR is .car.

References

[cid]
Robin Berjon & Juan Caballero. Content IDs (CIDs). 2025-01-17. URL: https://dasl.ing/cid.html
[dcbor42]
Robin Berjon & Juan Caballero. Deterministic CBOR with tag 42 (dCBOR42). 2025-01-17. URL: https://dasl.ing/dcbor42.html
[leb128]
Wikipedia. LEB128. Retrieved December 2024. URL: https://en.wikipedia.org/wiki/LEB128