Content IDs (CIDs)

date	2025-06-25
editors	Robin Berjon <robin@berjon.com> Juan Caballero <bumblefudge@learningproof.xyz>
abstract	DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash with enough metadata to be extensible (to add new hash types in the future) and to indicate whether they are pointing to raw bytes or to structured data.

Introduction

DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash with enough metadata to be extensible (to add new hash types in the future) and to indicate whether they are pointing to raw bytes or to structured data. If you're simply using DASL CIDs as identifiers, you can almost certainly just use the string as an opaque ID and worry no further.

A DASL CID can be represented as a string or as an array of bytes. If you wish to understand the internals of a CID, it has the following structure:

A b prefix (only in string form). This is an extensibility point that allows CID encodings other than base32 to be supported in the future. (Currently base32 is the only one.)
A version number, which is currently always 1.
A content codec, which is a flag indicating whether it is pointing to structured or raw data.
A hash type, which is always SHA-256 ([sha256]).
A hash size, indicating how many bytes long the digest is.
A digest, which is the hash of the content being identified.
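As an illustration of this layout, the following is a minimal sketch that assembles a CID from some content. The function names are hypothetical, and it assumes that the base32 string form omits "=" padding, which this section does not state explicitly.

```python
import base64
import hashlib

RAW = 0x55      # codec flag: raw bytes
DRISL = 0x71    # codec flag: structured (DRISL) data
SHA_256 = 0x12  # hash type


def make_cid_bytes(content: bytes, codec: int = RAW) -> bytes:
    """Assemble version, codec, hash type, hash size, and digest."""
    digest = hashlib.sha256(content).digest()
    # A SHA-256 digest is 32 bytes long, so the hash size fits in one byte.
    return bytes([1, codec, SHA_256, len(digest)]) + digest


def cid_bytes_to_string(cid_bytes: bytes) -> str:
    """Prepend the b prefix to the lowercase base32 form of the bytes."""
    # Assumption: no "=" padding in the string form.
    encoded = base64.b32encode(cid_bytes).decode("ascii")
    return "b" + encoded.rstrip("=").lower()


cid = cid_bytes_to_string(make_cid_bytes(b"hello world"))
print(cid)
```

Because the first four bytes are fixed for a given codec, string CIDs built this way start with bafkrei for raw data and bafyrei for DRISL data.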

Parsing CIDs

Use the following steps to parse a CID string:

Accept a string CID.
Remove the first character from CID and store it in prefix.
If prefix is not equal to b, throw an error.
Decode the rest of CID using the base32 algorithm from RFC4648 with a lowercase alphabet and store the result in CID bytes ([rfc4648]).
Return the result of applying the steps to decode a CID to CID bytes.
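The string steps above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: string_cid_to_bytes is a hypothetical name, it assumes the input carries no "=" padding, and it stops at CID bytes, leaving the final decoding step to a separate routine.

```python
import base64

# The lowercase base32 alphabet from RFC 4648.
BASE32_LOWER = set("abcdefghijklmnopqrstuvwxyz234567")


def string_cid_to_bytes(cid: str) -> bytes:
    """Strip the b prefix and base32-decode the rest into CID bytes."""
    if not cid:
        raise ValueError("empty CID string")
    prefix, rest = cid[0], cid[1:]
    if prefix != "b":
        raise ValueError(f"unsupported CID prefix: {prefix!r}")
    if not set(rest) <= BASE32_LOWER:
        raise ValueError("CID contains characters outside lowercase base32")
    # base64.b32decode expects the uppercase alphabet and "=" padding
    # to a multiple of 8 characters, so both are restored here.
    padded = rest.upper() + "=" * (-len(rest) % 8)
    return base64.b32decode(padded)


cid_bytes = string_cid_to_bytes("bafkreaa")
print(cid_bytes.hex())
```

The sample string is the shortest well-formed case: version 1, the raw codec, SHA-256, and a hash size of zero.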

Use the following steps to parse a binary CID:

Accept an array of bytes binary CID.
Remove the first byte in binary CID and store it in prefix.
If prefix is not equal to 0 (a null byte, the binary base256 prefix), throw an error.
Store the rest of binary CID in CID bytes.
Return the result of applying the steps to decode a CID to CID bytes.
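The binary steps are shorter, since only the leading null byte needs to be removed. A minimal sketch, with binary_cid_to_bytes as a hypothetical name, and again stopping at CID bytes:

```python
def binary_cid_to_bytes(binary_cid: bytes) -> bytes:
    """Strip the leading null byte and return the remaining CID bytes."""
    if len(binary_cid) == 0:
        raise ValueError("empty binary CID")
    prefix, cid_bytes = binary_cid[0], binary_cid[1:]
    if prefix != 0x00:
        raise ValueError(f"unsupported binary CID prefix: {prefix:#04x}")
    return cid_bytes


cid_bytes = binary_cid_to_bytes(bytes([0x00, 0x01, 0x55, 0x12, 0x00]))
print(cid_bytes.hex())
```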

Use the following steps to decode a CID:

Accept an array of bytes CID bytes.
Remove the first byte in CID bytes and store it in version.
If version is not equal to 1, throw an error.
Remove the next byte in CID bytes and store it in codec.
If codec is not equal to 0x55 (raw) or 0x71 (DRISL), throw an error ([drisl]).
Remove the next byte in CID bytes and store it in hash type.
If hash type is not equal to 0x12 (SHA-256), throw an error ([sha256]).
Read an LEB128 hash size from CID bytes ([leb128]).
If the number of bytes left in CID bytes is smaller than hash size, throw an error. Note that hash size may be zero, which encodes an empty CID. At this point, we have two options:
- We are expecting CID bytes to only contain a CID. If the number of bytes left in CID bytes is greater than hash size, throw an error. Store the remaining CID bytes in digest.
- We are reading from the beginning of CID bytes and it may contain more data. This is the case for instance when processing a CAR block ([car]). Only read hash size bytes from CID bytes and leave the rest available for further processing. Store the hash size bytes from CID bytes in digest.
Return version, codec, hash type, hash size, and digest.
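The decoding steps, including both options for trailing data, might look like the sketch below. The names DecodedCid, read_uleb128, decode_cid, and the allow_trailing flag are hypothetical, and the rest field is an addition for illustration so that a caller reading a larger stream can continue after the CID.

```python
from typing import NamedTuple, Tuple


class DecodedCid(NamedTuple):
    version: int
    codec: int
    hash_type: int
    hash_size: int
    digest: bytes
    rest: bytes  # unread trailing bytes; always empty in strict mode


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def decode_cid(cid_bytes: bytes, allow_trailing: bool = False) -> DecodedCid:
    if len(cid_bytes) < 3:
        raise ValueError("CID is too short")
    version, codec, hash_type = cid_bytes[0], cid_bytes[1], cid_bytes[2]
    if version != 1:
        raise ValueError(f"unsupported version: {version}")
    if codec not in (0x55, 0x71):
        raise ValueError(f"unsupported codec: {codec:#04x}")
    if hash_type != 0x12:
        raise ValueError(f"unsupported hash type: {hash_type:#04x}")
    hash_size, pos = read_uleb128(cid_bytes, 3)
    remaining = len(cid_bytes) - pos
    if remaining < hash_size:
        raise ValueError("digest is shorter than hash size")
    # First option: the input must contain exactly one CID.
    if not allow_trailing and remaining > hash_size:
        raise ValueError("unexpected bytes after digest")
    # Second option: read only hash size bytes and hand back the rest.
    end = pos + hash_size
    return DecodedCid(version, codec, hash_type, hash_size,
                      cid_bytes[pos:end], cid_bytes[end:])


sample = bytes([0x01, 0x55, 0x12, 0x20]) + bytes(32)
decoded = decode_cid(sample)
print(decoded.codec, decoded.hash_size)
```

Passing allow_trailing=True corresponds to the CAR-style case, where the bytes following the digest belong to whatever comes next in the stream.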

Relationship to IPFS

You don't need to understand IPFS in order to use DASL. This section is for informational purposes only.

DASL CIDs are a strict subset of IPFS CIDs with the following properties:

Only modern CIDv1 CIDs are used, not legacy CIDv0.
Only the lowercase base32 multibase encoding (the b prefix) is used for human-readable (and subdomain-usable) string encoding.
Only the raw binary multicodec (0x55) and dag-cbor multicodec (0x71) are used, with the latter used only for [drisl]-conformant DAGs of CBOR objects.
Only SHA-256 (0x12) is used for the hash function.
The CID isn't the boss of anyone, but the expectation is that, regardless of size, resources should not be "chunked" into a DAG or Merkle tree (as historically done with UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly. That being said, a DASL CID can point to a piece of [drisl] metadata that describes this kind of chunking, if needed. (A separate specification may be added for that.)
This set of options has the added advantage that all the aforementioned single-byte prefixes require no additional varint processing or byte-fiddling.
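To see why no extra varint handling is needed, the sketch below encodes each fixed field value with unsigned LEB128, assuming that is the variable-length integer encoding in question (it is the one cited above for hash size). The helper name encode_uleb128 and the FIELDS table are hypothetical.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


FIELDS = {
    "version": 0x01,
    "raw codec": 0x55,
    "dag-cbor codec": 0x71,
    "SHA-256 hash type": 0x12,
    "SHA-256 hash size": 0x20,
}

# Every value is below 0x80, so its encoding is the byte itself.
single_byte = {name: encode_uleb128(v) == bytes([v]) for name, v in FIELDS.items()}
print(single_byte)
```

Any value of 0x80 or above would need at least two bytes; 0x80 itself, for instance, encodes as 0x80 0x01.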

References

[car]: Robin Berjon & Juan Caballero. Content-Addressable aRchives (CAR). 2025-06-25. URL: https://dasl.ing/car.html
[drisl]: Robin Berjon & Juan Caballero. DRISL — Deterministic Representation for Interoperable Structures & Links. 2025-06-25. URL: https://dasl.ing/drisl.html
[leb128]: Wikipedia. LEB128. Retrieved December 2024. URL: https://en.wikipedia.org/wiki/LEB128
[rfc4648]: S. Josefsson. The Base16, Base32, and Base64 Data Encodings. October 2006. URL: https://www.rfc-editor.org/rfc/rfc4648
[sha256]: National Institute of Standards and Technology. Secure Hash Algorithm. NIST FIPS 180-2. August 2002.