MASL — Metadata for Arbitrary Structures & Links
date | 2025-05-05 |
---|---|
editors | Robin Berjon <robin@berjon.com> Juan Caballero <bumblefudge@learningproof.xyz> |
abstract | MASL is a CBOR-based metadata system that is designed to work well with content-addressed and decentralised systems, to enable fully self-contained, self-certified content distribution. |
Introduction
Anywhere you have resources that will be deployed in real-world systems, the potential metadata needs of those systems are effectively unbounded. This is particularly true of decentralised systems that need to exhibit "web-like" behaviour: in order to have reproducible and safe execution when content is sourced from arbitrary sources, it is necessary to have an equivalent to HTTP headers that is as verifiable as the content itself, rather than accepting the lossy behaviour incurred by treating the web as a file system (which it isn't).
Designing or constraining a syntax for all metadata needs would be hubris and madness. Instead, this document tries to minimally constrain applications while illustrating "where to stick" that metadata, as there are so few layers and hiding places in the DASL system.
How to use a MASL document
The recommended structure for DASL metadata is to insert a dCBOR42 document "between" each CID and its resource(s), essentially using dCBOR42 as headers wrapping a CID-addressed resource. To do this, simply replace the CID of a resource with the CID of a dCBOR42 document that contains either a top-level (tag 42) property `src` (for a single resource), containing the CID of the resource it describes, or a top-level mapping `resources` (for a collection of resources), mapping paths to resource CIDs, each with optional dCBOR42 metadata ([cid], [dcbor42]).
It is preferable to namespace your own metadata standard by nesting it in a top-level object (named, for example, `my-cool-project-v1`) rather than relying on opaque version bits or magic numbers. There are a few reserved words at the top level, but conflicts with them can be avoided by nesting.
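As a rough illustration of the wrapping described above, the following Python sketch builds both document shapes as plain dictionaries. It writes CIDs as `{"$link": ...}` objects as a JSON stand-in for tag 42 values. The helper names (`link`, `wrap_single`, `wrap_bundle`) and the `reviewed` key are hypothetical and not part of MASL. Encoding the result as dCBOR42 and computing its CID are not shown.

```python
def link(cid: str) -> dict:
    """JSON stand-in for a tag 42 CID value."""
    return {"$link": cid}


def wrap_single(resource_cid: str, **metadata) -> dict:
    """One resource: a top-level src pointing at the resource being described."""
    return {"src": link(resource_cid), **metadata}


def wrap_bundle(entries: dict, **metadata) -> dict:
    """A collection: a top-level resources map from paths to per-resource metadata.

    entries maps each path to a (cid, metadata dict) pair.
    """
    resources = {
        path: {"src": link(cid), **meta} for path, (cid, meta) in entries.items()
    }
    return {"resources": resources, **metadata}


# Project-specific metadata is nested under its own namespaced key.
doc = wrap_single(
    "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4",
    mediaType="application/pdf",
    **{"my-cool-project-v1": {"reviewed": True}},
)
```

The CID of the encoded `doc` would then be handed out in place of the CID of the PDF itself.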
There are many metadata standards that can be embedded in this way to facilitate the preservation of metadata at ingress as well as translatability. For example, the IPFS-based storage system Storacha has a robust CID-based metadata system called content credentials which includes UCAN-based permissioning and CID "equivalence," i.e. one or more IPFS CIDs equivalent to a given DASL CID.
Using MASL with CAR
CAR files ([car]) have a space reserved for metadata in their header. A MASL metadata document, particularly the variant using a `resources` map, is well suited to be used there. The `resources` field can be used to map paths to the CIDs of resources contained in CAR blocks.
In order to work within CAR files, the `version` and `roots` fields must be set to integer `1` and a possibly-empty array of tag 42 CIDs respectively. Neither of these fields has any meaning in MASL, but they must be provided for historical compatibility reasons.
Fields
MASL is designed to host arbitrary metadata, but for interoperability purposes a number of root fields have predetermined meanings. Authors are invited to add their own metadata by creating namespaced objects at the top level.
NOTE: In examples below, whenever we represent a CID as JSON for, say, field `src`, we use `"src": { "$link": "CID value…" }` as a convention.
Single or Multiple Resources
MASL documents are primarily used to wrap around other resources for which they provide metadata. This can happen in one of two modes:
- Single Mode (using `src`): the metadata is only for one resource, which is the one that can be retrieved from the CID pointed to by `src`. HTTP metadata, if specified, goes at the root. App manifest metadata on a single resource can be used if that resource is a fully standalone document (e.g. a PDF).
- Bundle Mode (using `resources`): the metadata is used to describe a whole set of resources. These resources SHOULD be related to one another in some way (e.g. components that go into building an app or document). The keys of the `resources` map are complete paths that MUST start with `/` and the values are metadata objects that MUST have a `src` field pointing to the resource's CID and SHOULD have a `mediaType` field giving its MIME type, along with other HTTP headers.
Note that if both `src` and `resources` are specified, then `src` MUST be ignored.
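The mode selection and the MUST-level constraints on bundle entries can be sketched in a few lines of Python. This is only an illustration over the JSON-style representation used in this document. The function names `masl_mode` and `bundle_errors` are hypothetical, and a missing `mediaType` is deliberately not reported since it is only a SHOULD.

```python
from typing import List, Optional


def masl_mode(doc: dict) -> Optional[str]:
    """Return "bundle", "single", or None when neither field is present."""
    if "resources" in doc:
        # resources wins: any src at the root is ignored.
        return "bundle"
    if "src" in doc:
        return "single"
    return None


def bundle_errors(doc: dict) -> List[str]:
    """List violations of the MUST-level rules for entries of the resources map."""
    errors = []
    for path, entry in doc.get("resources", {}).items():
        if not path.startswith("/"):
            errors.append(f"{path!r}: path MUST start with '/'")
        if "src" not in entry:
            errors.append(f"{path!r}: entry MUST have a src field")
    return errors
```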
Bundle Mode has some specific processing rules:
- The entry with path `/` is the default path that is loaded if the bundle itself is being rendered. Implementations MUST recognise only this entry as the default and MUST NOT automatically pick some other entry (e.g. `/index.html`).
- When loading a bundle into a web context, the root of the bundle is given an opaque origin, and all internal links are resolved relative to that.
- There is no notion of directory. If a resource is indicated as sitting at `/cats/reds/kitsune.jpg` this does not entail that `/cats/` or `/cats/reds/` somehow exist. As in web contexts, it is the full path that is matched, not `/`-separated subsets. URLs do not map to file systems.
- When resolving a URL inside a bundle, implementations MUST only make use of the URL's pathname and MUST ignore the query string. (Note that this departs from typical URL processing but makes it easier to pass parameters between resources internally.)
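The path-matching rules above can be illustrated with a short Python sketch. It assumes the URL has already been resolved to an absolute path within the bundle (relative resolution and percent-encoding normalisation are not handled), and the function name `resolve_in_bundle` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_in_bundle(doc: dict, url: str) -> Optional[dict]:
    """Return the resources entry whose key equals the URL's pathname, else None.

    Only the pathname is used: the query string is dropped, there is no
    directory lookup, and nothing falls back to an index-like entry.
    """
    path = urlsplit(url).path
    return doc.get("resources", {}).get(path)


bundle = {
    "resources": {
        "/": {"src": {"$link": "bafk…"}, "mediaType": "text/html"},
        "/cats/reds/kitsune.jpg": {"src": {"$link": "bafk…"}, "mediaType": "image/jpeg"},
    }
}

# The query string is ignored, so this finds the image entry.
entry = resolve_in_bundle(bundle, "/cats/reds/kitsune.jpg?size=large")

# "/cats/" is not a key of the map, so this is None.
missing = resolve_in_bundle(bundle, "/cats/")
```

Using an exact dictionary lookup keeps the "full path, not `/`-separated subsets" rule impossible to get wrong.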
There is no requirement in MASL that bundles be stored in a specific manner. The relevant CIDs may be loaded through whatever means the implementation supports, such as RASL ([rasl]), or may be provided in a CAR file ([car]). Note that one advantage of this approach over bundling with, for instance, Zip archives is that the resource map may contain an arbitrarily large number of resources, which need not be loaded for the bundle to work other than on demand.
Example with `src`:
{
  "src": { "$link": "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4" },
  "mediaType": "application/pdf"
}
Example with `resources`:
{
  "name": "A Simple Page With Pic",
  "resources": {
    "/": {
      "src": { "$link": "bafk…" },
      "mediaType": "text/html"
    },
    "/picture.jpg": {
      "src": { "$link": "bafk…" },
      "mediaType": "image/jpeg"
    }
  }
}
HTTP Headers
MASL supports a subset of HTTP response headers that are meaningful in decentralised contexts. This doesn't preclude headers not listed here from being used, but implementations that support using HTTP headers SHOULD NOT reflect the value of arbitrary HTTP headers without considering the potential attack surface they create.
When using HTTP headers as MASL metadata, there are two modes. If the MASL document contains a root `resources` field then it is a MASL document for multiple resources and the HTTP headers are only meaningful if they are set on values of the `resources` map (and MUST be ignored if set on the root object). Conversely, if this MASL document contains a `src` field (and no `resources`) then the HTTP headers MUST be set on the root and are ignored elsewhere. If neither `src` nor `resources` is specified, the meaning of HTTP fields is undefined.
All HTTP header names, where specified, are lowercased, except for `mediaType`, which for historical reasons stands in for `content-type`.
Supported headers:
- `content-disposition`
- `content-encoding`
- `content-language`
- `content-security-policy`: keep in mind however that runtime contexts are likely to already have a strict CSP that will override or constrain this one.
- `mediaType`: used instead of `content-type`.
- `link`
- `permissions-policy`
- `referrer-policy`
- `service-worker-allowed`
- `sourcemap`: this must point to another resource in the `resources` map. Implementations SHOULD verify that this is the case as source maps could otherwise be used to exfiltrate information.
- `speculation-rules`: this must point to another resource in the `resources` map.
- `supports-loading-mode`
- `x-content-type-options`
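To make the placement rules concrete, here is a minimal Python sketch that extracts response headers from a MASL document in either mode. The function name `http_headers` is hypothetical. Restricting the output to the listed headers is a conservative choice made for this sketch, not a requirement, and because the meaning of HTTP fields is undefined when neither `src` nor `resources` is present, the sketch simply returns nothing in that case.

```python
from typing import Optional

SUPPORTED_HEADERS = (
    "content-disposition", "content-encoding", "content-language",
    "content-security-policy", "link", "permissions-policy",
    "referrer-policy", "service-worker-allowed", "sourcemap",
    "speculation-rules", "supports-loading-mode", "x-content-type-options",
)


def http_headers(doc: dict, path: Optional[str] = None) -> dict:
    """Collect HTTP headers for a single resource or for one bundle entry."""
    if "resources" in doc:
        # Bundle: only the entry for `path` counts; root-level headers are ignored.
        source = doc["resources"].get(path, {})
    elif "src" in doc:
        # Single resource: headers live at the root of the document.
        source = doc
    else:
        return {}

    headers = {}
    if "mediaType" in source:
        headers["content-type"] = source["mediaType"]
    for name in SUPPORTED_HEADERS:
        if name in source:
            headers[name] = source[name]
    return headers
```

Note that `sourcemap` and `speculation-rules` values are paths into the `resources` map, so a consumer would still need to resolve them inside the bundle before use.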
Example with `src`:
{
  "src": { "$link": "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4" },
  "mediaType": "text/html",
  "content-language": "en",
  "service-worker-allowed": "/"
}
Example with `resources`:
{
  "name": "My Doc",
  "resources": {
    "/": {
      "src": { "$link": "bafk…" },
      "mediaType": "text/html",
      "content-encoding": "gzip",
      "content-language": "fr"
    },
    "/interactive.js": {
      "src": { "$link": "bafk…" },
      "mediaType": "application/javascript",
      "sourcemap": "/interactive.js.map"
    },
    "/interactive.js.map": {
      "src": { "$link": "bafk…" },
      "mediaType": "application/json"
    },
    "/picture.jpg": {
      "src": { "$link": "bafk…" },
      "mediaType": "image/jpeg"
    }
  }
}
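The requirement that `sourcemap` and `speculation-rules` point at another entry of the `resources` map lends itself to a simple check. The Python sketch below is one possible way to perform it. The function name `dangling_header_refs` and the tuple layout of its result are assumptions made for illustration.

```python
from typing import List, Tuple


def dangling_header_refs(doc: dict) -> List[Tuple[str, str, str]]:
    """Return (path, field, target) for each reference that is not in resources."""
    resources = doc.get("resources", {})
    problems = []
    for path, entry in resources.items():
        for field in ("sourcemap", "speculation-rules"):
            target = entry.get(field)
            if target is not None and target not in resources:
                problems.append((path, field, target))
    return problems


bundle = {
    "resources": {
        "/interactive.js": {
            "src": {"$link": "bafk…"},
            "mediaType": "application/javascript",
            "sourcemap": "/interactive.js.map",
        },
        "/interactive.js.map": {
            "src": {"$link": "bafk…"},
            "mediaType": "application/json",
        },
    }
}

# Every reference resolves inside the bundle, so nothing is reported.
problems = dangling_header_refs(bundle)
```

An implementation might refuse to load a source map flagged by such a check rather than fetch it from elsewhere, since that is where the exfiltration risk lies.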
App Manifest
One useful pattern with MASL is to describe an entire app or document, with all of its resources available for content addressing, possibly within a common CAR ([car]). Such docs or apps should use Web App Manifest metadata ([manifest]) as it is widely understood.
The following manifest fields are guaranteed to be usable: `background_color`, `categories`, `description`, `icons`, `id`, `name`, `screenshots`, `short_name`, and `theme_color`.
Note: other manifest fields MAY be used, but their behaviour is not guaranteed in the kind of contexts that MASL is used in.
For both `icons` and `screenshots`, the `src` field MUST be a path that matches an entry in the `resources` map, and the `type` field that is normally accepted in manifests there MUST NOT be used and MUST be ignored if specified. Media type information for that resource is specified on the resource entry that `src` maps to.
Example:
{
  "name": "Unicorn Editor",
  "short_name": "Unicorn",
  "description": "This is simply the best app to edit unicorns with.",
  "background_color": "#00ff75",
  "icons": [{ "src": "/unicorn.svg" }],
  "resources": {
    "/": {
      "src": { "$link": "bafk…" },
      "mediaType": "text/html"
    },
    "/unicorn.svg": {
      "src": { "$link": "bafk…" },
      "mediaType": "image/svg+xml"
    }
  }
}
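The rules for `icons` and `screenshots` can be sketched as follows in Python. Both function names are hypothetical. The second function shows the media type being taken from the resource entry, with any `type` on the image object ignored.

```python
from typing import List, Optional


def manifest_image_errors(doc: dict) -> List[str]:
    """Report icons/screenshots whose src does not match an entry in resources."""
    resources = doc.get("resources", {})
    errors = []
    for field in ("icons", "screenshots"):
        for image in doc.get(field, []):
            src = image.get("src")
            if src not in resources:
                errors.append(f"{field}: {src!r} is not a path in resources")
    return errors


def image_media_type(doc: dict, image: dict) -> Optional[str]:
    """Media type of an icon or screenshot, read from the resource entry only."""
    entry = doc.get("resources", {}).get(image.get("src"))
    return entry.get("mediaType") if entry else None


app = {
    "name": "Unicorn Editor",
    "icons": [{"src": "/unicorn.svg", "type": "image/png"}],  # type is ignored
    "resources": {
        "/unicorn.svg": {"src": {"$link": "bafk…"}, "mediaType": "image/svg+xml"},
    },
}

media_type = image_media_type(app, app["icons"][0])  # "image/svg+xml"
```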
CAR Compatibility
As indicated in the CAR specification ([car]), the metadata object in the CAR header must contain a `version` field set to integer `1` and a `roots` field set to an array (that may be empty) of tag 42 CIDs. These fields have no meaning for MASL, but are expected to be set when MASL is used for CAR metadata for historical compatibility. Note that using versions in this way is an antipattern, and we expect the value never to change.
Example:
{ "name": "Get in the CAR if you want to live", "version": 1, "roots": [] }
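Preparing a MASL document for use as a CAR header amounts to setting the two compatibility fields. A minimal Python sketch, assuming the JSON-style `{"$link": ...}` representation of CIDs and a hypothetical helper name `car_header`, might look like this. Serialising the header and writing CAR blocks are out of scope here.

```python
from typing import Iterable


def car_header(masl: dict, roots: Iterable[str] = ()) -> dict:
    """Return a copy of a MASL document carrying the fields a CAR header needs."""
    header = dict(masl)  # shallow copy so the caller's document is untouched
    header["version"] = 1  # always integer 1
    header["roots"] = [{"$link": cid} for cid in roots]  # may be empty
    return header


header = car_header({"name": "Get in the CAR if you want to live"})
# {"name": "Get in the CAR if you want to live", "version": 1, "roots": []}
```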
AT Compatibility
When used with the AT Protocol ([at]), it is common that objects will need to feature a `$type` field. If present, it MUST be a string and SHOULD be set to the value `ing.dasl.masl`.
Versioning
When manipulating DAGs, it can be useful to keep track of history by referencing earlier versions of the same data or metadata. This can be done using the `prev` field, which if present MUST be a tag 42 CID pointing to a previous MASL document.
Example:
{ "name": "Unicorn Editor", "prev": { "$link": "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4" } }
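Following `prev` links yields the history of a document, newest first. The Python sketch below assumes a caller-supplied `load` function that returns the decoded MASL document for a CID (or `None` if it cannot be found). That loader, the function name `history`, and the placeholder CIDs in the usage lines are all hypothetical.

```python
from typing import Callable, Iterator, Optional


def history(doc: dict, load: Callable[[str], Optional[dict]]) -> Iterator[dict]:
    """Yield doc, then each earlier version reachable through prev links."""
    seen = set()
    current: Optional[dict] = doc
    while current is not None:
        yield current
        prev = current.get("prev")
        if prev is None:
            return
        cid = prev["$link"]
        if cid in seen:
            # Defensive guard: a faulty loader could otherwise loop forever.
            return
        seen.add(cid)
        current = load(cid)


# An in-memory store keyed by placeholder CIDs stands in for real retrieval.
store = {
    "cid-v1": {"name": "Unicorn Editor (draft)"},
    "cid-v2": {"name": "Unicorn Editor (beta)", "prev": {"$link": "cid-v1"}},
}
latest = {"name": "Unicorn Editor", "prev": {"$link": "cid-v2"}}
names = [version["name"] for version in history(latest, store.get)]
```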
Lexicon
Making a precise lexicon ([lexicon]) for MASL is impossible because lexicons lack a way of constraining objects with arbitrary keys. However, the following may still prove useful when MASL is integrated with the AT Protocol ([at]).
{
  "lexicon": 1,
  "id": "ing.dasl.masl",
  "defs": {
    "main": {
      "type": "object",
      "properties": {
        "src": { "type": "string", "format": "cid" },
        "resources": { "type": "object" },
        // HTTP
        "mediaType": { "type": "string" },
        "content-disposition": { "type": "string" },
        "content-encoding": { "type": "string" },
        "content-language": { "type": "string" },
        "content-security-policy": { "type": "string" },
        "link": { "type": "string" },
        "permissions-policy": { "type": "string" },
        "referrer-policy": { "type": "string" },
        "service-worker-allowed": { "type": "string" },
        "sourcemap": { "type": "string" },
        "speculation-rules": { "type": "string" },
        "supports-loading-mode": { "type": "string" },
        "x-content-type-options": { "type": "string" },
        // Manifest
        "background_color": { "type": "string" },
        "categories": { "type": "array", "items": { "type": "string" } },
        "description": { "type": "string" },
        "icons": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "src": { "type": "string" },
              "sizes": { "type": "string" },
              "purpose": { "type": "string" }
            },
            "required": ["src"]
          }
        },
        "id": { "type": "string" },
        "name": { "type": "string" },
        "screenshots": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "src": { "type": "string" },
              "sizes": { "type": "string" },
              "label": { "type": "string" },
              "form_factor": { "type": "string", "enum": ["narrow", "wide"] },
              "platform": { "type": "string", "enum": ["android", "chromeos", "ios", "ipados", "kaios", "macos", "windows", "xbox", "chrome_web_store", "itunes", "microsoft", "microsoft", "play"] }
            },
            "required": ["src"]
          }
        },
        "short_name": { "type": "string" },
        "theme_color": { "type": "string" },
        // CAR compatibility
        "version": { "type": "integer", "const": 1 },
        "roots": { "type": "array", "items": { "type": "string", "format": "cid" } },
        // AT (specifying this might not be AT compatible)
        "$type": { "type": "string" },
        // versioning
        "prev": { "type": "string", "format": "cid" }
      }
    }
  }
}
References
- [at] AT Protocol.
- [car] Robin Berjon & Juan Caballero. Content-Addressable aRchives (CAR). 2025-05-05. URL: https://dasl.ing/car.html
- [cid] Robin Berjon & Juan Caballero. Content IDs (CIDs). 2025-05-05. URL: https://dasl.ing/cid.html
- [dcbor42] Robin Berjon & Juan Caballero. Deterministic CBOR with tag 42 (dCBOR42). 2025-05-05. URL: https://dasl.ing/dcbor42.html
- [lexicon] AT Protocol: Lexicon.
- [manifest] M. Cáceres, K. Rohde Christiansen, D. González, D. Murphy, C. Liebel. Web Application Manifest. March 2025. URL: https://www.w3.org/TR/appmanifest/
- [rasl] Robin Berjon & Juan Caballero. RASL — Retrieval of Arbitrary Structures & Links. 2025-05-05. URL: https://dasl.ing/rasl.html