MASL — Metadata for Arbitrary Structures & Links

date: 2025-05-05
editors: Robin Berjon <robin@berjon.com>
Juan Caballero <bumblefudge@learningproof.xyz>
abstract

MASL is a CBOR-based metadata system that is designed to work well with content-addressed and decentralised systems, to enable fully self-contained, self-certified content distribution.

Introduction

Anywhere you have resources that will be deployed in real-world systems, the potential metadata needs of those systems are effectively unbounded. This is particularly true of decentralised systems that need to exhibit "web-like" behaviour: to have reproducible and safe execution when content comes from arbitrary sources, such systems need an equivalent to HTTP headers that is as verifiable as the content itself, rather than the lossy behaviour that results from treating the web as a file system (which it isn't).

Designing or constraining a syntax for all metadata needs would be hubris and madness. Instead, this document tries to minimally constrain applications while illustrating "where to stick" that metadata, as there are so few layers and hiding places in the DASL system.

How to use a MASL document

The recommended structure for DASL metadata is to insert a dCBOR42 document "between" each CID and its resource(s), essentially using dCBOR42 as headers wrapping a CID-addressed resource. To do this, simply replace the CID of a resource with the CID of a dCBOR42 document that contains either a top-level (tag 42) property src (for a single resource), containing the CID of the resource it describes, or a top-level mapping resources (for a collection of resources), mapping paths to resource CIDs, each with optional dCBOR42 metadata ([cid], [dcbor42]).
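As a rough illustration of the wrapping step for a single resource, the sketch below computes a CID for some raw bytes and builds the MASL document that would stand in for it. It assumes the CID layout described in [cid] (version 1, raw codec 0x55, SHA-256, base32 lowercase with a "b" prefix). The helper names raw_cid and wrap_resource are hypothetical, and the document is shown as a Python dict using the JSON $link convention from this document's examples rather than as encoded dCBOR42.

```python
import base64
import hashlib


def raw_cid(data: bytes) -> str:
    """Return a CID string for raw bytes (assumed layout, see the lead-in)."""
    digest = hashlib.sha256(data).digest()
    # 0x01 = CID version 1, 0x55 = raw codec, 0x12 = SHA-256, 0x20 = 32-byte digest
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # Multibase base32: lowercase, unpadded, prefixed with "b"
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def wrap_resource(data: bytes, media_type: str) -> dict:
    """Build a single-resource MASL document pointing at the resource's CID."""
    return {"src": {"$link": raw_cid(data)}, "mediaType": media_type}


doc = wrap_resource(b"%PDF-1.7 example bytes", "application/pdf")
```

The CID of the encoded MASL document, not the CID of the PDF itself, is then what gets shared in place of the original resource CID.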

It is preferable to nest any metadata in a top-level object to namespace your own metadata standard (inside an object named, for example, `my-cool-project-v1`) rather than relying on opaque version bits or magic numbers. There are a few reserved words at the top level, but conflicts with these can be avoided by nesting.

There are many metadata standards that can be embedded in this way to facilitate the preservation of metadata at ingress as well as translatability. For example, the IPFS-based storage system Storacha has a robust CID-based metadata system called content credentials which includes UCAN-based permissioning and CID "equivalence," i.e. one or more IPFS CIDs equivalent to a given DASL CID.

Using MASL with CAR

CAR files ([car]) have a space reserved for metadata in their header. A MASL metadata document, particularly the variant using a resources map, is well suited to be used there. The resources field can be used to map paths to the CIDs of resources contained in CAR blocks.

In order to work within CAR files, the version and roots fields must be set to integer 1 and a possibly-empty array of tag 42 CIDs respectively. Neither of these fields has any meaning in MASL, but they must be provided for historical compatibility reasons.

Fields

MASL is designed to host arbitrary metadata but for interoperability purposes a number of root fields have predetermined values. Authors are invited to add their own metadata by creating namespaced objects at the top level.

NOTE: In examples below, whenever we represent a CID as JSON for, say, field src, we use "src": { "$link": "CID value…" } as a convention.

Single or Multiple Resources

MASL documents are primarily used to wrap around other resources for which they provide metadata. This can happen in one of two modes:

  • Single Mode (using src): the metadata is only for one resource, which is the one that can be retrieved from the CID pointed to by src. HTTP metadata, if specified, goes at the root. App manifest metadata on a single resource can be used if that resource is a fully standalone document (e.g. a PDF).
  • Bundle Mode (using resources): the metadata is used to describe a whole set of resources. These resources SHOULD be related to one another in some way (e.g. components that go into building an app or document). The keys of the resources map are complete paths that MUST start with / and the values are metadata objects that MUST have an src field pointing to the resource's CID and SHOULD have a mediaType field giving its MIME type, along with other HTTP headers.

Note that if both src and resources are specified, then src MUST be ignored.
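The mode selection above, including the rule that resources wins over src, might be sketched as follows. The function name masl_mode and its return values are hypothetical labels chosen for illustration; the document is assumed to be already decoded into a Python dict.

```python
from typing import Optional


def masl_mode(doc: dict) -> Optional[str]:
    """Classify a decoded MASL document as bundle, single, or neither."""
    # resources takes precedence: if both are present, src is ignored.
    if "resources" in doc:
        return "bundle"
    if "src" in doc:
        return "single"
    # Neither field is present: the document wraps no resource.
    return None
```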

Bundle Mode has some specific processing rules:

  • The entry with path / is the default path that is loaded if the bundle itself is being rendered. Implementations MUST only recognise this as the default and MUST NOT automatically decide to pick a given entry (e.g. /index.html).
  • When loading a bundle into a web context, the root of the bundle is given an opaque origin, and all internal links are resolved relative to that.
  • There is no notion of directory. If a resource is indicated as sitting at /cats/reds/kitsune.jpg this does not entail that /cats/ or /cats/reds/ somehow exist. As in web contexts, it is the full path that is matched, not /-separated subsets. URLs do not map to file systems.
  • When resolving a URL inside a bundle, implementations MUST only make use of the URL's pathname and MUST ignore the query string. (Note that this departs from typical URL processing but makes it easier to pass parameters between resources internally.)
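The lookup rules in the list above can be sketched as a small resolver. This is a minimal illustration, assuming the bundle is a decoded Python dict; the name resolve_in_bundle is hypothetical, and details this document leaves open (such as percent-decoding of paths) are not handled.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_in_bundle(doc: dict, url: str = "/") -> Optional[dict]:
    """Return the resource entry for a URL inside a bundle, or None."""
    # Only the pathname is used; the query string is ignored.
    path = urlsplit(url).path
    # Exact match on the full path: no directories, no /index.html fallback.
    return doc.get("resources", {}).get(path)


bundle = {
    "resources": {
        "/": {"src": {"$link": "bafk…"}, "mediaType": "text/html"},
        "/cats/reds/kitsune.jpg": {"src": {"$link": "bafk…"}, "mediaType": "image/jpeg"},
    }
}
default_entry = resolve_in_bundle(bundle)
picture = resolve_in_bundle(bundle, "/cats/reds/kitsune.jpg?size=large")
```

Calling resolve_in_bundle(bundle, "/cats/") returns None here, since only the full path of an entry is ever matched.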

There is no requirement in MASL that bundles have to be stored in a specific manner. The relevant CIDs may be loaded by whatever means the implementation supports, such as RASL ([rasl]), or may be provided in a CAR file ([car]). Note that one advantage of this approach over bundling with, for instance, Zip archives is that the resource map may contain an arbitrarily large number of resources, none of which needs to be loaded until it is requested.

Specify a scheme and fetch rules properly.

Example with src:

{
  "src": { "$link": "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4" },
  "mediaType": "application/pdf"
}
        

Example with resources:

{
  "name": "A Simple Page With Pic",
  "resources": {
    "/": {
      "src": { "$link": "bafk…" },
      "mediaType": "text/html"
    },
    "/picture.jpg": {
      "src": { "$link": "bafk…" },
      "mediaType": "image/jpeg"
    }
  }
}
        

HTTP Headers

MASL supports a subset of HTTP response headers that are meaningful in decentralised contexts. This doesn't preclude headers not listed here from being used, but implementations that support using HTTP headers SHOULD NOT reflect the value of arbitrary HTTP headers without considering the potential attack surface they create.

When using HTTP headers as MASL metadata, there are two modes. If the MASL document contains a root resources field then it is a MASL document for multiple resources and the HTTP headers are only meaningful if they are set on values of the resources map (and MUST be ignored if set on the root object). Conversely, if this MASL document contains a src field (and no resources) then the HTTP headers MUST be set on the root and ignored otherwise. If neither src nor resources are specified, the meaning of HTTP fields is undefined.

All HTTP headers, where specified, are lowercased except for mediaType which for historical reasons maps to content-type.

Supported headers:

  • content-disposition
  • content-encoding
  • content-language
  • content-security-policy: keep in mind however that runtime contexts are likely to already have a strict CSP that will override or constrain this one.
  • mediaType: used instead of content-type.
  • link
  • permissions-policy
  • referrer-policy
  • service-worker-allowed
  • sourcemap: this must point to another resource in the resources map. Implementations SHOULD verify that this is the case as source maps could otherwise be used to exfiltrate information.
  • speculation-rules: this must point to another resource in the resources map.
  • supports-loading-mode
  • x-content-type-options
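To make the two header modes concrete, here is a hedged sketch of how an implementation might derive response headers from a decoded MASL document. The name http_headers is hypothetical; the set of header names is copied from the list above, and the sketch simply drops any other keys rather than reflecting them.

```python
from typing import Optional

SUPPORTED_HEADERS = {
    "content-disposition", "content-encoding", "content-language",
    "content-security-policy", "link", "permissions-policy",
    "referrer-policy", "service-worker-allowed", "sourcemap",
    "speculation-rules", "supports-loading-mode", "x-content-type-options",
}


def http_headers(doc: dict, path: Optional[str] = None) -> dict:
    """Collect HTTP headers for a single resource or for one bundle entry."""
    if "resources" in doc:
        # Bundle mode: headers live on the entry; root-level ones are ignored.
        source = doc["resources"].get(path, {})
    elif "src" in doc:
        # Single mode: headers live on the root object.
        source = doc
    else:
        # Meaning is undefined without src or resources; return nothing.
        return {}
    headers = {}
    for key, value in source.items():
        if key == "mediaType":
            headers["content-type"] = value  # mediaType maps to content-type
        elif key in SUPPORTED_HEADERS:
            headers[key] = value
    return headers
```

For a bundle, http_headers(doc, "/") reads only the entry at "/", so a content-language set on the root of that bundle has no effect.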

Example with src:

{
  "src": { "$link": "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4" },
  "mediaType": "text/html",
  "content-language": "en",
  "service-worker-allowed": "/"
}
        

Example with resources:

{
  "name": "My Doc",
  "resources": {
    "/": {
      "src": { "$link": "bafk…" },
      "mediaType": "text/html",
      "content-encoding": "gzip",
      "content-language": "fr"
    },
    "/interactive.js": {
      "src": { "$link": "bafk…" },
      "mediaType": "application/javascript",
      "sourcemap": "/interactive.js.map"
    },
    "/interactive.js.map": {
      "src": { "$link": "bafk…" },
      "mediaType": "application/json"
    },
    "/picture.jpg": {
      "src": { "$link": "bafk…" },
      "mediaType": "image/jpeg"
    }
  }
}
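The requirement that sourcemap and speculation-rules point to another entry in the resources map lends itself to a simple check. The following is only a sketch: the function name dangling_references is hypothetical, and it reads "another resource" as meaning a path that is present in the map and differs from the entry's own path.

```python
def dangling_references(doc: dict) -> list:
    """List (path, field, target) triples whose target is not a valid entry."""
    resources = doc.get("resources", {})
    problems = []
    for path, entry in resources.items():
        for field in ("sourcemap", "speculation-rules"):
            target = entry.get(field)
            if target is None:
                continue
            # The target must be a different path that exists in the map.
            if target == path or target not in resources:
                problems.append((path, field, target))
    return problems
```

An implementation might refuse to load, or simply ignore, any source map reported by such a check, since an external target could be used to exfiltrate information.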
        

App Manifest

One useful pattern with MASL is to describe an entire app or document, with all of its resources available for content addressing, possibly within a common CAR ([car]). Such docs or apps should use Web App Manifest metadata ([manifest]) as it is widely understood.

The following manifest fields are guaranteed to be usable: background_color, categories, description, icons, id, name, screenshots, short_name, and theme_color. Note: other manifest fields MAY be used, but their behaviour is not guaranteed in the kind of contexts that MASL is used in.

For both icons and screenshots, the src field MUST be a path that matches an entry in the resources map, and the type field that is normally accepted in manifests there MUST NOT be used and MUST be ignored if specified. Media type information for that resource is specified on the resource entry that src maps to.

Example:

{
  "name": "Unicorn Editor",
  "short_name": "Unicorn",
  "description": "This is simply the best app to edit unicorns with.",
  "background_color": "#00ff75",
  "icons": [{ "src": "/unicorn.svg" }],
  "resources": {
    "/": {
      "src": { "$link": "bafk…" },
      "mediaType": "text/html"
    },
    "/unicorn.svg": {
      "src": { "$link": "bafk…" },
      "mediaType": "image/svg"
    }
  }
}
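The rule for icons and screenshots (src must match a resources entry, and the media type comes from that entry rather than from a type field) could be applied along these lines. The function name manifest_images and the shape of its return value are hypothetical, chosen only for this sketch.

```python
def manifest_images(doc: dict) -> list:
    """Resolve icons and screenshots against the resources map."""
    resources = doc.get("resources", {})
    resolved = []
    for field in ("icons", "screenshots"):
        for image in doc.get(field, []):
            path = image.get("src")
            entry = resources.get(path)
            if entry is None:
                raise ValueError(f"{field} src {path!r} is not in the resources map")
            # Any "type" on the image is ignored; the entry's mediaType is used.
            resolved.append(
                {"field": field, "path": path, "mediaType": entry.get("mediaType")}
            )
    return resolved


app = {
    "name": "Unicorn Editor",
    "icons": [{"src": "/unicorn.svg", "type": "image/png"}],
    "resources": {
        "/unicorn.svg": {"src": {"$link": "bafk…"}, "mediaType": "image/svg+xml"},
    },
}
images = manifest_images(app)
```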
        

CAR Compatibility

As indicated in the CAR specification ([car]), the metadata object in the CAR header must contain a version field set to integer 1 and a roots field set to an array (that may be empty) of tag 42 CIDs. These fields have no meaning for MASL, but are expected to be set when MASL is used for CAR metadata for historical compatibility. Note that using versions in this way is an antipattern, and we expect the value never to change.

Example:

{
  "name": "Get in the CAR if you want to live",
  "version": 1,
  "roots": []
}
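A minimal sketch of adding and checking the two CAR fields follows. Both function names are hypothetical, and CIDs in roots are represented with the JSON $link convention used in this document's examples rather than as real tag 42 values.

```python
def with_car_fields(doc: dict, roots=()) -> dict:
    """Return a copy of a MASL document usable as CAR header metadata."""
    header = dict(doc)
    header["version"] = 1          # always integer 1
    header["roots"] = list(roots)  # possibly empty array of CIDs
    return header


def has_car_fields(doc: dict) -> bool:
    """Check that version is integer 1 and roots is an array of CID links."""
    version = doc.get("version")
    roots = doc.get("roots")
    # type(...) is int rejects booleans, which compare equal to 1 in Python.
    if type(version) is not int or version != 1:
        return False
    if not isinstance(roots, list):
        return False
    return all(
        isinstance(root, dict)
        and set(root) == {"$link"}
        and isinstance(root["$link"], str)
        for root in roots
    )


header = with_car_fields({"name": "Get in the CAR if you want to live"})
```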
        

AT Compatibility

When used with the AT Protocol ([at]), it is common that objects will need to feature a $type field. If present, it MUST be a string and SHOULD be set to the value ing.dasl.masl.

Versioning

When manipulating DAGs, it can be useful to keep track of history by referencing earlier versions of the same data or metadata. This can be done using the prev field, which if present MUST be a tag 42 CID pointing to a previous MASL document.

Example:

{
  "name": "Unicorn Editor",
  "prev": { "$link": "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4" }
}
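Following prev links yields the history of a document, newest first. The sketch below assumes a hypothetical load callable that returns the decoded MASL document for a CID string; how documents are actually fetched and decoded is outside its scope, and the CID strings in the usage example are placeholders, not real CIDs.

```python
def history(cid: str, load):
    """Yield (cid, document) pairs from the given version back to the oldest."""
    seen = set()
    # The seen set is purely defensive, in case a loader returns bad data.
    while cid is not None and cid not in seen:
        seen.add(cid)
        doc = load(cid)
        yield cid, doc
        prev = doc.get("prev")
        cid = prev["$link"] if prev else None


# Placeholder store standing in for real content-addressed retrieval.
store = {
    "cid-v2": {"name": "Unicorn Editor", "prev": {"$link": "cid-v1"}},
    "cid-v1": {"name": "Unicorn Editor (draft)"},
}
versions = [cid for cid, _ in history("cid-v2", store.__getitem__)]
```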
        

Lexicon

Making a precise lexicon ([lexicon]) for MASL is impossible because lexicons lack a way of constraining objects with arbitrary keys. However, the following may still prove useful when MASL is integrated with the AT Protocol ([at]).

{
  "lexicon": 1,
  "id": "ing.dasl.masl",
  "defs": {
    "main": {
      "type": "object",
      "properties": {
        "src": {
          "type": "string",
          "format": "cid"
        },
        "resources": {
          "type": "object"
        },
        // HTTP
        "mediaType": { "type": "string" },
        "content-disposition": { "type": "string" },
        "content-encoding": { "type": "string" },
        "content-language": { "type": "string" },
        "content-security-policy": { "type": "string" },
        "link": { "type": "string" },
        "permissions-policy": { "type": "string" },
        "referrer-policy": { "type": "string" },
        "service-worker-allowed": { "type": "string" },
        "sourcemap": { "type": "string" },
        "speculation-rules": { "type": "string" },
        "supports-loading-mode": { "type": "string" },
        "x-content-type-options": { "type": "string" },
        // Manifest
        "background_color": {
          "type": "string"
        },
        "categories": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "description": {
          "type": "string"
        },
        "icons": {
          "type": "array",
          "items": {
            "type": "object",
            "properties":{
              "src": { "type": "string" },
              "sizes": { "type": "string" },
              "purpose": { "type": "string" }
            },
            "required": ["src"]
          }
        },
        "id": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "screenshots": {
          "type": "array",
          "items": {
            "type": "object",
            "properties":{
              "src": { "type": "string" },
              "sizes": { "type": "string" },
              "label": { "type": "string" },
              "form_factor": {
                "type": "string",
                "enum": ["narrow", "wide"]
              },
              "platform": {
                "type": "string",
                "enum": ["android", "chromeos", "ios", "ipados", "kaios", "macos", "windows", "xbox", "chrome_web_store", "itunes", "microsoft-inbox", "microsoft-store", "play"]
              }
            },
            "required": ["src"]
          }
        },
        "short_name": {
          "type": "string"
        },
        "theme_color": {
          "type": "string"
        },
        // CAR compatibility
        "version": {
          "type": "integer",
          "const": 1
        },
        "roots": {
          "type": "array",
          "items": {
            "type": "string",
            "format": "cid"
          }
        },
        // AT (specifying this might not be AT compatible)
        "$type": {
          "type": "string"
        },
        // versioning
        "prev": {
          "type": "string",
          "format": "cid"
        }
      }
    }
  }
}
      

References

[at]
AT Protocol.
[car]
Robin Berjon & Juan Caballero. Content-Addressable aRchives (CAR). 2025-05-05. URL: https://dasl.ing/car.html
[cid]
Robin Berjon & Juan Caballero. Content IDs (CIDs). 2025-05-05. URL: https://dasl.ing/cid.html
[dcbor42]
Robin Berjon & Juan Caballero. Deterministic CBOR with tag 42 (dCBOR42). 2025-05-05. URL: https://dasl.ing/dcbor42.html
[lexicon]
AT Protocol: Lexicon.
[manifest]
M. Cáceres, K. Rohde Christiansen, D. González, D. Murphy, C. Liebel. Web Application Manifest. March 2025. URL: https://www.w3.org/TR/appmanifest/
[rasl]
Robin Berjon & Juan Caballero. RASL — Retrieval of Arbitrary Structures & Links. 2025-05-05. URL: https://dasl.ing/rasl.html