Catalogs on the Filesystem

This document describes how Catalog data is laid out on a filesystem.

Catalogs on the Filesystem are a Projection

The first thing to note: Catalogs are a merkle tree, defined natively in IPLD.

We also have a default and well-known convention for projecting this merkle tree onto a filesystem as JSON files. However, this is only a convention. The essential form of a Catalog is still just a series of IPLD objects -- and that's the form that all hashes are defined upon.

The projection into a filesystem layout is based on some of the properties in the data, for convenience and ease of spelunking.

Some pieces of data may look redundant when observed on the filesystem. In particular, the CIDs which point to other parts of the Catalog data often seem to duplicate what the file paths already tell you. The reason is that the underlying merkle tree is linked purely by these CIDs, and the filesystem paths used in the projection are not actually a part of that merkle tree!

Filesystem Outline

Example catalog filesystem outline:

{catalogroot}/{moduleName}/_module.json
{catalogroot}/{moduleName}/_releases/{releasename}.json
{catalogroot}/{moduleName}/_mirrors.json
{catalogroot}/{moduleName}/_replays/{HASH}
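
As a rough illustration of the outline above, here is a minimal Python sketch that expands the path templates for one module. The function name `catalog_paths` and the example values are hypothetical and not part of any real tooling; only the path shapes come from the outline.

```python
from pathlib import PurePosixPath


def catalog_paths(catalog_root, module_name, release_name=None, replay_hash=None):
    """Expand the outline's path templates for a single module.

    Hypothetical helper for illustration only; it builds path strings
    and never touches the filesystem.
    """
    module_dir = PurePosixPath(catalog_root) / module_name
    paths = {
        "module": module_dir / "_module.json",
        "mirrors": module_dir / "_mirrors.json",
    }
    if release_name is not None:
        paths["release"] = module_dir / "_releases" / f"{release_name}.json"
    if replay_hash is not None:
        paths["replay"] = module_dir / "_replays" / replay_hash
    return {kind: str(path) for kind, path in paths.items()}


# Example with made-up values:
for kind, path in catalog_paths("/catalog", "example.org/foo", "v1.0").items():
    print(kind, path)
```

Note that these paths are only a way of locating files. As described earlier, they are not part of the merkle tree itself.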

Why like this?

Can you author this filesystem manually?

Kinda, but not really. You'll need some tool assistance to do so.

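If you write a release JSON file by hand, you'd need to invoke a small tool to compute its CID and add that CID to the _module.json. Computing a CID means hashing the encoded object and wrapping the digest in a specific binary format, which isn't something you can easily do by hand.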

Similarly, we should have a catalog linting tool that checks there aren't extra _releases/*.json files floating around that aren't actually referenced from the _module.json, and that does the same for replays and the other files.
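
A minimal sketch of the release check such a linter might perform is shown below. It is not an existing tool. The function name `find_unreferenced_releases` is hypothetical, and because the schema of _module.json is not described here, the set of referenced release names is passed in by the caller rather than parsed from that file.

```python
from pathlib import Path


def find_unreferenced_releases(module_dir, referenced_names):
    """List release names that have a file on disk but are not referenced.

    module_dir: path to {catalogroot}/{moduleName}
    referenced_names: release names the caller extracted from _module.json
                      (how to extract them is left out; schema is assumed unknown)
    """
    releases_dir = Path(module_dir) / "_releases"
    if not releases_dir.is_dir():
        return []
    # "v1.0.json" -> "v1.0": stem drops only the final ".json" suffix.
    on_disk = {p.stem for p in releases_dir.glob("*.json")}
    return sorted(on_disk - set(referenced_names))


# Example usage with a hypothetical module directory:
# extras = find_unreferenced_releases("/catalog/example.org/foo", {"v1.0"})
# for name in extras:
#     print(f"unreferenced release file: _releases/{name}.json")
```

The same set-difference approach would apply to the _replays directory, given the set of replay hashes that the releases reference.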