Overview

The Crums Time Chain is a witness microservice providing digital certificates designed to establish when an object was first seen. The object can be anything digital: plain text, a document, an image, a video clip, a signature, a row in a database. The objects are identified by their SHA-256 hash, and it is these hashes that are witnessed and tracked.
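
For example, here is a minimal sketch in Java of computing the SHA-256 hash you would submit for witnessing; the class name and command-line usage are only for illustration.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    public class HashFile {
        public static void main(String[] args) throws Exception {
            // Hash the file's bytes; only this 32-byte value would ever be sent to the service.
            byte[] bytes = Files.readAllBytes(Path.of(args[0]));
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(bytes);
            System.out.println(HexFormat.of().formatHex(hash));
        }
    }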

To see how the service works, it's useful to consider how (without using this service) you could leave digital evidence on the web proving to anyone in the future that a file existed today or earlier. A simple solution might be to tweet a cryptographic hash (e.g. SHA-256) of your file: tweets are timestamped by Twitter, so assuming you trust Twitter's inner plumbing (and that there's still no edit button), you've left a digital trail. A link to your tweet might then satisfy a court in the future that you were in possession, today, of that document describing your invention. Or you might drop the hash in a conversation on Stack Exchange (again, assuming it's not editable). Or, for good measure, in case Twitter goes out of business, you might embed that hash in a Bitcoin transaction on the blockchain.

Indeed, this is exactly the idea behind how the crums.io time chain works. It creates an immutable digital audit trail embedding information about which hashes it has seen. Because this digital trail is fairly compact, the service can record it and back it up at many locations.

How Does It Work?

Continuing with the do-it-yourself example above, how would you timestamp 10 files, not just one? You could of course tweet all 10 hashes. But let's say you don't want anyone to know just yet how many files you're timestamping. One approach might be to combine them into one file (say a zip file) and then tweet the hash of that. The downside is that to prove you had any one of those 10 documents at the time of the tweet, you now have to reveal all 10 (the whole zip file) that make up the tweeted hash.

A better DIY method than the zip file idea would be to compute the hash of each file, write those hashes in a separate file, and then tweet the hash of that file of hashes. You save this file of hashes (along with the documents you timestamped), and now you can prove any one of those 10 documents belongs to your tweeted hash without revealing the other 9. The way crums.io works is closer to this second approach.
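
As a rough sketch of that second approach (in Java; file names and output handling are illustrative only): hash each file, record the hex hashes one per line, and tweet the hash of that listing. To later tie one document to the tweeted hash, you reveal the document and the listing of hashes, but none of the other documents.

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    public class HashOfHashes {
        public static void main(String[] args) throws Exception {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            StringBuilder listing = new StringBuilder();
            // One hex hash per line: this "file of hashes" is what you keep.
            for (String file : args) {
                byte[] hash = sha256.digest(Files.readAllBytes(Path.of(file)));
                listing.append(HexFormat.of().formatHex(hash)).append('\n');
            }
            Files.writeString(Path.of("hashes.txt"), listing.toString());
            // The hash of the listing is the single value you tweet.
            byte[] root = sha256.digest(listing.toString().getBytes(StandardCharsets.UTF_8));
            System.out.println(HexFormat.of().formatHex(root));
        }
    }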

Merkle Trees

We use a well-known data structure called a Merkle tree to combine document hashes into a single hash. The idea is similar to the DIY way of writing all the hashes in a single file and then computing the hash of that file of hashes, but with some advantages. A Merkle tree structure looks something like this:

It's a pyramid of hashes. The service publishes one of these roughly every 5 minutes. At the base of each pyramid (Merkle tree) lie the hashes crums.io has seen over those minutes, roughly in the order the service saw them. The layer immediately above the base contains the hash of the concatenation of each pair of hashes in the base layer below, so there are half as many hashes in it. The same pairing rule is applied at successive layers until there is but one hash at the top of the pyramid. We've glossed over some details, but that's essentially how the Merkle tree is built.
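
A simplified sketch of that pairing rule follows. It is only meant to illustrate the idea and ignores details the real tree handles differently, for instance how an odd hash at the end of a layer is treated and how leaf and internal hashes are distinguished.

    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.List;

    public class MerkleRoot {

        /** Reduces a (non-empty) list of leaf hashes to a single root hash by repeated pairing. */
        static byte[] root(List<byte[]> leaves) throws Exception {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            List<byte[]> layer = new ArrayList<>(leaves);
            while (layer.size() > 1) {
                List<byte[]> parents = new ArrayList<>();
                for (int i = 0; i < layer.size(); i += 2) {
                    sha256.update(layer.get(i));
                    // If the layer has an odd count, pair the last hash with itself
                    // (one of the glossed-over details; the real tree does this differently).
                    sha256.update(layer.get(i + 1 < layer.size() ? i + 1 : i));
                    parents.add(sha256.digest());   // digest() also resets for the next pair
                }
                layer = parents;                    // roughly half as many hashes each round
            }
            return layer.get(0);
        }
    }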

The hashes that go into the tree are not actually the hashes the service witnesses; instead, every hash added is derived from both the witnessed hash and the time it was witnessed. We call the structure containing this hash/time combination a crum.
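
The exact encoding of a crum is defined by the service; purely as a conceptual sketch, the value entering the tree might be derived as below. The concatenation of hash and UTC time shown here is an assumption, not the service's actual format.

    import java.nio.ByteBuffer;
    import java.security.MessageDigest;

    /** Conceptual sketch only: a witnessed hash plus its UTC witness time. */
    record Crum(byte[] hash, long utcMillis) {

        /** The value that would enter the Merkle tree (the encoding here is assumed). */
        byte[] treeHash() throws Exception {
            ByteBuffer buf = ByteBuffer.allocate(hash.length + Long.BYTES);
            buf.put(hash).putLong(utcMillis);
            return MessageDigest.getInstance("SHA-256").digest(buf.array());
        }
    }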

One nice thing about Merkle trees is that you don't have to know what's in the whole tree in order to show that a particular hash belongs in it. In fact, all you need are a few intermediate hashes that establish a path to the root of the tree (the top of the pyramid in our parlance). A structure establishing this path is known as a Merkle proof.

In our application, we call this path of hashes from a crum to the root of the tree a crumtrail: it's for the user to keep, and it is their timestamp for the hash they dropped off to be witnessed. The size of this timestamp is not very many bytes no matter how big the tree. (It grows only with the number of digits in the count of hashes in the tree, that is, logarithmically.)
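
Verifying a crumtrail then amounts to recomputing hashes up that path and comparing the result with the published root. A simplified sketch, again glossing over the service's exact hashing and ordering rules:

    import java.security.MessageDigest;
    import java.util.Arrays;
    import java.util.List;

    public class MerkleProof {

        /** A sibling hash plus whether it sits to the left of the running hash. */
        record Sibling(byte[] hash, boolean left) {}

        /** Recomputes the root from a leaf and its path of siblings, then compares. */
        static boolean verify(byte[] leaf, List<Sibling> path, byte[] expectedRoot)
                throws Exception {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] running = leaf;
            for (Sibling s : path) {
                sha256.update(s.left() ? s.hash() : running);
                sha256.update(s.left() ? running : s.hash());
                running = sha256.digest();
            }
            return Arrays.equals(running, expectedRoot);
        }
    }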

Memory, Crumtrails, and Forgetfulness

The service doesn't remember every hash it's ever seen. (How could it?) No, it remembers at best maybe a few days back. As it publishes new Merkle trees it drops old ones. But it still saves (remembers) two things about each Merkle tree it's about to drop:

  1. The root hash (at the top) of the Merkle tree (and the time it was published).
  2. The last Merkle proof in the tree.

The service keeps these forever. The first, the saved root hash, identifies a Merkle tree the service has published. It provides one way to verify that a given crumtrail file was indeed generated by the service and belongs to the Merkle tree it published.

The second item, the last Merkle proof in the tree, is kept for bookkeeping. The very last hash (in the bottom layer) of any crums.io tree is special: it's the root of the previously published Merkle tree, and its Merkle proof threads the previous Merkle tree to this one. In this way, the service maintains an audit trail of everything it's done, without remembering much.
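
As an illustration of that bookkeeping (not the service's actual record format), the link can be checked by confirming that the last leaf of the new tree equals the previous tree's saved root and that the saved Merkle proof carries that leaf up to the new tree's root. The sketch below reuses the MerkleProof sketch from earlier.

    import java.util.Arrays;
    import java.util.List;

    public class ChainLink {

        /** Illustration only: ties tree n to tree n-1 using the items the service keeps. */
        static boolean linksToPreviousTree(
                byte[] previousRoot,                 // saved root of tree n-1
                byte[] lastLeaf,                     // last hash in tree n's bottom layer
                List<MerkleProof.Sibling> lastProof, // saved last Merkle proof of tree n
                byte[] currentRoot)                  // saved root of tree n
                throws Exception {
            return Arrays.equals(lastLeaf, previousRoot)
                && MerkleProof.verify(lastLeaf, lastProof, currentRoot);
        }
    }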

Secondary Evidence

From time to time the service sprinkles corroborating evidence across third-party online assets in a way that is meant to timestamp the service's state. This is done by publishing the latest Merkle proof linking the previous tree to the new one. A number of such third-party sites have been under consideration, including tombstoning on the Bitcoin blockchain. Procedures to validate a crumtrail against third-party evidence without appeal to crums.io are spelled out in the technical docs.

Workflow

Because new Merkle trees are only published every few minutes, the witness workflow is a two-step process.

When the service witnesses (receives) a hash it doesn't remember seeing, it first generates and returns a new crum. This crum is not the final product: it just contains the hash and the witness time (in UTC milliseconds).

Behind the scenes, the witnessing of the new (i.e. not recently seen) hash triggers its inclusion in an upcoming tree (the next tree or the one right after it). It takes a few minutes, but once the next tree is generated, the crumtrail for that hash becomes available. The user should download and save the hash's crumtrail promptly (within, say, the day). As noted above, the information for generating a verifiable crumtrail is not kept indefinitely.
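
The two-step workflow, as a client might drive it over HTTP, is sketched below. The endpoint paths, query parameter, and response handling are placeholders assumed for illustration; consult the API documentation for the real ones.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class WitnessWorkflow {
        public static void main(String[] args) throws Exception {
            String hexHash = args[0];          // SHA-256 of your object, in hex
            HttpClient client = HttpClient.newHttpClient();

            // Step 1: submit the hash; the response carries the crum (hash + UTC witness time).
            // NOTE: "/example/witness" is a placeholder path, not the documented endpoint.
            HttpResponse<String> crum = client.send(
                HttpRequest.newBuilder(
                    URI.create("https://crums.io/example/witness?hash=" + hexHash)).build(),
                HttpResponse.BodyHandlers.ofString());
            System.out.println("crum: " + crum.body());

            // Step 2: minutes later, once the next tree is published, fetch and save
            // the crumtrail. Again, the path below is only a placeholder.
            HttpResponse<String> trail = client.send(
                HttpRequest.newBuilder(
                    URI.create("https://crums.io/example/crumtrail?hash=" + hexHash)).build(),
                HttpResponse.BodyHandlers.ofString());
            System.out.println("crumtrail: " + trail.body());
        }
    }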

Packaging Cryptographic Evidence

A crumtrail can be summarized as a cryptographic packaging that evidences the time a hash was seen. But what about "the left-hand side of the equation"? How do we package the object itself, alongside the evidence that its hash was seen? Is the object always a file? Is the object's hash to be computed the same way you do it for a file (the hash of a stream of bytes)? Or is the object a data structure, with a structured hash computed from the hashes of its constituent parts?

Of course, the time chain itself leaves these questions open: you can design your hashes to be constructed by whatever rules best suit your needs. For example, if you only care about the content in certain PDF documents and not their formatting, then you might compute their hashes by first transforming each PDF into text, tokenizing that text into words, and then computing the hash only over those tokens. That would be an example of what I'm calling structured hashing.
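
Here is a sketch of such a structured hash, assuming the text has already been extracted from the PDF; splitting on whitespace is just one possible tokenization rule.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class TokenHash {

        /** Hashes only the word tokens, so formatting and whitespace changes don't matter. */
        static byte[] structuredHash(String extractedText) throws Exception {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (String token : extractedText.trim().split("\\s+")) {
                // Length-prefix each token so token boundaries stay unambiguous.
                sha256.update(Integer.toString(token.length()).getBytes(StandardCharsets.UTF_8));
                sha256.update((byte) ':');
                sha256.update(token.getBytes(StandardCharsets.UTF_8));
            }
            return sha256.digest();
        }
    }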

Presently, there's a ready-made, structured cryptographic packaging called a morsel for evidencing the historical state of any ledger, or ledger-like object. For its commitment scheme, instead of a Merkle tree, it uses a new data structure called a skip ledger. (Its principal advantage over a Merkle tree is that it also doubles as a cryptographic accumulator.) The Ledgers project hosts software to achieve this. Using the tools provided there, and an SQL query representing your ledger, you can monitor, track, and timestamp (using crumtrails) the ledger's contents as it grows; and, if you so choose, you can differentially report how many rows your ledger has, or what the value of any row or specific cell is, without revealing any more information.

These cryptographic proofs are distributed in .mrsl files, which are a kind of tarball of proofs from the same ledger: you can in fact merge multiple .mrsl files from the same ledger into one, even when the morsels capture the state of the ledger at different stages of its history. The mrsl tool is a bit like a "zip tool", but for .mrsl files: it displays, merges, and redacts information from these files. The latest version of the file format supports embedding a "custom report template": if a morsel includes one, a user can generate a PDF from it using mrsl. (The PDF will soon be "self-validating".)
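
To give a feel for the skip ledger idea only (the Ledgers project defines its own, more careful rules), here is a conceptual sketch in which each appended row's hash commits to the hashes of earlier rows at power-of-two distances back, so that a short hash path can tie any old row to the latest one.

    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.List;

    /** Conceptual sketch of a skip-ledger-style commitment (not the project's actual rules). */
    public class SkipLedgerSketch {

        private final List<byte[]> rowHashes = new ArrayList<>();

        /** Appends a row given the hash of its source data; returns the new row's hash. */
        byte[] append(byte[] inputHash) throws Exception {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(inputHash);
            int n = rowHashes.size();                   // zero-based index of the new row
            for (int back = 1; back <= n; back *= 2)    // earlier rows at n-1, n-2, n-4, ...
                sha256.update(rowHashes.get(n - back));
            byte[] rowHash = sha256.digest();
            rowHashes.add(rowHash);
            return rowHash;
        }
    }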

© 2020-2022 crums.io