Working With Data

In this section

- Working with Data (this page)
- Identifiers — understanding and using addresses of different types
- Entries — creating, reading, updating, and deleting
- Links, Paths, and Anchors — creating and deleting
- Querying Source Chains — getting data from an agent’s history
- Validation Receipts — discovering the propagation state of data

Holochain is, at its most basic, a framework for building graph databases on top of content-addressed storage that is validated and stored by networks of peers. Each peer contributes to the state of this database by publishing actions to an event journal called their source chain, which is stored on their device. The source chain can also be used to hold private data.

Entries, actions, and records: primary data

Data in Holochain takes the shape of a record. Different kinds of records have different purposes, but the thing common to all records is the action: one participant’s attempt to manipulate their own state and/or the application’s shared database state in some way. All actions contain:

- The agent ID of the author
- A timestamp
- The type of action
- The hash of the previous action in the author’s history of state changes, called their source chain (note: the first action in their chain doesn’t contain this field)
- The index of the action in the author’s source chain, called the action seq

Some actions also contain a weight, which is a calculation of the cost of storing the action and can be used for spam prevention. (Note: weighting isn’t implemented yet.)
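The common fields listed above can be sketched as a small data structure. This is an illustrative model only, not Holochain's actual types: the `Action` class, the `append_action` helper, and the SHA-256 stand-in hash are all assumptions made for this example.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional
import hashlib
import json


@dataclass(frozen=True)
class Action:
    """Hypothetical model of the fields every action carries."""
    author: str                 # agent ID of the author
    timestamp: int              # microseconds since the Unix epoch
    action_type: str            # for example "Create", "Update", or "Delete"
    action_seq: int             # index of this action in the source chain
    prev_action: Optional[str]  # hash of the previous action; None for the first

    def address(self) -> str:
        # Stand-in hash (assumption): SHA-256 over a canonical JSON encoding.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_action(chain: List[Action], author: str, timestamp: int,
                  action_type: str) -> Action:
    """Build the next action so that it points at the current chain head."""
    prev = chain[-1].address() if chain else None
    action = Action(author, timestamp, action_type, len(chain), prev)
    chain.append(action)
    return action


chain: List[Action] = []
first = append_action(chain, "alice", 1732314805000000, "Create")
second = append_action(chain, "alice", 1732314806000000, "Update")
```

Here `first.prev_action` is `None` because it starts the chain, while `second.prev_action` equals `first.address()` and `second.action_seq` is `1`.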

The other important part of a record is the entry. Not all action types have an entry to go along with them, but those that do, Create and Update, are called entry creation actions and are the main source of data in an application.

It’s generally most useful to think about a record (entry plus creation action) as the primary unit of data. This is because the action holds useful context about when an entry was written and by whom. A unique entry, no matter how many times it’s written, is considered to be one piece of content:

"hello" // entry ID is hCEkhy0q54imKYjEpFdLTncbqAaCEGK3LCE+7HIA0kGIvTw

But that entry, paired with its respective creation actions into records, can be treated as two pieces of content:

{
    "create": {
        "author": "hCAkKUej3Mcu+40AjNGcaID2sQA6uAUcc9hmJV9XIdwUJUE", 
        "timestamp": 1732314805000000,
        "entry_hash": "hCEkhy0q54imKYjEpFdLTncbqAaCEGK3LCE+7HIA0kGIvTw" // "hello" entry 
    }
} // action ID is hCkkDHBZjU1a7L3gm6/qhImbWG6KG4Oc2ZiWDyfPSGoziBs

{
    "create": {
        "author": "hCAk4R0sY+orZRxeFwqFTQSrhalgY+W2pLEJ5mihgY3CE7A", 
        "timestamp": 1481614210000000,
        "entry_hash": "hCEkhy0q54imKYjEpFdLTncbqAaCEGK3LCE+7HIA0kGIvTw" // "hello" entry 
    }
} // action ID is hCkk1Oqnmn/xDVFNS+L2Z2PuQf9nN1/FmoAewlA8SV10jb8

(Note: these samples are simplified and JSON-ified to focus on the bits that matter.)
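The distinction between one entry and two records can be reproduced with a toy content-addressing function. This sketch assumes SHA-256 over JSON as a stand-in; it is not Holochain's actual hashing scheme, and `address_of` and the agent names are hypothetical.

```python
import hashlib
import json


def address_of(content) -> str:
    """Stand-in content address (assumption): SHA-256 of canonical JSON."""
    encoded = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


entry = "hello"
entry_address = address_of(entry)

# Two agents write the same entry at different times.
create_a = {"author": "agent-a", "timestamp": 1732314805000000,
            "entry_hash": entry_address}
create_b = {"author": "agent-b", "timestamp": 1481614210000000,
            "entry_hash": entry_address}

# The entry has one address, but each creation action has its own.
entry_addresses = {create_a["entry_hash"], create_b["entry_hash"]}
action_addresses = {address_of(create_a), address_of(create_b)}
```

The set of entry addresses holds a single value, while the set of action addresses holds two, which is why the same entry can be treated as two records.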

The graph DHT: Holochain’s shared database

Each application creates a shared graph database, where content is connected together by links. The underlying data store for this database is a distributed hash table or DHT, which is just a big key/value store. Primary content (entries and actions) is stored and retrieved by its identifier or address (usually its cryptographic hash), so we can call it addressable content. Then, the graph is built by attaching links and other kinds of metadata to those same addresses.

The application’s users all share responsibility for storing and validating this database and modifications to it.

Storage locations and privacy

Each DNA creates a network of peers who participate in storing pieces of that DNA’s database, which means that each DNA’s database (and the source chains that contribute to it) is completely separate from all others. This gives shared data per-network privacy. On top of that, entries can be either:

- Private, stored in an encrypted database on the author’s device as part of their source chain and accessible only to them, or
- Public, stored in the graph database and accessible to all participants.

All actions are public.

Links

A link is a piece of metadata that is attached to one address, the base, and points to another address, the target. Like an entry, it has a type (the link type) that gives it meaning in the application. It can also carry an optional tag that stores arbitrary application data.

When a link’s base or target doesn’t exist as addressable content in the database, it’s considered an external reference, and it’s up to your front end to decide how to handle it.
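As a rough mental model, the graph can be pictured as a key/value store holding content at each address, plus a list of links kept as metadata at each base address. The sketch below is a single-process stand-in, not the DHT itself; `ToyGraphStore`, `Link`, and the sample addresses and link type are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Link:
    base: str         # address the link is attached to
    target: str       # address the link points to
    link_type: str    # gives the link meaning in the application
    tag: bytes = b""  # optional arbitrary application data


class ToyGraphStore:
    """In-memory stand-in: content plus link metadata per address."""

    def __init__(self) -> None:
        self.content: Dict[str, Any] = {}
        self.links: Dict[str, List[Link]] = defaultdict(list)

    def put_content(self, address: str, value: Any) -> None:
        self.content[address] = value

    def add_link(self, link: Link) -> None:
        # Links are kept as metadata at the base address.
        self.links[link.base].append(link)

    def get_links(self, base: str, link_type: str) -> List[Link]:
        return [l for l in self.links.get(base, []) if l.link_type == link_type]

    def is_external(self, address: str) -> bool:
        # No addressable content stored here: treat as an external reference.
        return address not in self.content


store = ToyGraphStore()
store.put_content("addr-post", {"title": "First post"})
store.add_link(Link("addr-author", "addr-post", "author_to_posts"))
found = store.get_links("addr-author", "author_to_posts")
```

In this example the link target exists as content, while the base `addr-author` has no stored content and would be reported as an external reference.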

CRUD metadata graph

Holochain has a built-in create, read, update, and delete (CRUD) model. Data in the graph database and participants’ local state cannot be modified or deleted, so these kinds of mutation are simulated by attaching metadata to existing data. This builds up a graph of the history of a given piece of content.

We’ll get deeper into this in the next section and in the page on entries.

Individual state histories as public records

All data in an application’s database ultimately comes from the peers who participate in storing and serving it. Each piece of data originates in a participant’s source chain, which is an event journal that contains all the actions they’ve authored. These actions describe intentions to add to either the DHT’s state or their own state.

Every action becomes part of the shared DHT, but not every entry needs to. The entry content of most system-level actions is private. You can also mark an application entry type as private, and its content will stay on the participant’s device and not get published to the graph.

Because every action has a reference to both its author and its previous action in the author’s source chain, each participant’s source chain can be considered a linear graph of their authoring history.
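Following the previous-action references from the newest action back to the first one recovers the author's history in order. The sketch below assumes actions are plain dictionaries keyed by made-up addresses; `walk_chain` and the field names are hypothetical.

```python
from typing import Dict, List, Optional


def walk_chain(actions: Dict[str, dict], head: str) -> List[str]:
    """Return action addresses ordered from the first action to the head."""
    ordered: List[str] = []
    current: Optional[str] = head
    while current is not None:
        ordered.append(current)
        # The first action has no prev_action, which ends the walk.
        current = actions[current].get("prev_action")
    ordered.reverse()
    return ordered


actions = {
    "a0": {"author": "alice", "action_seq": 0},
    "a1": {"author": "alice", "action_seq": 1, "prev_action": "a0"},
    "a2": {"author": "alice", "action_seq": 2, "prev_action": "a1"},
}
history = walk_chain(actions, "a2")
```

Because each action points to exactly one predecessor, the walk yields a single unbranched sequence whose positions match each action's `action_seq`.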

Adding and modifying data

Because data can’t be deleted, mutation is simulated by adding metadata that describes changes to existing data’s state. The current state of an entry or record is calculated using simple rules, but you can also access the underlying metadata and implement your own CRUD model.

Every change starts out as an action on someone’s source chain. This action is turned into DHT operations that get sent to various peers, validated, and integrated into their portions of the database. DHT operations are beyond the scope of this page, so let’s focus on the result of integrating these operations.

Actions as both content and author history

In addition to the changes described below, every action:

- is stored as content at the action’s address.
- is stored as metadata at the author’s agent ID address, which allows peers responsible for that address to collect and validate a summary of the author’s entire history.

Create
- The entry is stored as content at the entry’s address, if it doesn’t already exist.
- The action is stored as metadata at the entry’s address.
Update does the same as Create, but also:
- The action is added as metadata to the addresses of the original entry and its entry creation action. These serve as pointers from the original content to their replacements.
Delete
- The action is stored as metadata on the entry and entry creation action that it deletes, indicating their deleted status. The action contains all the information, so there’s no entry content.
CreateLink
- The action is stored as metadata on the link’s base address. The action contains all the link information.
DeleteLink
- The action is stored as metadata at the base address of the link it deletes, indicating that the link has been deleted.
- The action is stored as metadata at the action address of the link it deletes as well.

Private entries are also manipulated via Create, Update, and Delete actions, but only the action gets published to the graph, as the entry content and action content are separate parts of the record.
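The placement rules above can be summarized in a toy integration function that records where each action ends up as content and as metadata. This is a simplified sketch covering public entries only; it does not model DHT operations, and the `integrate` function and every dictionary key in it are hypothetical names, not Holochain's field names.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

content: Dict[str, Any] = {}
metadata: Dict[str, List[str]] = defaultdict(list)


def integrate(action_address: str, action: dict,
              entry: Optional[Any] = None) -> None:
    kind = action["type"]
    # Every action is content at its own address and metadata on its author.
    content[action_address] = action
    metadata[action["author"]].append(action_address)

    if kind in ("Create", "Update"):
        # Store the entry once, then attach the action to the entry address.
        content.setdefault(action["entry_hash"], entry)
        metadata[action["entry_hash"]].append(action_address)
    if kind == "Update":
        # Pointers from the original entry and action to their replacement.
        metadata[action["original_entry"]].append(action_address)
        metadata[action["original_action"]].append(action_address)
    elif kind == "Delete":
        metadata[action["deletes_entry"]].append(action_address)
        metadata[action["deletes_action"]].append(action_address)
    elif kind == "CreateLink":
        metadata[action["base"]].append(action_address)
    elif kind == "DeleteLink":
        metadata[action["base"]].append(action_address)
        metadata[action["deleted_link_action"]].append(action_address)


integrate("act-1", {"type": "Create", "author": "alice",
                    "entry_hash": "ent-1"}, entry="hello")
integrate("act-2", {"type": "Delete", "author": "alice",
                    "deletes_entry": "ent-1", "deletes_action": "act-1"})
```

After these two calls, the entry address carries both the create and the delete as metadata, the create action's address carries the delete, and the author's address lists both actions in order.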

Default CRUD rules

The built-in CRUD model is simplistic, collecting all the metadata on an entry or record and producing a final state using these rules:

- Although an entry can have multiple creation actions attached to it as metadata, the record returned contains the oldest-timestamped entry creation action that doesn’t have a corresponding delete action.
- There’s no built-in logic for updates, which means that multiple updates can exist on one entry creation action. This creates a branching update model similar to Git and leaves room for you to create your own conflict resolution mechanisms if you need them. Updates aren’t retrieved by default; you must retrieve them by asking for an address’s metadata.
- A delete applies to an entry creation action, not an entry. An entry is considered live until all of its creation actions are deleted, at which point it’s fully dead. A dead entry becomes live again if a new entry creation action is written for it.
- Unlike entries, links are completely contained in the action, and are always distinct from each other, even if their base, target, type, and tag are identical. There’s no link update action, and a link deletion action marks one link creation action as dead.

If these rules don’t work for you, you can always directly access the underlying metadata and implement your own CRUD model.
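The entry-related rules can be expressed in a few lines. This sketch assumes the creation actions and the set of deleted action addresses have already been collected from an entry's metadata; the function names and dictionary keys are hypothetical, and updates and links are left out.

```python
from typing import List, Optional, Set


def live_creation_action(creates: List[dict],
                         deleted: Set[str]) -> Optional[dict]:
    """Oldest-timestamped creation action with no delete, or None if all are dead."""
    live = [c for c in creates if c["address"] not in deleted]
    return min(live, key=lambda c: c["timestamp"]) if live else None


def entry_is_live(creates: List[dict], deleted: Set[str]) -> bool:
    # An entry stays live until every one of its creation actions is deleted.
    return live_creation_action(creates, deleted) is not None


creates = [
    {"address": "act-a", "timestamp": 200},
    {"address": "act-b", "timestamp": 100},
]
oldest = live_creation_action(creates, set())
after_one_delete = live_creation_action(creates, {"act-b"})
after_all_deletes = entry_is_live(creates, {"act-a", "act-b"})

# A new creation action revives the entry.
revived = creates + [{"address": "act-c", "timestamp": 300}]
live_again = entry_is_live(revived, {"act-a", "act-b"})
```

With no deletes the older action `act-b` is chosen; deleting it falls back to `act-a`; deleting both leaves the entry dead until `act-c` is written.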

Deleted/dead data

Data doesn’t ever disappear from the DHT. Instead, deletion actions simply mark an entry or link as dead, which means it won’t be retrieved when you ask for it — unless you ask for the metadata at the basis address.

Privacy

Each DNA within a Holochain application has its own network and database, isolated from all other DNAs’ networks and databases. A participant’s source chain for a DNA is separate from their source chains for any other DNAs they participate in, and from the source chains of all other participants in the same DNA. Within a DNA, all shared data can be accessed by any participant, but a participant’s private entries can be accessed only by that participant.

A DNA can be cloned, creating a separate network, database, and set of source chains for all participants who join it. This lets you use the same backend code to define private spaces within one application, restricting access to certain shared databases to the participants who join the corresponding clone.

Summary: multiple interrelated graphs

The shared DHT and the individual source chains are involved in multiple interrelated graphs — the source chain contributes to the DHT’s graph, and the DHT records source chain history. You can use as little or as much of these graphs as your application needs.