Validation

In this section

Validation (this page)
- genesis_self_check Callback — writing a function to control access to a network
- validate Callback — basic callback, examples using stub functions
- must_get_* Host Functions — deterministically retrieving DHT data for use in validation
- DHT operations — advanced details on the underlying data structure used in DHT replication and validation

Validation gives shape to your DNA’s data model. It defines the ‘rules of the game’ for a network — who can create, modify, or delete data, and what that data should and shouldn’t look like. It’s also the basis for Holochain’s peer-auditing security model.

You implement your validation logic in your application’s integrity zomes. A DNA uses validation logic in two ways:

- By an author of data, to protect them from publishing invalid data, and
- By an agent that’s received data to store and serve, to equip them to detect invalid data and take action against the author.

Because every peer has the DNA’s validation logic on their own machine and is expected to check the data they author before they publish it, publishing invalid data is treated as an intentionally malicious act.

Info

Currently Holochain can inform agents about invalid data when asked. In the future it’ll also take automatic defensive action: when an agent sees evidence of invalid data, Holochain will put the malicious author into that agent’s network block list.

There are two callbacks that implement validation logic:

- validate is the core of the zome’s validation logic. It receives a DHT operation, which is a request to transform the data at an address, and returns a success/failure/indeterminate result.
- genesis_self_check ‘pre-validates’ an agent’s own membrane proof before trying to connect to peers in the network.
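The three possible results of validate can be pictured with a small stand-alone sketch. Everything in it (Outcome, CommentOp, validate_comment, and the retrievable map that stands in for data reachable on the DHT) is a hypothetical stand-in invented for illustration. In a real integrity zome the callback receives an Op and returns a ValidateCallbackResult from the hdi crate, and the scaffolding tool generates that code for you.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the three-way validation result.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The operation follows the rules.
    Valid,
    /// The operation breaks a rule; the string explains which one.
    Invalid(String),
    /// Indeterminate: these addresses couldn't be retrieved yet.
    UnresolvedDependencies(Vec<String>),
}

/// Hypothetical operation: a comment that points at a parent post.
pub struct CommentOp {
    pub text: String,
    pub parent_address: String,
}

/// Validate a comment. `retrievable` is an assumed stand-in for data that
/// can currently be fetched, keyed by address.
pub fn validate_comment(op: &CommentOp, retrievable: &HashMap<String, String>) -> Outcome {
    // Failure: the content itself breaks a rule.
    if op.text.trim().is_empty() {
        return Outcome::Invalid("comment text must not be empty".to_string());
    }
    // Indeterminate: the dependency isn't available, so no verdict yet.
    if !retrievable.contains_key(&op.parent_address) {
        return Outcome::UnresolvedDependencies(vec![op.parent_address.clone()]);
    }
    // Success: content is fine and the dependency was found.
    Outcome::Valid
}
```

The point of the third variant is that a missing dependency is not a failure; it only means the question can't be answered yet.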

Design considerations

Validation is a broad topic, so we won’t go into detail here. There are a few basic things to keep in mind though:

- The structure of the Op type that a validate callback receives is complex and deeply nested, so it’s best to let the scaffolding tool generate the callback for you. It generates stub functions that let you think in terms of actions rather than operations, which is more natural and good enough for most needs. Read all about DHT operations if you want deep detail.
- Entry data, link tags, and membrane proofs are just blobs; they need to be parsed in order to check that they have the correct structure. (The HDK makes it easy to deserialize an entry blob into a Rust type though.)
- While entries and links can be thought of as ‘things’, the actions that create, update, or delete them are verbs. Validating a whole action lets you not just check the content and structure of your things, but also enforce write privileges and even throttle an agent’s frequency of writes by looking at the action’s place in their source chain.
- Validation rules must always yield the same valid/invalid outcome for a given operation, regardless of who validates it and when. Don’t use any source of non-determinism, such as instantiating and comparing two std::time::Instants. In fact, Holochain prevents your validation callbacks from calling any non-deterministic host functions. Read more about the available host functions.
- Data may have dependencies that affect validation outcomes, but those dependencies must be addressable, they must be retrievable from the same DHT, and their addresses must be known. If a dependency can’t be retrieved at validation time, the validate callback terminates early with an indeterminate result, which will cause Holochain to try again later. (Note that an action already has a dependency on the action preceding it on an agent’s source chain.)
- Even though multiple actions can be written within an atomic transaction, they are not validated together as an atomic transaction. An action can only have dependencies on prior actions in a source chain, not subsequent actions.
- You don’t need to validate your data manually before committing; Holochain validates it after the zome function that writes it is finished, and returns any validation failure to the caller.
- Test, test, test. Validation is the gate that accepts or rejects all DHT data, so make sure you write thorough test coverage for your validation functions. If the data being validated has no dependencies on DHT data or DNA/zome info, we recommend writing Rust unit tests for the validation function stubs that the scaffolding tool generates. We also recommend testing your validation code by writing single- and multiple-agent Tryorama test scenarios for zome functions that write data. This lets you check that your validation rules pass both when authoring data and when checking data authored by other agents. (We’ll write about Tryorama soon; in the meantime, you can check the Tryorama GitHub readme and the scaffolded tests in a project’s tests/src/ folder for guidance.)
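As a rough illustration of several points above (enforcing write privileges, throttling writes, staying deterministic, and unit-testing a rule that has no DHT dependencies), here is a minimal stand-alone sketch. ActionInfo, check_write, and MIN_INTERVAL_MICROS are hypothetical names made up for this example and are not part of the HDK; a real rule would read the author and timestamp from the action types that Holochain provides. Note that the rule only compares timestamps recorded in the actions themselves and never reads the validator’s clock, so every validator reaches the same verdict.

```rust
/// Hypothetical, simplified view of a source chain action.
#[derive(Debug, Clone, PartialEq)]
pub struct ActionInfo {
    pub author: String,
    /// Timestamp recorded in the action, in microseconds.
    pub timestamp_micros: i64,
}

/// Assumed throttle for this example: at most one write every 10 seconds.
pub const MIN_INTERVAL_MICROS: i64 = 10_000_000;

/// Check an action against the action that precedes it on the author's chain.
/// Uses only data carried by the two actions, so the result is deterministic.
pub fn check_write(
    action: &ActionInfo,
    previous: &ActionInfo,
    allowed_authors: &[&str],
) -> Result<(), String> {
    // Write privilege: only listed authors may write.
    if !allowed_authors.contains(&action.author.as_str()) {
        return Err(format!("{} is not allowed to write", action.author));
    }
    // Throttle: compare recorded timestamps, never the current time.
    let elapsed = action.timestamp_micros.saturating_sub(previous.timestamp_micros);
    if elapsed < MIN_INTERVAL_MICROS {
        return Err("writes must be at least 10 seconds apart".to_string());
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    fn action(author: &str, timestamp_micros: i64) -> ActionInfo {
        ActionInfo { author: author.to_string(), timestamp_micros }
    }

    #[test]
    fn rejects_rapid_writes() {
        let previous = action("alice", 0);
        let too_soon = action("alice", 5_000_000);
        assert!(check_write(&too_soon, &previous, &["alice"]).is_err());
    }

    #[test]
    fn accepts_spaced_writes_from_allowed_author() {
        let previous = action("alice", 0);
        let later = action("alice", 10_000_000);
        assert!(check_write(&later, &previous, &["alice"]).is_ok());
    }
}
```

Because the function is pure, plain Rust unit tests like the ones in the tests module are enough to exercise it; multi-agent behaviour still needs scenario tests.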

Things you don’t need to worry about

- For dependency trees that might get complex and costly to retrieve, you can use inductive validation rather than having to retrieve and validate all the dependencies.
- Action timestamps, sequence indices, and authorship are automatically checked for consistency against the previous action in the author’s source chain.
- Data is checked against Holochain’s maximum size (4 MB for entries, 1 KB for link tags).
- The entry type of Update actions is checked against the data they replace.
- The scaffolding tool generates a sensible default validate callback that does these things for you:
  - Tries to deserialize an entry into the correct Rust type, and returns a validation failure if it fails.
  - Checks that the original entry for an Update or Delete action exists and is a valid entry creation action.
  - Checks that the original entry for an Update contains the same entry type.
  - Checks that the original entry for a Delete comes from the same integrity zome.
  - Checks that the action that registers the agent’s public key is directly preceded by an AgentValidationPkg action.
  - Checks that most-recent update links and collection links point to valid entry creation records.
  - Tries to fetch data dependencies from the DHT and make sure they’re the right type.
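The idea behind inductive validation, mentioned in the list above, can be sketched with a toy rule. The CounterStep type and validate_step function are hypothetical and exist only for this illustration. The sketch assumes that the predecessor handed to the function has already passed the same rule, so each step only needs to be compared with its immediate predecessor instead of walking the whole chain.

```rust
/// Hypothetical entry: one step of a counter that must start at 0
/// and go up by exactly 1 each time.
#[derive(Debug, Clone, PartialEq)]
pub struct CounterStep {
    pub value: u64,
}

/// Validate one step against its immediate predecessor only.
///
/// Assumption: `previous_valid`, when present, has already been validated
/// with this same function. By induction, if the base case (no predecessor)
/// and every single link are valid, the whole chain is valid.
pub fn validate_step(
    step: &CounterStep,
    previous_valid: Option<&CounterStep>,
) -> Result<(), String> {
    match previous_valid {
        // Base case: the first step must be 0.
        None if step.value == 0 => Ok(()),
        None => Err("the first step must have value 0".to_string()),
        // Inductive case: exactly one more than the validated predecessor.
        Some(prev) if prev.value.checked_add(1) == Some(step.value) => Ok(()),
        Some(prev) => Err(format!(
            "expected a step after {}, got {}",
            prev.value, step.value
        )),
    }
}
```

The cost of checking a step stays constant no matter how long the chain grows, which is the reason to prefer this shape when dependency trees get deep.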

Available host functions

As mentioned, any host functions that introduce non-determinism can’t be called from genesis_self_check or validate — Holochain will return an error. That includes functions that create data or check the current time or agent info, of course, but it also includes certain functions that retrieve DHT or source chain data. That’s because this data can change over time.

These functions are available to both validate and genesis_self_check:

validate can also call these deterministic DHT retrieval functions:

- must_get_action
- must_get_entry
- must_get_valid_record
- must_get_agent_activity

You can read about them on the must_get_* Host Functions page.