First Principles

The infrastructure industry has spent a decade building workarounds on broken foundations. We're done with workarounds.

A distributed systems problem masquerading as file storage

Terraform state is a coordination problem. Multiple actors (engineers, CI systems, drift detection) need to read and modify overlapping subsets of infrastructure state concurrently. This is a well-studied problem in distributed systems, with established solutions around fine-grained locking, multi-version concurrency control, and transaction isolation.

Instead, we got a global mutex on a JSON file.

The mismatch between granularity of operation and granularity of locking is the root cause of every Terraform scaling problem. It violates a fundamental principle of concurrent systems: non-overlapping operations should not block each other. You're modifying twelve resources but locking all 2,847 in the state file.

The standard response, splitting state files, doesn't solve the problem. It redistributes it. Now you have N coordination problems instead of one, plus the complexity of managing cross-state dependencies. You've traded false contention for distributed transaction coordination.

File-based vs Graph-based State

|  | File-based | Graph-based |
| --- | --- | --- |
| Lock scope | Entire state file | Affected subgraph only |
| Refresh | All resources | Changed resources |
| Concurrency | One operation at a time | Parallel on disjoint sets |
| Plan time (2,847 resources) | 30 min | 2 sec |
| Query support | Parse JSON | SQL |
| Dependencies | Opaque | First-class edges |
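
To make the "Query support" and "Dependencies" rows concrete, here is a minimal sketch of dependencies stored as first-class edges and queried with SQL. It is an illustration only, not the platform's actual schema: the `resources` and `edges` tables, the resource addresses, and the function names are all assumptions, and SQLite stands in for whatever database backs the real system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resources (
            address TEXT PRIMARY KEY,
            type    TEXT NOT NULL
        );
        CREATE TABLE edges (
            dependent  TEXT NOT NULL REFERENCES resources(address),
            depends_on TEXT NOT NULL REFERENCES resources(address),
            PRIMARY KEY (dependent, depends_on)
        );
        """
    )
    # Illustrative data: instance -> subnet -> vpc, plus an unrelated bucket.
    conn.executemany(
        "INSERT INTO resources VALUES (?, ?)",
        [
            ("aws_vpc.main", "aws_vpc"),
            ("aws_subnet.a", "aws_subnet"),
            ("aws_instance.web", "aws_instance"),
            ("aws_s3_bucket.logs", "aws_s3_bucket"),
        ],
    )
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        [
            ("aws_subnet.a", "aws_vpc.main"),
            ("aws_instance.web", "aws_subnet.a"),
        ],
    )
    return conn


def downstream_of(conn: sqlite3.Connection, address: str) -> list[str]:
    """Return every resource that transitively depends on `address`."""
    rows = conn.execute(
        """
        WITH RECURSIVE downstream(address) AS (
            SELECT dependent FROM edges WHERE depends_on = ?
            UNION
            SELECT e.dependent
            FROM edges AS e
            JOIN downstream AS d ON e.depends_on = d.address
        )
        SELECT address FROM downstream ORDER BY address
        """,
        (address,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this demo data, `downstream_of(build_demo_db(), "aws_vpc.main")` returns the subnet and the instance, while the same call for `aws_s3_bucket.logs` returns an empty list. The question "what breaks if I touch this?" becomes one recursive query instead of a JSON-parsing script.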

Infrastructure is a graph. Store it as a graph.

Infrastructure state is inherently a directed graph. Resources have dependencies. Dependencies form edges. Changes propagate along those edges. Terraform already knows this. The internal representation is a graph, and the planner performs graph traversal.

But at the storage layer, we flatten this structure into a blob.

This is like storing a B-tree in a CSV file. You can do it, but you destroy the properties that make the data structure useful. Plans read the entire file because file-based storage offers no alternative. Refreshes query everything because the state file doesn't know what you're about to change. The lock is global because the file is the unit of atomicity.

When state is properly normalized into a graph database, the properties emerge naturally. Subgraph isolation means operations on disjoint subgraphs are inherently parallelizable. Precise locking means you acquire locks on the resources you touch and their dependencies, not the entire state. Incremental refresh means you compute the minimal refresh set by traversing the dependency graph.
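
The three properties above reduce to one graph traversal. The sketch below is a simplified illustration, not the production algorithm. It assumes the "affected subgraph" of an operation is the set of changed resources plus everything transitively downstream of them; the adjacency map, resource names, and function names are hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: each key maps to the resources that depend on it.
DEMO_DEPENDENTS: dict[str, set[str]] = {
    "vpc": {"subnet"},
    "subnet": {"instance"},
    "bucket": {"bucket_policy"},
}


def affected_subgraph(dependents: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Breadth-first walk from the changed resources along dependency edges.

    The result is the set to lock and refresh under the assumption stated
    above: the changed resources plus all of their transitive dependents.
    """
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def can_run_in_parallel(subgraph_a: set[str], subgraph_b: set[str]) -> bool:
    """Two operations contend only if their affected subgraphs overlap."""
    return subgraph_a.isdisjoint(subgraph_b)
```

In this toy graph, a change to `subnet` affects `subnet` and `instance`, and a change to `bucket` affects `bucket` and `bucket_policy`. The two sets are disjoint, so neither operation has any reason to wait for the other. A change to `vpc` overlaps the `subnet` operation and would have to be ordered against it.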

Apply forty years of database engineering

The distributed systems community solved these problems decades ago. Multi-version concurrency control allows readers to proceed without blocking writers. Write-ahead logging provides durability without sacrificing performance. Transaction isolation levels let operators choose their consistency guarantees. Row-level locking enables concurrent modification of non-overlapping data.

None of this is novel. PostgreSQL has shipped MVCC since the late 1990s and write-ahead logging since the early 2000s. Yet somehow the infrastructure industry decided that a JSON file with a mutex was acceptable for managing production systems.

We implement MVCC at the Terraform state layer. Each operation acquires locks only on its subgraph. The lock manager uses the dependency graph to ensure consistent ordering. Readers use snapshots without blocking writers. Three teams can run three transactions with zero contention on disjoint resource sets.
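
A toy version of these ideas fits in a few dozen lines. This is a sketch under stated assumptions, not our implementation: the `StateStore` class and its methods are hypothetical, and sorted address order stands in for the dependency-graph ordering described above. It shows the shape of the technique: writers lock only the addresses they touch, always in the same order, and readers work from an immutable snapshot.

```python
import threading
from contextlib import ExitStack
from types import MappingProxyType


class StateStore:
    """Toy MVCC store with per-resource write locks (illustrative only)."""

    def __init__(self, initial: dict):
        # The latest committed version, published as one (version, data) tuple
        # so a reader gets a consistent pair with a single attribute read.
        self._latest = (0, MappingProxyType(dict(initial)))
        self._publish = threading.Lock()      # held only for the brief pointer swap
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()  # protects the lock registry itself

    def snapshot(self):
        """Readers take no resource locks and never wait on a running apply."""
        return self._latest

    def _lock_for(self, address: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(address, threading.Lock())

    def apply(self, changes: dict) -> int:
        """Lock only the touched addresses, then publish a new version."""
        with ExitStack() as stack:
            # A single global acquisition order means two writers can never
            # hold locks the other is waiting for, so they cannot deadlock.
            for address in sorted(changes):
                stack.enter_context(self._lock_for(address))
            # Slow work (provider API calls) would happen here, while holding
            # only the per-resource locks.
            with self._publish:
                version, data = self._latest
                merged = dict(data)  # copy-on-write: old snapshots stay valid
                merged.update(changes)
                self._latest = (version + 1, MappingProxyType(merged))
                return version + 1
```

A reader that called `snapshot()` before a write keeps seeing the old values after the write commits, and writers touching different addresses never wait on each other's resource locks. The only shared step is the short swap that publishes the new version.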

Kubernetes proved the control plane pattern works

Kubernetes controllers continuously reconcile cluster state with desired configuration. They retry until they succeed. They watch for drift. They handle this at massive scale. Infrastructure needs the same thing: state that reconciles automatically, operations that retry instead of failing, drift that gets fixed instead of reported to a Slack channel nobody monitors.
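
The controller pattern is simple to state in code. The loop below is a minimal sketch of level-triggered reconciliation with retry. It is not Kubernetes code and not our control plane: `reconcile`, `observe`, and `apply_change` are hypothetical names. It also only handles missing or changed keys, not resources that should be deleted.

```python
import time
from typing import Any, Callable


def reconcile(
    desired: dict[str, Any],
    observe: Callable[[], dict[str, Any]],
    apply_change: Callable[[str, Any], None],
    max_attempts: int = 5,
    base_delay: float = 0.0,
) -> int:
    """Drive observed state toward `desired`, retrying instead of failing fast.

    Returns the number of observation passes it took to converge.
    """
    for attempt in range(1, max_attempts + 1):
        actual = observe()
        # Drift is recomputed from scratch on every pass (level-triggered),
        # so a failed or half-applied change is simply picked up again.
        drift = {key: value for key, value in desired.items() if actual.get(key) != value}
        if not drift:
            return attempt
        for key, value in drift.items():
            try:
                apply_change(key, value)
            except Exception:
                # Treat the failure as transient; the next pass retries it.
                continue
        time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"did not converge after {max_attempts} attempts")
```

A transient failure in `apply_change` does not abort the run. The next pass observes that the drift is still there and tries again, and the function raises only after exhausting its attempts.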

You can't build continuous reconciliation on top of a flat file that locks globally. You can't parallelize operations when everything shares the same lock. You can't query relationships when state is opaque JSON.

The state system has to change first. Fix the storage primitives, and you unlock reconciliation. Fix reconciliation, and you can react to events instead of polling.

Correctness isn't optional

We're building infrastructure that manages other people's infrastructure. State corruption can't be "rare." It has to be impossible.

We chose OCaml because its type system catches entire categories of bugs at compile time that tests miss. Strongly-typed data structures catch field errors before deployment. Type-safe SQL queries catch schema mismatches before they reach production. Immutability by default removes a whole class of data races on shared state. When you add a new error case to the system, the compiler's exhaustiveness checking tells you every place you aren't handling it.

This isn't academic type theory. Production systems that absolutely cannot fail choose languages where certain failures are impossible, not just unlikely.

Fix the foundations

The Terraform ecosystem has built an impressive tower of workarounds. Terragrunt is the poster child. A wrapper that exists solely to compensate for Terraform's inability to handle basic patterns like DRY configuration and cross-stack dependencies. It papers over state fragmentation with more fragmentation. It adds a templating layer because the underlying tool can't express what you need. It's duct tape on a broken foundation, and somehow it became best practice.

The rest of the ecosystem follows the same pattern. Elaborate orchestration to work around the fact that the storage layer can't support concurrent operations. State splitting strategies. External locking mechanisms. Dependency graphs rebuilt in YAML because Terraform lost them when it flattened state to JSON.

These aren't solutions. They're evidence that we're solving the wrong problem.

We're not building another wrapper. We're not adding another layer of abstraction on a broken foundation. We're replacing the storage primitives that everything else depends on. Graph-native state. Resource-level locking. Subgraph operations. MVCC concurrency. Queryable infrastructure.

This isn't revolutionary. It's the application of established distributed systems principles to a problem that's been mischaracterized since its inception.

The infrastructure industry has accepted file-based state as an immutable constraint for too long.

It's not. It's a choice.
And it's the wrong one.
