Engineering Log: Stategraph 2.0 and the configurations we used to give up on
A major version is a promise about the configurations you no longer have to worry about. 1.0 was about trusting the results we already produced. 2.0 is about producing them for the configurations we used to give up on.
Modules we used to give up on
The most common shape of a production Terraform codebase is one module called many times. An eks-cluster module instantiated once per environment. A vpc module called with for_each over a map of regions. A service module fanned out with count across replicas.
Resolving that shape into a dependency graph means evaluating the loop. Not just acknowledging that a module is called with for_each, but expanding the call into the actual set of resources it produces, with the actual addresses Terraform will use, so the rest of the graph can connect to them.
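As a rough illustration of what "expanding the call" means, the sketch below turns one module call into the resource addresses Terraform would assign. The address format (`module.NAME[KEY].TYPE.NAME`, with integer keys for count and quoted string keys for for_each) is Terraform's own. The function names and the flat list of resources are assumptions made for this example and say nothing about how Stategraph represents modules internally.

```python
import json
from typing import Iterable, List, Optional


def module_instances(name: str,
                     count: Optional[int] = None,
                     for_each: Optional[Iterable[str]] = None) -> List[str]:
    """Return the instance addresses for a single module call (hypothetical helper)."""
    if count is not None and for_each is not None:
        raise ValueError("a module block cannot set both count and for_each")
    if count is not None:
        # count produces integer instance keys: module.service[0], module.service[1], ...
        return [f"module.{name}[{i}]" for i in range(count)]
    if for_each is not None:
        # for_each produces quoted string keys: module.vpc["us-east-1"]
        return [f"module.{name}[{json.dumps(key)}]" for key in sorted(for_each)]
    # No loop: the module has exactly one instance.
    return [f"module.{name}"]


def expand(name: str, resources: Iterable[str],
           count: Optional[int] = None,
           for_each: Optional[Iterable[str]] = None) -> List[str]:
    """Prefix every resource declared in the module with each instance address."""
    resources = list(resources)
    return [f"{instance}.{resource}"
            for instance in module_instances(name, count, for_each)
            for resource in resources]


vpc_addresses = expand("vpc", ["aws_vpc.this"],
                       for_each={"us-east-1": {}, "eu-west-1": {}})
service_addresses = expand("service", ["aws_instance.app"], count=2)
```

Here `vpc_addresses` holds one address per region key and `service_addresses` one per replica index. In a real configuration the count or for_each value is itself an expression that has to be evaluated first, which is the hard part this sketch leaves out.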
Before 2.0, certain combinations of for_each and count inside module blocks were resolved only partially. The graph still produced correct results for the resources it could see, but any resource behind a loop Stategraph could not unroll was missing from the graph, and so were the dependencies that ran through it. 2.0 closes that gap. Module loops are evaluated, the resources behind them are part of the graph, and the blast radius of a change reflects them.
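To make the effect on blast radius concrete, here is a minimal sketch that treats blast radius as the set of resources that transitively depend on a changed resource. That reading, the `blast_radius` function, and the example addresses are assumptions for illustration and are not Stategraph's actual definition or API.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def blast_radius(depends_on: Dict[str, Iterable[str]], changed: str) -> Set[str]:
    """Return every node that transitively depends on `changed` (hypothetical helper)."""
    # Invert the edges so we can walk from a resource to the things that use it.
    dependents = defaultdict(set)
    for node, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(node)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # The changed resource itself is not part of its own blast radius.
    return seen - {changed}


vpc = 'module.vpc["us-east-1"].aws_vpc.this'
subnet = 'module.vpc["us-east-1"].aws_subnet.private'

# Graph with the module loop expanded: the load balancer reaches the VPC via the subnet.
full_graph = {subnet: [vpc], "aws_lb.edge": [subnet], "aws_s3_bucket.logs": []}

# Graph where the loop was not unrolled: the subnet and its edges are simply absent.
partial_graph = {"aws_lb.edge": [], "aws_s3_bucket.logs": []}

full = blast_radius(full_graph, vpc)
partial = blast_radius(partial_graph, vpc)
```

With the expanded graph, `full` contains both the subnet and the load balancer. With the partial graph, `partial` is empty: a well-formed answer that silently omits real dependents.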
Design Principle
A dependency that exists in Terraform but not in our graph is worse than a missing feature. It produces a confident answer that happens to be wrong. Closing those gaps is the work.
HCL parsing that does not flinch
HCL has a permissive grammar and a long tail of edge cases. Heredocs with embedded JSON. Parenthesized quoted keys. Multiline expressions with comments interleaved. Each one is a small thing on its own, and each one shows up in a real configuration somewhere on GitHub.
We test the parser against a large corpus of public Terraform configurations. Every file that fails to parse is a bug, whether or not anyone has hit it yet. 2.0 closes a meaningful chunk of that tail. Configurations that previously produced parse errors or subtly wrong ASTs now round-trip cleanly.
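A corpus check of this kind can be sketched as a loop that parses each file, renders the result back to text, and parses again. The `check_corpus` function and its `parse` and `render` parameters are hypothetical. The demonstration below plugs in Python's `json` module as a stand-in because it is in the standard library, not because it is an HCL parser or anything Stategraph uses.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def check_corpus(sources: Dict[str, str],
                 parse: Callable[[str], Any],
                 render: Callable[[Any], str]) -> List[Tuple[str, str]]:
    """Return (file name, reason) for every file that fails to parse or round-trip."""
    failures = []
    for name, text in sorted(sources.items()):
        try:
            tree = parse(text)
            # Round trip: rendering the tree and parsing it again must give the same tree.
            if parse(render(tree)) != tree:
                failures.append((name, "round-trip mismatch"))
        except Exception as exc:
            failures.append((name, f"parse error: {exc}"))
    return failures


# Stand-in corpus and stand-in parser, for illustration only.
corpus = {
    "ok.tf.json": '{"resource": {"aws_vpc": {"this": {}}}}',
    "truncated.tf.json": '{"resource": ',
}
failures = check_corpus(corpus, parse=json.loads, render=json.dumps)
```

Only the truncated file ends up in `failures`. A check shaped like this catches both outright parse errors and trees that change when rendered and parsed again, but it cannot tell whether a tree that survives the round trip was correct in the first place.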
If you have ever had an infrastructure tool tell you your perfectly valid HCL was invalid, you know how much trust this costs. 2.0 is the version where we stopped accepting that failure from our own parser.
Observation
Parser correctness is not a feature. It is a precondition. A graph built on a parser that occasionally lies is a graph that occasionally lies.
A faster core
Stategraph runs on every plan. That puts a budget on how long the core can take to build a graph, evaluate it, and answer questions about it. The budget gets tighter as customer states get larger.
For 2.0 we spent a release cycle on the core. Hot paths got faster. Allocation in the inner loops came down. Large state graphs that used to cause a noticeable pause now feel immediate. We are not going to put numbers on this until we have benchmarks we are willing to defend, but the difference is real and you can feel it on big states.
Implementation Detail
Performance work compounds. A core that is, say, twice as fast does not just feel faster. It changes what you are willing to do on top of it. Whole-graph queries that used to be too expensive become routine.
What's next
2.0 is the version where Stategraph stops apologizing for the configurations it cannot handle. The release cadence stays fast. 2.0.1 shipped two days after 2.0.0, and the next set of changes is already in flight.
If you manage Terraform at scale and want a dependency graph that reflects what you actually deployed, including the modules and loops, request access and try it.