
Engineering Log: Cost, surfaced inside the plan


Most cost tools show you an estimate in a dashboard you have to remember to open. By the time you're looking at it, you've usually already applied the change. This demo was the first public look at Stategraph Cost, which puts the cost of a change in the plan you're already reviewing, and tracks what your infrastructure actually costs once the bill comes in.

$ cat engineering-log-demo-day-three.tldr
• Stategraph Cost shows estimated and actual spend in the terraform plan output
• Per-state budgets with guardrails that warn or block an apply
• Cost and infrastructure data available over JSON APIs, Terraform-schema compatible
• Stategraph Cloud launched as a managed option alongside self-hosted

Demo day three

Last demo day was about making plan and apply fast. This one is about cost. Once you're running a decent amount of infrastructure, someone eventually asks what it's all costing and which team is responsible for which part of the bill. We spent the time since the last demo building that into Stategraph, and putting it in the plan output where you'd actually see it.

Here's the recording.

Cost in the plan

We started in the UI. There's an overview of spend across all your states, and you can drill into a single state, then a resource, then a specific instance. It's the same structured data that makes the rest of Stategraph queryable, so cost works the same way - you can slice it however you slice everything else.

Stategraph tracks two numbers. One is the estimate, calculated from the plan before you apply. The other is the actual spend, which we reconcile against your cloud provider's billing APIs at the end of the month. The estimate tells you roughly what a change should add. The actuals tell you what you're really paying, which is usually the number people argue about.
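To make the estimate-versus-actual distinction concrete, here is a minimal sketch of comparing the two numbers per resource. The `CostRecord` type, its field names, and the 10% tolerance are assumptions for illustration, not Stategraph's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRecord:
    # Hypothetical record shape; field names are illustrative only.
    resource: str             # Terraform resource address
    estimated_monthly: float  # estimate calculated from the plan
    actual_monthly: float     # spend reported by billing after month end


def variance(record: CostRecord) -> float:
    """Actual minus estimate; positive means it cost more than planned."""
    return record.actual_monthly - record.estimated_monthly


def over_estimate(records, tolerance=0.10):
    """Return addresses whose actual spend exceeds the estimate by more than `tolerance`."""
    flagged = []
    for r in records:
        if r.estimated_monthly == 0:
            # No estimate to compare against: any real spend is a surprise.
            if r.actual_monthly > 0:
                flagged.append(r.resource)
            continue
        if variance(r) / r.estimated_monthly > tolerance:
            flagged.append(r.resource)
    return flagged
```

A resource estimated at 100 a month that bills 125 would be flagged; one estimated at 20 that bills 21 would not.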

The part we spent the most time on was where the cost shows up. It's in the terraform plan output, next to the resource changes - current cost versus planned, inline. Same whether you run the plan on your laptop or in CI. You see what the change does and what it costs in the same place, before you approve it.
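As a rough illustration of current-versus-planned cost sitting next to each resource change, the sketch below annotates plan entries with cost lines. It assumes the `address` and `change.actions` fields from Terraform's JSON plan representation; the `costs` mapping and the line format are hypothetical and not Stategraph's actual output.

```python
def render_cost_lines(resource_changes, costs):
    """Build one line per resource change showing current vs planned monthly cost.

    `resource_changes` follows the shape of Terraform's JSON plan output.
    `costs` is a hypothetical mapping: address -> (current, planned).
    """
    lines = []
    total = 0.0
    for rc in resource_changes:
        address = rc["address"]
        actions = "/".join(rc["change"]["actions"])
        current, planned = costs.get(address, (0.0, 0.0))
        delta = planned - current
        total += delta
        lines.append(
            f"{address} ({actions}): ${current:.2f} -> ${planned:.2f}/mo ({delta:+.2f})"
        )
    lines.append(f"Total monthly change: {total:+.2f}")
    return lines
```

Because the function only needs the plan and a cost lookup, the same rendering would work on a laptop or in CI.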

If it's not in the plan, you won't look

A cost number in a separate dashboard only gets checked when someone remembers to check it, which in practice is hardly ever. Putting current-versus-planned cost in the plan output means you see it during the review you were already doing, on every change, not just the ones someone thought to double-check.

Budgets that can stop an apply

Seeing the cost is half of it. The other half is doing something when it's wrong. You can set a budget on a state and attach a guardrail that fires when the budget is exceeded. The guardrail either warns the person running the change or blocks the apply. You pick which.

Blocking is the useful case. A warning is easy to click past. Blocking an apply is enforcement, and because every operation in Stategraph goes through the server, that's a real place to enforce it rather than a rule everyone is supposed to keep in their head. The budget and the thing that acts on it sit on the same path as the apply.
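The warn-or-block choice can be sketched as a small decision function. This is an assumption-laden illustration: the enum names, the `evaluate_guardrail` signature, and the "projected spend over budget" rule are hypothetical, not Stategraph's implementation.

```python
from enum import Enum


class Action(Enum):
    # What the guardrail is configured to do when the budget is exceeded.
    WARN = "warn"
    BLOCK = "block"


class Decision(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


def evaluate_guardrail(budget, current_spend, planned_delta, action):
    """Decide what happens to an apply given a per-state budget.

    Assumes the guardrail fires when current spend plus the planned
    change would exceed the budget.
    """
    projected = current_spend + planned_delta
    if projected <= budget:
        return Decision.ALLOW
    return Decision.BLOCK if action is Action.BLOCK else Decision.WARN
```

Run on the server path that every apply goes through, a `Decision.BLOCK` result is where the apply would be refused rather than merely flagged.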

Cost and infrastructure data are also available over the API as JSON, so you can pull it into whatever finance or reporting tooling you already run, or build your own alerts. We'd rather be the structured data your tools sit on top of than try to be the one dashboard you check.
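For a sense of what building on the JSON might look like, here is a small sketch that rolls cost up by team. The payload shape (`state`, `team`, `monthly_cost`) is invented for the example; the real API responses may look quite different.

```python
import json
from collections import defaultdict

# Hypothetical payload; field names are assumptions for illustration.
SAMPLE_PAYLOAD = """
[
  {"state": "prod-network", "team": "platform", "monthly_cost": 420.5},
  {"state": "prod-cluster", "team": "platform", "monthly_cost": 79.5},
  {"state": "analytics", "team": "data", "monthly_cost": 120.0}
]
"""


def cost_by_team(payload: str) -> dict:
    """Sum monthly cost per team from a JSON array of per-state cost rows."""
    totals = defaultdict(float)
    for row in json.loads(payload):
        totals[row["team"]] += row["monthly_cost"]
    return dict(totals)
```

The same rollup could feed a finance report or a custom alert once the real field names are substituted.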

Stategraph Cloud

We also launched Stategraph Cloud, a hosted option for teams that don't want to run the backend themselves. It's the same Stategraph, just managed by us - a deployment choice, not a separate product.

One thing we said in the demo and want to repeat: if you need tight control over secrets, stay self-hosted. Terraform state is full of sensitive values, and for a lot of teams that alone settles where it should run. Stategraph Cloud is there to save you the operational work if you're fine with that tradeoff. We're not steering anyone off self-hosting - both are fully supported.

We showed cross-state transactions again too, now with cost attached. Change a value in one root module and Stategraph follows the terraform_remote_state references already in your code, pulls the dependent states into the same plan, and shows the full blast radius - including what it does to cost - in one operation. You don't have to plan each module separately and add up the numbers yourself.
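The blast-radius idea can be sketched as a graph walk over state dependencies. This is a simplified illustration: the `consumers` mapping (state to the states that read its outputs through terraform_remote_state) and the per-state cost deltas are hypothetical inputs, and the real engine presumably does considerably more.

```python
from collections import deque


def blast_radius(changed, consumers):
    """Return the changed state plus every state that transitively depends on it.

    `consumers` maps a state name to the states that read its outputs.
    Breadth-first, so nearer dependents come first.
    """
    seen = {changed}
    order = [changed]
    queue = deque([changed])
    while queue:
        state = queue.popleft()
        for dependent in consumers.get(state, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def total_cost_delta(states, deltas):
    """Add up the monthly cost change across all affected states."""
    return sum(deltas.get(s, 0.0) for s in states)
```

With the affected states collected in one pass, the combined cost change is a single sum rather than something added up by hand across separate plans.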

The questions

The Q&A is usually the most useful part, and this time it was mostly about cost governance and how cost lines up with the parts of Stategraph we'd already shipped.

Someone asked whether budgets are native to Stategraph or tied to cloud provider budgets. They're native right now. Keeping them in Stategraph is what lets a single budget cover spend across more than one cloud, which a provider's own budget can't do. We might integrate with cloud provider budgets later, but multi-cloud is why they live here for now.

Another question was about the JSON output - whether it's some custom schema. It isn't. We stay compatible with the official Terraform schema. There's extra data available through the CLI and API, but we didn't invent a separate format you'd have to teach your other tools to read.

A trickier one: declarative infrastructure doesn't always match how cloud APIs actually manage resource lifecycles, so why not rewrite the engine to deal with that? Because a rewrite would break compatibility, and that cost lands on the people using it. We've stayed backward compatible with Terraform on purpose. We'd rather keep the switch to Stategraph small than make the engine cleaner at the price of everyone relearning how their infrastructure behaves.

Last one was about drift. How do you detect it without everything getting slow again? Same answer as plan and apply: Stategraph only refreshes the subgraph that's relevant to a change instead of the whole state. If you want full coverage, run a complete drift check as its own scheduled job, and there are escape hatches to force specific resources into a run when you need to.
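One way to picture refreshing only the relevant subgraph, with an escape hatch for forced resources, is the sketch below. It assumes "relevant" means the changed resources plus everything they transitively depend on; that definition, the `depends_on` mapping, and the `forced` parameter are illustrative guesses rather than Stategraph's actual selection logic.

```python
def refresh_set(changed, depends_on, forced=()):
    """Pick the resources to refresh for a change.

    Selects the changed resources and their transitive dependencies,
    then adds any explicitly forced addresses.
    `depends_on` maps a resource address to the addresses it depends on.
    """
    to_visit = list(changed)
    selected = set()
    while to_visit:
        address = to_visit.pop()
        if address in selected:
            continue
        selected.add(address)
        to_visit.extend(depends_on.get(address, []))
    return selected | set(forced)
```

Anything outside the returned set is left for the scheduled full drift check.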

What's next

Two things we previewed. The first is sessions - generating scoped tokens that let a person, an AI agent, or another machine operate on specific states or resources, with limits on what they can touch. You can hand out a token that's allowed to plan but not apply, for example. As more of this gets driven by agents, being able to bound exactly what one can do matters more.
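Since sessions were only previewed, the following is purely a sketch of what a scoped token check could look like. The `SessionToken` type, its fields, and `is_allowed` are hypothetical names, not a published interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionToken:
    # Hypothetical token shape for illustration only.
    subject: str           # person, agent, or machine the token was issued to
    states: frozenset      # states the token may operate on
    operations: frozenset  # operations permitted, e.g. {"plan"}


def is_allowed(token: SessionToken, state: str, operation: str) -> bool:
    """Permit an operation only if both the state and the operation are in scope."""
    return state in token.states and operation in token.operations
```

A token issued with `operations=frozenset({"plan"})` would pass a plan request and fail an apply on the same state, which is the plan-but-not-apply case described above.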

The second is Terragrunt support. The goal is to make switching cheap - keep your existing setup working instead of making you rewrite it, the same way we've handled Terraform.

Same loop as always: show the work, listen to the questions, build what they point at. This time the questions were about controlling cost and trusting drift detection. We'll show where that goes next demo day.

Follow along as we build Stategraph

This is the latest demo day in an ongoing series. We're building Stategraph in the open, sharing progress, technical decisions, and the engineering challenges along the way. If you want to follow the journey or get involved as a design partner, subscribe for updates.