
AWS Cost Optimization with Terraform: A Practical Guide

Terraform Cost FinOps AWS

Most AWS cost problems are not billing problems; they are infrastructure decisions made without cost context. Terraform gives teams the ability to encode cost-saving practices directly into the configurations they already review, version, and deploy.

TL;DR
$ cat aws-cost-optimization-terraform.tldr
• Cloud waste is widespread: the 2025 Flexera State of the Cloud Report found 27% of cloud spend is estimated to be wasted, unchanged from the prior two years.
• Terraform addresses the root cause of common cost drivers by making cost-saving practices repeatable and reviewable, not reactive.
• The most effective strategies all live naturally in HCL: right-sizing, lifecycle policies, tagging, and purchase commitments.
• Cost estimation embedded in the Terraform plan workflow closes the loop between infrastructure decisions and their financial impact.

AWS cloud spend is growing. What is less often acknowledged is how much of that spending is structurally preventable. Not through aggressive cutbacks or architectural overhauls, but through the ordinary infrastructure decisions that happen every time an engineer writes a resource block and submits a pull request.

The Flexera 2026 State of the Cloud Report found that 29% of cloud spend is estimated to be wasted, up from 27% in 2025.

The trend suggests that organizations are aware of the problem and actively working on it, yet still lose more than a quarter of their cloud budget to waste.

The main challenge companies face is that most AWS cost optimization happens after the fact: in billing consoles, monthly FinOps reviews, and retrospectives triggered by a bill that came in higher than expected.

By that point, the infrastructure changes that caused the spend have already been applied, context has dissolved into merged pull requests, and the engineers who made those decisions have moved on. Reacting to a cloud bill is expensive. Shaping infrastructure decisions before they land is cheap.

Terraform is the layer where those decisions actually happen. When instance types, storage classes, lifecycle policies, and tagging standards are encoded in HCL, they pass through the same review process as any other code change. Cost-impacting decisions become visible, discussable, and revisable before they are applied.

This guide covers the practical strategies that make that possible, and closes with the cost visibility layer that makes those strategies stick.

Why AWS costs grow without intervention

Over-provisioning is the most common culprit, and the most forgivable one. When an engineer provisions compute resources for the first time, picking a larger instance type feels like responsible engineering: better safe than sorry, plenty of headroom, easy to scale down later.

The problem is that "later" almost never arrives. Idle capacity at off-peak hours (e.g., a staging environment running an m6i.large overnight because no one owns the decision to downsize it) accumulates into meaningful AWS spend across a fleet of environments.

Then there are orphaned resources, like detached EBS volumes, RDS snapshots beyond any reasonable retention window, S3 objects accumulating in Standard storage long after they stopped being actively accessed, and dev environments left running through weekends. None of these appear in any alert, and none of them show up in a code review. They just grow, quietly, on the AWS bill.

Usage-based cost growth compounds the problem. Storage request volume, data transfer costs across availability zones and regions, and Lambda invocations tied to a growing user base all expand alongside traffic, which is expected, but without any mechanism to notice when growth is outpacing its value.

And then there’s the underlying cause: lack of visibility. Engineers reviewing a Terraform diff see a resource declaration, not a dollar amount. The cost signal is missing at exactly the moment when the decision is cheapest to change.

How Terraform helps with AWS cost optimization

Terraform does not reduce the cost of AWS services on its own, but it does create the conditions under which cost-saving decisions become repeatable, reviewable, and enforceable.

When instance types, storage classes, and lifecycle rules live in HCL rather than in someone's manual configuration, cost decisions go through a pull request. As a result, they can be questioned, adjusted, and approved before they reach production.

A reviewer who notices that a staging environment is configured with the same instance size as production can flag it in the diff, not six weeks later in a cloud bill. Infrastructure as code turns cost decisions into policy decisions, whether or not anyone explicitly frames them that way.

Configuration drift is a related problem.

Manually resized instances or ad hoc resources created outside Terraform, which seemed reasonable in the moment, fall outside the visibility of any automated review. Managing infrastructure consistently with Terraform means that cost-impacting changes leave a paper trail, while the state file reflects what is actually running.

Design Principle

Shared modules are the best place to set cost-conscious defaults. Think of a module as a template: the engineering team builds it once, with sensible instance sizes, storage choices, and tagging baked in, and every new service that spins up starts from that template automatically. Good defaults propagate across the whole infrastructure without anyone having to remember to apply them.
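
As a minimal sketch of that idea, a shared module might expose its cost-relevant inputs as variables with conservative defaults. The variable names, default values, and allow-list below are hypothetical, not taken from any particular module:

```hcl
# Hypothetical inputs for a shared compute module.
variable "instance_type" {
  description = "EC2 instance type; callers override the default only when a workload needs it."
  type        = string
  default     = "t3.small"

  validation {
    # Illustrative allow-list; adjust to the sizes your organization permits.
    condition     = contains(["t3.micro", "t3.small", "t3.medium", "m6i.large"], var.instance_type)
    error_message = "instance_type must be one of the sizes approved for this module."
  }
}

variable "root_volume_size" {
  description = "Root volume size in GiB."
  type        = number
  default     = 20
}
```

A service that needs something larger passes a different value, and that override shows up in the pull request diff where a reviewer can question it.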

Right-sizing EC2 instances in Terraform

Right-sizing is the practice of matching instance type to actual workload requirements, rather than provisioning for a theoretical peak that may never materialize. It is one of the highest-impact levers for reducing AWS cloud costs, and one that Terraform is well-positioned to enforce.

Burstable instance families (the t3 and t4g families in particular) are appropriate for workloads that do not need sustained CPU. Web servers handling moderate traffic, background workers with uneven load patterns, and developer tooling that sits idle most of the time are all good candidates.
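
One cost detail is worth encoding for burstable instances: T3, T3a, and T4g instances launch in unlimited credit mode by default, which bills for surplus CPU credits when a workload bursts for sustained periods. A minimal sketch, assuming a hypothetical var.ami_id and resource name, pins the mode to standard so the instance is throttled rather than billed extra:

```hcl
# Hypothetical input; supply an AMI that matches the instance architecture.
variable "ami_id" {
  type = string
}

resource "aws_instance" "worker" {
  ami           = var.ami_id
  instance_type = "t3.small"

  # "standard" throttles to baseline CPU once credits run out
  # instead of charging for surplus credits.
  credit_specification {
    cpu_credits = "standard"
  }
}
```

Whether throttling is acceptable depends on the workload; for latency-sensitive services, unlimited mode with monitoring may be the better trade-off.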

AWS Compute Optimizer surfaces specific right-sizing recommendations by analyzing actual utilization data, and its suggestions should drive Terraform changes rather than intuition.

The most common right-sizing failure is staging environments that mirror production instance types. There’s rarely a workload justification for this. A variable-based approach to instance type selection, set per environment, prevents the pattern from recurring:

variable "environment" {
  type = string
}

variable "ami_id" {
  type = string
}

locals {
  # One instance type per environment, defined in a single place.
  instance_type = {
    production = "m6i.large"
    staging    = "t3.small"
    dev        = "t3.micro"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = local.instance_type[var.environment]
  # ...
}

With this pattern, the decision about what staging and dev environments are allowed to run is encoded once, in a shared module, and applied consistently across every deployment. Engineers who want to deviate from the standard have to make the case in a pull request, and that small amount of friction is often enough to keep oversized instances out of non-production environments.

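Compute Optimizer's suggestions can also be retrieved programmatically, through the Compute Optimizer API or the Cost Explorer GetRightsizingRecommendation operation, and compared against the instance types declared in Terraform. An aws_ce_rightsizing_recommendation data source is sometimes cited for this, but confirm that it exists in your AWS provider version before relying on it; a script or CI check against the API gives teams the same way to track alignment between their configurations and AWS's own assessment of what those workloads actually require.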

Auto Scaling and Spot Instances

Provisioning for peak capacity and leaving it running is the most direct way to accumulate idle compute costs. Auto Scaling and Spot Instances address different parts of that problem.

Auto Scaling, managed through aws_autoscaling_group in Terraform, matches capacity to demand rather than provisioning a fixed fleet sized for the worst case.

At off-peak hours, such as overnight, on weekends, or during seasonal lulls, the group scales down, and you pay for what you use. Idle capacity at off-peak hours is pure waste. Auto Scaling makes it unnecessary.
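
For non-production groups whose idle hours are predictable, scheduled actions are one way to express this in HCL. The sketch below assumes an aws_autoscaling_group.app resource defined elsewhere in the configuration; the action names, times, and sizes are illustrative:

```hcl
# Scale to zero on weekday evenings; the group stays at zero through the weekend.
resource "aws_autoscaling_schedule" "scale_in_evening" {
  scheduled_action_name  = "scale-in-evening"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
  recurrence             = "0 20 * * 1-5" # 20:00, Monday to Friday
  time_zone              = "Etc/UTC"
}

# Restore working-hours capacity on weekday mornings.
resource "aws_autoscaling_schedule" "scale_out_morning" {
  scheduled_action_name  = "scale-out-morning"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 1
  max_size               = 4
  desired_capacity       = 1
  recurrence             = "0 7 * * 1-5" # 07:00, Monday to Friday
  time_zone              = "Etc/UTC"
}
```

Production groups usually rely on dynamic scaling policies instead, since their demand does not follow office hours.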

Spot Instances go further. AWS makes unused EC2 capacity available at up to 90% off on-demand pricing, in exchange for the possibility of a two-minute interruption notice when AWS needs the capacity back.

For stateless, fault-tolerant workloads (batch processing, CI/CD runners, data transformation pipelines, non-critical background workers), the benefits are clear. However, for stateful workloads or anything that cannot tolerate interruption, Spot is not appropriate.

The mixed_instances_policy block in aws_autoscaling_group allows teams to blend on-demand baseline capacity with Spot for variable workloads, capturing most of the savings while preserving reliability where it matters:

variable "subnet_ids" {
  type = list(string)
}

# Requires an aws_launch_template resource defined elsewhere in your configuration
resource "aws_autoscaling_group" "app" {
  name                = "app-asg"
  vpc_zone_identifier = var.subnet_ids
  min_size            = 1
  max_size            = 10
  desired_capacity    = 2

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 20
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id
        version            = "$Latest"
      }

      override {
        instance_type = "t3.medium"
      }

      override {
        instance_type = "t3a.medium"
      }
    }
  }
}

One on-demand instance is maintained as a baseline; capacity above that is split 20% on-demand and 80% Spot.

The capacity-optimized strategy picks whichever pool of Spot capacity has the most availability, which means your instances are less likely to get interrupted.

Storage cost optimization with lifecycle policies

Storage costs accumulate gradually, in the background, with no single change large enough to trigger an alert. By the time someone notices the line item, months of unnecessary spending have already landed on the bill.

Terraform provides direct mechanisms to prevent this.

S3 is the most common surface. Amazon Simple Storage Service charges Standard storage rates for infrequently accessed data that was never transitioned to a cheaper storage tier, not because teams made a deliberate decision to pay more, but because no one configured a lifecycle rule when the bucket was created, and no one owns the storage configuration afterwards.

aws_s3_bucket_lifecycle_configuration makes that configuration explicit and reviewable:

resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = aws_s3_bucket.example.id

  rule {
    id     = "transition-and-expire"
    status = "Enabled"

    filter {}

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    expiration {
      days = 90
    }
  }
}

Objects transition to S3 Standard-IA at 30 days and expire at 90. The specific thresholds should reflect actual data access patterns.

The example above is a reasonable starting point for logs or build artifacts, less so for objects that will be queried again at 60 days. The important thing is that the policy exists at all, and that it is subject to review.
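
Two other lifecycle settings often matter for the same kinds of buckets: expiring noncurrent versions on versioned buckets and aborting incomplete multipart uploads, both of which otherwise accrue storage charges out of sight. A sketch, assuming a hypothetical logs bucket and a logs/ prefix; the day counts are illustrative. A bucket takes a single lifecycle configuration, so in practice these rules would sit in the same resource as any transition and expiration rules:

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # hypothetical name; bucket names are globally unique
}

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "cleanup-hidden-storage"
    status = "Enabled"

    filter {
      prefix = "logs/"
    }

    # Only has an effect when versioning is enabled on the bucket.
    noncurrent_version_expiration {
      noncurrent_days = 30
    }

    # Removes the parts of uploads that were started but never completed.
    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }
}
```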

Orphaned EBS volumes are a related problem. When an EC2 instance is terminated and its EBS volume is not, the volume continues to accrue charges with no workload justifying the cost.

For EBS volumes attached outside of Terraform, created manually or by other tooling, there is no lifecycle management unless they are brought under Terraform control. For volumes managed within aws_instance, the delete_on_termination argument on the root_block_device block defaults to true, but for additional ebs_block_device volumes, the behavior should be explicitly set per workload rather than assumed.
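
Setting the argument explicitly on every block device makes the intent visible in review. A minimal sketch, assuming a hypothetical var.ami_id; the device name and volume sizes are illustrative:

```hcl
variable "ami_id" {
  type = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.small"

  root_block_device {
    volume_type           = "gp3"
    volume_size           = 20
    delete_on_termination = true
  }

  ebs_block_device {
    device_name = "/dev/sdf"
    volume_type = "gp3"
    volume_size = 50

    # Set to false only when the data must outlive the instance,
    # and then make sure something else owns the volume's cleanup.
    delete_on_termination = true
  }
}
```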

RDS snapshot retention is another place where costs quietly pile up.

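The backup_retention_period argument on aws_db_instance sets how many days automated backups are kept, and many teams either leave it at the default or set it high without reviewing it. Manual snapshots are not covered by this setting and persist until someone deletes them, so they need a cleanup policy of their own.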

A cost-appropriate retention period (long enough for meaningful recovery windows, short enough to avoid paying for snapshots that serve no recovery purpose) should be in the Terraform configuration and reviewed alongside the rest of the database resource.
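
A sketch of what that can look like, with retention set per environment. The identifier, engine, instance class, and day counts are hypothetical placeholders, and a real database resource would carry more configuration than shown here:

```hcl
variable "environment" {
  type = string
}

variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "app" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20
  username          = "app"
  password          = var.db_password

  # Automated backup retention in days: longer for production, minimal elsewhere.
  backup_retention_period = var.environment == "production" ? 14 : 1

  # Keep a final snapshot only where the data is worth paying to retain.
  skip_final_snapshot       = var.environment != "production"
  final_snapshot_identifier = var.environment == "production" ? "app-db-final" : null
}
```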

Tagging for cost allocation

Without reliable tags, AWS Cost Explorer and cost allocation reports cannot tell you which team, product, or environment is responsible for a given line item. Without that attribution, cost optimization is guesswork.

Terraform enforces tagging at the point of deployment through two complementary mechanisms. The default_tags block in the AWS provider applies a base set of tags to every resource in the configuration without requiring repetition on each individual resource:

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = var.environment
      Team        = var.team
      CostCenter  = var.cost_center
    }
  }
}

Modules can extend this by requiring tags as input variables, and teams can write validation rules directly in Terraform to reject configurations missing required tags, turning a social convention into a hard constraint.
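
A minimal sketch of such a validation rule, assuming a module that takes a hypothetical tags input and requires the same three keys used in the provider example:

```hcl
variable "tags" {
  description = "Tags applied to every resource in this module."
  type        = map(string)

  validation {
    # The required keys are listed inline so the rule works on Terraform
    # versions where a validation can only reference its own variable.
    condition = alltrue([
      for key in ["Environment", "Team", "CostCenter"] : contains(keys(var.tags), key)
    ])
    error_message = "tags must include Environment, Team, and CostCenter."
  }
}
```

A caller that omits CostCenter fails at plan time with the error message above, before anything is created.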

Tagged resources feed into AWS Cost Explorer and AWS Cost Allocation Tags, enabling spend to be broken down by team, environment, or component. Tags set in Terraform also resist configuration drift: if a tag is changed or removed outside Terraform, the next plan shows the difference and the next apply restores it.
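
One step sits between tagging and reporting: user-defined tags only become a dimension in Cost Explorer after their keys are activated as cost allocation tags in the billing account. A hedged sketch using the AWS provider's aws_ce_cost_allocation_tag resource, assuming the tag keys from the default_tags example and a provider configured for the billing (management) account; check the resource against your provider version before relying on it:

```hcl
resource "aws_ce_cost_allocation_tag" "required" {
  for_each = toset(["Environment", "Team", "CostCenter"])

  tag_key = each.value
  status  = "Active"
}
```

A tag key generally cannot be activated until AWS has seen it on at least one resource, so the first apply may need to wait until the tags have appeared in billing data.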

Reserved Instances and Savings Plans

For teams with predictable, stable workloads, commitment-based pricing is the single largest lever for reducing AWS costs.

Standard Reserved Instances offer up to 72% off on-demand pricing, and EC2 Instance Savings Plans offer a comparable discount with more flexibility around instance size and operating system within a given family.

Compute Savings Plans offer up to 66% off on-demand rates with the broadest flexibility, covering EC2, Fargate, and Lambda usage across regions and families.

Reserved Instances and Savings Plans are billing constructs, not Terraform resources. The purchase happens outside Terraform, in the AWS Console or through the API, and it covers usage rather than creating infrastructure.

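Terraform needs to stay aligned with whatever has been purchased. If a team purchases Reserved Instances for m6i.large in us-east-1 and the Terraform configuration deploys c6i.large instances, the reservation does not apply, and the discount is lost. A size change within the same family is more forgiving, because regional Linux Reserved Instances with default tenancy apply across sizes in a family, but a change of family, region, or platform forfeits the discount.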
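
One way to keep a configuration inside the family a commitment covers is a variable validation. A minimal sketch, assuming a hypothetical commitment on the m6i family and Terraform 1.3 or later for the startswith function:

```hcl
variable "instance_type" {
  type    = string
  default = "m6i.large"

  validation {
    # Hypothetical rule: the current reservation covers the m6i family only.
    condition     = startswith(var.instance_type, "m6i.")
    error_message = "instance_type must stay in the m6i family covered by the current reservation."
  }
}
```

The rule has to be revisited whenever the commitment changes, which is a reasonable prompt to review the two together.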

The two models differ in flexibility. Reserved Instances commit to a specific instance type and region for a one-year or three-year term. Savings Plans commit to a dollar amount of hourly usage, which gives teams more room to change configurations without forfeiting the discount.

Two principles apply before purchasing either:

Right-size the configuration first

Committing to the wrong instance type locks in a discount against something the workload does not actually need.

Consider timing

Committing for one or three years is not appropriate for workloads still being resized, services approaching decommission, or infrastructure whose future shape is uncertain.

Right-sizing recommendations from Compute Optimizer or the Cost Explorer API provide a programmatic way to validate alignment between the current configuration and what AWS's own analysis suggests those workloads require, before a commitment locks that configuration in.

Bringing cost visibility into the Terraform workflow

Every strategy covered in this guide shares the same gap: not one of them tells an engineer, at the point of reviewing a plan, what the proposed change will actually cost.

This is the shift-left principle applied to cost. Security teams learned a version of this lesson some time ago: catching a misconfiguration in a pull request is orders of magnitude cheaper than discovering it after a breach.

Cost works the same way. When an engineer can see that a proposed change increases the monthly spend by a specific amount, that information shapes the review, and sometimes the decision.

Stategraph Cost closes that gap without requiring a separate integration. Cost estimation is built directly into Stategraph's Terraform plan output, and there is no additional CLI tool, no separate CI/CD step, and no additional configuration to maintain. The estimate appears in the same output that the engineer is already reviewing, loading in under a second, adding no meaningful friction to the workflow.

After a change is applied, Stategraph compares the original estimate against actual costs using daily refreshes from cloud providers. That actuals-versus-estimates comparison is something most standalone cost estimation tools do not offer, transforming cost estimation from a one-time signal into a feedback loop.

Stategraph Cost provides directional estimates at plan time, not exact figures. Like all cost estimation products, it cannot account for enterprise pricing, reserved instance discounts, or usage-based variables that the Terraform configuration does not know at plan time.

For a full comparison of Terraform cost estimation tools – including Infracost, Terracost, and HCP Terraform's native feature – see the Terraform cost estimation guide. The value of in-workflow visibility is the signal, not the precision.

Stategraph Cost is included in both the free and starter tiers. The Stategraph docs cover how cost estimation fits into the broader plan workflow, and the website explains how Stategraph works.

AWS cost optimization starts in the plan

AWS cost optimization is a set of practices that are effective only when they are codified in Terraform and subject to the same review process as every other infrastructure decision.

The cloud bill is not the right place to notice a cost problem. By the time the bill arrives, the decisions that caused it are weeks or months old, the context has disappeared, and the only option is remediation rather than prevention.

The plan is where the leverage is. Engineers reviewing a Terraform diff can shape infrastructure decisions before they are applied, provided they have the cost signal they need to do so.

Stategraph gives teams cost estimates at the point where they have the biggest impact: before the apply. Start for free, book a demo, or explore the Stategraph docs to see how cost visibility fits into your Terraform workflow.

AWS cost optimization FAQs

What is AWS cost optimization?

AWS cost optimization is the ongoing practice of reducing unnecessary cloud spend while maintaining the performance and reliability that workloads require.

It spans a range of strategies, from right-sizing compute instances and applying storage lifecycle policies to purchasing Reserved Instances or Savings Plans for predictable workloads, and is most effective when built into the engineering workflow rather than treated as a periodic audit.

Tools like AWS Cost Explorer and the AWS Cost Optimization Hub provide visibility into spending patterns and surface recommendations, while approaches like tagging and infrastructure-as-code give teams the attribution and review mechanisms to act on them and improve cloud cost management.

How do I tag resources for cost allocation in Terraform?

The most reliable approach is the default_tags block in the AWS provider configuration, which applies a base set of tags to every resource managed by that provider without requiring repetition on each individual resource.

Modules can reinforce this by accepting tags as required input variables, and validation rules within Terraform can reject configurations that do not meet tagging standards.

Tagged resources feed into AWS Cost Explorer and AWS Cost Allocation Tags, which makes it possible to break down cloud spend by team, environment, or component, and understand which workloads are driving which cloud costs.

Can Terraform estimate AWS costs before deployment?

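The open-source Terraform CLI does not include native cost estimation. Several tools address this gap, including Infracost, Terracost, HCP Terraform's native cost estimation feature, and Stategraph Cost.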

All of these provide directional estimates rather than exact figures, because Terraform configurations cannot know at plan time what usage patterns, enterprise discounts, or reserved instance arrangements will apply once resources are deployed.