Most production infrastructure starts as console clicks. An engineer needed to ship something, the AWS console was open, the buttons existed, and an hour later the service was running on resources nobody had named or configured deliberately. The setup worked. It also had a security group that allowed 0.0.0.0/0 because that was the option that made the test work, and an IAM role with a wildcard policy because the engineer was tired of guessing scopes, and a VPC structure that made sense for the moment but did not anticipate the second service that would join it three months later. Each click was small. The cumulative result is the cloud account that nobody fully understands and that everyone is afraid to touch.
Infrastructure as code fixes the long-term problem by making the infrastructure reviewable, reproducible, and version-controlled. The fix only works when the IaC is written correctly: a Terraform file with the same wildcard IAM policy is no better than the console click that produced it. Mainstream AI tools generate IaC the same way they generate any other code: from the prompt, with reasonable defaults that are usually too permissive, and without a coherent picture of how the resources fit together. The /forge-infra skill exists to write IaC the way a senior infrastructure engineer would: with the network designed before the resources, the IAM scoped to the actual access patterns, and the modules structured so the second service can join without rewriting the first.
Why generalist AI generates fragile IaC
Ask Cursor or ChatGPT for a Terraform file that runs a Postgres database in AWS. You get an aws_db_instance resource. The instance is in the default VPC. The security group is permissive. The parameter group is the default. Storage is gp2, the instance class is db.t3.medium, and there is no read replica, no backup window, no maintenance window, and no encryption at rest. The resource technically works. It is also the database that turns into an incident report the first time something goes wrong, and into a cost review when someone notices the team has been paying for defaults nobody chose. The output looks correct in isolation; it is not correct in the context of a production environment, and a generalist tool cannot tell the difference.
The deeper issue is that infrastructure is composed. A database needs a VPC, the VPC needs subnets, the subnets need a route table, the route table needs a NAT gateway if the database needs outbound internet access, and the NAT gateway needs an Elastic IP. Each resource depends on the others, and each resource has options that depend on the surrounding context. A generalist tool can produce any one of these resources, but it cannot design the system. The result is a Terraform file that validates and applies but produces an environment that is not actually correct: too permissive, too expensive, or too brittle to operate.
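The dependency chain described above can be made concrete with a minimal sketch. Everything here is illustrative: the resource names, CIDR ranges, and single-AZ layout are assumptions chosen to keep the example short, and `domain = "vpc"` assumes version 5 or later of the AWS provider.

```hcl
# Illustrative only: names, CIDRs, and the single-AZ layout are assumptions.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# The NAT gateway lives in a public subnet, which needs an internet gateway.
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "public" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = data.aws_availability_zones.available.names[0]
}

resource "aws_subnet" "private" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.10.0/24"
  availability_zone = data.aws_availability_zones.available.names[0]
}

# Public subnet routes directly to the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

# The NAT gateway needs an Elastic IP.
resource "aws_eip" "nat" {
  domain = "vpc" # AWS provider v5+; older versions used `vpc = true`
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
  depends_on    = [aws_internet_gateway.main]
}

# Private subnet reaches the internet only through the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}
```

None of these resources is complicated on its own. The design work is in the references between them, which is the part a single-resource answer leaves out.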
What production-grade IaC actually requires
A production-grade IaC layout covers six things. First, a clear network topology: VPCs, subnets (public and private separated), route tables, NAT gateways, and the security groups that control traffic between them. Second, IAM scoped to the actual access patterns: a role per service, a policy per role, the policy listing the specific resources it can act on. Third, encryption at rest by default: EBS volumes, RDS storage, S3 buckets, all encrypted with KMS keys. Fourth, backups and snapshots configured: RDS automated backups with the right retention, S3 versioning where it matters, snapshot lifecycle for EBS. Fifth, observability hooks: CloudWatch alarms on the metrics that matter, logs flowing to a central place, traces if the project uses them. Sixth, a module structure that lets future services slot in: a network module, a database module, a service module, all reusable rather than copy-pasted.
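As one illustration of the third and fourth items, here is a hedged sketch of an S3 bucket with KMS encryption, versioning, and public access blocked, plus account-level default encryption for new EBS volumes. The bucket name and resource labels are hypothetical.

```hcl
# Hypothetical names; the bucket name must be globally unique in practice.
resource "aws_kms_key" "uploads" {
  description         = "Encrypts objects in the uploads bucket"
  enable_key_rotation = true
}

resource "aws_s3_bucket" "uploads" {
  bucket = "example-org-uploads"
}

# Encrypt every object with the KMS key above unless a request says otherwise.
resource "aws_s3_bucket_server_side_encryption_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.uploads.arn
    }
  }
}

# Versioning keeps prior object versions recoverable after overwrite or delete.
resource "aws_s3_bucket_versioning" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "uploads" {
  bucket                  = aws_s3_bucket.uploads.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Account- and region-wide: newly created EBS volumes are encrypted by default.
resource "aws_ebs_encryption_by_default" "this" {
  enabled = true
}
```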
Each of these layers has trade-offs. Strict IAM is the right default and adds friction during initial setup; the trade-off is justified by the security gain. Backups have a cost; the cost is justified by the rare bad day. Module structure feels like over-engineering on day one and pays off the first time a second service joins the account. The discipline of getting these right upfront is the discipline that distinguishes infrastructure that will scale from infrastructure that will be migrated again in eighteen months.
How /forge-infra works
Step one: detect provider and existing infra
Before writing any IaC, /forge-infra reads the project to detect the cloud provider (AWS, GCP, Azure, or multi-cloud), the IaC tool the project uses (Terraform, OpenTofu, Pulumi, or AWS CDK), and any existing infrastructure already defined. The detection drives the output: a Terraform project gets Terraform, a Pulumi project gets Pulumi, and the new resources are added to the existing module structure rather than created in isolation.
Step two: design the network first
Network is the foundation of every other resource, so it is designed first. The skill produces a VPC and subnet structure calibrated to the use case: public subnets for the load balancer, private subnets for application instances, isolated subnets for the database. NAT gateways are placed deliberately (one per availability zone, or shared where cost matters more than resilience), route tables are configured explicitly, and security groups are scoped to specific protocols and ports between specific groups. The network is reusable: subsequent services join the same VPC by importing the network module.
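"Scoped to specific protocols and ports between specific groups" can look like the sketch below: the database accepts Postgres traffic only from the API's security group, not from a CIDR range. The variable and group names are hypothetical, and the standalone rule resources assume a reasonably recent AWS provider.

```hcl
# Hypothetical input; in a real layout this would come from the network module.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "api" {
  name_prefix = "api-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "db" {
  name_prefix = "db-"
  vpc_id      = var.vpc_id
}

# The database accepts Postgres connections from the API group and nothing else.
resource "aws_vpc_security_group_ingress_rule" "db_from_api" {
  security_group_id            = aws_security_group.db.id
  referenced_security_group_id = aws_security_group.api.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "Postgres from API instances only"
}

# The API may open connections to the database on the same port.
resource "aws_vpc_security_group_egress_rule" "api_to_db" {
  security_group_id            = aws_security_group.api.id
  referenced_security_group_id = aws_security_group.db.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "Postgres to the database group"
}
```

Referencing a group instead of a CIDR block means the rule keeps working when instances are replaced or subnets are resized.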
Step three: compute, storage, and IAM
On top of the network, /forge-infra produces the compute, storage, and IAM. Compute is right-sized to the expected load with autoscaling configured. Storage uses the right service for the workload (RDS for relational, DynamoDB or Firestore for key-value, S3 for object). IAM is per-service: each service gets its own role, the role has a policy that grants only the actions on the specific resources the service needs. Wildcards (Resource: *, Action: s3:*) are flagged and require explicit override.
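A per-service role might be sketched as follows. The service name ("orders"), the ECS trust principal, and the secret ARN variable are all assumptions made for illustration; the point is that both the action and the resource are named explicitly.

```hcl
# Hypothetical input: ARN of the one secret this service is allowed to read.
variable "db_secret_arn" {
  type = string
}

# Trust policy: assumes the service runs as an ECS task.
data "aws_iam_policy_document" "orders_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "orders" {
  name               = "orders-service"
  assume_role_policy = data.aws_iam_policy_document.orders_assume.json
}

# Permissions: one action on one resource, no wildcards.
data "aws_iam_policy_document" "orders" {
  statement {
    sid       = "ReadOwnDbSecret"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [var.db_secret_arn]
  }
}

resource "aws_iam_role_policy" "orders" {
  name   = "orders-scoped"
  role   = aws_iam_role.orders.id
  policy = data.aws_iam_policy_document.orders.json
}
```

A second service would get its own role and its own policy document in the same shape, so that revoking or widening one service's access never touches another's.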
Step four: observability and lifecycle
The infrastructure ships with the observability hooks already wired: CloudWatch alarms on disk space, CPU, memory, and the application metrics the service exposes. Log groups are created with retention policies. Backups are configured per resource with the right retention. Tags are applied consistently for cost attribution. The lifecycle policies (S3 transitions, EBS snapshot retention) are part of the IaC, not added later by hand.
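A rough sketch of what these hooks can look like in Terraform follows. The variables, names, thresholds, and retention periods are placeholder assumptions, not values the skill is documented to emit.

```hcl
# Hypothetical inputs, normally wired in from other modules.
variable "db_instance_id" {
  type = string
}

variable "alarm_topic_arn" {
  type = string
}

variable "uploads_bucket_id" {
  type = string
}

variable "common_tags" {
  type    = map(string)
  default = {}
}

# Log group with an explicit retention policy instead of "never expire".
resource "aws_cloudwatch_log_group" "api" {
  name              = "/example/api"
  retention_in_days = 30
  tags              = var.common_tags
}

# Alarm when free database storage stays low for 15 minutes.
resource "aws_cloudwatch_metric_alarm" "db_free_storage" {
  alarm_name          = "db-free-storage-low"
  namespace           = "AWS/RDS"
  metric_name         = "FreeStorageSpace"
  dimensions          = { DBInstanceIdentifier = var.db_instance_id }
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 3
  comparison_operator = "LessThanThreshold"
  threshold           = 10 * 1024 * 1024 * 1024 # 10 GiB; the metric is in bytes
  alarm_actions       = [var.alarm_topic_arn]
  tags                = var.common_tags
}

# Lifecycle policy kept in code: move old objects to cheaper storage and
# expire old noncurrent versions.
resource "aws_s3_bucket_lifecycle_configuration" "uploads" {
  bucket = var.uploads_bucket_id

  rule {
    id     = "archive-old-objects"
    status = "Enabled"

    filter {}

    transition {
      days          = 90
      storage_class = "STANDARD_IA"
    }

    noncurrent_version_expiration {
      noncurrent_days = 365
    }
  }
}
```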
Default IAM policies in AWS examples are usually Resource: * because that is what tutorials need to work for the reader. /forge-infra refuses wildcard resources by default and requires explicit override with a comment explaining the operational reason.
Tonone's /forge-infra skill produces production-grade Infrastructure as Code with the network, IAM, encryption, backups, and observability done correctly from the start.
When to use /forge-infra, and when not to
/forge-infra is the right call when provisioning infrastructure for a new service or product, when an existing service was provisioned manually and needs to be codified, or when starting a new project that needs cloud resources from scratch. The signal is that the team is about to click through the cloud console, or that existing infrastructure needs to be brought under version control.
Skip the skill for trivial single-resource additions to an existing IaC project (use a regular IaC edit). For runtime infrastructure issues (cold starts, scaling, networking failures), /forge-diagnose is the right call. For cost optimization on existing infrastructure, /forge-cost is calibrated to that work.
| Capability | Tonone | Generalist chatbot | Cursor / Copilot |
|---|---|---|---|
| Detects provider and existing IaC | Yes, AWS/GCP/Azure, Terraform/Pulumi/CDK | Asks user or guesses | Suggests within current line |
| Network designed before resources | Yes, VPC/subnets/route tables first | Resources in default VPC | Not in scope |
| IAM scoped to actual access patterns | Yes, per-service role with specific resources | Often wildcard policies | Not in scope |
| Encryption at rest by default | Yes, KMS keys configured | Often default no-encrypt | Not in scope |
| Module structure for future services | Yes, reusable network/db/service modules | Inline resources | Not in scope |
A worked example: provisioning a Node.js API on AWS
Suppose the brief is: provision the infrastructure for a new Node.js API on AWS with a Postgres database. Run /forge-infra and the output is the Terraform module structure plus the resources.
```hcl
# modules/network/main.tf (excerpt)
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = local.common_tags
}

resource "aws_subnet" "private" {
  for_each          = toset(["a", "b", "c"])
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${index(["a","b","c"], each.key) + 10}.0/24"
  availability_zone = "${var.region}${each.key}"
  tags              = merge(local.common_tags, { Tier = "private" })
}

resource "aws_subnet" "public" {
  for_each                = toset(["a", "b", "c"])
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${index(["a","b","c"], each.key) + 1}.0/24"
  availability_zone       = "${var.region}${each.key}"
  map_public_ip_on_launch = true
  tags                    = merge(local.common_tags, { Tier = "public" })
}
# + IGW, NAT gateways (one per AZ), route tables, NACLs
```

```hcl
# modules/database/main.tf (excerpt)
resource "aws_db_instance" "main" {
  identifier                   = "${var.name}-db"
  engine                       = "postgres"
  engine_version               = "16.4"
  instance_class               = "db.t4g.medium"
  allocated_storage            = 50
  max_allocated_storage        = 200
  storage_encrypted            = true
  kms_key_id                   = aws_kms_key.rds.arn
  db_subnet_group_name         = aws_db_subnet_group.main.name
  vpc_security_group_ids       = [aws_security_group.db.id]
  backup_retention_period      = 14
  backup_window                = "03:00-04:00"
  maintenance_window           = "sun:04:00-sun:05:00"
  performance_insights_enabled = true
  deletion_protection          = true
  skip_final_snapshot          = false
  final_snapshot_identifier    = "${var.name}-db-final"
  tags                         = local.common_tags
}
```

```hcl
# modules/api/main.tf (excerpt)
resource "aws_iam_role" "api" {
  name               = "${var.name}-api"
  assume_role_policy = data.aws_iam_policy_document.api_assume.json
}

resource "aws_iam_role_policy" "api" {
  role = aws_iam_role.api.id
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect   = "Allow",
      Action   = ["s3:GetObject", "s3:PutObject"],
      Resource = ["${aws_s3_bucket.uploads.arn}/*"]
    }]
  })
  # Reasoning: api needs to read/write user uploads; no other S3 access.
}
```

The structure is module-first: network, database, and API are separate modules that compose. Future services join the same VPC by importing the network module; they get their own database module if they need one and their own API module instance with their own IAM role. The pattern scales from one service to ten without a rewrite, which is the point of writing IaC at all.
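How the modules compose at the root can be sketched as below. The module inputs and outputs shown (vpc_id, private_subnet_ids, subnet_ids, and so on) are hypothetical interface names chosen for illustration; the excerpts above do not show the modules' actual variables or outputs.

```hcl
# modules/network/outputs.tf (hypothetical outputs other modules consume)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = [for s in aws_subnet.private : s.id]
}
```

```hcl
# main.tf at the repository root (hypothetical wiring)
variable "region" {
  type = string
}

variable "name" {
  type = string
}

module "network" {
  source = "./modules/network"
  region = var.region
}

module "database" {
  source     = "./modules/database"
  name       = var.name
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}

module "api" {
  source     = "./modules/api"
  name       = var.name
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}

# A second service reuses the same network by adding another module block
# with a different name; nothing in the network module changes.
```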
Related skills
/forge-infra provisions infrastructure. For network design specifically, /forge-network handles VPCs, DNS, and load balancers in detail. For diagnosing runtime infrastructure problems, /forge-diagnose is the right call. For cost optimization, /forge-cost produces a savings plan with concrete actions.
Install
/forge-infra ships with the Forge agent in the Tonone for Claude Code package. Install Tonone, invoke /forge-infra from any Claude Code session, and the skill produces production-grade IaC for the project's cloud and IaC tool.
1. Add to marketplace
2. Install Forge
Infrastructure done right on day one is the cheapest infrastructure. The skill is built so the discipline that prevents day-one shortcuts is the default.
Frequently asked questions
- What does /forge-infra do?
- It produces production-grade Infrastructure as Code from scratch with VPC, IAM, encryption, backups, and observability configured correctly. The output adapts to the project's cloud provider and IaC tool.
- What clouds and IaC tools does /forge-infra support?
- AWS, GCP, and Azure are supported. Terraform (and OpenTofu), Pulumi, and AWS CDK are the primary IaC tools; the skill detects which the project uses or recommends Terraform for greenfield.
- How is /forge-infra different from copying a Terraform tutorial?
- Tutorials use permissive defaults (wildcard IAM, default VPC, no encryption). /forge-infra scopes IAM per service to specific resources, designs the network deliberately, and turns on encryption and backups by default.
- When should I use /forge-infra?
- When provisioning infrastructure for a new service or product, when codifying manually-provisioned infrastructure, or when starting a project that needs cloud resources from scratch.
- Does /forge-infra handle multi-cloud?
- Yes, when the project requires it. Most projects are single-cloud and the skill optimizes for that case; multi-cloud is supported when there is a real reason for it (DR, compliance, vendor portability).
- How do I install /forge-infra?
- Install Tonone for Claude Code via the get-started guide at tonone.ai/get-started. /forge-infra ships with the Forge agent and is invoked as a slash command in any Claude Code session. Tonone is free and MIT-licensed.
- Is /forge-infra free?
- Yes. The skill is part of Tonone, which is MIT-licensed. The only cost is Claude Code token usage during the work.
- Does /forge-infra produce reusable modules?
- Yes. The output is module-first: network, database, and service modules that compose. Subsequent services in the same account import the network module rather than duplicating it.