
A field guide to the /forge-infra skill

AI Infrastructure as Code Generator

Most teams provision infra by clicking around a console. /forge-infra produces production-grade Terraform, Pulumi, or CDK with compute, networking, storage, and IAM done right.

Forge · Infrastructure · 10 min read · March 25, 2026

Most production infrastructure starts as console clicks. An engineer needed to ship something, the AWS console was open, the buttons existed, and an hour later the service was running on resources nobody had named or configured deliberately. The setup worked. It also had a security group that allowed 0.0.0.0/0 because that was the option that made the test work, and an IAM role with a wildcard policy because the engineer was tired of guessing scopes, and a VPC structure that made sense for the moment but did not anticipate the second service that would join it three months later. Each click was small. The cumulative result is the cloud account that nobody fully understands and that everyone is afraid to touch.

Infrastructure as code fixes the long-term problem by making the infrastructure reviewable, reproducible, and version-controlled. The fix only works when the IaC is written correctly: a Terraform file with the same wildcard IAM policy is no better than the console click that produced it. Mainstream AI tools generate IaC the same way they generate any other code: from the prompt, with reasonable defaults that are usually too permissive, and without a coherent picture of how the resources fit together. The /forge-infra skill exists to write IaC the way a senior infrastructure engineer would: with the network designed before the resources, the IAM scoped to the actual access patterns, and the modules structured so the second service can join without rewriting the first.

Why generalist AI generates fragile IaC

Ask Cursor or ChatGPT for a Terraform file that runs a Postgres database in AWS. You get an aws_db_instance resource. The instance is in a default VPC. The security group is permissive. The parameter group is the default. Storage is gp2, instance class is db.t3.medium, no read replica, no automated backup window, no maintenance window, and no encryption-at-rest configuration. The resource technically works. It is also the database that turns into the incident report the first time something goes wrong, and the cost report when the next person realizes the team has been paying for the default everything. The output looks correct in isolation; it is not correct in the context of a production environment, and a generalist tool cannot tell the difference.

The deeper issue is that infrastructure is composed. A database needs a VPC, the VPC needs subnets, the subnets need a route table, the route table needs a route through a NAT gateway if anything in the private subnets needs outbound internet access, and the NAT gateway needs an Elastic IP. Each resource depends on the others, and each resource has options that depend on the surrounding context. A generalist tool can produce any one of these resources, but it cannot design the system. The result is a Terraform file that validates and applies but produces an environment that is not actually correct: too permissive, too expensive, or too brittle to operate.
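The dependency chain described above can be sketched in Terraform. This is a minimal illustration, not output from /forge-infra: every resource name and CIDR range is an assumption, and the `domain` argument on `aws_eip` assumes AWS provider v5 or later.

```hcl
# Illustrative sketch only: names and CIDR ranges are hypothetical.
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "example" {
  vpc_id = aws_vpc.example.id
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.10.0/24"
}

# The NAT gateway needs an Elastic IP and must sit in a public subnet.
resource "aws_eip" "nat" {
  domain = "vpc" # AWS provider v5+; older versions used `vpc = true`
}

resource "aws_nat_gateway" "example" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
  depends_on    = [aws_internet_gateway.example]
}

# Private subnets reach the internet only through the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.example.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.example.id
  }
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}

# Omitted for brevity: the public route table that sends 0.0.0.0/0
# to the internet gateway, which the NAT gateway also depends on.
```

Even this small sketch has eight resources that reference one another, which is why generating any single resource in isolation says little about whether the environment is correct.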

What production-grade IaC actually requires

A production-grade IaC layout has six things. First, a clear network topology: VPCs, subnets (public and private separated), route tables, NAT gateways, and the security groups that control traffic between them. Second, IAM scoped to the actual access patterns: a role per service, a policy per role, the policy listing the specific resources it can act on. Third, encryption at rest by default: EBS volumes, RDS storage, S3 buckets, all encrypted with KMS keys. Fourth, backups and snapshots configured: RDS automated backups with the right retention, S3 versioning where it matters, snapshot lifecycle for EBS. Fifth, observability hooks: CloudWatch alarms on the metrics that matter, logs flowing to a central place, traces if the project uses them. Sixth, the module structure that lets future services slot in: a network module, a database module, a service module, all reusable rather than copy-pasted.
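As one concrete instance of the third and fourth items, here is a hedged sketch of an S3 bucket with KMS encryption and versioning turned on. The bucket name and key description are hypothetical, and the split into separate `aws_s3_bucket_*` resources assumes AWS provider v4 or later.

```hcl
# Illustrative sketch only: names are hypothetical.
resource "aws_kms_key" "uploads" {
  description         = "Key for the uploads bucket"
  enable_key_rotation = true
}

resource "aws_s3_bucket" "uploads" {
  bucket = "example-uploads-bucket" # hypothetical; must be globally unique
}

# Encryption at rest with the customer-managed KMS key above.
resource "aws_s3_bucket_server_side_encryption_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.uploads.arn
    }
  }
}

# Versioning keeps prior object versions recoverable after overwrites or deletes.
resource "aws_s3_bucket_versioning" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  versioning_configuration {
    status = "Enabled"
  }
}
```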

Each of these layers has trade-offs. Strict IAM is the right default and adds friction during initial setup; the trade-off is justified by the security gain. Backups have a cost; the cost is justified by the rare bad day. Module structure feels like over-engineering on day one and pays off the first time a second service joins the account. The discipline of getting these right upfront is the discipline that distinguishes infrastructure that will scale from infrastructure that will be migrated again in eighteen months.

How /forge-infra works

Step one: detect provider and existing infra

Before writing any IaC, /forge-infra reads the project to detect the cloud provider (AWS, GCP, Azure, or multi-cloud), the IaC tool the project uses (Terraform, OpenTofu, Pulumi, or AWS CDK), and any existing infrastructure already defined. The detection drives the output: a Terraform project gets Terraform, a Pulumi project gets Pulumi, and the new resources are added to the existing module structure rather than created in isolation.
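The article does not describe how detection works internally. As an illustration of the kind of signal an existing Terraform project carries, a root module usually declares its provider and version constraints in a block like the one below; the version numbers here are assumptions.

```hcl
# Illustrative sketch only: version constraints are assumptions.
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

variable "region" {
  type        = string
  description = "AWS region to deploy into"
}
```

A project containing this block is plainly a Terraform project targeting AWS, so generated resources should follow the same tool and provider rather than introduce a second one.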

Step two: design the network first

Network is the foundation of every other resource, so it is designed first. The skill produces VPC and subnet structure calibrated to the use case: public subnets for the load balancer, private subnets for application instances, isolated subnets for the database. NAT gateways are placed to match the expected traffic and availability needs (for example, one per availability zone), route tables are configured deliberately, and security groups are scoped to specific protocols and ports between specific groups. The network is reusable: subsequent services join the same VPC by importing the network module.
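"Scoped to specific protocols and ports between specific groups" can look like the following sketch, where the database security group accepts Postgres traffic only from the application security group instead of a CIDR range. The group names and the `vpc_id` variable are hypothetical.

```hcl
# Illustrative sketch only: names and the vpc_id input are hypothetical.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "app" {
  name_prefix = "example-app-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "db" {
  name_prefix = "example-db-"
  vpc_id      = var.vpc_id
}

# Only members of the app security group may reach the database, and only on 5432.
resource "aws_security_group_rule" "db_from_app" {
  type                     = "ingress"
  description              = "Postgres from the application tier only"
  protocol                 = "tcp"
  from_port                = 5432
  to_port                  = 5432
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
}
```

Referencing a source security group rather than 0.0.0.0/0 means the rule keeps working as application instances are replaced, without ever opening the database to the wider network.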

Step three: compute, storage, and IAM

On top of the network, /forge-infra produces the compute, storage, and IAM. Compute is right-sized to the expected load with autoscaling configured. Storage uses the right service for the workload (RDS for relational, DynamoDB or Firestore for key-value, S3 for object). IAM is per-service: each service gets its own role, the role has a policy that grants only the actions on the specific resources the service needs. Wildcards (Resource: *, Action: s3:*) are flagged and require explicit override.
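A per-service role starts with a trust policy that says which principal may assume it. The sketch below assumes the service runs as an ECS task, which the article does not specify; swap the service principal for whatever compute the project actually uses.

```hcl
# Illustrative sketch only. Assumption: the API runs on ECS, so the
# ECS tasks service principal is the one allowed to assume the role.
data "aws_iam_policy_document" "api_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}
```

The trust policy controls who can become the role; the permissions policy attached to the role then lists the specific actions and resource ARNs, with no wildcards.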

Step four: observability and lifecycle

The infrastructure ships with the observability hooks already wired: CloudWatch alarms on disk space, CPU, memory, and the application metrics the service exposes. Log groups are created with retention policies. Backups are configured per resource with the right retention. Tags are applied consistently for cost attribution. The lifecycle policies (S3 transitions, EBS snapshot retention) are part of the IaC, not added later by hand.
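A minimal sketch of two of these hooks follows: a log group with a retention policy and a CloudWatch alarm on RDS free storage. The names, the 30-day retention, the 5 GiB threshold, and both input variables are assumptions chosen for illustration.

```hcl
# Illustrative sketch only: names, thresholds, and inputs are hypothetical.
variable "db_identifier" {
  type        = string
  description = "Identifier of the RDS instance to watch"
}

variable "alerts_topic_arn" {
  type        = string
  description = "SNS topic that receives alarm notifications"
}

resource "aws_cloudwatch_log_group" "api" {
  name              = "/example/api"
  retention_in_days = 30
}

# Fires when average free storage stays below 5 GiB for 15 minutes.
resource "aws_cloudwatch_metric_alarm" "db_low_storage" {
  alarm_name          = "example-db-low-free-storage"
  namespace           = "AWS/RDS"
  metric_name         = "FreeStorageSpace"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 3
  comparison_operator = "LessThanThreshold"
  threshold           = 5368709120 # 5 GiB in bytes

  dimensions = {
    DBInstanceIdentifier = var.db_identifier
  }

  alarm_actions = [var.alerts_topic_arn]
}
```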

Default IAM policies in AWS examples are usually Resource: * because that is what tutorials need to work for the reader. /forge-infra refuses wildcard resources by default and requires explicit override with a comment explaining the operational reason.

Tonone's /forge-infra skill produces production-grade Infrastructure as Code with the network, IAM, encryption, backups, and observability done correctly from the start.

When to use /forge-infra, and when not to

/forge-infra is the right call when provisioning infrastructure for a new service or product, when an existing service was provisioned manually and needs to be codified, or when starting a new project that needs cloud resources from scratch. The signal is that the team is about to click around in the cloud console, or that existing infrastructure needs to be brought under version control.

Skip the skill for trivial single-resource additions to an existing IaC project (use a regular IaC edit). For runtime infrastructure issues (cold starts, scaling, networking failures), /forge-diagnose is the right call. For cost optimization on existing infrastructure, /forge-cost is calibrated to that work.

| Capability | Tonone | Generalist chatbot | Cursor / Copilot |
| --- | --- | --- | --- |
| Detects provider and existing IaC | Yes, AWS/GCP/Azure, Terraform/Pulumi/CDK | Asks user or guesses | Suggests within current line |
| Network designed before resources | Yes, VPC/subnets/route tables first | Resources in default VPC | Not in scope |
| IAM scoped to actual access patterns | Yes, per-service role with specific resources | Often wildcard policies | Not in scope |
| Encryption at rest by default | Yes, KMS keys configured | Often default no-encrypt | Not in scope |
| Module structure for future services | Yes, reusable network/db/service modules | Inline resources | Not in scope |

A worked example: provisioning a Node.js API on AWS

Suppose the brief is: provision the infrastructure for a new Node.js API on AWS with a Postgres database. Run /forge-infra and the output is the Terraform module structure plus the resources.

```hcl
# modules/network/main.tf (excerpt)
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = local.common_tags
}

resource "aws_subnet" "private" {
  for_each          = toset(["a", "b", "c"])
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${index(["a", "b", "c"], each.key) + 10}.0/24"
  availability_zone = "${var.region}${each.key}"
  tags              = merge(local.common_tags, { Tier = "private" })
}

resource "aws_subnet" "public" {
  for_each                = toset(["a", "b", "c"])
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${index(["a", "b", "c"], each.key) + 1}.0/24"
  availability_zone       = "${var.region}${each.key}"
  map_public_ip_on_launch = true
  tags                    = merge(local.common_tags, { Tier = "public" })
}
# + IGW, NAT gateways (one per AZ), route tables, NACLs

# modules/database/main.tf (excerpt)
resource "aws_db_instance" "main" {
  identifier                   = "${var.name}-db"
  engine                       = "postgres"
  engine_version               = "16.4"
  instance_class               = "db.t4g.medium"
  allocated_storage            = 50
  max_allocated_storage        = 200
  storage_encrypted            = true
  kms_key_id                   = aws_kms_key.rds.arn
  db_subnet_group_name         = aws_db_subnet_group.main.name
  vpc_security_group_ids       = [aws_security_group.db.id]
  backup_retention_period      = 14
  backup_window                = "03:00-04:00"
  maintenance_window           = "sun:04:00-sun:05:00"
  performance_insights_enabled = true
  deletion_protection          = true
  skip_final_snapshot          = false
  final_snapshot_identifier    = "${var.name}-db-final"
  tags                         = local.common_tags
}

# modules/api/main.tf (excerpt)
resource "aws_iam_role" "api" {
  name               = "${var.name}-api"
  assume_role_policy = data.aws_iam_policy_document.api_assume.json
}

resource "aws_iam_role_policy" "api" {
  role = aws_iam_role.api.id
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect   = "Allow",
      Action   = ["s3:GetObject", "s3:PutObject"],
      Resource = ["${aws_s3_bucket.uploads.arn}/*"]
    }]
  })
  # Reasoning: api needs to read/write user uploads; no other S3 access.
}
```

The structure is module-first: network, database, and API are separate modules that compose. Future services join the same VPC by importing the network module; they get their own database module if they need one and their own API module instance with their own IAM role. The pattern scales from one service to ten without a rewrite, which is the point of writing IaC at all.
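How the three modules compose is easiest to see from a root configuration. The sketch below is an assumption about the wiring, not the skill's actual output: the input names (`vpc_id`, `subnet_ids`) and the network module's outputs are hypothetical and would need matching `variable` and `output` declarations in each module.

```hcl
# main.tf at the repository root (illustrative sketch only).
# Assumes `var.region` and `var.name` are declared in the root module.
module "network" {
  source = "./modules/network"
  region = var.region
}

module "database" {
  source     = "./modules/database"
  name       = var.name
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}

module "api" {
  source     = "./modules/api"
  name       = var.name
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}
```

```hcl
# modules/network/outputs.tf (hypothetical outputs consumed above)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  # aws_subnet.private uses for_each, so it is a map of subnet objects.
  value = [for s in aws_subnet.private : s.id]
}
```

A second service would add another `module "api"` style block with its own name, reusing the same network outputs without touching the network module itself.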

/forge-infra provisions infrastructure. For network design specifically, /forge-network handles VPCs, DNS, and load balancers in detail. For diagnosing runtime infrastructure problems, /forge-diagnose is the right call. For cost optimization, /forge-cost produces a savings plan with concrete actions.

Install

/forge-infra ships with the Forge agent in the Tonone for Claude Code package. Install Tonone, invoke /forge-infra from any Claude Code session, and the skill produces production-grade IaC for the project's cloud and IaC tool.

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install Forge

$ claude plugin install forge@tonone-ai

Infrastructure done right on day one is the cheapest infrastructure. The skill is built so the discipline that prevents day-one shortcuts is the default.

Frequently asked questions

What does /forge-infra do?
It produces production-grade Infrastructure as Code from scratch with VPC, IAM, encryption, backups, and observability configured correctly. The output adapts to the project's cloud provider and IaC tool.
What clouds and IaC tools does /forge-infra support?
AWS, GCP, and Azure are supported. Terraform (and OpenTofu), Pulumi, and AWS CDK are the primary IaC tools; the skill detects which one the project uses, or recommends Terraform for greenfield projects.
How is /forge-infra different from copying a Terraform tutorial?
Tutorials use permissive defaults (wildcard IAM, default VPC, no encryption). /forge-infra scopes IAM per service to specific resources, designs the network deliberately, and turns on encryption and backups by default.
When should I use /forge-infra?
When provisioning infrastructure for a new service or product, when codifying manually-provisioned infrastructure, or when starting a project that needs cloud resources from scratch.
Does /forge-infra handle multi-cloud?
Yes, when the project requires it. Most projects are single-cloud and the skill optimizes for that case; multi-cloud is supported when there is a real reason for it (DR, compliance, vendor portability).
How do I install /forge-infra?
Install Tonone for Claude Code via the get-started guide at tonone.ai/get-started. /forge-infra ships with the Forge agent and is invoked as a slash command in any Claude Code session. Tonone is free and MIT-licensed.
Is /forge-infra free?
Yes. The skill is part of Tonone, which is MIT-licensed. The only cost is Claude Code token usage during the work.
Does /forge-infra produce reusable modules?
Yes. The output is module-first: network, database, and service modules that compose. Subsequent services in the same account import the network module rather than duplicating it.
