Most cloud infrastructure goes wrong not in deployment but long before it, when someone copy-pastes a Terraform snippet from a blog post, skips the IAM module because it looks complicated, and ships to production with a storage bucket that is publicly readable and an instance type three times larger than the workload needs. The damage is silent: the app works, the tests pass, and the bill arrives at the end of the month as a surprise. The security misconfiguration sits there for months until a compliance review finds it. The pattern repeats because writing good infrastructure as code is a specialist skill that most teams cannot staff full-time and that generalist AI tools consistently get wrong: they produce plausible-looking Terraform that skips the opinions that make infrastructure actually production-grade. That gap is exactly what Forge fills.
Why the generalist approach breaks down
Ask ChatGPT to write you a VPC module in Terraform and you will get something that validates. Ask a cloud infrastructure engineer to review it and they will immediately flag the missing private subnet routing, the overly permissive security group egress rules, the absence of VPC flow logs, and the fact that the NAT gateway configuration will create a single point of failure in a multi-AZ setup. The generalist tool has no position on those decisions; it produces output that satisfies the literal request and ignores everything that makes infrastructure reliable, secure, and cost-aware. The problems surface three months later, under load, when the on-call engineer is trying to figure out why connections are timing out at 2 a.m.
Cursor and GitHub Copilot have the same blind spot, compounded by the fact that they are editor-level completion tools. They will happily autocomplete a resource "aws_s3_bucket" block without ever mentioning the bucket policy, versioning configuration, or server-side encryption setting that turn a bucket from a liability into an asset. They are not making infrastructure decisions; they are completing patterns they have seen before. When your infrastructure has real requirements around compliance, cost governance, or resilience, autocomplete produces a first draft that looks done but is not. Every infrastructure decision that required judgment is missing.
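As a rough illustration of the settings an autocompleted bucket block tends to leave out, the sketch below pairs a bucket with a public access block, versioning, and default encryption. It assumes the Terraform AWS provider v4 or later, where these settings live in separate resources; the bucket name and the "assets" labels are placeholders, not anything Forge emits.

```hcl
# Placeholder bucket; the name is illustrative only.
resource "aws_s3_bucket" "assets" {
  bucket = "example-assets-bucket"
}

# Refuse public ACLs and public bucket policies outright.
resource "aws_s3_bucket_public_access_block" "assets" {
  bucket                  = aws_s3_bucket.assets.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Keep prior object versions so overwrites and deletes are recoverable.
resource "aws_s3_bucket_versioning" "assets" {
  bucket = aws_s3_bucket.assets.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt new objects at rest by default. With no key specified,
# S3 uses the AWS managed KMS key for the account.
resource "aws_s3_bucket_server_side_encryption_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
```

None of these companion resources is required for the bucket to be created, which is why a completion tool can stop after the first block and still produce something that applies cleanly.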
The deeper problem is that cloud infrastructure is one of the few engineering domains where wrong decisions are expensive in three independent ways simultaneously: they create security exposure, they inflate cost, and they create operational failure modes that do not surface until the system is under real load. A generalist tool that does not have opinions about IAM least-privilege, right-sizing, multi-AZ design, and encrypted storage is not actually useful for infrastructure work; it is useful for infrastructure drafting, and the difference matters. Teams that use generalist tools for IaC consistently end up with a mix of production-grade and dangerously underspecified resources, and the only thing holding it together is that nobody has looked closely enough to notice.
What a cloud infrastructure engineer actually does
On a human engineering team, the infrastructure engineer is the person who owns what runs the application: the compute, the networking, the storage, the access controls, and the glue that holds all of it together across environments. They think in failure modes: what happens when an availability zone goes down, what happens when autoscaling does not trigger fast enough, what happens when a service account is compromised. They write Terraform that documents its own reasoning: variable descriptions, output explanations, and comment blocks that capture why a decision was made rather than what was decided. They review other people's infrastructure changes not for syntax but for the security and reliability consequences that the author may not have thought through.
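A minimal sketch of what "Terraform that documents its own reasoning" can look like, using only core Terraform features (descriptions, a validation block, and an output). The variable names, defaults, and wording are hypothetical and chosen for illustration.

```hcl
variable "db_instance_class" {
  type        = string
  default     = "db.t4g.medium"
  description = "RDS instance class. Sized for the current workload; revisit if sustained CPU or p99 latency climbs."

  # Catch typos such as an EC2 instance type pasted into an RDS setting.
  validation {
    condition     = can(regex("^db\\.", var.db_instance_class))
    error_message = "db_instance_class must be an RDS instance class such as db.t4g.medium."
  }
}

variable "nat_gateway_per_az" {
  type        = bool
  default     = true
  description = "One NAT gateway per AZ, so losing a zone does not cut egress for the others. Set to false in staging to save cost."
}

output "nat_strategy" {
  description = "Which NAT layout was applied, surfaced so reviewers can see the resilience/cost trade-off in plan output."
  value       = var.nat_gateway_per_az ? "one-per-az" : "single"
}
```

The descriptions record why a default was chosen, which is the part a reviewer cannot reconstruct from the value alone.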
The infrastructure engineer is also the person your team calls when the cloud bill arrives with an unexpected number on it. They know which instance types are oversized for the workload, which reserved instance commitments are about to expire, which idle resources have been running for months because nobody deleted the staging environment from six sprints ago. That combination of operational knowledge, security instinct, and cost awareness is hard to find and harder to keep. Forge makes it available on demand, in the IaC language your team already uses, across the cloud providers you are actually running.
Meet Forge
Forge is Tonone's cloud infrastructure engineer, a purpose-built specialist agent for GCP, AWS, Azure, Cloudflare, and Fly.io, working in Terraform, Pulumi, or CDK depending on what your project already uses. Forge does not write infrastructure that looks production-grade; it writes infrastructure that is production-grade. That means IAM with least-privilege from the first resource, not bolted on later. It means subnet strategy and CIDR planning documented with their rationale, not left implicit. It means cost and security are first-class outputs of every infra build, not afterthoughts surfaced by audits.
Tonone's Forge builds production-grade infrastructure as code across GCP, AWS, Azure, Cloudflare, and Fly.io, with IAM, cost awareness, and security baked in from the first resource, not added as an afterthought.
What Forge actually does
Building production infrastructure from scratch
The forge-infra skill is where Forge earns its name. You describe what you need (a GKE cluster with a private node pool, an RDS instance behind a VPC, a multi-region CDN setup), and Forge detects your cloud provider and target platform from the existing project context, then produces complete, production-grade IaC. Not a starter template. Not a hello-world module. Compute with the right instance family for the workload, networking configured to isolate traffic correctly, storage with encryption and versioning on from day one, and IAM that grants each component the minimum permissions it actually needs. The output includes inline comments that explain why each decision was made (why this CIDR range, why this instance type, why the storage bucket policy is structured the way it is), so the infrastructure is maintainable by whoever works on it next, not just by whoever wrote it. For teams starting a new cloud environment or expanding to a new region, forge-infra compresses weeks of careful infrastructure work into hours, without cutting the corners that create problems later.
Designing networking infrastructure that holds
Networking is the part of cloud infrastructure that looks simple until it is not. A VPC that worked fine with three services starts behaving unexpectedly when you add a fourth, because the subnet strategy was never planned beyond what existed at the time. A firewall rule that was reasonable for a development environment accidentally ships to production with open egress. The forge-network skill addresses this directly: it designs and builds networking infrastructure with a coherent subnet strategy and CIDR planning that leaves room for growth, DNS configuration that handles internal and external resolution correctly, load balancers with health checks and SSL termination configured properly, and firewall rules that follow least-privilege ingress and egress at the rule level, not at the VPC level where it is too broad to be useful. Every networking decision is documented with its rationale, which means the next engineer who reads the configuration understands why it is the way it is rather than inheriting a structure they are afraid to change. For teams that have grown their cloud footprint organically and ended up with networking they do not fully understand, forge-network can also document and rationalize the existing setup before proposing changes.
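The two ideas above, CIDR planning that leaves room for growth and least-privilege rules at the rule level, can be sketched in Terraform roughly as follows. This is an illustration under stated assumptions, not forge-network output: it assumes the AWS provider v5 (for the standalone security group rule resources), and the variable vpc_id, the "app" and "db" labels, and the port are hypothetical.

```hcl
locals {
  vpc_cidr = "10.100.0.0/16"
  azs      = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # A /16 plus 8 new bits yields /24 subnets. Private subnets take
  # indexes 1-3 and public subnets 101-103, so the ranges in between
  # stay free for tiers added later.
  private_subnets = [for i, az in local.azs : cidrsubnet(local.vpc_cidr, 8, i + 1)]
  public_subnets  = [for i, az in local.azs : cidrsubnet(local.vpc_cidr, 8, i + 101)]
}

variable "vpc_id" {
  type        = string
  description = "ID of the VPC that hosts both security groups (hypothetical input)."
}

# No inline ingress or egress blocks: every allowed flow is its own rule.
resource "aws_security_group" "app" {
  name_prefix = "app-"
  description = "Application tasks"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "db" {
  name_prefix = "db-"
  description = "Postgres instances"
  vpc_id      = var.vpc_id
}

# The database accepts Postgres traffic from the app group and nothing else.
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "Postgres from app tasks only"
}

# The app group may open connections to the database, not to 0.0.0.0/0.
resource "aws_vpc_security_group_egress_rule" "app_to_db" {
  security_group_id            = aws_security_group.app.id
  referenced_security_group_id = aws_security_group.db.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "App tasks to Postgres only"
}
```

Referencing security groups instead of CIDR blocks keeps the rules valid when subnets are added, and the index gaps in the locals are what leave room for a fourth or fifth tier without renumbering.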
Auditing existing infrastructure for real risk
The forge-audit skill is what you run when you inherit a cloud environment and need an honest assessment of what you have inherited. Forge reads the existing IaC and cloud configuration and produces a prioritized finding list covering IAM permissions that are over-privileged, public exposure on storage buckets and database instances, resources that are unencrypted at rest or in transit, idle and oversized instances that are running but serving no traffic, and missing backup policies that mean a failure event would result in data loss. The output is not a generic checklist; it is a finding per resource, with the specific misconfiguration, the severity, and the remediation steps in the exact IaC language you are using. The prioritization reflects actual risk: a publicly readable bucket with customer data is critical; an idle dev instance without a backup policy is low. Security teams and compliance auditors can use the output directly; engineering teams can use it as a backlog of infrastructure improvements with enough context to act immediately.
The forge-audit skill in Tonone's Forge audits existing cloud infrastructure for IAM over-privilege, public storage exposure, unencrypted resources, and cost waste, producing a prioritized finding list with remediation steps in your IaC language.
Finding what the cloud bill is actually paying for
The forge-cost skill turns cloud cost analysis from a monthly ritual of confusion into an actionable engineering conversation. Forge analyzes cloud spend to identify idle resources that are running but serving nothing, instances that are sized for peak load they have never actually seen, committed use discount gaps where on-demand pricing is paying for stable workloads that qualify for reservations, and architectural patterns that are more expensive than their alternatives without being more reliable. The output is not a list of metrics; it is a set of specific changes with expected monthly savings per change, so engineering and finance can agree on a prioritized cost reduction plan rather than staring at a bill and guessing. For growing teams where cloud spend is becoming a material line item, forge-cost provides the infrastructure expertise to distinguish necessary spend from waste without requiring a dedicated FinOps function.
Diagnosing runtime infrastructure problems
The forge-diagnose skill is what you reach for when something in the infrastructure is wrong and you cannot figure out why. Cold start latency on a service that was fine last week. Connection timeouts that happen intermittently and do not correlate with any obvious pattern. Autoscaling that triggers too late and leaves the service under-provisioned during traffic spikes. Connection pool exhaustion that looks like application errors but is actually an infrastructure configuration problem. Forge diagnoses these by reading logs, metrics, and configuration together: not just the application logs or the cloud console metrics, but the combination of signals that reveals whether the problem is in the application, the infrastructure, or the interaction between them. The output identifies the actual cause rather than the visible symptom, with a remediation plan that addresses the root issue rather than masking it. For teams running on Claude Code, forge-diagnose is the fastest path from an infrastructure incident to a grounded diagnosis that can be acted on.
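To make one of those symptoms concrete: when autoscaling triggers too late on an ECS service, the root cause is often in the scaling policy itself. The sketch below shows the kind of configuration a remediation might touch, assuming an ECS service scaled through Application Auto Scaling. The variables cluster_name and service_name are hypothetical inputs, and the capacities, target value, and cooldowns are illustrative numbers, not recommendations from Forge.

```hcl
variable "cluster_name" {
  type        = string
  description = "Name of the ECS cluster (hypothetical input)."
}

variable "service_name" {
  type        = string
  description = "Name of the ECS service to scale (hypothetical input)."
}

resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${var.cluster_name}/${var.service_name}"
  min_capacity       = 2
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension
  resource_id        = aws_appautoscaling_target.app.resource_id

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    # A lower target leaves headroom, so scale-out starts before saturation.
    target_value = 50

    # Short scale-out cooldown to react to spikes; longer scale-in
    # cooldown to avoid flapping once traffic settles.
    scale_out_cooldown = 60
    scale_in_cooldown  = 300
  }
}
```

A high target value combined with a long scale-out cooldown is one way "autoscaling triggers too late" shows up in configuration, though the same symptom can equally come from slow task startup or from tracking a metric that does not follow the real bottleneck.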
Inventorying what is actually running
Before Forge can build, audit, or optimize anything, it needs to know what exists. The forge-recon skill performs infrastructure reconnaissance: it inventories all cloud resources across accounts and regions, maps the connections between services, identifies configuration drift between what the IaC definitions say should exist and what is actually running, and flags high-risk items that warrant immediate attention. The output is a readable map of the cloud environment: not a raw export from the cloud console, but an organized summary of what is running, how it is connected, and where the risks are. For teams that have grown their infrastructure faster than their documentation, forge-recon produces the inventory that should have existed from the start. It is also the natural entry point before any forge-audit or forge-cost engagement: grounded context before opinions.
The forge-recon skill in Tonone's Forge inventories cloud resources across accounts and regions, maps service connections, and identifies configuration drift between IaC definitions and what is actually running.
A worked example
A startup is scaling from a single-region Fly.io deployment to AWS with a proper VPC, private subnets, and a CDN in front of the application. They hand Forge the brief: "Set up a production VPC on AWS with a private ECS cluster, RDS Postgres in a private subnet, and CloudFront in front." Forge starts with a forge-recon of the existing Fly.io setup to understand the current architecture and traffic patterns, then produces a Terraform skeleton with cost and IAM notes inline.
The output is not a template; it is a production-grade starting point with explicit decisions documented, cost considerations noted, and IAM roles scoped to minimum permissions from the beginning:
```hcl
# forge-infra output: production VPC skeleton
# Cloud: AWS | Region: us-east-1 | Estimated monthly baseline: ~$180-240
# Skeleton only: aws_security_group.rds and
# data.aws_iam_policy_document.ecs_assume are referenced but not shown here.

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.2"

  name = "acme-prod"
  cidr = "10.100.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
  public_subnets  = ["10.100.101.0/24", "10.100.102.0/24", "10.100.103.0/24"]

  # One NAT gateway per AZ for HA adds ~$135/mo. Switch to a single NAT
  # gateway if cost is a higher priority than resilience in early stages.
  enable_nat_gateway     = true
  single_nat_gateway     = false # true saves ~$90/mo; acceptable for staging
  one_nat_gateway_per_az = true

  enable_flow_log                      = true
  flow_log_destination_type            = "cloud-watch-logs"
  create_flow_log_cloudwatch_log_group = true
  create_flow_log_cloudwatch_iam_role  = true
}

# ECS cluster: no EC2 launch type; Fargate for ops simplicity
resource "aws_ecs_cluster" "app" {
  name = "acme-prod"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# RDS Postgres: private subnets only; no public endpoint
module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "6.6.0"

  identifier     = "acme-prod-pg"
  engine         = "postgres"
  engine_version = "16"
  family         = "postgres16"    # parameter group family for the engine version
  instance_class = "db.t4g.medium" # upgrade to db.r8g.large if p99 > 20ms

  allocated_storage = 100
  storage_encrypted = true # required; default RDS KMS key unless kms_key_id is set

  # The VPC module above declares no database_subnets, so the subnet
  # group is built from the private subnets instead.
  create_db_subnet_group = true
  subnet_ids             = module.vpc.private_subnets
  vpc_security_group_ids = [aws_security_group.rds.id]
  publicly_accessible    = false # never true in prod

  backup_retention_period = 7
  deletion_protection     = true
}

# IAM: task execution role scoped to ECR pull + Secrets Manager only
resource "aws_iam_role" "ecs_task_exec" {
  name               = "acme-prod-ecs-task-exec"
  assume_role_policy = data.aws_iam_policy_document.ecs_assume.json
  # Attach a scoped inline policy; no managed AdministratorAccess
}
```

This is the kind of infrastructure starting point a senior cloud engineer would produce on their first day with a new client: complete enough to build on, documented enough to understand, and opinionated enough to prevent the obvious mistakes. The cost notes mean the team can decide how much HA they want to pay for before the infrastructure exists. The IAM comments make the intended scope of the task execution role explicit, so an over-privileged role is easy to spot in review.
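The skeleton refers to data.aws_iam_policy_document.ecs_assume and aws_security_group.rds without showing them. One plausible way to fill in those pieces, together with an inline policy that matches the "ECR pull + Secrets Manager only" comment, is sketched below. This is an assumption about what the missing parts could look like, not Forge's actual output; the variables ecr_repository_arns and app_secret_arns are hypothetical.

```hcl
variable "ecr_repository_arns" {
  type        = list(string)
  description = "ECR repositories the tasks may pull from (hypothetical input)."
}

variable "app_secret_arns" {
  type        = list(string)
  description = "Secrets Manager secrets injected into the tasks (hypothetical input)."
}

# Trust policy: only the ECS tasks service may assume the role.
data "aws_iam_policy_document" "ecs_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "ecs_task_exec" {
  # GetAuthorizationToken does not support resource-level scoping.
  statement {
    sid       = "EcrAuth"
    actions   = ["ecr:GetAuthorizationToken"]
    resources = ["*"]
  }

  statement {
    sid = "EcrPull"
    actions = [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
    ]
    resources = var.ecr_repository_arns
  }

  statement {
    sid       = "ReadAppSecrets"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = var.app_secret_arns
  }
  # If the tasks use the awslogs driver, CloudWatch Logs permissions
  # would also be needed; they are left out to match the stated scope.
}

resource "aws_iam_role_policy" "ecs_task_exec" {
  name   = "ecr-pull-and-secrets"
  role   = aws_iam_role.ecs_task_exec.id
  policy = data.aws_iam_policy_document.ecs_task_exec.json
}

# Security group for the database; ingress rules would be added per
# calling service rather than opened to the whole VPC.
resource "aws_security_group" "rds" {
  name_prefix = "acme-prod-rds-"
  description = "Postgres access for acme-prod"
  vpc_id      = module.vpc.vpc_id
}
```

Scoping the pull and secret statements to explicit ARNs is what makes the role least-privilege in practice; the comment in the skeleton only states the intent.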
If you need production-grade infrastructure as code across AWS, GCP, Azure, or edge providers, whether you are building from scratch, auditing what exists, investigating a cloud bill, or debugging a runtime problem, Forge is the specialist for it. Run /forge-infra with a brief description of what you need and get IaC with IAM, cost, and security baked in from the start.
Forge vs the alternatives
Forge is not competing with Terraform documentation or a cloud provider's wizard; it is the specialist who knows when each tool applies, what the production requirements are, and what a generalist tool will skip. The comparison below captures the functional differences that matter when you are building or auditing real cloud infrastructure.
| Capability | Tonone | Generalist chatbot | Cursor / Copilot |
|---|---|---|---|
| IaC with IAM least-privilege from the start | Yes, IAM roles scoped to minimum permissions in every forge-infra output | No, IAM is typically left as an exercise or uses managed admin policies | No, autocomplete suggests patterns without IAM opinions |
| Cost awareness in infrastructure output | Yes, estimated monthly cost and right-sizing notes inline in the IaC | No, no cost context in generated code | No, no project-level cost reasoning |
| Security audit of existing cloud setup | Yes, forge-audit produces prioritized findings with remediation steps per resource | Partial, can review code you paste, but no cloud-native resource inventory | No, code suggestions only, no infrastructure audit capability |
| Runtime infrastructure diagnostics | Yes, forge-diagnose reads logs, metrics, and config together to find root cause | Partial, can reason about logs you paste but lacks cloud context | No, no runtime observability integration |
| Multi-cloud coverage (AWS, GCP, Azure, Fly, Cloudflare) | Yes, detects provider and produces idiomatic IaC per platform | Partial, knows syntax but no production opinions per provider | No, provider-specific completions vary widely in quality |
| Configuration drift detection | Yes, forge-recon compares IaC definitions against what is actually running | No, no cloud state access | No, file-level only, no cloud state awareness |
Tonone's Forge produces infrastructure as code that is production-grade from the first commit, not a starting template that requires a security review before it is safe to deploy.
Install and try
Tonone is free and MIT-licensed. Install it once and all 23 agents, including Forge, are available in your Claude Code session.
1. Add to marketplace
2. Install Forge
Frequently asked questions
- What does Tonone's Forge do?
- Forge is Tonone's AI cloud infrastructure engineer. It builds production-grade infrastructure as code across GCP, AWS, Azure, Cloudflare, and Fly.io using Terraform, Pulumi, or CDK. It also audits existing cloud setups for security misconfigurations and cost waste, diagnoses runtime infrastructure problems, and inventories cloud resources across accounts and regions.
- How is Forge different from asking ChatGPT to write Terraform?
- ChatGPT produces Terraform that validates but typically skips IAM least-privilege, encryption settings, backup policies, and cost-aware instance sizing. Forge is a specialist agent that treats those as first-class requirements: every forge-infra output includes IAM scoped to minimum permissions, cost estimates, and security configuration from the start.
- Can Forge audit an existing cloud environment I did not build?
- Yes. The forge-audit skill reads your existing IaC and cloud configuration and produces a prioritized finding list covering IAM over-privilege, public storage exposure, unencrypted resources, idle instances, and missing backup policies. Each finding includes severity and remediation steps in your IaC language.
- What AI can help me reduce my AWS or GCP cloud bill?
- Tonone's forge-cost skill analyzes your cloud spend to find idle resources, oversized instances, committed use discount gaps, and architectural changes that reduce cost without reducing capacity. The output includes expected monthly savings per change so you can prioritize.
- What does forge-diagnose do for infrastructure incidents?
- forge-diagnose reads logs, metrics, and configuration together to find the actual root cause of runtime infrastructure problems: cold start latency, connection timeouts, autoscaling failures, network anomalies, and connection pool exhaustion. It identifies the cause rather than the symptom, with a remediation plan.
- Does Forge work with AWS, GCP, and Azure?
- Yes. Forge works across AWS, GCP, Azure, Cloudflare, and Fly.io. It detects your cloud provider from the existing project context and produces idiomatic IaC in Terraform, Pulumi, or CDK depending on what your project already uses.
- How do I install Tonone's Forge agent?
- Install Tonone via the get-started guide at tonone.ai/get-started. Forge is one of 23 agents included in the Tonone package. Invoke it with slash commands like /forge-infra, /forge-audit, or /forge-cost. Tonone is free and MIT-licensed.
- What is forge-recon and when should I run it?
- forge-recon performs infrastructure reconnaissance: inventorying all cloud resources across accounts and regions, mapping connections between services, and identifying configuration drift between your IaC definitions and what is actually running. Run it when inheriting a cloud environment or before any audit or cost analysis engagement.