A field guide to the /forge-diagnose skill

Diagnose Cloud Infrastructure Issues with AI

Cold starts, timeouts, autoscale failures, network anomalies. /forge-diagnose reads logs, metrics, and config to find the root cause, not the symptoms.

Forge · Infrastructure · 8 min read · February 11, 2026

Runtime infrastructure issues feel different from application bugs. The application code is correct; the system is misbehaving anyway. The cold start latency is too high. The connection pool is exhausting under traffic the team thought was modest. Autoscale failed to provision new instances during a spike. The network has occasional 200ms blips with no obvious cause. Each of these has a root cause in the infrastructure layer (IAM permissions, instance types, autoscale group config, networking layout) and is invisible from the application logs.

The /forge-diagnose skill reads the cloud logs, metrics, and configuration to find the actual root cause rather than the symptom. Cold start latency traces to the Lambda memory configuration, the VPC ENI provisioning latency, or the package size. Autoscale failure traces to the ASG max size, the IAM permissions for instance launch, or the launch template. Connection pool exhaustion traces to the database max connections, the application's pool size, or the deploy-time concurrency. Each diagnosis is grounded in evidence and produces a fix that addresses the cause.
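The connection pool case above comes down to simple arithmetic, which is why "modest" traffic can still exhaust a database. The sketch below is an illustration under stated assumptions, not the skill's implementation: the function names, the `surge_factor` parameter, and all numbers are hypothetical.

```python
import math


def peak_connections(pool_size: int, instances: int, surge_factor: float = 1.0) -> int:
    """Upper bound on open database connections.

    surge_factor is a hypothetical way to model deploy-time concurrency:
    2.0 means every instance is briefly duplicated during a rolling deploy.
    """
    return pool_size * math.ceil(instances * surge_factor)


def pool_headroom(db_max_connections: int, pool_size: int,
                  instances: int, surge_factor: float = 1.0) -> int:
    """Connections left before the database refuses new ones.

    A negative result means the pool can be exhausted.
    """
    return db_max_connections - peak_connections(pool_size, instances, surge_factor)


# Hypothetical numbers, chosen only for illustration.
steady = pool_headroom(db_max_connections=100, pool_size=20, instances=4)
deploy = pool_headroom(db_max_connections=100, pool_size=20, instances=4,
                       surge_factor=2.0)
print(steady, deploy)  # 20 -60
```

With these assumed values the system has headroom in steady state and runs out only while old and new instances overlap, a cause that lives in configuration rather than in application logs.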

What infrastructure diagnosis requires

Infrastructure diagnosis follows the same loop as application performance diagnosis: localize the symptom, identify the root cause, propose a fix, and verify that the fix works. The difference is the data sources. Application diagnosis reads APM. Infrastructure diagnosis reads cloud-provider logs and metrics: CloudWatch for AWS, Cloud Logging and Monitoring for GCP, Azure Monitor for Azure. The skill reads from whichever the project uses and correlates with the configuration to find the cause.
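The provider-to-source mapping can be written down as a small lookup. This is a minimal sketch using only the sources named in this guide; the `DATA_SOURCES` table and `sources_for` helper are hypothetical names, not part of the skill.

```python
# Log and metric sources per cloud provider, as listed in this guide.
DATA_SOURCES = {
    "aws": ["CloudWatch", "X-Ray"],
    "gcp": ["Cloud Logging", "Cloud Monitoring"],
    "azure": ["Azure Monitor"],
}


def sources_for(provider: str) -> list[str]:
    """Return the sources to read for a provider (hypothetical helper)."""
    key = provider.strip().lower()
    if key not in DATA_SOURCES:
        raise ValueError(f"unsupported provider: {provider!r}")
    return DATA_SOURCES[key]


print(sources_for("GCP"))  # ['Cloud Logging', 'Cloud Monitoring']
```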

How /forge-diagnose works

The skill asks for the symptom (cold start latency, timeout, autoscale failure, network blip) and the affected resource. It reads the logs and metrics for the resource in the relevant window, along with the resource's configuration. It produces a hypothesis with the evidence, a proposed fix with a reversibility note, and a verification plan. The output is the diagnosis the team would otherwise spend hours iterating toward.
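One way to picture that output is as a record with one field per part of the diagnosis. The sketch below is an assumption made for illustration: the `Diagnosis` class, its field names, and the example values are hypothetical and do not describe the skill's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """Hypothetical shape of a diagnosis: inputs first, then findings."""
    symptom: str
    resource: str
    hypothesis: str
    evidence: list[str] = field(default_factory=list)
    proposed_fix: str = ""
    reversibility_note: str = ""
    verification_plan: str = ""

    def is_actionable(self) -> bool:
        """True only when every part of the diagnosis is filled in."""
        return bool(self.evidence and self.proposed_fix
                    and self.reversibility_note and self.verification_plan)


# Example values are invented for illustration.
report = Diagnosis(
    symptom="autoscale failure",
    resource="example-asg",
    hypothesis="ASG max size reached before the spike ended",
    evidence=["desired capacity equals max size during the window"],
    proposed_fix="raise the ASG max size",
    reversibility_note="revert by restoring the previous max size",
    verification_plan="replay load and confirm new instances launch",
)
print(report.is_actionable())  # True
```

Treating the reversibility note and verification plan as required fields mirrors the point of the section: a fix without them is only half a diagnosis.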

Cold start latency is most often caused by package size and least often by memory configuration, although both are commonly cited. /forge-diagnose checks size first because the typical fix (smaller bundle, dependency pruning) addresses the dominant cost.
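The "most likely cause first" ordering can be sketched as an ordered list of checks. This is an illustration only: the thresholds, the `facts` keys, and the `first_suspect` helper are hypothetical, and the position of the ENI check between the other two is an assumption.

```python
from typing import Callable, NamedTuple, Optional


class Check(NamedTuple):
    name: str
    suspicious: Callable[[dict], bool]


# Ordered from most to least likely cause. All thresholds are hypothetical.
COLD_START_CHECKS = [
    Check("package size", lambda f: f["package_mb"] > 50),
    Check("VPC ENI provisioning", lambda f: f["in_vpc"] and f["eni_setup_ms"] > 500),
    Check("memory configuration", lambda f: f["memory_mb"] < 512),
]


def first_suspect(facts: dict) -> Optional[str]:
    """Return the first check that flags the observed facts, or None."""
    for check in COLD_START_CHECKS:
        if check.suspicious(facts):
            return check.name
    return None


facts = {"package_mb": 120, "in_vpc": False, "eni_setup_ms": 0, "memory_mb": 256}
print(first_suspect(facts))  # package size
```

In this example both package size and memory look suspicious, but the ordering reports package size first, matching the priority described above.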

Tonone's /forge-diagnose skill diagnoses runtime infrastructure issues by reading cloud logs, metrics, and configuration to find the actual cause.

When to use /forge-diagnose

/forge-diagnose is the right call when production has a runtime infrastructure problem (cold start, timeout, scaling, networking) and the team needs structured diagnosis. Skip it for application performance (use /spine-perf) or for cost analysis (use /forge-cost).

Capability | Tonone | Generalist chatbot | Cursor / Copilot
Reads cloud logs and metrics | Yes, CloudWatch / GCP / Azure | No data access | Tool-specific
Correlates with configuration | Yes, finds config-driven causes | Symptom guesses | Variable
Hypothesis grounded in evidence | Yes, cited per claim | Generic causes | Variable
Fix with reversibility note | Yes, both surfaced | Fix only | Variable

/forge-diagnose covers infrastructure runtime issues. /spine-perf covers application performance. /vigil-incident leads incident response when an alert fires. /forge-cost covers cost optimization.

Install

/forge-diagnose ships with the Forge agent in Tonone for Claude Code. Install Tonone and configure cloud credentials, and the skill is ready to diagnose runtime infrastructure issues.

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install Forge

$ claude plugin install forge@tonone-ai

Frequently asked questions

What does /forge-diagnose do?
It diagnoses runtime infrastructure issues (cold starts, timeouts, autoscale failures, network anomalies) by reading cloud logs, metrics, and configuration.
What clouds does /forge-diagnose support?
AWS (CloudWatch, X-Ray), GCP (Cloud Logging, Cloud Monitoring), and Azure (Azure Monitor).
When should I use /forge-diagnose?
When production has a runtime infrastructure problem and the team needs structured diagnosis.
How do I install /forge-diagnose?
Install Tonone for Claude Code via tonone.ai/get-started. /forge-diagnose ships with the Forge agent. Tonone is free and MIT-licensed.