Vigil

Observability + Reliability

Know when something's wrong before your customers do.

Vigil is an SRE and observability specialist that instruments services with structured logging, RED metrics (rate, errors, duration), and distributed tracing. It builds alerting rules paired with runbooks, so every alert comes with a clear remediation path rather than just a notification. It leads production incident response from initial diagnosis through root cause to fix, defines SLOs with error budgets, and audits monitoring coverage to find blind spots before launch.
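
To make the error-budget idea concrete, here is a minimal sketch of the arithmetic, assuming a request-based SLO. The function name `errorBudget` and its parameters are hypothetical and are not part of Vigil; they only illustrate how a target such as 99.9% translates into a number of failures a service can absorb in a window.

```javascript
// Hypothetical helper, not part of Vigil: request-based error budget math.
// sloTarget is a fraction, e.g. 0.999 for "99.9% of requests succeed".
function errorBudget(sloTarget, totalRequests, failedRequests) {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError("sloTarget must be between 0 and 1 (exclusive)");
  }
  // Failures the SLO tolerates for this many requests.
  // The small epsilon guards against floating-point results like 999.9999999.
  const allowedFailures = Math.floor(totalRequests * (1 - sloTarget) + 1e-9);
  const remaining = allowedFailures - failedRequests;
  return {
    allowedFailures,
    remaining,
    // True once more failures have occurred than the budget allows.
    overspent: remaining < 0,
  };
}

// Example: 99.9% target, one million requests in the window, 250 failures.
const budget = errorBudget(0.999, 1000000, 250);
console.log(budget);
```

With these assumed numbers the budget is 1,000 failed requests, of which 250 are spent. A team might slow risky releases as `remaining` approaches zero; that policy is a choice for the team, not something this sketch decides.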

Read the field guide: The AI Observability Engineer for SLOs and Alerts

Install Vigil

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install Vigil

$ claude plugin install vigil@tonone-ai

5 skills included.

Install the Engineering team

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install the team

$ claude plugin install engineering-team@tonone-ai

15 agents included.

5 Skills

Everything Vigil can do in your project

See it in action

The same task. Once without Tonone, once with Vigil.

Task

Add monitoring to our payment API

Without Tonone (no specialist)
$ claude "Add monitoring to our payment API"
Add logging to your routes:
console.log('Request received:', req.path)
console.log('Response sent:', res.statusCode)
For alerts, set up UptimeRobot to ping /health every 5 minutes.
This will tell you when the service is down.
With Vigil (/vigil-instrument)
$ /vigil-instrument "Add monitoring to payment API"
Structured observability stack:
Logs: JSON + trace_id, no PII in log fields
Metrics: RED per endpoint (rate, errors, p50/p99)
Traces: OpenTelemetry spans, DB + external calls
Alert rules (each with runbook attached):
error_rate > 1% for 5 min -> page oncall
p99_latency > 2s for 3 min -> page oncall
no requests for 2 min -> page oncall
✓ Blind spots: before 100%, after 0
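
The alert rules in this output pair a threshold with a duration, so a single bad sample does not page anyone. A minimal sketch of that "sustained breach" check, together with a structured log line that carries a `trace_id` and drops PII, might look like the following. Everything here (`logEvent`, `breachedFor`, the `PII_FIELDS` list, the sample shape) is hypothetical and illustrates the idea only; it is not Vigil's output or implementation.

```javascript
// Hypothetical sketch, not Vigil's implementation.

// Field names treated as PII for this example; a real list depends on your data.
const PII_FIELDS = new Set(["email", "card_number", "name"]);

// Build one JSON log line with a trace_id, dropping any PII fields.
function logEvent(level, message, traceId, fields = {}) {
  const safe = {};
  for (const [key, value] of Object.entries(fields)) {
    if (!PII_FIELDS.has(key)) safe[key] = value;
  }
  // Fixed keys come last so caller-supplied fields cannot overwrite them.
  return JSON.stringify({
    ...safe,
    ts: new Date().toISOString(),
    level,
    message,
    trace_id: traceId,
  });
}

// Returns true when the metric has stayed above `threshold` continuously
// for at least `windowSeconds`, as of time `now`.
// samples: [{ t: epochSeconds, value: number }], sorted by t ascending.
function breachedFor(samples, threshold, windowSeconds, now) {
  let streakStart = null; // time of the first sample in the current breach streak
  for (const s of samples) {
    if (s.t > now) break; // ignore samples from the future
    if (s.value > threshold) {
      if (streakStart === null) streakStart = s.t;
    } else {
      streakStart = null; // a healthy sample resets the streak
    }
  }
  return streakStart !== null && now - streakStart >= windowSeconds;
}

// Example: error rate sampled once a minute, rule "error_rate > 1% for 5 min".
const errorRate = [
  { t: 0, value: 0.005 },
  { t: 60, value: 0.02 },
  { t: 120, value: 0.03 },
  { t: 180, value: 0.02 },
  { t: 240, value: 0.04 },
  { t: 300, value: 0.02 },
  { t: 360, value: 0.03 },
];
if (breachedFor(errorRate, 0.01, 300, 360)) {
  console.log(logEvent("error", "error_rate above 1% for 5 min", "trace-abc123", {
    endpoint: "/charge",
    email: "user@example.com", // dropped from the log line as PII
  }));
}
```

In this sample data the breach starts at t=60, so the rule fires at t=360 but not at t=300, when only four minutes have elapsed. Production systems usually leave this evaluation to the alerting backend rather than application code.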