Vigil
Observability + Reliability
Know when something's wrong before your customers do.
SRE and observability specialist who instruments services with structured logging, RED metrics, and distributed tracing. Builds alerting rules paired with runbooks so every alert has a clear remediation path, not just a notification. Leads production incident response from initial diagnosis to root cause to fix. Defines SLOs with error budgets and audits monitoring coverage to find blind spots before launch.
Read the field guide: The AI Observability Engineer for SLOs and Alerts
Install Vigil
1. Add to marketplace
$ claude plugin marketplace add tonone-ai/tonone
2. Install Vigil
$ claude plugin install vigil@tonone-ai
5 skills included.
Engineering team
Install the Engineering team
1. Add to marketplace
$ claude plugin marketplace add tonone-ai/tonone
2. Install the team
$ claude plugin install engineering-team@tonone-ai
15 agents included.
5 Skills
Everything Vigil can do in your project
See it in action
The same task. Once without Tonone, once with Vigil.
Task
Add monitoring to our payment API
Without Tonone: no specialist
$ claude "Add monitoring to our payment API"
Add logging to your routes:
console.log('Request received:', req.path)
console.log('Response sent:', res.statusCode)
For alerts, set up UptimeRobot to ping /health
every 5 minutes.
This will tell you when the service is down.
With Vigil: /vigil-instrument
$ /vigil-instrument "Add monitoring to payment API"
Structured observability stack:
Logs:    JSON + trace_id, no PII in log fields
Metrics: RED per endpoint: rate, errors, duration (p50/p99)
Traces:  OpenTelemetry spans, DB + external calls
Alert rules (each with runbook attached):
error_rate > 1% for 5 min -> page oncall
p99_latency > 2s for 3 min -> page oncall
no requests for 2 min -> page oncall
✓ Blind spots: before 100%, after 0
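
The "for N min" part of each alert rule above means the condition has to hold for the whole window before anyone is paged, so a single bad minute does not wake up oncall. A minimal sketch of that evaluation follows. It is an illustration under stated assumptions, not Vigil's actual output: the RULES table, the firingAlerts function, and the per-minute sample shape ({ requests, errors, p99Ms }) are all hypothetical names. Only the thresholds come from the rules listed above.

```javascript
// Hypothetical sketch: RULES, firingAlerts, and the sample shape are
// assumptions made for illustration. Each sample is a one-minute aggregate
// for a single endpoint: { requests, errors, p99Ms }.
const RULES = [
  {
    name: "error_rate > 1% for 5 min",
    windowMin: 5,
    // A minute with zero requests has no defined error rate, so it does not breach.
    breached: (s) => s.requests > 0 && s.errors / s.requests > 0.01,
  },
  {
    name: "p99_latency > 2s for 3 min",
    windowMin: 3,
    breached: (s) => s.p99Ms > 2000,
  },
  {
    name: "no requests for 2 min",
    windowMin: 2,
    breached: (s) => s.requests === 0,
  },
];

// Returns the names of the rules that should page oncall. A rule fires only
// when every one of its most recent `windowMin` samples breaches the threshold.
function firingAlerts(samples, rules = RULES) {
  return rules
    .filter((rule) => {
      if (samples.length < rule.windowMin) return false; // not enough history yet
      return samples.slice(-rule.windowMin).every(rule.breached);
    })
    .map((rule) => rule.name);
}

// Usage: one healthy minute followed by five minutes at a 5% error rate.
const healthy = { requests: 1000, errors: 2, p99Ms: 400 };
const failing = { requests: 1000, errors: 50, p99Ms: 400 };
console.log(firingAlerts([healthy, failing, failing, failing, failing, failing]));
// Only the error_rate rule is reported; latency and traffic are within limits.
```

Requiring every sample in the window to breach is the simplest way to express "sustained for N minutes". A production alerting system would typically evaluate this over its own time series rather than an in-memory array.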