A field guide to the /cortex-eval skill

AI Model Evaluation and Drift Check

Models decay silently. /cortex-eval checks accuracy regression against reference data, distribution drift, latency baseline, and token cost shifts.

Cortex · ML/AI · 7 min read · February 1, 2026

Models decay silently. The accuracy that was 92% at launch is 87% three months later. The latency was 200ms; it is 350ms now. The token cost was $0.003 per request; it crept to $0.012 because input distribution shifted toward longer prompts. Each of these is invisible without an evaluation routine that runs against a reference dataset and tracks the deltas.
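As a rough illustration of "tracking the deltas", the figures quoted above can be turned into relative changes. This is only a sketch: the dictionary keys and the `relative_change` helper are hypothetical names chosen for the example, not part of /cortex-eval.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change from baseline to current (0.10 means +10%)."""
    return (current - baseline) / baseline


# Figures taken from the paragraph above; the key names are illustrative.
launch = {"accuracy": 0.92, "latency_ms": 200.0, "cost_usd_per_request": 0.003}
today = {"accuracy": 0.87, "latency_ms": 350.0, "cost_usd_per_request": 0.012}

deltas = {name: relative_change(launch[name], today[name]) for name in launch}

for name, change in deltas.items():
    print(f"{name}: {change:+.1%}")
```

None of these numbers mean much in isolation. They become actionable only when the same calculation is repeated on a schedule against a stored baseline.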

The /cortex-eval skill evaluates a deployed model or LLM integration across four dimensions: accuracy regression against a reference dataset, distribution drift on inputs and outputs, latency regression compared to baseline, and cost shifts. The output is a health report with recommended actions: retrain the model, refresh the prompt, switch the provider, or address the upstream data shift.

What the eval covers

Accuracy: the deployed model is scored on a held-out reference set, using metrics suited to the task.
Distribution drift: a KS test on input features and a histogram comparison on output predictions.
Latency: p99 of the deployed endpoint compared with the baseline at launch.
Cost: tokens per request and total spend trends, broken down by user or feature.
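The drift and latency checks above can be sketched with the standard library alone. This is a minimal sketch, not the skill's actual implementation: the function names are hypothetical, the histogram comparison is shown as a total variation distance (one reasonable choice among several), and the p99 uses the nearest-rank definition.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs, a value in [0, 1]."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def total_variation(reference_labels, current_labels):
    """Histogram comparison for categorical outputs: half the sum of
    absolute differences between the two label frequency tables."""
    ref, cur = Counter(reference_labels), Counter(current_labels)
    n_ref, n_cur = len(reference_labels), len(current_labels)
    labels = set(ref) | set(cur)
    return 0.5 * sum(abs(ref[l] / n_ref - cur[l] / n_cur) for l in labels)


def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=99 for p99 latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

A KS statistic near 0 means the input feature looks like the reference; a value near 1 means the two samples barely overlap. Where a p-value is needed, a statistics library would be the more appropriate tool than this hand-rolled statistic.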

How /cortex-eval works

The skill connects to the model serving layer and the production logs, runs the reference set, computes drift, and pulls the latency and cost metrics. It then produces the health report, with a severity rating per dimension. Recommended actions are scoped to the evidence: a small accuracy drop with a stable distribution suggests a refresh; a large distribution shift suggests retraining or a prompt update.
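The scoping rule described above can be sketched as a small decision function. Everything here is an assumption made for illustration: the `EvalDeltas` type, the threshold values, and the returned strings are hypothetical and do not reflect how /cortex-eval actually assigns severity.

```python
from dataclasses import dataclass


@dataclass
class EvalDeltas:
    accuracy_drop: float    # baseline accuracy minus current accuracy
    drift_statistic: float  # e.g. a KS statistic in [0, 1]


def recommend(deltas: EvalDeltas,
              small_drop: float = 0.05,
              drift_threshold: float = 0.2) -> str:
    """Map measured deltas to a scoped action.

    The thresholds are placeholders; real values would be tuned per model.
    """
    # A large distribution shift dominates: the inputs no longer
    # resemble what the model or prompt was built for.
    if deltas.drift_statistic >= drift_threshold:
        return "retrain or update the prompt"
    # Small accuracy drop while the distribution is stable.
    if 0 < deltas.accuracy_drop <= small_drop:
        return "refresh"
    # Large drop with stable inputs is not covered by the rule in the
    # text, so this sketch only flags it for a human to look at.
    if deltas.accuracy_drop > small_drop:
        return "investigate"
    return "no action"


print(recommend(EvalDeltas(accuracy_drop=0.02, drift_statistic=0.05)))
print(recommend(EvalDeltas(accuracy_drop=0.02, drift_statistic=0.40)))
```

Checking drift before accuracy is a deliberate ordering in this sketch: when the input distribution has moved substantially, an accuracy number measured on the old reference set says less about production behavior.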

Tonone's /cortex-eval skill evaluates deployed models for accuracy regression, distribution drift, latency baseline, and cost shifts.

Install

/cortex-eval ships with the Cortex agent in Tonone for Claude Code.

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install Cortex

$ claude plugin install cortex@tonone-ai

Frequently asked questions

What does /cortex-eval do?
It evaluates a deployed model or LLM integration for accuracy regression, distribution drift, latency baseline, and cost shifts.
How do I install /cortex-eval?
Install Tonone for Claude Code via tonone.ai/get-started.
