AI Model Evaluation and Drift Check

Models decay silently. Accuracy that was 92% at launch is 87% three months later. Latency was 200 ms; it is 350 ms now. Token cost was $0.003 per request; it crept to $0.012 because the input distribution shifted toward longer prompts. Each of these changes is invisible without an evaluation routine that runs against a reference dataset and tracks the deltas.
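To make the deltas concrete, here is a minimal sketch that computes the relative change for the three figures quoted above. The dictionary keys and the relative_change helper are illustrative names, not part of /cortex-eval.

```python
# Launch baseline and current readings, taken from the figures quoted above.
# The key names are hypothetical; /cortex-eval's real report format is not shown here.
baseline = {"accuracy_pct": 92.0, "latency_ms": 200.0, "cost_usd": 0.003}
current = {"accuracy_pct": 87.0, "latency_ms": 350.0, "cost_usd": 0.012}


def relative_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


deltas = {name: relative_change(baseline[name], current[name]) for name in baseline}

for name, pct in deltas.items():
    print(f"{name}: {pct:+.1f}%")
```

On these numbers, accuracy is down 5 points (about 5.4% relative), latency is up 75%, and cost per request has quadrupled.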

The /cortex-eval skill evaluates a deployed model or LLM integration across four dimensions: accuracy regression against a reference dataset, distribution drift on inputs and outputs, latency regression compared to baseline, and cost shifts. The output is a health report with recommended actions: retrain the model, refresh the prompt, switch the provider, or address the upstream data shift.

What the eval covers

Accuracy: the deployed model scored on a held-out reference set, using metrics calibrated to the task.
Distribution drift: a Kolmogorov-Smirnov (KS) test on input features and a histogram comparison on output predictions.
Latency: p99 of the deployed endpoint compared with the baseline at launch.
Cost: tokens per request and total spend trends, broken down by user or feature.
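The drift and latency checks above can be sketched with the standard library alone. This is an illustration under assumptions, not the skill's implementation: the function names are hypothetical, the histogram comparison uses total variation distance (the page does not name a metric), and p99 uses the nearest-rank method. In practice a library routine such as SciPy's ks_2samp would also give a p-value.

```python
import bisect
import math
from collections import Counter


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs.

    Both samples must be non-empty.
    """
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    for x in set(ref) | set(cur):
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def histogram_distance(reference_labels, current_labels):
    """Total variation distance between two label histograms, in [0, 1].

    Assumed metric; both label lists must be non-empty.
    """
    ref_counts = Counter(reference_labels)
    cur_counts = Counter(current_labels)
    labels = set(ref_counts) | set(cur_counts)
    return 0.5 * sum(
        abs(ref_counts[label] / len(reference_labels)
            - cur_counts[label] / len(current_labels))
        for label in labels
    )


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sample, e.g. pct=99 for p99."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up prompt lengths (in tokens) at launch and today.
reference_lengths = [120, 135, 150, 160, 180, 200]
current_lengths = [300, 320, 350, 400, 420, 450]
print("input drift (KS):", ks_statistic(reference_lengths, current_lengths))
```

A KS statistic near 0 means the two samples look alike; a value near 1 means they barely overlap, as with the made-up prompt lengths here.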

How /cortex-eval works

The skill connects to the model serving layer and the production logs, runs the reference set, computes drift, and pulls the latency and cost metrics. It produces the health report with a severity for each dimension. Recommended actions are scoped to the findings: a small accuracy drop with a stable distribution suggests a refresh; a big distribution shift suggests retraining or a prompt update.
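The scoping rule in the last sentence can be sketched as a small decision function. Everything here is an assumption made for illustration: the EvalResult fields, the two thresholds, and the fallback branches are hypothetical, and the skill's real severity logic is not documented on this page.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy_drop: float  # baseline accuracy minus current accuracy, in points
    drift_score: float    # drift measure in [0, 1], e.g. a KS statistic


# Hypothetical thresholds; tune them to the task and its tolerance for error.
SMALL_DROP_POINTS = 5.0
BIG_DRIFT = 0.3


def recommend(result):
    """Map an evaluation result to a scoped action, following the rule above."""
    if result.drift_score >= BIG_DRIFT:
        # Big distribution shift: the inputs no longer match what was trained on.
        return "retrain or update the prompt"
    if 0 < result.accuracy_drop <= SMALL_DROP_POINTS:
        # Small accuracy drop with a stable distribution.
        return "refresh"
    if result.accuracy_drop > SMALL_DROP_POINTS:
        # Not covered by the rule in the text; this branch is an assumed fallback.
        return "review manually"
    return "no action"


print(recommend(EvalResult(accuracy_drop=2.0, drift_score=0.05)))
print(recommend(EvalResult(accuracy_drop=1.0, drift_score=0.6)))
```

Checking drift before accuracy is a design choice in this sketch: when the input distribution has moved a lot, the accuracy figure on an old reference set says less about production behavior.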

CortexML

/cortex-eval

Evaluates a deployed model or LLM integration for performance issues: checks for accuracy degradation against a reference dataset, data distribution drift that may explain behavior changes, latency regression compared to baseline, and cost increases from token usage changes. Produces a health report with recommended actions.

When model outputs seem worse after a retraining r…

CortexML

/cortex-prompt

Designs, versions, and evaluates prompts for LLM-powered features. Builds evaluation suites that test prompts against a representative set of inputs so quality regressions are caught before they reach users. Applies systematic prompt engineering techniques and documents what each version changes and why.

When building AI features where prompt quality dir…

CortexML

/cortex-integrate

Integrates an LLM into a production service: API client with retry and timeout handling, streaming response support, semantic caching to avoid redundant API calls, fallback provider configuration, and cost controls including token budget enforcement and request rate limiting per user.

When adding an AI feature to an existing product f…

CortexML

/cortex-model

Builds an end-to-end ML pipeline: data ingestion with validation, feature engineering with a feature store, model training with cross-validation and hyperparameter tuning, evaluation against a held-out test set, and deployment to a serving endpoint with monitoring. Chooses the right model type for the problem.

When building a prediction, classification, or reg…

Install

/cortex-eval ships with the Cortex agent in Tonone for Claude Code.

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install Cortex

$ claude plugin install cortex@tonone-ai

Frequently asked questions

What does /cortex-eval do?

It evaluates a deployed model or LLM integration for accuracy regression, distribution drift, latency regression, and cost shifts.

How do I install /cortex-eval?

Install Tonone for Claude Code via tonone.ai/get-started.