
Meet Surge

The AI Growth Engineer for Activation and Retention

Tonone's Surge maps activation funnels with drop-off hypotheses, builds PLG onboarding loops, designs retention playbooks, queues growth experiments with kill conditions, and builds landing pages for acquisition.

Surge · Growth · 10 min read · April 14, 2026

Growth is the discipline that the rest of product work ultimately serves, and also the discipline most often practiced as ritual rather than science. A team that ships features without measuring activation is not doing product development; it is writing software. A team that runs A/B tests without pre-registered hypotheses is not experimenting; it is generating noise. A team with a retention problem that responds by building more features has inverted the causal chain: retention is downstream of value delivery, and adding features to a product whose users are not yet getting value from the existing ones is not growth work; it is scope expansion. AI growth tools that generate experiment ideas and onboarding copy without diagnosing the actual activation failure mode are the growth equivalent of a gym membership: they provide the infrastructure for the work without ensuring the work is pointed in the right direction. Surge was built to do the diagnostic work first: to identify where growth is actually failing before designing interventions, and to design interventions that are specific enough to produce a clear result.

Why the generalist approach fails at growth

A generalist chatbot asked for growth advice produces a list of growth tactics: referral programs, email drip sequences, onboarding checklists, feature announcements, win-back campaigns. The list is internally consistent and would not surprise anyone who has read a growth marketing blog. What it is missing is the diagnosis: which of these tactics is relevant to the specific growth problem the team has, in what sequence, and with what metrics to validate whether it is working. A team with a 28% activation rate does not have the same problem as a team with a 62% activation rate and a 40% day-30 retention rate. The first team needs to fix the first session; the second team has a week-two drop-off problem. Applying the same list of tactics to both produces mediocre results for both, because the tactics were not chosen for the actual failure mode.

Growth hacking playbooks and books (the Reforge course content, the Brian Balfour frameworks, the Andrew Chen essays) are excellent for understanding growth theory and learning the vocabulary. They are not designed to be applied directly to a specific product's specific growth problem. They describe what worked at Airbnb, Dropbox, Pinterest, and Slack. Those case studies are instructive but not prescriptive. The product-led growth loop that worked for Dropbox (the refer-a-friend-for-storage-space mechanic) worked because Dropbox was a storage product where more storage was the core value and sharing was native to the use case. Applying the same loop to a B2B SaaS product with a six-person buying committee and an annual contract produces a broken mechanic, not viral growth. Growth theory requires translation into the specific product context before it produces interventions that work.

A/B test platforms alone (Optimizely, Statsig, LaunchDarkly) provide the infrastructure to run experiments without providing the judgment to design experiments that will produce useful results. An experiment without a pre-registered hypothesis, a minimum detectable effect calculation, and a decision rule for null results is not an experiment; it is a change with monitoring. The platform records the data. The interpretation remains ambiguous. Teams running experiments this way generate a backlog of "inconclusive" results that nobody knows how to act on, while the actual growth problems remain unaddressed because the experiments were not designed to answer a specific question. The infrastructure is necessary; it is not sufficient.

What a growth engineer actually does

A senior growth engineer starts with the current growth equation (the quantified model of how the product acquires, activates, retains, and monetizes users) and identifies the weakest link. Not the most exciting feature to test, but the step in the user journey where the highest percentage of users are failing to get value and leaving. That diagnostic is the most important work in growth, and it is also the most frequently skipped: it requires data analysis, user research synthesis, and the humility to work on boring problems (like an email confirmation flow that loses 30% of signups) rather than exciting ones (like a viral loop that might double acquisition). Once the weakest link is identified, the growth engineer designs experiments that are specific enough to test a single hypothesis, powered correctly to detect a real effect, and structured with a kill condition so that failed experiments produce a clear learning rather than a lingering feature.

The PLG (product-led growth) dimension of growth engineering is particularly nuanced. PLG is not a growth tactic; it is a distribution model where the product itself drives acquisition and expansion through the value it creates in use. Building PLG loops requires understanding where in the product's workflow natural sharing or network effects occur, what the "aha moment" is for the product (the specific moment when a new user gets the value that makes them retain), and how to reduce the time from signup to that moment. These are product design questions as much as growth questions, which is why the best growth engineers work at the intersection of product design, data analysis, and marketing, not within any single one.
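One way to make the "aha moment" concrete is to compare retention between users who did and did not take a candidate action in their first session. The sketch below is a minimal illustration under stated assumptions: the `User` record, the event names, and the ranking function are hypothetical and are not part of Surge or any analytics tool.

```python
from dataclasses import dataclass


@dataclass
class User:
    first_session_actions: set[str]  # hypothetical event names
    retained: bool                   # active at the chosen checkpoint, e.g. day 30


def retention_rate(users: list[User]) -> float:
    """Share of users in the group who were retained (0.0 for an empty group)."""
    return sum(u.retained for u in users) / len(users) if users else 0.0


def rank_aha_candidates(users: list[User], candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate actions by retention gap: rate(did it) - rate(did not)."""
    ranked = []
    for action in candidates:
        did = [u for u in users if action in u.first_session_actions]
        did_not = [u for u in users if action not in u.first_session_actions]
        ranked.append((action, retention_rate(did) - retention_rate(did_not)))
    # Largest gap first: the strongest aha-moment candidate leads the list.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

A large gap is a correlation, not proof that the action causes retention. It is a hypothesis to be tested with an experiment.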

Meet Surge

Surge is Tonone's dedicated AI growth engineer, a purpose-built agent for the full growth workflow, from activation funnel diagnosis through experiment design, retention playbook production, and PLG loop mapping. It starts with the diagnostic question: where is growth actually failing, and what is the specific intervention that has the highest probability of fixing it? From that diagnosis, it produces a prioritized experiment queue, a retention playbook, or a PLG loop design, with the specificity and rigor that turns growth work from a list of tactics into a learning system. Surge does not produce generic growth advice; it produces growth interventions calibrated to the specific failure mode it diagnosed.

Tonone's Surge is the AI growth engineer that diagnoses the specific activation or retention failure mode before designing interventions, producing experiments with pre-registered hypotheses, kill conditions, and clear decision rules.

What Surge actually does

Diagnosing activation failure and designing the fix

The surge-activation skill takes the current activation funnel, described or instrumented, and produces a diagnostic analysis with a prioritized experiment queue. The output identifies: the aha moment for the product (the specific user action or outcome that is most correlated with long-term retention), the time-to-aha for the current onboarding flow (how long it takes the average new user to reach the aha moment), the steps in the onboarding flow that have the highest drop-off, and the root cause hypothesis for each major drop-off point. From that diagnosis, it produces three to five experiments in priority order: each one targeting a specific drop-off hypothesis, designed with a hypothesis statement, the expected effect and reasoning, a minimum sample size, the guardrail metrics to monitor, and a kill condition (the signal that would indicate the experiment is causing harm before the full sample is collected). The experiments are sequenced so that each one builds on the learning from the previous: the sequence is a learning roadmap, not just a list of ideas. surge-activation is the skill that answers the question "what should our growth team work on next" with a specific, evidence-grounded answer rather than a prioritization debate.
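The drop-off part of that diagnosis can be sketched in a few lines. This is an illustrative example, not Surge's implementation: the function names are hypothetical and the step counts are invented.

```python
def step_losses(funnel: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """For each step after the first, return the share of users lost
    relative to the step before it."""
    losses = []
    for (_, prev_count), (name, count) in zip(funnel, funnel[1:]):
        if prev_count <= 0:
            raise ValueError(f"no users reached the step before {name!r}")
        losses.append((name, 1 - count / prev_count))
    return losses


def worst_step(funnel: list[tuple[str, int]]) -> tuple[str, float]:
    """Return the step with the highest relative drop-off."""
    return max(step_losses(funnel), key=lambda pair: pair[1])


# Invented counts for a five-step onboarding flow.
funnel = [
    ("signup", 1000),
    ("email_confirm", 820),
    ("profile_setup", 722),
    ("integration", 282),
    ("first_core_action", 158),
]

step, loss = worst_step(funnel)
print(f"largest drop-off: {step} ({loss:.0%} of users lost)")
```

Measuring loss relative to the previous step, rather than relative to signups, keeps a late step with few remaining users from looking harmless.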

Building retention playbooks for at-risk users

The surge-retention skill produces a retention playbook: the behavioral signals that predict churn (leading indicators, not the churn event itself), the intervention triggers calibrated to each signal, the intervention content for each trigger (the email, in-app message, or success call script that addresses the specific disengagement pattern), and the success metric for each intervention. A retention playbook is not a drip sequence; it is a structured response to specific behavioral patterns. A user who has not returned in seven days after onboarding has a different disengagement signal than a user who returned daily for three weeks and then went quiet. The first needs a re-engagement message that addresses the obstacle to their return; the second needs an investigation into what changed. surge-retention produces playbooks that distinguish between these patterns and design the right intervention for each one. The output also includes a cohort analysis structure: the behavioral segments that have different retention curves, the product changes that moved retention for each segment in the past, and the experiments queued for each segment based on the current gap between actual and target retention. For teams that have never formalized retention work, surge-retention produces a baseline playbook that can be implemented in a week and iterated from there.
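The distinction between those two patterns can be expressed as a small classifier. The sketch below is an assumption-laden illustration, not the playbook Surge produces: the pattern labels, the seven-day quiet threshold, and the 15-of-21-days habit threshold are all hypothetical choices.

```python
from datetime import date, timedelta

# Interventions paraphrase the two patterns described above; labels are hypothetical.
INTERVENTIONS = {
    "never_returned": "re-engagement message addressing the obstacle to returning",
    "habit_broken": "investigate what changed before reaching out",
}


def classify_disengagement(signup: date, active_days: set[date], today: date,
                           quiet_days: int = 7, habit_days: int = 15) -> str:
    """Label a user's disengagement pattern from the days they were active."""
    later = {d for d in active_days if d > signup}
    last_seen = max(later) if later else signup
    if (today - last_seen).days < quiet_days:
        return "active"
    if not later:
        return "never_returned"
    # Count active days in the three weeks ending at the last visit.
    window_start = last_seen - timedelta(days=20)
    recent = sum(1 for d in later if d >= window_start)
    return "habit_broken" if recent >= habit_days else "other_lapsed"
```

Each label then maps to its own trigger and message, which is what separates a playbook from a single drip sequence sent to every lapsed user.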

Designing growth experiments with rigor

The surge-experiment skill produces a complete growth experiment specification: the hypothesis (what change, affecting which user behavior, producing what outcome), the variant description, the primary metric and measurement method, the minimum detectable effect and sample size, the test duration, guardrail metrics, and a kill condition (the specific signal that triggers early stopping). Every surge-experiment output includes a "what we'll learn" section: what the team will know if the experiment wins, what they will know if it loses, and what they will know if it produces a null result. This section prevents the most common experiment failure mode: a null result that produces no learning because the hypothesis was not specific enough to be invalidated. The experiment spec is also designed to be filed as a record: when the team reviews the experiment backlog six months later, they can understand exactly what was tested, why, and what was learned, building the institutional knowledge that compounds over time. surge-experiment connects directly to Lumen's lumen-abtest for the statistical design, ensuring the experimental rigor is consistent across the growth and product analytics workflows.
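To show how a minimum detectable effect translates into a sample size, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. It is not a description of how surge-experiment or lumen-abtest computes the figure; the function name and the defaults (two-sided alpha of 0.05, 80% power) are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm to detect an absolute lift of `mde` over
    `baseline` with a two-sided test (normal approximation)."""
    p1, p2 = baseline, baseline + mde
    if mde == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("baseline and baseline + mde must lie in (0, 1), mde != 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Example: a 24% baseline rate and a 5-percentage-point minimum detectable effect.
print(sample_size_per_arm(baseline=0.24, mde=0.05))
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the effect size needs to be fixed before the test starts rather than after the data arrives.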

Mapping product-led growth loops

The surge-plg skill maps the natural PLG loops in a product: the points in the workflow where sharing, collaboration, or network effects occur naturally, the current conversion rates through those loops, the friction points that prevent the loops from closing, and the experiments that would improve loop efficiency. For products where PLG is a viable distribution model, surge-plg produces a loop architecture: the acquisition loop (how users invite others into the product), the engagement loop (how usage deepens over time), and the expansion loop (how team or account growth occurs through usage). Each loop includes the mechanic, the current state, the target state, and a prioritized set of interventions to move from current to target. For products where PLG is less viable (high price point, complex buying process, enterprise contracts), surge-plg identifies the limited PLG elements that can still be built into the product (viral features, shareable outputs, referral mechanics) and prioritizes them by expected impact on acquisition cost.

Growth intelligence before experiments begin

The surge-recon skill performs a rapid growth audit before any experiment or playbook work begins: it maps the current growth equation (acquisition rate, activation rate, retention at day-7, day-30, and day-90, expansion rate), identifies the weakest link in the funnel, and produces a growth health brief. The brief includes: the current state of each growth lever, the benchmarks for the product's category, the biggest gap between actual and benchmark performance, and the single most important growth decision to make before the next experiment cycle begins. surge-recon also audits the current experiment backlog (if one exists) for quality: experiments with insufficient power, experiments without kill conditions, experiments testing multiple variables simultaneously, and experiments queued out of causal sequence. For teams new to structured growth work, surge-recon is the right entry point: it produces a baseline assessment and a clear priority for the first experiment, preventing the most common startup failure mode of running experiments in random order rather than in the sequence that produces compounding learning.
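The "weakest link" comparison can be sketched as a relative gap between actual and benchmark values for each lever. This is an illustrative sketch only: the function is hypothetical, and both sets of numbers below are invented placeholders, not real category benchmarks.

```python
def weakest_link(actual: dict[str, float], benchmark: dict[str, float]) -> tuple[str, float]:
    """Return the lever with the largest relative shortfall versus its benchmark."""
    gaps = {
        lever: (benchmark[lever] - value) / benchmark[lever]
        for lever, value in actual.items()
        if benchmark.get(lever, 0) > 0
    }
    if not gaps:
        raise ValueError("no lever has a usable benchmark")
    lever = max(gaps, key=gaps.get)
    return lever, gaps[lever]


# Invented values for illustration only.
actual = {"activation": 0.24, "day7_retention": 0.40,
          "day30_retention": 0.25, "day90_retention": 0.18}
benchmark = {"activation": 0.40, "day7_retention": 0.45,
             "day30_retention": 0.30, "day90_retention": 0.20}

lever, gap = weakest_link(actual, benchmark)
print(f"weakest link: {lever} ({gap:.0%} below benchmark)")
```

A relative gap is used here so that levers with naturally low rates, such as day-90 retention, are compared on the same footing as levers with high ones.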

A worked example

A B2B SaaS team has a 24% day-7 activation rate and does not know why. They have signup data, a basic funnel in Amplitude, and a sense that the onboarding flow is "too long." They run surge-recon first, which identifies that the actual problem is not length: users who complete step 4 of the onboarding (connecting an integration) have a 71% day-7 activation rate, while users who skip step 4 have a 9% rate. The recon output recommends the first experiment: make step 4 mandatory rather than optional.

Then surge-activation produces the full experiment queue. An excerpt of the activation funnel spec looks like this:

markdown
## Surge: Activation Funnel Spec

### Current state
Signup → Email confirm (loss: 18%) → Profile setup (loss: 12%)
→ Integration step (loss: 61%) → First core action (loss: 44%)

### Aha moment
Connecting first integration + completing first core action within session.
Evidence: 71% day-7 activation for users who hit both vs. 9% for those who skip.

### Experiment queue (priority order)

**EXP-01: Make integration step mandatory**
  Hypothesis: removing the skip option will increase integration completion
  rate from 39% to ≥65%, driving day-7 activation uplift ≥ +15pp.
  Primary metric: integration completion rate.
  Sample: 1,200 signups/arm. Duration: ~12 days.
  Kill condition: signup completion rate drops by >5pp (users abandoning at
  integration step rather than completing).
  What we'll learn:
  - WIN: step 4 completion rate drives activation; optimize integration UX next.
  - LOSS: skip behavior is not the cause; investigate email confirm drop-off.
  - NULL: effect is real but smaller; test integration pre-fill (EXP-02).

**EXP-02: Pre-fill integration from OAuth on signup**
  Hypothesis: reducing integration friction via OAuth pre-fill will reduce
  time-to-integration by 60% and increase completion rate ≥ 15pp.
  Sequences after EXP-01 (tests friction, not optionality).

**EXP-03: Email confirm removal (magic link direct to onboarding)**
  Hypothesis: eliminating the email confirmation step will reduce signup-to-
  onboarding drop by 15pp for signups from non-enterprise domains.
  Risk: deliverability and spam implications; route to security review first.

The team runs EXP-01. Integration completion jumps from 39% to 61%. Day-7 activation moves from 24% to 38%. The kill condition was not triggered. Because the hypothesis was specific and the decision rule was pre-registered, the team knows exactly what they learned: mandating the integration step was not the whole fix (completion went to 61%, not 65%), which means EXP-02 (reducing friction for users who want to complete it but cannot easily) is the right next experiment. The experiment sequence is a learning roadmap, not a list of features. By the time all three experiments have run, the team will have a causal model of its activation funnel that it did not have before, and that model will compound into better interventions with every experiment cycle.
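The decision logic in this example can be written down as a simple pre-registered rule. The sketch below is a simplified illustration, not Surge's output: the function names and the control and variant signup completion rates are hypothetical, and a real check would account for sampling noise (for example with a confidence interval or a sequential test) instead of comparing raw rates.

```python
def kill_triggered(control_rate: float, variant_rate: float, max_drop_pp: float) -> bool:
    """True when the variant's guardrail metric sits more than `max_drop_pp`
    percentage points below control."""
    return (control_rate - variant_rate) * 100 > max_drop_pp


def read_result(baseline: float, observed: float, target: float) -> str:
    """Compare the primary metric against the pre-registered target."""
    if observed >= target:
        return "target_met"
    if observed > baseline:
        return "improved_below_target"
    return "no_improvement"


# EXP-01 primary metric from the example: 39% baseline, 61% observed, 65% target.
print(read_result(baseline=0.39, observed=0.61, target=0.65))

# Guardrail with hypothetical signup completion rates and the 5pp kill threshold.
print(kill_triggered(control_rate=0.50, variant_rate=0.48, max_drop_pp=5))
```

Writing the rule before the test runs is what makes an outcome such as "improved but below target" actionable: it points to the next experiment in the queue instead of opening a debate about what the numbers mean.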

Before designing your first growth experiment, run surge-recon to identify the weakest link in your current funnel. The most common growth mistake is running experiments that optimize an already-working step while the actual failure mode, a step losing 60% of users, goes unaddressed because it is less exciting to fix. Surge diagnoses first, then designs the experiment sequence that addresses the actual constraint.

Surge vs the alternatives

Surge is not a list of growth tactics and it is not an A/B test platform. It is the growth judgment that determines which experiments to run, in what sequence, with what rigor, and the diagnostic work that ensures experiments are addressing the actual growth constraint rather than the most interesting hypothesis. The comparison below makes the functional differences concrete.

The surge-retention skill in Tonone's Surge produces retention playbooks that distinguish between disengagement patterns and design a specific intervention for each, not a generic drip sequence applied to all churning users.

| Capability | Tonone | Generalist chatbot | Cursor / Copilot |
| --- | --- | --- | --- |
| Activation failure diagnosis before experiment design | Yes, aha moment identification, drop-off root cause, experiment queue in causal sequence | Tactic list without diagnosis, not calibrated to the actual failure mode | Growth frameworks without product-specific diagnosis |
| Retention playbooks by disengagement pattern | Yes, behavioral signal triggers, intervention content for each pattern, success metrics | Generic re-engagement advice, not calibrated to specific churn signals | Describes frameworks, not product-specific playbooks |
| Experiments with kill conditions and pre-registered rules | Yes, kill condition, what-we'll-learn for win/loss/null, decision rule | Experiment ideas without rigor, no power calculation, no kill condition | Test infrastructure, no experiment design or decision framework |
| PLG loop mapping and optimization | Yes, acquisition/engagement/expansion loop architecture with current vs. target state | PLG concepts and examples, not mapped to the specific product | Not applicable, general growth content, not product-specific analysis |
| Growth health audit with benchmark comparison | Yes, surge-recon maps the full growth equation and identifies the weakest link | No, no quantitative growth audit | Category averages only, no product-specific analysis |
| Experiment sequence as learning roadmap | Yes, experiments sequenced so each builds on the previous learning | Unsequenced ideas, no causal ordering | Platform runs tests in order received, no strategic sequencing |

The surge-plg skill in Tonone's Surge maps acquisition, engagement, and expansion loops in the specific product context, identifying where PLG mechanics can be built and what current friction is preventing the loops from closing.

Install and try

Tonone is free and MIT-licensed. Install it once and all 23 agents, including Surge, are available in your Claude Code session. You pay only for the Claude Code token usage during work. Start with surge-recon to map your current growth equation and identify the weakest link before designing your next experiment.

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install Surge

$ claude plugin install surge@tonone-ai

Frequently asked questions

What does Tonone's Surge do?
Surge is Tonone's AI growth engineer. It diagnoses activation failure modes, builds retention playbooks by disengagement pattern, designs growth experiments with pre-registered hypotheses and kill conditions, maps PLG loops for specific products, and audits the full growth equation to identify the weakest link before any work begins.
What is the aha moment and how does Surge identify it?
The aha moment is the specific user action or outcome most correlated with long-term retention: the moment when a new user gets the value that makes them come back. Surge's surge-activation skill identifies it by correlating early-session behaviors with long-term retention data, then designs the onboarding flow to minimize time-to-aha.
What is a kill condition and why does Surge include one in every experiment?
A kill condition is a pre-specified signal that triggers early stopping of an experiment if the variant is causing harm before the full sample is collected. Examples include a drop in signup completion rate, an increase in support ticket volume, or a degradation in a guardrail metric. Surge includes kill conditions in every experiment spec to prevent experiments from running to statistical significance while causing undetected damage.
How is Surge different from using a growth playbook or Reforge content?
Growth playbooks and courses describe frameworks and case studies from specific companies. Surge applies growth principles to your product's specific growth equation: your aha moment, your funnel drop-off points, your retention cohorts. The output is interventions calibrated to your context, not frameworks you need to translate yourself.
Can Surge help with product-led growth if my product has a complex B2B buying process?
Yes. surge-plg is designed for this nuance. For products where full PLG is not viable (enterprise contracts, complex buying committees), it identifies the limited PLG mechanics that can still be built in (shareable outputs, viral features, referral mechanics) and prioritizes them by expected impact on acquisition cost. Not every product can be Dropbox, but almost every product has some PLG surface.
What is the difference between Surge and Lumen for experiment design?
Lumen's lumen-abtest skill handles the statistical design of A/B tests: power calculations, sample sizes, and decision rules for product analytics experiments. Surge's surge-experiment skill handles the growth strategy layer (what to test, why, in what sequence, with what kill conditions) and connects to Lumen for the statistical rigor. They work best together.
Is Tonone's Surge free?
Yes. Tonone is MIT-licensed and free to use. Surge is one of 23 agents included in the Tonone package. You pay only for Claude Code token usage during the work itself.
How do I start with Surge if I have never done structured growth work?
Start with surge-recon. It maps your current growth equation (acquisition rate, activation rate, and retention at day-7, day-30, and day-90), compares it to category benchmarks, and identifies the single weakest link. That becomes the priority for your first experiment. surge-recon produces a baseline that compounds with every experiment cycle you run from it.
