
How to Run Your First Growth Experiment: A Step-by-Step Playbook

GenGrowth Team · 9 min read · Updated February 20, 2026

Most growth experiments fail because they skip the boring parts: hypothesis formation, sample sizing, and measurement criteria. This playbook covers the full process from idea to iteration, with templates you can use today.

Why Most Growth Experiments Fail

According to data from Reforge and GrowthHackers, roughly 70-80% of growth experiments fail to produce statistically significant results. That number sounds discouraging, but it is actually expected -- the goal is not to win every experiment, but to run enough experiments that the winners compound. The problem is that most teams do not run enough experiments, and the experiments they do run are poorly designed.

The three most common failure modes are:

  • No clear hypothesis. "Let's try posting on Reddit" is not a hypothesis. "Posting value-first content on r/SaaS will drive 50 qualified visitors per post within 7 days" is a hypothesis.
  • No success criteria defined upfront. If you do not decide what "success" looks like before you start, you will rationalize any result as a win or dismiss any result as inconclusive.
  • Insufficient sample size. Running an A/B test with 200 visitors and declaring a winner is statistically meaningless. You need to calculate minimum sample sizes before launching.

Step 1: Form a Testable Hypothesis

Every experiment starts with a hypothesis that follows this structure:

"If we [take this action], then [this metric] will [change in this direction] by [this amount], because [this reasoning]."

Examples:

  • "If we add a product comparison table to our pricing page, then our pricing-to-signup conversion rate will increase by 15%, because visitors currently leave to compare us with competitors on third-party sites."
  • "If we publish 10 glossary pages targeting long-tail keywords, then organic traffic from informational queries will increase by 2,000 monthly visits within 60 days, because we currently have zero coverage for these terms and competition is low."

Notice that each hypothesis includes a specific metric, a directional prediction, a magnitude estimate, and causal reasoning. This specificity forces you to think clearly about what you are testing and why.

Step 2: Define Success and Failure Criteria

Before running any experiment, write down three things (a short code sketch after the list shows one way to record them):

  1. Primary metric: The one number that determines success or failure. For SEO experiments, this is usually organic traffic or keyword rankings. For social experiments, it is click-through rate or qualified visits. For conversion experiments, it is the conversion rate itself.
  2. Minimum detectable effect (MDE): The smallest change that would be practically meaningful. A 0.1% increase in conversion rate might be statistically significant but practically worthless if it does not move revenue. Define the threshold that makes the experiment worth the effort.
  3. Guardrail metrics: Secondary metrics that must not degrade. If you are testing a more aggressive CTA, your primary metric might be click-through rate, but your guardrail should be bounce rate. If CTR goes up but bounce rate also spikes, the experiment is not a real win.
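One way to hold yourself to this is to freeze the criteria as data before launch, checked into version control alongside the experiment. A minimal sketch in Python; the metric names and thresholds here are illustrative, not part of any particular tool:

  # Decide these numbers before launch and do not edit them mid-experiment.
  criteria = {
      "primary_metric": "pricing_to_signup_conversion",
      "mde_relative": 0.20,                 # smallest lift worth shipping (+20%)
      "guardrails": {"bounce_rate": 0.05},  # fail if bounce rate rises > 5%
  }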

Step 3: Calculate Sample Size

For A/B tests and conversion experiments, sample size determines how long you need to run the experiment. The formula depends on three inputs:

  • Baseline conversion rate (your current rate)
  • Minimum detectable effect (from Step 2)
  • Statistical power (typically 80%) and significance level (typically 5%, i.e., 95% confidence)

For a baseline conversion rate of 3% and an MDE of 20% relative improvement (to 3.6%), you need approximately 14,500 visitors per variation at 80% power and 95% confidence. If your page gets 1,000 visitors per day, that is about 29 days for a two-variation test.
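If you prefer to compute this yourself rather than trust an online calculator, the standard normal-approximation formula is short. Below is a sketch (two-sided test, equal-sized variations); for the numbers above it returns roughly 13,900 per variation, the same ballpark as the 14,500 figure, which varies with the exact formula and any continuity correction:

  from math import ceil
  from scipy.stats import norm

  def sample_size_per_variation(baseline, mde_relative, power=0.80, alpha=0.05):
      """Visitors per variation to detect a relative lift, two-sided z-test."""
      p1 = baseline
      p2 = baseline * (1 + mde_relative)
      z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for 95% confidence
      z_power = norm.ppf(power)           # ~0.84 for 80% power
      variance = p1 * (1 - p1) + p2 * (1 - p2)
      return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

  n = sample_size_per_variation(0.03, 0.20)   # 13,911 with these inputs
  days = ceil(2 * n / 1000)                   # 28 days at 1,000 visitors/day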

For content and SEO experiments, sample size works differently. You typically need 60-90 days of data to see organic traffic effects, because Google takes time to crawl, index, and rank new content. Plan your measurement window accordingly.

Step 4: Design the Experiment

Keep experiments as simple as possible. Test one variable at a time. Multivariate tests sound sophisticated, but they require far more traffic (the required sample multiplies with each added variant) and make it harder to attribute results to any single change.

Document the experiment design using this template (a code sketch of the same fields follows the list):

  • Hypothesis: [from Step 1]
  • Primary metric: [from Step 2]
  • Guardrail metrics: [from Step 2]
  • Duration: [from Step 3]
  • Control: What the current experience looks like
  • Treatment: What the new experience looks like
  • Targeting: Which users see the experiment (all, new only, segment-specific)
  • Rollback plan: How to revert if something breaks
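For teams that keep experiment specs in version control, the template maps directly onto a structured record. A minimal Python sketch; the class and field names are ours for illustration, not part of any standard or of GenGrowth's tooling:

  from dataclasses import dataclass

  @dataclass
  class ExperimentDesign:
      hypothesis: str        # from Step 1
      primary_metric: str    # from Step 2
      guardrails: list       # from Step 2
      duration_days: int     # from Step 3
      control: str           # the current experience
      treatment: str         # the new experience
      targeting: str         # "all", "new_users", or a segment name
      rollback_plan: str     # how to revert if something breaks

  exp_001 = ExperimentDesign(
      hypothesis="Comparison table lifts pricing-to-signup conversion by 15%",
      primary_metric="pricing_to_signup_conversion",
      guardrails=["bounce_rate"],
      duration_days=29,
      control="Current pricing page",
      treatment="Pricing page with competitor comparison table",
      targeting="all",
      rollback_plan="Disable the feature flag; traffic reverts to control",
  )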

Step 5: Execute with Tracking

Every experiment needs clean attribution. At minimum (see the sketch after this list):

  • Use unique UTM parameters for each treatment: utm_campaign=exp_001&utm_content=treatment_a
  • Log experiment assignment in your analytics system (Google Analytics 4, Mixpanel, or Amplitude)
  • Set up a real-time dashboard so you can monitor for anomalies (crashes, tracking failures, extreme outliers)
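The first two items are straightforward to script. The sketch below builds a UTM-tagged URL and emits an assignment event; the logging function is a stand-in for whatever your analytics SDK actually provides, since GA4, Mixpanel, and Amplitude each have their own call:

  import json
  import time
  from urllib.parse import urlencode

  def tagged_url(base_url, experiment_id, variant):
      """Append experiment UTM parameters, e.g. exp_001 / treatment_a."""
      params = urlencode({
          "utm_campaign": f"exp_{experiment_id}",
          "utm_content": f"treatment_{variant}",
      })
      sep = "&" if "?" in base_url else "?"
      return f"{base_url}{sep}{params}"

  def log_assignment(user_id, experiment_id, variant):
      """Stand-in for your analytics SDK's event-tracking call."""
      print(json.dumps({
          "event": "experiment_assigned",
          "user_id": user_id,
          "experiment_id": experiment_id,
          "variant": variant,
          "ts": int(time.time()),
      }))

  print(tagged_url("https://example.com/pricing", "001", "a"))
  # https://example.com/pricing?utm_campaign=exp_001&utm_content=treatment_a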

GenGrowth automates this through its execution pipeline, which assigns UTM fingerprints to every piece of content and tracks performance automatically.

Step 6: Measure and Analyze

When the experiment reaches its planned duration or sample size, analyze results using this checklist:

  1. Did the primary metric change in the predicted direction?
  2. Is the change statistically significant (p < 0.05)?
  3. Is the change practically significant (exceeds your MDE)?
  4. Did any guardrail metrics degrade?
  5. Are there segment-level differences (e.g., works for new users but not returning users)?

If the answers are yes, yes, yes, no, and consistent across segments, you have a winner. Ship it.
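Checks 1 through 3 can be automated with a standard two-proportion z-test. A minimal sketch, with made-up conversion counts; guardrail and segment checks would follow the same pattern:

  from scipy.stats import norm

  def analyze(conv_a, n_a, conv_b, n_b, mde_relative):
      """Two-sided two-proportion z-test plus a practical-significance check."""
      p_a, p_b = conv_a / n_a, conv_b / n_b
      p_pool = (conv_a + conv_b) / (n_a + n_b)
      se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
      p_value = 2 * norm.sf(abs((p_b - p_a) / se))
      lift = (p_b - p_a) / p_a
      return {
          "direction_ok": p_b > p_a,           # check 1
          "stat_sig": p_value < 0.05,          # check 2
          "practical": lift >= mde_relative,   # check 3
          "p_value": round(p_value, 4),
          "relative_lift": round(lift, 3),
      }

  # Hypothetical counts: 3.0% control vs ~3.7% treatment, 14,500 visitors each
  print(analyze(435, 14500, 540, 14500, 0.20))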

Step 7: Iterate

Every experiment generates learnings, regardless of outcome:

  • Winner: Ship the treatment and design a follow-up experiment to push the metric further.
  • Loser: Document why the hypothesis was wrong. Was the reasoning flawed, or was the execution imperfect? Revise the hypothesis and test again.
  • Inconclusive: Increase sample size or extend duration. If still inconclusive, the effect is likely too small to matter -- move on to higher-impact experiments.

Experiment Velocity Benchmarks

The best growth teams run 8-12 experiments per month across all channels. Early-stage startups with limited traffic should focus on 3-4 experiments per month, prioritizing high-impact channels. The key metric is not win rate -- it is experiment velocity. Teams that run more experiments learn faster and compound their advantages.

For more on how to set up the measurement infrastructure that makes rapid experimentation possible, see our guide on marketing attribution models. And for a real-world example of experimentation in action, read our Week 1 experiment report on social-first content distribution.


GenGrowth Team

Growth Automation Engineers

We build tools that help product teams automate growth experiments.