The Short Answer
An A/B test should run for a minimum of two full weekly business cycles (14 days) to capture day-of-week behavioral variation, and it must reach a pre-calculated minimum sample size before you evaluate results, regardless of what the data looks like mid-test. The required sample size per variant depends on your baseline conversion rate, the minimum detectable effect (MDE) you care about, your desired statistical power (typically 80%), and your confidence level (typically 95%). For a landing page converting at 3% with a 20% relative MDE (3.0% to 3.6%), a standard fixed-sample calculation for a two-sided test returns roughly 13,900 visitors per variant, or about 27,800 visitors in total before the result can be read.
Understanding the Core Concept
Test duration is a derived output, not a primary input. You do not decide how long to run a test; you decide what effect size you want to detect, and the math tells you how long that will take given your traffic volume. The four inputs that determine the required sample size (and therefore, combined with your daily traffic, the test duration) are your baseline conversion rate, the minimum detectable effect, the statistical power, and the confidence level.
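To make the "derived output" point concrete, here is a minimal Python sketch of the textbook fixed-sample formula for comparing two conversion rates with a two-sided z-test. It assumes equal traffic per variant and treats the MDE as relative (20% means 3.0% to 3.6%). The function name `sample_size_per_variant` is illustrative, and commercial calculators may use slightly different approximations or sequential methods, so their numbers can differ somewhat.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant to detect relative_mde with a
    two-sided two-proportion z-test at significance alpha and the given power."""
    p1 = baseline                                   # control conversion rate
    p2 = baseline * (1 + relative_mde)              # variant rate if the MDE is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                           # average rate of the two variants
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil(spread ** 2 / (p2 - p1) ** 2)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.03, relative_mde=0.20)
    print(f"{n:,} visitors per variant, {2 * n:,} in total")
```

With a 3% baseline and a 20% relative MDE this comes out at roughly 13,900 visitors per variant. Because the absolute difference appears squared in the denominator, halving the MDE roughly quadruples the required sample.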
Why Stopping Tests Early Destroys Statistical Validity
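The most dangerous A/B testing behavior, and one that is common among marketers without statistical training, is "peeking": checking results before the required sample size is reached and stopping the test as soon as a variant appears to be winning. Every look is another chance for random noise to cross the significance threshold, so peeking inflates the false positive rate from the nominal 5% to roughly 20–40%, depending on how often you check (and higher still with continuous monitoring). In other words, when the two variants are actually identical, a test stopped at the first significant-looking result can declare a "winner" 20–40% of the time instead of 5% of the time.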
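The inflation is easy to reproduce with an A/A simulation, in which both arms share the same true conversion rate, so every declared "winner" is a false positive. This sketch assumes a 5% conversion rate, 2,000 visitors per arm, and 20 evenly spaced peeks (all hypothetical parameters), and compares stopping at the first p < 0.05 against a single evaluation at the planned sample size.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0                      # no conversions at all: no evidence
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_tests: int = 500, rate: float = 0.05,
                      visitors_per_arm: int = 2000, peeks: int = 20,
                      alpha: float = 0.05, seed: int = 42) -> tuple:
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    batch = visitors_per_arm // peeks
    peeking_hits = 0
    final_only_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        stopped_early = False
        for _ in range(peeks):
            # Both arms draw from the SAME rate: any difference is noise.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if not stopped_early and z_test_p_value(conv_a, n, conv_b, n) < alpha:
                stopped_early = True    # a peeker would stop here and declare a winner
        peeking_hits += stopped_early
        final_only_hits += z_test_p_value(conv_a, n, conv_b, n) < alpha
    return peeking_hits / n_tests, final_only_hits / n_tests


if __name__ == "__main__":
    peek_rate, final_rate = simulate_aa_tests()
    print(f"Stop at first significant peek: {peek_rate:.1%} false positives")
    print(f"Single look at planned sample:  {final_rate:.1%} false positives")
```

The single-look rate should land near the nominal 5%, while the peeking rate comes out several times higher; the exact figure depends on the seed and on how many peeks you allow.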
Real World Scenario
Theory is necessary but not sufficient. Most marketers need duration estimates anchored in their own traffic volumes and conversion rates: the four inputs fix the required sample size, so duration comes down to how many days of eligible traffic it takes to collect that sample, subject to the 14-day minimum.
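Converting a sample-size requirement into calendar time is simple arithmetic. The sketch below assumes traffic is split evenly across variants and that `daily_visitors` counts only visitors eligible for the test; as a design choice not stated in the text, it also rounds up to whole weeks so every weekday appears equally often. The function name and all figures are hypothetical.

```python
from math import ceil


def estimate_duration_days(sample_per_variant: int, variants: int,
                           daily_visitors: int, min_days: int = 14) -> int:
    """Calendar days needed to collect the full pre-calculated sample.

    Assumes an even traffic split and that daily_visitors counts only
    visitors who actually enter the test.
    """
    total_needed = sample_per_variant * variants
    days_for_sample = ceil(total_needed / daily_visitors)
    days = max(days_for_sample, min_days)   # never shorter than two weekly cycles
    return ceil(days / 7) * 7               # round up to whole weeks


if __name__ == "__main__":
    print(estimate_duration_days(10_000, variants=2, daily_visitors=1_000))
```

For example, 10,000 visitors per variant across two variants at 1,000 eligible visitors a day needs 20 days of traffic, which rounds up to 21 days. At 5,000 visitors a day the sample would be collected in 4 days, but the 14-day floor still applies.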
3 Rules for Running Valid A/B Tests
Calculate Sample Size Before You Launch, Not After
The only valid way to set test duration is to pre-calculate the required sample size based on your MDE, power, and confidence level before the test goes live. Calculating sample size after observing data ("we got 500 visitors and saw a winning result, is that enough?") is a post-hoc rationalization that inflates false positive rates. Commit to the sample size before launch — and treat any result observed before that threshold as preliminary data only.
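One lightweight way to enforce that commitment is to record the plan before launch in an immutable object and gate any readout on it. This is a hypothetical sketch: `ExperimentPlan` and `ready_to_evaluate` are illustrative names, not part of any testing tool's API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExperimentPlan:
    """Decisions fixed before launch. frozen=True makes attribute assignment
    raise an error, so the sample size cannot be quietly edited mid-test."""
    name: str
    required_per_variant: int   # from a sample size calculation done pre-launch
    alpha: float = 0.05
    power: float = 0.80

    def ready_to_evaluate(self, visitors_per_variant: Sequence[int]) -> bool:
        """True only once every variant has reached the committed sample size."""
        return min(visitors_per_variant) >= self.required_per_variant


if __name__ == "__main__":
    plan = ExperimentPlan(name="checkout-cta", required_per_variant=10_000)
    if not plan.ready_to_evaluate([500, 512]):
        print("Preliminary data only: keep the test running.")
```

Checking the smallest variant, rather than the total, prevents an uneven traffic split from triggering an early readout.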
Set Your MDE Based on Business Impact, Not Wishful Thinking
Before entering an MDE into a sample size calculator, ask: "If this change produces exactly this effect, would we implement it?" A 2% relative uplift on a checkout page converting 1,000 transactions per month generates 20 additional sales. If your average order value is $150, that is $3,000 in additional monthly revenue. Is that worth the engineering and design resources spent implementing the change? If not, set your MDE higher — and accept that you may not detect smaller effects that are operationally too small to matter.
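The arithmetic above generalizes into a quick check you can run before choosing an MDE. This is a hypothetical sketch: it values the lift in revenue rather than margin, assumes the lift persists, and uses an illustrative `payback_months` threshold that you would replace with your own decision rule.

```python
def monthly_value_of_lift(monthly_conversions: float, relative_lift: float,
                          average_order_value: float) -> float:
    """Extra monthly revenue if conversions rise by exactly relative_lift."""
    extra_sales = monthly_conversions * relative_lift
    return extra_sales * average_order_value


def smallest_worthwhile_mde(monthly_conversions: float, average_order_value: float,
                            implementation_cost: float, payback_months: int = 12) -> float:
    """Relative lift whose extra revenue repays implementation_cost within
    payback_months (a hypothetical decision rule; revenue, not margin)."""
    required_monthly_revenue = implementation_cost / payback_months
    return required_monthly_revenue / (monthly_conversions * average_order_value)


if __name__ == "__main__":
    print(monthly_value_of_lift(1_000, 0.02, 150))       # extra revenue at a 2% lift
    print(smallest_worthwhile_mde(1_000, 150, 60_000))   # hypothetical $60,000 cost
```

With the figures from the example (1,000 monthly transactions, $150 average order value), a 2% lift is worth $3,000 a month. If the change cost a hypothetical $60,000 and had to pay back within 12 months, the smallest lift worth detecting would be about 3.3%, and that is the MDE to enter into the calculator.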
Run One Test at a Time on the Same Page or Flow
Running simultaneous tests on the same page (for example, testing the headline and the CTA button at the same time as separate A/B tests) can contaminate both experiments. Users who fall into both tests see combinations of changes, and if those changes interact (a new headline that only works alongside a new button, say), the interaction is blended into each test's result, so a conversion change cannot be cleanly attributed to either element. If you need to test multiple elements simultaneously, use a full-factorial multivariate test, or test sequentially, one change at a time.
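A 2×2 full factorial runs both changes as one experiment: each user is assigned to exactly one of four headline/CTA combinations, which lets you estimate each element's effect and their interaction separately. The sketch below uses hypothetical variant names and hash-based bucketing (a common technique, not any specific tool's API); the interaction is computed as a difference of differences on the conversion-rate scale.

```python
import hashlib
from itertools import product

# Hypothetical variant names for the two elements under test.
HEADLINES = ("control_headline", "new_headline")
CTAS = ("control_cta", "new_cta")
# Every combination of the two factors: four cells in a 2x2 full factorial.
CELLS = list(product(HEADLINES, CTAS))


def assign_cell(user_id: str, experiment: str = "headline_x_cta") -> tuple:
    """Deterministically bucket a user into one of the four cells."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]


def factorial_effects(rates: dict) -> tuple:
    """Main effects and interaction from per-cell conversion rates.

    rates maps (headline, cta) -> observed conversion rate.
    """
    (h0, h1), (c0, c1) = HEADLINES, CTAS
    # Average change from the new headline across both CTA versions.
    headline_effect = ((rates[h1, c0] - rates[h0, c0]) + (rates[h1, c1] - rates[h0, c1])) / 2
    # Average change from the new CTA across both headline versions.
    cta_effect = ((rates[h0, c1] - rates[h0, c0]) + (rates[h1, c1] - rates[h1, c0])) / 2
    # Interaction: how much the CTA's effect depends on which headline is shown.
    interaction = (rates[h1, c1] - rates[h1, c0]) - (rates[h0, c1] - rates[h0, c0])
    return headline_effect, cta_effect, interaction
```

For instance, if the four cells convert at 3.0%, 3.0%, 3.0%, and 4.0% (hypothetical rates, with 4.0% for the new headline plus new CTA), `factorial_effects` reports main effects of 0.5 percentage points each and an interaction of 1 point: neither change helps on its own, and the lift appears only when both ship together.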
Disclaimer: This content is for educational purposes only.