This is one of the most common questions in A/B testing. There's no single right answer, because it depends on your business cycles, your customers, your traffic, and more.
Ideally, an experiment runs until it is conclusive and statistically significant. But if you reach statistical significance in a day, should you stop the experiment? In this article, we provide general tips for determining when to stop an experiment, but ultimately you should decide what makes the most sense for your business and goals.
General recommendations
A common rule of thumb is that experiments should run for 2 weeks; we recommend a minimum of 7 days. Customers often behave differently on different days of the week, and running for at least a full week helps smooth out those day-of-week effects. A week is also usually long enough to accumulate enough conversions to reach a statistically significant result and to be representative of your complete audience. However, this is a very generic rule of thumb, and you should always consider your unique situation when determining experiment length, as in the sketch below.
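To gauge whether a week of traffic is likely to be enough for your own numbers, you can estimate the visitors needed per variant with the standard normal-approximation sample-size formula for comparing two conversion rates. The sketch below is a minimal example; the baseline rate, minimum detectable effect, and daily traffic figures are hypothetical placeholders, not recommendations.

```python
# Rough sample-size estimate for a two-proportion A/B test using the
# standard normal-approximation formula. The baseline rate, minimum
# detectable effect, and daily traffic below are hypothetical placeholders.
from scipy.stats import norm

def sample_size_per_variant(baseline_rate, mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect an `mde` lift."""
    p1 = baseline_rate
    p2 = baseline_rate + mde
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = norm.ppf(power)           # desired statistical power
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p2 - p1) ** 2

n = sample_size_per_variant(baseline_rate=0.05, mde=0.01)  # detect 5% -> 6%
daily_visitors_per_variant = 1_000                          # hypothetical traffic
print(f"~{n:,.0f} visitors per variant, "
      f"roughly {n / daily_visitors_per_variant:.0f} days at this traffic level")
```

With these made-up figures, the test needs roughly a week to a week and a half per variant; with lower traffic or a smaller expected lift, the required duration grows quickly.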
Ending a test too early
A common pattern in A/B testing graphs is two converging lines: the variants initially showed a significant difference, but as time went on and the sample size grew, the improvement became negligible. If you had stopped the test on day 3 or 4, you would have moved forward with a variant that may not actually have the impact you thought. In effect, you're letting premature data make your decisions, which muddies your insights and your understanding of user behavior over time, and it can lead to unnecessary development time and energy.
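To see how this plays out, consider a two-proportion z-test run on a day-3 snapshot versus a day-14 snapshot of the same hypothetical experiment. The counts below are invented for illustration: with a small sample, a modest difference can look significant, but with more data the measured lift shrinks and the result is no longer significant.

```python
# Two-sided z-test on hypothetical day-3 vs. day-14 snapshots of the same
# experiment. All visitor and conversion counts are invented for illustration.
from scipy.stats import norm

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - norm.cdf(abs(z)))

# Day 3: small sample, the variant looks like a clear winner (p ~ 0.01).
print(two_proportion_p_value(conv_a=40, n_a=600, conv_b=65, n_b=600))
# Day 14: more data, the lines converge and the lift is negligible (p ~ 0.45).
print(two_proportion_p_value(conv_a=210, n_a=3000, conv_b=225, n_b=3000))
```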
Statistical confidence as an indicator
If your experiment has reached confidence and 7+ days have passed, that's usually a good indicator that you can stop the experiment. But if you haven't reached confidence after a week or two, should you keep waiting indefinitely? In short, no.
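One way to encode the "confidence reached and at least 7 days elapsed" rule is a simple check like the sketch below. The 95% confidence threshold and the minimum-day count are assumptions; tune them to your own traffic and risk tolerance.

```python
# A simple stopping check: stop only once the test is statistically
# confident AND has run long enough to cover a full week of traffic.
# The thresholds here are assumptions -- adjust them to your own needs.

def ready_to_stop(days_running, confidence, min_days=7, confidence_threshold=0.95):
    return days_running >= min_days and confidence >= confidence_threshold

print(ready_to_stop(days_running=4, confidence=0.97))   # False: significant, but too early
print(ready_to_stop(days_running=10, confidence=0.96))  # True: confident and past a full week
```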
It is common for experiments to be inconclusive, meaning they will never reach statistical significance because the difference in conversion rates between the variants is negligible. We recommend looking at the graph to see the pattern: is your test slowly converging over time with confidence shrinking, or are you nearing confidence by the day and just a few conversions away from a decision? When to stop an experiment ultimately comes down to a judgment call based on your traffic, business goals, risk tolerance, and so on. Learn more about why your test may not reach confidence.