If your experiment is struggling to reach statistical significance, there are several potential culprits. As a reminder, statistical significance is a measure of how unusual your results would be if there were no real difference between your variation and control (i.e., the lift was due to random chance). Unfortunately, the reality of A/B testing is that the majority of your experiments will be inconclusive, so don't be discouraged. Keep reading to learn some possible causes and tips to help you reach conclusive results more often.
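To make that idea concrete, here's a minimal sketch (in Python, using SciPy and made-up visitor and conversion counts) of the kind of calculation behind significance: a two-proportion z-test comparing a control against a variation. Your testing platform runs its own statistical model, so treat this purely as an illustration of the concept rather than a description of how your tool computes it.

```python
# A minimal sketch of the statistics behind "significance" for a conversion-rate
# test, using a two-proportion z-test. The visitor and conversion counts are
# made up for illustration; your testing platform's exact model may differ.
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z-statistic and two-sided p-value for control vs. variation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error under "no difference"
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))                     # two-sided p-value
    return z, p_value

# Control: 10,000 visitors, 500 conversions (5.0%)
# Variation: 10,000 visitors, 570 conversions (5.7%)
z, p = two_proportion_z_test(500, 10_000, 570, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")   # a p-value below 0.05 corresponds to the usual 95% threshold
```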
Changes are too small
Sometimes small changes make an enormous impact. Other times, these tweaks simply aren't substantial enough to change user behavior.
A small change may still produce a lift, but the smaller the lift, the longer it takes to reach confidence, because the confidence calculation measures how likely it is that the lift is not due to random chance. It's also possible that there is no real difference at all, in which case your test may never reach statistical significance. You can read more on determining when to stop an experiment here.
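To see why a small lift slows things down, here's a rough sketch of a standard sample-size approximation for a two-proportion test. The 5% baseline conversion rate, the relative lifts, and the 80% power are illustrative assumptions chosen only to show the trend, not recommendations.

```python
# A rough sketch of why small lifts take longer: the visitors needed per variant
# grow roughly with the inverse square of the expected lift. The baseline rate,
# lifts, and power below are illustrative assumptions, not benchmarks.
from scipy.stats import norm

def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)        # e.g. 1.96 for a 95% threshold
    z_beta = norm.ppf(power)                 # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

for lift in (0.10, 0.05, 0.02):              # 10%, 5%, 2% relative lifts
    n = visitors_per_variant(baseline=0.05, relative_lift=lift)
    print(f"{lift:.0%} lift -> ~{n:,.0f} visitors per variant")
```

Halving the expected lift roughly quadruples the traffic you need, which is why tiny tweaks can stall indefinitely on a page that doesn't get much traffic.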
Our best advice to avoid this problem is to focus on hypotheses that are expected to really move the needle. The changes themselves can be small or large, but they should plausibly affect your website visitors' behavior. For example, changing the copy of your homepage hero headline or a primary CTA button may be a small update with a significant impact, whereas adjusting the headline of a section buried further down the page may not deliver the lift you're hoping for.
Not enough conversions
Another common problem is that you aren't getting enough conversions to confidently determine whether there is a statistical difference between variants. This could be because you have a low-traffic website, or because you are running a test that is only triggered for a small audience.
In these instances, it could take a very long time to reach statistical significance, especially if the change is only having a small impact.
For low-traffic websites, if you are consistently struggling to reach statistical significance, you can lower your threshold to 90% rather than the standard 95%. This increases the chance of a Type I error (a "false positive"), but it lets you keep experimenting at a faster pace. And even if a "winning" change turns out not to be statistically impactful, it is unlikely to be actively hurting your results.
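Here's a rough illustration, using the same sample-size approximation as above, of how much the required traffic drops when you move from a 95% to a 90% threshold. The 5% baseline, 10% relative lift, and 80% power are again made-up numbers used only to show the tradeoff.

```python
# A small sketch of the tradeoff in lowering the threshold from 95% to 90%:
# the required sample per variant drops, at the cost of a higher false-positive
# rate. Baseline rate, lift, and power are illustrative assumptions.
from scipy.stats import norm

p1, p2, power = 0.05, 0.055, 0.80            # 5% baseline, 10% relative lift
variance = p1 * (1 - p1) + p2 * (1 - p2)

def n_per_variant(alpha):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 * variance / (p2 - p1) ** 2

n_95, n_90 = n_per_variant(0.05), n_per_variant(0.10)
print(f"95% threshold: ~{n_95:,.0f} visitors per variant")
print(f"90% threshold: ~{n_90:,.0f} visitors per variant ({1 - n_90 / n_95:.0%} fewer)")
```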
We also recommend setting your experiments up to succeed by testing a large enough audience. Remember, you can always retroactively filter your results on the reports page to see whether a specific device type, browser, location, etc. is being impacted differently. So try not to over-prune your audience to reach statistical significance faster, especially if you're testing on an already low-traffic page.
Too many variables
This is really a special case of too few conversions, but it's common enough to call out separately. Since your traffic is divided among your variants, the more variants you add, the fewer conversions each one receives. Although it may be tempting to add lots of variants to test many options simultaneously, this is often not the most advantageous approach.
Depending on your website traffic, we recommend limiting the number of variants tested in a single experiment so that each receives enough conversions to provide meaningful results. There's no exact rule for this, but if your tests aren't reaching confidence and you're running six variants in each experiment, try scaling back to two or three and see if that helps, as the quick math below illustrates.
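As a rough back-of-the-envelope sketch (the monthly traffic figure and the required sample per variant below are made-up numbers, the latter borrowed from the earlier illustrative estimate), here's how splitting the same traffic across more variants stretches the timeline:

```python
# A quick back-of-the-envelope sketch of how adding variants stretches the
# timeline: traffic per variant shrinks, so reaching the required sample takes
# proportionally longer. Both inputs below are illustrative assumptions.
monthly_visitors = 20_000
required_per_variant = 31_000          # e.g. from a sample-size estimate like the one above

for num_variants in (2, 3, 6):         # total experiment arms, including the control
    per_variant_per_month = monthly_visitors / num_variants
    months = required_per_variant / per_variant_per_month
    print(f"{num_variants} variants: ~{per_variant_per_month:,.0f} visitors/variant/month, "
          f"~{months:.1f} months to reach the target sample")
```

Going from two variants to six roughly triples the time to reach the same sample per variant, which is usually a worse trade than running two or three sharper tests back to back.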