Statistical confidence is the probability that the results of your test are not due to chance. In a hypothesis test, we are trying to determine whether the result observed in our sample would hold for the entire population; we want the result to come from the change made in the variant, not from random variation. Results should reach statistical significance before you accept or reject the change. Generally, a significance level of 5% or lower is accepted, meaning there is at most a 5% chance that the observed results are due to chance; this corresponds to a confidence level of 95% or higher.
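Compose runs this check for you, but as a rough illustration of what a significance test does, here is a minimal sketch of a two-proportion z-test in Python. The visitor and conversion counts are made-up example numbers, not anything Compose computes:

```python
from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error of the difference
    z = (p_b - p_a) / se                                     # z-statistic
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical counts: control 500/10,000 conversions, variant 580/10,000.
p = two_proportion_p_value(500, 10_000, 580, 10_000)
print(f"p-value: {p:.4f}")             # ~0.012 for these numbers
print("significant at 5%:", p < 0.05)  # True: below the 0.05 threshold
```

A p-value below 0.05 is exactly the "5% or lower" significance threshold described above: the result would arise by chance in fewer than 5 of 100 identical experiments with no real difference.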
In Compose, a confidence level of 90%, 95%, or 99% can be chosen. We recommend at least 95%, but you can choose 99% for a stricter test, or 90% if you are fairly confident in a positive result and want a shorter test. The stricter the test, the more statistically confident you are that the results from your sample would hold true for the entire population. However, a 99% confidence test requires more traffic and will likely take longer to run.
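To see why a stricter confidence level needs more traffic, here is a sketch of a standard sample-size estimate for a two-proportion test. The 5% baseline rate, 5% minimum detectable lift, and 80% power are illustrative assumptions, not values Compose uses:

```python
from statistics import NormalDist

def sample_size_per_variant(base_rate, lift, confidence, power=0.80):
    """Approximate visitors needed per variant for a two-sided test."""
    nd = NormalDist()
    alpha = 1 - confidence
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # critical value for the confidence level
    z_beta = nd.inv_cdf(power)            # critical value for the desired power
    p1 = base_rate
    p2 = base_rate * (1 + lift)           # variant rate after the relative lift
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ((z_alpha + z_beta) ** 2) * variance / (p2 - p1) ** 2

# 5% baseline conversion rate, aiming to detect a 5% relative lift.
for conf in (0.90, 0.95, 0.99):
    n = sample_size_per_variant(0.05, 0.05, conf)
    print(f"{conf:.0%} confidence: ~{n:,.0f} visitors per variant")
```

With these assumptions the required traffic climbs from roughly 96,000 visitors per variant at 90% confidence to roughly 182,000 at 99%, which is why the stricter test takes longer to run.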
For example, suppose we run an A/B test with one variant and a goal of increasing revenue conversions, and the variant reaches 95% confidence with a 5% lift in revenue conversions. This means we are 95% confident that if all visitors were directed to the variant, we would see a real lift in conversions, with the observed 5% as our best estimate of its size. If we were to repeat this experiment 100 times, we would expect roughly 5 of the 100 samples to show a misleading result purely by chance.
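One way to make that repeated-experiment reading concrete is the confidence interval around the observed lift. The sketch below computes such an interval for hypothetical counts chosen to match the example (a 5% relative lift that reaches significance); it is an illustration, not Compose's exact method:

```python
from math import sqrt
from statistics import NormalDist

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Confidence interval for the relative lift of the variant (b) over control (a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error of the difference in conversion rates.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    low, high = (p_b - p_a) - z * se, (p_b - p_a) + z * se
    # Express the absolute bounds as relative lift over the control rate.
    return low / p_a, high / p_a

# Hypothetical counts: control 10,000/200,000, variant 10,500/200,000 (a 5% relative lift).
lo, hi = lift_confidence_interval(10_000, 200_000, 10_500, 200_000)
print(f"95% CI for relative lift: {lo:+.1%} to {hi:+.1%}")  # ~+2.3% to +7.7%
```

If this experiment were repeated 100 times, roughly 95 of the intervals built this way would contain the true lift; the remaining 5 or so would miss it by chance.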
Tests not reaching statistical significance? Read more on why, and get tips for designing conclusive experiments.