At Compose, we're committed to providing you with robust tools for A/B testing that offer accurate insights into your experiments. One crucial aspect of conducting A/B tests is understanding how statistical calculations are performed. In this article, we'll delve into the methodology behind our statistical calculations, focusing on the Frequentist approach we employ.
What is the Frequentist approach?
The Frequentist approach to statistics is based on the idea of repeated sampling. It emphasizes the long-run frequency or probability of events occurring under certain conditions. In the context of A/B testing, this means making inferences about population parameters (such as conversion rates) based on sample data, assuming that the same experiment could be repeated infinitely.
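The repeated-sampling idea can be illustrated with a short simulation. This is a minimal sketch, not part of Compose's actual calculations: it assumes a hypothetical "true" conversion rate of 10% and shows that observed rates across many repeated experiments cluster around that long-run frequency.

```python
import random

# Hypothetical true conversion rate and experiment setup (illustrative only).
TRUE_RATE = 0.10
SAMPLE_SIZE = 1_000
NUM_EXPERIMENTS = 500

random.seed(42)

def run_experiment() -> float:
    """Simulate one experiment: the observed conversion rate of SAMPLE_SIZE visitors."""
    conversions = sum(random.random() < TRUE_RATE for _ in range(SAMPLE_SIZE))
    return conversions / SAMPLE_SIZE

# Each individual experiment gives a slightly different estimate,
# but their long-run average converges on the true rate.
rates = [run_experiment() for _ in range(NUM_EXPERIMENTS)]
average_rate = sum(rates) / len(rates)
print(f"Average observed rate over {NUM_EXPERIMENTS} experiments: {average_rate:.3f}")
```

Any single experiment may land a little above or below 10%, which is exactly why Frequentist inference reasons about what would happen if the experiment were repeated many times.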
Benefits of the Frequentist approach
A/B testing tools use either a Frequentist or a Bayesian approach for their statistical calculations. Each approach has pros and cons, and we plan to offer both options on the platform later this year. Here are a few reasons why we started with the Frequentist approach at Compose:
Objectivity: Frequentist methods are often perceived as more objective because they rely solely on the observed data and do not incorporate subjective prior beliefs.
Widely Accepted: Frequentist statistics have been the traditional framework for hypothesis testing and are widely taught and used in various fields. Many standard statistical tests and procedures are based on the Frequentist approach.
Clarity in Interpretation: Hypothesis tests in the Frequentist framework provide clear, binary outcomes (reject or fail to reject the null hypothesis), which can be easier to interpret for decision-making purposes.
Less Subjective: Frequentist methods do not require specification of prior distributions, making them less dependent on subjective judgments.
Key concepts in Frequentist statistics
Null Hypothesis (H0): This is the default assumption that there is no difference between the control and experimental groups.
Alternative Hypothesis (H1): This is the assertion that there is a difference between the groups.
Test Statistic: A test statistic is a numerical summary of sample data that is used to assess the truth of the null hypothesis. It provides a basis for making decisions about the population parameter.
p-value: The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data, assuming that the null hypothesis is true. A lower p-value suggests stronger evidence against the null hypothesis.
Significance Level: Denoted by α, the significance level is the threshold below which the p-value must fall for you to reject the null hypothesis. Commonly used significance levels are 0.05 and 0.01, corresponding to a 5% or 1% chance, respectively, of falsely rejecting the null hypothesis. In Compose, you designate the significance level by setting the confidence level in the goals section when creating your experiment (for example, a 95% confidence level corresponds to α = 0.05).
Confidence Interval: A confidence interval is a range of values constructed around a sample estimate of a population parameter. It provides a measure of the uncertainty associated with the estimate. For example, a 95% confidence interval indicates that if the experiment were repeated many times, 95% of the intervals would contain the true population parameter.
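The concepts above can be tied together in a short sketch. The function below runs a two-proportion z-test, a standard Frequentist test for comparing two conversion rates (this is an illustration of the general technique, not Compose's exact implementation). It computes a test statistic, its two-sided p-value, and a confidence interval for the difference between the rates, using only the standard library. The conversion counts are hypothetical.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function (no external libraries needed)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for the difference between two conversion rates.

    Returns the z test statistic, the p-value, and a (1 - alpha)
    confidence interval for the difference p_b - p_a.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under H0 (the two groups share one conversion rate).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    # p-value: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - normal_cdf(abs(z)))
    # Unpooled standard error for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = 1.96 if alpha == 0.05 else 2.576  # critical values for the two common levels
    diff = p_b - p_a
    return z, p_value, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical data: 100/1000 conversions (control) vs 130/1000 (variant).
z, p_value, ci = two_proportion_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

With these hypothetical numbers the p-value falls below α = 0.05 and the confidence interval excludes zero, so both views of the result point to the same conclusion: reject the null hypothesis.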
Example calculation
Suppose you are conducting an A/B test to compare the conversion rates of two website designs. After collecting data and calculating a t-statistic of 2.36, you find a corresponding p-value of 0.021. With a significance level of 0.05, since the p-value is less than α, you reject the null hypothesis and conclude that there is a statistically significant difference in conversion rates between the two designs.
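The decision rule in this example reduces to a single comparison, sketched below using the numbers from the scenario above:

```python
# Numbers from the example: a t-statistic of 2.36 yields p = 0.021.
p_value = 0.021
alpha = 0.05  # significance level chosen before running the experiment

# Frequentist decision rule: reject H0 when the p-value falls below alpha.
if p_value < alpha:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print(f"p = {p_value} vs alpha = {alpha}: {decision}")
```

Note that the significance level should be fixed before the experiment runs; choosing α after seeing the p-value undermines the guarantees the Frequentist framework provides.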