A two-tailed z-test is a statistical hypothesis test that checks whether two samples differ, in either direction. Other tests look for a specific direction, such as an increase or a decrease between two groups, while a two-tailed z-test determines whether there is any significant difference between the two groups. The test consists of a null hypothesis, which is assumed to be true, and an alternative hypothesis, which the test looks for evidence to support.
In Compose, we use this test to check whether there is a difference between your original (baseline) and each of your variants. Each variant gets its own two-tailed z-test, which tells us whether it differs significantly from the baseline. We use a two-tailed z-test because we want to detect a difference in either direction, not just a positive or a negative one. The growth % shown in the experiment tells us whether the difference is positive or negative, provided the results are statistically significant.
The confidence level chosen for the experiment dictates the significance level for the z-test. A 95% confidence level means a 5% significance level is used. This means that if the variant truly made no difference, there would be a 5% chance of the results appearing significant purely by chance. Read more about confidence here.
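To make the link between confidence and significance concrete, here is a minimal Python sketch. It assumes the standard normal distribution and an even split of the significance level between the two tails. The function name critical_z is hypothetical and is not part of Compose.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-tailed critical z value for a confidence level such as 0.95."""
    alpha = 1.0 - confidence  # significance level, e.g. 0.05
    # Two-tailed: half of alpha sits in each tail of the distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

print(round(critical_z(0.95), 2))  # 1.96
```

Under these assumptions, a z statistic further from zero than this critical value, in either direction, would count as statistically significant.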
Example:
We want to know whether or not a green checkout button will bring more revenue conversions than the original blue button.
The null hypothesis is: Green Button Revenue Conversions = Blue Button Revenue Conversions
The alternative hypothesis is: Green Button Revenue Conversions ≠ Blue Button Revenue Conversions
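As a rough illustration of how these hypotheses could be tested, the sketch below runs a pooled two-proportion z-test on conversion rates. This is an assumption made for the example: the exact calculation Compose performs is not described here, the function name two_tailed_z_test is hypothetical, and the visitor and conversion counts are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple:
    """Return (z, p_value) for H0: rate_a == rate_b, using a pooled standard error."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-tailed: count extreme results in both directions.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: blue button (baseline) vs. green button (variant).
z, p = two_tailed_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the p-value falls below 0.05, so at 95% confidence the null hypothesis would be rejected.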
A statistically significant result means we are confident that the variant truly differs from the baseline, rather than the gap being due to chance. We would reject the null hypothesis and accept the alternative. This is used in conjunction with the growth calculation to determine whether or not we should make the change in the variant. If the growth is positive and the results are statistically significant, then we should make the change. If the growth is negative and the results are statistically significant, then we should not make the change. If the confidence never reaches the chosen level, then the results are inconclusive.
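The decision rule above can be sketched as a small Python function. The name decide, its parameters, and the returned strings are hypothetical and only illustrate the three outcomes. The sketch assumes the test result is available as a p-value and that significance means the p-value is below 1 minus the confidence level.

```python
def decide(growth_pct: float, p_value: float, confidence: float = 0.95) -> str:
    """Combine growth direction with significance into one of three outcomes."""
    alpha = 1.0 - confidence  # significance level implied by the confidence
    if p_value >= alpha:
        # Not statistically significant: the growth figure cannot be trusted.
        return "inconclusive"
    return "make the change" if growth_pct > 0 else "do not make the change"

# Hypothetical inputs for illustration only.
print(decide(growth_pct=25.0, p_value=0.016))
print(decide(growth_pct=-10.0, p_value=0.010))
print(decide(growth_pct=25.0, p_value=0.200))
```

The significance check comes first because the sign of the growth only carries meaning once the difference is statistically significant.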