A/B testing is a powerful tool because it allows you to make data-driven decisions and optimize your website to increase conversions or achieve other business goals. By testing different variations of your webpages, you can learn what works best for your audience and make changes that improve your website’s performance.
Let’s say you have a website and you’re not sure whether the color of your “Start Now” button should be orange, green, or red. You could create three versions of your webpage, one with an orange button, another with a green button, and the other with a red button, and then show each version to a random group of visitors.
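If you were to implement such a split yourself, it could look something like the following minimal Python sketch. The visitor ID and the helper name are hypothetical, and in practice an A/B testing tool takes care of this assignment for you:

```python
# Minimal sketch: deterministically assign each visitor to one of the
# three button-color variants. Hashing the visitor ID keeps the split
# random across visitors but stable for any given visitor.
import hashlib

VARIANTS = ["orange", "green", "red"]  # original color plus two variants

def assign_variant(visitor_id: str) -> str:
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(VARIANTS)
    return VARIANTS[bucket]

print(assign_variant("visitor-42"))  # e.g. "green"
```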
By tracking visitor behavior, such as how many people clicked on the button or made a purchase, you can see which version of the webpage performed better.

The graph above shows that variant B achieves a conversion improvement of almost 135% over the original version, while variant C achieves an improvement of more than 60%.
It’s tempting to simply declare the version with the highest conversion rate the winner of the A/B test and make it the final version of your website, isn’t it? But wait, don’t be too hasty! It’s important to consider statistical significance before making any decisions.
Statistical Significance
Statistical significance refers to the likelihood that a result is not due to chance. In A/B testing, it means that the difference in conversion rates between the two versions of the webpage is not just a fluke or coincidence.
In statistics, this “likelihood” is quantified by the p-value, a number between 0 and 1. The p-value tells us the strength of the evidence against a null hypothesis (that is, how likely it is that results like the ones we collected would occur by chance alone). There are several tests to calculate the p-value, including the t-test, Chi-square test, G-test, ANOVA, and regression analysis, among others.
In A/B testing, the null hypothesis is that there is no difference between the variants and the original version. In our example, after collecting the page visits and the clicks on the button for each variant, we calculate the p-value.
If the p-value is low (say, below the commonly used threshold of 0.05), this suggests that the observed results are unlikely to have occurred by chance. You can reject the null hypothesis (i.e. that there is no difference between the original page and the variants created) in favor of the alternative hypothesis (i.e. that different button colors do indeed have an impact on conversion).
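As an illustration, here is a rough Python sketch of how such a p-value could be computed for our button example using a Chi-square test. The visit and click counts are made up, chosen only to roughly match the improvements mentioned above:

```python
# Sketch: Chi-square test of independence on made-up A/B test counts.
# Rows: original, variant B, variant C. Columns: clicked, did not click.
from scipy.stats import chi2_contingency

visits = {"original": 1000, "variant_b": 1000, "variant_c": 1000}
clicks = {"original": 40, "variant_b": 94, "variant_c": 65}

table = [
    [clicks[v], visits[v] - clicks[v]]
    for v in ("original", "variant_b", "variant_c")
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p-value: {p_value:.4f}  (confidence: {(1 - p_value) * 100:.1f}%)")

if p_value < 0.05:
    print("Statistically significant: reject the null hypothesis.")
else:
    print("Not significant: the difference may be due to chance.")
```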
In general, the p-value is a way to help you decide whether your test results are statistically significant and whether you can draw reliable conclusions from your data. So, in any A/B test you perform on your website, never make a final decision without verifying that your data is statistically significant first.

The results you get with Nelio A/B Testing always show the confidence level of the observed results. If the confidence shown is higher than 95% (that is, a p-value of less than 0.05), you can be reasonably confident that the difference between the original version and its variants is meaningful and not just the result of random variation. However, if the difference is not statistically significant, we cannot be sure that the winning variation is truly better, so either ignore the test result or take it with a grain of salt.
Other Common Mistakes
In addition to statistical significance, there are several common mistakes you should be aware of before jumping to conclusions when A/B testing.

Lacking a clear hypothesis
Let’s go back to our simple example. To increase sales, we could just decide to run an A/B test to see if changing the color of the “Start Now” button on our website “will have any effect.” This is not a clear hypothesis.
Before you start a test, you need a clear hypothesis. In other words, you need to know what motivates the change and what you expect to happen when you make it. For example, our hypothesis could be something like this: “Since green is associated with going ahead and red with grabbing attention, changing the color of the Start Now button from orange to either green or red will result in a 20% increase in click-through rates and a 10% increase in sales.”
By having a clear hypothesis like this, you know exactly what you’re trying to achieve and which metrics you need to measure to determine whether the test is successful. It also helps you focus your test on the most important changes. Without one, you might be tempted to change not only the button color but also the font size, an image, or the pricing, making it hard to tell which change actually caused the effect.
A clear hypothesis is like having a roadmap for your A/B test. It helps you to stay focused, measure the right metrics, and make informed decisions based on the results.
Testing too many variations
Testing too many variations in an A/B test is like trying to juggle too many balls at once. It can make things more difficult and increase the chances of dropping some.
When you test too many variations, you may end up with results that appear significant but are actually due to chance. That is, you increase the likelihood of what’s called “false positives.”
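A quick back-of-the-envelope calculation illustrates how fast this risk grows if each comparison is (incorrectly) tested in isolation at the usual 5% significance level:

```python
# Probability of at least one false positive (family-wise error rate)
# when each of k independent comparisons is tested at alpha = 0.05.
alpha = 0.05
for k in (1, 3, 5, 10):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:2d} comparisons -> {fwer:.0%} chance of a false positive")
```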
Testing too many variations can also spread your resources too thin, making it harder to detect meaningful differences between them. With so many variations, you are dividing your traffic among them, reducing the statistical power of your test. This means that even if there is a significant difference between your variations, it may be harder to detect.
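To get a feel for the traffic involved, here is a hedged sketch using statsmodels; the baseline and target conversion rates are assumptions chosen purely for illustration:

```python
# Sketch: rough sample size needed *per variation* to detect a lift from
# a 4% to a 6% conversion rate with 80% power at a 5% significance level.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.06, 0.04)
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, ratio=1.0
)
print(f"Visitors needed per variation: {n_per_variation:.0f}")
# The more variations you add, the more total traffic you need
# in order to reach this number in every one of them.
```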
Finally, more variations also mean more time, resources, and complexity: analyzing the results takes longer, and it becomes harder to interpret them and draw meaningful insights.
Testing for too short a time
We said that the statistical significance of an A/B test depends on the amount of data collected and the difference in performance between the variations. You may think that once your test has run long enough to obtain statistically significant results, that is sufficient to draw reliable conclusions… Well, not so fast.
There are other timing-related factors to take into account on your website. Does it have seasonal or cyclical variations in user behavior? During the holiday season, for example, visitors may be more likely to make purchases or to be interested in certain types of products or content. If you run a test only for a short period during such a season, you might observe changes in user behavior that have nothing to do with the change you are testing (unless, of course, the holiday season is exactly what you want to test). By running tests for longer periods, you capture more data and get a more accurate picture of how your changes affect user behavior over time.
In other words, when creating A/B tests it is also important to take time into account, so that your test captures the seasonality of your products and your users’ behavior and leads to accurate, reliable results.
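If you know your site’s traffic, a rough calculation like the following sketch (all numbers are assumptions for illustration) can help you plan a test duration that covers complete weekly cycles:

```python
# Sketch: turn the required sample size into a test duration, rounded up
# to whole weeks so the test covers complete weekly cycles.
import math

daily_visitors = 600           # total traffic entering the test per day
n_variations = 3               # original plus two variants
visitors_per_variation = 1000  # e.g. from the power calculation above

days_needed = visitors_per_variation * n_variations / daily_visitors
weeks = math.ceil(days_needed / 7)
print(f"Run the test for at least {weeks} full week(s) ({weeks * 7} days)")
```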
Conclusion
You should not run A/B tests randomly. Instead, you must first have a clear idea of what it is you want to improve, hypothesize which changes might lead to such an improvement, run the test and wait until you get data that’s representative of your users and business and, above all, do not rush to implement changes if the results are not statistically significant.
With Nelio A/B Testing, you know at all times the confidence level of the results obtained. And not only that: once you have found a winning version, you only need to click the Apply button on that variant to make its changes permanent.

Featured Image by Markus Winkler on Unsplash.