Quick fix! by Sharon

You’ve created your first test and you now see some data on your results. As it turns out, according to these results the alternative page you’re testing is way better than the original one. But… can you trust these results? Is the alternative page really better than the original one? Today, I’d like to talk about what statistical significance is, its importance, and why you should be careful when calling a winner. Or, as people usually call it, “how to tell if an experiment is over or not”.


A few weeks ago my colleague Antonio described how to build and test an optimization plan. There, he explained the importance of following and applying the scientific method in your A/B tests. As we’ve already discussed, A/B Testing is a process that consists of the following steps:

  1. Understand your business. Identify your goals and analyze how you’re performing compared to your competitors.
  2. Understand your web. You’ll probably be using A/B Testing on your website, so take a look at its performance and identify possible improvements.
  3. Form a hypothesis. Address one of the issues you’ve identified in the previous steps. Think how it can be solved and propose a solution.
  4. Create and run your test. Once you know what to test, it’s time to implement it as an A/B test.
  5. Act on the results. Finally, once the results you get are conclusive, it’s time to make the best alternative permanent and move on to the next test.

If you want to run successful A/B tests, you’ll have to follow all the previous steps, paying special attention to the first three. Remember you’re supposed to think before acting, so always take your time to understand your environment and to design new tests.

Once you’ve started your experiment, and assuming you have a site with enough traffic, you’ll start to see the first results of your tests very soon. For instance, you might have the following data after the first 20 minutes:

  • Original Page. 3 conversions per 6 page views (conversion rate: 50%).
  • Alternative Page. 2 conversions per 8 page views (conversion rate: 25%).

In this example, with the data we have right now, it’s quite clear that the original page is better than the alternative, isn’t it? I mean, look at the results: the original’s conversion rate is twice the alternative’s! Let’s stop the experiment, accept the data, and create a new experiment, shall we?

Crash dummy by Clement127

Sample Size and Statistical Significance for Dummies

Not so fast! Sure, the original page has a better conversion rate right now… but look closer: it only had one more conversion than the alternative (3 vs 2) and the total number of page views we had in the experiment is just 14. Your intuition is probably telling you that you should wait a little bit more before claiming a winner; that is, wait until the numbers are bigger and you’ll be more confident of the result. And, luckily for you, your intuition is right!
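To put a number on that intuition, here is a minimal sketch that runs a two-sided Fisher's exact test on the 14 page views above. The choice of test and the function name `fisher_exact_two_sided` are my own assumptions for illustration; this is not necessarily the calculation your testing tool performs.

```python
from math import comb


def fisher_exact_two_sided(conv_a, views_a, conv_b, views_b):
    """Two-sided Fisher's exact test for a 2x2 conversions table.

    Hypothetical helper for illustration only.
    """
    total_views = views_a + views_b
    total_conv = conv_a + conv_b
    denominator = comb(total_views, views_a)

    def prob(k):
        # Probability that variant A gets exactly k of the total conversions
        # if both pages convert equally well (hypergeometric distribution).
        return comb(total_conv, k) * comb(total_views - total_conv, views_a - k) / denominator

    observed = prob(conv_a)
    lowest = max(0, views_a - (total_views - total_conv))
    highest = min(total_conv, views_a)
    # Add up every outcome that is at most as likely as the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


# Original: 3 conversions in 6 views. Alternative: 2 conversions in 8 views.
p_value = fisher_exact_two_sided(3, 6, 2, 8)
print(f"p-value: {p_value:.2f}")  # about 0.58
```

A p-value around 0.58 means a gap this large would be quite ordinary even if both pages converted equally well, so these 14 page views tell us very little.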

With split testing techniques, you start by hypothesizing that a certain variant of a given page (in our example, the landing page) will improve your conversion rate. This hypothesis is just a belief, not a certainty. Therefore, you need to test it and check whether it’s right (and thus the alternative actually converts better than the original version) or wrong (it doesn’t convert better, either because both variants convert equally well or because your alternative actually converts worse).

When you run the test, you start collecting data that will either support or disprove your hypothesis. We’ve just seen that 14 page views are not enough to tell whether we’re right or wrong. What about 100? Or 500? Wouldn’t it be better to wait until we have 1,000 page views? Or a million? Clearly, the bigger your sample is, the more confident you can be about your results.

When you run an A/B test, you don’t want to wait forever; you just want to be “confident enough”. Statistical significance is about confidence. In statistics, statistical significance is attained when the p-value is less than the significance level. For example, if the p-value is less than 5%, a difference at least as large as the one you observed would show up less than 5% of the time if both pages actually converted equally well. In other words, by calling a winner at that threshold you accept a 5% risk of picking a “winner” that isn’t really better; it just won because of pure chance.
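As a rough illustration of comparing a p-value against the significance level, the sketch below uses a pooled two-proportion z-test, which is a common approximation for larger samples. The traffic numbers and the helper name `two_proportion_p_value` are hypothetical, and the test choice is an assumption rather than the method any particular tool uses.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, views_a, conv_b, views_b):
    """Two-sided p-value of a pooled two-proportion z-test (illustrative helper)."""
    rate_a = conv_a / views_a
    rate_b = conv_b / views_b
    # Conversion rate we would expect if both pages performed the same.
    pooled = (conv_a + conv_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        return 1.0  # no variation at all, so nothing to distinguish
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # significance level, chosen before collecting data

# Hypothetical data: 100/1,000 conversions (original) vs 130/1,000 (alternative).
p_value = two_proportion_p_value(100, 1000, 130, 1000)
verdict = "significant" if p_value < ALPHA else "not significant"
print(f"p-value: {p_value:.3f} ({verdict})")
```

The z-test relies on a normal approximation, so it is only reasonable once each variant has a decent number of conversions; it would not be a good fit for the 14 page views in the earlier example.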

Standard normal distribution shading area between −1.96 and 1.96
In a two-tailed test, the rejection region for a significance level of α=0.05 is partitioned to both ends of the sampling distribution and makes up 5% of the area under the curve (white areas). Source: Qwfp CC BY-SA 3.0

In principle, you should wait (at least) until the results are statistically significant before stopping an experiment. But what happens if no alternative is better than the other? In that case the results may never become significant, so you need a stopping rule that doesn’t depend on them. It’s good scientific practice that “a significance level is chosen before data collection” (by default, Nelio uses a 95% confidence level, i.e., p < 0.05) and that the sample size is also fixed beforehand. There are a few calculators on the web that will help you decide which values to use (depending on the uplift you expect, your current conversion rates, and so on). As a rule of thumb, I recommend using a 95% confidence level (that is, a 5% significance level) and setting the sample size to 1,500 or 2,000 page views per test (assuming the test has two alternatives only). Using these numbers and being subscribed to our Basic Plan, you’ll be able to run two to three tests per month.
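For a feel of what such calculators do, here is a minimal sketch based on the standard normal-approximation formula for comparing two proportions. The 80% statistical power default, the example conversion rates, and the function name `sample_size_per_variant` are assumptions of mine, not values from this article or from Nelio.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate page views needed per variant (illustrative sketch).

    alpha: significance level (0.05 matches a 95% confidence level).
    power: chance of detecting the uplift if it is real (assumed 80%).
    """
    if baseline_rate == expected_rate:
        raise ValueError("the expected rate must differ from the baseline rate")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical scenario: a 10% conversion rate that we hope to lift to 13%.
print(sample_size_per_variant(0.10, 0.13))
```

The smaller the uplift you want to detect, the faster the required sample grows, which is why the expected uplift is one of the inputs these calculators ask for.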

It’s Over! Now What?

Once your experiment is over, there are a few things you can do:

  • If your hypothesis was right… Congratulations! You designed a great experiment that helped you improve your site. Moreover, your beliefs were right; it looks like you know a lot about your business and your customers 🙂 But that doesn’t mean you’re done. Consider taking the following steps:
    1. Do you have any other hypotheses you want to test along with the first one? Now’s the time to test them! It’s impossible to get a perfect website, but it’s always possible to get a better one. Keep testing and tweaking your site and seek further improvements.
    2. Can the same hypothesis be applied to other locations? Imagine you’ve tested the call to action label on your landing page. If you have call-to-action buttons elsewhere in your site, why don’t you run the same test on these other pages? If it worked on your landing page, it might also work there!
  • If your hypothesis was wrong… you might think that you’ve lost some precious time. But that’s not true! An unsuccessful test is also a success, because it gives you powerful insights and helps you disprove a wrong belief you had. It prevented you from implementing a change in your marketing strategy that might have ended up being harmful, and such information is always welcome.
    • Review your work. Why did you believe in your hypothesis? Why didn’t it work? Proper A/B Testing is not only about improving your conversions, but also understanding your business better. Even if tests don’t work out as expected, there’s a lesson to learn.

All in All…

A/B Testing is not difficult, but it requires some time. Understand your business, think about hypotheses, and validate them with tests. That’s how you’re going to improve your conversion rate and gain a deeper understanding of your target market. The only important things to remember when running a test are (a) let your tests run until the pre-defined sample size has been reached and (b) only after that look at the confidence of your results.

Never stop thinking about your business, your customers, your competitors, and your site. These four elements are the starting point of your testing strategy.

Featured image by Sharon.
