Why isn’t A/B testing done all the time by everyone? In this article I’ll discuss the A/B test promise, guilty secrets common to all tool providers, and how to get out of A/B testing purgatory.

There was a time, not so long ago, when marketers thought the internet was finally going to answer Wanamaker’s famous quip: “half the money I spend on advertising is wasted; the trouble is I don’t know which half.”

There is no greater exponent of marketing nirvana than A/B testing. But how easy is it, really, to run a successful A/B test?

A/B testing – where are we?

Many companies with a business-critical website already have an A/B test tool installed on it. They tend to fall into two camps:

      a) They’re spinning their wheels in A/B testing purgatory. They run random tests but they’re not seeing real uplift in terms of business growth.

      b) They tried A/B testing but stopped a while ago due to resource and prioritisation constraints.

By now, you’re familiar with the A/B tool promise: “put our code on your website, run an A/B test, increase revenue!”

Is your A/B test tool lying to you?

What A/B test tools don’t tell you

‘Back in the day’ (i.e. five years ago) most companies were using either Visual Website Optimizer or Optimizely. Now you’ve got Convert Experiences, SiteSpect, AB Tasty, Google Optimize and a dozen others. The industry has become increasingly fragmented over the years. It’s a bull market.

All A/B test tools are powerful and easy to set up and use. I tell businesses that the choice of tool isn’t important. What matters is how you use it.

The challenge is twofold: leveraging data and understanding statistics. I discuss the former in my article The biggest barrier to growth according to marketers; this article is concerned with the latter.

All the A/B test tools I’ve come across are guilty of papering over the cracks when it comes to statistics.

Test duration

The statistical problem starts with test duration. How long should the test last? Your A/B test tool doesn’t care! The reason to fix the test duration in advance is to prevent you, the tester, from stopping the test when you feel like it, i.e. when the results look good to you. That matters because results go up and down all the time:

Figure 1: test results fluctuate due to chance. Don’t stop the test at random; pick a point in time when it will end.

Before launching the test, determine when it will end. Don’t determine when to stop during the test.
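
To see why stopping “when the results look good” is risky, here is a minimal simulation sketch. It runs A/A tests (both versions convert at the same rate, so any “winner” is a fluke) and compares two habits: checking significance at every interim look versus checking once at the planned end. The traffic figures, the 5% conversion rate, the number of looks and the simple two-proportion z-test are all illustrative assumptions of mine, not what any particular tool does.

```python
import math
import random


def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no conversions at all yet: nothing to compare
    return (conv_b / n_b - conv_a / n_a) / se


def simulate_aa_tests(n_tests=500, users_per_arm=2000, looks=20,
                      rate=0.05, seed=1):
    """Return (false positive rate when peeking, rate when waiting)."""
    rng = random.Random(seed)
    step = users_per_arm // looks   # users added per arm between looks
    critical = 1.96                 # two-sided threshold for 95% confidence
    fooled_by_peeking = 0
    fooled_at_end = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        significant_at_some_look = False
        z = 0.0
        for look in range(1, looks + 1):
            # Both arms share the same true rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            n = look * step
            z = z_score(conv_a, n, conv_b, n)
            if abs(z) > critical:
                significant_at_some_look = True
        if significant_at_some_look:
            fooled_by_peeking += 1
        if abs(z) > critical:       # z now holds the final look only
            fooled_at_end += 1
    return fooled_by_peeking / n_tests, fooled_at_end / n_tests


if __name__ == "__main__":
    peeking, waiting = simulate_aa_tests()
    print(f"False positives if you stop at any 'significant' look: {peeking:.1%}")
    print(f"False positives if you wait for the planned end:       {waiting:.1%}")
```

Under these assumptions the “peeking” rate should come out well above the “waiting” rate, even though nothing about the page changed. That gap is the cost of deciding when to stop during the test.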

How do you determine, before a test, when it will end? Several variables determine test duration:

  1. Traffic (sample size). How much traffic does the test page receive? Tip: base this on users not pageviews or unique pageviews.
  2. Your conversion metric. For example, will you measure the test hypothesis based on add-to-basket clicks, or product pageviews, or what? Are you measuring sales or revenue? The former is a binomial metric (either a sale happened or it didn’t), the latter is a non-binomial metric (a sliding scale of revenue).
  3. Conversion rates. Once you’ve got your conversion metric, what is the test page’s conversion rate?
  4. Statistical confidence. How much risk of seeing a false positive (type I error) are you willing to accept (meaning the results look positive but it was a fluke)? If you choose 95% confidence, you are saying you’re happy to accept a 5% chance (1 in 20) of seeing a ‘winner’ when there is no real difference.
  5. Statistical power. The counterpart to statistical confidence: it deals with false negatives rather than false positives. A test that is 80% powered means you have an 80% chance of detecting a difference if there really is one, and a 20% chance of seeing a false negative (type II error), where the results look flat or negative but a real difference was missed.
  6. Minimum detectable effect. If you got a 5% uplift in conversion, is it even worth running the test? MDE should be the difference one would not like to miss, if it existed. Choosing an MDE is done with commercial viability in mind.

Use all of the above variables to calculate test duration.
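
As a rough illustration of how these variables combine, here is a sketch of a duration calculation. It assumes a binomial metric (e.g. a sale happened or it didn’t), a two-sided test, an even traffic split and the textbook normal-approximation formula for comparing two proportions; your tool or statistician may use a different method. The function names and the example inputs (5% baseline conversion rate, 10% relative MDE, 2,000 users a day) are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_mde,
                              alpha=0.05, power=0.80):
    """Users needed in EACH variation for a two-sided two-proportion test.

    baseline_rate: current conversion rate of the test page (e.g. 0.05)
    relative_mde:  minimum detectable effect as a relative uplift (e.g. 0.10)
    alpha:         accepted false positive rate (0.05 = 95% confidence)
    power:         chance of detecting the MDE if it really exists
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def duration_in_days(users_per_day, n_per_variation, variations=2):
    """Days needed if all daily users on the page enter the test."""
    return math.ceil(n_per_variation * variations / users_per_day)


if __name__ == "__main__":
    n = sample_size_per_variation(baseline_rate=0.05, relative_mde=0.10)
    print("Users per variation:", n)
    print("Days at 2,000 users/day:", duration_in_days(2000, n))
```

Try changing one input at a time: a smaller MDE or higher confidence pushes the required sample size (and therefore the duration) up, while a higher baseline conversion rate or more traffic brings it down.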

Are there any other risks with A/B testing?

Sure there are! To name a few:

  • Slow Converter Effect. When you stop a test, some people have been counted as part of the test but didn’t get a chance to convert yet. There are ways to mitigate this, but most A/B tests don’t.
  • Simpson’s Paradox. The overall result can hide, or even contradict, what is happening within segments. Imagine your test variation hugely benefited mobile users but had the reverse effect on desktop users. All in, you didn’t get an uplift (but you got an insight!).
  • Novelty effect. Imagine you’re a consumer and you’ve been visiting a website for a while. One day, you see a test variation (you don’t know it’s a test) and you say to yourself “hey, that shiny orange button is new, I’m gonna click on it!”. That’s the novelty effect. There are ways to mitigate this.
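
The mobile versus desktop scenario is easy to check for yourself by breaking results down by segment. The sketch below uses made-up numbers (they are not from a real test) in which the variation wins on mobile, loses on desktop and comes out flat overall.

```python
# Hypothetical results: (conversions, users) per segment and version.
results = {
    "mobile":  {"control": (200, 5000), "variation": (260, 5000)},
    "desktop": {"control": (300, 5000), "variation": (240, 5000)},
}


def relative_uplift(control, variation):
    """Relative change in conversion rate, variation vs control."""
    control_rate = control[0] / control[1]
    variation_rate = variation[0] / variation[1]
    return (variation_rate - control_rate) / control_rate


def totals(results, version):
    """Sum conversions and users for one version across all segments."""
    conversions = sum(seg[version][0] for seg in results.values())
    users = sum(seg[version][1] for seg in results.values())
    return conversions, users


def overall_uplift(results):
    return relative_uplift(totals(results, "control"),
                           totals(results, "variation"))


if __name__ == "__main__":
    for name, seg in results.items():
        uplift = relative_uplift(seg["control"], seg["variation"])
        print(f"{name:8s} uplift: {uplift:+.0%}")
    print(f"overall  uplift: {overall_uplift(results):+.0%}")
```

With these invented figures the overall uplift is zero while the segments move in opposite directions, which is why it pays to look at segments before writing a test off. Bear in mind that each segment has a smaller sample than the whole test, so segment-level results are noisier.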

Guilty as charged: A/B test tool providers

What they tell you (features)                                 | What they don’t tell you (risk)
✔ A/B testing & multivariate testing                          | ❌ Test duration
✔ Segmentation & personalisation                              | ❌ False positives
✔ Integration (e.g. with Google Analytics, HotJar)            | ❌ False negatives
✔ All-in-one CRO platform (e.g. session recording, heatmaps)  | ❌ Sample size
✔ Manage multiple experiments                                 | ❌ Binomial vs non-binomial metrics
✔ Multi-device                                                | ❌ Minimum detectable effect
✔ Speed                                                       | ❌ Statistical confidence
✔ Uptime & infrastructure                                     | ❌ Statistical power
✔ Security                                                    | ❌ ‘Novelty’ effect
                                                              | ❌ ‘Slow Converter’ effect
                                                              | ❌ Simpson’s Paradox
                                                              | ❌ Other validity threats

Figure 2: A/B testing tools aren’t forthcoming with the nuances of statistics

The takeaway – A/B testing is risk management

Unfortunately, A/B test tool providers are more interested in promoting features than raising knowledge levels.

In reality, A/B testing is full of risk. We mitigate risk with effective use of statistics. Each and every A/B test is a balancing act between risk and growth (for example, accepting lower statistical confidence reduces sample size requirements and test duration).

It’s critical to understand how test metrics interrelate to increase the chances that your testing programme will drive real growth for your business.

Struggling to get meaningful A/B test results that maximise website revenue and drive growth? Reluctant to get knee-deep in statistics? Get in touch with me here or on LinkedIn and I’ll show you the best-practice approaches that world-class optimisers are using.