We no longer have to rely on our design instincts. We now have more data at our disposal than ever before to help us determine if our design choices really improve the user experience of our digital products and services and lead to more conversions. A/B testing — a technique of showing two or more variants of a design to users at random to find out which one performs better — is just one approach you can use.
A/B testing has many benefits. Zoe Gillenwater, lead designer at Booking.com, has learned that it democratizes design and distributes decision-making power not just across your organization but also outside of it — directly into the hands of your users.
“It forces you to stop making design decisions based on your personal preferences, biases, and ego, and instead lets your users ‘vote’ via their behavior,” Gillenwater explains. “When A/B testing is really integrated into your way of working, it also prevents executives from mandating specific changes from on high. At Booking.com, we like to say we avoid HiPPOs: the Highest Paid Person’s Opinion. Opinions and assumptions are fine, but you’ll have much more success if you validate them with data from a variety of sources, including A/B testing.”
But just how do you conduct efficient A/B tests? And how do you analyze the data and convert it into insights? We asked five UX designers for their “top things to remember” when planning and carrying out A/B tests.
Base your strategy on your traffic
UX designer Marisa Morby believes that any A/B test should first and foremost have a solid strategy based on website traffic.
“If you have high traffic, say 5000 or more unique hits per day, you can test prototypes or designs rapidly and continually to consistently pull in new data,” she explains. “For high traffic sites, your A/B test should be broken out to contain the smallest change possible, so you can narrow down what change is actually impacting your most important metrics.”
If you have low traffic (300 or fewer unique hits per day), Morby recommends that your A/B tests be large and impactful. “You can do this by running one test at a time and testing two completely different designs. This helps you see clearly which design is more favorable.”
Gillenwater, meanwhile, says you have to have enough traffic to get statistically significant results. You can calculate how much traffic you need and for how long you need to run a test using an online A/B test power calculator.
“If you run a test with too little traffic or for too little time, the data that you get is likely to be wrong,” Gillenwater warns. “Basing your product decisions on wrong data is worse than basing them on no data at all.”
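As a rough illustration of the arithmetic behind such a power calculator, the sketch below uses the standard two-proportion sample-size formula to estimate how many visitors each variant needs and how long a test would have to run. The baseline conversion rates, uplifts, and traffic figures are invented, but they show why Morby's advice differs for high- and low-traffic sites: with little traffic, only a large expected effect finishes in a reasonable time.

```python
# A rough sketch of the arithmetic behind an A/B test power calculator.
# Assumes a two-sided z-test on conversion rates and a 50/50 traffic split
# between two variants; all numbers below are invented for illustration.
from math import ceil
from scipy.stats import norm

def sample_size_per_variant(p_baseline, p_expected, alpha=0.05, power=0.80):
    """Visitors needed in each variant to detect p_baseline -> p_expected."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the significance level
    z_beta = norm.ppf(power)            # critical value for the desired power
    variance = p_baseline * (1 - p_baseline) + p_expected * (1 - p_expected)
    effect = p_expected - p_baseline
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# High-traffic site testing a small change: 3.0% -> 3.3% conversion.
n_small = sample_size_per_variant(0.030, 0.033)
print(n_small, "visitors per variant,", ceil(2 * n_small / 5000), "days at 5000 hits/day")

# Low-traffic site testing a much bigger expected change: 3.0% -> 4.5% conversion.
n_big = sample_size_per_variant(0.030, 0.045)
print(n_big, "visitors per variant,", ceil(2 * n_big / 300), "days at 300 hits/day")
```

Small uplifts on a low-traffic site would keep a test running for months, which is exactly why Morby suggests testing bigger, bolder differences when traffic is scarce.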
Create a strong hypothesis
The cornerstone of A/B testing, Gillenwater believes, is formulating a solid, user-centered hypothesis.
“Everything falls apart without it,” she warns. “A good hypothesis should tell you why you’re making a change — what the user problem is that you’re trying to solve — along with what the change is, who it’s for, what outcome you expect to happen for these users, and how you are going to measure that outcome. Without a hypothesis, you will be running tests blindly and end up with a big pile of data that you don’t know what to do with.”
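One lightweight, entirely hypothetical way to keep those elements together is to capture each hypothesis as a small structured record before a test is configured. The fields below simply mirror Gillenwater's checklist; the example values are invented.

```python
# A hypothetical template for writing down a hypothesis before a test is set up.
# The fields mirror the checklist above; the example values are made up.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    user_problem: str      # why: the user problem you are trying to solve
    change: str            # what: the design change being tested
    audience: str          # who: the users the change is for
    expected_outcome: str  # what you expect to happen for those users
    success_metric: str    # how that outcome will be measured

checkout_hypothesis = Hypothesis(
    user_problem="First-time buyers abandon checkout because shipping costs appear late",
    change="Show an estimated shipping cost on the product page",
    audience="Visitors who have never completed a purchase",
    expected_outcome="Fewer users drop out between cart and payment",
    success_metric="Cart-to-payment progression rate",
)
```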
Without a strong hypothesis, Gillenwater says, you’ll be able to cherry-pick the resulting data to support assumptions you already hold, rather than evaluating only the most relevant metrics against a pre-set expectation of what should happen to them to indicate success.
“Never, ever run a test just because you can,” Gillenwater cautions. “Only run a test because you have a strong hypothesis of why it will be worthwhile to make that change.”
Prioritize test ideas
Designer Nick Disabato, who runs interaction design consultancy Draft and has written a book on Value-based Design, points out that the industry success rate with A/B testing is around 12.5 percent. Draft’s success rate is over 60 percent, and ConversionXL’s is over 90 percent. What are they doing differently? Disabato says they know what to test and when.
“It’s not enough to test button colors or headlines,” he explains. “Sophisticated optimizers research what to test, and prioritize test ideas accordingly. Knowing what to test is the most important part of any optimization program.”
For every test idea, Disabato suggests evaluating its feasibility, impact, and strategic alignment:
- Feasibility. How hard is this to build out? Does it require development effort, new prototypes, wireframes, or feature flags?
- Impact. How likely is this change to make an impact on the metric that you’re measuring? Are you changing a major element above the fold, like a headline; or are you changing something small that few people pay attention to?
- Alignment. How much does the test idea align with the business’s long-term strategy?
“Add up each score and sort your test ideas in descending order,” Disabato advises. “Now you should have an idea of what to test — and why. Do this with any new test idea, and re-evaluate the whole list every couple of months.”
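As a minimal sketch of what that scoring and sorting step might look like in practice, the snippet below rates a few invented test ideas on a 1-to-5 scale for each criterion and ranks them by total. The scale and the ideas themselves are assumptions for illustration, not Draft's actual process.

```python
# A minimal sketch of the scoring-and-sorting step described above.
# Each idea is scored 1 (low) to 5 (high) on feasibility, impact, and alignment.
test_ideas = [
    ("Rewrite the pricing-page headline",  5, 4, 4),
    ("Add a product-comparison table",     2, 4, 5),
    ("Change the primary button colour",   5, 1, 2),
]

def total_score(idea):
    _, feasibility, impact, alignment = idea
    return feasibility + impact + alignment

# Sort in descending order of total score, as advised above.
for name, *scores in sorted(test_ideas, key=total_score, reverse=True):
    print(sum(scores), name)
```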
Embrace and prepare for failed tests
Ideally, the outcome of an A/B test is an overwhelming “aha!” moment, but the truth is that most A/B tests fail to produce statistically significant results. If that’s the case, UX designer Zoltan Kollin, senior design manager at IBM Watson Media, suggests you can always decide to run your experiment a bit longer.
“Maybe some extra traffic will get you into the statistically significant zone,” he says, “but there’s a point when you need to admit that your brilliant idea will not boost conversion rates. If you’re confident about your hypothesis, you might want to reconsider your execution. Maybe the design difference was too subtle for users to notice, let alone to lead to significant results. Maybe you didn’t iterate on the most impactful screen or design element. Why not go back and ideate on other design solutions, focusing on the big picture?”
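For a concrete sense of what “getting into the statistically significant zone” means, here is a sketch of checking a finished test with a two-proportion z-test. The visitor and conversion counts are invented, and this is just one of several reasonable ways to run the check.

```python
# A sketch of checking whether a finished test reached significance,
# using a two-proportion z-test. The counts below are invented and, as it
# happens, fall short of the conventional 0.05 threshold.
from statsmodels.stats.proportion import proportions_ztest

conversions = [310, 352]    # variant A, variant B
visitors = [10000, 10000]

z_stat, p_value = proportions_ztest(conversions, visitors)
if p_value < 0.05:
    print(f"Significant difference (p = {p_value:.3f})")
else:
    print(f"No significant difference yet (p = {p_value:.3f}); "
          "decide whether more traffic is justified or the idea should be shelved")
```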
If an A/B test completely disproves your hypothesis, don’t give up. Kollin points out that at least you found out in time, and more user research might reveal the reasons and uncover further opportunities.
“The good news is that an A/B test is a validation tool, so it literally cannot fail,” Kollin encourages. “Your preferred design will not always win but you can measure the impact of UI changes to make informed design decisions. And by the way, sometimes a ‘no difference’ result is just perfect: it gives you scientific proof that you can implement your preferred design at no risk.”
Gillenwater agrees and reveals that at Booking.com roughly nine out of 10 tests fail — a very normal rate in the industry. But failed tests are still incredibly valuable.
“Failed tests succeed in proving that something doesn’t work,” she points out. “The data you glean from them helps you formulate your next hypothesis for your next iteration, increasing your chances that it will succeed on the next try. This is one reason why we mostly run small, quick-to-set-up tests at Booking.com — when we ‘fail fast’ we can learn, iterate, and succeed more quickly.”
A/B tests don’t tell the whole story
Booking.com’s Gillenwater says it’s important to remember that the data you get from A/B testing is not the story.
“It’s just one insight among a whole host of insights that you should use to figure out the story of your users’ experiences,” she points out. “The same data can tell different stories. If you didn’t start out your test with a solid hypothesis based on other insights, you’ll have no reliable way to understand what the resulting data is telling you about the story. For instance, let’s say you make a change and the time spent on the page goes down. Is that good or bad? I can make up lots of stories to spin it either way. The story that’s closest to the truth will depend on what my hypothesis going into the test told me about what should happen to the time on the page, along with other supporting data, both from other quantitative metrics and qualitative research like user testing.”
Morby agrees and says that A/B testing works wonderfully in conjunction with customer research to help you understand why people are behaving a certain way.
Don’t lose sight of the user experience
A/B testing is a great tool for determining which variation of a design will perform better quantitatively. However, as Julian Gaviria, director of user experience at Thomasnet.com points out, A/B testing falls short when determining which variation provides a better user experience.
“With a surge in popularity during the era of ‘data-driven’ design, it’s becoming increasingly common to see websites over-optimize for conversions while simultaneously destroying their user experience,” he warns. “The problem with solely focusing on conversions is that you’re likely to sacrifice sustainable long-term growth for short term gains.”
To strike a balance between the two, the team at Thomasnet.com has come up with the following processes, which help them pay as much attention to their qualitative KPIs (for example, user feedback and user recordings) as to their quantitative KPIs (for example, user registrations and form submissions) when running any type of A/B test.
Integrated A/B testing stack
“We have our A/B testing stack deeply integrated with our user feedback and user recording tools,” Gaviria explains. “Whenever a user leaves feedback, we’re able to see the respective A/B test and variation that the user was bucketed into along with a link to the recording of the session. This becomes extremely useful in identifying adverse effects and frustration points which data alone wouldn’t surface.”
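The article doesn’t detail Thomasnet.com’s tooling, but as a rough sketch of the idea, a feedback event can carry the experiment assignment and a link to the session recording as metadata so the two can be reviewed together. Every name in the snippet below is hypothetical.

```python
# A hypothetical sketch of attaching A/B test context to a feedback event so it
# can be reviewed alongside the session recording. None of these names come from
# the article; they stand in for whatever feedback, testing, and recording tools
# are actually in use.
import json
from dataclasses import dataclass, asdict

@dataclass
class FeedbackEvent:
    user_id: str
    comment: str
    experiment: str        # which A/B test the user was bucketed into
    variant: str           # which variation they saw
    recording_url: str     # link to the session recording for this visit

def log_feedback(user_id: str, comment: str, assignments: dict, recording_url: str) -> str:
    """Serialize a piece of feedback together with its experiment context."""
    events = [
        FeedbackEvent(user_id, comment, experiment, variant, recording_url)
        for experiment, variant in assignments.items()
    ]
    return json.dumps([asdict(e) for e in events], indent=2)

print(log_feedback(
    user_id="u-123",
    comment="I couldn't find the shipping estimate",
    assignments={"checkout-shipping-estimate": "variant-b"},
    recording_url="https://recordings.example.com/session/abc",
))
```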
Team-wide game film reviews
To make sure that the information collected through the testing stack is democratized, the team holds weekly “game film reviews” for team members across all departments to watch user recordings from tests launched that week.
“This has been one of the best habits we’ve been able to develop as an organization,” Gaviria points out. “Not only do they help us obtain qualitative insights on our A/B tests, these optional meetings usually spark engaging conversations resulting in new testing ideas, different perspectives, and shared common knowledge across all teams.”
User-centric design principles
Regardless of how much user feedback you receive and how many user recordings you watch, there’s not much you can do if the tests you create aren’t designed with your users’ best interests in mind.
“Our design decisions when crafting experiments are guided by a set of user-centric design principles specific to our organization,” Gaviria explains. “This has been a great time-saver by helping steer test ideas away from destructive dark patterns towards sustainable designs aimed to improve the overall user experience.”
The success of failure
Not every product can be A/B tested, and even if it can, the process takes some getting used to. Most of your tests are going to fail, and you’ll find that most of your assumptions were wrong and that your expertise isn’t as great as you thought it was, which can feel frustrating. It’s important not to give up, however, and to learn how to get as much value out of failed tests as you can. A/B testing is a powerful addition to the UX designer’s toolkit, but remember not to test for the sake of testing and to keep your users in mind at all times. When done right, measuring and testing real-time user behavior with A/B testing can dramatically improve the user experience and conversions alike.