The Value of A/B Testing
A/B testing, which is also known as bucket testing or split-run testing, is a randomized experiment that involves two variants (A and B). It serves as a way to compare two versions of a single variable in order to determine which of the two is more effective.
At Bigabid, we A/B test everything we can
What do we A/B test at Bigabid? Everything and anything! Here are some examples:
- Algorithms. Our media buying is done by a complex architecture of algorithms. Each component in the system is constantly being evaluated against alternatives. This is how we improve over time.
- Creatives. We constantly A/B test different creatives against each other, because different audiences respond to different things. This is key for delivering top performance.
- Audiences. We A/B test the audiences we target to measure the impact of different characteristics and help us fine-tune who the best target is.
- Features. Before a new feature finds itself in our global architecture, it’s measured separately in a controlled experiment.
Steps of A/B testing
The five steps outlined here offer a basic guideline that can be applied across the board when you want to perform A/B testing.
- Decide what to test.
What do you want to test? You need to be clear about this before you initiate an A/B test. Start with a single element, making sure that it’s relevant to the metric you want to improve. For example, if you’re interested in improving the quality of the audience you are targeting in terms of day 7 ROI (return on investment within the first 7 days after the install), consider testing different features within a given audience. This will help you keep only the high-quality users in the given audience, thus boosting your overall ROI.
- Set goals or KPIs to be measured.
Before you initiate an A/B test, you also need to be clear on what you’re aiming to achieve. The above example aimed at improving day 7 ROI. You might be interested in anything from boosting CTR to increasing purchase rates and beyond. The success of your A/B test relies heavily on clearly setting the KPI or goal, as this will help ensure that you’ll generate cleaner data. Again, focus on a single metric, and remember that you can always run additional A/B tests with other metrics later.
- Design your test.
Now we’re starting to get creative. When you design your A/B test, you’ll need to create your A and B groups. When doing so, make sure that these groups are identical except for the single element you’re testing. Also, be sure to avoid any biases that might skew your results, for example by assigning users to the two groups at random.
- Accumulate data.
Let your test run. Make sure you log or record all the data that might be relevant when you later analyze the results. Also, make sure you give your A/B test enough time to run so it generates an ample amount of representative data.
- Analyze the results.
Below, you’ll find information about how to draw conclusions based on which group “won” the test.
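As a rough illustration of the "Design your test" step, here is a minimal sketch of one common way to split users into A and B groups without bias: hashing a user ID together with an experiment name. The function name `assign_variant`, the experiment name, and the user ID format are all hypothetical and chosen only for this example; this is not a description of Bigabid's actual system.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically map a user to 'A' or 'B' for a given experiment.

    Hypothetical helper: the same user always lands in the same group,
    and different experiments get independent splits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and scale it to the unit interval.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "A" if bucket < split else "B"


# Sanity check: with many made-up users the split should be close to 50/50.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "creative_test_v1")] += 1
print(counts)
```

Because the assignment depends only on the user ID and the experiment name, a user sees the same variant every time, and the group sizes can be checked before any results are analyzed.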
Frequentist or Bayesian?
The Frequentist method and the Bayesian method are two approaches to statistics, and they differ in, among many other things, how they analyze the results of an A/B test. Each approach has its own set of advantages.
The Frequentist method
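This method involves using p-values to choose between two hypotheses. The first hypothesis is the null hypothesis (there is no difference between A and B). The second is the alternative hypothesis (e.g., A ≠ B, A > B, or A < B).
A p-value is the probability, assuming the null hypothesis is true, of observing a difference between A and B that is at least as extreme as what we actually observed. If the p-value falls below a significance level chosen before the test (commonly 0.05), we reject the null hypothesis. The sample size should also be fixed in advance: stopping the experiment the moment the p-value first dips below the threshold inflates the false-positive rate.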
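To make the p-value concrete, here is a minimal sketch of a two-sided, two-proportion z-test for comparing the CTR of two variants, using only the Python standard library. The z-test is one common frequentist choice for this kind of data, not necessarily the test used at Bigabid, and the click and impression counts below are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: the two rates are equal."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one rate, so pool the data.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 200 clicks out of 10,000 impressions for A, 250 out of 10,000 for B.
z, p_value = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The p-value printed here is compared against the significance level that was chosen before the test started, once the planned number of impressions has been collected.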
The Bayesian method
At Bigabid, we generally prefer the Bayesian approach, in A/B testing as elsewhere.
In Bayesian A/B testing, we model the metric for each variant as a random variable with some probability distribution. Based on prior experience, we might believe that the CTR of some creative for a certain audience lies within some range of possible values; this belief is expressed in our prior distribution. After observing data from both variants, we update our prior beliefs to obtain a posterior distribution over the most likely values for each variant. Below, I show an example of how the posterior distribution might look after observing data.
By calculating the posterior distribution for each variant, we can express the uncertainty about our beliefs through probability statements. For example, we can ask, “What is the probability that the CTR of creative B is larger than the CTR of creative A for the same audience, all else being equal?”
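One way to answer a question like this is sketched below, assuming a Beta prior on each CTR and treating clicks as binomial outcomes (the standard Beta-Binomial model). The uniform Beta(1, 1) prior, the function name `prob_b_beats_a`, and the click counts are assumptions made for this example, not details taken from Bigabid's methodology.

```python
import random


def prob_b_beats_a(clicks_a, views_a, clicks_b, views_b,
                   prior_alpha=1.0, prior_beta=1.0,
                   draws=50_000, seed=0):
    """Estimate P(CTR_B > CTR_A | data) by sampling from both posteriors."""
    rng = random.Random(seed)
    # With a Beta prior and binomial data, the posterior is also a Beta:
    # Beta(prior_alpha + clicks, prior_beta + non-clicks).
    a_alpha, a_beta = prior_alpha + clicks_a, prior_beta + views_a - clicks_a
    b_alpha, b_beta = prior_alpha + clicks_b, prior_beta + views_b - clicks_b
    wins = 0
    for _ in range(draws):
        if rng.betavariate(b_alpha, b_beta) > rng.betavariate(a_alpha, a_beta):
            wins += 1
    return wins / draws


# Made-up example: 200 clicks out of 10,000 views for A, 250 out of 10,000 for B.
probability = prob_b_beats_a(200, 10_000, 250, 10_000)
print(f"P(CTR_B > CTR_A) is roughly {probability:.3f}")
```

The result is a direct probability statement about the two creatives, which is often easier to act on than a p-value. An informative prior based on past campaigns could replace the uniform prior by changing `prior_alpha` and `prior_beta`.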
Conclusion
A/B testing is one of the most popular and powerful ways to gather information and make data-based decisions. At Bigabid, we use A/B testing constantly on practically anything we can measure. After all, data-driven decisions are generally preferable to decisions based on hunches.
When you’re setting out to perform an A/B test, consider which method described above suits you better, and be sure to follow the five steps we outlined for executing your test.