The following is a beginner’s guide to A/B testing, covering best practices and the differences between A/B tests and multivariate testing.
At its core, A/B testing is a user experience research method. In its simplest form, an A/B test consists of two variations (A and B), with no limit on what can be measured. In the world of on-site technology, A/B testing forms the basis of measuring the uplift of one variant against another in order to ensure you are providing the best user experience.
These controlled experiments provide scalable, data-driven testing environments that inform decisions about creative implementations.
Some of the world’s best companies have streamlined these tests and their associated best practices, allowing for constant optimization at scale across nearly all areas of the business.
Facebook, for example, reportedly runs over 10,000 versions of Facebook each day. Testing at this scale builds confidence in the results, but it never provides 100% certainty: there is always room for error, and bad practices are common.
So, what is an A/B test?
When conducting an A/B test, there must first be agreement on what is being tested and what the desired outcome of the test is. Once this is agreed, variants are created in order to test one element against another.
As with all scientific tests, there must be a control group, or baseline.
When testing on a website, the site’s traffic is then split between the variants. This means some users will experience one variant, while others will experience another.
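To make this concrete, one common way to implement the split in code is deterministic bucketing: a stable user identifier is hashed so the same visitor always lands in the same variant. The sketch below is a minimal illustration rather than a production assignment service, and the experiment and variant names are made up; the same approach extends to more than two variants simply by passing a longer tuple.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("control", "B")) -> str:
    """Deterministically assign a user to a variant.

    Hashing user_id together with the experiment name gives a stable,
    roughly uniform split: the same user always lands in the same bucket,
    and different experiments are bucketed independently of each other.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Example: a 50/50 split for a hypothetical checkout-button experiment.
print(assign_variant("user-1234", "checkout-button-color"))
```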
The data (metrics) is then collected and the results are analyzed. The variant that performed best is then selected and applied to that website environment.
To maintain a view of performance, it is recommended to keep a control group/baseline in place to monitor the environment going forward.
If this process is not followed correctly, tests can produce meaningless results.
When a variation is selected as the winner, the metrics collected from the sample of exposed visitors up to that point are generalized to the entire visitor population. If this generalization is not done correctly, it becomes a barrier to accuracy.
This part of the process is known as hypothesis testing, and the desired outcome is statistical significance.
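As a rough illustration of what generalizing from a sample involves, the sketch below computes a confidence interval around one variant’s conversion rate using the normal approximation. The visitor and conversion counts are invented for the example.

```python
from statistics import NormalDist

def conversion_ci(conversions: int, visitors: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a conversion rate."""
    p = conversions / visitors
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. ~1.96 for 95%
    margin = z * (p * (1 - p) / visitors) ** 0.5
    return p - margin, p + margin

# Hypothetical numbers: 120 conversions out of 2,000 exposed visitors.
low, high = conversion_ci(120, 2000)
print(f"95% CI for the conversion rate: [{low:.3f}, {high:.3f}]")
```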
Some examples of A/B testing on a web page:
- Running different ads for different users
- Displaying different landing pages to different users
- Testing the wording or layout of a welcome email
Most A/B tests come about by identifying a problem and then crafting a hypothesis around it.
As in science, a hypothesis is a projected outcome that is compared against a null hypothesis (the assumption that the change has no effect). The null hypothesis is rejected if the probability of observing a result at least as extreme as the one measured, assuming the null hypothesis is true (the p-value), falls below a predetermined significance level, in which case the hypothesis being tested is said to have that level of significance.
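To see what this looks like in practice, here is a minimal two-proportion z-test using only the Python standard library. It estimates the p-value for the difference between two conversion rates; the conversion counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate assuming the null is true
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    return z, p_value

# Hypothetical results: A converts 100/2000 visitors, B converts 135/2000.
z, p = two_proportion_z_test(100, 2000, 135, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")  # reject the null at the 0.05 level if p < 0.05
```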
A simple A/B test
Traditionally, in a standard A/B test, a website’s traffic is split between two variants of the content. One variant is the aforementioned control group, or baseline; the other is the test environment where the new variant is introduced.
Further A/B tests
Tests can also be conducted with more than two variants. This is known as an A/B/n test. These tests allow you to measure the performance of three or more variations rather than testing a single variant against the control.
The more traffic a website receives, the larger the data pool; and the larger the data pool, the more statistically reliable the results will be.
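This relationship between traffic and accuracy can be estimated up front. The sketch below approximates how many visitors each variant needs in order to detect a given lift with conventional settings; the baseline rate and target rate are assumptions chosen for illustration.

```python
from statistics import NormalDist

def sample_size_per_variant(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for a 5% significance level
    z_power = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = ((z_alpha + z_power) ** 2 * variance) / (p_target - p_base) ** 2
    return int(n) + 1

# Detecting a lift from a 5% to a 6% conversion rate (hypothetical numbers):
print(sample_size_per_variant(0.05, 0.06))  # roughly 8,000+ visitors per variant
```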
With most A/B/n tests, there is a recommended limit to how many changes or variants a single test should contain. It is usually best to make a few critical changes, determine the best-performing element, and roll it out to the majority of the environment going forward.
However, when multiple changes need to be tested together, one can run what is known as a multivariate test.
Multivariate Testing
Multivariate testing is a technique for testing a hypothesis in which multiple variables are modified simultaneously, compared with the single variable (typically two to three variants) modified in an A/B or A/B/n test.
The overarching goal of multivariate testing is to ascertain which combination of variants performs the best.
This is best done on websites or mobile apps where elements are dynamic and easily modified: for example, changing a headline while also varying the content it is combined with. The results are then analyzed, and the best-performing combination is put to use on a larger scale.
Example
V1 - A (image) + A (widget)
V2 - B (image) + A (widget)
V3 - A (image) + B (widget)
V4 - B (image) + B (widget)
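In code, the full set of combinations is simply the Cartesian product of each element’s variants. A minimal sketch (the element names are hypothetical):

```python
from itertools import product

# Two elements, each with two variants: a 2x2 full-factorial test.
images = ["image-A", "image-B"]
widgets = ["widget-A", "widget-B"]

# Enumerate every combination of image and widget as a numbered variant.
for i, (image, widget) in enumerate(product(images, widgets), start=1):
    print(f"V{i}: {image} + {widget}")
# V1: image-A + widget-A
# V2: image-A + widget-B
# V3: image-B + widget-A
# V4: image-B + widget-B
```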
The process of running a multivariate test is similar to A/B testing, but with a key difference: an A/B test changes a single variable, testing one or more variants against a control to determine the effect of that one change. A multivariate test changes multiple variables together and tests every combination of them, as seen in the example above.
Testing the success of an A/B testing platform
The most important way to ensure an accurate A/B test is to have a valid methodology as the driving force behind the test itself; this will yield more accurate results in the long run. A/B testing simply provides the framework for measuring different patterns of behavior on a page and analyzing those differences, in order to build the best possible user experience and provide the most value to the business.
This valid methodology is guided by A/B test best practices.