Two weeks ago, we discussed the issue of outliers in the context of designing your A/B test. If you need a quick reminder, you can find the relevant post here:
Today we’ll discuss a common problem in A/B testing: what should you expect when there are outliers in your population? Two-sample t-tests are widely used for A/B testing when the primary objective is to compare the means of two groups (Group A and Group B) and determine whether there is a statistically significant difference between them. The two-sample t-test is also a special case of ANOVA (Analysis of Variance). Because the test is built on sample means and variances, a handful of extreme values can noticeably distort its result.
So what are your options?
There are several ways to deal with outliers.
The simplest solution is to use the trimmed-mean t-test.
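Essentially, this is a t-test that uses the trimmed mean of each sample in place of the ordinary mean and the Winsorized variance in place of the ordinary variance. (Winsorizing replaces the most extreme values with the nearest retained value instead of dropping them.)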
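The idea behind this approach is straightforward: before computing the mean, you throw away a fixed fraction of the largest and smallest data points in each sample, so values that are either too big or too small no longer affect it.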
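To make the mechanics concrete, here is a minimal sketch of a trimmed-mean t-test in plain Python. It assumes the usual Yuen formulation (trimmed means, Winsorized variances, Welch-style degrees of freedom) and a 20% trim from each tail. The function names and the two small samples are made up for illustration; they are not the study data discussed next.

```python
import math
from statistics import mean, variance


def _cut(n, trim):
    """Number of observations removed from each tail."""
    return int(math.floor(trim * n))


def trimmed_mean(x, trim=0.2):
    xs = sorted(x)
    g = _cut(len(xs), trim)
    return mean(xs[g:len(xs) - g])


def winsorized_variance(x, trim=0.2):
    xs = sorted(x)
    n = len(xs)
    g = _cut(n, trim)
    # Replace the g smallest/largest values with the nearest retained value.
    wins = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    return variance(wins)  # sample variance (n - 1 denominator)


def yuen_t(a, b, trim=0.2):
    """Return (t statistic, approximate degrees of freedom)."""
    parts = []
    for x in (a, b):
        n = len(x)
        h = n - 2 * _cut(n, trim)  # observations left after trimming
        d = (n - 1) * winsorized_variance(x, trim) / (h * (h - 1))
        parts.append((d, h))
    (d1, h1), (d2, h2) = parts
    t = (trimmed_mean(a, trim) - trimmed_mean(b, trim)) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df


# Made-up scores for illustration only; group_a contains one obvious outlier.
group_a = [1, 2, 3, 4, 100]
group_b = [2, 3, 4, 5, 6]
t_stat, dof = yuen_t(group_a, group_b, trim=0.2)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")  # prints: t = -0.866, df = 4.0
```

The outlier of 100 no longer drives the comparison: group_a's trimmed mean is 3, against an ordinary mean of 22. The sketch stops at the statistic and degrees of freedom; in practice you would look up the p-value from a t distribution, or, if SciPy is available, use the `trim` argument of `scipy.stats.ttest_ind` (added in recent versions) rather than hand-rolling the test.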
Of course, if you have a strong statistics background, you might question the validity of this approach. Simply discarding data points that could potentially represent the distribution may seem hasty.
The answer to this concern is more philosophical than theoretical, and we will explore it in the next section.
But first, let’s examine what happens with an example.
In a marketing research study, a firm tested the effectiveness of a new flavoring for a leading beverage. They used a sample of 22 people, with half of them tasting the beverage with the old flavoring and the other half tasting the beverage with the new flavoring (although one person dropped out before the tasting began). Afterward, the participants were given a questionnaire to evaluate how enjoyable the beverage was. Based on the scores shown on the left side below, we will determine whether there is a significant difference between the perceptions of the two flavorings.