A/B ⚖ Testing with Multiple Metrics
Much of the literature and guidance on A/B testing focuses on tests with a single comparison or a single metric. Some of my friends working in tech have also told me that they typically focus on one primary metric when designing experiments (e.g., for study design and sample size calculation).
I’m curious whether this is common practice in the tech industry (especially outside biotech).
Do you factor in multiple metrics when performing experiment design at your company?
Yes, factor in more than 1 metric.
No, only focus on 1 primary metric.
What I learned from industry practitioners is that multiple metrics (usually 3 to 5) are monitored even though the sample size might be based on one primary metric.
The issue of multiplicity
There are two types of multiplicity issues in experiments. One is comparing more than two groups (e.g., one control and two different treatments); the other is observing more than one metric.
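To see why either kind of multiplicity matters, here is a minimal sketch of how the chance of at least one false positive grows with the number of comparisons. It assumes each test is run at the same significance level and that the tests are independent, which is a simplification: treatments compared against a shared control, or correlated metrics, are not strictly independent. The function name `family_wise_error_rate` is my own, used only for illustration.

```python
def family_wise_error_rate(alpha: float, num_comparisons: int) -> float:
    """Probability of at least one false positive across several tests.

    Assumes every test uses the same significance level `alpha` and that
    the tests are independent (an illustrative simplification).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be at least 1")
    # P(no false positive in any test) = (1 - alpha) ** num_comparisons
    return 1 - (1 - alpha) ** num_comparisons


if __name__ == "__main__":
    # Compare a single test with experiments that involve more comparisons.
    for m in (1, 2, 5, 10):
        rate = family_wise_error_rate(alpha=0.05, num_comparisons=m)
        print(f"{m:>2} comparisons -> family-wise error rate = {rate:.3f}")
```

Under these assumptions, a single test at 0.05 keeps the false positive rate at 5%, while ten comparisons push the chance of at least one spurious "winner" to roughly 40%.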
Comparing multiple groups:
There is a good interview question about this from DataInterviewPro:
We are running a test with 10 variants, trying different versions of our landing page. One treatment wins and the p-value is less than .05. Would you…