Simpson's Paradox: Fundamental but common mistakes when analyzing multiple A/B tests

April 20, 2010

Most marketers test various concepts through simple A/B testing or more advanced structured testing approaches (fractional factorials, Taguchi methods, etc.).  A common mistake analysts make is to calculate the lift on the pooled totals rather than on the individual tests.  When the test and control groups are split in the same proportion in every segment, the two calculations give the same answer; when the proportions differ, they do not.  The two answers can even lead to contradictory conclusions, as outlined below.

Suppose we have two groups within our file (there could be more, but for simplicity we'll stick with two).  These two groups could be anything, for example:

  • Male versus female
  • People who bought X versus people who didn't
  • People who are highly responsive to marketing versus those who are non-responsive
  • High-value versus low-value customers

Suppose we're trying to examine the effectiveness and lift of a specific campaign on these groups.

Examine the calculation below:

                 Test                              Control
                 Total       Responders  Response  Total       Responders  Response  Lift    Incremental
                 Marketed                Rate      Marketed                Rate      =C-F    Responders
                 (A)         (B)         (C)       (D)         (E)         (F)               =(C-F)*A
X    Group 1     1000        60          6.0%      200         4           2.0%      4.0%    40
Y    Group 2     500         10          2.0%      400         4           1.0%      1.0%    5
     Group 1+2                                                                               45.00
X+Y  Total       1500        70          4.7%      600         8           1.3%      3.3%    50.00

If you follow the above example, the incremental customers coming in from the two groups sum to 40 + 5 = 45.  However, when we pool the test and control groups into one group, with 1500 test and 600 control subjects, the incremental customers are calculated as 50.
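For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the two calculations. The `Cell` class and the `incremental` function are names made up for this illustration; the counts are the ones from the table, and the formula is the table's own (test response rate minus control response rate, multiplied by the test audience size).

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One side of a test (hypothetical helper): audience size and responders."""
    marketed: int
    responders: int

    @property
    def rate(self) -> float:
        return self.responders / self.marketed


def incremental(test: Cell, control: Cell) -> float:
    """Lift (test rate minus control rate) applied to the test audience."""
    return (test.rate - control.rate) * test.marketed


# (test, control) pairs, copied from the table above.
groups = {
    "Group 1": (Cell(1000, 60), Cell(200, 4)),
    "Group 2": (Cell(500, 10), Cell(400, 4)),
}

# Approach 1: compute incremental responders per group, then add them up.
per_group = {name: incremental(t, c) for name, (t, c) in groups.items()}
sum_of_groups = sum(per_group.values())

# Approach 2: pool the raw counts first, then compute on the totals.
pooled_test = Cell(
    sum(t.marketed for t, _ in groups.values()),
    sum(t.responders for t, _ in groups.values()),
)
pooled_control = Cell(
    sum(c.marketed for _, c in groups.values()),
    sum(c.responders for _, c in groups.values()),
)
pooled = incremental(pooled_test, pooled_control)

for name, value in per_group.items():
    print(f"{name}: {value:.1f}")          # Group 1: 40.0, Group 2: 5.0
print(f"Sum of groups: {sum_of_groups:.1f}")  # 45.0
print(f"Pooled total:  {pooled:.1f}")         # 50.0
```

Note that the test-to-control ratio is 5:1 in Group 1 but 5:4 in Group 2, so the two groups are weighted differently on the test side and the control side once the counts are pooled.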

How come?  Why are the two results different?  Which one is correct?  Let me know what you think.  The answer will come in a few days...