Some months ago, I started a new job as a machine learning engineer at a German trading company. In the course of this, I began to work more with the programming language Scala. Scala provides some benefits that make the work easier for data engineers building data pipelines, and there is good support for Spark, which is an advantage in cluster computing. But Scala (of course) also has its idiosyncrasies.
This article shows the idiosyncrasy of Yates' correction in the Statistics package of Scala and its consequences.
The Statistics Package
Like many other programming languages, Scala also comes with a statistics package called "Statistics". It includes many basic methods of descriptive statistics that are needed for many purposes. One of these purposes is, for example, A/B testing.

So, what's wrong with Scala?
For an A/B test, I tested the significance with my own calculations first (the old-fashioned way, as I had learned in my studies) before using the Statistics package of Scala. Call me a control freak, but I want to understand when I'm possibly wrong and why. Anyway, the result was that my own calculation differed from the one done with the Scala Statistics package: my result was a Chi-squared value of 5.882, while the Statistics package said it was 5.510.

The test data was as follows: there were two groups, A and B. In each group there was the possibility of the occurrence of an event x, so x could have happened (x) or not (!x). Group B had a slightly different environment than group A (the control group). In group A, there were 9,500 visitors without event x and 950 with event x. In group B (the test group), there were 500 visitors without event x and 32 with event x.
Event | A | B |
---|---|---|
!x | 9,500 | 500 |
x | 950 | 32 |
This means a rate of 10% for x in group A (950/9,500) and 6.4% in group B (32/500). The question to answer was whether group B really had a smaller probability for x, or whether it was just the effect of too small a sample. The Chi-squared independence test should answer this question.
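For reference, the by-hand calculation can be sketched in a few lines of Python. This is a minimal sketch of the standard, uncorrected Pearson Chi-squared statistic, where the expected frequencies come from the row and column totals:

```python
# Observed 2x2 contingency table: rows = (!x, x), columns = groups (A, B)
observed = [[9500, 500], [950, 32]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Pearson's Chi-squared statistic without any correction:
# sum over all cells of (observed - expected)^2 / expected
stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        stat += (obs - expected) ** 2 / expected

print('stat=%.3f' % stat)  # prints stat=5.882
```

This reproduces the 5.882 from my own calculation.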
Both values exceed the critical Chi² value of 3.84, so the test was significant either way; but where did the difference come from? The simplest explanation was that I was wrong, but I needed proof. So I tested the values with the short Python code snippet below (the original code from Jason Brownlee can be found here), which requires the module SciPy (installable with pip: `pip install scipy`).
```python
from scipy.stats import chi2_contingency
from scipy.stats import chi2

# Row 0: visitors without event x, row 1: visitors with event x (conversions)
table = [[9500, 500], [950, 32]]
stat, p, dof, expected = chi2_contingency(table)
print('dof=%d' % dof)
print(expected)

# Test statistic
alpha = 0.05
prob = 1 - alpha
critical = chi2.ppf(prob, dof)
print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))
```
The code above produced the output below. As you can see, Python also calculates the lower Chi² value of 5.510. But the reason was not that I was wrong.
```
dof=1
[[9515.57093426  484.42906574]
 [ 934.42906574   47.57093426]]
probability=0.950, critical=3.841, stat=5.510
```
After some research, I found the source of the deviation. It is located in the Python function chi2_contingency, which computes the Chi-squared test for the given contingency table. It becomes a little clearer if we take a look at its signature:
```python
scipy.stats.chi2_contingency(observed, correction=True, lambda_=None)
```
As we can see, more parameters are available than are used in the code. The source of the deviation is the parameter `correction`. By default it is set to True, which causes the so-called Yates' correction to be applied when calculating the Chi² value, but only in the special case of 2x2 contingency tables. This correction is responsible for the lower Chi² value. (A detailed explanation of Yates' correction can be found here.)
The Chi-squared test has an upward bias that produces values that are too high, especially with low observation frequencies. Yates' correction reduces this effect: it "pushes down" the Chi-squared statistic and thus provides a cautious, "conservative" interpretation of the results. At higher frequencies, it should have less effect. But there is a debate among experts, because Yates' correction is considered too strict (see Bradley et al., 1979).
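The correction itself is simple: it subtracts 0.5 from each cell's absolute deviation before squaring. A minimal sketch of the corrected calculation, reusing the table from above, reproduces the 5.510 that both Scala and SciPy report:

```python
observed = [[9500, 500], [950, 32]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Yates' correction: reduce each |observed - expected| by 0.5 before squaring
stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        stat += (abs(obs - expected) - 0.5) ** 2 / expected

print('stat=%.3f' % stat)  # prints stat=5.510
```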
In Python, it is possible to set the correction parameter to False. After that, the result matches the expectation:
```python
from scipy.stats import chi2_contingency
from scipy.stats import chi2

# Row 0: visitors without event x, row 1: visitors with event x (conversions)
table = [[9500, 500], [950, 32]]

# Correction parameter now set to False
stat, p, dof, expected = chi2_contingency(table, correction=False)

# Test statistic
alpha = 0.05
prob = 1 - alpha
critical = chi2.ppf(prob, dof)
print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))
```
Finally, the result comes out as expected:
```
probability=0.950, critical=3.841, stat=5.882
```
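As a side note, the claim that the correction matters less at higher frequencies is easy to check: scaling all cell counts up (keeping the proportions identical) shrinks the relative gap between the corrected and uncorrected statistics noticeably. A quick sketch:

```python
from scipy.stats import chi2_contingency

table = [[9500, 500], [950, 32]]
scaled = [[c * 10 for c in row] for row in table]  # same proportions, ten times the counts

rel_diffs = {}
for name, t in [('original', table), ('10x counts', scaled)]:
    corrected = chi2_contingency(t, correction=True)[0]
    uncorrected = chi2_contingency(t, correction=False)[0]
    rel_diffs[name] = (uncorrected - corrected) / uncorrected
    print('%s: corrected=%.3f, uncorrected=%.3f, relative gap=%.1f%%'
          % (name, corrected, uncorrected, rel_diffs[name] * 100))
```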
Unfortunately, the Statistics package of Scala offers no option for disabling Yates' correction, so we currently have to live with it.