Statistical tests

Chi square test

The chi square test (also written chi 2 or χ²) compares two distributions.
In practice, we have an observed distribution, coming from real-life data, and an expected distribution, which represents what we should observe under the hypothesis of pure chance.
The chi square test here is used to determine whether there is a statistically significant difference between the expected and observed distributions.
This use of the chi square test is called "Chi Square Goodness of Fit".

The chi square gives a measure of the difference between the observed and expected distributions.
The formula is simple to implement:

χ² = Σ (O_i - E_i)² / E_i

Σ (sigma) means sum, from i = 1 to i = n
O_i are the observed values
E_i are the expected values

This formula means that for each observed value and its corresponding expected value:

Take the difference between observed and expected
Square this difference
Divide the square by the expected value

The chi square is the sum of the values obtained this way.

For example:

Expected	15	18	12
Observed	17	18	10

χ² = (17 - 15)² / 15 + (18 - 18)² / 18 + (10 - 12)² / 12
= 2² / 15 + 0² / 18 + (-2)² / 12
= 4 / 15 + 0 + 4 / 12 = 0.6
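
The computation above can be sketched in a few lines of Python. This is only an illustration, not the code used by this program; the function name chi_square and its input checks are assumptions made for the example.

```python
def chi_square(observed, expected):
    """Return the chi square statistic for two sequences of bin counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            # Dividing by the expected value requires it to be strictly positive
            raise ValueError("expected values must be strictly positive")
        # Difference, squared, divided by the expected value
        total += (o - e) ** 2 / e
    return total


expected = [15, 18, 12]
observed = [17, 18, 10]
print(round(chi_square(observed, expected), 3))  # prints 0.6
```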

p-value

Once the chi square is computed, it is possible to compute the associated probability, often called "p-value".
The p-value is a probability (hence between 0 and 1): the probability of obtaining a chi square at least as large as the one computed if the observed data were produced by chance alone, following the expected distribution.
For example, a p-value of 0.03 means that, if chance alone were at work, a difference at least this large would occur in only 3 % of cases. It does not mean that there is a 97 % chance that the difference is real.
A usual convention (industry, medicine) is to consider that if p < 0.05, then the observed and expected distributions differ in a statistically significant way.

Applied to planetary positions, the p-value is the main result used to judge whether an anomaly (possibly related to astrology) is observed.

Degrees of freedom

The computation of the p-value requires the number of "degrees of freedom" (often written df).
The expected and observed distributions are bound by the fact that their totals must be equal.
In the chi square example, 15 + 18 + 12 = 17 + 18 + 10 = 45
If this were not the case, it would be meaningless to compare them.
The number of degrees of freedom is the number of quantities that can vary without breaking this constraint.

In the case of the distributions handled by this program, the number of degrees of freedom is the number of bins minus 1.
In the chi square example, df = 3 - 1 = 2.
In a dim1 distribution of 360 bins (one bin per degree), df = 360 - 1 = 359.
In a dim2 distribution of 360 x 360 cells, df = 359 x 359 = 128 881.
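
The three cases above can be written as two small helpers. The names df_dim1 and df_dim2 are hypothetical, and the (rows - 1) x (columns - 1) rule is the usual one for two-way tables, assumed here to be what the 359 x 359 figure reflects.

```python
def df_dim1(bins):
    """Degrees of freedom of a one-dimensional distribution: bins minus 1."""
    return bins - 1


def df_dim2(rows, cols):
    """Degrees of freedom of a two-dimensional table (assumed rule)."""
    return (rows - 1) * (cols - 1)


print(df_dim1(3))         # prints 2
print(df_dim1(360))       # prints 359
print(df_dim2(360, 360))  # prints 128881
```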

The details of the p-value computation are too complicated to be explained here.
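
Without going into the theory, the p-value is the upper tail of the chi square distribution, which can be approximated with the Python standard library alone. The sketch below is not the code used by this program: the function name chi2_p_value, its parameters, and the choice of a plain series expansion of the incomplete gamma function are assumptions made for the example. It is adequate for moderate values but loses precision for very small p-values, and may overflow when the chi square is far larger than df.

```python
import math


def chi2_p_value(chi2, df, max_terms=10_000, tol=1e-15):
    """Probability that a chi square variable with df degrees of freedom
    is at least as large as chi2 (upper tail)."""
    if chi2 <= 0:
        return 1.0
    a = df / 2.0
    x = chi2 / 2.0
    # Series for the regularized lower incomplete gamma function:
    # P(a, x) = x^a * e^(-x) / Gamma(a + 1) * sum_{n>=0} x^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # The prefactor is computed in log space to avoid overflowing Gamma(a + 1)
    log_prefix = a * math.log(x) - x - math.lgamma(a + 1.0)
    lower = math.exp(log_prefix) * total
    # The p-value is the complement of the lower tail
    return max(0.0, 1.0 - lower)


chi2 = 4 / 15 + 4 / 12  # chi square of the three-bin example
print(round(chi2_p_value(chi2, 2), 3))  # prints 0.741
```

With a p-value of about 0.74, far above 0.05, the three-bin example shows no statistically significant difference between the observed and expected distributions.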

Effect size

Not implemented yet
