Chi square test
Chi square, or chi 2, or χ2, permits to compare two distributions.In practice, we have an observed distribution, coming from real life, and an expected distribution, supposed to represent what we should observe in the hypothesis of "hazard".
The chi square test here is used to determine whether there is a statistically significant difference between the expected and observed distributions.
This use of the chi square test is called "Chi Square Goodness of Fit".
The chi square gives a measure of the difference between the observed and expected distributions.
The formula is simple to implement:
- Σ (sigma) means sum, from i = 0 to i = n
- Oi are the observed values
- Ei are the expected values
- Take the difference between observed and expected
- Square this difference
- Divide the square by expected value
| Expected | 15 | 18 | 12 |
|---|---|---|---|
| Observed | 17 | 18 | 10 |
= 22 / 15 + 02 / 18 + (-2)2 / 12
= 4 / 15 + 0 + 4 / 12 ≃ 0.959
p-value
Once the chi square is computed, it is possible to compute the associated probability, often called "p-value".The p-value is a probability (then between 0 and 1): the probability that observed and expected distributions are independant.
For example, a p-value of 0.03 permits to say "There are 97 % of chances that observed and expected distributions are not independant".
An usual convention (industry, medecine) is to consider that if
p < 0.05, then the observed and expected distributions are linked in a statistically significant way.
Applied to planetary positions, the computation of p-value is the main result to see if an anomaly (eventually related to astrology) is observed.
Q: What does independant mean ?
Degree of freedom
The computation of p-value needs the "degree of freedom" (often noteddf).
The expected and observed distributions are bound by the fact that their sum must be equal.
In the chi square example,
15 + 18 + 12 = 17 + 18 + 10 = 45
If this was not the case, it would be meaningless to compare them.
The degree of freedom represent the number of quantities that can vary without breaking this constraint.
In the case of the distributions handled by this program, the degree of freedom is the number of bins minus 1.
In the chi square example,
df = 3 - 1 = 2.
In a distribution observed on 360 ° (with one bin per degree),
df = 360 - 1 = 359.
The details of the p-value computation is too complicated to be explained here.
Size effect
Variance
Next: Studies