Definition
A control group is a fictional group built by shuffling the data of the observed group. For example, for a group of 500 persons with birth and death dates (B and D):
- take the birth date of a record,
- randomly select another record,
- build a record with the birth date of the first record and the death date of the second record.
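The three steps above can be sketched as follows. This is only an illustration, not the project's actual code: the function name build_control_group, the (birth, death) tuple layout and the optional rng parameter are assumptions. The sketch never pairs a record with itself, but it may reuse the same death date several times and does not check that the death date falls after the birth date.

```python
import random


def build_control_group(records, rng=None):
    """Build one control group from a list of (birth, death) tuples.

    Hypothetical helper: each output record keeps the birth date of one
    observed record and takes the death date of a different record
    chosen at random.
    """
    rng = rng or random.Random()
    n = len(records)
    if n < 2:
        raise ValueError("at least two records are needed to shuffle")
    control = []
    for i, (birth, _) in enumerate(records):
        # Draw an index uniformly among the n - 1 other records.
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        control.append((birth, records[j][1]))
    return control


observed = [
    ("1900-01-01", "1950-01-01"),
    ("1910-02-02", "1980-02-02"),
    ("1920-03-03", "1990-03-03"),
]
control = build_control_group(observed, random.Random(0))
```

Passing a seeded random.Random makes a control group reproducible, which helps when comparing runs.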
This raises several questions:
- Do we want to keep properties of the observed group?
- For example, should the control groups respect the seasonal distribution of births?
- Should they respect the distribution of age at death?
Can these choices affect the results of the statistical tests?
More rigorous statistical knowledge is needed to build correct control groups and avoid unintentional bias due to arbitrary choices.
Expected distributions
To build the expected distributions:
- several control groups are computed;
- for each distribution, the average over all control groups is computed;
- these average distributions are called the expected distributions.
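As a minimal sketch of the averaging step, assume each distribution is held as a dictionary mapping a bin label to a count. The function name expected_distribution and this dictionary layout are assumptions made for illustration. A bin missing from a control group is counted as zero.

```python
from collections import defaultdict


def expected_distribution(control_distributions):
    """Average the same distribution over several control groups.

    Hypothetical helper: control_distributions is a list of dicts
    mapping a bin label to a count, one dict per control group.
    """
    if not control_distributions:
        raise ValueError("at least one control distribution is required")
    totals = defaultdict(float)
    for distribution in control_distributions:
        for bin_label, count in distribution.items():
            totals[bin_label] += count
    n = len(control_distributions)
    # Bins absent from a control group contribute zero to the sum.
    return {bin_label: total / n for bin_label, total in totals.items()}


expected = expected_distribution([{"a": 2, "b": 4}, {"a": 4}])
```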
File hierarchy
For each control group, the full hierarchy of the observed distributions is reproduced. In the example of a group containing birth and death dates:
observed
├── birth
│ ├── aspects
│ │ ├── JU-NE.csv
│ │ ├── ...
│ │ └── VE-UR.csv
│ ├── planets
│ │ ├── JU.csv
│ │ ├── ...
│ │ └── UR.csv
│ ├── day.csv
│ └── year.csv
├── birth-death
│ ├── interaspects
│ │ ├── JU-JU.csv
│ │ ├── ...
│ │ └── VE-VE.csv
│ └── age.csv
└── death
  ├── aspects
  │ ├── JU-NE.csv
  │ ├── ...
  │ └── VE-UR.csv
  ├── planets
  │ ├── JU.csv
  │ ├── ...
  │ └── UR.csv
  ├── day.csv
  └── year.csv
Then the hierarchy of controls will be:
controls
├── control-001
│ ├── birth
│ ├── birth-death
│ └── death
├── ...
│
└── control-100
  ├── birth
  ├── birth-death
  └── death
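The mapping from a distribution file to its location in a control group can be sketched as below. The helper names control_path and expected_path are hypothetical, and the three-digit zero padding is inferred only from the names control-001 and control-100 shown above.

```python
from pathlib import PurePosixPath


def control_path(index, relative, root="controls"):
    """Path of a distribution file inside control group number `index`.

    Hypothetical helper: `relative` is the path of the file relative to
    the observed directory, for example "birth/day.csv".
    """
    return PurePosixPath(root) / f"control-{index:03d}" / relative


def expected_path(relative, root="expected"):
    """Path of the averaged distribution matching `relative`."""
    return PurePosixPath(root) / relative


first = str(control_path(1, "birth/day.csv"))
last = str(control_path(100, "birth-death/age.csv"))
averaged = str(expected_path("death/year.csv"))
```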
The expected distributions, containing the average of all control distributions, follow the same layout:
expected
├── birth
├── birth-death
└── death