Abbreviations used on this page:
M = Mother, F = Father, C = Child, W = Wedding
The raw data
| Lines with M F C W | 321 838 | 54.4 % |
| Lines with M F C | 270 098 | 45.6 % |
| Total number of lines | 591 936 | 100 % |
| Number of birth dates | 1 775 808 |
| Number of wedding dates | 321 838 |
| Total number of dates | 2 097 646 |
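The totals above can be cross-checked with a few lines of arithmetic. The sketch below assumes that every line contributes three birth dates (M, F and C) and that only the lines with W contribute a wedding date; the variable names are illustrative and do not come from the original program.

```python
# Counts taken from the tables above.
lines_with_wedding = 321_838      # lines with M F C W
lines_without_wedding = 270_098   # lines with M F C

total_lines = lines_with_wedding + lines_without_wedding

# Assumption: each line holds three birth dates (mother, father, child).
birth_dates = 3 * total_lines
# Assumption: one wedding date per line that has a W field.
wedding_dates = lines_with_wedding
total_dates = birth_dates + wedding_dates

share_with_wedding = round(100 * lines_with_wedding / total_lines, 1)
share_without_wedding = round(100 * lines_without_wedding / total_lines, 1)

print(total_lines, birth_dates, wedding_dates, total_dates)
print(share_with_wedding, share_without_wedding)
```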
Execution
For this dataset, data.csv.bz2 can be loaded into memory. The computation of control groups doesn't use an auxiliary database.
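As an illustration of what loading the file into memory might look like, here is a minimal sketch using only the Python standard library. The column layout of data.csv.bz2 is not described on this page, so the comma delimiter, the UTF-8 encoding and the helper name `load_rows` are assumptions, and rows are returned as plain lists of strings without interpretation.

```python
import bz2
import csv


def load_rows(path):
    """Read a bz2-compressed CSV file fully into memory.

    Assumptions: comma-separated fields, UTF-8 text.
    Returns a list of rows, each row being a list of strings.
    """
    # "rt" opens the compressed file in text mode; newline="" lets the
    # csv module handle line endings itself.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))
```

It could be called as `rows = load_rows("data.csv.bz2")`, after which `len(rows)` gives the number of lines read.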
To build a control group, the program loops over the mothers' birth dates. For each one it randomly selects a father, then a child (checking that the child is born after both the father and the mother), then a wedding date (checking that the wedding occurs after the births of both the mother and the father).
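The procedure described above can be sketched as follows. This is not the original program. Several details are assumptions: a failed test triggers a new random draw (up to a hypothetical `max_tries` limit), draws are made with replacement, a line is skipped when no valid date is found, and every control line receives a wedding date. The function names `draw_after` and `build_control_group` are illustrative.

```python
import random
from datetime import date


def draw_after(pool, lower_bound, rng, max_tries=1000):
    """Draw a random date from pool that is strictly after lower_bound.

    Assumption: a failed test simply triggers a redraw.
    Returns None if no valid date is found within max_tries.
    """
    for _ in range(max_tries):
        candidate = rng.choice(pool)
        if candidate > lower_bound:
            return candidate
    return None


def build_control_group(mothers, fathers, children, weddings, rng=None):
    """Build shuffled (M, F, C, W) lines, one per mother birth date."""
    rng = rng or random.Random()
    control = []
    for mother in mothers:
        father = rng.choice(fathers)
        # Child and wedding must both come after the later parent birth.
        latest_parent_birth = max(mother, father)
        child = draw_after(children, latest_parent_birth, rng)
        wedding = draw_after(weddings, latest_parent_birth, rng)
        if child is None or wedding is None:
            continue  # assumption: skip lines that cannot be completed
        control.append((mother, father, child, wedding))
    return control


if __name__ == "__main__":
    # Tiny made-up sample, only to show the call.
    sample = build_control_group(
        mothers=[date(1900, 1, 1), date(1905, 5, 5)],
        fathers=[date(1898, 3, 3), date(1902, 7, 7)],
        children=[date(1890, 1, 1), date(1930, 2, 2)],
        weddings=[date(1880, 1, 1), date(1925, 6, 6)],
        rng=random.Random(0),
    )
    for line in sample:
        print(line)
```

Passing a seeded `random.Random` makes a control group reproducible, which helps when comparing several runs.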