Gauquelin LERRCP - file D6

The D6 file was published by Gauquelin in 1979. It contains data used for a replication of tests on athletes.
The title of the original booklet indicates 432 European sport champions. Inclusion of these data in the g5 database uses page 902gdD6.html from the Cura web site, which announces 450 records (this difference has not been explained yet).
The main problem with this file is that it contains neither place names nor countries, so the timezone offset cannot be computed without restoring them. Countries are restored using reverse geocoding. Reverse geocoding also yields place names, but not always the exact birth place. This restoration needs to be improved: birth places are currently not included in the resulting file, but the timezone offset and the UTC date can be computed.

G5 integration

To import the raw file into the database, the following commands must be issued, in this order:
php run-g5.php gauq D6 raw2tmp
php run-g5.php gauq D6 addGeo
php run-g5.php gauq D6 addTzo
php run-g5.php gauq D6 tmp2db

Preparatory code

The Cura file contains 32 names that cannot easily be split into family and given names by a program. To handle these cases, the command
php run-g5.php gauq D6 look emptyGiven
was used to build the array D6::NAMES_CORRECTIONS.
One case could not be fixed: 115 Crossalexander 1919-06-16 15:30
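
The idea behind this check can be sketched as follows. This is an illustration in Python, not the g5 code: the function names are hypothetical, `corrections` is a stand-in for D6::NAMES_CORRECTIONS whose real structure is not described here, and the sketch assumes that the family name comes first, followed by the given name.

```python
def split_name(num, raw, corrections):
    """Return (family, given) for one record.

    corrections: dict mapping a record number to a (family, given) pair;
    a hypothetical stand-in for D6::NAMES_CORRECTIONS.
    """
    if num in corrections:
        return corrections[num]
    # Assumption: the family name comes first, then the given name.
    family, _, given = raw.strip().partition(" ")
    return family, given.strip()

def empty_given(records, corrections):
    """List the record numbers whose given name is still empty."""
    return [num for num, raw in records
            if split_name(num, raw, corrections)[1] == ""]

records = [("356", "Ruiz Bernardo"), ("115", "Crossalexander")]
print(split_name(*records[0], {}))   # ('Ruiz', 'Bernardo')
print(empty_given(records, {}))      # ['115']
```

Records listed by `empty_given` are the ones that need a manual entry in the corrections table.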

raw2tmp

This step extracts and reformats the information of 902gdD6.html to a csv file in directory data/tmp/gauq/lerrcp.
  • The notice of page 902gdD6.html says that the file contains 450 persons, but it actually contains 449 (NUM 234 is missing).
    Fortunately, this record 234, Léo Lacroix, is present in Ertel's file (num 2318) and is added in a later step.
  • For record 356, Ruiz Bernardo, a check on Wikipedia shows that he was born in Orihuela, ES, which reveals an error in the Cura file (the latitude should be 38N05 instead of 36N05). The correction was added to step raw2tmp().
  • Fixes on names are also included in the raw2tmp code.
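
The size of the latitude correction for record 356 can be checked with a short sketch. It is written in Python for illustration only and is not the g5 code; the function name `parse_coord` is hypothetical, and the sketch assumes that the Cura notation is degrees, a hemisphere letter, then arc minutes, as in 38N05.

```python
import re

def parse_coord(text):
    """Convert a coordinate such as '38N05' to signed decimal degrees.

    Assumed notation: degrees, hemisphere letter (N, S, E, W), arc minutes.
    """
    m = re.fullmatch(r"(\d{1,3})([NSEW])(\d{2})", text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, hemisphere, minutes = int(m.group(1)), m.group(2), int(m.group(3))
    value = degrees + minutes / 60
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value

wrong = parse_coord("36N05")
fixed = parse_coord("38N05")
print(round(fixed - wrong, 4))  # 2.0
```

Since one degree of latitude is about 111 km, the error in the Cura file displaces the point by roughly 220 km.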

prepareGeo

This step calls the geonames.org web service to build a file, data/db/init/geonames/D6.csv.
This file is versioned with the program, so this step does not need to be executed at each generation of the database.
This file is a copy of data/tmp/gauq/lerrcp/D6.csv, completed by prepareGeo with geographical information (columns PLACE, CY, C2 and GEOID are computed).
In prepareGeo, for each row of D6.csv, the geonames.org web service is called to perform reverse geocoding (compute the place from longitude and latitude) and the result is stored in the file. The updated version of the file is rewritten on disk at each iteration, so the execution can be stopped and this step re-executed without calling geonames again for rows already processed.
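
The resumable loop described above can be sketched as follows, in Python for illustration only (this is not the g5 implementation). The column names LG and LAT, the `;` delimiter, and the use of an empty GEOID to detect unprocessed rows are assumptions; `lookup` is a hypothetical callable standing in for the geonames.org request, so the sketch makes no network call.

```python
import csv

def prepare_geo(rows, lookup, save):
    """Fill PLACE, CY, C2 and GEOID for rows that have no GEOID yet.

    rows   : list of dicts holding at least 'LG' and 'LAT' (assumed names)
    lookup : callable (lg, lat) -> dict with the four computed columns;
             stands in for the geonames.org call
    save   : callable (rows) -> None, called after every processed row
    """
    for row in rows:
        if row.get("GEOID"):   # already processed in a previous run
            continue
        row.update(lookup(row["LG"], row["LAT"]))
        save(rows)             # rewrite the whole file at each iteration
    return rows

def save_csv(path, rows, delimiter=";"):
    """Write all rows to path, using the union of their keys as header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames,
                                delimiter=delimiter, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Saving after every row costs one file write per request, but it means that an interrupted run loses at most the row being processed.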

addGeo

The initial purpose of prepareGeo was to find the birth place, but tests showed that the longitude and latitude given by the Cura web site (rounded to one arc minute) are not precise enough to identify the exact place.
In the end, this step only copies the fields CY (country) and C2 (admin code level 2) from data/db/init/geonames/D6.csv to data/tmp/gauq/lerrcp/D6.csv.
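
A minimal sketch of this selective copy, in Python for illustration only (not the g5 code). It assumes that both files share a NUM column that identifies the record; the function name `add_geo` and the sample values are hypothetical.

```python
def add_geo(tmp_rows, geo_rows, key="NUM", fields=("CY", "C2")):
    """Copy only the given fields from geo_rows into tmp_rows, matching on key."""
    geo_by_key = {row[key]: row for row in geo_rows}
    for row in tmp_rows:
        geo = geo_by_key.get(row[key])
        if geo is None:
            continue               # no geonames row for this record
        for field in fields:
            row[field] = geo.get(field, "")
    return tmp_rows

# Hypothetical values, for illustration only.
tmp = [{"NUM": "1", "CY": "", "C2": ""}]
geo = [{"NUM": "1", "PLACE": "Somewhere", "CY": "FR", "C2": "75", "GEOID": "123"}]
print(add_geo(tmp, geo))  # [{'NUM': '1', 'CY': 'FR', 'C2': '75'}]
```

PLACE and GEOID are deliberately left out, because the computed place is not reliable enough to be stored as the birth place.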

addTzo

The addition of fields CY and C2 makes it possible to compute the timezone offset and universal time.
Fields DATE-UT, TZO and NOTES-DATE are computed in this step.
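
Once the offset is known, universal time is the local time minus the offset. The sketch below shows this arithmetic in Python for illustration only; it is not the g5 code, and the string formats used for the local date and for the offset (`+HH:MM`), the function name `to_utc`, and the sample values are assumptions.

```python
from datetime import datetime, timedelta

def to_utc(local, offset):
    """Convert a local 'YYYY-MM-DD HH:MM' string to UTC.

    offset: timezone offset as '+HH:MM' or '-HH:MM' (assumed format).
    """
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    delta = sign * timedelta(hours=int(hours), minutes=int(minutes))
    # UTC = local time - offset
    utc = datetime.strptime(local, "%Y-%m-%d %H:%M") - delta
    return utc.strftime("%Y-%m-%d %H:%M")

# Hypothetical birth time and offset, for illustration only.
print(to_utc("1950-01-01 00:30", "+01:00"))  # 1949-12-31 23:30
```

The example shows why the full date has to be converted and not only the time: a birth shortly after midnight can fall on the previous day in universal time.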