G5 integration
To import the raw file into the database, the following commands must be issued, in this order:
    php run-g5.php gauq D6 raw2tmp
    php run-g5.php gauq D6 addGeo
    php run-g5.php gauq D6 addTzo
    php run-g5.php gauq D6 tmp2db
Preparatory code
The Cura file contains 32 names that cannot easily be split into family and given names by a program. To handle these cases, the command
    php run-g5.php gauq D6 look emptyGiven
was used to build the array D6::NAMES_CORRECTIONS.
One case could not be fixed:
    115 Crossalexander 1919-06-16 15:30
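The idea of a corrections table can be sketched as follows. This is an illustration in Python, not the project's PHP code: the function names, the shape of the table (keyed by record number) and the placeholder entry are all assumptions, and the sketch supposes that a raw name is written family name first, as in "Ruiz Bernardo".

```python
# Hypothetical corrections table keyed by record number (NUM).
# The entry below is an invented placeholder, not data from the Cura file.
NAMES_CORRECTIONS = {
    "901": ("Van Example", "Jan"),
}


def split_name(num, raw_name, corrections=NAMES_CORRECTIONS):
    """Return (family, given); given is '' when the name cannot be split."""
    if num in corrections:
        # A manual correction always wins over the automatic split.
        return corrections[num]
    parts = raw_name.strip().split(" ", 1)
    if len(parts) == 2:
        return parts[0], parts[1].strip()
    # No separator found: leave the given name empty so it can be reported.
    return raw_name.strip(), ""


def empty_given(rows, corrections=NAMES_CORRECTIONS):
    """List the (num, name) pairs whose given name is still empty."""
    return [
        (num, name)
        for num, name in rows
        if split_name(num, name, corrections)[1] == ""
    ]
```

Under these assumptions, a record such as 115 Crossalexander would be reported by empty_given() until an entry for it is added to the table.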
raw2tmp
This step extracts the information from 902gdD6.html and reformats it into a CSV file in the directory data/tmp/gauq/lerrcp.
- The notice of page 902gdD6.html says that the file contains 450 persons, but it in fact contains 449 (NUM 234 is missing). Fortunately, this record 234, Léo Lacroix, is present in Ertel's file (num 2318) and is added in a later step.
- For record 356 Ruiz Bernardo, a check on Wikipedia shows that he was born in Orihuela, ES, which reveals an error in the Cura file (the latitude should be 38N05 instead of 36N05). The correction was added to step raw2tmp().
- Fixes on names are also included in the raw2tmp code.
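To make the size of the latitude error concrete, here is a minimal Python sketch that converts a coordinate written as degrees, hemisphere letter and minutes (the notation of 38N05 above) to decimal degrees. The function name is hypothetical and the sketch is not taken from the raw2tmp code; it only assumes that every coordinate follows this notation.

```python
import re


def parse_coord(text):
    """Convert a string such as '38N05' to signed decimal degrees."""
    match = re.fullmatch(r"(\d{1,3})([NSEW])(\d{2})", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees = int(match.group(1))
    hemisphere = match.group(2)
    minutes = int(match.group(3))
    if minutes >= 60:
        raise ValueError(f"minutes out of range: {text!r}")
    value = degrees + minutes / 60
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


# Difference in degrees between the corrected and the original latitude.
error_in_degrees = parse_coord("38N05") - parse_coord("36N05")
```

The two values differ by two degrees of latitude, roughly 220 km, which is why the original value could not correspond to Orihuela.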
prepareGeo
This step calls the geonames.org web service to build the file data/db/init/geonames/D6.csv.
This file is versioned with the program, so this step does not need to be executed each time the database is generated.
This file is a copy of data/tmp/gauq/lerrcp/D6.csv, completed by prepareGeo with geographical information (the columns PLACE, CY, C2 and GEOID are computed).
In prepareGeo, for each row of D6.csv, the geonames.org web service is called to perform reverse geocoding (compute the place from the longitude and latitude) and the result is stored in the file. The updated version of the file is rewritten on disk at each iteration, which makes it possible to stop the execution and run this step again without calling geonames for rows already processed.
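The resumable loop described above can be sketched as follows in Python. This is an illustration under stated assumptions, not the project's implementation: the function names, the ";" delimiter, the input column names LG and LAT, and the use of an empty GEOID to mark an unprocessed row are all hypothetical. The reverse geocoder is passed in as a function, so no real geonames.org call is shown.

```python
import csv
import os

GEO_COLUMNS = ["PLACE", "CY", "C2", "GEOID"]


def write_rows(path, fieldnames, rows):
    """Write the whole file to a temporary name, then swap it into place."""
    tmp_path = path + ".part"
    with open(tmp_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=";")
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_path, path)


def prepare_geo(src_path, dest_path, reverse_geocode):
    """Fill GEO_COLUMNS for rows not yet processed; return the number of calls."""
    # Resume from the destination file when a previous run already created it.
    path = dest_path if os.path.exists(dest_path) else src_path
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=";")
        fieldnames = list(reader.fieldnames)
        rows = list(reader)
    for col in GEO_COLUMNS:
        if col not in fieldnames:
            fieldnames.append(col)
    calls = 0
    for row in rows:
        if row.get("GEOID"):
            continue  # already geocoded in an earlier run
        result = reverse_geocode(row["LG"], row["LAT"])
        for col in GEO_COLUMNS:
            row[col] = result[col]
        calls += 1
        # Rewrite the file after every row so that progress survives a stop.
        write_rows(dest_path, fieldnames, rows)
    return calls
```

Writing to a temporary file and renaming it is a design choice of this sketch: it avoids leaving a half-written CSV if the process is stopped during the write.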
addGeo
The initial purpose of prepareGeo was to find the birth place, but tests showed that the longitude and latitude given by the Cura web site (precision of one arc minute) are not precise enough to identify the exact place.
In the end, this step copies only the fields CY (country) and C2 (admin code level 2) from data/db/init/geonames/D6.csv to data/tmp/gauq/lerrcp/D6.csv.
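Copying two columns from one file to the other amounts to a join on the record identifier. The Python sketch below shows the principle on rows already loaded as dictionaries. The function name and the use of NUM as the join key are assumptions, and the sample values in the usage lines are invented.

```python
def add_geo(geo_rows, tmp_rows, key="NUM", columns=("CY", "C2")):
    """Return copies of tmp_rows completed with `columns` taken from geo_rows."""
    geo_by_key = {row[key]: row for row in geo_rows}
    merged = []
    for row in tmp_rows:
        new_row = dict(row)  # leave the input rows untouched
        geo = geo_by_key.get(row[key])
        for col in columns:
            # A record missing from the geonames file gets empty values.
            new_row[col] = geo[col] if geo is not None else ""
        merged.append(new_row)
    return merged


# Invented sample rows, only to show the shape of the data.
geo = [{"NUM": "1", "PLACE": "Somewhere", "CY": "FR", "C2": "75", "GEOID": "42"}]
tmp = [{"NUM": "1", "FNAME": "Example"}, {"NUM": "2", "FNAME": "Other"}]
result = add_geo(geo, tmp)
```

Only CY and C2 are carried over; PLACE and GEOID stay in the geonames file, in line with the decision described above.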
addTzo
The addition of the fields CY and C2 makes it possible to compute the timezone offset and the universal time.
The fields DATE-UT, TZO and NOTES-DATE are computed in this step.
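Once the offset is known, universal time is the local time minus the offset. The Python sketch below shows only this last subtraction. Determining the offset from the country, the admin code and the date is not shown, and the function names and the "+HH:MM" and "YYYY-MM-DD HH:MM" formats are assumptions, not the documented formats of TZO and DATE-UT.

```python
from datetime import datetime, timedelta


def parse_offset(tzo):
    """Convert an offset such as '+01:00' or '-05:30' to a timedelta."""
    sign = -1 if tzo.startswith("-") else 1
    hours, minutes = tzo.lstrip("+-").split(":")
    return sign * timedelta(hours=int(hours), minutes=int(minutes))


def to_universal_time(local, tzo):
    """Return the universal time for a local 'YYYY-MM-DD HH:MM' and an offset."""
    local_dt = datetime.strptime(local, "%Y-%m-%d %H:%M")
    # Universal time = local time - offset (east of Greenwich is positive).
    return (local_dt - parse_offset(tzo)).strftime("%Y-%m-%d %H:%M")
```

With an illustrative offset of "+01:00", a local time of "1950-01-01 00:30" falls on the previous day in universal time, so the date has to be recomputed together with the time.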