G5 integration
To import the raw file into the database, the following commands must be issued, in this order:
php run-g5.php gauq D6 raw2tmp
php run-g5.php gauq D6 addGeo
php run-g5.php gauq D6 addTzo
php run-g5.php gauq D6 tmp2db
Preparatory code
The Cura file contains 32 names that cannot easily be split into family and given names by program. To handle these cases, the command php run-g5.php gauq D6 look emptyGiven was used to build the array D6::NAMES_CORRECTIONS.
One case could not be fixed:
115 Crossalexander 1919-06-16 15:30
raw2tmp
This step extracts and reformats the information in 902gdD6.html into a CSV file in the directory data/tmp/gauq/lerrcp.
- The notice on page 902gdD6.html says that the file contains 450 persons, but it in fact contains 449 (NUM 234 is missing). Fortunately, this record 234, Léo Lacroix, is present in Ertel's file (num 2318) and is added in a later step.
- For record 356 Ruiz Bernardo, a check on Wikipedia shows that he was born in Orihuela, ES, which means that there is an error in the Cura file (latitude should be 38N05 instead of 36N05). The correction was added to step raw2tmp().
- Fixes on names are also included in the raw2tmp code.
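The latitude fix for record 356 can be pictured with the following minimal sketch. It is not the actual raw2tmp code: the function names, the LAT_CORRECTIONS constant and the idea of keying overrides by NUM are assumptions made for illustration. Only the coordinate notation (38N05, read here as degrees, hemisphere, arc minutes) and the corrected value come from the text above.

```php
<?php
declare(strict_types=1);

/**
 * Converts a coordinate such as "38N05" (degrees, hemisphere, arc minutes)
 * to decimal degrees. South and west are returned as negative values.
 * Hypothetical helper, not part of the g5 code base.
 */
function curaCoordToDecimal(string $coord): float
{
    if (!preg_match('/^(\d{1,3})([NSEW])(\d{2})$/', $coord, $m)) {
        throw new InvalidArgumentException("Unrecognized coordinate: $coord");
    }
    $decimal = (int)$m[1] + (int)$m[3] / 60;
    return in_array($m[2], ['S', 'W'], true) ? -$decimal : $decimal;
}

// Hypothetical per-record overrides, keyed by NUM.
// Only the fix described in the text is listed.
const LAT_CORRECTIONS = ['356' => '38N05'];

/** Returns the latitude to use for a record, applying an override if one exists. */
function correctedLat(string $num, string $rawLat): string
{
    return LAT_CORRECTIONS[$num] ?? $rawLat;
}

// The raw value 36N05 of record 356 is replaced before conversion.
printf("%.4f\n", curaCoordToDecimal(correctedLat('356', '36N05'))); // prints 38.0833
```

Keeping such overrides in one table, separate from the parsing logic, makes each manual correction easy to audit against its source.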
prepareGeo
This step calls the geonames.org web service to build a file, data/db/init/geonames/D6.csv.
This file is versioned with the program, so this step does not need to be executed each time the database is generated.
This file is a copy of data/tmp/gauq/lerrcp/D6.csv, completed by prepareGeo with geographical information (columns PLACE, CY, C2 and GEOID are computed).
In prepareGeo, for each row of D6.csv, the geonames.org web service is called to perform reverse geocoding (computing the place from longitude and latitude) and the result is stored in the file. The updated version of the file is rewritten to disk at each iteration, which makes it possible to stop the execution and re-run this step without calling geonames again for rows already processed.
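The resumable behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the prepareGeo implementation: the function names, the LG and LAT column names, the ";" separator and the rule "an empty PLACE means not yet processed" are all hypothetical, and the web service is replaced by a callable so that the example runs without network access. Fields are assumed to contain no separator character.

```php
<?php
declare(strict_types=1);

/**
 * Fills the PLACE column of each row by calling $geocode, and rewrites the
 * file after every processed row so that an interrupted run can be resumed.
 *
 * @param callable $geocode function(float $lg, float $lat): string
 *                          (stand-in for the reverse geocoding call)
 * @return int number of rows for which $geocode was actually called
 */
function fillPlaces(string $csvFile, callable $geocode, string $sep = ';'): int
{
    $lines = file($csvFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $header = explode($sep, array_shift($lines));
    $rows = [];
    foreach ($lines as $line) {
        $rows[] = array_combine($header, explode($sep, $line));
    }
    $calls = 0;
    foreach ($rows as $i => $row) {
        if ($row['PLACE'] !== '') {
            continue; // already processed in a previous run
        }
        $rows[$i]['PLACE'] = $geocode((float)$row['LG'], (float)$row['LAT']);
        $calls++;
        writeRows($csvFile, $header, $rows, $sep); // persist progress immediately
    }
    return $calls;
}

/** Rewrites the whole file from the in-memory rows. */
function writeRows(string $csvFile, array $header, array $rows, string $sep): void
{
    $out = implode($sep, $header) . "\n";
    foreach ($rows as $row) {
        $out .= implode($sep, $row) . "\n";
    }
    file_put_contents($csvFile, $out);
}

// Usage with made-up sample rows and a stub geocoder.
$file = tempnam(sys_get_temp_dir(), 'd6');
file_put_contents($file, "NUM;LG;LAT;PLACE\n1;2.35;48.85;Paris\n2;-0.95;38.08;\n");
echo fillPlaces($file, fn(float $lg, float $lat): string => 'stub'), "\n"; // prints 1
```

Rewriting the full file at each iteration is wasteful for large inputs, but for a few hundred rows it keeps the recovery logic trivial: the file on disk is always a valid, partially completed result.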
addGeo
The initial purpose of prepareGeo was to find the birth place, but tests showed that the longitude and latitude given by the Cura web site (to one arc minute) are not precise enough to identify the exact place.
In the end, this step copies only the fields CY (country) and C2 (admin code level 2) from data/db/init/geonames/D6.csv to data/tmp/gauq/lerrcp/D6.csv.
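As a rough sketch of this selective copy, the function below transfers CY and C2 only, leaving PLACE and GEOID behind. Matching rows on NUM is an assumption (the real step may rely on row order instead), the function name is hypothetical, and the C2 and GEOID values in the sample data are placeholders, not real geonames values.

```php
<?php
declare(strict_types=1);

/**
 * Copies only the CY and C2 fields from the geonames rows into the matching
 * tmp rows. Rows are matched on NUM; tmp rows without a match are unchanged.
 * Hypothetical helper, not part of the g5 code base.
 */
function copyCountryFields(array $tmpRows, array $geoRows): array
{
    // Index the geonames rows by NUM for direct lookup.
    $geoByNum = array_column($geoRows, null, 'NUM');
    foreach ($tmpRows as $i => $row) {
        $geo = $geoByNum[$row['NUM']] ?? null;
        if ($geo !== null) {
            $tmpRows[$i]['CY'] = $geo['CY'];
            $tmpRows[$i]['C2'] = $geo['C2'];
        }
    }
    return $tmpRows;
}

// Sample data: 'XX' and '0' are placeholders.
$tmp = [['NUM' => '356', 'CY' => '', 'C2' => '']];
$geo = [['NUM' => '356', 'PLACE' => 'Orihuela', 'CY' => 'ES', 'C2' => 'XX', 'GEOID' => '0']];
print_r(copyCountryFields($tmp, $geo));
```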
addTzo
The addition of the fields CY and C2 makes it possible to compute the timezone offset and universal time.
The fields DATE-UT, TZO and NOTES-DATE are computed in this step.
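Once an offset is known, deriving universal time from the local date is simple arithmetic, as the sketch below illustrates. It assumes that the offset is written like "+01:00" and that dates use the "YYYY-MM-DD HH:MM" form seen earlier in this page; the function name and the sample dates are made up, and how the offset itself is obtained from CY and C2 is not covered here.

```php
<?php
declare(strict_types=1);

/**
 * Converts a local date and a timezone offset to universal time.
 * $tzo is assumed to look like "+01:00" or "-05:30".
 * Hypothetical helper, not part of the g5 code base.
 */
function toUniversalTime(string $localDate, string $tzo): string
{
    // '!' resets unparsed fields (seconds) instead of using the current time.
    $local = DateTimeImmutable::createFromFormat('!Y-m-d H:i P', "$localDate $tzo");
    if ($local === false) {
        throw new InvalidArgumentException("Cannot parse '$localDate $tzo'");
    }
    return $local->setTimezone(new DateTimeZone('UTC'))->format('Y-m-d H:i');
}

echo toUniversalTime('1950-03-12 08:15', '+01:00'), "\n"; // prints 1950-03-12 07:15
echo toUniversalTime('1950-01-01 00:30', '+01:00'), "\n"; // prints 1949-12-31 23:30
```

The second call shows why the date part must be recomputed along with the time: a birth shortly after local midnight can fall on the previous day in universal time.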