G5 integration
To import the raw file into the database, the following commands must be issued, in this order:
    php run-g5.php gauq D6 raw2tmp
    php run-g5.php gauq D6 addGeo
    php run-g5.php gauq D6 tmp2db
Preparatory code
The Cura file contains 32 names that cannot easily be split into family and given names by a program. To handle these cases, the command php run-g5.php gauq D6 look emptyGiven was used to build the array D6::NAMES_CORRECTIONS.
One case could not be fixed :
115 Crossalexander 1919-06-16 15:30
raw2tmp
This step copies and reformats the information of 902gdD6.html to a csv file in directory data/tmp/gauq/lerrcp.
- The notice of page 902gdD6.html says the file contains 450 persons, but it in fact contains 449 (NUM 234 is missing). Fortunately, this record 234, Léo Lacroix, is present in Ertel's file (num 2318) and is added in a further step.
- For record 356 Ruiz Bernardo, a check on Wikipedia shows that he was born in Orihuela, ES, which shows that there is an error in the cura file (latitude should be 38N05 instead of 36N05). The correction was added to step raw2tmp().
- Fixes on names are also included in the raw2tmp code.
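As an illustration of the coordinate fix described above, here is a minimal Python sketch. It is not the project's code: the names LAT_CORRECTIONS, to_decimal and corrected_latitude are hypothetical, and the notation (degrees, hemisphere letter, two-digit minutes) is inferred from the values quoted on this page, such as 38N05 or 00W56.

```python
import re

# Hypothetical per-record overrides, keyed by the NUM of the record.
# Only the correction mentioned on this page is listed.
LAT_CORRECTIONS = {"356": "38N05"}

# Degrees, hemisphere letter, two-digit minutes, e.g. "38N05" or "00W56".
_COORD = re.compile(r"^(\d{1,3})([NSEW])(\d{2})$")


def to_decimal(coord: str) -> float:
    """Convert a value such as '38N05' to signed decimal degrees."""
    match = _COORD.match(coord.strip())
    if match is None:
        raise ValueError(f"Unrecognized coordinate: {coord!r}")
    degrees = int(match.group(1))
    hemisphere = match.group(2)
    minutes = int(match.group(3))
    value = degrees + minutes / 60
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def corrected_latitude(num: str, raw_latitude: str) -> float:
    """Apply the override for this record, if any, then convert."""
    return to_decimal(LAT_CORRECTIONS.get(num, raw_latitude))
```

Keeping the overrides in a small table keyed by record number leaves the raw file untouched and makes each manual correction easy to audit.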
addGeo
The current code no longer uses the Geonames web service.
This step computes the country code and could be improved to also compute the place name.
The results of previous calls to the geonames web service were copied to data/db/init/geonames/D6.csv. addGeo contains temporary code that uses this file to restore the country code.
The principle is to call the geonames.org web service to perform reverse geocoding (compute the place from longitude and latitude).
Results of calls to the geonames.org web service are stored in data/tmp/geonames.
For each row of D6, addGeo:
- Checks in data/tmp/geonames/D6.csv whether geocoding was already computed.
- If not, calls the geonames.org web service, adds the information to data/tmp/geonames/D6.csv and rewrites this updated file on disk.
- Transfers the information of data/tmp/geonames/D6.csv to data/tmp/gauq/lerrcp/D6.csv
Rewriting data/tmp/geonames/D6.csv after each call to geonames.org makes it possible to stop the execution and start it again without calling the geonames.org web service again for records that have already been computed.
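The cache-then-call loop described above can be sketched as follows. This is an illustrative Python sketch, not the project's code: the column names (NUM, LG, LAT, PLACE, CY, GEOID), the helper names and the lookup callback are all assumptions. The lookup function stands in for the web service call and must return a dict with the PLACE, CY and GEOID keys.

```python
import csv
import os
from typing import Callable, Dict, List

# Hypothetical column layout of the cache file.
CACHE_FIELDS = ["NUM", "PLACE", "CY", "GEOID"]


def load_cache(path: str) -> Dict[str, dict]:
    """Read the cache file into a dict keyed by record number."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="", encoding="utf-8") as f:
        return {row["NUM"]: row for row in csv.DictReader(f, delimiter=";")}


def save_cache(path: str, cache: Dict[str, dict]) -> None:
    """Rewrite the whole cache file on disk."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CACHE_FIELDS, delimiter=";")
        writer.writeheader()
        writer.writerows(cache.values())


def add_geo(rows: List[dict], cache_path: str,
            lookup: Callable[[float, float], dict]) -> List[dict]:
    """Fill each row from the cache, calling lookup only on a cache miss."""
    cache = load_cache(cache_path)
    for row in rows:
        num = row["NUM"]
        if num not in cache:
            result = lookup(float(row["LG"]), float(row["LAT"]))
            cache[num] = {"NUM": num, **result}
            # Saved after every call so an interrupted run loses nothing.
            save_cache(cache_path, cache)
        row.update({k: v for k, v in cache[num].items() if k != "NUM"})
    return rows
```

Running add_geo a second time with the same cache file triggers no lookup for records that are already cached, which is the restart behavior described above.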
Reverse geocoding gave a result for all records except for 356 Ruiz Bernardo (see correction above).
The fix on Bernardo Ruiz shows that longitudes and latitudes are expressed by dropping the seconds rather than rounding to the nearest minute, which causes a loss of precision: the longitude of 356 Ruiz Bernardo is 00W56 and the real value is 0°56'49''; a value of 00W57 in the cura file would have been more precise.
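The difference between dropping the seconds and rounding to the nearest minute can be checked with a few lines of Python. This is only an illustrative sketch with hypothetical function names, applied to the value 0°56'49'' quoted above.

```python
def truncate_to_minutes(degrees: int, minutes: int, seconds: int) -> tuple:
    """Drop the seconds, as the cura file appears to do."""
    return degrees, minutes


def round_to_minutes(degrees: int, minutes: int, seconds: int) -> tuple:
    """Round to the nearest minute, carrying into degrees if needed."""
    total_seconds = degrees * 3600 + minutes * 60 + seconds
    total_minutes = (total_seconds + 30) // 60
    return divmod(total_minutes, 60)


print(truncate_to_minutes(0, 56, 49))  # (0, 56): off by 49 arc seconds
print(round_to_minutes(0, 56, 49))     # (0, 57): off by 11 arc seconds
```

Truncation can lose up to almost a full arc minute, whereas rounding loses at most half of one.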
Tests
The purpose is to check the cities given by the geonames.org web service. Some examples are given in 902gdD6.html, which permit a first estimation.
Generated records:
179;Gimondi Felice;1942-09-29 18:10+02:00;Bergamo;IT;3182164;9.66667;45.7
310;Ocana Luis;1945-06-09 21:30+02:00;Priego;ES;3112841;-2.31667;40.43333
397;Thevenet Bernard;1948-01-10 12:00+02:00;Changy;FR;3026931;4.23333;46.43333
403;Van Impe Lucien;1946-10-20 04:30+02:00;Cauwenberg;BE;2800617;3.96667;50.91667
432;Zoetemelk Gerardus;1946-12-03 17:15+02:00;Bomen- en Bloemenbuurt;NL;11525080;4.26667;52.08333
Examples of 902gdD6.html:
Felice GIMONDI, born 29 SEP 1942, 18:10, in Sedrina (It)
Luis OCAÑA, born 9 JUN 1945, 21:30, in Priego (Sp)
Bernard THEVENET, born 10 JAN 1948, 12:00, in St Julien de Civry (Fr)
Lucien VAN IMPE, born 20 OCT 1946, 4:30, in Mere (Bel)
Joop (Gerardus) ZOETEMELK, born 3 DEC 1946, 17:15, in The Hague (Ne)
The records whose city differs between the two sources were analyzed to understand the difference:
Person | City given by cura | City obtained from geonames.org | Comments |
---|---|---|---|
Felice GIMONDI | Sedrina | Bergamo | According to Wikipedia, Gimondi was born in Sedrina, a small town near Bergamo. Coordinates (lat, long) given in 902gdD6.html: (45N42, 9E40). Coordinates given by geonames.org: Sedrina: (45.78178, 9.62405) = (45°46′54.408″ N, 9°37′26.58″ E); Bergamo: (45.69798, 9.66895) = (45°41′52.728″ N, 9°40′8.22″ E). The problem seems to come from the coordinates given in 902gdD6.html, which correspond to Bergamo and not Sedrina. |
Bernard THEVENET | St Julien de Civry | Changy | According to Wikipedia, Thevenet was born in Saint-Julien-de-Civry. Coordinates (lat, long) given in 902gdD6.html: (46N26, 04E14). Coordinates given by geonames.org: Saint-Julien-de-Civry: (46.36635, 4.23121) = (46°21′58.86″ N, 4°13′52.356″ E); Changy: (46.4146, 4.23654) = (46°24′52.56″ N, 4°14′11.544″ E). Here also the problem seems to come from the coordinates given in 902gdD6.html. |
Lucien VAN IMPE | Mere | Cauwenberg | According to Wikipedia, Van Impe was born in Mere. Coordinates (lat, long) given in 902gdD6.html: (50N55, 03E58). Coordinates given by geonames.org: Mere: (50.9, 3.86667) = (50°54′00″ N, 3°52′0.012″ E); Cauwenberg: (50.92, 3.96684) = (50°55′12″ N, 3°58′0.624″ E). Here also the problem seems to come from the coordinates given in 902gdD6.html. |
Joop ZOETEMELK | The Hague | Bomen- en Bloemenbuurt | According to Wikipedia, Zoetemelk was born in The Hague. Coordinates (lat, long) given in 902gdD6.html: (52N05, 04E16). Coordinates given by geonames.org: The Hague: (52.07667, 4.29861) = (52°04′36″ N, 4°17′55″ E); Bomen- en Bloemenbuurt: (52.07765, 4.26342) = (52°04′40″ N, 4°15′48″ E). Here also the problem seems to come from the coordinates given in 902gdD6.html. |
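The Gimondi case can be checked numerically by comparing great-circle distances. The sketch below is illustrative only. It uses the standard haversine formula with an assumed mean Earth radius of 6371 km, and the function name haversine_km is hypothetical. The coordinates are the ones quoted on this page.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


cura = (45.7, 9.66667)        # 45N42, 9E40 in decimal degrees
sedrina = (45.78178, 9.62405)
bergamo = (45.69798, 9.66895)

print("cura to Sedrina (km):", round(haversine_km(*cura, *sedrina), 2))
print("cura to Bergamo (km):", round(haversine_km(*cura, *bergamo), 2))
```

With these values the cura point lies well under 1 km from the Bergamo coordinates and close to 10 km from Sedrina, which is consistent with the comment in the table.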
Temporary conclusion
From these examples, it seems that:
- Birth places given in the Cura file correspond to Wikipedia.
- Coordinates contained in the Cura file are not precise enough to reach the correct birth place using geonames.org reverse geocoding.
addGeo is still useful to restore the country, which makes it possible to compute the timezone offset.