Gauquelin LERRCP - file D6

The main problem with page 902gdD6.html is that it contains neither place names nor countries, so the timezone offset cannot be computed without restoring this information. The country is restored using reverse geocoding. Reverse geocoding also yields place names, but does not always give the exact birth place. This restoration needs to be improved; birth place restoration is currently not included in the resulting file.
According to cura.free.fr, D6 contains 450 "New famous European Sports Champions", but only 449 were found in the file.
Computation of the timezone offset is also possible but hasn't been coded yet.
To import the raw file into the database, the following commands must be issued:
php run-g5.php cura D6 raw2tmp
php run-g5.php cura D6 addGeo
php run-g5.php cura D6 tmp2db

Preparatory code

The cura file contains 32 names that cannot easily be split into family and given names by a program. To handle these cases, the command
php run-g5.php cura D6 look emptyGiven
was used to build the array $NAMES_CORRECTIONS of raw2tmp.
One case could not be fixed: 115 Crossalexander 1919-06-16 15:30
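The idea of a correction table can be sketched as follows. This is a Python illustration only, not the project's code (which is run with php): the splitting rule (last word is the given name), the function name split_name and the entry 999 in the table are assumptions made for the example.

```python
# Hypothetical correction table keyed by record number (NUM).
# The entry below is invented for illustration; it is not taken from raw2tmp.
NAMES_CORRECTIONS = {
    999: ("Doe", "John"),
}

def split_name(num, raw_name):
    """Return (family, given); given is '' when the name cannot be split."""
    if num in NAMES_CORRECTIONS:
        return NAMES_CORRECTIONS[num]
    parts = raw_name.split()
    if len(parts) < 2:
        # Nothing to split on: leave the given name empty for manual review.
        return (raw_name, "")
    # Assumption: the last word is the given name, the rest is the family name.
    return (" ".join(parts[:-1]), parts[-1])

print(split_name(403, "Van Impe Lucien"))
print(split_name(115, "Crossalexander"))
print(split_name(999, "Doejohn"))
```

Records that come back with an empty given name are the ones a command such as look emptyGiven would be expected to list.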

raw2tmp

This step copies and reformats the information of 902gdD6.html to a csv file in directory data/tmp/cura.
  • The notice of page 902gdD6.html says the file contains 450 persons, but it actually contains 449 (NUM 234 is missing).
    So the generated file contains 449 persons.
  • For record 356 Ruiz Bernardo, a check on Wikipedia shows that he was born in Orihuela, ES, which reveals an error in the cura file (latitude should be 38N05 instead of 36N05). The correction was added to step raw2tmp.
  • Fixes on names are also included in the raw2tmp code.
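The cura coordinates such as 38N05 are degrees and minutes with a hemisphere letter, while the generated csv holds decimal degrees. A minimal Python sketch of that conversion is given here; the function name cura_coord_to_decimal and the rounding to five decimals are assumptions chosen to match the values visible in the generated records, not a copy of the raw2tmp code.

```python
import re

def cura_coord_to_decimal(text):
    """Convert a degrees/minutes string such as '45N42' or '00W56' to decimal degrees."""
    match = re.fullmatch(r"(\d{1,3})([NSEW])(\d{2})", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, hemisphere, minutes = match.groups()
    value = int(degrees) + int(minutes) / 60
    # South latitudes and west longitudes are negative.
    if hemisphere in "SW":
        value = -value
    return round(value, 5)

print(cura_coord_to_decimal("45N42"))  # 45.7
print(cura_coord_to_decimal("9E40"))   # 9.66667
```

With this convention the corrected latitude 38N05 of record 356 becomes 38.08333.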

addGeo

The current code no longer uses the Geonames web service.
The results of previous calls to the geonames web service were copied to
data/build/geonames/D6.csv
addGeo contains temporary code that uses this file to restore the country code.
The rest of the documentation in this paragraph is obsolete.
This step computes the country code and might be improved to also compute the place name.
The principle is to call the geonames.org web service to perform reverse geocoding (compute the place from longitude and latitude).
In practice, a new directory was added to the config: 5-geonames, located by default in data/5-tmp/geonames.
For each row of D6, addGeo:
  • Checks in 5-geonames/D6.csv whether geocoding was already computed.
  • If not, calls the geonames.org web service, adds the information to 5-geonames/D6.csv and rewrites this updated file on disk.
  • Transfers the information of 5-geonames/D6.csv to 5-tmp/D6.csv
Rewriting 5-geonames/D6.csv after each call to geonames.org makes it possible to stop the execution and start it again without calling the geonames.org web service again for records that have already been computed.
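The caching loop described above can be sketched in Python as follows. This is an illustration under assumptions, not the addGeo code: the column names in FIELDS, the function names and the reverse_geocode callable (which stands in for the web service call and is passed in by the caller) are all hypothetical.

```python
import csv
import os

FIELDS = ["NUM", "LG", "LAT", "GEOID", "PLACE", "CY"]  # hypothetical column names

def load_cache(path):
    """Read previously geocoded rows, keyed by record number."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="", encoding="utf-8") as f:
        return {row["NUM"]: row for row in csv.DictReader(f, delimiter=";")}

def save_cache(path, cache):
    """Rewrite the whole cache file on disk."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter=";")
        writer.writeheader()
        writer.writerows(cache.values())

def add_geo(rows, cache_path, reverse_geocode):
    """Complete each row with cached or freshly computed geocoding results."""
    cache = load_cache(cache_path)
    for row in rows:
        num = row["NUM"]
        if num not in cache:
            # reverse_geocode stands for the web service call; it must return
            # a dict with keys GEOID, PLACE and CY.
            result = reverse_geocode(row["LG"], row["LAT"])
            cache[num] = {"NUM": num, "LG": row["LG"], "LAT": row["LAT"], **result}
            save_cache(cache_path, cache)  # persist after every call
        row.update({key: cache[num][key] for key in ("GEOID", "PLACE", "CY")})
    return rows
```

Because the cache is saved after every new lookup, interrupting the loop loses at most the call in progress, and a second run only queries the records that are still missing.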
Reverse geocoding gave a result for all records except for 356 Ruiz Bernardo (see correction above).

The fix on Bernardo Ruiz shows that longitudes and latitudes are expressed by dropping the seconds rather than rounding to the nearest minute, which causes a loss of precision: the longitude of 356 Ruiz Bernardo is given as 00W56 whereas the real value is 0°56'49'' W, so a value of 00W57 in the cura file would have been more precise.
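The difference between dropping the seconds and rounding can be checked with a few lines of Python. This is a sketch with a hypothetical function name, using the Orihuela value 0°56'49'' quoted above.

```python
def to_degrees_minutes(degrees, minutes, seconds, mode="truncate"):
    """Reduce a degrees/minutes/seconds value to whole degrees and minutes."""
    total_seconds = degrees * 3600 + minutes * 60 + seconds
    if mode == "truncate":
        total_minutes = total_seconds // 60          # drop the seconds
    else:
        total_minutes = (total_seconds + 30) // 60   # round to the nearest minute
    return divmod(total_minutes, 60)

print(to_degrees_minutes(0, 56, 49, "truncate"))  # (0, 56)
print(to_degrees_minutes(0, 56, 49, "round"))     # (0, 57)
```

Truncation can be off by almost a full minute of arc, whereas rounding is off by at most half a minute.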

Tests

The purpose is to check the cities returned by the geonames.org web service. Some examples are given in 902gdD6.html, which allow a first estimate.
Generated records :
179;Gimondi Felice;1942-09-29 18:10+02:00;Bergamo;IT;3182164;9.66667;45.7
310;Ocana Luis;1945-06-09 21:30+02:00;Priego;ES;3112841;-2.31667;40.43333
397;Thevenet Bernard;1948-01-10 12:00+02:00;Changy;FR;3026931;4.23333;46.43333
403;Van Impe Lucien;1946-10-20 04:30+02:00;Cauwenberg;BE;2800617;3.96667;50.91667
432;Zoetemelk Gerardus;1946-12-03 17:15+02:00;Bomen- en Bloemenbuurt;NL;11525080;4.26667;52.08333
Examples of 902gdD6.html :
Felice GIMONDI, born 29 SEP 1942, 18:10, in Sedrina (It)
Luis OCAÑA, born 9 JUN 1945, 21:30, in Priego (Sp)
Bernard THEVENET, born 10 JAN 1948, 12:00, in St Julien de Civry (Fr)
Lucien VAN IMPE, born 20 OCT 1946, 4:30, in Mere (Bel)
Joop (Gerardus) ZOETEMELK, born 3 DEC 1946, 17:15, in The Hague (Ne)
All cities but one differ.
These records were analyzed to understand this difference:
Felice GIMONDI
  • City given by cura: Sedrina
  • City obtained from geonames.org: Bergamo
  • Comments: According to Wikipedia, Gimondi was born in Sedrina.
    Sedrina is a small town near Bergamo.
    Coordinates (lat, long) given in 902gdD6.html: (45N42, 9E40)
    Coordinates given by geonames.org:
    Sedrina: (45.78178, 9.62405) = (45°46′54.408″ N, 9°37′26.58″ E)
    Bergamo: (45.69798, 9.66895) = (45°41′52.728″ N, 9°40′8.22″ E)
    The problem seems to come from the coordinates given in 902gdD6.html, which correspond to Bergamo and not to Sedrina.
Bernard THEVENET
  • City given by cura: St Julien de Civry
  • City obtained from geonames.org: Changy
  • Comments: According to Wikipedia, Thevenet was born in Saint-Julien-de-Civry.
    Coordinates (lat, long) given in 902gdD6.html: (46N26, 04E14)
    Coordinates given by geonames.org:
    Saint-Julien-de-Civry: (46.36635, 4.23121) = (46°21′58.86″ N, 4°13′52.356″ E)
    Changy: (46.4146, 4.23654) = (46°24′52.56″ N, 4°14′11.544″ E)
    Here also the problem seems to come from the coordinates given in 902gdD6.html.
Lucien VAN IMPE
  • City given by cura: Mere
  • City obtained from geonames.org: Cauwenberg
  • Comments: According to Wikipedia, Van Impe was born in Mere.
    Coordinates (lat, long) given in 902gdD6.html: (50N55, 03E58)
    Coordinates given by geonames.org:
    Mere: (50.9, 3.86667) = (50°54′0″ N, 3°52′0.012″ E)
    Cauwenberg: (50.92, 3.96684) = (50°55′12″ N, 3°58′0.624″ E)
    Here also the problem seems to come from the coordinates given in 902gdD6.html.
Joop ZOETEMELK
  • City given by cura: The Hague
  • City obtained from geonames.org: Bomen- en Bloemenbuurt
  • Comments: According to Wikipedia, Zoetemelk was born in The Hague.
    Coordinates (lat, long) given in 902gdD6.html: (52N05, 04E16)
    Coordinates given by geonames.org:
    The Hague: (52.07667, 4.29861) = (52°04′36″ N, 4°17′55″ E)
    Bomen- en Bloemenbuurt: (52.07765, 4.26342) = (52°04′40″ N, 4°15′48″ E)
    Here also the problem seems to come from the coordinates given in 902gdD6.html.
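To quantify how far the cura coordinates are from each candidate place, the great-circle distance can be computed with the haversine formula. The Python sketch below uses the Gimondi figures quoted above; the function name and the mean Earth radius of 6371 km are choices made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Cura coordinates for Gimondi: 45N42, 9E40
cura = (45 + 42 / 60, 9 + 40 / 60)
sedrina = (45.78178, 9.62405)
bergamo = (45.69798, 9.66895)

print(round(haversine_km(*cura, *sedrina), 1))
print(round(haversine_km(*cura, *bergamo), 1))
```

The cura point lies less than one kilometre from the geonames.org position of Bergamo and almost ten kilometres from Sedrina, which is consistent with reverse geocoding returning Bergamo.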

Temporary conclusion

From these examples, it seems that:
  • Birth places given in the cura file match Wikipedia.
  • Coordinates contained in cura files are not precise enough to reach the correct birth place using geonames.org reverse geocoding.
Step addGeo is nevertheless useful because it restores the country, which makes it possible to compute the timezone offset.

TODO

  • Compute timezone offset for all records of D6
  • Birth place restoration