Gauquelin LERRCP - file D6

The D6 file was published by Gauquelin in 1979. It contains data used for a replication of
The title of the original booklet indicates 432 European sport champions. The data were included in the g5 database from page 902gdD6.html of the Cura web site, which announces 450 records (the difference between 432 and 450 has not been explained yet).
The main problem with this file is that it contains neither place names nor countries, so the timezone offset cannot be computed without restoring them. The country is restored using reverse geocoding. This also makes it possible to compute place names, but does not always give the exact birth place. This restoration needs to be improved; birth place restoration is currently not included in the resulting file.

G5 integration

To import the raw file into the database, issue the following commands in this order:
php run-g5.php gauq D6 raw2tmp
php run-g5.php gauq D6 addGeo
php run-g5.php gauq D6 tmp2db

Preparatory code

The Cura file contains 32 names that cannot easily be split into family and given names by a program. To handle these cases, the command
php run-g5.php gauq D6 look emptyGiven
was used to build the array D6::NAMES_CORRECTIONS.
One case could not be fixed: 115 Crossalexander 1919-06-16 15:30
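The kind of check performed by `look emptyGiven` can be illustrated with a short sketch. This is Python, not the g5 code (which is PHP), and it rests on hypothetical assumptions: a Cura name is written "FAMILY GIVEN" with the given name as the last space-separated token, and corrections form a table keyed by NUM. The names `split_name`, `empty_given` and the `corrections` parameter are illustrative only.

```python
def split_name(num: int, raw: str, corrections: dict[int, tuple[str, str]]) -> tuple[str, str]:
    """Return (family, given) for a Cura name assumed to be written 'FAMILY GIVEN'."""
    if num in corrections:
        # A hand-made correction takes precedence over the automatic split.
        return corrections[num]
    family, sep, given = raw.strip().rpartition(" ")
    if not sep:
        # No space at all: the given name cannot be isolated automatically.
        return raw.strip(), ""
    return family, given


def empty_given(rows, corrections):
    """Return the (num, raw) pairs whose given name comes out empty."""
    return [(num, raw) for num, raw in rows if split_name(num, raw, corrections)[1] == ""]
```

For example, "Van Impe Lucien" splits into ("Van Impe", "Lucien"), while "Crossalexander" yields an empty given name and would be reported by `empty_given`, unless a correction is supplied for its NUM.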

raw2tmp

This step copies and reformats the information of 902gdD6.html into a csv file in directory data/tmp/gauq/lerrcp.
  • The notice of page 902gdD6.html says the file contains 450 persons, but it actually contains 449 (NUM 234 is missing).
    Fortunately, this record 234, Léo Lacroix, is present in Ertel's file (num 2318) and is added in a later step.
  • For record 356 Ruiz Bernardo, a check on Wikipedia shows that he was born in Orihuela, ES, which reveals an error in the Cura file (latitude should be 38N05 instead of 36N05). The correction was added to step raw2tmp().
  • Fixes on names are also included in the raw2tmp code.
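The coordinate handling of raw2tmp can be illustrated with values shown in the Tests section: 45N42 and 9E40 for Gimondi become 45.7 and 9.66667 in the generated record. The Python sketch below assumes, as these examples suggest, that Cura coordinates are written as degrees, a hemisphere letter and two-digit minutes, with S and W mapped to negative values. The field names LAT and LG, the `FIXES` table and all helper names are hypothetical; the missing-record check is simply one way to detect that NUM 234 is absent.

```python
import re

# Assumed Cura notation: degrees, hemisphere letter, two-digit minutes (e.g. 45N42, 9E40, 00W56).
COORD_RE = re.compile(r"^(\d{1,3})([NSEW])(\d{2})$")


def cura_to_decimal(coord: str) -> float:
    """Convert a Cura coordinate such as '45N42' to signed decimal degrees."""
    m = COORD_RE.match(coord.strip())
    if m is None:
        raise ValueError(f"unexpected coordinate: {coord!r}")
    degrees, hemisphere, minutes = int(m.group(1)), m.group(2), int(m.group(3))
    value = degrees + minutes / 60
    # South latitudes and west longitudes are negative.
    return -value if hemisphere in "SW" else value


# Hypothetical fixes table: NUM -> raw fields to overwrite before conversion.
FIXES = {
    356: {"LAT": "38N05"},  # Ruiz Bernardo, born in Orihuela; the Cura file gives 36N05
}


def apply_fixes(num: int, record: dict[str, str]) -> dict[str, str]:
    """Return a copy of record with the known corrections applied."""
    return {**record, **FIXES.get(num, {})}


def missing_nums(nums, announced: int) -> list[int]:
    """List the NUMs between 1 and the announced count that are absent."""
    present = set(nums)
    return [n for n in range(1, announced + 1) if n not in present]
```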

addGeo

The current code no longer uses the geonames web service.
The results of previous calls to the geonames web service were copied to
data/db/init/geonames/D6.csv
addGeo contains temporary code that uses this file to restore the country code.
This step computes the country code and could be improved to also compute the place name.
The principle is to call the geonames.org web service to perform reverse geocoding (compute a place from longitude and latitude).
Results of calls to the geonames.org web service are stored in data/tmp/geonames.
For each row of D6, addGeo:
  • Checks in data/tmp/geonames/D6.csv whether geocoding was already computed.
  • If not, calls the geonames.org web service, adds the information to data/tmp/geonames/D6.csv and rewrites this updated file to disk.
  • Transfers the information of data/tmp/geonames/D6.csv to data/tmp/gauq/lerrcp/D6.csv

Rewriting data/tmp/geonames/D6.csv after each call to geonames.org makes it possible to stop the execution and restart it without calling the geonames.org web service again for records that have already been processed.
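The resumable loop described above can be sketched in Python. This is an illustration under assumptions, not the g5 implementation: the cache is taken to be a semicolon-separated file keyed by NUM (hypothetical layout), and `lookup` stands for the call to geonames.org. In a real run, `lookup` could wrap one of the geonames.org reverse geocoding services, such as findNearbyPlaceName; here it is passed in as a function so that the caching behavior can be shown without network access.

```python
import csv
import os


def load_cache(path: str) -> dict[str, list[str]]:
    """Read the cache file (NUM;field;field...) if it exists."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0]: row[1:] for row in csv.reader(f, delimiter=";") if row}


def save_cache(path: str, cache: dict[str, list[str]]) -> None:
    """Rewrite the whole cache file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        for key, values in cache.items():
            writer.writerow([key, *values])


def add_geo(rows, cache_path, lookup):
    """rows: (num, lat, lng) tuples; lookup(lat, lng) returns strings such as [place, country]."""
    cache = load_cache(cache_path)
    result = {}
    for num, lat, lng in rows:
        key = str(num)
        if key not in cache:
            cache[key] = list(lookup(lat, lng))
            # Save after every call so that an interrupted run can resume without repeating calls.
            save_cache(cache_path, cache)
        result[key] = cache[key]
    return result
```

On a second run with the same cache file, `lookup` is only called for rows that are not yet in the file.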
Reverse geocoding gave a result for all records except for 356 Ruiz Bernardo (see correction above).

The fix on Bernardo Ruiz shows that longitudes and latitudes are expressed by dropping the seconds rather than rounding to the nearest minute, which causes a loss of precision: the longitude of 356 Ruiz Bernardo is 00W56, while the real value is 0°56'49'' W; a value of 00W57 in the Cura file would have been more precise.
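The difference between truncating and rounding can be made concrete with the Ruiz Bernardo longitude. The helpers below are hypothetical Python sketches: they reduce deg°min′sec″ to whole minutes either by dropping the seconds (what the Cura file appears to do) or by rounding to the nearest minute, and measure the resulting error in arc seconds.

```python
def to_whole_minutes(deg: int, minutes: int, seconds: int, mode: str) -> tuple[int, int]:
    """Reduce deg°min'sec'' to (deg, min) by truncating or rounding the seconds."""
    if mode not in ("truncate", "round"):
        raise ValueError(f"unknown mode: {mode}")
    total = deg * 60 + minutes
    if mode == "round" and seconds >= 30:
        total += 1  # 30'' or more rounds up to the next minute
    return divmod(total, 60)


def error_seconds(deg: int, minutes: int, seconds: int, mode: str) -> int:
    """Absolute difference, in arc seconds, between the exact and the reduced value."""
    d, m = to_whole_minutes(deg, minutes, seconds, mode)
    return abs((d * 60 + m) * 60 - ((deg * 60 + minutes) * 60 + seconds))
```

For 0°56'49'', truncation gives 0°56' with an error of 49'', while rounding gives 0°57' with an error of 11''. In general the truncation error can reach 59'', against at most 30'' with rounding; one arc minute of latitude is about 1.85 km.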

Tests

The purpose is to check the cities returned by the geonames.org web service. Some examples are given in 902gdD6.html, which allow a first assessment.
Generated records (the last two fields are longitude and latitude):
179;Gimondi Felice;1942-09-29 18:10+02:00;Bergamo;IT;3182164;9.66667;45.7
310;Ocana Luis;1945-06-09 21:30+02:00;Priego;ES;3112841;-2.31667;40.43333
397;Thevenet Bernard;1948-01-10 12:00+02:00;Changy;FR;3026931;4.23333;46.43333
403;Van Impe Lucien;1946-10-20 04:30+02:00;Cauwenberg;BE;2800617;3.96667;50.91667
432;Zoetemelk Gerardus;1946-12-03 17:15+02:00;Bomen- en Bloemenbuurt;NL;11525080;4.26667;52.08333
Examples given in 902gdD6.html:
Felice GIMONDI, born 29 SEP 1942, 18:10, in Sedrina (It)
Luis OCAÑA, born 9 JUN 1945, 21:30, in Priego (Sp)
Bernard THEVENET, born 10 JAN 1948, 12:00, in St Julien de Civry (Fr)
Lucien VAN IMPE, born 20 OCT 1946, 4:30, in Mere (Bel)
Joop (Gerardus) ZOETEMELK, born 3 DEC 1946, 17:15, in The Hague (Ne)
All cities but one differ!
These records were analyzed to understand these differences:

Felice GIMONDI
  • City given by Cura: Sedrina
  • City obtained from geonames.org: Bergamo
  • Comments: according to Wikipedia, Gimondi was born in Sedrina, a small town near Bergamo.
    Coordinates (lat, long) given in 902gdD6.html: (45N42, 9E40)
    Coordinates given by geonames.org:
    Sedrina: (45.78178, 9.62405) = (45°46′54.408″ N, 9°37′26.58″ E)
    Bergamo: (45.69798, 9.66895) = (45°41′52.728″ N, 9°40′8.22″ E)
    The problem seems to come from the coordinates given in 902gdD6.html, which correspond to Bergamo and not to Sedrina.

Bernard THEVENET
  • City given by Cura: St Julien de Civry
  • City obtained from geonames.org: Changy
  • Comments: according to Wikipedia, Thevenet was born in Saint-Julien-de-Civry.
    Coordinates (lat, long) given in 902gdD6.html: (46N26, 04E14)
    Coordinates given by geonames.org:
    Saint-Julien-de-Civry: (46.36635, 4.23121) = (46°21′58.86″ N, 4°13′52.356″ E)
    Changy: (46.4146, 4.23654) = (46°24′52.56″ N, 4°14′11.544″ E)
    Here also, the problem seems to come from the coordinates given in 902gdD6.html.

Lucien VAN IMPE
  • City given by Cura: Mere
  • City obtained from geonames.org: Cauwenberg
  • Comments: according to Wikipedia, Van Impe was born in Mere.
    Coordinates (lat, long) given in 902gdD6.html: (50N55, 03E58)
    Coordinates given by geonames.org:
    Mere: (50.9, 3.86667) = (50°54′0″ N, 3°52′0.012″ E)
    Cauwenberg: (50.92, 3.96684) = (50°55′12″ N, 3°58′0.624″ E)
    Here also, the problem seems to come from the coordinates given in 902gdD6.html.

Joop ZOETEMELK
  • City given by Cura: The Hague
  • City obtained from geonames.org: Bomen- en Bloemenbuurt
  • Comments: according to Wikipedia, Zoetemelk was born in The Hague.
    Coordinates (lat, long) given in 902gdD6.html: (52N05, 04E16)
    Coordinates given by geonames.org:
    The Hague: (52.07667, 4.29861) = (52°04′36″ N, 4°17′55″ E)
    Bomen- en Bloemenbuurt: (52.07765, 4.26342) = (52°04′40″ N, 4°15′48″ E)
    Here also, the problem seems to come from the coordinates given in 902gdD6.html.
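The decimal values returned by geonames.org were converted to degrees, minutes and seconds for the comparison above. A naive conversion of 50.9 (Mere), which computes the minutes and seconds separately from floating-point remainders, can yield 50°53′60″. Rounding the total number of seconds once, before splitting it into degrees, minutes and seconds, avoids this. This is a minimal Python sketch; `to_dms` and its `decimals` parameter are illustrative names, not part of g5.

```python
def to_dms(value: float, positive: str, negative: str, decimals: int = 3) -> str:
    """Format signed decimal degrees as deg°min′sec″ followed by the hemisphere letter."""
    hemisphere = positive if value >= 0 else negative
    scale = 10 ** decimals
    # Round the total number of (scaled) seconds once, then split with integer arithmetic,
    # so that the seconds can never come out as 60.
    total = round(abs(value) * 3600 * scale)
    degrees, rest = divmod(total, 3600 * scale)
    minutes, scaled_seconds = divmod(rest, 60 * scale)
    seconds = scaled_seconds / scale
    return f"{degrees}°{minutes:02d}′{seconds:g}″ {hemisphere}"
```

For instance, `to_dms(50.9, "N", "S")` gives "50°54′0″ N", and `to_dms(3.86667, "E", "W")` gives "3°52′0.012″ E".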

Temporary conclusion

From these examples, it seems that:
  • Birth places given in the Cura file match Wikipedia.
  • Coordinates contained in Cura files are not precise enough for geonames.org reverse geocoding to find the correct birth place.
Nevertheless, step addGeo is useful because it restores the country, which makes it possible to compute the timezone offset.