Integrating geonames.org

Associating places with a Geonames identifier can be seen as a way to prepare the merge of historical data with Wikidata.
Places expressed as strings are ambiguous because spelling varies from one source to another (typos, abbreviations, truncated names, misspellings...). An identifier is needed to handle places programmatically.
G5 uses two ways to associate data with Geonames: a local database and the geonames.org web service.
In both cases, code can be developed to improve the matching between Gauquelin data and Geonames.

Local database

The program uses Geonames data stored in a local PostgreSQL database, filled by the program located at github.com/tig12/geonames2postgres.
You need to:
  • have postgres installed on your machine,
  • in config.yml, put the correct values in the geonames / postgresql section,
  • run geonames2postgres.py for each country.
    Gauquelin5 needs:
    python geonames2postgres.py BE
    python geonames2postgres.py CH
    python geonames2postgres.py DE
    python geonames2postgres.py DZ
    python geonames2postgres.py FR
    python geonames2postgres.py IT
    python geonames2postgres.py LU
    python geonames2postgres.py MA
    python geonames2postgres.py MC
    python geonames2postgres.py NL
    python geonames2postgres.py US
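The eleven imports above can also be chained from a small script. This is only a sketch: it assumes it is run from the directory containing geonames2postgres.py, and the helper names build_commands and run_all are hypothetical, not part of geonames2postgres.

```python
import subprocess
import sys

# Country codes listed above as needed by Gauquelin5.
COUNTRIES = ["BE", "CH", "DE", "DZ", "FR", "IT", "LU", "MA", "MC", "NL", "US"]


def build_commands(countries=COUNTRIES, script="geonames2postgres.py"):
    """Return one command (as an argument list) per country code."""
    return [[sys.executable, script, code] for code in countries]


def run_all(countries=COUNTRIES):
    """Run the imports one after the other, stopping at the first failure."""
    for command in build_commands(countries):
        subprocess.run(command, check=True)
```

Calling run_all() launches the imports sequentially; check=True makes a failing import raise an exception instead of being silently skipped.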
    
The advantage of storing the data in a local database is that many requests can be made without worrying about the limits imposed on remote calls.
So far, this has been used to try a quite restrictive matching:
  • CY (country code) must be exactly the same.
  • C2 (département in France, state in the USA, province in Italy...) must be exactly the same.
  • The "slug" must be exactly the same (the slug is the name with all letters lowercased, accents removed and all non-alphanumeric characters converted to hyphens; e.g. the slug of Saint-Jean de Védas is saint-jean-de-vedas).
This gives low matching rates but prevents accidental association with wrong Geonames ids.
This is used for series A, E1 and E3 (step addGeo).
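The three criteria can be illustrated with a minimal Python sketch. It is not the G5 implementation: the function names, the dictionary keys (geoid, name) and the ids in the usage note are hypothetical, and collapsing a run of several non-alphanumeric characters into a single hyphen is an assumption.

```python
import re
import unicodedata


def slugify(name):
    """Lowercase, remove accents, replace non-alphanumeric runs by a hyphen."""
    # NFKD splits "é" into "e" + combining accent; the ASCII encoding drops the accent.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def strict_match(place, candidates):
    """Return the ids of the candidates equal to place on CY, C2 and slug."""
    wanted = (place["CY"], place["C2"], slugify(place["name"]))
    return [
        c["geoid"]
        for c in candidates
        if (c["CY"], c["C2"], slugify(c["name"])) == wanted
    ]
```

For example, with two candidates named Saint-Jean-de-Védas, one in C2 = 34 with the made-up id 1 and one in C2 = 30 with the made-up id 2, a Gauquelin record {CY: FR, C2: 34, name: Saint-Jean de Vedas} matches only id 1. An association would then be kept only when the returned list contains exactly one id.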

Better matching rates could be obtained using approximate string matching (such as the Levenshtein distance), but this may need human validation to prevent wrong associations.
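As an illustration of what such an approximate matching could look like, here is a minimal sketch of the Levenshtein distance applied to slugs. The function names and the max_distance threshold of 2 are arbitrary assumptions; the candidates returned would still have to be validated by a human.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest_slugs(slug, known_slugs, max_distance=2):
    """Return (distance, candidate) pairs within max_distance, nearest first."""
    scored = [(levenshtein(slug, known), known) for known in known_slugs]
    return sorted(pair for pair in scored if pair[0] <= max_distance)
```

With the truncated slug saint-jean-de-veda, the candidate saint-jean-de-vedas is at distance 1 and is proposed, whereas saint-jean-de-luz is at distance 4 and is rejected.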

Geonames web service

This has been used for file D6, which doesn't contain place names. The question asked to the web service is "given a longitude and a latitude, give me a place name" (reverse geocoding). This gives very poor results for place names, but was useful to restore the country; see the page on file D6.
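A reverse geocoding call of this kind could be sketched as follows. This is not the G5 code: the endpoint findNearbyPlaceNameJSON and the response fields (geonames, name, countryCode, geonameId) are given as understood from the geonames.org web service documentation and should be checked there, and the function names are hypothetical. The service requires a registered user name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Geonames "find nearby place name" service.
ENDPOINT = "http://api.geonames.org/findNearbyPlaceNameJSON"


def build_url(lat, lng, username):
    """Build the request URL for a latitude / longitude pair."""
    return ENDPOINT + "?" + urlencode({"lat": lat, "lng": lng, "username": username})


def parse_nearest(payload):
    """Return (name, country code, geonames id) of the first result, or None."""
    places = payload.get("geonames", [])
    if not places:
        return None
    first = places[0]
    return first.get("name"), first.get("countryCode"), first.get("geonameId")


def reverse_geocode(lat, lng, username):
    """Query the web service and return the nearest place, or None."""
    with urlopen(build_url(lat, lng, username), timeout=10) as response:
        return parse_nearest(json.load(response))
```

Even when the returned name is unreliable, the country code of the first result may be enough to restore the country.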

Here too, approximate matching could be tried to get better results.

Note: this kind of request could be done on the local database; the only reason to use the web service is to avoid coding something that already exists.
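For reference, a local equivalent could look like the sketch below, which searches the nearest place among rows already loaded in memory using the haversine great-circle distance. The function names and the (name, latitude, longitude) row layout are assumptions; with the local database, the candidate rows would first be selected by a query restricted to a bounding box around the point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_place(lat, lng, places):
    """Return the (name, lat, lng) row closest to the point, or None if empty."""
    return min(
        places,
        key=lambda place: haversine_km(lat, lng, place[1], place[2]),
        default=None,
    )
```

For instance, among the rows ("Paris", 48.8566, 2.3522) and ("Lyon", 45.7640, 4.8357), the point (48.0, 2.0) is associated with Paris.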