It is used through the command line.
Open a terminal and clone the repository on your local machine :
git clone https://github.com/tig12/gauquelin5
(or download the code).
- Install php (version 7.2 or higher) on your machine.
Install PECL extension "yaml".
On debian-based systems :
sudo apt install php-yaml
For other systems, see the PHP manual.
- Install postgresql on your machine (see below for configuration).
- Geonames.org matching uses a postgresql database populated by python code; see the geonames page (only useful for some commands).
Wikidata retrieval also needs the curl and sqlite3 PHP extensions. This is not necessary for data restoration, only to retrieve wikidata on the local machine :
sudo apt install php-curl
sudo apt install php-sqlite3
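The requirements listed above can be checked before going further. The following is a minimal sketch, not part of g5: the function names version_ge and check_g5_prerequisites are made up for this page, and the version comparison assumes a sort command that supports the -V option (GNU coreutils does).

```shell
# version_ge A B : succeeds when version A is greater than or equal to version B.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# check_g5_prerequisites : prints one line per check,
# returns a non-zero status if anything is missing.
check_g5_prerequisites() {
    status=0
    if ! command -v php >/dev/null 2>&1; then
        echo "php: not found"
        return 1
    fi
    current=$(php -r 'echo PHP_VERSION;')
    if version_ge "$current" 7.2; then
        echo "php $current: ok"
    else
        echo "php $current: too old (7.2 or higher needed)"
        status=1
    fi
    # yaml is always needed; curl and sqlite3 only for wikidata retrieval.
    for ext in yaml curl sqlite3; do
        if php -m | grep -qix "$ext"; then
            echo "extension $ext: ok"
        else
            echo "extension $ext: missing"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_g5_prerequisites prints one line per check and returns a non-zero status if something is missing.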
gauquelin5/
├── data/
│   ├── build/
│   ├── output/
│   ├── raw/
│   └── tmp/
├── docs/
├── src/
├── vendor/
├── config.yml.dist
└── run-g5.php
In the rest of this doc, directory gauquelin5/ is called the root directory.
All the commands issued to run the program are done from the root directory.
The files you need to know about are :
- run-g5.php is the entry point to use the program.
- data/ contains the data generated and manipulated by the program (see below).
- config.yml.dist needs to be copied (see below).
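Since every command has to be issued from the root directory, a small shell function can take care of the directory change. This is only a sketch: the function name g5 and the variable G5_ROOT are assumptions of this example, not something provided by the program.

```shell
# G5_ROOT is a hypothetical variable holding the path of the root directory.
# The default below is only a guess; adapt it to the place where you cloned the code.
G5_ROOT=${G5_ROOT:-$HOME/gauquelin5}

# g5 ARGUMENT... : runs run-g5.php from the root directory.
# The body is a subshell ( ... ), so the caller's current directory is not changed.
g5() (
    cd "$G5_ROOT" || exit 1
    php run-g5.php "$@"
)
```

With this function loaded in the shell, g5 cura A2 export can be typed from any directory.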
cp config.yml.dist config.yml
Edit config.yml and adapt some values :
The values can contain either absolute paths or paths relative to root directory.
Default values are all relative to root directory :
dirs:
  output: data/output
  tmp: data/tmp
At program installation, the data/ directory contains 3 sub-directories :
These directories contain data necessary to g5, and are versioned with the program. Their locations are imposed and not configurable.
Other sub-directories of data/ are not versioned (they are ignored by git).
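The rule given above for the dirs values (an absolute path, or a path relative to the root directory) can be sketched as follows. The function name resolve_dir is hypothetical and only illustrates the rule; it does not claim to show how g5 itself is implemented, and /opt/gauquelin5 is a made-up location.

```shell
# resolve_dir ROOT VALUE : prints VALUE unchanged if it is absolute,
# otherwise prints it relative to ROOT.
resolve_dir() {
    case "$2" in
        /*) printf '%s\n' "$2" ;;
        *)  printf '%s/%s\n' "${1%/}" "$2" ;;   # ${1%/} drops a trailing slash from ROOT
    esac
}

resolve_dir /opt/gauquelin5 data/output   # prints /opt/gauquelin5/data/output
resolve_dir /opt/gauquelin5 /var/g5/tmp   # prints /var/g5/tmp
```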
It contains only one section :
postgresql. Specify here the parameters used to connect to a local postgresql database.
See the page about geonames for details.
- postgresql specifies the connection parameters, which can be identical to or different from those of the main g5 database.
- name specifies the user name used to call the geonames web service.
php run-g5.php
A message saying that you must provide supplementary arguments is displayed :
WRONG USAGE - run-g5.php needs at least 3 arguments
-------
Usage : php run-g5.php <argument1> <argument2> <argument3> [optional arguments]
Example : php run-g5.php cura A2 raw2csv
-------
Possible values for argument1 : acts, csicop, cura, db, g55, newalch, wd
The program uses 3 arguments :
- argument1 : represents in general an information source, like cura.
- argument2 : represents in general one or several files contained in a given information source.
- argument3 : represents in general a treatment done on a given file.
Each time an incomplete command is given to the program, it prints the general error message and prints the possible values for the next missing argument.
Example 1
php run-g5.php cura
WRONG USAGE - need at least 3 arguments
... (general message) ...
Possible argument2 for argument1 = cura : all, look, A, A1, A2, A3, A4, A5, A6, D6, D10, E1, E3
Example 2
php run-g5.php cura A3
WRONG USAGE - need at least 3 arguments
... (general message) ...
Possible argument3 for cura / A3 : build, export, look, raw2tmp, tmp2db, tweak2tmp
Example 3
php run-g5.php cura A3 raw2tmp
This does a real transformation (converts the A3 raw html file to a csv file in data/tmp/cura).
As explained in the page about g5 organisation, the program first converts raw data to temporary data, and then imports the temporary files into the database.
The different steps must be executed in a precise order, because some steps need the result of previous executions to work.
The order of execution is given by the code of class
php run-g5.php db init all
PARAMETER MISSING
Possible values for parameter :
  tmp : Build files in data/tmp
  db : Fill database with tmp files
  all : Build tmp files and fill db
If 'db' or 'all' is chosen, it also drops existing tables and creates empty ones.
Then the following command builds the database from scratch :
php run-g5.php db init all all
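Because some steps need the result of previous ones, a script that chains several commands by hand should stop as soon as one of them fails. The sketch below shows one way to do it. The function name run_in_order is hypothetical, and the order used in the usage note (raw2tmp before tmp2db) is an assumption drawn from the raw, tmp, database flow described above; the authoritative order is the one defined by the program's code.

```shell
# run_in_order STEP... : each STEP is one quoted string holding the arguments
# of run-g5.php, for example "cura A3 raw2tmp".
# Must be called from the root directory.
# Stops at the first failing step and returns its exit status.
run_in_order() {
    for step in "$@"; do
        echo "==> php run-g5.php $step"
        # $step is intentionally left unquoted so that it is split into separate arguments.
        php run-g5.php $step || {
            rc=$?
            echo "step failed: $step" >&2
            return "$rc"
        }
    done
}
```

A possible call, under the ordering assumption stated above, is run_in_order "cura A3 raw2tmp" "cura A3 tmp2db".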
A specific export was written for each historical file, because some fields coming from the raw files are copied in the output.
So each file has a specific command to generate a csv file, for example :
php run-g5.php cura A2 export
php run-g5.php newalch muller1083 export
Generic exports can also generate files from the database (currently only by profession code).
Profession codes and target file must be specified, for example :
php run-g5.php db export occu SP data/output/new/sport/sportsmen.csv
php run-g5.php db export occu WR+JO data/output/new/letters/writers+journalists.csv
A more flexible mechanism needs to be developed to specify precisely what to output.
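The generic export can be wrapped in a small helper. This sketch joins several profession codes with +, as in WR+JO above, and creates the directory of the target file first, on the assumption (not verified here) that the program does not create missing directories itself. The names join_codes and export_occu are hypothetical.

```shell
# join_codes CODE... : joins profession codes with "+" (WR JO becomes WR+JO).
join_codes() {
    joined=""
    for code in "$@"; do
        joined="${joined:+$joined+}$code"
    done
    printf '%s\n' "$joined"
}

# export_occu TARGET CODE... : creates the directory of TARGET, then runs the export.
# Must be called from the root directory when TARGET is a relative path.
export_occu() {
    target=$1
    shift
    mkdir -p "$(dirname "$target")" || return 1
    php run-g5.php db export occu "$(join_codes "$@")" "$target"
}
```

For example, export_occu data/output/new/letters/writers+journalists.csv WR JO runs the second command shown above.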