Data transformations
Here is a summary of the data manipulated by the program.
Auxiliary data
Not directly used by the program. Useful as a reference, to check that g5 has not introduced errors.
Contains copies of original documents, like scans or files.
Versioned in another repository: github.com/tig12/g5-aux
Raw data
Input of g5.
Contains usable versions of auxiliary data, like files converted to UTF-8
or scans transformed into lists through OCR and human corrections.
The conversion from auxiliary to raw data is done by humans, not by the program.
Versioned with g5 code, in
Raw data are sanitized, corrected, standardized and stored in temporary CSV files.
These CSV files are stored by default in directory
Human corrections and additions
The conversion between
tmp uses human corrections stored in YAML files.
Versioned with g5, in
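The step from raw data to temporary CSV files can be pictured with a minimal Python sketch. It is an illustration only, not g5's actual code: the field names (id, name, date), the normalization rules and the structure of the corrections are all assumptions, and a plain dictionary stands in for the content of a YAML corrections file.

```python
import csv
import io

def sanitize(row):
    """Trim whitespace and normalize the (hypothetical) name and date fields."""
    return {
        "id": row["id"].strip(),
        "name": " ".join(row["name"].split()).title(),
        "date": row["date"].strip().replace("/", "-"),
    }

def apply_corrections(rows, corrections):
    """Overlay human corrections (keyed by record id) on sanitized rows."""
    for row in rows:
        fixed = dict(row)
        fixed.update(corrections.get(row["id"], {}))
        yield fixed

def to_csv(rows, fieldnames):
    """Serialize rows to CSV text, as it would be written to a temporary file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

raw_rows = [
    {"id": " A1 ", "name": "  jean   DUPONT ", "date": "1901/02/03"},
    {"id": "A2", "name": "marie curie", "date": "1867-11-07"},
]
# Stand-in for the content of a YAML corrections file (hypothetical structure).
corrections = {"A1": {"date": "1901-02-04"}}

sanitized = [sanitize(r) for r in raw_rows]
csv_text = to_csv(apply_corrections(sanitized, corrections), ["id", "name", "date"])
print(csv_text)
```

Keeping the corrections separate from the raw rows means the raw files stay untouched, and every human fix remains visible and reviewable on its own.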
Temporary data in data/tmp are then loaded and merged into a PostgreSQL database.
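As a rough illustration of loading and merging, the sketch below reads two CSV texts and merges them into a single table. Several things here are assumptions made for the example: Python's built-in sqlite3 module is used as a stand-in for PostgreSQL so that the code runs without a server, the person table and its columns are hypothetical, and the merge rule (keep stored values, fill only empty ones) is not necessarily the rule g5 applies.

```python
import csv
import io
import sqlite3

def load_csv(conn, csv_text):
    """Insert CSV rows into the person table; an existing id is completed, not duplicated."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        existing = conn.execute(
            "SELECT name, birth_date FROM person WHERE id = ?", (row["id"],)
        ).fetchone()
        if existing is None:
            conn.execute(
                "INSERT INTO person (id, name, birth_date) VALUES (?, ?, ?)",
                (row["id"], row["name"], row["birth_date"]),
            )
        else:
            # Keep values already stored; fill only the empty ones.
            name = existing[0] or row["name"]
            birth_date = existing[1] or row["birth_date"]
            conn.execute(
                "UPDATE person SET name = ?, birth_date = ? WHERE id = ?",
                (name, birth_date, row["id"]),
            )
    conn.commit()

# In-memory SQLite database, used here instead of PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, birth_date TEXT)")

# Two hypothetical temporary files describing overlapping sets of persons.
first_file = "id,name,birth_date\nA1,Jean Dupont,\nA2,Marie Curie,1867-11-07\n"
second_file = "id,name,birth_date\nA1,Jean Dupont,1901-02-04\nA3,Paul Martin,1910-05-06\n"
for text in (first_file, second_file):
    load_csv(conn, text)

people = conn.execute("SELECT id, name, birth_date FROM person ORDER BY id").fetchall()
print(people)
```

In this sketch, record A1 appears in both files and ends up as a single row whose missing birth date is filled from the second file.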
gauquelin5
└── data
    ├── auxiliary   # can be absent or removed
    ├── raw         # location imposed (because versioned with g5)
    ├── tmp         # location set in config.yml
    ├── db          # location imposed (because versioned with g5)
    └── output      # location set in config.yml

Note: The fact that raw data are versioned with the program has an interesting consequence:
Cloning the g5 repository is enough to build the database from scratch.