Output format

This page describes the format of the generated csv files, available for download.
The G5 database contains structured information, richer than the generated files. Arbitrary choices were made to build these csv files. Other files can be generated on demand.
In the csv files, the first line contains the field names ; the other lines contain data.
The field separator used in the csv files is ; (semicolon).
Character encoding is UTF-8.
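
As an illustration, here is a minimal sketch of reading such a file with Python's standard csv module. It relies only on the three properties stated above (header line, semicolon separator, UTF-8). The function name read_g5_csv and the sample row are made up for the example and do not come from a real generated file.

```python
import csv
import io

def read_g5_csv(stream):
    """Return the rows of a generated csv file as a list of dicts keyed by field name."""
    # The first line holds the field names; the separator is a semicolon.
    reader = csv.DictReader(stream, delimiter=";")
    return list(reader)

# Hypothetical sample content. A real file would rather be opened with
# open(path, encoding="utf-8", newline="").
sample = io.StringIO("GQID;FNAME;GNAME\nA2-721;Dupont;Jean\n")
rows = read_g5_csv(sample)
```

Each row is then a dictionary, so a field is read as rows[0]["GQID"].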

Common fields

Some csv files contain fields that are not listed here.
In this case, these fields are documented in the README files accompanying the generated files.
Field name Comments
Ids Used to merge historical data
GQID Gauquelin id
Looks like A2-721 ; built from the Cura file name and the field NUM.
See the definition of GQID.
Corresponds to fields QUEL and GNR of the Ertel 4391 raw file.
MUID Müller id
Corresponds to field NR of file Müller 1083.
See the definition of MUID.
ERID Ertel id ; only useful for sportsmen.
Corresponds to field NR of file Ertel 4391.
CSID CSICOP id ; only useful for sportsmen.
Corresponds to field CSINR of file Ertel 4391.
WDID Wikidata id ; not integrated yet.
Looks like Q41390.
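
For example, a GQID can be split back into its two parts. This is only a sketch: it assumes that the part before the first hyphen is the Cura file name and that the rest is a purely numeric NUM, which the example A2-721 suggests but which is not guaranteed here. The function name split_gqid is hypothetical.

```python
def split_gqid(gqid):
    """Split a GQID such as 'A2-721' into (Cura file name, NUM as int)."""
    # Assumption: exactly one meaningful hyphen, file name first, NUM second.
    cura_file, num = gqid.split("-", 1)
    # Assumption: NUM is numeric; int() raises ValueError otherwise.
    return cura_file, int(num)

cura_file, num = split_gqid("A2-721")
```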
Names
FNAME Family name.
GNAME Given name.
NOB Nobiliary particle.
Dates
DATE Legal date = date and time as written in the birth certificate.
ISO 8601 format, of the form YYYY-MM-DD HH:MM.
Example : 2017-05-03 09:30.
The timezone offset is not included ; time precision is the minute (seconds are omitted).
TZO Timezone offset.
Format sHH:MM:SS, where s is the sign (+ or -).
Ex: +01:00:00 ; -08:30:00.
Seconds can be omitted.
Ex: +01:00 ; -08:30.
Negative for places west of Greenwich, positive for places east.

WARNING : this offset follows the standard definition of a timezone offset, but some books (like "Problèmes de l'heure résolus pour le monde entier", Françoise Schneider-Gauquelin) use the opposite sign.
For the offset used in g5,
legal time = UTC + offset
or
UTC = legal time - offset
More details on page Time and timezone.
DATE-UT Date and time converted to universal time.
This field is in theory redundant, as it can be computed from DATE and TZO.
Its presence is necessary because Cura files are sometimes expressed in UT. This problem is detailed in page Time and timezone.
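
The relation UTC = legal time - offset can be sketched as follows. This is an illustration and not the code used by g5: it assumes that DATE holds a complete date and time and that TZO follows the sHH:MM:SS or sHH:MM form described above. The function names parse_tzo and legal_to_ut are hypothetical.

```python
from datetime import datetime, timedelta

def parse_tzo(tzo):
    """Parse 'sHH:MM:SS' or 'sHH:MM' into a signed timedelta."""
    sign = -1 if tzo[0] == "-" else 1
    parts = [int(p) for p in tzo[1:].split(":")]
    parts += [0] * (3 - len(parts))  # seconds may be omitted
    hours, minutes, seconds = parts
    return sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)

def legal_to_ut(date, tzo):
    """Apply UTC = legal time - offset to a DATE / TZO pair."""
    legal = datetime.strptime(date, "%Y-%m-%d %H:%M")
    return legal - parse_tzo(tzo)

ut = legal_to_ut("2017-05-03 09:30", "+01:00")
print(ut.strftime("%Y-%m-%d %H:%M"))  # 2017-05-03 08:30
```

With a negative offset such as -08:30, the subtraction adds 8 hours 30 minutes to the legal time, which matches the sign convention given for TZO.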
Geographical information
PLACE Birth place name.
CY Country code ; ISO 3166, 2-letter format.
Corresponds to field COU in cura.free.fr files.
C2 Administrative division level 2
Means "département" in France ; means "state" in the USA.
Corresponds to field COD in cura.free.fr files.
Corresponds to ADM2 in geonames.org ("second-order administrative division").
C3 Administrative division level 3
Used for arrondissements (Paris, Lyon, Marseille) = division of a big city into different parts.
Does NOT correspond to ADM3 in geonames.org ; in geonames.org, different arrondissements are modeled as different PLACE, and have different geonames id.
GEOID geonames.org unique identifier of the place.
LG Longitude in decimal degrees.
LAT Latitude in decimal degrees.
Other fields
G Gender
F or M in general.
OCCU Occupation code
Corresponds to field PRO in cura files.
When a person has more than one occupation code, the codes are separated by "+". Ex : PO+WR means that the person is identified as a politician and a writer.
See below for codes used in g5.
NOTES Free notes
Only when useful. In files of series A, E1, E3.
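
As an example of handling the OCCU field, a value can be split on "+" to recover the individual codes. This is a minimal sketch ; the function name split_occu is hypothetical, and treating an empty field as "no code" is an assumption, not something stated on this page.

```python
def split_occu(occu):
    """Return the list of occupation codes contained in an OCCU value."""
    # Assumption: an empty field means that no occupation code is known.
    return [code for code in occu.split("+") if code]

codes = split_occu("PO+WR")  # ['PO', 'WR']
```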

Occupation codes

I didn't find a convenient standard for occupation codes, so I built arbitrary codes.
Here is a complete list of occupation codes, used in all generated files.

2 letters are used for general categories (ex: AR = artists).
3 letters are used for more precise occupations (ex: MUS = musicians).
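
This length convention makes it possible to tell the two kinds of codes apart without looking them up. A sketch, with a hypothetical function name occu_level, assuming that every code has exactly 2 or 3 letters:

```python
def occu_level(code):
    """Classify an occupation code as 'general' (2 letters) or 'precise' (3 letters)."""
    if len(code) == 2:
        return "general"
    if len(code) == 3:
        return "precise"
    # Any other length is outside the convention described above.
    raise ValueError(f"unexpected occupation code length: {code!r}")

level = occu_level("MUS")  # 'precise'
```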

Note : these lists were generated from data/build/occu.yml by the command
php run-g5.php db look occu

General codes

Code  Label (fr)    Label (en)
AR    Artiste       Artist
EX    Dirigeant     Executive
JO    Journaliste   Journalist
MI    Militaire     Military
PH    Médecin       Physician
PO    Politicien    Politician
SC    Scientifique  Scientist
SP    Sportif       Sport champion
WR    Ecrivain      Writer
XX    Divers        Various

Artists

Code  Label (fr)                     Label (en)
ACT   Acteur                         Actor
CAR   Réalisateur de dessins animés  Cartoonist
CMB   Chef d'orchestre militaire     Conductor of military band
DAN   Danseur                        Dancer
MUS   Musicien                       Musician
OPE   Chanteur d'opéra               Opera singer
PAI   Peintre                        Painter
PHO   Photographe                    Photographer

Sports

Code  Label (fr)           Label (en)
ATH   Athlétisme           Track and field
AUT   Auto-moto            Auto-moto
AVI   Aviation             Aviation
AVR   Aviron               Rowing
BAS   Basketball           Basketball
BIL   Billard              Billard
BOX   Boxe                 Boxing
CAN   Canoë-kayak          Canoe-kayak
CYC   Cyclisme             Cyclism
EQU   Equitation           Equestrian
ESC   Escrime              Fencing
FEM   Sports féminins      Female sports
FOO   Football             Football
GLA   Sports de glace      Bobsleigh and Skating
GOL   Golf                 Golf
GYM   Gymnastique          Gymnastic
HAL   Haltérophilie        Weightlifting
HAN   Handball             Handball
HOC   Hockey               Hockey
LUT   Lutte                Wrestling
MAR   Marche               Walking
NAT   Natation             Swimming
PAT   Patin à roulettes    roller skate
PEL   Pelote basque        Pelote basque
RUG   Rugby et Jeu à XIII  Rugby and Rugby league
SKI   Ski                  Ski
TEN   Tennis               Tennis
TIR   Tir                  Shooting
VOI   Voile                Sailing
VOL   Volley ball          Volley ball
Note : the code FEM was introduced to restore the original list of Gauquelin 1955, but is not part of the Cura codes ; in file A1, the women concerned are classified under their discipline.