The raw data
Raw files
From data.gouv.fr, we download one file per year:
data.gouv.fr
└── datasets
    └── fichier-des-personnes-decedees
        ├── deces-1970.txt
        ├── ...
        └── deces-2025.txt
Each file contains one line per person, for a total of 28 917 511 lines.
For example, here are the first lines of deces-1970.txt:
DUCRET*MARIE ANTOINETTE/ 21922010901004AMBERIEU-EN-BUGEY 19701210014216
GRANGEON*ERIC JEAN REMY/ 11969032901004AMBERIEU-EN-BUGEY 19700425693831059
VELLET*PHILIPPE/ 11970020101004AMBERIEU-EN-BUGEY 197002030100412
PRESSAVIN*LYDIE/ 21970040601004AMBERIEU-EN-BUGEY 197004060100433
DOUAT*MARIE-SYLVIA MARTINE/ 21970070801004AMBERIEU-EN-BUGEY 1970070801053457
ROSIER*FELIX/ 11891112501004AMBERIEU-EN-BUGEY 197011143001215
BOUVEYRON*PIERRE/ 11900042701005AMBERIEUX-EN-DOMBES 19701211693832094
MILLET*MARIE-LOUISE/ 21900082901017ARGIS 19701225060885310
GIVORD*JACQUES/ 11910081201026BAGE-LE-CHATEL 19701124060884880
CROZET*MARIE CECILE/ 21904092101029BEAUPONT 19701102392093
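To make the record layout concrete, here is a minimal Python sketch of how one raw line could be split into fields. It assumes the fixed-width layout published by INSEE for this file (80 characters for the name, then 1, 8, 5, 30, 30, 8, 5 and 9 characters); these widths are an assumption, not taken from this page, and they cannot be checked against the sample above, where runs of spaces have been collapsed. The function name parse_line and the reading of fname as family name and gname as given names are also assumptions. This is not the code used by g5.

```python
# Assumed fixed-width layout (field name, width in characters).
# The names reuse the columns of the person table; the widths are an assumption.
FIELDS = [
    ("name", 80),      # "FAMILY*GIVEN NAMES/"
    ("sex", 1),        # 1 or 2
    ("bday", 8),       # birth date, YYYYMMDD
    ("bcode", 5),      # birth place code
    ("bname", 30),     # birth place name
    ("bcountry", 30),  # birth country, often empty
    ("dday", 8),       # death date, YYYYMMDD
    ("dcode", 5),      # death place code
    ("dact", 9),       # death act number
]


def parse_line(line):
    """Split one fixed-width record into a dict of stripped strings."""
    record = {}
    pos = 0
    for field, width in FIELDS:
        record[field] = line[pos:pos + width].strip()
        pos += width
    # The name field holds "FAMILY*GIVEN NAMES/": split it on the first "*".
    family, _, given = record.pop("name").partition("*")
    record["fname"] = family
    record["gname"] = given.rstrip("/")
    return record


# Rebuild the first sample record with the padding assumed above.
sample = (
    "DUCRET*MARIE ANTOINETTE/".ljust(80)
    + "2" + "19220109" + "01004"
    + "AMBERIEU-EN-BUGEY".ljust(30)
    + "".ljust(30)
    + "19701210" + "01421"
    + "6".ljust(9)
)
parsed = parse_line(sample)
```

Slicing by position rather than splitting on spaces matters here because names and place names themselves contain spaces.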
death-fr.sqlite3
These text files are first loaded into an SQLite database, death-fr.sqlite3.
It contains one table, person, with this structure:
create table person(
    fname varchar(80),
    gname varchar(80),
    sex character(1),
    bday character(8),
    bcode character(5),
    bname character(30),
    bcountry varchar(80),
    dday character(8),
    dcode character(5),
    dact character(9)
);
create index idx_bday ON person(bday);
create index idx_dday ON person(dday);
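As an illustration of how this table can be queried, the following Python sketch recreates the schema in an in-memory SQLite database, inserts two rows taken from the sample lines of deces-1970.txt, and selects deaths in a date range. It assumes that bday and dday are stored as YYYYMMDD strings (suggested by character(8) but not stated here), so that string comparison follows chronological order, and that sex is stored as it appears in the raw file. The helper names open_demo_db and deaths_between are hypothetical.

```python
import sqlite3

# Same structure as the person table described above.
SCHEMA = """
create table person(
    fname varchar(80),
    gname varchar(80),
    sex character(1),
    bday character(8),
    bcode character(5),
    bname character(30),
    bcountry varchar(80),
    dday character(8),
    dcode character(5),
    dact character(9)
);
create index idx_bday on person(bday);
create index idx_dday on person(dday);
"""

# Two rows derived from the sample lines; the stored values are an assumption.
ROWS = [
    ("DUCRET", "MARIE ANTOINETTE", "2", "19220109", "01004",
     "AMBERIEU-EN-BUGEY", "", "19701210", "01421", "6"),
    ("VELLET", "PHILIPPE", "1", "19700201", "01004",
     "AMBERIEU-EN-BUGEY", "", "19700203", "01004", "12"),
]


def open_demo_db():
    """Create an in-memory database with the schema and the demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into person values (?,?,?,?,?,?,?,?,?,?)", ROWS)
    conn.commit()
    return conn


def deaths_between(conn, start, end):
    """Return (fname, gname, dday) for deaths with start <= dday <= end."""
    cur = conn.execute(
        "select fname, gname, dday from person"
        " where dday between ? and ? order by dday, fname",
        (start, end),
    )
    return cur.fetchall()


conn = open_demo_db()
first_half_1970 = deaths_between(conn, "19700101", "19700630")
```

A range condition on dday, as in deaths_between, is the kind of query the idx_dday index is meant to serve; the same applies to bday and idx_bday.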
Build death-fr.sqlite3
death-fr.sqlite3 is built by another program called g5 (github.com/tig12/g5).
The build process is described on github.com/tig12/g5/tree/main/src/commands/enrich/deathfr.
php run-g5.php enrich deathfr raw2sqlite 1970-2025 > data/tmp/enrich/death-fr/sqlite-build-report.log
-------------------------------------------------------
Total Execution time: 547.3 s - 00:12:07
-------------------------------------------------------
28 917 511 lines parsed
28 803 832 lines inserted
----------------------- ERRORS ------------------------
ERR_NAME: 67 incorrect name - inserted anyway
ERR_BDAY: 112 808 incorrect birth day - not inserted
ERR_DDAY: 798 incorrect death day - not inserted
ERR_POSTERIOR: 71 birth posterior to death - not inserted
ERR_EXCEPTION: 2 exception
=> skipped 113 677 lines because of date problem
This shows that the files of data.gouv.fr (1970 - 2025) contain 28 917 511 lines, of which around 114 000 are incorrect and were not inserted.
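The figures in this report can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers printed in the report. Treating the 2 exceptions as the remaining non-inserted lines is an interpretation, not something the report states explicitly. The names REPORT and reconcile are hypothetical.

```python
# Figures copied from the build report above.
REPORT = {
    "parsed": 28_917_511,
    "inserted": 28_803_832,
    "ERR_BDAY": 112_808,
    "ERR_DDAY": 798,
    "ERR_POSTERIOR": 71,
    "ERR_EXCEPTION": 2,
}


def reconcile(report):
    """Compare the per-error counts with the parsed/inserted difference."""
    date_errors = (
        report["ERR_BDAY"] + report["ERR_DDAY"] + report["ERR_POSTERIOR"]
    )
    not_inserted = report["parsed"] - report["inserted"]
    return {
        "date_errors": date_errors,
        "not_inserted": not_inserted,
        # Lines not explained by date errors or exceptions (assumed reading).
        "unexplained": not_inserted - date_errors - report["ERR_EXCEPTION"],
        "rejected_percent": 100 * not_inserted / report["parsed"],
    }


summary = reconcile(REPORT)
```

The three date-related error counts add up to the 113 677 skipped lines, and together with the 2 exceptions they match the difference between parsed and inserted lines (113 679), which is a little under 0.4 % of the input. The 67 ERR_NAME lines do not enter this difference because they are inserted anyway.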