Deaths in France | Observe software

The raw data

Raw files

From data.gouv.fr, we download one file per year:

data.gouv.fr
    └── datasets
        └── fichier-des-personnes-decedees
            ├── deces-1970.txt
            ├── ...
            └── deces-2025.txt

Each file contains one line per person, for a total of 28 917 511 lines across all files.
Here are the first lines of deces-1970.txt:

DUCRET*MARIE ANTOINETTE/                                                        21922010901004AMBERIEU-EN-BUGEY                                           19701210014216                              
GRANGEON*ERIC JEAN REMY/                                                        11969032901004AMBERIEU-EN-BUGEY                                           19700425693831059                           
VELLET*PHILIPPE/                                                                11970020101004AMBERIEU-EN-BUGEY                                           197002030100412                             
PRESSAVIN*LYDIE/                                                                21970040601004AMBERIEU-EN-BUGEY                                           197004060100433                             
DOUAT*MARIE-SYLVIA MARTINE/                                                     21970070801004AMBERIEU-EN-BUGEY                                           1970070801053457                            
ROSIER*FELIX/                                                                   11891112501004AMBERIEU-EN-BUGEY                                           197011143001215                             
BOUVEYRON*PIERRE/                                                               11900042701005AMBERIEUX-EN-DOMBES                                         19701211693832094                           
MILLET*MARIE-LOUISE/                                                            21900082901017ARGIS                                                       19701225060885310                           
GIVORD*JACQUES/                                                                 11910081201026BAGE-LE-CHATEL                                              19701124060884880                           
CROZET*MARIE CECILE/                                                            21904092101029BEAUPONT                                                    19701102392093
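The lines above are fixed-width records. As a minimal sketch, they can be split as shown below. The field widths are an assumption inferred from the sample lines and from the columns of the person table; they are not taken from the g5 source code. The names FIELDS and parse_line are hypothetical.

```python
# Assumed layout (name, width), inferred from the sample lines above.
FIELDS = [
    ("name", 80),      # "FAMILY NAME*GIVEN NAMES/"
    ("sex", 1),        # 1 or 2 in the sample
    ("bday", 8),       # birth date, YYYYMMDD
    ("bcode", 5),      # birth place code
    ("bname", 30),     # birth place name
    ("bcountry", 30),  # birth country (blank in the sample)
    ("dday", 8),       # death date, YYYYMMDD
    ("dcode", 5),      # death place code
    ("dact", 9),       # death certificate number
]


def parse_line(line: str) -> dict:
    """Split one fixed-width line into a dict of stripped string fields."""
    record = {}
    pos = 0
    for name, width in FIELDS:
        # Slicing past the end of a short line simply yields a shorter string.
        record[name] = line[pos:pos + width].strip()
        pos += width
    # The name field has the form "FAMILY*GIVEN/": split it in two.
    fname, _, gname = record.pop("name").rstrip("/").partition("*")
    record["fname"] = fname
    record["gname"] = gname
    return record
```

Applied to the first sample line, this sketch yields fname DUCRET, gname MARIE ANTOINETTE, bday 19220109 and dday 19701210.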

death-fr.sqlite3

These text files are first loaded into an SQLite database, death-fr.sqlite3.
It contains one table, person, with this structure:

create table person(
    fname varchar(80),
    gname varchar(80),
    sex character(1),
    bday character(8),
    bcode character(5),
    bname character(30),
    bcountry varchar(80),
    dday character(8),
    dcode character(5),
    dact character(9)
);
create index idx_bday ON person(bday);
create index idx_dday ON person(dday);
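As an illustration of how this structure can be queried, here is a small sketch using Python's sqlite3 module on an in-memory copy of the schema. It assumes that bday and dday hold YYYYMMDD strings, as in the raw files, so that string comparison follows chronological order. The helpers open_demo_db and born_between are hypothetical names.

```python
import sqlite3

# Schema copied from the structure shown above.
SCHEMA = """
create table person(
    fname varchar(80),
    gname varchar(80),
    sex character(1),
    bday character(8),
    bcode character(5),
    bname character(30),
    bcountry varchar(80),
    dday character(8),
    dcode character(5),
    dact character(9)
);
create index idx_bday ON person(bday);
create index idx_dday ON person(dday);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the person table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def born_between(conn: sqlite3.Connection, first: str, last: str) -> list:
    """Persons born between two YYYYMMDD dates (inclusive), oldest first.

    Assumes dates are stored as YYYYMMDD strings, so that a plain
    string range on the indexed bday column is also a date range.
    """
    sql = ("select fname, gname, bday from person "
           "where bday between ? and ? order by bday")
    return conn.execute(sql, (first, last)).fetchall()
```

For example, born_between(conn, "18900101", "18991231") returns the persons born in the 1890s.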

death-fr.sqlite3 is built by another program called g5, github.com/tig12/g5.
The build process is described on github.com/tig12/g5/tree/main/src/commands/enrich/deathfr.

php run-g5.php enrich deathfr raw2sqlite 1970-2025 > data/tmp/enrich/death-fr/sqlite-build-report.log

-------------------------------------------------------
Total Execution time: 547.3 s - 00:12:07
-------------------------------------------------------
28 917 511 lines parsed
28 803 832 lines inserted
----------------------- ERRORS ------------------------
ERR_NAME:       67 incorrect name - inserted anyway
ERR_BDAY:       112 808 incorrect birth day - not inserted
ERR_DDAY:       798 incorrect death day - not inserted
ERR_POSTERIOR:  71 birth posterior to death - not inserted
ERR_EXCEPTION:  2 exceptions
=> skipped      113 677 lines because of date problem (0.4 %)
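In this report, the three "not inserted" counters add up to the skipped lines (112 808 + 798 + 71 = 113 677). The date checks behind these counters could look like the sketch below. This is an assumption about the logic, not the code used by g5: the order of the checks and the helper names parse_day and check_dates are hypothetical.

```python
from datetime import date
from typing import Optional


def parse_day(s: str) -> Optional[date]:
    """Return a date for a YYYYMMDD string, or None if it is not a valid day."""
    if len(s) != 8 or not s.isdigit():
        return None
    try:
        return date(int(s[:4]), int(s[4:6]), int(s[6:]))
    except ValueError:
        # Month or day out of range, for example 19700231.
        return None


def check_dates(bday: str, dday: str) -> Optional[str]:
    """Return an error code from the report, or None when both dates are usable.

    Assumed order of checks: birth day, then death day, then chronology.
    """
    birth = parse_day(bday)
    if birth is None:
        return "ERR_BDAY"
    death = parse_day(dday)
    if death is None:
        return "ERR_DDAY"
    if birth > death:
        return "ERR_POSTERIOR"
    return None
```

With this sketch, a hypothetical birth day such as 19000000 is rejected as ERR_BDAY because month 00 does not exist.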

Cleaning the data

In this first attempt, rows containing errors are only partially eliminated:

The day of birth distribution shows an excess of persons born on January 1st.
The age at death distribution shows that some persons lived 144 years (the world record is 122 years)!

This should be fixed; see the related question.
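Both symptoms can be looked for directly in the person table. The sketch below is one possible way to do it, assuming bday and dday are YYYYMMDD strings. The function names are hypothetical and the queries are not taken from the Observe code.

```python
import sqlite3


def births_by_day_of_year(conn: sqlite3.Connection) -> list:
    """Count persons per birth month and day (MMDD), most frequent first."""
    sql = ("select substr(bday, 5, 4) as mmdd, count(*) as n "
           "from person group by mmdd order by n desc, mmdd")
    return conn.execute(sql).fetchall()


def older_than(conn: sqlite3.Connection, max_years: int) -> list:
    """Rows whose age at death, in completed years, exceeds max_years.

    With YYYYMMDD values, (dday - bday) / 10000 in integer arithmetic
    gives the number of completed years between the two dates.
    """
    sql = """
        select fname, gname, bday, dday,
               (cast(dday as integer) - cast(bday as integer)) / 10000 as age
        from person
        where (cast(dday as integer) - cast(bday as integer)) / 10000 > ?
        order by age desc
    """
    return conn.execute(sql, (max_years,)).fetchall()
```

An excess on January 1st would show up as 0101 at the top of the first result, and older_than(conn, 122) lists the rows whose age at death is above the world record.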

Two variants

Full dataset

In the distributions of the full dataset, the interaspects between same planets show an anomaly for all planets: a significant excess of planets occupy the same position at death as at birth.
However, the distribution of age at death shows a peak of deaths just after birth, which could explain this anomaly: a person who dies shortly after birth has planets at nearly the same positions on both dates.

Filtered dataset

That is why distributions were computed for a second variant, from which all persons deceased before their first birthday were removed. In this variant, the interaspects between same planets show no anomaly, which leads to the conclusion that the effect was due to demography.
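The criterion of the filtered variant can be expressed as a small predicate. This is a sketch assuming YYYYMMDD strings for both dates; how the filter is actually implemented in Observe is not described here, and the function name is hypothetical.

```python
def died_before_first_birthday(bday: str, dday: str) -> bool:
    """True when death occurs strictly before the first birthday.

    With YYYYMMDD values, a difference below 10000 means that less than
    one full calendar year separates birth and death.
    """
    return int(dday) - int(bday) < 10000


def filtered_variant(rows):
    """Keep only (bday, dday) pairs of persons who reached their first birthday."""
    return [(b, d) for (b, d) in rows if not died_before_first_birthday(b, d)]
```

In the sample of deces-1970.txt shown earlier, the rows with bday 19700201 and dday 19700203, or with bday and dday both equal to 19700406, would be removed by this predicate.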

Execution

To build the control groups, death-fr.sqlite3 is queried directly, which avoids loading data.csv.bz2 in memory.
death-fr.sqlite3 is queried in batches of 1000 rows; the distributions are computed for these rows and stored in an intermediate database, var/studies/death-fr/tmp.sqlite3 (the distributions are encoded in JSON and stored in a text field).
At the end of each batch of 1000 rows, the current distributions are added to the distributions stored in tmp.sqlite3.
This makes it possible to stop and resume the computation of a control group without losing the computations already done.
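The batch-and-resume mechanism described above can be sketched as follows. Everything specific in this example is an assumption made for illustration: the progress table and its columns, the use of rowid to remember the last processed row, and batch_distribution, which counts deaths per month as a stand-in for the real distributions.

```python
import json
import sqlite3
from collections import Counter

BATCH_SIZE = 1000


def init_tmp(tmp: sqlite3.Connection) -> None:
    """Create the (hypothetical) single-row progress table if needed."""
    tmp.execute(
        "create table if not exists progress("
        "id integer primary key check (id = 1), "
        "last_rowid integer not null, "
        "distrib text not null)"
    )
    tmp.execute("insert or ignore into progress values (1, 0, '{}')")
    tmp.commit()


def batch_distribution(rows) -> Counter:
    """Stand-in for the real distributions: number of deaths per month."""
    return Counter(dday[4:6] for (_rowid, _bday, dday) in rows)


def run(src: sqlite3.Connection, tmp: sqlite3.Connection, max_batches=None) -> dict:
    """Process person rows batch by batch, resuming after the last stored rowid."""
    init_tmp(tmp)
    done = 0
    while max_batches is None or done < max_batches:
        last_rowid, stored = tmp.execute(
            "select last_rowid, distrib from progress where id = 1"
        ).fetchone()
        rows = src.execute(
            "select rowid, bday, dday from person "
            "where rowid > ? order by rowid limit ?",
            (last_rowid, BATCH_SIZE),
        ).fetchall()
        if not rows:
            break
        # Add the distribution of this batch to the one already stored.
        total = Counter(json.loads(stored))
        total.update(batch_distribution(rows))
        # Position and distribution are saved in one transaction, so an
        # interruption never leaves them out of sync.
        with tmp:
            tmp.execute(
                "update progress set last_rowid = ?, distrib = ? where id = 1",
                (rows[-1][0], json.dumps(total)),
            )
        done += 1
    row = tmp.execute("select distrib from progress where id = 1").fetchone()
    return json.loads(row[0])
```

Calling run(src, tmp, max_batches=1) and then run(src, tmp) gives the same final result as a single uninterrupted call, because the second call starts after the last rowid stored in tmp.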

Output generated by Observe software (full dataset): observe.tig12.net/output/studies/death-fr
Output generated by Observe software (dataset without persons deceased before their first birthday): observe.tig12.net/output/studies/death-fr2