Check data
Precision, reliability

A large part of the job is achieving data reliability. Ideally, every piece of information stored in the g5 database should be checked against an official document. This is a huge task because it involves manual transcription and verification.
For persons born before World War II, the exact birth time is not known. The most precise source we can hope for is a certificate from the hospital (HC). In practice, we have birth certificates (BC) from civil registries, whose birth times are often rounded to the hour.
BCs are not ideal but are usable for statistical tests.
So, in the g5 context, a birth time is considered reliable if:
  • It is backed by a BC that is available and verifiable by anyone.
  • The transcription of the act has been approved by 2 or more persons.
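The two criteria above can be expressed as a simple predicate. This is a minimal Python sketch, not g5's actual code (g5 is written in PHP); the `BirthTimeRecord` type and its fields are hypothetical. Approvers are stored as a set of names, so the same person approving twice counts only once, which matches the "2 or more persons" wording.

```python
from dataclasses import dataclass, field


@dataclass
class BirthTimeRecord:
    """Hypothetical record for illustration; not a g5 class."""
    birth_time: str   # e.g. "14:00", as transcribed from the act
    bc_public: bool   # True if the BC is available and verifiable by anyone
    approvers: set[str] = field(default_factory=set)  # distinct persons who approved the transcription


def is_birth_time_reliable(record: BirthTimeRecord) -> bool:
    """Apply the two g5 reliability criteria to one birth time."""
    # Criterion 1: the birth time is backed by a publicly verifiable BC.
    # Criterion 2: the transcription was approved by 2 or more distinct persons.
    return record.bc_public and len(record.approvers) >= 2


record = BirthTimeRecord("14:00", bc_public=True, approvers={"alice", "bob"})
print(is_birth_time_reliable(record))  # True
```

A record checked by a single person, or backed by a BC that others cannot consult, fails the predicate.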
Currently, very few birth times have been verified, and only to resolve questions raised during g5 development.
See also the Acts page.

Trust - data reliability

Trust = level of reliability of a piece of information.
Five main levels of reliability are defined in g5:
  • 1 - Hospital Certificate (HC)
    - Original document available and verifiable by anyone.
    - Transcription approved by 2 persons or more.
  • 2 - Birth Certificate (BC)
    - Original document available and verifiable by anyone.
    - Transcription approved by 2 persons or more.
  • 3 - Birth Record (BR)
    (= copy of the BC by an officer - may contain mistakes)
    - Original document available and verifiable by anyone.
    - Transcription approved by 2 persons or more.
  • 4 - to check
    Data a priori serious (Cura, Newalchemypress, Astrodatabank) but containing errors.
    Needs to be matched against BCs.
  • 5 - the rest
    Data without birth time or grabbed from the web (wikidata, web sites).
The trust levels are defined as constants of class g5\model\DB5.

One supplementary level was introduced: 2.5, meaning that one person has checked the BC but the information has not yet been confirmed by a second person. It is not quite level 2, but better than level 3.
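Because a lower number means more reliable data, the levels can be handled as plain numbers, and 2.5 slots naturally between 2 and 3. The Python sketch below derives a level from what has been checked; the constant and function names are hypothetical (g5's real constants live in the PHP class g5\model\DB5 and may differ). Two rules are assumptions not stated above: a check never makes data less reliable (the better of the source level and the checked level is kept), and a single approval only earns a level for a BC (2.5), so an HC or BR approved by one person keeps its source level.

```python
from typing import Optional

# Hypothetical names mirroring the levels above (lower = more reliable).
TRUST_HC = 1        # hospital certificate, approved by 2+ persons
TRUST_BC = 2        # birth certificate, approved by 2+ persons
TRUST_BC_ONE = 2.5  # birth certificate checked by only 1 person
TRUST_BR = 3        # birth record, approved by 2+ persons
TRUST_CHECK = 4     # serious source, still to be matched against a BC
TRUST_REST = 5      # no birth time, or grabbed from the web


def checked_level(document: str, public: bool, approvals: int) -> Optional[float]:
    """Level earned by checking an original document, or None if no level applies.

    document:  "HC", "BC" or "BR"
    public:    the original document is available and verifiable by anyone
    approvals: number of distinct persons who approved the transcription
    """
    if not public or approvals < 1:
        return None
    if approvals >= 2:
        return {"HC": TRUST_HC, "BC": TRUST_BC, "BR": TRUST_BR}.get(document)
    # Exactly one approval: only the BC case has a defined level (assumption).
    return TRUST_BC_ONE if document == "BC" else None


def trust_level(source_level: float, document: str, public: bool, approvals: int) -> float:
    """Keep the better (numerically lower) of the source level and the checked level."""
    level = checked_level(document, public, approvals)
    return source_level if level is None else min(source_level, level)
```

For example, a level-4 person whose BC has been checked by one person gets `trust_level(TRUST_CHECK, "BC", True, 1) == 2.5`; once a second person approves, the same call with `approvals=2` returns 2.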

Most data handled by g5 are level 4; very few are level 2.5.

Note: as far as France is concerned, BCs can now be checked online. But in Gauquelin's and Müller's time, there were 2 possibilities: go to the archives in person and consult the BCs, or send a letter and receive BRs. This means that Gauquelin and Müller data are mostly based on BRs, and a BR may differ from the original BC because the officer may make a copying error, or copy the time of registration instead of the time of birth.
Raw data used by g5 may contain errors from different sources:
  • Copying error by the officer who drew up a BR.
  • Gauquelin or Müller error when integrating the BR into their files.
  • Error when the original paper files were put into electronic form (for example the error on GNR in Müller 1083 physicians).
  • Bugs in the g5 program should be added to this list...

G5 integration

Persons have 2 fields to express reliability:
  • trust specifies the default trust level of the person.
  • trust-details is an array associating specific fields with a trust level.

This model makes it possible to indicate the reliability of each field separately.
Example: if a person has trust = 4 and trust-details = {"name.official": 2.5}, all fields are at trust level 4 except one, person.data['name']['official'], which is at level 2.5.
When a person is imported into the database, it takes the trust level of its source by default.
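This model can be sketched in a few lines of Python. The sketch assumes a person is a dict with `data`, `trust` and `trust-details` keys, and that trust-details keys are the dotted field paths used in the example; the function names are hypothetical and do not reflect g5's PHP API. A field without a specific entry falls back to the person's default trust, which on import is the trust of the source.

```python
import copy


def import_person(data: dict, source_trust: float) -> dict:
    """On import, the person inherits the trust level of its source."""
    return {"data": copy.deepcopy(data), "trust": source_trust, "trust-details": {}}


def set_field_trust(person: dict, field_path: str, level: float) -> None:
    """Record a field-specific trust level, e.g. after one person checked the BC."""
    person["trust-details"][field_path] = level


def field_trust(person: dict, field_path: str) -> float:
    """Trust of one field: its trust-details entry if present, else the default trust."""
    return person["trust-details"].get(field_path, person["trust"])


person = import_person({"name": {"official": {"family": "Dupont"}}}, source_trust=4)
set_field_trust(person, "name.official", 2.5)
print(field_trust(person, "name.official"))  # 2.5
print(field_trust(person, "name.fame"))      # 4
```

Storing only the exceptions in trust-details keeps the common case (every field at the source level) cheap, and a field is promoted simply by adding one entry.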