Ertel 4391
(4384 sportsmen)

This page deals with the file 3a_sports.txt downloaded from newalchemypress.com.
It contains the sportsmen gathered by Ertel and published in 1988 in "Raising the Hurdle for the Athletes' Mars Effect: Association Co-Varies With Eminence".
This file is the central piece for merging all files related to the Mars effect: it contains the ids of Gauquelin LERRCP A1, D6 and D10, Comité Para, CSICOP and CFEPP.
  • It brings 1828 new birth dates not present in other datasets (but no birth times).
  • It makes it possible to fix all missing names of Gauquelin A1.
  • It contains eminence information, with the precise source list, which makes it possible to check and reproduce eminence tests.

A pleasant surprise was that the groups cited by Ertel in his "Raising the Hurdle" article of 1988 can be precisely reconstituted - see Ertel's subsamples.

Ertel 4391 generalities

From newalchemypress.com :
In the last year of his life, Professor Ertel kindly sent his main data-collection, of the 4,391 sports champion.
I also found information about this file in these places :
  • "Raising the Hurdle for the Athletes' Mars Effect: Association Co-Varies With Eminence" by Suitbert Ertel, published in Journal of Scientific Exploration, Vol. 2, No. 1, pp. 53-82, 1988, available on the scientificexploration.org web site.
    Referred to as [Ertel 88] in this page.
  • "The Tenacious Mars Effect" by Suitbert Ertel and Kenneth Irving, 1996.
    Referred to as [TME 96] in this page.
  • "Is the "Mars Effect" Genuine?" by Paul Kurtz, Jan Willem Nienhuys, Ranjit Sandhu, published in Journal of Scientific Exploration, Vol. 11, No. 1, pp. 19-39, 1997, available on the scientificexploration.org web site.
    Referred to as [KNS 97] in this page.

An incomplete and approximate file

One very strange feature of this file : many records don't contain a birth time. This is surprising because about 800 of the records without birth time were published by Gauquelin (LERRCP series), so their birth times were known to Ertel. As noted by newalchemypress.com, birth places, longitudes and latitudes are not given.

Analysis of this file shows several errors and omissions, for example :
  • The file contains 4384 records ; 7 records are missing to reach 4391.
  • The file contains 553 records from Gauquelin 1955 list (instead of 568).
  • In [TME 96], Ertel talks about 192 records common to CSICOP and Gauquelin (file D10).
    In the file, 192 records have a CSICOP id, but only 190 match.
    Not fixed.
  • 5 associations with CSICOP file are erroneous (out of 190 - error rate 2.63 %).
    Fixed in step tweak2tmp.
  • 26 associations with file D10 are erroneous (out of 349 - error rate 7.45 %).
    Fixed in step tweak2tmp.
  • American football players and European football (soccer) players share the same code FOOT.
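The error rates and the number of missing records quoted in the list above can be recomputed from the counts it gives. The following lines are only a small arithmetic check ; the helper name error_rate is introduced here for illustration and is not part of g5.

```python
def error_rate(n_errors: int, n_total: int) -> float:
    """Return the error rate as a percentage, rounded to 2 decimals."""
    return round(100 * n_errors / n_total, 2)

# Counts taken from the list above.
missing_records = 4391 - 4384          # records missing to reach 4391
csicop_rate = error_rate(5, 190)       # wrong associations with CSICOP file
d10_rate = error_rate(26, 349)         # wrong associations with file D10

print(missing_records, csicop_rate, d10_rate)
```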

Integration to g5

This paragraph is obsolete.
Current code extracts information from the input file
and generates a corrected file.
But data are not yet imported in g5 database.
The raw file is data/raw/newalchemypress.com/03-ertel/3a_sports-utf8.txt (a UTF-8 version of 3a_sports.txt).

Full execution

The full set of transformations can be executed with the command :
php run-g5.php ertel sport all
This is equivalent to :
php run-g5.php ertel sport raw2csv
php run-g5.php ertel sport tweak2tmp
php run-g5.php ertel sport export dl

raw2csv

The first step is to generate the file 5-newalch-csv/4391SPO.csv :
php run-g5.php ertel sport raw2csv
This step copies the contents of 3a_sports-utf8.txt to 4391SPO.csv with the following modifications :
  • Column GNUM is created, containing a Gauquelin id (a string like "A1-123") from columns QUEL and G_NR for records originating from A1, D6 and D10.
    Left empty for records of other origin.
  • Column NAME is renamed FNAME.
  • Column VORNAME is renamed GNAME.
  • Columns GEBDATUM and STUND produce a column DATE, ISO 8601.
  • Column NATION produces column CY, ISO 3166.
  • Column MF produces column G (gender), containing "M" or "F" (column is empty for males in 3a_sports-utf8.txt).
  • Column SPORTART is renamed SPORT and is converted to.
  • Column INDGRUP is renamed IG (individual or collective sport) ; contains the same value as in 3a_sports-utf8.txt.
  • In column QUEL, "*G:D10" is replaced by "G:D10" (for records with NR = 2872 and 4080).
Other columns are copied as is.
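Some of the column transformations listed above can be sketched as follows. This is not the g5 code (which is written in PHP) but a minimal Python illustration under stated assumptions : the mapping from QUEL to the Gauquelin file name is inferred from the "A1-123" example, an empty MF is taken to mean male, and the sample row is hypothetical. DATE and CY are left out because the raw formats of GEBDATUM, STUND and NATION are not described here.

```python
from typing import Dict

# Assumption: QUEL values of LERRCP records map to these file names.
QUEL_TO_FILE = {"G:A01": "A1", "G:D06": "D6", "G:D10": "D10"}

def convert_row(row: Dict[str, str]) -> Dict[str, str]:
    """Apply the renamings and the GNUM / G / QUEL rules to one record."""
    out = dict(row)
    out["QUEL"] = out["QUEL"].lstrip("*")            # "*G:D10" -> "G:D10"
    file_code = QUEL_TO_FILE.get(out["QUEL"])
    # GNUM is left empty for records that do not come from A1, D6 or D10.
    out["GNUM"] = f"{file_code}-{out['G_NR']}" if file_code else ""
    out["FNAME"] = out.pop("NAME")
    out["GNAME"] = out.pop("VORNAME")
    out["G"] = out.pop("MF") or "M"                  # empty MF means male
    out["IG"] = out.pop("INDGRUP")                   # value copied unchanged
    return out

# Hypothetical row, not taken from the real file.
sample = {"QUEL": "*G:D10", "G_NR": "123", "NAME": "Doe",
          "VORNAME": "Jane", "MF": "F", "INDGRUP": "I"}
converted = convert_row(sample)
print(converted)
```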

Fix Gauquelin A1 names

This step is included in the restoration process of file A1, not part of Ertel 4391 process.

Fixes 100% of the remaining unidentified names in A1.
php run-g5.php ertel sport fixA1
WRONG USAGE - fixA1 needs one parameter. Can be :
  'report' : echoes the list of names that will be modified by 'update'
  'update' : updates file A1
php run-g5.php ertel sport fixA1 update
Nb missing names in Gauquelin A1 : 117
Ertel 4391 contains : 2084 lines from A1
Nb corrections : 117

Generate skeptics' files

Draft code ; in 4391SPO.csv, the associations between lines and skeptic ids are sometimes wrong or missing.
Can come from a problem in 4391SPO.csv, a misunderstanding of column meanings or a bug in the code.
php run-g5.php ertel sport ertel2skeptics 
PARAMETER MISSING
Possible values for parameter :
  all          : Generate all skeptic files
  cpara        : Generate 5-cpara/535-cpara.csv
  cpara-full   : Generate 5-cpara/611-cpara-full.csv
  cpara-lowers : Generate 5-cpara/76-cpara-lowers.csv
  cfepp        : Generate 5-cfepp/925-cfepp.csv
  csicop       : Generate 5-csicop/192-csicop.csv
php run-g5.php ertel sport ertel2skeptics all
CPARA : 535 records saved - stored in data/5-tmp/cpara/535-cpara.csv
CPARA full : 611 records saved - stored in data/5-tmp/cpara/611-cpara-full.csv
CPARA lowers : 76 records saved - stored in data/5-tmp/cpara/76-cpara-lowers.csv
CSICOP : 192 records saved - stored in data/5-tmp/csicop/192-csicop.csv
CFEPPP : 925 records saved - stored in data/5-tmp/cfepp/925-cfepp.csv

Comité Para

In 4391SPO.csv, 611 records have a PARA_NR value.

In [TME 96] p SE-18, Ertel talks about 611 records :
  • 535 records published in the official test.
  • 76 records that were computed but not retained for the test because not eminent enough. Ertel called them Para Lowers and used these records to show a selection bias in Comité Para data.

This corresponds to the content of 4391SPO.csv : out of the 611 records with a PARA_NR,
  • 535 records have column QUEL = G:A01 (they come from file A1) ; they have birth date and time.
  • 76 records have column QUEL = GCPAR ; they have birth date but not birth time.

Links to CSICOP

Date comparisons done to build CSICOP test showed errors in Ertel's file.
Two records have a CSINR = 0 ; they correspond to existing records in file D10, but are absent from CSICOP file.
Ertel Id   Gauquelin id   Person
2285       D10-726        Kono Tom (Tomio) 1930-07-27
2873       D10-894        Miller John L. 1947-04-29
Ertel file also contains 5 wrong associations to CSICOP, fixed in step tweak2tmp (see file 3-edited/newalch-tweaked/4391SPO.yml for details).

One record without CSINR could be identified during CSICOP merge :
Miller Freddie 1911-04-03 ; NR = 2872 ; CSID = 254
It brings to 191 the number of CSICOP records present in Ertel's file.

Looking at the file

NOTE : the following code was written to try to understand the content of the file.
I don't understand all the columns, so consider the information given here about the file as suppositions ; there may be mistakes.
php run-g5.php ertel sport look 
PARAMETER MISSING
Possible values for parameter : sport, quel, date, eminence, ids, mars
For example :
php run-g5.php ertel sport look sport
A point of interest of this file is the presence of columns giving the ids of the records in other data sets.
The information is contained in the following columns :
  • QUEL : origin of the record
  • NR : id (number) in Ertel's reference - unique id within this file.
  • PARA_NR : id in Comité Para test
  • CFEPNR : id in CFEPP test
  • CSINR : id in CSICOP test
  • G55 : Presence in Gauquelin 1955 experiment
  • G_NR has a different meaning, depending on the value of QUEL :
    • For records published by Gauquelin's LERRCP (A1, D6, D10), G_NR is the id of the record within these files.
    • For unpublished records (QUEL = GCPAR, GMINI, GMING, G_ADD, GMINV, GMIND, G_79F), it contains a unique id within a given QUEL value
      (for QUEL = GCPAR, G_NR = PARA_NR).
    G_NR looks like an id given by Ertel when a Gauquelin id was not available.
php run-g5.php ertel sport look ids
lists the number of records associated to the external datasets :
Gauquelin        G_NR 	   : 4384 (100 %)
Gauquelin 1955 	 G55       : 553 (12.61 %)
Comité Para 	 PARA_NR   : 611 (13.94 %)
CSICOP           CSINR 	   : 192 (4.38 %)
CFEPP 	         CFEPNR    : 925 (21.1 %)
Column QUEL seems to indicate the origin of the records.
php run-g5.php ertel sport look quel
gives the different values of QUEL and corresponding number of records :
[G:A01] => 2087
[G:D06] => 450
[G:D10] => 351
[GCPAR] => 76
[GMIND] => 453
[GMING] => 115
[GMINI] => 599
[GMINV] => 24
[G_79F] => 27
[G_ADD] => 202
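As a quick consistency check, the QUEL counts listed above can be summed and compared with the number of records in the file. The split between LERRCP records and the others is just the arithmetic implied by the list ; the variable names are chosen for this illustration.

```python
# Counts printed by "look quel", copied from the list above.
quel_counts = {
    "G:A01": 2087, "G:D06": 450, "G:D10": 351, "GCPAR": 76, "GMIND": 453,
    "GMING": 115, "GMINI": 599, "GMINV": 24, "G_79F": 27, "G_ADD": 202,
}

total = sum(quel_counts.values())
# Records published in Gauquelin's LERRCP series (A1, D6, D10).
lerrcp = sum(quel_counts[q] for q in ("G:A01", "G:D06", "G:D10"))
others = total - lerrcp

print(total, lerrcp, others)
```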

Ertel's subsamples

Combining the output of the two previous commands permits a reconstitution of Table 1 given in [Ertel 88], p 59.
This table and accompanying notes describe the samples used by Ertel to build his pool of 4391 records.

Notes :
  • New indicates the number of new records brought by this file, not present in Gauquelin or skeptics' files.
    This file brings 1828 new records.
  • P : Published / Unpublished
  • QUEL : value of column QUEL in the original file.
  • Id : Name of the column concerning the sample in the original file.
  • Ner = number claimed by Ertel.
    Ng5 = number found by g5, using columns QUEL and Id.
  • Ertel subsample : name of the subsample in Ertel article.

Columns : New | P | QUEL | N QUEL | Id | N Id | Ertel subsample | Comments

QUEL = G:A01 ; N QUEL = 2087
    Ner = 2087 Ng5 = 2087
    QUEL = G:A01 => records come from Gauquelin file A1.

  • 1 - First French
    New = 0 ; P = P ; Id = G55
    Ner = 567 Ng5 = 553
    Records where column GAUQ1955 is not empty.
    Records coming from Michel Gauquelin "L'influence des astres", 1955.
    Gauquelin 1955 restoration identifies 564 Gauquelin 1955 records present in file A1.

  • 2 - First European
    New = 0 ; P = P ; N Id = 1202
    Ner = 1189 Ng5 = 1202 (1202 = 2087 - 553 - 332)
    Records where QUEL = G:A01 and GAUQ1955 empty and PARA_NR empty.
    From [Ertel 88] :
      • 915 non-French athletes used for Gauquelin 1960 European replication
      • 274 casual data gathering
    Total of 915 + 274 = 1189 records.

  • 6 - Para champions
    New = 332 ; P = P ; Id = PARA_NR ; N Id = 535
    Ner = 535 Ng5 = 535
    Records with QUEL = G:A01 and PARA_NR = number from 1 to 535.
    List published by Comité Para for its 1976 test.
    From [Ertel 88] : "Since Gauquelin had already 203 athletes from the Para sample in his earlier studies (1955, 1960), only 332 are gained".

QUEL = GCPAR ; N QUEL = 76 (76)

  • 7 - Para lowers
    New = 76 ; P = U ; Id = PARA_NR ; N Id = 76
    Ner = 76 Ng5 = 76
    Records with QUEL = GCPAR and PARA_NR = string from *1 to *76
    535 + 76 = 611 ; numbers given in [Ertel 88] correspond to the content of the file.
    No birth time - only birth day
    From [Ertel 88] : These 76 are part of a group of 241 soccer players gathered for the Comité Para test (1976). They were not retained in the Comité Para experiment because they were considered less eminent, and remained unpublished. They were copied by Ertel when he visited Gauquelin's laboratory.
    Ertel copied only 76 because the Mars sector was computed for only 76 out of 241 (ranks 1-76).

QUEL = G:D10 and *G:D10 ; N QUEL = 351
    Ner = 351 Ng5 = 351
    G:D10 records come from Gauquelin file D10
    Gauquelin file contains 352 sportsmen, not 351.
    (Code *G:D10 is a typo concerning 2 records, present in D10 Gauquelin file)

  • 8 - CSICOP-U.S.
    New = 0 ; P = P ; Id = CSINR ; N Id = 192
    Ner = 192 Ng5 = 192
    Related to the 1979 CSICOP test (US skeptics)
    Out of the 408 (Ertel cites 409) CSICOP records, Ertel uses only the 192 also gathered by Gauquelin because Mars 36-sector information was not available for CSICOP data.
    In file Ertel 4391, 2 records don't match D10, leading to 190 effective matches.
    TODO CHECK if these 2 records are new data.

  • 12 - GAUQ-U.S.
    New = 0 ; P = P ; N Id = 159
    Ner = 158 Ng5 = 159
    Remaining sportsmen of D10, not already included in CSICOP test.
    159 = 351 - 192

QUEL = GMINI ; N QUEL = 599

  • 3 - Italian football
    New = 599 ; P = U
    Ner = 600 Ng5 = 599
    Unpublished by Gauquelin (not famous enough), copied manually by Ertel.
    No birth time - only birth day
    In file Ertel 4391, all records marked GMINI are marked sport = FOOT and country = IT.
    Possible meaning : Gauquelin MINor Italian

QUEL = GMING ; N QUEL = 115

  • 4 - German various
    New = 115 ; P = U
    Ner = 117 Ng5 = 115
    Possible meaning : Gauquelin MINor German
    No birth time - only birth day
    In file Ertel 4391, all records marked GMING have country = DE.

QUEL = G_ADD ; N QUEL = 202

  • 5 - French occasionals
    New = 202 ; P = U
    Ner = 204 Ng5 = 202
    Copied manually by Ertel in Gauquelin's laboratory.
    No birth time - only birth day
    Considered as "low-low-ranking" by Gauquelin.

QUEL = G:D06 ; N QUEL = 450

  • 9 - Second European
    New = 0 ; P = P
    Ner = 450 Ng5 = 450
    G:D06 records come from Gauquelin file D6.
    No birth time - only birth day
    Gauquelin file contains 449 sportsmen, not 450.
    => TODO : check (understand Ertel note: "In an appendix to D6 he listed 15 additional athletes whose birth dates had been received too late for inclusion. They were added to the present pool").

QUEL = GMINV ; N QUEL = 24

  • 10 - Italian cyclists
    New = 24 ; P = U
    Ner = 24 Ng5 = 24
    Copied manually by Ertel in Gauquelin's laboratory.
    Supposition from [KNS 97] p 25 : GMINV could mean Gauquelin MINor Vélo
    No birth time - only birth day
    In file Ertel 4391, all records marked GMINV are marked sport = CYCL and country = IT.

QUEL = GMIND ; N QUEL = 453

  • 11 - Lower French
    New = 453 ; P = U
    Ner = 455 Ng5 = 453
    Copied manually by Ertel in Gauquelin's laboratory.
    Supposition from [KNS 97] p 25 : GMIND could mean Gauquelin MINor Dictionary
    No birth time - only birth day

QUEL = G_79F ; N QUEL = 27

  • 13 - Plus special
    New = 27 ; P = U
    Ner = 27 Ng5 = 27
    Supplementary data sent by Gauquelin to Ertel after his visit in Paris.
    No birth time - only birth day

TOTAL
    New = 1828 new birth dates without time
Check that numbers match :
Counted in file Ertel 4391 = 553 + 1202 + 332 + 76 + 192 + 159 + 599 + 115 + 202 + 450 + 24 + 453 + 27 = 4384
Numbers coming from [Ertel 88] = 567 + 1189 + 332 + 76 + 192 + 158 + 600 + 117 + 204 + 450 + 24 + 455 + 27 = 4391
Both sums equal the expected totals (4384 records counted in the file, 4391 records claimed by Ertel), which is an indication that the sample restoration is correct.
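The two sums above can be re-checked mechanically. The sketch below only restates the Ner / Ng5 values given on this page, keyed by Ertel's subsample number ; it also lists the subsamples where the two counts differ, which accounts for the 7 missing records.

```python
# subsample number -> (Ner claimed by Ertel, Ng5 counted in the file)
subsamples = {
    1: (567, 553), 2: (1189, 1202), 6: (332, 332), 7: (76, 76),
    8: (192, 192), 12: (158, 159), 3: (600, 599), 4: (117, 115),
    5: (204, 202), 9: (450, 450), 10: (24, 24), 11: (455, 453),
    13: (27, 27),
}

sum_ner = sum(ner for ner, _ in subsamples.values())
sum_ng5 = sum(ng5 for _, ng5 in subsamples.values())

# Difference Ner - Ng5, kept only where the two counts disagree.
differences = {k: ner - ng5 for k, (ner, ng5) in subsamples.items() if ner != ng5}

print(sum_ner, sum_ng5)
print(differences)
```

Note that subsample 6 uses 332 (the records gained), not 535, so that records already counted in subsamples 1 and 2 are not counted twice.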

Data sources

18 distinct sources can be extracted from column ZITATE :
The two lists below associate each source code with the number of records found in this source.
Sorted by code :
[A] => 91
[B] => 90
[C] => 31
[D] => 1564
[E] => 65
[F] => 185
[G] => 84
[H] => 141
[J] => 28
[K] => 353
[M] => 21
[O] => 590
[R] => 28
[S] => 327
[T] => 164
[W] => 137
[X] => 143
[Y] => 67
Sorted by number of records :
[D] => 1564
[O] => 590
[K] => 353
[S] => 327
[F] => 185
[T] => 164
[X] => 143
[H] => 141
[W] => 137
[A] => 91
[B] => 90
[G] => 84
[Y] => 67
[E] => 65
[C] => 31
[R] => 28
[J] => 28
[M] => 21
This can be precisely matched with [Ertel 88] : in Table 2, he lists 18 sources. Appendix 3 provides a more detailed list of 21 sources, with their codes. 4 of these sources share code O, which makes 18 distinct codes.
I checked : column ZITATE matches Ertel's list exactly.

Birth dates

php run-g5.php ertel sport look date
BUG in date : 46 Albani Peppino : -  -
N total : 4384
N with birth time : 2086 (47.58 %)
N without birth time : 2297 (52.4 %)
N without birth time from Gauquelin LERRCP : 802
Among the 802 missing times coming from Gauquelin, 3 come from A1 and 799 from D6 and D10.
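The figures printed by this command can be cross-checked with a few lines of arithmetic. Note that the records with and without birth time add up to 4383 ; the remaining record is presumably the one reported on the "BUG in date" line, but this is an assumption, not something stated by the program output.

```python
n_total = 4384
n_with_time = 2086
n_without_time = 2297

pct_with = round(100 * n_with_time / n_total, 2)
pct_without = round(100 * n_without_time / n_total, 2)

# Records counted neither with nor without birth time.
unaccounted = n_total - n_with_time - n_without_time

# Missing times among Gauquelin LERRCP records: A1, then D6 and D10.
lerrcp_without_time = 3 + 799

print(pct_with, pct_without, unaccounted, lerrcp_without_time)
```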

Eminence

php run-g5.php ertel sport look eminence
Columns ZITRANG ZITSUM ZITATE ZITSUM_OD deal with eminence.
  • ZITRANG is the eminence rank (1 - 6).
  • ZITSUM is the number of citations.
  • ZITATE is the list of sources where the person is cited.
  • ZITSUM_OD : meaning unknown ; its value equals ZITSUM or ZITSUM - 1.

Here are ranks and citation counts, associated with the number of records.
Ranks
[1] => 2242
[2] => 1108
[3] => 549
[4] => 251
[5] => 101
[6] => 133
Counts
[0] => 2242
[1] => 1108
[2] => 549
[3] => 251
[4] => 101
[5] => 75
[6] => 37
[7] => 18
[8] => 3
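Comparing the two lists suggests a simple relation between them : the distribution of ranks is exactly what is obtained if rank = citation count + 1, capped at 6. The sketch below checks this on the aggregated numbers only ; it is an observation on the counts above, not a rule confirmed record by record, and the function name rank_from_count is introduced for this illustration.

```python
from collections import Counter

# Number of records per citation count (ZITSUM) and per rank (ZITRANG),
# copied from the lists above.
counts = {0: 2242, 1: 1108, 2: 549, 3: 251, 4: 101, 5: 75, 6: 37, 7: 18, 8: 3}
ranks = {1: 2242, 2: 1108, 3: 549, 4: 251, 5: 101, 6: 133}

def rank_from_count(n_citations: int) -> int:
    """Assumed relation: rank is the citation count + 1, capped at 6."""
    return min(n_citations + 1, 6)

derived = Counter()
for n_citations, n_records in counts.items():
    derived[rank_from_count(n_citations)] += n_records

print(dict(derived) == ranks)
print(sum(counts.values()), sum(ranks.values()))
```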

Sport codes

php run-g5.php ertel sport look sport
Two columns are related to the sport of the persons : SPORTART and INDGRUP.
INDGRUP contains 'I' or 'G', indicating if this is an individual or collective sport.
The file contains 5 mistakes :
Incoherent association sport / IG, line Cachemire Jacques : BASK I
Incoherent association sport / IG, line David Wilfried : CYCL G
Incoherent association sport / IG, line Frey Andre : FOOT I
Incoherent association sport / IG, line Richard René : HAND I
Incoherent association sport / IG, line Windal Claude : HOCK I
Errors on INDGRUP are fixed in step tweak2tmp.
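A check of this kind can be sketched as follows. This is an illustration in Python, not the g5 code : the expected I / G value per sport is taken from the list of sport codes shown on this page, the five incoherent lines are those reported above, and the last record is a hypothetical coherent one added to show that it is not flagged.

```python
# Expected IG value for the sports concerned (from the sport code list).
EXPECTED_IG = {"BASK": "G", "CYCL": "I", "FOOT": "G", "HAND": "G", "HOCK": "G"}

records = [
    ("Cachemire Jacques", "BASK", "I"),
    ("David Wilfried", "CYCL", "G"),
    ("Frey Andre", "FOOT", "I"),
    ("Richard René", "HAND", "I"),
    ("Windal Claude", "HOCK", "I"),
    ("(hypothetical coherent record)", "FOOT", "G"),
]

def find_incoherent(rows, expected):
    """Return the rows whose IG differs from the value expected for the sport."""
    return [r for r in rows if r[1] in expected and r[2] != expected[r[1]]]

def fix_ig(rows, expected):
    """Return the rows with IG replaced by the expected value when known."""
    return [(name, sport, expected.get(sport, ig)) for name, sport, ig in rows]

incoherent = find_incoherent(records, EXPECTED_IG)
for name, sport, ig in incoherent:
    print(f"Incoherent association sport / IG, line {name} : {sport} {ig}")

fixed = fix_ig(records, EXPECTED_IG)
```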

Sport codes are mostly composed of 4 letters.
This list shows IG and SPORT columns, and the number of records associated with each sport :
I AIRP : 396
I ALPI : 9
I AUTO : 109
I AVIR : 1
I BADM : 1
G BASE : 25
G BASK : 80
I BILL : 10
I BOBSL : 2
I BOWL : 3
I BOXI : 252
I CANO : 4
I CYCL : 669
I FENC : 41
G FOOT : 1465
I GOLF : 32
I GYMN : 24
G HAND : 12
G HOCK : 21
I HORS : 22
I ICES : 16
I JUDO : 5
I MOTO : 4
G PELOT : 18
I RODE : 1
I ROLL : 3
I ROWI : 22
G RUGB : 413
I SHOO : 13
I SKII : 86
I SWIM : 62
I TENN : 89
I TRA : 1
I TRAC : 407
I TRAV : 1
G VOLL : 4
I WALK : 6
I WEIG : 25
I WRES : 19
I YACH : 11
Several codes were corrected :
  • One code is composed of 3 letters (TRA), which looks like a mistake. It corresponds to NR 4348 Charles Young, whose sport is American football.
    This code is changed to FOOT in step tweak2tmp.
  • Code TRAV corresponds to NR 2378 Martin Lauer, whose sport is track and field.
    This code is changed to TRAC in step tweak2tmp.

Mars sectors

Concerned columns are
  • MARS contains the sectors when the circle is divided in 36.
  • MA12 contains the sectors when the circle is divided in 12.
  • MA_ contains the importance of the sector in the observation of effects.

Combining this information with the eminence rank, it should be possible to reproduce Ertel's famous curves of 1988.

Command :
php run-g5.php ertel sport look mars
generates the following table.

This table lists the different values found in the 3 columns.
It is interesting because it shows the difference between the 12-sector and 36-sector systems :
In the 36-sector system, sectors 9 and 36 are also considered important, and they lie outside sectors 1 and 4 of the 12-sector system.
This shows that the 12-sector system does not capture all the information about the observed statistical effect.
This argument is used to say that the 36-sector system is more efficient for observing the effects.
MARS  MA12  MA_ (importance)
   1     1    2
   2     1    2
   3     1    2
   4     2    1
   5     2    1
   6     2    1
   7     3    0
   8     3    0
   9     3    2
  10     4    2
  11     4    2
  12     4    2
  13     5    1
  14     5    1
  15     5    0
  16     6    0
  17     6    0
  18     6    1
  19     7    1
  20     7    1
  21     7    1
  22     8    0
  23     8    0
  24     8    0
  25     9    0
  26     9    1
  27     9    1
  28    10    1
  29    10    1
  30    10    1
  31    11    1
  32    11    0
  33    11    0
  34    12    0
  35    12    1
  36    12    2
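The relation between the two sector systems visible in this table can be written down explicitly : each 12-sector groups three consecutive 36-sectors. The sketch below, a Python illustration with names chosen for this page, derives MA12 from MARS and lists the 36-sectors of highest importance (MA_ = 2) that fall outside sectors 1 and 4 of the 12-sector system. The MA_ values are copied from the table.

```python
def ma12_from_mars(mars: int) -> int:
    """Map a 36-sector number (1-36) to its 12-sector number (1-12)."""
    if not 1 <= mars <= 36:
        raise ValueError("MARS sector must be between 1 and 36")
    return (mars - 1) // 3 + 1

# MA_ (importance) for MARS = 1 .. 36, copied from the table.
MA_IMPORTANCE = [2, 2, 2, 1, 1, 1, 0, 0, 2, 2, 2, 2,
                 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0,
                 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 2]

# 36-sectors with the highest importance.
key_sectors = [m for m in range(1, 37) if MA_IMPORTANCE[m - 1] == 2]

# Key 36-sectors that are not inside sectors 1 and 4 of the 12-sector system.
outside_12 = [m for m in key_sectors if ma12_from_mars(m) not in (1, 4)]

print(key_sectors)
print(outside_12)
```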