
Evaluating the Scientific Veracity of Publications by Dr. Jens Förster

Description:
The investigation report written by three statisticians following their inquiry into the veracity of articles by former UvA professor Jens Förster.
  • Carel F.W. Peeters Chris A.J. Klaassen Mark A. van de Wiel

    May 15, 2015

    Evaluating the Scientific Veracity of Publications by dr. Jens Förster


    Carel F.W. Peeters

    Chris A.J. Klaassen

    Mark A. van de Wiel

    Note that the authors do not represent any departmental or institutional affiliation with this report.

    Typeset in LaTeX 2ε by the authors using Springer-Verlag's svmono.cls document class.

  • Executive Summary

    At the request of the board of the University of Amsterdam we have investigated the scientific veracity of 24 publications (co)authored by prof. dr. Jens Förster. These 24 publications are of the empirical kind and were produced during dr. Förster's affiliation with the University of Amsterdam. The results of our investigation are presented in this report.

    Several psychological experiments reported in these publications have a rather rare, linear relation as their outcome. It would be quite surprising if the population that is supposed to be represented in such an experiment exhibited such a linear relation. Moreover, even under the hypothesis that such a linear relation holds within the population, the linearity as seen in the outcomes is often too good to be true and is in conflict with the unavoidable randomness these outcomes should have. We have quantified the extent to which these features of the outcomes conflict. When the conflict between linearity and randomness is too strong, it undermines the scientific veracity of the investigated experiment.

    Our investigation has resulted in Tables 17.1 through 17.4, which can be found in Chapter 17. Table 17.1 lists 8 publications that show strong evidence for low scientific veracity, Table 17.2 lists 3 publications that show inconclusive evidence for low scientific veracity, and Table 17.3 lists 4 publications that show no evidence for low scientific veracity. The cumulative evidence of these tables renders a coincidence hypothesis extremely unlikely. Table 17.4 lists the 9 remaining publications that could not be scrutinized with our present methods.

  • Contents

    Executive Summary

    1 Introduction
      1.1 Preamble
      1.2 Background to the Förster Case
      1.3 Terms of Reference
      1.4 Procedure
      1.5 Methods
      1.6 Disclaimer
      1.7 Employing the Methods to Reference Publications
      1.8 Overview

    Part I Publications as Sole Author

    2 JF11.JEPG
      2.1 Synopsis
      2.2 Results
        2.2.1 Participant Scores
        2.2.2 Expert Ratings: Global vs Local Descriptions
        2.2.3 Expert Ratings: Local Descriptions
      2.3 Remarks

    3 JF10.EJSP
      3.1 Synopsis
      3.2 Results
        3.2.1 Experiment 1: Word recognition
        3.2.2 Experiment 1: Face recognition
      3.3 Remarks

    4 JF09.JEPG
      4.1 Synopsis
      4.2 Results
        4.2.1 First set of independent samples
        4.2.2 Secondary dependent variables
      4.3 Remarks

    Part II Publications as First Author

    5 JF.D12.JESP
      5.1 Synopsis
      5.2 Results
        5.2.1 Exemplar 1, liking ratings
        5.2.2 Exemplar 2, liking ratings
        5.2.3 Exemplar 3, liking ratings
        5.2.4 Study 1, Exemplar 1, typicality ratings
        5.2.5 Study 1, Exemplar 2, typicality ratings
        5.2.6 Study 1, Exemplar 3, typicality ratings
        5.2.7 Study 1, Exemplar 1, reaction times
        5.2.8 Study 1, Exemplar 2, reaction times
        5.2.9 Study 1, Exemplar 3, reaction times
      5.3 Remarks

    6 JF.D12.SPPS
      6.1 Synopsis
      6.2 Results
        6.2.1 Participant scores
        6.2.2 Expert ratings (on creativity)
      6.3 Remarks

    7 JF.EO09.PSPB
      7.1 Synopsis
      7.2 Results
        7.2.1 Analytic task
        7.2.2 Creative task
        7.2.3 Global\local processing task
      7.3 Remarks

    8 JF.LS09.JEPG
      8.1 Synopsis
      8.2 Results
        8.2.1 First set of independent samples
        8.2.2 Second set of independent samples
        8.2.3 Remaining dependent variables Experiment 1b
        8.2.4 Follow-up: Collapsing atypical exemplar ratings of Experiment 4a over valence
      8.3 Remarks

    9 JF.LK08.JPSP
      9.1 Synopsis
      9.2 Results
        9.2.1 Analysis first set of independent samples
        9.2.2 Analysis second set of independent samples
        9.2.3 Analysis third dependent variable Experiment 5
        9.2.4 Analysis reported Pooled results
      9.3 Remarks

    Part III Publications as Co-author: Data Collected (Partly) in Amsterdam

    10 WCY.JF11.JESP
      10.1 Synopsis
      10.2 Results
      10.3 Remarks

    11 L.JF09.JPSP
      11.1 Synopsis
      11.2 Results
        11.2.1 Analysis independent samples
        11.2.2 Analysis secondary set independent samples Study 4
      11.3 Remarks

    Part IV Publications as Co-author: Data Collected (Partly) in Bremen or Würzburg

    12 D.JF.LR10.PSPB
      12.1 Synopsis
      12.2 Results
        12.2.1 Analysis first set of independent samples
        12.2.2 Analysis second set of independent samples
        12.2.3 Analysis reaction times Experiment 2
        12.2.4 Analysis control question ratings Experiment 3
      12.3 Remarks

    13 K.JF.D10.SPPS
      13.1 Synopsis
      13.2 Results
        13.2.1 Analysis first set of independent samples
        13.2.2 Analysis second set of independent samples
        13.2.3 Analysis remaining samples Experiment 3 (reaction times)
      13.3 Remarks

    14 D.JF.L09.JESP
      14.1 Synopsis
      14.2 Results
        14.2.1 Reaction times Experiment 1
        14.2.2 Reaction times Experiment 2
        14.2.3 Behavioral aggression measure Experiment 2
      14.3 Remarks

    15 L.JF09.CS
      15.1 Synopsis
      15.2 Results
        15.2.1 Analysis first set of independent samples
        15.2.2 Analysis second set of independent samples
      15.3 Remarks

    Part V Publications as Co-author: Data Indicated as Collected by Other Authors

    16 FG.JF12.MP
      16.1 Synopsis
      16.2 Results
      16.3 Remarks

    Part VI Concluding Remarks

    17 Concluding Remarks
      17.1 Classification of investigated publications
      17.2 Cumulative evidence

    Appendix: Some Technical Details on the Methods Employed
      A.1 The F Test and Fisher's Method
        A.1.1 Assumptions
        A.1.2 Measure
        A.1.3 Interpretation
      A.2 The Evidential Value V
        A.2.1 Assumptions
        A.2.2 Measure
        A.2.3 Interpretation

    References

  • 1 Introduction

    1.1 Preamble

    Untruthful publications that pretend to present scientifically justified results undermine the building of scientific knowledge. Moreover, they erode public trust in scientific research and may blacken an institute as well as complete fields of study. Furthermore, such publications hamper the cumulative nature of the scientific endeavor, as studies that build on untruthful results are useless. From an economic standpoint, untruthful research may be viewed in terms of wastage of public funds. From these perspectives, eradication of the tumors that untruthful publications are is of the utmost importance.

    In September of 2012 a report (Whistle Blower Report, dated September 7, 2012) was filed with the University of Amsterdam (UvA) regarding the suspicion of misconduct by social psychologist dr. Jens Förster (JF). At the time, JF was affiliated with the UvA. This whistle blower (WB) report investigated three publications by JF and argued that the reported results are much too nice to be true, i.e., that it is very unlikely that they have been generated by real-life data. The current report evaluates the evidence for scientific irregularities in all publications (co)authored by JF while affiliated with the University of Amsterdam. It does so from a statistical perspective, meaning that the likelihood is quantified that the reported results are based on unnatural data.

    1.2 Background to the Förster Case

    After receiving the WB report on September 11, 2012, the board of the University of Amsterdam installed an ad hoc committee on September 29, 2012, that investigated the publications analyzed in the WB report. This committee expressed concern about these publications, but stated that there was insufficient evidence to conclude that JF had acted against the mores of scientific integrity. In 2013 the Dutch National Board for Research Integrity (LOWI) accepted the conclusion by the ad hoc committee. Subsequently, the WB filed a complaint at LOWI against this decision on July 31, 2013, spurring an official investigation into the Förster and Denzler (2012) publication included in the original WB report. On March 28, 2014, the LOWI finished its new investigation, concluding that the data underlying the results in Förster and Denzler (2012) must have been manipulated, for which JF is to be held responsible (LOWI, 2014). To date, JF denies the allegations of misconduct (see the blogroll at http://www.socolab.de/main.php?id=66 for JF's personal defense).

    1.3 Terms of Reference

    In October 2014 the Rector of the UvA invited the authors of the current report to investigate all JF papers published under UvA affiliation. The investigation ideally should shed light on the status of these publications in terms of judgments on their veracity. This report provides the statistical evidence for such judgments. Table 1.3 in Section 1.8 lists all publications that have been investigated.


    1.4 Procedure

    The procedure applied in the present report may be characterized by the following statements:

    Evaluation will focus on individual publications. The Whistleblower Report (2012) presents, next to results on 3 investigated publications individually, also summary measures (regarding the deviance of the results reported by JF) over this complete batch of investigated publications. Here the focus lies with judgments regarding the veracity of individual manuscripts.

    Evaluation will be of a statistical nature only. The weight of evidence lies solely with anomalies in the reported data and results.

    Two methods are applied to each publication to assess its veracity. These methods are (a) F testing paired with Fisher's method and (b) the evidential value V. They are described in Section 1.5.

    These methods are geared towards (anomalies in reported patterns for) one-way ANOVA-type designs with 3 factor levels. Such designs form a staple of the study setup in JF publications.

    Input for these methods are the results reported in the individual JF publications, i.e., only summary measures (means and standard deviations) are available.

    Results from the methods employed form the basis for qualitative judgments regarding the veracity of a publication. These qualitative judgments take the following form (note that each JF publication is composed of multiple (sets of) independent (sub)experiments):

    Strong evidence: The evidence for low scientific veracity is strong when the F test paired with Fisher's method gives left-tail probabilities of at least 0.999 and/or when the number of substantial evidential values in relation to the number of constituent (sub)experiments in a publication meets one of the following thresholds:

        no. of constituent (sub)experiments    no. of substantial Vs
        1                                      1
        2-5                                    2
        6-11                                   3
        12-21                                  4

    An evidential value is deemed substantial when it is greater than or equal to 6. For example, when a publication has four constituent studies and at least two of those studies yield an evidential value of at least 6, then the evidence for low scientific veracity is considered strong.

    Inconclusive evidence: When there is no strong evidence for low scientific veracity (according to the judgment above), but there are multiple constituent (sub)experiments with a substantial evidential value, then the evidence for low scientific veracity of a publication is considered inconclusive.

    No evidence: When there is neither strong nor inconclusive evidence for low scientific veracity (according to the judgments above), then the publication is considered to show no evidence for low scientific veracity.

    These guidelines should be applied with care.
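The classification guidelines above can be sketched in code. This is an illustrative reading of the rules, not the authors' implementation: the function names `required_substantial` and `classify`, the label strings, and the interpretation of "multiple" as "at least two" are assumptions made for this example. Publications with more than 21 (sub)experiments fall outside the table, so the sketch refuses to classify them.

```python
# Thresholds as read from the table in Section 1.4:
# (min no. of experiments, max no. of experiments, required no. of substantial Vs)
THRESHOLDS = [(1, 1, 1), (2, 5, 2), (6, 11, 3), (12, 21, 4)]
SUBSTANTIAL_V = 6.0          # an evidential value >= 6 is deemed substantial
STRONG_LEFT_TAIL = 0.999     # left-tail probability threshold for the F test + Fisher


def required_substantial(n_experiments):
    """Number of substantial Vs needed for 'strong' evidence."""
    for low, high, needed in THRESHOLDS:
        if low <= n_experiments <= high:
            return needed
    raise ValueError("the report's table covers 1 to 21 (sub)experiments only")


def classify(evidential_values, left_tail_probability=None):
    """Return 'strong', 'inconclusive' or 'none' for one publication.

    evidential_values: one V per constituent (sub)experiment.
    left_tail_probability: optional result of the F test paired with Fisher's method.
    """
    n_substantial = sum(v >= SUBSTANTIAL_V for v in evidential_values)
    needed = required_substantial(len(evidential_values))
    fisher_strong = (left_tail_probability is not None
                     and left_tail_probability >= STRONG_LEFT_TAIL)
    if fisher_strong or n_substantial >= needed:
        return "strong"
    if n_substantial >= 2:   # assumption: "multiple" means at least two
        return "inconclusive"
    return "none"
```

For instance, `classify([7.0, 6.5, 1.2, 1.0])` reproduces the report's own example of four constituent studies with two substantial evidential values. Under this reading, the inconclusive category can only arise for publications with six or more (sub)experiments, since for two to five experiments two substantial values already count as strong evidence.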

    1.5 Methods

    Most publications by JF have many (independent) constituent (sub)experiments. The Whistleblower Report (2012) drew attention, for some of these publications, to the linearity of the trend across experimental conditions in one-way ANOVA-type designs with 3 factor levels (a setup used in many JF papers). Given the reported standard deviations and sample sizes, the effects deviate too little from linearity across studies, even under the assumption of perfect linearity in the population. The focus here lies on evaluating, from multiple angles, such anomalous trends in comparable study designs in JF publications. Using different methods may counter the critique that outcomes are (partially) driven by the chosen approach towards the evaluation of the veracity of the publication. In addition, when multiple methods yield evidence for low veracity, the basis for qualitative judgments regarding untruthfulness is strengthened. Below one can find a non-technical description of the methods employed. Technical details on the methods can be found in Appendix A.

    1. F testing paired with Fisher's method: The first method is taken from the Whistleblower Report (2012) and pairs nested F-testing with the Fisher method. The ANOVA F-model for one-way factorial designs with 3 levels of an experimental factor has 2 regression parameters. A linear regression between the low and high levels of the experimental factor has only 1 regression parameter. This linear regression can be viewed as a reduced model and is nested within the ANOVA model with 2 regression parameters. One can then perform a nested F-test (the F test) to assess whether the more complex model significantly contributes to model fit. The null hypothesis in this situation states that the means for the factor levels have a perfect linear relation. If the empirical results approach linearity, the p-value for the F test approaches the value 1. When the null hypothesis is true, i.e., when perfect linearity holds in the population, the p-values for the F test are distributed uniformly between 0 and 1. When the null hypothesis does not hold, i.e., when linearity does not hold in the population, the p-values for the F test tend to take values close to 0. Observing p-values that consistently creep towards 1 then raises suspicion. The deviance of consistently high p(F)-values can then be formalized with the Fisher method. Combining results on independent samples and using left-tail probabilities then indicates how strongly the accumulation of tests favors the shared null. This enables probabilistic judgments regarding the extremity of the observed consistency w.r.t. linearity under the assumption that the null hypothesis is true. This setup is of use when a publication contains many independent studies.

    2. The evidential value V: This method can be found in Klaassen (2015). It is based on the basic premise that humans tend to underestimate variation due to randomness when fabricating data. Within the framework of the ANOVA model this is incorporated by allowing for dependence between the measurement errors of the respective factor levels. The evidential value then assesses the hypothesis of a dependence structure in the underlying data, which indicates low scientific veracity, versus the hypothesis of independence, which is the ANOVA model assumption (Klaassen, 2015). The evidential value can take values ranging from 1 to infinity. Honest experiments can be expected to have a V near 1, while experiments with unnatural data will yield higher values for V. A V of at least 6 is deemed substantial and thus indicative of a dependence structure that proper experiments should not exhibit. The evidential value may thus be used to assess individual constituent (sub)experiments within a publication. When multiple independent (sub)experiments are available within a publication, an overall evidential value can be obtained by multiplication. The probability, under independence of the test persons, that an experiment yields a substantial evidential value V equals at most 0.0809, approximately. The maximum probability of 0.0809 is attained only if exact linearity of the means holds in the population; see Section A.2.2 in the Appendix. The results of any experiment with a substantial evidential value should be handled with caution.

    Assuming that the results of all (sub)experiments within a publication are independent, quod non, and under independence of the test persons within all experiments, we may bound the probability of strong evidence as follows:

        no. of constituent (sub)experiments    no. of substantial Vs    probability of strong evidence
        1                                      1                        0.08094
        2-5                                    2                        0.05554
        6-11                                   3                        0.05344
        12-21                                  4                        0.08475

    In judging the p(F)-values and the evidential values in order to classify the publications, it makes sense to choose simplifying thresholds. Of course, information is lost in this way, but it enables one to compute probabilities as in the table above. Note that the actual probabilities for this table will be much smaller than the values given, since these probabilities have been computed under the assumption that exact linearity of the means holds in all (sub)experiments involved.
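The bounds in the table above appear to be binomial upper-tail probabilities. The following sketch checks that reading; it is a reconstruction, not the authors' computation. It assumes that each (sub)experiment independently yields a substantial V with probability 0.08094 (the value in the first table row) and that each bound is evaluated at the largest number of (sub)experiments in its range. The names `P_SUBSTANTIAL` and `prob_at_least` are introduced for this example only.

```python
from math import comb

# Assumed per-experiment probability of a substantial V (first row of the table).
P_SUBSTANTIAL = 0.08094


def prob_at_least(k, n, p=P_SUBSTANTIAL):
    """P(at least k of n independent experiments yield a substantial V)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# (largest no. of experiments in the range, required no. of substantial Vs)
for n, k in [(1, 1), (5, 2), (11, 3), (21, 4)]:
    print(f"n = {n:2d}, at least {k}: {prob_at_least(k, n):.5f}")
```

Under these assumptions the computed values agree with the tabulated ones to about four decimal places. Using the largest n in each range yields the largest probability within that range, which is what makes each tabulated value an upper bound for the whole row.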

    Note that left-tail probabilities and overall evidential values are allowed to grow more extreme when the number of independent samples increases. When the number of independent samples is low, the weight of evidence shifts towards evidential values for individual experiments. The F testing approach paired with Fisher's method gives a frequentist perspective. The evidential value is based on a forensic/Bayesian perspective.
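To illustrate how p-values from independent F tests can be combined, here is a minimal sketch of the standard Fisher statistic. It is not the report's exact measure: Appendix A.1 defines the tail convention behind the "left-tail probabilities" quoted in this report, and the choice below (the upper tail of the standard statistic, which approaches 1 when the p-values cluster near 1) is an assumption. The function names are hypothetical.

```python
from math import exp, log


def fisher_statistic(p_values):
    """Standard Fisher statistic X = -2 * sum(ln p_i)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return -2.0 * sum(log(p) for p in p_values)


def chi2_upper_tail_even_df(x, k):
    """P(chi-square with 2k degrees of freedom >= x), closed form for even df."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j      # builds (x/2)^j / j! incrementally
        total += term
    return exp(-half) * total


def fisher_upper_tail(p_values):
    """Combined probability for k independent p-values under the shared null.

    Under the null, X follows a chi-square distribution with 2k degrees of
    freedom. Values near 1 arise when the individual p-values are consistently
    close to 1, i.e. when the fit to linearity is consistently very good.
    """
    return chi2_upper_tail_even_df(fisher_statistic(p_values), len(p_values))
```

For example, `fisher_upper_tail([0.95, 0.97, 0.99])` exceeds 0.999 even though only three samples are combined, whereas p-values spread uniformly over (0, 1) give a combined value well away from either extreme. The closed-form tail is used so that the sketch needs only the standard library.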

    1.6 Disclaimer

    Note that the methods employed cannot demarcate witting practices (such as fraud and manipu-lation) from unwitting practices (such as erroneous or questionable research practices) leading tolow veracity of the reported data. The question is if the veracity of the data on which a given pub-lication is based can be deemed sufficient. If the data patterns are, from a statistical standpoint,extremely unlikely, the veracity of the reported data is in doubt. Whether such data patterns aredue to witting or unwitting practices then, is of secondary importance: Of main import is that thedata are to be met with distrust, calling into question the scientific value of the publication.

    Importantly, it must be emphasized that the empirical trustworthiness of publications by JF is under scrutiny, not the integrity of his co-authors. The report does not imply, nor does it intend to imply, that the collaborators of JF were involved in problematic or dubious practices.

    1.7 Employing the Methods to Reference Publications

    For purposes of comparison and interpretation of the numerical results obtained by applying the described methods to the JF publications, it is deemed useful to apply them also to a collection of similar publications in the same field of study (social psychology). Such a collection of similar publications then serves as a reference or control group. Table 1.1 lists the digital object identifiers (DOIs) of ten publications bearing 21 independent samples. These are the same publications and samples that served as the reference group in the WB report (Whistleblower Report, 2012) and in Klaassen (2015). We confine ourselves to referring to Whistleblower Report (2012) for information on how these reference publications were obtained.

    Table 1.2 lists the necessary data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential value for the samples from the control publications. Figures 1.1 and 1.2 depict the corresponding trend lines. We note that (a) the results of the F test comply with those expected under the null hypothesis of linearity, and (b) the majority of evidential values V are below 2 (and close to unity). These results may serve as a reference in evaluating the analogous quantities in the JF publications.
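As an illustration of how the F and p(F) columns of Table 1.2 relate to the reported summary statistics, the following sketch computes the test for the Hunt sample. It is not the code of DataVeracity.R. It assumes that F is the one-degree-of-freedom contrast test of m1 - 2*m2 + m3 = 0 with a pooled within-cell variance, and that the 75 observations are spread evenly over the 3 cells. The function names are hypothetical. The p-value uses the identity F(1, d) = T(d)^2 together with Simpson integration of the t density, so that only the standard library is needed.

```python
import math


def f_upper_tail_1df(f_stat, df2, steps=2000):
    """P(F(1, df2) >= f_stat), using F(1, df2) = T(df2)^2 and Simpson's rule."""
    t = math.sqrt(f_stat)
    # Log of the normalising constant of the Student t density with df2 df.
    log_c = (math.lgamma((df2 + 1) / 2) - math.lgamma(df2 / 2)
             - 0.5 * math.log(df2 * math.pi))

    def pdf(x):
        return math.exp(log_c - (df2 + 1) / 2 * math.log1p(x * x / df2))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1.0 - 2.0 * total * h / 3.0


def linearity_f_test(n, means, sds):
    """F statistic, denominator df and p-value for H0: mu1 - 2*mu2 + mu3 = 0."""
    n1, n2, n3 = n
    m1, m2, m3 = means
    df2 = sum(n) - 3
    # Pooled within-cell variance, weighted by the cell degrees of freedom.
    pooled_var = sum((k - 1) * s * s for k, s in zip(n, sds)) / df2
    contrast = m1 - 2.0 * m2 + m3
    f_stat = contrast ** 2 / (pooled_var * (1.0 / n1 + 4.0 / n2 + 1.0 / n3))
    return f_stat, df2, f_upper_tail_1df(f_stat, df2)


# Hunt sample from Table 1.2, assuming 25 observations in each of the 3 cells.
f_hunt, df_hunt, p_hunt = linearity_f_test((25, 25, 25),
                                           (1.48, 1.04, 1.04),
                                           (0.82, 0.68, 0.68))
print(f"F = {f_hunt:.4f}, df = (1, {df_hunt}), p(F) = {p_hunt:.4f}")
```

Under these assumptions the sketch gives F of about 1.515 and p(F) of about 0.222, in line with the Hunt row of Table 1.2.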

    1.8 Overview

    Table 1.3 contains all JF publications under his UvA affiliation that carry empirical results (original list provided by UvA). The articles are grouped according to type: publications as sole author, publications as first author, publications as co-author with data collected in either Amsterdam or Bremen/Würzburg, publications as co-author with data collected in online experiments, and publications as co-author with data indicated as having been collected by authors other than JF. Within these groups, the publications are ordered in descending fashion according to year of publication. The listed order is the order in which the publications will be evaluated in the remainder of this report. Publications marked by an asterisk are not included in the report. All experiments studied in such a publication have designs that fall outside the scope of our methods. We do not have the means to assess these publications formally at present.

    Table 1.1. DOIs of the control/reference publications. There are ten publications carrying 21 independent samples. Source: Whistleblower Report, 2012.

    sample        DOI

    Hagtvedt 1    10.1177/0146167211415631
    Hagtvedt 2    10.1177/0146167211415631
    Hunt          10.1002/acp.1352
    Jia           10.1016/j.jesp.2009.05.015
    Kanten 1      10.1016/j.jesp.2011.04.005
    Kanten 2      10.1016/j.jesp.2011.04.005
    Lerouge 1     10.1086/599047
    Lerouge 2     10.1086/599047
    Lerouge 3     10.1086/599047
    Lerouge 4     10.1086/599047
    Malkoc        10.1016/j.obhdp.2010.07.003
    Polman        10.1177/0146167211398362
    Rook 1        10.1080/10400419.2011.621844
    Rook 2        10.1080/10400419.2011.621844
    Smith 1       10.1037/0022-3514.90.4.578
    Smith 2       10.1037/0022-3514.90.4.578
    Smith 3       10.1037/0022-3514.90.4.578
    Smith 4       10.1037/0022-3514.90.4.578
    Smith 5       10.1037/0022-3514.90.4.578
    Smith 6       10.1037/0022-3514.90.4.578
    Smith 7       10.1016/j.jesp.2006.12.005

    Table 1.2. Results for F and V on the reference/control publications. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, and SD = standard deviation.

                       means                       SDs
    sample     n       low/high  medium  high/low  low/high  medium  high/low  F       p(F)    V

    Hagtvedt1  141/6   4.39      3.97    3.84      0.76      1.26    1.14      0.2852  0.5951  1.3955
    Hagtvedt2  141/6   3.22      3.84    4.11      0.98      1.02    1.46      0.3483  0.5570  1.1741
    Hunt       75/3    1.48      1.04    1.04      0.82      0.68    0.68      1.5152  0.2224  1.0000
    Jia        132/3   1.09      0.70    0.59      0.89      0.69    0.62      1.0437  0.3089  1.0000
    Kanten1    269/6   3.29      3.14    2.66      1.11      0.94    0.71      0.9318  0.3362  1.0014
    Kanten2    269/6   3.02      2.99    2.85      0.80      0.84    0.70      0.1478  0.7013  1.7535
    Lerouge1   63/3    4.24      2.48    2.14      1.51      2.16    2.13      1.8439  0.1796  1.0000
    Lerouge2   63/3    2.95      2.81    2.62      2.44      1.81    2.25      0.0018  0.9660  12.2265–13.0148
    Lerouge3   54/3    4.90      3.31    2.79      2.22      2.09    1.66      0.8550  0.3595  1.0094
    Lerouge4   54/3    3.69      2.67    2.50      2.78      2.51    1.66      0.3874  0.5364  1.2055
    Malkoc     521/3   4.72      5.36    6.19      4.96      9.08    10.58     0.0143  0.9048  5.2558–5.2663
    Polman     65/3    4.69      3.50    2.91      2.37      2.09    2.42      0.2462  0.6215  1.3369
    Rook1      168/6   6.22      6.13    4.73      3.05      2.19    1.95      1.3421  0.2501  1.0000
    Rook2      168/6   5.39      5.22    4.61      2.14      2.58    2.28      0.1649  0.6857  1.6933
    Smith1     73/3    4.38      4.26    3.55      1.53      1.36    1.07      0.7938  0.3760  1.0146
    Smith2     76/3    14.83     12.69   11.88     4.62      4.95    4.75      0.3275  0.5689  1.2640
    Smith3     113/3   0.42      0.53    0.56      0.20      0.19    0.19      1.0743  0.3023  1.0000
    Smith4     140/3   4.70      7.90    11.80     7.40      11.40   20.40     0.0190  0.8905  4.0388
    Smith5     125/3   14.52     13.43   12.85     2.81      3.27    3.94      0.1588  0.6909  1.6268
    Smith6     97/3    10.85     8.64    8.32      5.07      3.61    4.17      1.0289  0.3130  1.0000
    Smith7     144/3   4.64      4.84    5.49      1.30      1.56    1.28      0.8435  0.3600  1.0200

    Fig. 1.1. Trend lines for the Hagtvedt1 to Polman samples from the control publications. The error bars represent one standard deviation from the cell mean.

    Fig. 1.2. Trend lines for the Rook1 to Smith7 samples from the control publications. The error bars represent one standard deviation from the cell mean.

    Each following chapter considers a single JF publication. Each chapter elaborates on the specific design of the studies employed and reports on the results obtained with the methods discussed in Section 1.5 according to the procedure stated in Section 1.4. Also, the results obtained are evaluated in light of the results on the control papers of Section 1.7. Chapter 17 contains an overview of the investigated publications that, according to the statistical evidence, appear to be scientifically compromised.

    We note that the terms 'study' and 'experiment' are not used consistently throughout the JF papers, but that we will use them in accordance with the individual publications evaluated. We also note that each chapter is self-contained (in conjunction with this introduction and the Appendix), implying that there is some redundancy in presentation. This text is accompanied by two R scripts: DataVeracity.R and Analysis.R. The former contains functions implementing the methods of Section 1.5. The latter script contains the annotated code for obtaining the presented results.

    Table 1.3. The 24 JF publications under UvA affiliation that carry empirical results. The abbreviations are used as a shorthand to denote the respective papers either in this text or in the accompanying R code. The 9 publications marked with an asterisk (*) were not assessed formally with the methods described in Section 1.5.

    Abbreviation       Publication

    As sole author:
    JF11.JEPG          Förster, J. (2011).
                       Journal of Experimental Psychology: General, 140: 364-389
    JF10.EJSP          Förster, J. (2010).
                       European Journal of Social Psychology, 40: 524-535
    JF09.JEPG          Förster, J. (2009).
                       Journal of Experimental Psychology: General, 138: 88-111
    JF09.JESP*         Förster, J. (2009).
                       Journal of Experimental Social Psychology, 45: 444-447

    As first author among co-authors:
    JF.B12.EJSP*       Förster, J. and Becker, D. (2012).
                       European Journal of Social Psychology, 42: 334-341
    JF.D12.JESP        Förster, J. and Denzler, M. (2012).
                       Journal of Experimental Social Psychology, 48: 416-419
    JF.D12.SPPS        Förster, J. and Denzler, M. (2012).
                       Social Psychological and Personality Science, 3: 108-117
    JF.OE10.JESP*      Förster, J., Özelsel, A., and Epstude, K. (2010).
                       Journal of Experimental Social Psychology, 46: 237-246
    JF.EO09.PSPB       Förster, J., Epstude, K., and Özelsel, A. (2009).
                       Personality and Social Psychology Bulletin, 35: 1479-1491
    JF.LS09.JEPG       Förster, J., Liberman, N., and Shapira, O. (2009).
                       Journal of Experimental Psychology: General, 138: 383-399
    JF.LK08.JPSP       Förster, J., Liberman, N., and Kuschel, S. (2008).
                       Journal of Personality and Social Psychology, 94: 579-599

    As co-author, data collected (partly) in Amsterdam:
    WCY.JF11.JESP      Woltin, K.-A., Corneille, O., Yzerbyt, V.Y., and Förster, J. (2011).
                       Journal of Experimental Social Psychology, 47: 418-424
    L.JF09.JPSP        Liberman, N. and Förster, J. (2009).
                       Journal of Personality and Social Psychology, 97: 203-216

    As co-author, data collected (partly) in Bremen or Würzburg:
    D.JF.LR10.PSPB     Denzler, M., Förster, J., Liberman, N., and Rozenman, M. (2010).
                       Personality and Social Psychology Bulletin, 36: 1385-1396
    K.JF.D10.SPPS      Kuschel, S., Förster, J., and Denzler, M. (2010).
                       Social Psychological and Personality Science, 1: 4-11
    D.JF.L09.JESP      Denzler, M., Förster, J., and Liberman, N. (2009).
                       Journal of Experimental Social Psychology, 45: 90-100
    L.JF09.CS          Liberman, N. and Förster, J. (2009).
                       Cognitive Science, 33: 1330-1341
    L.JF08.SC*         Liberman, N. and Förster, J. (2008).
                       Social Cognition, 26: 515-533
    S.JF08.PACA*       Schimmel, K. and Förster, J. (2008).
                       Psychology of Aesthetics, Creativity, and the Arts, 2: 53-60
    W.JF07.JASP*       Werth, L. and Förster, J. (2007).
                       Journal of Applied Social Psychology, 37: 2764-2787

    As co-author, data collected in online experiment:
    VE.JF08.HR*        Voelpel, S.C., Eckhoff, R.A., and Förster, J. (2008).
                       Human Relations, 61: 271-295

    As co-author, data indicated as having been collected by other authors:
    GV.JF.MS12.EJSP*   Gervais, S.J., Vescio, T.K., Förster, J., Maass, A., and Suitner, C. (2012).
                       European Journal of Social Psychology, 42: 743-753
    FG.JF12.MP         Friedman, R.S., Gordis, E., and Förster, J. (2012).
                       Media Psychology, 15: 249-266
    DH.JF11.PSPB*      Denzler, M., Häfner, M., and Förster, J. (2011).
                       Personality and Social Psychology Bulletin, 37: 1644-1654

  • Part I

    Publications as Sole Author

  • 2 JF11.JEPG

    Publication Investigated

    Förster, J. (2011). Local and global cross-modal influences between vision and hearing, tasting, smelling, or touching. Journal of Experimental Psychology: General, 140: 364-389.

    2.1 Synopsis

    This publication was also included in the Whistleblower Report (2012). It features 16 studies. Studies 5A to 5D feature participant scores as well as expert ratings. The expert ratings actually imply nested data (participants rated by experts); however, they will be evaluated from the perspective (as in the publication investigated) of a between factorial design. The participant scores and expert ratings are treated separately in the construction of sets of independent samples. Tables 2.1 and 2.2 provide an overview of the design of the studies regarding participant scores and expert ratings, respectively. The publication reports that in each study the participants were assigned randomly to a local, control, or global condition. Studies 2C, 3C, and 4C feature 2 factor levels and are not analyzed here.

    Table 2.1. Design studies regarding participant scores.

    Study    Design       Dependent variables

    1A       3 between    1
    1B       3 between    1
    1C       3 between    1
    2A       3 between    1
    2B       3 between    1
    2C       2 between    1
    3A       3 between    1
    3B       3 between    1
    3C       2 between    1
    4A       3 between    1
    4B       3 between    1
    4C       2 between    1
    5A       3 between    1
    5B       3 between    1
    5C       3 between    1
    5D       3 between    1

    Table 2.2. Design studies regarding expert ratings.

    Study    Design       Dependent variables

    5A       3 between    1
    5B       3 between    2
    5C       3 between    1
    5D       3 between    1

    2.2 Results

    2.2.1 Participant Scores

    Trend lines for the independent samples regarding participant scores can be found in Figure 2.1. The trend lines indicate very consistent linear effects. They may hint that (at least for the sample sizes reported) the variation in group means may deviate too little from linearity given the spread reported for the respective conditions. Table 2.3 lists the corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential value.

    Table 2.3. Results on the independent samples regarding participant scores. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, NaN = not a number.

                        means                      SDs
    Study       n       low      control  high     low      control  high     F       p(F)    V

    study1A     48/3    27.00    31.00    38.00    11.00    13.00    10.00    0.1846  0.6695  1.6403
    study1B     58/3    19.00    27.00    37.00    13.00    14.00    12.00    0.0760  0.7839  2.3664
    study1C     57/3    20.00    28.00    39.00    12.00    11.00    10.00    0.2342  0.6303  1.4074
    study2A     61/3    16.00    25.00    33.00    14.00    15.00    13.00    0.0172  0.8960  4.8226
    study2B     45/3    19.00    31.00    40.00    15.00    9.00     10.00    0.1663  0.6855  1.4756
    study3A     44/3    22.00    30.00    36.00    14.00    13.00    14.00    0.0523  0.8203  2.6599
    study3B     44/3    21.00    30.00    39.00    13.00    11.00    9.00     0.0000  1.0000  NaN
    study4A     44/3    22.00    29.00    37.00    15.00    11.00    13.00    0.0142  0.9056  4.7322
    study4B     43/3    21.00    30.00    39.00    13.00    9.00     10.00    0.0000  1.0000  15.2314–NaN
    study5A     42/3    6.90     9.74     12.38    3.06     3.71     3.23     0.0083  0.9277  6.1719–7.0386
    study5B     42/3    2.79     3.79     4.86     1.31     1.19     1.51     0.0063  0.9370  6.4644–7.2233
    study5C     42/3    3.00     5.05     7.00     1.20     2.22     3.61     0.0036  0.9524  9.4973
    study5D     42/3    2.96     6.14     9.50     1.26     3.80     5.96     0.0044  0.9475  8.7926

    The p-values for the F test are consistently high (most are above .8), while under the null hypothesis of perfect linearity in the population these p-values (by definition) are expected to be uniformly distributed between 0 and 1. Employing Fisher's method in combining these p-values gives a left-tail probability of 1 - 4.255229e-7 ≈ .9999996. Thus, the accumulation of tests on the similar null hypotheses of linearity very strongly favors the shared null. Or, roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to 1 in 2,350,050.
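The combination step can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: it assumes the Fisher statistic T = -2 * sum(ln p_i) over the k samples, referred to a chi-square distribution with 2k degrees of freedom, and it uses the closed form of that distribution function for even degrees of freedom. The function name is hypothetical, and the inputs are the rounded p(F) values of Table 2.3, so the result agrees with the quoted figure only approximately.

```python
import math


def chi2_cdf_fisher(p_values):
    """P(chi-square with 2k df <= T), where T = -2 * sum(log p_i)."""
    k = len(p_values)
    half_t = -sum(math.log(p) for p in p_values)  # T / 2
    # For even df = 2k: CDF(T) = 1 - exp(-T/2) * sum_{i=0}^{k-1} (T/2)^i / i!
    term, partial = 1.0, 0.0
    for i in range(k):
        partial += term
        term *= half_t / (i + 1)
    return 1.0 - math.exp(-half_t) * partial


# Rounded p(F) values from Table 2.3 (participant scores, 13 samples).
p_f = [0.6695, 0.7839, 0.6303, 0.8960, 0.6855, 0.8203, 1.0000,
       0.9056, 1.0000, 0.9277, 0.9370, 0.9524, 0.9475]

prob = chi2_cdf_fisher(p_f)
print(f"probability of results at least this consistent: {prob:.3e}")
print(f"roughly 1 in {1.0 / prob:,.0f}")
```

With the rounded inputs the result is of the order of 4e-7, that is, about 1 in 2.35 million. The subtraction from 1 is adequate at this magnitude; for far smaller probabilities a direct summation of the tail would be numerically safer.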

    The instances of NaN for the evidential value in Table 2.3 are due to divisions by 0 (in its calculation). In a sense, one could conceive of ∞ as being a lower-bound to NaN in these instances. Many evidential values in Table 2.3 lie above 6. The overall V is found to have a lower-bound (when leaving out V for Study 3B) of 24,833,154.
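The overall V quoted here is consistent with multiplying the per-sample lower bounds of Table 2.3 while skipping the sample whose V is NaN. The sketch below illustrates that reading; it is an assumption drawn from the tabulated numbers, not a restatement of the definition in Section 1.5, and `overall_v_lower_bound` is a hypothetical name.

```python
import math

NAN = float("nan")

# Lower bounds of V from Table 2.3, in table order; Study 3B has no finite value.
v_lower = [1.6403, 2.3664, 1.4074, 4.8226, 1.4756, 2.6599, NAN,
           4.7322, 15.2314, 6.1719, 6.4644, 9.4973, 8.7926]


def overall_v_lower_bound(values):
    """Product of the finite evidential values; NaN entries are left out."""
    return math.prod(v for v in values if not math.isnan(v))


overall = overall_v_lower_bound(v_lower)
print(f"overall V (lower bound): {overall:,.0f}")
```

Because the inputs are rounded to four decimals, the product matches the quoted 24,833,154 only approximately.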

    2.2.2 Expert Ratings: Global vs Local Descriptions

    Trend lines for the independent samples regarding expert ratings can be found in Figure 2.2. These trend lines also display very consistent linear effects. Table 2.4 lists the corresponding data and results on the F test and the evidential value. Again, consistently high p-values for the F test and substantial (ranges for) evidential values are found. Fisher's method gives a left-tail probability of 1 - .0004790825 = .9995209, giving a probability of finding results at least as consistent w.r.t. linearity of 1 in 2,087. The overall V has a lower-bound of 294.4692.

    Table 2.4. Results on the independent samples regarding expert ratings. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                        means                      SDs
    Study       n       low      control  high     low      control  high     F       p(F)    V

    study5A.2   42/3    2.75     3.91     4.89     1.38     1.10     1.33     0.0464  0.8305  2.6986
    study5B.2   42/3    2.21     3.54     4.90     1.37     2.07     1.60     0.0007  0.9787  3.9530–25.1064
    study5C.2   42/3    2.55     4.14     5.55     1.87     2.04     1.73     0.0213  0.8847  4.3744
    study5D.2   42/3    2.33     3.71     5.12     1.34     1.93     1.82     0.0007  0.9788  6.3103–24.1770

    2.2.3 Expert Ratings: Local Descriptions

    A second dependent variable is reported for the expert ratings regarding Study 5B: local descriptions. The trend line can be found in Figure 2.3 while Table 2.5 lists the corresponding data and results. Again, a high p-value for the F test is found as well as a substantial range for the evidential value.

    Table 2.5. Results on the expert ratings regarding local descriptions. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                        means                      SDs
    Study       n       low      control  high     low      control  high     F       p(F)    V

    study5B.3   42/3    2.36     3.57     4.71     1.38     1.93     1.17     0.0049  0.9445  3.1958–9.8906

    2.3 Remarks

    Note that the JF11.JEPG publication was also included in the Whistleblower Report (2012). The results on the F test reported above concur with this report. They convey that the linear pattern seems too consistent. In addition, the evidential values imply the presence of a dependence structure between test persons. Comparison to the results obtained on the control publications (see Table 1.2) may further strengthen the notion of deviance of the results reported in JF11.JEPG. The evidence for low scientific veracity of this publication is considered strong according to the criterion of Section 1.4.

    Fig. 2.1. Trend lines for the independent samples regarding participant scores. The error bars represent one standard deviation from the cell mean.

    Fig. 2.2. Trend lines for the independent samples regarding expert ratings. The error bars represent one standard deviation from the cell mean.

    [Figure 2.3 panel: "Study 5B: Local descriptions (expert ratings)"; horizontal axis: global, control, local]

    Fig. 2.3. Trend line for the expert ratings regarding local descriptions. The error bars represent one standard deviation from the cell mean.

  • 3 JF10.EJSP

    Publication Investigated

    Förster, J. (2010). How love and sex can influence recognition of faces and words: A processing model account. European Journal of Social Psychology, 40: 524-535.

    3.1 Synopsis

    This publication features 2 experiments. Table 3.1 provides an overview of their design. Experiment 2 features 4 levels for the between-factor and is not analyzed here. Experiment 1 can be analyzed as a one-way design with 2 dependent variables (word recognition and face recognition).

    Table 3.1. Design experiments.

    Experiment    Design                  Dependent variables

    1             3 between × 2 within    1
    2             4 between × 2 within    1

    3.2 Results

    3.2.1 Experiment 1: Word recognition

    The trend line for the word-recognition part of Experiment 1 can be found in Figure 3.1. It conveys a perfect linear effect for the experimental condition (supraliminal priming). Table 3.2 lists the corresponding data (cell size, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential value.

    The p-value for the F test amounts to unity (1) due to the perfect linearity. Also, there is a substantial lower-bound for the evidential value (10.4326). The upper-bound for the evidential value may be termed extreme.
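To see why the test statistic vanishes, assume the usual one-degree-of-freedom contrast form of the linearity test (an assumption here; the report's own definition is given in Section 1.5), with cell means $\bar{x}_j$, cell sizes $n_j$ and pooled within-cell variance $s_p^2$, and take the denominator degrees of freedom to be $45 - 3 = 42$. Inserting the cell means reported for the word-recognition sample (0.14, 0.29, 0.44) gives:

```latex
F = \frac{(\bar{x}_1 - 2\bar{x}_2 + \bar{x}_3)^2}
         {s_p^2\left(\frac{1}{n_1} + \frac{4}{n_2} + \frac{1}{n_3}\right)},
\qquad
\bar{x}_1 - 2\bar{x}_2 + \bar{x}_3 = 0.14 - 2(0.29) + 0.44 = 0
\;\Longrightarrow\;
F = 0, \quad p(F) = P(F_{1,42} \ge 0) = 1 .
```

The reported means are thus exactly equally spaced at the two-decimal precision given, whatever the standard deviations.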

    Table 3.2. Results on Experiment 1: Word recognition. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, and SD = standard deviation.

                        means                      SDs
    Study       n       low      control  high     low      control  high     F       p(F)    V

    Exp1.words  45/3    0.14     0.29     0.44     0.23     0.22     0.16     0       1       10.4326–1.4716e+15

    [Figure 3.1 panel: "Experiment 1: Words recognition"; horizontal axis: love, control, sex]

    Fig. 3.1. Trend line for Experiment 1: Word recognition. The error bars represent one standard deviation from the cell mean.

    3.2.2 Experiment 1: Face recognition

    The trend line for the face-recognition part of Experiment 1 can be found in Figure 3.2. It conveys a very linear effect for the experimental condition. Table 3.3 lists the corresponding data as well as the corresponding results on the F test and the evidential value. Again, a high p-value for the F test (0.9307) and a substantial evidential value (6.2330) are found.

    Table 3.3. Results on Experiment 1: Face recognition. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, and SD = standard deviation.

                        means                      SDs
    Study       n       low      control  high     low      control  high     F       p(F)    V

    Exp1.faces  45/3    0.38     0.51     0.65     0.23     0.14     0.16     0.0076  0.9307  6.2330

    [Figure 3.2 panel: "Experiment 1: Faces recognition"; horizontal axis: sex, control, love]

    Fig. 3.2. Trend line for Experiment 1: Face recognition. The error bars represent one standard deviation from the cell mean.

    3.3 Remarks

    Due to the low number of independent samples the weight of evidence lies, as indicated in Chapter 1, with the evidential value. The obtained evidential values imply the presence of a dependence structure between test persons. Indeed, the evidence for low scientific veracity of this publication is considered strong according to the criterion of Section 1.4.

  • 4 JF09.JEPG

    Publication Investigated

    Förster, J. (2009). Relations between perceptual and conceptual scope: How global versus local processing fits a focus on similarity versus dissimilarity. Journal of Experimental Psychology: General, 138: 88-111.

    4.1 Synopsis

    This publication was also included in the Whistleblower Report (2012). It features 12 experiments. Study 4 features expert evaluations, implying nested data (participants evaluated by experts). Study 4 will however be analyzed without regard to this hierarchical structure (as in the publication investigated). Experiment 8a features 2 factor-levels and is not analyzed here. Table 4.1 provides an overview of the design of the experiments. Note that most experiments have a second (between or within) factor. This means that different sets of independent samples can be constructed (see also Section 4.3 below).

    Table 4.1. Design Experiments.

    Experiment    Design                   Dependent variables

    1             3 between × 2 within     1
    2             3 between × 2 between    1
    3a            3 between × 2 between    1
    3b            3 between × 2 between    1
    4             3 between × 2 within     1
    5             3 between × 2 within     1
    6             3 between                1
    7a            3 between × 2 between    1
    7b            3 between × 2 within     1
    8a            2 between                1
    8b            3 between × 2 between    1
    9             3 between × 2 between    1

    4.2 Results

    4.2.1 First set of independent samples

    Trend lines for the first set of independent samples can be found in Figure 4.1. The trend lines indicate very consistent linear effects. They may hint that (at least for the sample sizes reported) the variation in group means may deviate too little from linearity given the spread reported for the respective conditions. Table 4.2 lists the corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential value.

    Table 4.2. Results on the first set of independent samples. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, NaN = not a number, sim = similarities, dis = dissimilarities, glob = global.

                        means                      SDs
    Study       n       low      control  high     low      control  high     F       p(F)    V

    exp1.sim    54/3    4.67     6.56     8.67     2.35     2.53     2.25     0.0256  0.8734  3.9565
    exp2.sim    88/6    5.43     6.60     7.71     1.83     3.16     3.93     0.0009  0.9760  12.5863–20.2388
    exp2.dis    88/6    5.00     6.31     7.36     2.08     2.77     1.86     0.0321  0.8588  3.2358–3.8275
    exp3a.sim   75/6    3.00     4.00     5.00     1.29     1.54     0.71     0.0000  1.0000  3.1610–NaN
    exp3a.dis   75/6    3.75     5.23     7.00     2.18     1.83     0.95     0.0584  0.8105  2.6542
    exp3b.sim   71/6    4.72     6.42     8.00     1.42     1.88     2.49     0.0073  0.9327  6.9735
    exp3b.dis   71/6    2.46     3.64     5.50     1.56     1.43     1.62     0.3852  0.5392  1.1600
    exp4.sim    55/3    0.83     1.17     1.79     1.29     1.04     1.44     0.1491  0.7010  1.5706
    exp5.glob   50/3    594.00   689.00   759.00   88.00    91.00    138.00   0.1485  0.7017  1.5867
    exp6        42/3    7.10     8.00     8.93     1.14     1.62     0.83     0.0014  0.9707  2.7729–19.1029
    exp7a.sim   101/6   4.76     6.76     8.59     2.39     2.46     2.09     0.0151  0.9028  5.1282
    exp7a.dis   101/6   6.24     7.00     7.35     3.56     2.80     3.14     0.0466  0.8300  2.7174
    exp7b.sim   60/3    8.20     9.90     11.00    2.84     2.63     2.37     0.1748  0.6775  1.5858
    exp8b.sim   45/6    5.00     7.40     8.53     2.00     2.32     1.73     0.4887  0.4927  1.1513
    exp8b.dis   45/6    6.27     6.73     7.13     2.02     1.87     2.20     0.0011  0.9740  9.4094–17.6775
    exp9.sim    90/6    4.87     7.00     8.67     2.17     2.23     2.06     0.1140  0.7374  1.9318
    exp9.dis    90/6    5.67     6.67     7.67     2.97     2.47     3.11     0.0000  1.0000  10.2393–NaN

    The p-values for the F test are consistently high, while under the null hypothesis of perfect linearity in the population these p-values (by definition) are expected to be uniformly distributed between 0 and 1. Employing Fisher's method in combining these p-values gives a left-tail probability of 1 - 3.504679e-7 ≈ .9999996. Thus, the accumulation of tests on the similar null hypotheses of linearity very strongly favors the shared null. Or, roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to 1 in 2,853,328.

    The instances of NaN for the evidential value in Table 4.2 are due to divisions by 0 (in its calculation). In a sense, one could conceive of ∞ as being a lower-bound to NaN in these instances. Many experiments are represented by a substantial (lower-bound for the) evidential value. The overall V is found to have a lower-bound of 357,847,863.

    4.2.2 Secondary dependent variables

    The experiments that have a second within-factor can be viewed as carrying a secondary dependent variable. Trend lines for these secondary dependent variables can be found in Figure 4.2. These trend lines also display very consistent linear effects. Table 4.3 lists the corresponding data and results on the F test and the evidential value. Again, consistently high p-values for the F test and substantial (lower-bounds for) evidential values are found. Fisher's method gives a left-tail probability of 1 - .0005404186 = .9994596, giving a probability of finding results at least as consistent w.r.t. linearity of 1 in 1,850. The overall V has a lower-bound of 1,506.134.

    Fig. 4.1. Trend lines for the first set of independent samples. The error bars represent one standard deviation from the cell mean.

    Fig. 4.2. Trend lines for the secondary dependent variables. The error bars represent one standard deviation from the cell mean.

    Table 4.3. Results on the secondary dependent variables. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, dis = dissimilarities, loc = local.

                        means                      SDs
    Study       n       low      control  high     low      control  high     F       p(F)    V

    exp1.dis    54/3    6.17     6.72     7.33     3.54     2.74     3.22     0.0011  0.9741  9.7998–17.3454
    exp4.dis    55/3    0.42     1.50     2.56     0.90     1.15     2.36     0.0005  0.9827  14.0841–24.2029
    exp5.loc    50/3    675.00   735.00   786.00   63.00    63.00    86.00    0.0440  0.8347  2.7929
    exp7b.dis   60/3    8.00     9.15     10.05    1.95     3.01     3.25     0.0267  0.8708  3.9071

    4.3 Remarks

    Note that the JF09.JEPG publication was also included in the WB report (2012). The results on the F test reported above concur with this report. Note, however, that the reported left-tail probabilities obtained by Fisher's method differ due to slightly differing sets of independent samples (for example, here the exp1.sim sample is included in the first set of independent samples while the WB report includes it in the set of secondary dependent variables). The overall results are qualitatively similar though: they convey that the linear pattern seems too consistent. In addition, the evidential values imply, for many experiments, the presence of a dependence structure between test persons. Comparison to the results obtained on the control publications (see Table 1.2) may further strengthen the notion of deviance of the results reported in JF09.JEPG. The evidence for low scientific veracity of this publication is considered strong according to the criterion of Section 1.4.

  • Part II

    Publications as First Author

  • 5 JF.D12.JESP

    Publication Investigated

    Förster, J. and Denzler, M. (2012). When any worx looks typical to you: Global relative to local processing increases prototypicality and liking. Journal of Experimental Social Psychology, 48: 416-419.

    5.1 Synopsis

    This publication contains 2 studies. For Study 1, typicality ratings, liking ratings, and reaction times are reported as dependent measures. For Study 2, only liking ratings are reported as a dependent measure. Table 5.1 provides an overview of the design of the experiments. The purpose of Study 2 was to replicate Study 1 (w.r.t. liking ratings). The between factor (processing style with levels: global, local, control) and within factor (typicality with levels: exemplar 1, exemplar 2, exemplar 3) thus concur for Studies 1 and 2.

    Table 5.1. Design Studies.

    Study    Design                  Dependent variables

    1        3 between × 3 within    3
    2        3 between × 3 within    1

    5.2 Results

    5.2.1 Exemplar 1, liking ratings

    Trend lines for the liking ratings on the first within factor-level (Exemplar 1) can be found in Figure 5.1. The trend lines indicate very linear effects. Table 5.2 lists the corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential value.

    The p-values for the F test are high. Employing Fisher's method in combining these p-values gives a left-tail probability of 1 - 0.00136641 = .9986336. Roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to approximately 1 in 732. The listed evidential values have substantial lower-bounds. The overall V has a lower-bound of 61.71324.

    Table 5.2. Results on liking ratings for Exemplar 1. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study              n       low     medium  high    low     medium  high    F       p(F)    V

    study1.ex1.liking  60/3    1.65    1.95    2.20    3.0     1.40    1.44    0.0019  0.9652  7.5437–11.8122
    study2.ex1.liking  68/3    2.14    2.30    2.48    1.7     1.58    1.95    0.0005  0.9823  8.1808–26.0208

    Fig. 5.1. Trend lines for the liking ratings on Exemplar 1. The error bars represent one standard deviation from the cell mean.

    5.2.2 Exemplar 2, liking ratings

    Trend lines for the liking ratings on the second within factor-level (Exemplar 2) can be found in Figure 5.2. The linear effects seem less extreme in comparison with the trend lines of Exemplar 1. Table 5.3 lists the corresponding data and results on the F test and the evidential value.

    Employing Fisher's method in combining the p-values of the F test gives a left-tail probability of 1 - 0.1432497 = .8567503. Roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to approximately 1 in 7. The listed evidential values are below 2. The overall V amounts to 2.639857.

    Table 5.3. Results on liking ratings for Exemplar 2. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study              n       low     medium  high    low     medium  high    F       p(F)    V

    study1.ex2.liking  60/3    -0.05   0.80    2.00    2.61    1.51    1.41    0.1106  0.7407  1.7564
    study2.ex2.liking  68/3    0.48    1.09    2.09    2.79    1.44    1.13    0.1548  0.6953  1.5030

    Fig. 5.2. Trend lines for the liking ratings on Exemplar 2. The error bars represent one standard deviation from the cell mean.

    5.2.3 Exemplar 3, liking ratings

    Trend lines for the liking ratings on the third within factor-level (Exemplar 3) can be found in Figure 5.3. Table 5.4 lists the corresponding data and results. Employing Fisher's method in combining the p-values of the F test gives a left-tail probability of 1 - 0.09028803 = .909712. That is, roughly, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to approximately 1 in 11. The listed evidential value for Study 1 has a substantial lower-bound (6.6032). The overall V has a lower-bound of 9.926733.

    Table 5.4. Results on liking ratings for Exemplar 3. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    study1.ex3.liking   60/3   -2.35   -0.45   1.55    2.43    1.85    2.09    0.0073  0.9322  6.6032 / 6.6669
    study2.ex3.liking   68/3   -1.48   -0.05   1.91    2.39    2.36    2.02    0.2072  0.6505  1.5033
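    The left-tail probability quoted for Fisher's method can be checked from the p(F) column. The sketch below assumes the usual form of Fisher's statistic, -2 times the sum of the log p-values, referred to a chi-square distribution with 2k degrees of freedom for k independent p-values. For even degrees of freedom the chi-square left tail at x equals the probability that a Poisson variable with mean x/2 is at least k, which avoids any external library. The function name is illustrative.

```python
import math

def fisher_left_tail(p_values):
    """Left-tail probability of Fisher's statistic -2 * sum(log p).

    A small value means the p-values are, jointly, closer to 1 than chance
    would suggest under the shared null hypothesis.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # Fisher statistic / 2

    # Poisson(half_stat) probability at j = k, built up term by term
    term = math.exp(-half_stat)
    for j in range(1, k + 1):
        term *= half_stat / j

    # Sum the upper Poisson tail j = k, k + 1, ... until terms are negligible
    tail = 0.0
    j = k
    while term > tail * 1e-17:
        tail += term
        j += 1
        term *= half_stat / j
    return tail

# p(F) values of Table 5.4
left = fisher_left_tail([0.9322, 0.6505])
print(left, "about 1 in", round(1.0 / left))
```

    With the rounded p-values of Table 5.4 this gives a left-tail probability of roughly 0.09, in line with the 0.09028803 and the "1 in 11" stated above; small differences in later decimals are expected because the tabulated p-values are rounded.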

    5.2.4 Study 1, Exemplar 1, typicality ratings

    The trend line for the typicality ratings of Study 1 on the first within factor-level (Exemplar 1) can be found in Figure 5.4. Table 5.5 lists the corresponding data and results.

    5.2.5 Study 1, Exemplar 2, typicality ratings

    The trend line for the typicality ratings of Study 1 on the second within factor-level (Exemplar 2) can be found in Figure 5.5. Table 5.6 lists the corresponding data and results.

    5.2.6 Study 1, Exemplar 3, typicality ratings

    The trend line for the typicality ratings of Study 1 on the third within factor-level (Exemplar 3) can be found in Figure 5.6. Table 5.7 lists the corresponding data and results.

    Fig. 5.3. Trend lines for the liking ratings on Exemplar 3. The error bars represent one standard deviation from the cell mean.

    Table 5.5. Results on typicality ratings for Exemplar 1 in Study 1. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    study1.ex1.typ      60/3   1.95    2.75    2.85    2.4     1.97    1.93    0.3666  0.5473  1.1786

    Table 5.6. Results on typicality ratings for Exemplar 2 in Study 1. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    study1.ex2.typ      60/3   -0.7    0.7     2.95    2.39    2.34    1.54    0.5328  0.4684  1.1118

    Table 5.7. Results on typicality ratings for Exemplar 3 in Study 1. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    study1.ex3.typ      60/3   -2.6    -1.2    2.25    2.04    2.51    2.65    2.4036  0.1266  1.0000

    [Plot: Study 1: Exemplar 1, typicality ratings; horizontal axis: control, global, local]

    Fig. 5.4. Trend line for the typicality ratings on Exemplar 1 in Study 1. The error bars represent one standard deviation from the cell mean.

    [Plot: Study 1: Exemplar 2, typicality ratings; horizontal axis: local, control, global]

    Fig. 5.5. Trend line for the typicality ratings on Exemplar 2 in Study 1. The error bars represent one standard deviation from the cell mean.

    [Plot: Study 1: Exemplar 3, typicality ratings; horizontal axis: local, control, global]

    Fig. 5.6. Trend line for the typicality ratings on Exemplar 3 in Study 1. The error bars represent one standard deviation from the cell mean.

    5.2.7 Study 1, Exemplar 1, reaction times

    The trend line for the reaction times of Study 1 on the first within factor-level (Exemplar 1) can be found in Figure 5.7. Table 5.8 lists the corresponding data and results.

    Table 5.8. Results on reaction times for Exemplar 1 in Study 1. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    study1.ex1.reac     60/3   4771    4779    4801    159     124     128     0.0344  0.8536  3.1713

    5.2.8 Study 1, Exemplar 2, reaction times

    The trend line for the reaction times of Study 1 on the second within factor-level (Exemplar 2) can be found in Figure 5.8. Table 5.9 lists the corresponding data and results.

    Table 5.9. Results on reaction times for Exemplar 2 in Study 1. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    study1.ex2.reac     60/3   4770    4887    5034    143     353     568     0.0192  0.8902  4.1917

    [Plot: Study 1: Exemplar 1, reaction times; horizontal axis: global, control, local]

    Fig. 5.7. Trend line for the reaction times on Exemplar 1 in Study 1. The error bars represent one standard deviation from the cell mean.

    [Plot: Study 1: Exemplar 2, reaction times; horizontal axis: global, control, local]

    Fig. 5.8. Trend line for the reaction times on Exemplar 2 in Study 1. The error bars represent one standard deviation from the cell mean.

    5.2.9 Study 1, Exemplar 3, reaction times

    The trend line for the reaction times of Study 1 on the third within factor-level (Exemplar 3) can be found in Figure 5.9. Table 5.10 lists the corresponding data and results. The evidential value (8.8874) is deemed substantial.

    Table 5.10. Results on reaction times for Exemplar 3 in Study 1. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    study1.ex3.reac     60/3   4733    5056    5361    375     491     529     0.0049  0.9445  8.8874

    5.3 Remarks

    Left-tail probabilities and overall evidential values are allowed to grow more extreme when the number of independent samples increases. When the number of independent samples is low, the weight of evidence shifts towards evidential values for individual (sub)studies. There are multiple (sub)studies with a substantial (lower-bound for the) evidential value (studies regarding liking ratings and the reaction times of Study 1). These evidential values imply the presence of a dependence structure between test persons. The evidence for low scientific veracity of this publication is considered strong according to the criterion of Section 1.4.

    [Plot: Study 1: Exemplar 3, reaction times; horizontal axis: global, control, local]

    Fig. 5.9. Trend line for the reaction times on Exemplar 3 in Study 1. The error bars represent one standard deviation from the cell mean.

  • 6 JF.D12.SPPS

    Publication Investigated

    Förster, J. and Denzler, M. (2012). Sense creative! The impact of global and local vision, hearing, touching, tasting and smelling on creative and analytic thought. Social Psychological and Personality Science, 3: 108-117.

    6.1 Synopsis

    This publication was also investigated in the Whistleblower Report (2012) and in Klaassen (2015). It features 12 studies. Studies 6 to 10b feature participant scores (analytic performance) as well as expert ratings (on participant creativity). The expert ratings actually imply nested data (participants rated by experts). However, they will be evaluated from the perspective of a secondary dependent variable. The participant scores and expert ratings are treated separately in the construction of sets of independent samples. Table 6.1 provides an overview of the design of the studies.

    Table 6.1. Design studies.

    Study   Design      Dependent variables
    1       3 between   1
    2       3 between   1
    3       3 between   1
    4       3 between   1
    5       3 between   1
    6       3 between   2
    7       3 between   2
    8       3 between   2
    9a      3 between   2
    9b      3 between   2
    10a     3 between   2
    10b     3 between   2

    Fig. 6.1. Trend lines for the participant scores. The error bars represent one standard deviation from the cell mean.

    6.2 Results

    6.2.1 Participant scores

    Trend lines for the independent samples regarding participant scores can be found in Figure 6.1. The trend lines indicate very consistent linear effects. They may hint that (at least for the sample sizes reported) the variation in group means may deviate too little from linearity given the spread reported for the respective conditions. Table 6.2 lists the corresponding data (cell sizes, cell means and corresponding standard deviations; see Section 6.3 for information on how these standard deviations were obtained) as well as the corresponding results on the F test and the evidential value.

    Table 6.2. Results on participant scores. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, NaN = not a number, ana = analytic performance.

                               means                   SDs
    Study               n      low     control high    low     control high    F       p(F)    V
    study1              60/3   2.47    3.04    3.68    1.21    0.72    0.68    0.0200  0.8879  3.9228
    study2              60/3   2.51    2.95    3.35    0.71    0.49    0.64    0.0139  0.9067  4.6815
    study3              60/3   2.40    2.90    3.45    0.86    0.51    0.80    0.0152  0.9022  4.2635
    study4              60/3   2.41    2.98    3.64    1.07    0.51    0.95    0.0351  0.8520  2.7173 / 2.7184
    study5              60/3   2.14    2.82    3.41    1.20    0.78    0.71    0.0317  0.8592  3.2118
    study6.ana          60/3   1.00    1.75    2.50    0.86    1.21    1.20    0.0000  1.0000  7.8744 / NaN
    study7.ana          60/3   0.95    1.75    2.50    1.10    1.21    1.10    0.0064  0.9363  7.8271
    study8.ana          60/3   0.85    1.65    2.35    0.93    1.09    1.31    0.0265  0.8712  3.7232
    study9a.ana         60/3   0.75    1.50    2.15    0.85    1.19    0.81    0.0358  0.8506  3.0827 / 3.6508
    study9b.ana         45/3   1.13    2.00    2.80    1.13    1.00    0.94    0.0116  0.9146  5.5861
    study10a.ana        60/3   0.95    1.70    2.40    1.00    1.30    0.99    0.0068  0.9345  4.5446 / 8.0421
    study10b.ana        45/3   0.93    1.73    2.67    0.70    1.28    0.98    0.0476  0.8284  2.7084 / 3.2234

    The p-values for the F test are consistently high (all are above .8), while under the null hypothesis of perfect linearity in the population these p-values (by definition) are expected to be uniformly distributed between 0 and 1. Employing Fisher's method in combining these p-values gives a left-tail probability of 1 - 2.079375e-8. Thus, the accumulation of tests on the similar null hypotheses of linearity favors the shared null hypothesis too strongly. Or, roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to 1 in 48,091,374.
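    The uniformity claim can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of the report's own calculation: it draws three equal-sized normal samples from an exactly linear population (cell means 1, 2, 3 and SD 1, all hypothetical), computes the F statistic for the contrast (1, -2, 1), and records how often F is at most 0.0476, the largest F in Table 6.2 (study10b.ana, 15 observations per cell). If the p-values are uniform, this share should be near 1 - 0.8284 = 0.1716.

```python
import random

def nonlinearity_f(groups):
    """F statistic for the contrast (1, -2, 1) across three equal-sized samples."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    variances = [sum((x - m) ** 2 for x in g) / (n - 1)
                 for g, m in zip(groups, means)]
    contrast = means[0] - 2.0 * means[1] + means[2]
    return (n * contrast ** 2 / 6.0) / (sum(variances) / 3.0)

def share_at_most(threshold, n_per_cell=15, reps=20000, seed=1):
    """Share of simulated experiments whose F statistic is at most `threshold`.

    The simulated population is exactly linear, so any deviation from
    linearity in a simulated sample is due to chance alone.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, 1.0) for _ in range(n_per_cell)]
                  for mu in (1.0, 2.0, 3.0)]
        if nonlinearity_f(groups) <= threshold:
            hits += 1
    return hits / reps

share = share_at_most(0.0476)
print("share of samples at least this linear:", share)
print("rough chance of twelve such samples in a row:", share ** 12)
```

    The last line ignores the differing cell sizes of the twelve studies and is only a crude bound; it is smaller than the Fisher result above because it asks every study to be at least as linear as the least linear one, but it points in the same direction.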

    The instances of NaN for the evidential value in Table 6.2 are due to divisions by 0 (in its calculation). In a sense, one could conceive of ∞ as being a lower-bound to NaN in these instances. The lion's share of evidential values in Table 6.2 lies above 3. The overall V has a lower-bound of 33,235,148.

    6.2.2 Expert ratings (on creativity)

    Trend lines for the independent samples regarding expert ratings can be found in Figure 6.2. These trend lines also display very consistent linear effects. Table 6.3 lists the corresponding data and results on the F test and the evidential value. Again, consistently high p-values for the F test and substantial (lower-bounds for) evidential values are found. Fisher's method gives a left-tail probability of 1 - 6.053435e-6 ≈ .9999939, giving a probability of finding results at least as consistent w.r.t. linearity of approximately 1 in 165,196. The overall V has a lower-bound of 127,200.2.

    Fig. 6.2. Trend lines for the expert ratings. The error bars represent one standard deviation from the cell mean.

    Table 6.3. Results on expert ratings. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, NaN = not a number, cre = creativity rating.

                               means                   SDs
    Study               n      low     control high    low     control high    F       p(F)    V
    study6.cre          60/3   3.19    4.01    4.79    1.07    1.21    0.82    0.0049  0.9446  4.9476 / 9.4121
    study7.cre          60/3   2.63    3.73    4.73    1.49    1.21    1.55    0.0164  0.8985  4.4324
    study8.cre          60/3   2.87    3.83    4.79    1.24    1.09    1.53    0.0000  1.0000  13.9492 / NaN
    study9a.cre         60/3   2.35    3.66    4.76    1.01    1.19    1.71    0.0823  0.7753  2.0960
    study9b.cre         45/3   2.55    3.72    4.78    1.16    1.00    1.47    0.0201  0.8878  3.9481
    study10a.cre        60/3   2.66    3.69    4.81    1.21    1.30    1.54    0.0147  0.9041  4.9429
    study10b.cre        45/3   2.42    3.73    5.02    0.82    1.28    1.45    0.0007  0.9793  10.1661 / 23.9234

    6.3 Remarks

    Note that the JF.D12.SPPS publication was also investigated in the Whistleblower Report (2012) and in Klaassen (2015). The results on the F test and the evidential value reported above concur with these references. Also note that the standard deviations in Tables 6.2 and 6.3 were not obtained from the JF.D12.SPPS publication (as they were not reported). These standard deviations were obtained from the Whistleblower Report (2012). This report indicates that standard deviations and cell sizes were communicated by JF through email.

    The reported left-tail probabilities obtained by Fisher's method differ from the WB report due to differing sets of independent samples (here, participant scores and expert ratings are demarcated). The overall results are, however, qualitatively similar: they convey that the linear pattern seems too consistent. In addition, the evidential values imply the presence of a dependence structure between test persons. Comparison to the results obtained on the control publications may further strengthen the notion of deviance of the results reported in JF.D12.SPPS. The evidence for low scientific veracity of this publication is considered strong according to the criterion of Section 1.4.

  • 7 JF.EO09.PSPB

    Publication Investigated

    Förster, J., Epstude, K., and Özelsel, A. (2009). Why love has wings and sex has not: How reminders of love and sex influence creative and analytic thinking. Personality and Social Psychology Bulletin, 35: 1479-1491.

    7.1 Synopsis

    This publication features 2 studies. Study 1 has 2 dependent variables: analytic and creative task performance (all participant scores). Study 2 has 3 dependent variables: analytic and global/local processing task performance (participant scores), and creative task performance (expert ratings). Table 7.1 provides an overview of the design of the studies. The purpose of Study 2 was to replicate Study 1 with subliminal instead of supraliminal priming.

    Table 7.1. Design studies.

    Study   Design      Dependent variables
    1       3 between   2
    2       3 between   3

    7.2 Results

    7.2.1 Analytic task

    Trend lines for the analytic task can be found in Figure 7.1. The trend lines convey very linear effects. Table 7.2 lists the corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential value.

    The p-values for the F test are high. Employing Fisher's method in combining these p-values gives a left-tail probability of 1 - 0.01092307 = .9890769. Roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to approximately 1 in 92. The listed (lower-bound to the) evidential value for study2.ana is substantial. The overall V has a lower-bound of 33.39086.

    Table 7.2. Results on the analytic task. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, ana = analytic task.

                               means                   SDs
    Study               n      low     control high    low     control high    F       p(F)    V
    study1.ana          60/3   1.55    2.1     2.70    0.83    0.60    1.10    0.0111  0.9166  4.9937
    study2.ana          60/3   0.80    1.5     2.25    1.06    0.95    1.25    0.0070  0.9338  6.6866 / 6.8333

    Fig. 7.1. Trend lines for the analytic task. The error bars represent one standard deviation from the cell mean.

    7.2.2 Creative task

    Trend lines for the creative task can be found in Figure 7.2. Again, the trend lines convey quite linear effects. Table 7.3 lists the corresponding data and results on the F test and the evidential value. Employing Fisher's method in combining the F p-values gives a left-tail probability of 1 - 0.03422873 = .9657713. Roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to approximately 1 in 29. The overall V amounts to 12.0694.

    Table 7.3. Results on the creative task. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, cre = creative task.

                               means                   SDs
    Study               n      low     control high    low     control high    F       p(F)    V
    study1.cre          60/3   0.25    0.75    1.30    0.44    0.64    0.92    0.0172  0.8960  4.4808
    study2.cre          60/3   3.59    4.23    4.98    1.21    0.75    0.90    0.0427  0.8371  2.6936

    Fig. 7.2. Trend lines for the creative task. The error bars represent one standard deviation from the cell mean.

    7.2.3 Global/local processing task

    The trend line for the global/local processing task can be found in Figure 7.3. Table 7.4 lists the corresponding data and results. Again, a high evidential value is encountered.

    Table 7.4. Results on the global/local processing task. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, gl = global/local processing task.

                               means                   SDs
    Study               n      low     control high    low     control high    F       p(F)    V
    study2.gl           60/3   26.3    33.4    40.1    8.87    8.58    9.61    0.0065  0.9358  7.3405

    7.3 Remarks

    Left-tail probabilities and overall evidential values are allowed to grow more extreme when the number of independent samples increases. When the number of independent samples is low, the weight of evidence shifts towards evidential values for individual (sub)studies. Two (sub)studies have a substantial (lower-bound for the) evidential value. These evidential values imply the presence of a dependence structure between test persons. The evidence for low scientific veracity of this publication is considered strong according to the criterion of Section 1.4.

    [Plot: Study 2: Global/local processing task; horizontal axis: lust, control, love]

    Fig. 7.3. Trend line for the global/local processing task. The error bars represent one standard deviation from the cell mean.

  • 8 JF.LS09.JEPG

    Publication Investigated

    Förster, J., Liberman, N., and Shapira, O. (2009). Preparing for novel versus familiar events: Shifts in global and local processing. Journal of Experimental Psychology: General, 138: 383-399.

    8.1 Synopsis

    This publication features 10 experiments. The publication reports that in each experiment the participants were randomly assigned to experimental conditions. Table 8.1 provides an overview of the design of the experiments. Experiments 2 and 6 feature 4 and 2 factor levels respectively and are not analyzed here.

    Table 8.1. Design experiments.

    Experiment  Design                           Dependent variables
    1a          3 between × 2 within             1
    1b          3 between × 2 within             3
    2           4 between                        1
    3a          3 between                        1
    3b          3 between                        2
    4a          3 between × 2 between            2
    4b          3 between                        2
    5a          3 between                        1
    5b          3 between                        1
    6           2 between × 2 within × 2 within  1

    8.2 Results

    8.2.1 First set of independent samples

    Trend lines for a first set of independent samples can be found in Figure 8.1. The trend lines do not convey very consistent linear effects. Table 8.2 lists the corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential value.

    Employing Fisher's method in combining the p-values for the F test gives a left-tail probability of 1 - 0.1371673 = .8628327. Thus, the accumulation of tests on the similar null hypotheses of linearity does not very strongly favor the shared null. Or, roughly speaking, under the assumption of perfect linearity in the population, the probability of finding results at least as consistent w.r.t. linearity amounts to approximately 1 in 7. The overall V is found to be 591.6182, which in comparison with the overall V reported in preceding chapters is not to be deemed high. However, for Experiments 5a and 5b the evidential values are substantial (above 7).

    Table 8.2. Results on the first set of independent samples. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, glob = global letters, RT = reaction times, GCT = gestalt completion task, PA = positive valence, atypical exemplars, NA = negative valence, atypical exemplars, Aty = atypical exemplars.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    exp1a.glob          45/3   668.00  698.00  756.00  70.00   123.00  156.00  0.1325  0.7176  1.7891
    exp1b.globRT        48/3   496.00  526.00  582.00  78.00   86.00   104.00  0.2226  0.6394  1.4125
    exp3a               42/3   6.50    7.50    8.80    1.90    1.20    0.08    0.1246  0.7260  1.7696
    exp3b.GCT           53/3   6.30    7.50    8.60    0.24    0.11    0.11    1.0799  0.3037  1.0000
    exp4a.PA            72/6   2.08    2.48    3.33    0.76    0.36    0.99    0.7201  0.4022  1.0000
    exp4a.NA            72/6   1.91    2.70    2.95    0.35    0.49    0.49    2.9029  0.0978  1.0000
    exp4b.Aty           36/3   2.19    2.44    2.74    0.47    0.22    0.26    0.0445  0.8342  2.5083
    exp5a               60/3   0.02    0.13    0.25    0.23    0.22    0.20    0.0071  0.9333  7.2848
    exp5b               42/3   0.00    0.14    0.29    0.19    0.18    0.18    0.0069  0.9340  7.2405

    Fig. 8.1. Trend lines for the first set of independent samples. The error bars represent one standard deviation from the cell mean.

    Fig. 8.2. Trend lines for the second set of independent samples. The error bars represent one standard deviation from the cell mean.

    8.2.2 Second set of independent samples

    Trend lines for a second set of independent samples can be found in Figure 8.2. Table 8.3 lists the corresponding data and the corresponding results on the F test and the evidential value. Fisher's method gives a left-tail probability of 1 - 0.008662376 = 0.9913376, roughly implying a probability of finding results at least as consistent w.r.t. linearity of approximately 1 in 115. The overall V has a lower-bound of 137.9026. At least 2 reported experiments have substantial (lower-bounds for the) evidential values (exp4a.PT and exp4b.Typ).

    Table 8.3. Results on the second set of independent samples. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, loc = local, RT = reaction times, Gen = general knowledge task, PT = positive valence, typical exemplars, NT = negative valence, typical exemplars, Typ = typical exemplars.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    exp1a.loc           45/3   775.00  816.00  912.00  139.00  103.00  112.00  0.5342  0.4689  1.0533
    exp1b.locRT         48/3   583.00  596.00  631.00  77.00   54.00   86.00   0.2384  0.6277  1.2721
    exp3b.Gen           53/3   5.10    5.50    5.80    0.15    2.40    0.20    0.0152  0.9025  1.0794 / 6.9625
    exp4a.PT            72/6   6.91    7.03    7.12    0.55    0.51    0.65    0.0055  0.9414  6.5389 / 7.7784
    exp4a.NT            72/6   6.95    7.07    7.29    0.53    0.41    0.31    0.1101  0.7422  1.8992
    exp4b.Typ           36/3   6.66    6.71    6.75    0.31    0.22    0.21    0.0032  0.9554  7.6776 / 10.1341

    8.2.3 Remaining dependent variables Experiment 1b

    Trend lines for the remaining dependent variables on Experiment 1b can be found in Figure 8.3. Table 8.4 lists the corresponding data and results (note that these do not constitute independent samples). One conjunction of dependent variable and within factor-level (exp1b.locER) is found to have a substantial evidential value (6.0519).

    Table 8.4. Results on the remaining dependent variables of Experiment 1b. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, glob = global, loc = local, ER = (mean no. of) errors, PR = measure of recognition accuracy.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    exp1b.globER        48/3   0.19    0.31    0.81    0.40    0.48    1.52    0.4277  0.5164  1.0331
    exp1b.locER         48/3   0.56    1.00    1.38    0.89    0.89    1.31    0.0087  0.9260  6.0519
    exp1b.globPR        48/3   0.43    0.56    0.78    0.20    0.26    0.20    0.4390  0.5110  1.1999
    exp1b.locPR         48/3   0.33    0.41    0.56    0.23    0.25    0.27    0.2082  0.6504  1.4739

    Fig. 8.3. Trend lines for the remaining dependent variables of Experiment 1b. The error bars represent one standard deviation from the cell mean.

    8.2.4 Follow-up: Collapsing atypical exemplar ratings of Experiment 4a over valence

    Experiment 4a has a 3 between (priming: novelty, control, oldness) × 2 between (valence of priming: positive, negative) design with 2 dependent variables (typicality ratings for atypical exemplars and typicality ratings for typical exemplars). From Table 8.2 it can be seen that the evidential values for the atypical exemplar ratings of Experiment 4a are low. In the positive valence of priming condition a V of 1 is found. In the negative valence of priming condition V is also found to be 1. JF.LS09.JEPG also reports on pooled results, where the atypical exemplar ratings of Experiment 4a are collapsed over the valence factor. Reviewing these pooled results, a different picture emerges. Figure 8.4 gives the trend line. Table 8.5 lists the corresponding data and results. The positive and negative valence effects seem to cancel out into a very linear effect that yields a substantial V of 4.1277.

    Table 8.5. Results when collapsing Experiment 4a over the valence factor. The number of observations per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, Atyp = atypical exemplars, coll = collapsed.

                               means                   SDs
    Study               n      low     medium  high    low     medium  high    F       p(F)    V
    exp4a.Atyp.coll     72/3   2       2.59    3.14    0.6     0.43    0.8     0.0162  0.8991  4.1277
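    The cancellation described in this subsection can be made concrete with the reported cell means. The sketch below assumes, as an interpretation rather than a quotation of the report's procedure, that deviation from linearity is measured by the contrast low - 2 * medium + high and that F is this contrast's mean square divided by the average cell variance; the function name is illustrative. The inputs are the exp4a rows of Tables 8.2 and 8.5.

```python
def contrast_and_f(per_cell, means, sds):
    """Nonlinearity contrast (1, -2, 1) and its F statistic for three cells."""
    low, mid, high = means
    contrast = low - 2.0 * mid + high
    ms_within = sum(sd ** 2 for sd in sds) / 3.0
    return contrast, per_cell * contrast ** 2 / 6.0 / ms_within

# observations per cell, cell means, cell SDs as reported
rows = {
    "exp4a.PA": (72 / 6, (2.08, 2.48, 3.33), (0.76, 0.36, 0.99)),
    "exp4a.NA": (72 / 6, (1.91, 2.70, 2.95), (0.35, 0.49, 0.49)),
    "exp4a.Atyp.coll": (72 / 3, (2.00, 2.59, 3.14), (0.60, 0.43, 0.80)),
}
results = {name: contrast_and_f(*args) for name, args in rows.items()}
for name, (contrast, f_value) in results.items():
    print(f"{name:16s} contrast = {contrast:+.2f}  F = {f_value:.4f}")
```

    The contrasts come out as +0.45 for positive valence and -0.54 for negative valence, which nearly offset each other in the collapsed data (-0.04). The corresponding F values, 0.7201, 2.9029 and 0.0162, agree with the tabulated ones.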

    [Plot for Figure 8.4; horizontal axis: oldness, control, novelty]

