Replicability & Questionable Research Practices
Readings on Sakai: Pashler & Harris (2012); Simmons et al. (2011)
Replication
Replicability: the degree to which similar results are obtained if a study is repeated
Exact replication: repeat the study using the same methods, as exactly as possible. Rare; difficult to publish (publication bias favors novel research)
Conceptual replication: use slightly different methods (e.g., measures, manipulations, sample) to test the same hypotheses. Very common (virtually required in some top-tier journals)
Replication-plus-extension: add something to extend the results (e.g., another condition). Can help show that the results 1) replicate and 2) generalize
Example of a Direct Replication: Bargh, Chen, & Burrows (1996)
“Priming” a feature of a stereotyped group can yield behaviour that is consistent with the stereotype
They used words in a scrambled sentence task to prime an “elderly” stereotype (e.g., old, wise, sentimental, bingo, retired, wrinkle) or neutral words (e.g., thirsty, clean, private)
After the study “ended,” they timed participants as they walked down the hallway.
Example of a Conceptual Replication: Elliot, Maier, Moller, Friedman, & Meinhardt (2007)
Examined the effects of presenting the color red on performance across 6 experiments
All studies used the same variables at the abstract (conceptual) level, but differed at the operational level
Experiment 1:
IV: ID number was written on the page in red, green, or black ink
DV: Number of anagrams solved correctly
Experiment 2:
IV: Cover page was red, green, or white
DV: Number of correct analogy items on an IQ test
The “replicability crisis” in psychology
“Crisis of confidence” in psychology this decade:
High-profile fraud cases (e.g., Diederik Stapel, Dirk Smeesters)
Report that psychologists are reluctant to share data for reanalysis (Wicherts, Bakker, & Molenaar, 2011)
Focus on questionable research practices (Simmons, Nelson, & Simonsohn, 2011)
Widely ridiculed publication showing extrasensory perception effects (Bem, 2011) that failed to replicate (Ritchie, Wiseman, & French, 2012)
How many results in psychology would replicate?
Extrasensory Perception Studies
Bem (2011) reported 9 experiments supporting precognition of events before they occurred
Examined well-known psychological effects… “time-reversed” (measure the outcome before the manipulation)
Example (Experiment 9):
Participants were presented a list of words serially
They typed all the words they could remember
After typing the words, they practiced a randomly selected half of the words
Result: Participants recalled significantly more of the words that they practiced (vs. control words), t(49) = 2.96, p = .002, d = 0.42
Stephen Colbert's summary
Extrasensory Perception Studies
Bem (2011) was published in the Journal of Personality and Social Psychology, a “top-tier” journal
The same journal subsequently rejected a manuscript that failed to replicate the finding (JPSP does not publish replications)
Ritchie, Wiseman, & French (2012) failed to replicate Bem’s Experiment 9 across 3 pre-registered direct replications
What does a failure to replicate mean?
Failures to replicate are ambiguous! They could represent:
Differences in methods (measures, setting, sample, etc.)
Random variation across samples
Mistakes made during data collection
The failure to replicate could be a Type II error (or the original study could be a Type I error)
Extrasensory Perception Studies
Should Bem have been published? Editorial (Judd & Gawronski, 2011):
“We openly admit that the reported findings conflict with our own beliefs about causality and that we find them extremely puzzling. Yet, as editors we were guided by the conviction that this paper—as strange as the findings may be—should be evaluated just as any other manuscript on the basis of rigorous peer review. Our obligation as journal editors is not to endorse particular hypotheses but to advance and stimulate science through a rigorous review process.” (abstract)
“Is the Replicability Crisis Overblown?” (Pashler & Harris, 2012)
Choosing α = .05 does not mean that the risk of a Type I error is only 5%:
How many of the effects that we examine actually exist? How much power do we have to detect those effects? (see the worked sketch after this list)
Direct replications are rare and conceptual replications are problematic
Science is not always self-correcting
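A rough way to see this point: the false-positive risk among significant results depends on the base rate of true effects and on statistical power, not just on α. The sketch below uses purely illustrative numbers (the base rate, power, and α values are assumptions, not figures from Pashler & Harris, 2012):

```python
# Minimal sketch: share of "significant" results that are Type I errors,
# given an assumed base rate of true effects and assumed average power.
alpha = 0.05       # nominal Type I error rate per test
power = 0.35       # assumed average power (illustrative)
base_rate = 0.10   # assumed share of tested effects that actually exist

true_pos = base_rate * power            # real effects correctly detected
false_pos = (1 - base_rate) * alpha     # null effects wrongly "detected"

fdr = false_pos / (true_pos + false_pos)
print(f"Share of significant results that are false positives: {fdr:.0%}")
# With these assumptions, about 56% of significant results are Type I
# errors, even though alpha was set to 5% for every individual test.
```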
The “replicability crisis”
How many results in psychology would replicate? We don’t know! (File drawer effect)
The “Reproducibility Project” (Open Science Collaboration, 2012):
Large-scale (>150 scientists) attempt to replicate studies
Currently replicating studies from 3 prominent psychology journals from 2008
What is the overall rate of replicability in psychology? What predicts replicability of studies?
Is this “crisis” unique to psychology?
Begley and Ellis (2012) attempted to replicate 53 papers in top journals on cancer research
They focused on “new” (unreplicated) results
They did not replicate 47 (89%) of those studies
Questionable Research Practices
“Homework Assignment”
Pretend that you MUST obtain a statistically significant result in your group project, at any cost.
Try to change your analyses to find a statistically significant result. For instance, you could: exclude participants for any reason; add control variables; change scores that look unusual.
The significant result does not need to be relevant to your hypotheses
Could you write a paper that makes sense of this significant result?
Questionable Research Practices
Practices in the collection, analysis, and reporting of results that inflate the risk of making a Type I error
False positive (Type I error): Incorrect rejection of a null hypothesis
More common (and perhaps less problematic) than outright fraud
“False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis…” (Simmons, Nelson, & Simonsohn, 2011)
False positives (Type I errors) are problematic!
Persistent because failures to replicate are not conclusive and are usually not published
Inspire future research that may waste resources
Using a conservative α (e.g., α = .05) does not solve those problems
Researcher degrees of freedom:
Decisions made during data collection, analysis, and reporting
Can yield significant results, but inflate Type I error rates
Researcher degrees of freedom inflate Type I error rates (Simmons et al., 2011)
Simulations using randomly generated data show the proportion of significant results (Type I errors); a sketch of this logic follows
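A minimal sketch of this kind of simulation, assuming one specific degree of freedom from Simmons et al.’s list (measuring two correlated dependent variables and reporting whichever one “works”); all parameter values here are illustrative:

```python
# Sketch: pure-noise data, two DVs correlated at r = .5, no true effect.
# Reporting either DV if it is significant inflates the Type I error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, r = 10_000, 20, 0.5
cov = [[1, r], [r, 1]]
false_positives = 0

for _ in range(n_sims):
    group_a = rng.multivariate_normal([0, 0], cov, n_per_group)
    group_b = rng.multivariate_normal([0, 0], cov, n_per_group)
    p1 = stats.ttest_ind(group_a[:, 0], group_b[:, 0]).pvalue
    p2 = stats.ttest_ind(group_a[:, 1], group_b[:, 1]).pvalue
    if min(p1, p2) < 0.05:      # report whichever DV "worked"
        false_positives += 1

print(f"Type I error rate: {false_positives / n_sims:.1%}")  # well above 5%
```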
Checking the data and adding subjects if p > .05 inflates Type I error rates (Simmons et al., 2011); see the sketch below
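A sketch of this optional-stopping case (the peeking schedule and sample sizes are illustrative assumptions, not Simmons et al.’s exact settings):

```python
# Sketch: run a two-group t-test on null data, check p after every 10
# participants per group, and stop as soon as p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, max_n, step = 10_000, 50, 10
false_positives = 0

for _ in range(n_sims):
    a = rng.normal(size=max_n)    # no true effect in either group
    b = rng.normal(size=max_n)
    for n in range(step, max_n + 1, step):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_positives += 1
            break                 # stop collecting once "significant"

print(f"Type I error rate with peeking: {false_positives / n_sims:.1%}")
# Well above the nominal 5%, even though every individual test used alpha = .05.
```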
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
Sent an anonymous survey to 5,964 academic psychologists at U.S. universities; 2,155 (36%) responded
Have you done this? Is it defensible? (0 = no, 1 = possibly, 2 = yes)
Item | Admission rate (%) | Mean defensibility
Falsifying data | 0.6 | 0.16
Wrongly claiming results are unaffected by demographic variables | 3.0 | 1.32
Reporting an unexpected finding as having been predicted from the start | 27.0 | 1.50
Deciding whether to exclude data after looking at the impact on the results | 38.2 | 1.61
Stopping data collection earlier than planned because a result is significant | 15.6 | 1.76
Failing to report all conditions in a study | 27.7 | 1.77
Deciding whether to collect more data after checking results | 55.9 | 1.79
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
This study had several methodological limitations:
Initial response rate of only 36%
33% of participants dropped out of the survey before finishing
Some participants argued that the questions were worded in a biased manner (e.g., Norbert Schwarz, 2012, listserv posting)
Some “QRPs” may be justifiable in some contexts
Another approach to identifying false positives: p-values
Masicampo & Lalande (2012), “A peculiar prevalence of p values just below .05”
Examined p-values from three prominent journals: JEPG, JPSP, PS
Collected 3,627 p values between .01 and .10 from 36 issues
With “real” effects, you expect relatively low p-values (an “exponential” curve of p-values; see the sketch below)
In reality, there are more p-values just below .05 than would be expected by chance
[Figure: frequency distribution of the p-values collected by Masicampo & Lalande (2012), with a spike just below .05]
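To see why true effects imply a right-skewed (“exponential”) p-curve, one can simulate many studies of a real effect and bin the resulting p-values. The effect size and sample size below are illustrative assumptions:

```python
# Sketch: distribution of p-values across simulated studies of a true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n, d = 20_000, 30, 0.5    # assumed true effect of d = 0.5

pvals = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(d, 1, n)).pvalue
    for _ in range(n_sims)
])

for lo in (0.00, 0.01, 0.02, 0.03, 0.04):
    share = np.mean((pvals >= lo) & (pvals < lo + 0.01))
    print(f"p in [{lo:.2f}, {lo + 0.01:.2f}): {share:.1%}")
# The counts fall off steadily as p grows: under a true effect, very low
# p-values dominate, the opposite of a pile-up just below .05.
```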
“P-curve: A key to the file drawer” (Simonsohn, Nelson, & Simmons, 2013)
“P-hacking”: engaging in questionable research practices in order to reduce the p-value to below .05
The shape of the distribution of p-values (the “p-curve”) can help to identify “p-hacking”
With a large effect size, p-hacking should not matter much
With a small effect or no effect, p-hacking will lead to more p-values just under .05 (see the sketch below)
[Figure: p-curves for different effect sizes with and without p-hacking (Simonsohn, Nelson, & Simmons, 2013)]
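A sketch of this signature, using optional stopping (one form of p-hacking) on null data; the peeking schedule is an illustrative assumption:

```python
# Sketch: among significant results, honest null tests give p-values spread
# evenly below .05, while optional stopping piles them up just under .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def hacked_p(max_n=50, step=10):
    """Peek every `step` subjects on null data; return the final p-value."""
    a, b = rng.normal(size=max_n), rng.normal(size=max_n)
    for n in range(step, max_n + 1, step):
        p = stats.ttest_ind(a[:n], b[:n]).pvalue
        if p < 0.05:
            return p    # stop (and report) as soon as p is significant
    return p

sig = np.array([p for p in (hacked_p() for _ in range(20_000)) if p < 0.05])
print(f"Significant p-values below .025: {np.mean(sig < 0.025):.1%}")
# An honest, flat null p-curve would put ~50% of significant p's below .025;
# p-hacking pushes most of them into the .025-.05 range (a left-skewed p-curve).
```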
How can we minimize false positives? (Simmons et al., 2011)
Guidelines for authors:
1. Decide the rule for terminating data collection before data collection begins, and report that rule
2. Collect at least 20 observations per cell, or provide a compelling justification
3. List all variables collected in a study
4. Report all experimental conditions
5. If observations are eliminated, report results with and without those observations
6. If covariates are included, report results with and without the covariates
How can we minimize false positives? (Simmons et al., 2011)
Guidelines for journal reviewers:
1. Ask authors to follow the previous requirements
2. Be tolerant of imperfections in results
3. Ask authors to show that results are robust (vs. hinging on a very specific type of analysis)
4. In some cases, require an exact replication
Additional guidelines for increasing replicability (Asendorpf et al., 2013)
Increase sample size (see the power sketch after this list)
Increase the reliability of measures
Choose study designs that minimize error variance: clear and standardized instructions; controlled conditions; strong manipulations
Use appropriate statistical methods: test and address assumptions; control for covariates, when justified
Avoid multiple underpowered studies
Publish all relevant information (materials, sample size justifications, etc.)
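To make the sample-size point concrete, here is a power sketch for a two-group t-test, assuming a medium effect (d = 0.5) and α = .05; the effect size is an illustrative assumption:

```python
# Sketch: power of an independent-samples t-test at several sample sizes.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (20, 64, 100):
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"n = {n:>3} per group -> power = {power:.2f}")
# Roughly .34, .80, and .94: with only 20 per group, most studies of a real
# medium-sized effect would fail to reach p < .05, and so would fail to replicate.
```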
Any Questions?