Replicability & Questionable Research Practices
Readings on Sakai: Pashler & Harris (2012); Simmons et al. (2011)
Replication
Replicability: the degree to which similar results are obtained if a study is repeated
Exact replication: repeat the study using the same methods, as exactly as possible. Rare; difficult to publish (publication bias favors novel research)
Conceptual replication: use slightly different methods (e.g., measures, manipulations, sample) to test the same hypotheses. Very common (virtually required in some top-tier journals)
Replication-plus-extension: add something to extend the results (e.g., another condition). Can help show that the results 1) replicate and 2) generalize
Example of a Direct Replication: Bargh, Chen, & Burrows (1996)
“Priming” a feature of a stereotyped group can yield behaviour that is consistent with the stereotype
They used words in a scrambled sentence task to prime an “elderly” stereotype (e.g., old, wise, sentimental, bingo, retired, wrinkle) or neutral words (e.g., thirsty, clean, private)
After the study “ended,” they timed participants as they walked down the hallway.
Example of a Conceptual Replication: Elliot, Maier, Moller, Friedman, & Meinhardt (2007)
Examined the effects of presenting the color red on performance across 6 experiments
All studies used the same variables at the abstract (conceptual) level, but differed at the operational level
Experiment 1:
IV: ID number was written on the page in red, green, or black ink
DV: Number of anagrams solved correctly
Experiment 2:
IV: Cover page was red, green, or white
DV: Number of correct analogy items on an IQ test
The “replicability crisis” in psychology
“Crisis of confidence” in psychology this decade:
High-profile fraud cases (e.g., Diederik Stapel, Dirk Smeesters)
Report that psychologists are reluctant to share data for reanalysis (Wicherts, Bakker, & Molenaar, 2011)
Focus on questionable research practices (Simmons, Nelson, & Simonsohn, 2011)
Widely ridiculed publication showing extrasensory perception effects (Bem, 2011) that failed to replicate (Ritchie, Wiseman, & French, 2012)
How many results in psychology would replicate?
Extrasensory Perception Studies
Bem (2011) reported 9 experiments supporting precognition of events before they occurred
Examined well-known psychological effects… “time-reversed” (measure the outcome before the manipulation)
Example (Experiment 9):
Participants were presented a list of words serially
They typed all the words they could remember
After typing the words, they practiced a randomly selected half of the words
Result: Participants recalled significantly more of the words that they practiced (vs. control words), t(49) = 2.96, p = .002, d = 0.42
Stephen Colbert's summary
Extrasensory Perception Studies
Bem (2011) was published in the Journal of Personality and Social Psychology, a “top-tier” journal
The same journal subsequently rejected a manuscript that failed to replicate the finding (JPSP does not publish replications)
Ritchie, Wiseman, & French (2012) failed to replicate Bem’s Experiment 9 across 3 pre-registered direct replications
What does a failure to replicate mean?
Failures to replicate are ambiguous! They could represent:
Differences in methods (measures, setting, sample, etc.)
Random variation across samples
Mistakes made during data collection
The failure to replicate could be a Type II error (or the original study could be a Type I error)
Extrasensory Perception Studies
Should Bem have been published? Editorial (Judd & Gawronski, 2011):
“We openly admit that the reported findings conflict with our own beliefs about causality and that we find them extremely puzzling. Yet, as editors we were guided by the conviction that this paper—as strange as the findings may be—should be evaluated just as any other manuscript on the basis of rigorous peer review. Our obligation as journal editors is not to endorse particular hypotheses but to advance and stimulate science through a rigorous review process.” (abstract)
“Is the Replicability Crisis Overblown?” (Pashler & Harris, 2012)
Choosing α = .05 does not mean that the risk of a Type I error is only 5%:
How many of the effects that we examine actually exist? How much power do we have to detect those effects? (see the worked sketch after this list)
Direct replications are rare and conceptual replications are problematic
Science is not always self-correcting
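A rough way to see this point: the false-positive risk among significant results depends on the base rate of true effects and on statistical power, not just on α. The sketch below uses purely illustrative numbers (the base rate, power, and α values are assumptions, not figures from Pashler & Harris, 2012):

```python
# Minimal sketch: share of "significant" results that are Type I errors,
# given an assumed base rate of true effects and assumed average power.
alpha = 0.05       # nominal Type I error rate per test
power = 0.35       # assumed average power (illustrative)
base_rate = 0.10   # assumed share of tested effects that actually exist

true_pos = base_rate * power            # real effects correctly detected
false_pos = (1 - base_rate) * alpha     # null effects wrongly "detected"

fdr = false_pos / (true_pos + false_pos)
print(f"Share of significant results that are false positives: {fdr:.0%}")
# With these assumptions, about 56% of significant results are Type I
# errors, even though alpha was set to 5% for every individual test.
```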
The “replicability crisis”
How many results in psychology would replicate? We don’t know! (File drawer effect)
The “Reproducibility Project” (Open Science Collaboration, 2012):
Large-scale (>150 scientists) attempt to replicate studies
Currently replicating studies from 3 prominent psychology journals from 2008
What is the overall rate of replicability in psychology? What predicts replicability of studies?
Is this “crisis” unique to psychology?
Begley and Ellis (2012) attempted to replicate 53 papers in top journals on cancer research
They focused on “new” (unreplicated) results
They did not replicate 47 (89%) of those studies
Questionable Research Practices
“Homework Assignment”
Pretend that you MUST obtain a statistically significant result in your group project, at any cost.
Try to change your analyses to find a statistically significant result. For instance, you could: exclude participants for any reason; add control variables; change scores that look unusual.
The significant result does not need to be relevant to your hypotheses
Could you write a paper that makes sense of this significant result?
Questionable Research Practices
Practices in the collection, analysis, and reporting of results that inflate the risk of making a Type I error
False positive (Type I error): Incorrect rejection of a null hypothesis
More common (and perhaps less problematic) than outright fraud
“False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis…” (Simmons, Nelson, & Simonsohn, 2011)
False positives (Type I errors) are problematic!
Persistent because failures to replicate are not conclusive and are usually not published
Inspire future research that may waste resources
Using a conservative α (e.g., α = .05) does not solve those problems
Researcher degrees of freedom:
Decisions made during data collection, analysis, and reporting
Can yield significant results, but inflate Type I error rates
Researcher degrees of freedom inflate Type I error rates (Simmons et al., 2011)
Simulations using randomly generated data show the proportion of significant results (Type I errors); a sketch of this logic follows
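A minimal sketch of this kind of simulation, assuming one specific degree of freedom from Simmons et al.’s list (measuring two correlated dependent variables and reporting whichever one “works”); all parameter values here are illustrative:

```python
# Sketch: pure-noise data, two DVs correlated at r = .5, no true effect.
# Reporting either DV if it is significant inflates the Type I error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, r = 10_000, 20, 0.5
cov = [[1, r], [r, 1]]
false_positives = 0

for _ in range(n_sims):
    group_a = rng.multivariate_normal([0, 0], cov, n_per_group)
    group_b = rng.multivariate_normal([0, 0], cov, n_per_group)
    p1 = stats.ttest_ind(group_a[:, 0], group_b[:, 0]).pvalue
    p2 = stats.ttest_ind(group_a[:, 1], group_b[:, 1]).pvalue
    if min(p1, p2) < 0.05:      # report whichever DV "worked"
        false_positives += 1

print(f"Type I error rate: {false_positives / n_sims:.1%}")  # well above 5%
```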
Checking the data and adding subjects if p > .05 inflates Type I error rates (Simmons et al., 2011); see the sketch below
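A sketch of this optional-stopping case (the peeking schedule and sample sizes are illustrative assumptions, not Simmons et al.’s exact settings):

```python
# Sketch: run a two-group t-test on null data, check p after every 10
# participants per group, and stop as soon as p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, max_n, step = 10_000, 50, 10
false_positives = 0

for _ in range(n_sims):
    a = rng.normal(size=max_n)    # no true effect in either group
    b = rng.normal(size=max_n)
    for n in range(step, max_n + 1, step):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_positives += 1
            break                 # stop collecting once "significant"

print(f"Type I error rate with peeking: {false_positives / n_sims:.1%}")
# Well above the nominal 5%, even though every individual test used alpha = .05.
```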
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
Sent an anonymous survey to 5,964 academic psychologists at U.S. universities; 2,155 (36%) responded
Have you done this? Is it defensible? (0 = no, 1 = possibly, 2 = yes)
Item | Admission rate (%) | Mean defensibility
Falsifying data | 0.6 | 0.16
Wrongly claiming results are unaffected by demographic variables | 3.0 | 1.32
Reporting an unexpected finding as having been predicted from the start | 27.0 | 1.50
Deciding whether to exclude data after looking at the impact on the results | 38.2 | 1.61
Stopping data collection earlier than planned because a result is significant | 15.6 | 1.76
Failing to report all conditions in a study | 27.7 | 1.77
Deciding whether to collect more data after checking results | 55.9 | 1.79
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
This study had several methodological limitations:
Initial response rate of only 36%
33% of participants dropped out of the survey before finishing
Some participants argued that the questions were worded in a biased manner (e.g., Norbert Schwarz, 2012, listserv posting)
Some “QRPs” may be justifiable in some contexts
Another approach to identifying false positives: p-values
Masicampo & Lalande (2012), “A peculiar prevalence of p values just below .05”
Examined p-values from three prominent journals: JEPG, JPSP, PS
Collected 3,627 p values between .01 and .10 from 36 issues
With “real” effects, you expect relatively low p-values (an “exponential” curve of p-values; see the sketch below)
In reality, there are more p-values just below .05 than would be expected by chance
[Figure: frequency distribution of the p-values collected by Masicampo & Lalande (2012), with a spike just below .05]
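To see why true effects imply a right-skewed (“exponential”) p-curve, one can simulate many studies of a real effect and bin the resulting p-values. The effect size and sample size below are illustrative assumptions:

```python
# Sketch: distribution of p-values across simulated studies of a true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n, d = 20_000, 30, 0.5    # assumed true effect of d = 0.5

pvals = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(d, 1, n)).pvalue
    for _ in range(n_sims)
])

for lo in (0.00, 0.01, 0.02, 0.03, 0.04):
    share = np.mean((pvals >= lo) & (pvals < lo + 0.01))
    print(f"p in [{lo:.2f}, {lo + 0.01:.2f}): {share:.1%}")
# The counts fall off steadily as p grows: under a true effect, very low
# p-values dominate, the opposite of a pile-up just below .05.
```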
“P-curve: A key to the file drawer” (Simonsohn, Nelson, & Simmons, 2013)
“P-hacking”: engaging in questionable research practices in order to reduce the p-value to below .05
The shape of the distribution of p-values (the “p-curve”) can help to identify “p-hacking”
With a large effect size, p-hacking should not matter much
With a small effect or no effect, p-hacking will lead to more p-values just under .05 (see the sketch below)
[Figure: p-curves for different effect sizes with and without p-hacking (Simonsohn, Nelson, & Simmons, 2013)]
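A sketch of this signature, using optional stopping (one form of p-hacking) on null data; the peeking schedule is an illustrative assumption:

```python
# Sketch: among significant results, honest null tests give p-values spread
# evenly below .05, while optional stopping piles them up just under .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def hacked_p(max_n=50, step=10):
    """Peek every `step` subjects on null data; return the final p-value."""
    a, b = rng.normal(size=max_n), rng.normal(size=max_n)
    for n in range(step, max_n + 1, step):
        p = stats.ttest_ind(a[:n], b[:n]).pvalue
        if p < 0.05:
            return p    # stop (and report) as soon as p is significant
    return p

sig = np.array([p for p in (hacked_p() for _ in range(20_000)) if p < 0.05])
print(f"Significant p-values below .025: {np.mean(sig < 0.025):.1%}")
# An honest, flat null p-curve would put ~50% of significant p's below .025;
# p-hacking pushes most of them into the .025-.05 range (a left-skewed p-curve).
```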
How can we minimize false positives? (Simmons et al., 2011)
Guidelines for authors:
1. Decide the rule for terminating data collection before data collection begins, and report that rule
2. Collect at least 20 observations per cell, or provide a compelling justification
3. List all variables collected in a study
4. Report all experimental conditions
5. If observations are eliminated, report results with and without those observations
6. If covariates are included, report results with and without the covariates
How can we minimize false positives? (Simmons et al., 2011)
Guidelines for journal reviewers:
1. Ask authors to follow the previous requirements
2. Be tolerant of imperfections in results
3. Ask authors to show that results are robust (vs. hinging on a very specific type of analysis)
4. In some cases, require an exact replication
Additional guidelines for increasing replicability (Asendorpf et al., 2013)
Increase sample size (see the power sketch after this list)
Increase the reliability of measures
Choose study designs that minimize error variance: clear and standardized instructions; controlled conditions; strong manipulations
Use appropriate statistical methods: test and address assumptions; control for covariates, when justified
Avoid multiple underpowered studies
Publish all relevant information (materials, sample size justifications, etc.)
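To make the sample-size point concrete, here is a power sketch for a two-group t-test, assuming a medium effect (d = 0.5) and α = .05; the effect size is an illustrative assumption:

```python
# Sketch: power of an independent-samples t-test at several sample sizes.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (20, 64, 100):
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"n = {n:>3} per group -> power = {power:.2f}")
# Roughly .34, .80, and .94: with only 20 per group, most studies of a real
# medium-sized effect would fail to reach p < .05, and so would fail to replicate.
```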
Any Questions?