Life After P-hacking (APS May 2013, Washington DC). With minor edits for posting.

Post on 25-Feb-2016

Uri Simonsohn, Penn (gave the talk)

Leif Nelson, UC Berkeley

Joe Simmons, Penn also

Photo not necessary

Definition

p-hacking: exploiting researchers’ degrees-of-freedom seeking p<.05
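To make the definition concrete, here is a minimal simulation sketch of one researcher degree of freedom: collecting two outcome measures and reporting whichever one reaches p < .05. Nothing in it comes from the talk. The cell size of 20, the 0.5 correlation between the two measures, the known unit variance (which lets a z-test stand in for a t-test), and all function names are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
_STD_NORMAL = NormalDist()


def z_test_p(group_a, group_b):
    """Two-sided p-value for a difference in means, assuming a known SD of 1."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


def simulate_cell(rng, n, r):
    """Draw n subjects with two unit-variance measures correlated at r.

    There is no true effect: every cell comes from the same population.
    """
    dv1, dv2 = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        dv1.append(math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0, 1))
        dv2.append(math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0, 1))
    return dv1, dv2


def false_positive_rates(n_sims=4000, n=20, r=0.5, seed=1):
    """Return (rate with one pre-specified DV, rate when picking the better DV)."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_sims):
        a1, a2 = simulate_cell(rng, n, r)
        b1, b2 = simulate_cell(rng, n, r)
        p1 = z_test_p(a1, b1)
        p2 = z_test_p(a2, b2)
        honest += p1 < ALPHA            # only the first measure is ever tested
        hacked += min(p1, p2) < ALPHA   # report whichever measure "worked"
    return honest / n_sims, hacked / n_sims


honest_rate, hacked_rate = false_positive_rates()
print(f"pre-specified DV: {honest_rate:.3f}")
print(f"best of two DVs:  {hacked_rate:.3f}")
```

Under these assumptions the first rate should sit near the nominal 5%, while the second should be visibly higher, even though only one small choice was left open.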

Life after p-hacking

• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?

• Median study: n ≈ 20

• False-Positive Psych: n>20

• What can you reliably detect with n=20?

• MTurk study
  – N=674
  – Why not use published ds?

n=20 is enough for:

• Men taller than women: n = 6

• People above median age closer to retirement: n = 10

• Women, more shoes than men: n = 15

n=20 is not enough for:

• People who like spicy food are more likely to like Indian food: n = 27

• Liberals rate social equality as more important than do conservatives: n = 34

• People who like eggs report eating egg salad more often: n = 47

• Men weigh more than women: n = 47

• Smokers think smoking is less likely to kill someone than do non-smokers: n = 146

• Are you studying a bigger effect than "men weigh more than women"?

• If not, use n>50
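The link between effect size and sample size behind these slides can be sketched with a standard power calculation. The slides do not say what power level or test their n values assume, so the code below assumes a two-cell design, a two-sided test at α = .05, 80% power, and the normal approximation to the t-test. The function names are hypothetical and d is Cohen's d.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def required_n_per_cell(d, alpha=0.05, power=0.80):
    """Approximate n per cell needed to detect a standardized difference d."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


def smallest_detectable_d(n_per_cell, alpha=0.05, power=0.80):
    """Approximate smallest d that n per cell detects with the given power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_per_cell)


for n in (20, 50):
    print(f"n = {n} per cell -> smallest detectable d = {smallest_detectable_d(n):.2f}")
```

Under these assumptions, n = 20 per cell only detects very large effects reliably, which is the point the examples above make with everyday comparisons.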

Life after p-hacking

• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?

[Figure: Estimate (y-axis, -0.15 to 0.15) for Low vs. High, shown for Lion's Weight, Coins, and Calories]

Estimates are way off

Subjects confused?

Big outliers

[Figure: Estimate (y-axis, -0.25 to 0.25) for Low vs. High, shown for Lion's Weight, Coins, and Calories]

p < .03

Estimates are way off

Subjects confused?

Big outliers

[Figure: Estimate (y-axis, -0.25 to 0.25) for Low vs. High, shown for Calories only]

p < .03

Study 1?

• Run calories study again.
• Same exclusion rule.

Why not just conceptual replication?

• Restart p-hacking clock

• Failures do not count

Replications

• Conceptual
  – Rule out confounds
  – Rule in generalizability

• Direct
  – Rule out false-positive
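One way to read "restart p-hacking clock" is as simple arithmetic. A direct replication reuses the design and exclusion rule, so a nonexistent effect gets a single chance to reach p < .05. A conceptual replication reopens those choices, and each new attempt is another chance. The sketch below assumes the attempts are independent, which is a simplification, and the attempt counts are purely hypothetical.

```python
def chance_of_false_positive(attempts, alpha=0.05):
    """P(at least one p < alpha) across independent attempts when the effect is zero."""
    return 1 - (1 - alpha) ** attempts


# One attempt corresponds to a direct replication with everything fixed in advance.
for attempts in (1, 2, 5, 10):
    print(f"{attempts:2d} attempt(s): {chance_of_false_positive(attempts):.3f}")
```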

Life after p-hacking

• n>50
• Direct replications
• 21 words (Google it)
• Compromise writing
• Who to hire
• What about Bayesian?

How can an organic farmer compete?

How can an organic researcher compete?

• If you determined sample size in advance, say it.

• If you did not drop variables, say it.

• If you did not drop conditions, say it.

21 Word Solution (get the .pdf here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588)

Footnote 1

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.

Organic Farmer Organic Researcher

Life after p-hacking

• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?

Compromise writing

• While reviewers still in dark ages.
• Have it both ways.
• “Clean” version in main text
  – All studies “worked” & < 2500 words
• Supplement/footnote
  – n=100; n=150
  – p=.08 w/o exclusion
  – Data and materials online
• Only reformers read small print
• Organic 21 words applies.
• Everybody likes the paper

Life after p-hacking

• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?

If you hire based on quantity, you pass on these guys

What’s the alternative to counting papers?

• Rookies: Best 1
• Tenure: Best 3
• Full: Best 5

Try it. It is a powerful question. What’s her best paper?

Life after p-hacking

• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?

Only speak for myself here.

My prior: Bayesians will be unhappy in 3... 2... 1...

P-hacking also invalidates Bayesian results

P-hacking also invalidates Bayesian results

Let me say that again

• Bayesian proposals for Psych

1) Bayesian t-test
• Replications use it sometimes
• Turns out
  – α = 5%

2) Bayesian estimation
• Latest JEP:G.
• Turns out
  – Changes nothing

1%

t-test “vs” Bayesian estimation: changes nothing

How similar? Results change by less than if we dropped 1 observation at random.

But!

• Isn’t data-peeking OK for Bayes?
  – Not when used for hypothesis testing

• Also:
  – Dropped subjects, measures, conditions invalidate all inference.
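The data-peeking point can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Bayesian t-test discussed in the talk. It uses a one-sample design with known SD of 1, a normal prior on the effect under H1 with SD `TAU`, and a Bayes factor cutoff of 3 as the rule for declaring evidence for an effect. The prior, the cutoff, the look schedule, and all names are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

TAU = 1.0        # assumed prior SD of the effect under H1
THRESHOLD = 3.0  # assumed cutoff for declaring evidence for H1


def bayes_factor_10(sample_mean, n, tau=TAU):
    """BF10 for H1: mu ~ N(0, tau^2) versus H0: mu = 0, with data SD known to be 1."""
    se = math.sqrt(1 / n)
    marginal_h1 = NormalDist(0, math.sqrt(tau ** 2 + se ** 2)).pdf(sample_mean)
    marginal_h0 = NormalDist(0, se).pdf(sample_mean)
    return marginal_h1 / marginal_h0


def simulate(n_sims=2000, n_max=100, look_every=10, seed=1):
    """Return (hit rate with one test at n_max, hit rate when peeking along the way)."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(n_sims):
        total = 0.0
        crossed_at_a_look = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0, 1)  # the null is true: mu = 0
            if n % look_every == 0 and bayes_factor_10(total / n, n) > THRESHOLD:
                # A researcher who peeks would stop and report here.
                crossed_at_a_look = True
        final_hit = bayes_factor_10(total / n_max, n_max) > THRESHOLD
        fixed_hits += final_hit
        peeking_hits += crossed_at_a_look or final_hit
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = simulate()
print(f"single test at n = 100: {fixed_rate:.3f}")
print(f"peeking every 10:       {peeking_rate:.3f}")
```

Both rates are computed on the same simulated data, so any gap between them comes only from the stopping rule: with no true effect, peeking gives more chances to declare support for H1 than a single pre-planned test does.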

• P-hacking Bayesian stats

• Drunk driving leather seats

Good reasons to go Bayesian do not include p-hacking.

• Next slide is the last.

Life after p-hacking

• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?

Only speak for myself here.

Leif Nelson, UC Berkeley

Joe Simmons, Penn