Thinking About Data:
A Simple Principle to Help You Improve your Scientific Data Analysis
Scott A. Venners, Ph.D., MPH
November 13, 2003
1
PowerPoint slides available at:
www.artima.com/AMU/lecture.ppt
(Try tomorrow)
2
Classes First Data Set
?
3
Y = Outcome Variable
X = Predictor of Interest
Cov1…N = Potential Confounders (Covariates)
Y = X + Cov1 + Cov2 + Cov3 + … + CovN
X p-value <0.05?
Yes - Write a paper.
4
Simple Principle:
1. Your model represents only one possible explanation of the data.
2. You must actively think of all possible alternative explanations and test them.
3. The explanations you cannot test define the uncertainty of your analysis.
5
(Can Test)
Possible Explanations of Data
(Cannot Test)
6
Possible Explanations of Data
(Cannot Test)
Model
9
(Can Test)
Do not stop here!
(Cannot Test)
Model
10
Skills you need:
1. Thinking of possible explanations
2. Knowing how to test them.
11
Example 1: Simple model.
Skill: Visualizing Confounding
12
Example 1: Does an inactive lifestyle increase the risk of low bone density?
[Legend: markers for Inactive Lifestyle and Active Lifestyle]
13
[Figure: Inactive Lifestyle and Active Lifestyle groups]
14
[Figure: Inactive Lifestyle and Active Lifestyle groups, with Low Bone Density marked]
15
[Figure: Inactive Lifestyle and Active Lifestyle groups, with Low Bone Density marked]
What else could cause this result?
Female, Smoking, Excessive Alcohol, Old Age…
16
                     Active Lifestyle    Inactive Lifestyle
Female               49%                 51%
Smoking              21%                 19%
Excessive Alcohol    1%                  1%
Old Age              30%                 50%
17
Is the association between inactive lifestyle and low bone density confounded by old age?
Old Age: 30% (Active Lifestyle), 50% (Inactive Lifestyle)
18
Is the association between inactive lifestyle and low bone density confounded by old age?
Old Age: 30% (Active Lifestyle), 50% (Inactive Lifestyle)
No
19
[Figure: % Low Bone Density by lifestyle within age strata. Younger Age: Active 30%, Inactive 50%. Older Age: Active 30%, Inactive 50%.]
20
Is the association between inactive lifestyle and low bone density confounded by old age?
Old Age: 30% (Active Lifestyle), 50% (Inactive Lifestyle)
Yes
21
[Figure: % Low Bone Density by lifestyle within age strata. Younger Age: Active 0%, Inactive 0%. Older Age: Active 100%, Inactive 100%.]
22
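The "No" and "Yes" scenarios on the preceding slides can be compared with a little arithmetic. The sketch below assumes that 30% of the active group and 50% of the inactive group are older (the slide text does not label which percentage belongs to which group, so this assignment is an assumption) and computes each group's crude percentage with low bone density as a weighted average over the two age strata. The function and variable names are illustrative and do not come from the slides.

```python
def crude_rate(share_old, rate_young, rate_old):
    """Crude % with low bone density in a group, given its share of older people."""
    return share_old * rate_old + (1 - share_old) * rate_young

# Share of older people per lifestyle group (assumed assignment of 30% / 50%).
share_old = {"active": 0.30, "inactive": 0.50}

# Scenario "No": age has no effect; lifestyle does (30% vs 50% in both age strata).
no_confounding = {
    group: crude_rate(share_old[group], rate_young=rate, rate_old=rate)
    for group, rate in (("active", 30.0), ("inactive", 50.0))
}

# Scenario "Yes": lifestyle has no effect; age does (0% if younger, 100% if older).
full_confounding = {
    group: crude_rate(share_old[group], rate_young=0.0, rate_old=100.0)
    for group in share_old
}

print("No confounding:  ", no_confounding)
print("Full confounding:", full_confounding)
```

Under these assumptions both scenarios give crude rates of about 30% and 50%, so the crude comparison alone cannot tell them apart. Only the age-stratified view separates the two explanations.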
Independent Effect(s)
Inactive Only
10 + 0(Old) + 20(Inactive)
[Figure: % Low Bone Density. Younger Age (0): Active (0) 10%, Inactive (1) 30%. Older Age (1): Active (0) 10%, Inactive (1) 30%.]
23
Independent Effect(s)
Older Age Only
10 + 20(Old) + 0(Inactive)
[Figure: % Low Bone Density. Younger Age (0): Active (0) 10%, Inactive (1) 10%. Older Age (1): Active (0) 30%, Inactive (1) 30%.]
24
Independent Effect(s)
Both Older Age and Inactive
10 + 20(Old) + 20(Inactive)
[Figure: % Low Bone Density. Younger Age (0): Active (0) 10%, Inactive (1) 30%. Older Age (1): Active (0) 30%, Inactive (1) 50%.]
25
Independent Effect(s)
Both Older Age and Inactive
10 + 20(Old) + 20(Inactive)
[Figure: % Low Bone Density. Younger Age (0): Active 10%, Inactive 30%. Older Age (1): Active 30%, Inactive 50%.]
Older Age and Inactive Interaction
10 + 20(Old) + 20(Inactive) + 10(Old*Inactive)
[Figure: % Low Bone Density. Younger Age (0): Active 10%, Inactive 30%. Older Age (1): Active 30%, Inactive 60%.]
27
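The equations on the "Independent Effect(s)" slides can be evaluated directly. This sketch plugs the 0/1 codes for Older Age and Inactive into each equation and lists the resulting percentage for every age-by-lifestyle cell. The helper name low_bone_density_pct and the dictionary layout are illustrative choices, not part of the slides.

```python
from itertools import product

def low_bone_density_pct(old, inactive, b_old, b_inactive, b_interaction=0):
    """Baseline of 10% plus additive terms; old and inactive are 0/1 indicators."""
    return 10 + b_old * old + b_inactive * inactive + b_interaction * old * inactive

# Coefficients taken from the four equations on the slides.
models = {
    "Inactive only": dict(b_old=0, b_inactive=20),
    "Older age only": dict(b_old=20, b_inactive=0),
    "Both": dict(b_old=20, b_inactive=20),
    "Interaction": dict(b_old=20, b_inactive=20, b_interaction=10),
}

# For each model, the predicted % in every (old, inactive) cell.
table = {
    name: {
        (old, inactive): low_bone_density_pct(old, inactive, **coefs)
        for old, inactive in product((0, 1), repeat=2)
    }
    for name, coefs in models.items()
}

for name, cells in table.items():
    print(f"{name:15s}", cells)
```

The interaction term changes only the cell where both indicators equal 1, which is why the older, inactive bar rises from 50% to 60% while the other three bars stay put.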
Example 2:
Sometimes just putting potential confounders into the model is not correct.
28
Example 2: Does passive smoking increase the risk of chronic cough?
[Legend: markers for Passive Smoking and No Passive Smoking]
29
[Figure: Passive Smoking and No Passive Smoking groups]
30
[Figure: Passive Smoking and No Passive Smoking groups, with Chronic Cough marked]
Chronic Cough: 25% (Passive Smoking), 25% (No Passive Smoking)
31
Chronic Cough: 25% (Passive Smoking), 25% (No Passive Smoking)
What else could cause this result?
Active Smoking…
32
Is the association between passive smoking and cough confounded by active smoking?
Active Smoking: 45% (No Passive Smoking), 17% (Passive Smoking)
33
[Figure: % Chronic Cough by passive smoking within active-smoking strata. Active Smoking: No Passive 47%, Passive 47%. No Active Smoking: No Passive 7%, Passive 20%.]
34
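The stratified percentages explain why the crude comparison shows 25% cough in both groups. The sketch below is a rough check that assumes 45% of the no-passive-smoking group and 17% of the passive-smoking group are active smokers. That assignment is an inference: it is the one that reproduces the crude 25% figures. The names cough_pct, active_share, and crude_cough are illustrative.

```python
# % with chronic cough by (active smoker, passive exposure), from the stratified charts.
cough_pct = {
    (0, 0): 7.0,   # no active smoking, no passive smoking
    (0, 1): 20.0,  # no active smoking, passive smoking
    (1, 0): 47.0,  # active smoking, no passive smoking
    (1, 1): 47.0,  # active smoking, passive smoking
}

# Share of active smokers in each passive-exposure group (assumed assignment).
active_share = {0: 0.45, 1: 0.17}

def crude_cough(passive):
    """Crude % cough in a passive-exposure group, averaging over active smoking."""
    share = active_share[passive]
    return share * cough_pct[(1, passive)] + (1 - share) * cough_pct[(0, passive)]

crude = {passive: crude_cough(passive) for passive in (0, 1)}
print({passive: round(pct, 1) for passive, pct in crude.items()})
```

Both crude values come out close to 25%. The excess of active smokers in the unexposed group hides the 7% versus 20% difference seen among people who do not smoke themselves.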
[Figure: % Chronic Cough by passive smoking within active-smoking strata. Active Smoking: No Passive 47%, Passive 47%. No Active Smoking: No Passive 7%, Passive 20%.]
How to model?
35
[Figure: % Chronic Cough. No Active Smoking (0): No Passive (0) 7%, Passive (1) 20%. Active Smoking (1): No Passive (0) 47%, Passive (1) 47%.]
How to model?
Cough% = 7 + 40(Smoke) + 13(Passive) - 13(Smoke*Passive)?
No
36
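The slide answers "No" to this candidate equation, and the extracted text does not state the reason. The sketch below checks only the arithmetic: whether the equation reproduces the four observed percentages, and what effect of passive smoking it implies in each active-smoking stratum. The function name candidate_cough_pct is illustrative.

```python
def candidate_cough_pct(smoke, passive):
    """Candidate equation from the slide; smoke and passive are 0/1 indicators."""
    return 7 + 40 * smoke + 13 * passive - 13 * smoke * passive

# Observed % cough by (active smoker, passive exposure), from the stratified charts.
observed = {(0, 0): 7, (0, 1): 20, (1, 0): 47, (1, 1): 47}

# Does the equation reproduce every cell?
fits = all(candidate_cough_pct(s, p) == pct for (s, p), pct in observed.items())

# Implied effect of passive smoking (percentage points) within each stratum.
passive_effect = {
    smoke: candidate_cough_pct(smoke, 1) - candidate_cough_pct(smoke, 0)
    for smoke in (0, 1)
}

print(fits, passive_effect)  # → True {0: 13, 1: 0}
```

The equation fits the cells exactly, so the slide's objection concerns whether this is the right way to frame the question, not whether the numbers add up.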
Example 3:
Sometimes explanations for data are not so clear.
37
Odds ratios of early pregnancy loss.
Husband’s current smoking    Crude OR    p       Adjusted* OR    p
None                         Ref                 Ref
<20 cigs/day                 1.19        .429    1.04            .854
>=20 cigs/day                2.18        .013    1.81            .049
* Adjusted for husband’s and wife’s ages, education, stress, exposure to dust and noise, husband’s alcohol use, previous smoking, and exposure to toxins, and wife’s body-mass index.
38
If husband’s education is removed from the model:
Husband’s current smoking    Crude OR    p       Adjusted* OR    p
None                         Ref                 Ref
<20 cigs/day                 1.19        .429    1.14            .576
>=20 cigs/day                2.18        .013    2.02            .022
39
Husband’s Smoking    None    <20 cigs/day    >=20 cigs/day
High School          79%     59%             50%
40
% Early Pregnancy Loss
Husband’s Smoking    None    <20 cigs/day    >=20 cigs/day
< High School        20%     29%             44%
>= High School       22%     21%             30%
41
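One way to read the chart is to turn its percentages into odds ratios within each education stratum, using non-smoking husbands as the reference. The sketch below assumes the six percentages map to the strata in the order None, <20, >=20 cigs/day, first for < High School and then for >= High School. These are unadjusted ratios computed from the chart values, not the regression estimates in the odds-ratio tables, and the names odds, loss_pct, and odds_ratios are illustrative.

```python
def odds(pct):
    """Convert a percentage (0-100) to odds."""
    return pct / (100 - pct)

# % early pregnancy loss by husband's education and smoking (assumed mapping).
loss_pct = {
    "< High School": {"None": 20, "<20 cigs/day": 29, ">=20 cigs/day": 44},
    ">= High School": {"None": 22, "<20 cigs/day": 21, ">=20 cigs/day": 30},
}

# Odds ratio of each smoking level versus "None", within each education stratum.
odds_ratios = {
    education: {
        level: odds(pct) / odds(by_smoking["None"])
        for level, pct in by_smoking.items()
    }
    for education, by_smoking in loss_pct.items()
}

for education, ratios in odds_ratios.items():
    print(education, {level: round(value, 2) for level, value in ratios.items()})
```

If the mapping is right, the ratio for heavy smoking is roughly twice as large in the lower-education stratum as in the higher one. That pattern would suggest education modifies the association and does not merely confound it.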
Main Points:
Whether your results are good or bad, always think beyond your preferred explanation for the data.
Explore all possibilities before choosing your preferred model.
Acknowledge what you cannot test as your limitations.
43