
Post on 25-Dec-2015


Bias, Chance & Confounding

Bias - Systematic deviation from the truth

• systematic deviation of the results (from the true value) that leads to incorrect conclusions

– design

– data collection

– analysis

– interpretation

– publication

– review

Consequences of bias

• underestimate or overestimate the parameter you are trying to measure

– eg blood pressure, prevalence of arthritis

• incorrect estimate of association between disease and exposure

– eg prevalence of lung cancer among smokers

Three types of bias

• Selection bias

• Information bias

• Confounding

Birth complications in a Canadian hospital

Hospital
          Complications   Total births   % complications
Summer    20              240            8.3%
Winter    20              180            11.1%

What does it show?

Births at hospital and at home

Home
          Complications   Total births   % complications
Summer    2               60             3.3%
Winter    2               120            1.7%

Hospital
          Complications   Total births   % complications
Summer    20              240            8.3%
Winter    20              180            11.1%

Why?

Combining hospital and home data

All births
          Complications   Total births   % complications
Summer    22              300            7.3%
Winter    22              300            7.3%

Home deliveries were more common in winter. Labor complications among home deliveries were low. Women with prolonged or complicated labour attempt to reach the hospital no matter what season.
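The arithmetic behind the three tables can be checked with a short sketch. The counts are copied from the slides; the dictionary layout and function names are illustrative choices, not part of the original lecture.

```python
# Counts copied from the hospital and home tables: (complications, total births).
births = {
    "hospital": {"summer": (20, 240), "winter": (20, 180)},
    "home": {"summer": (2, 60), "winter": (2, 120)},
}

def rate(complications, total):
    """Complication rate as a percentage."""
    return 100 * complications / total

def pooled_rate(season):
    """Rate for one season after adding hospital and home births together."""
    complications = sum(births[place][season][0] for place in births)
    total = sum(births[place][season][1] for place in births)
    return rate(complications, total)

for place, seasons in births.items():
    for season, (c, n) in seasons.items():
        print(f"{place:8s} {season}: {rate(c, n):.1f}%")
for season in ("summer", "winter"):
    print(f"all      {season}: {pooled_rate(season):.1f}%")
```

Looking only at hospital births suggests a winter excess, yet the pooled rates are identical: the apparent seasonal effect comes from who ends up in the hospital sample.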

Selection bias: systematic differences between those who do and do not take part in a study

Prevalence of Human papilloma virus by source of subjects

Adapted from Revzina NV Int J STD/AIDS 2005

Prevalence of alcohol abuse(St Louis, Missouri)

No. of contact attempts   Recruitment rate   Prevalence (%) alcohol abuse
1-5                       56.5               3.89
7                         56.5               3.98
8                         70.3               4.22
9                         73.1               4.26
10-57                     85.2               4.61

Cottler et al 1987

Selection bias: depression in chronic pain patients

% with depression   Setting
4.6%                pain clinic
13.4%               arthritis clinic
24.2%               inpatient neurosurgery
57.1%               psychiatric clinic

Dworkin & Gitlin 1991

Selection bias in words

• error due to systematic difference in characteristics of those who do and those who do not participate in a study

• study group is not representative of the population from which you think it was sampled

• can lead to

– misleading prevalence estimate

– overestimate or underestimate of association between risk factor and disease

Exploring bias: cataracts and occupational irradiation

              Cataract   No cataract   Total
Exposed       10         90            100
Not exposed   200        1800          2000

Percent of employees who developed cataract
Exposed = 10/100 = 10%
Non-exposed = 200/2000 = 10%

What type of bias might occur?

In reality

Self-selection bias I: the exposed are more concerned about sight

All the exposed with sight problems turn up
Only 60% of all the other groups turn up

              Cataract   No cataract   Total
Exposed       10         54            64
Not exposed   120        1080          1200

Percent of employees who developed cataract
Exposed = 10/64 = 15.6%
Non-exposed = 120/1200 = 10%

Self-selection bias II: the exposed with sight problems have moved to other jobs

Only 10% of the exposed with sight problems are still employed
60% of all the other groups turn up

              Cataract   No cataract   Total
Exposed       1          54            55
Not exposed   120        1080          1200

Percent of employees who developed cataract
Exposed = 1/55 = 1.82%
Non-exposed = 120/1200 = 10%
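The two self-selection scenarios can be reproduced from the true workforce counts by applying the stated participation fractions. This is a minimal sketch: the `attend` helper and variable names are hypothetical, and each percentage uses everyone who turns up in that group (cases plus non-cases) as its denominator.

```python
def risk_percent(cases, non_cases):
    """Percent with cataract among everyone observed in the group."""
    return 100 * cases / (cases + non_cases)

def attend(group, p_cases, p_non_cases):
    """Apply participation fractions to (cases, non_cases) counts."""
    cases, non_cases = group
    return (round(cases * p_cases), round(non_cases * p_non_cases))

# True workforce: 10% cataract in both groups.
true_exposed = (10, 90)
true_unexposed = (200, 1800)

# Scenario I: every exposed case turns up, 60% of all other groups.
scenario_1_exposed = attend(true_exposed, 1.0, 0.6)
# Scenario II: only 10% of exposed cases are still employed, 60% of the rest.
scenario_2_exposed = attend(true_exposed, 0.1, 0.6)
# The unexposed participate at 60% in both scenarios.
unexposed = attend(true_unexposed, 0.6, 0.6)

for label, group in [("truth, exposed", true_exposed),
                     ("scenario I, exposed", scenario_1_exposed),
                     ("scenario II, exposed", scenario_2_exposed),
                     ("unexposed", unexposed)]:
    print(f"{label}: {risk_percent(*group):.2f}%")
```

The same true risk of 10% appears inflated or deflated depending only on which exposed workers take part.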

Types of selection bias

• sampling bias
– eg faulty sampling frame

• self-selection bias
– eg healthy worker effect

• response bias
– eg more middle class/worried participate

• diagnostic bias
– knowledge of exposure status influences diagnosis

• admission (Berkson’s) bias
– eg poor social support

Size is no protection….

The fall of the Literary Digest..

Magazine mailed about 10,000,000 ballots (1936 US presidential election)

Predicted a Republican victory

Democrats won, magazine folded

Information bias

• systematic difference in quality/accuracy of data
– point estimates
– between groups

• reporting bias
– recall bias
– social desirability (halo effect)
– Hawthorne effect

• measurement bias
– poorly calibrated machine
– poorly phrased questions

• observer bias
– different observers give different results
– eg blood pressure

Measuring blood pressure

Kim E S et al. Dia Care 2007;30:1959-1963

Misclassification bias

• a type of information bias
– either exposure status incorrect
– or disease status incorrect

• two types
– at random
– differential

Misclassification at random

No misclassification

              Disease   No disease
Exposed       250       250
Not exposed   100       400

% with disease: exposed 50%, non-exposed 20%

40% of exposed misclassified as not exposed

              Disease   No disease
Exposed       150       150
Not exposed   200       500

% with disease: exposed 50%, non-exposed 28.6% (200/700)

Random misclassification weakens the size of effect
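A short sketch of the random misclassification example: 40% of the exposed group is relabelled as not exposed, regardless of disease status, and the risks are recomputed. The function names are illustrative; the counts come from the slide.

```python
def risk_percent(diseased, healthy):
    """Percent with disease among everyone in the group."""
    return 100 * diseased / (diseased + healthy)

def misclassify_exposed(exposed, unexposed, fraction):
    """Relabel `fraction` of the exposed as unexposed, whatever their disease status."""
    moved = tuple(round(n * fraction) for n in exposed)
    new_exposed = tuple(n - m for n, m in zip(exposed, moved))
    new_unexposed = tuple(n + m for n, m in zip(unexposed, moved))
    return new_exposed, new_unexposed

# (diseased, not diseased) counts before any misclassification.
exposed, unexposed = (250, 250), (100, 400)
new_exposed, new_unexposed = misclassify_exposed(exposed, unexposed, 0.4)

before = risk_percent(*exposed) - risk_percent(*unexposed)
after = risk_percent(*new_exposed) - risk_percent(*new_unexposed)
print(f"risk difference before: {before:.1f} points")
print(f"risk difference after:  {after:.1f} points")
```

The exposed risk stays at 50%, but the non-exposed group now contains some truly exposed people, so the gap between the groups shrinks.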

Differential (systematic) misclassification

No misclassification

              Disease   No disease
Exposed       250       250
Not exposed   100       400

% with disease: exposed 50%, non-exposed 20%

Systematic misclassification
Exposed: disease free labelled “disease”
Non-exposed: diseased labelled “disease free”

              Disease   No disease
Exposed       400       100
Not exposed   100       400

% with disease: exposed 80%, non-exposed 20%

Differential misclassification can bias result in either direction

Effect of bias in epidemiology

• Systematic error in study resulting in incorrect estimate of association between exposure and disease

– unusual people participate

– exposure incorrectly measured

– outcome incorrectly classified

• There are lots of ways to do a poor study

• Need rigorous design and conduct of studies

The Play of Chance:

Influences the results of all studies, sometimes a little, sometimes a lot.

The play of chance?

• A week in Ninewells - 25 newborns

• 16 boys; 9 girls

• Natural variability

• Small numbers are volatile

Subsequent days

6 boys & 10 girls 11 boys & 3 girls 7 boys & 7 girls

[Figure: proportion of boys observed in each sample, marked with an X on a 0-100% axis]

25 babies: 11 boys: proportion = 44%
25 babies: 16 boys: proportion = 64%
25 babies: 14 boys: proportion = 56%
25 babies: 17 boys: proportion = 68%
25 babies: 9 boys: proportion = 36%

• Natural variability & small numbers are volatile

• Some (unknown) ‘true’ proportion

• Any particular sample gives just one possible result.
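How surprising is 16 boys out of 25? A minimal sketch using the binomial distribution, assuming (this is not stated in the slides) that each birth is independently a boy with probability 0.5:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

n = 25
p_boy = 0.5  # assumed 'true' proportion, for illustration only

p_16_or_more = prob_at_least(16, n, p_boy)
print(f"P(16 or more boys out of {n}) = {p_16_or_more:.3f}")
```

Under that assumption a week with 16 or more boys turns up a little more than one time in ten, so the first week's 64% is well within ordinary sampling variation.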

In subsequent weeks (different samples):

Breast cancer patients

Sample   % surviving 1 year

1 85%

2 77%

3 79%

4 89%

Suicide rate in women, Northern Ireland

[Figure: rate per 100,000 by year]

In research

chance influences the results

how can we find out if a result is due to chance?

can we get an idea of what the true answer might be?

Interpreting clinical trials

                      Active treatment   Control
% surviving 3 years   60%                55%

Is there a real difference?

Interpreting results

The key question is:

How likely is it that the results happened by chance?

We need to measure chance

ie the likelihood of events happening

Probability: a measure of chance

Event                     Frequency         Probability
throwing a six            1/6               0.167
throwing 3 sixes          1/216             0.005
dying if fly 1000 miles   1 in 1 million    0.000001
winning the lottery       1 in 14 million   0.00000007
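The dice rows of the table follow from multiplying the probabilities of independent events. A small sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

p_six = Fraction(1, 6)

# Three independent throws: multiply the single-throw probabilities.
p_three_sixes = p_six ** 3

# The lottery row as a decimal, from the "1 in 14 million" frequency.
p_lottery = 1 / 14_000_000

print(p_six, round(float(p_six), 3))
print(p_three_sixes, round(float(p_three_sixes), 3))
print(f"{p_lottery:.8f}")
```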

Guide to probabilities

• likely event: large probability, eg frost in January (0.99)

• unlikely event: small probability, eg snow in August (0.001)

The Statistical Test

Proposes no effect, eg no true difference between groups

By the play of chance a small difference may be seen

Calculate the probability of seeing a difference at least this large if chance alone were at work

The logic of a statistical test

Propose Null Hypothesis ie no effect

calculate probability of result by chance

if p small conclude chance unlikely

therefore the effect is real

Examples of p-values

Outcome measure           New treatment   Control   p-value
Diastolic BP              85 mmHg         97 mmHg   0.002
% alive at 1 year         85%             70%       0.13
% pain free at 6 months   36%             23%       0.04

Decision rule

• if p<0.05 reject chance, ie conclude real effect

• if p>0.05 cannot exclude chance, ie cannot conclude there is real effect
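To see the decision rule in action, here is a sketch of one possible test for the earlier 60% versus 55% survival comparison. The slides do not say which test or sample size was used; the pooled two-proportion z-test (a normal approximation) and the sample sizes of 100 and 2000 per arm below are assumptions chosen purely for illustration.

```python
from math import erfc, sqrt

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    return erfc(abs(z) / sqrt(2))

# Hypothetical trial sizes: the same 60% vs 55% survival in both cases.
p_small_trial = two_proportion_p_value(60, 100, 55, 100)
p_large_trial = two_proportion_p_value(1200, 2000, 1100, 2000)

for label, p in [("100 per arm", p_small_trial), ("2000 per arm", p_large_trial)]:
    verdict = "reject chance" if p < 0.05 else "cannot exclude chance"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

The same observed difference leads to opposite verdicts depending on the number of patients, which is why percentages alone cannot answer "is there a real difference?".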


Non-significance (P>0.05)

• Non-significant ≠ no effect

• Absence of evidence ≠ evidence of absence

BSE infected meat is safe
We have no reason to believe it is harmful
We have no evidence

The meaning of p< 0.05

If p=0.05

would get a result (as extreme) 1 time in 20 by chance alone, even when there is no true effect

ie if 20 independent tests of true null hypotheses

expect 1 spurious significance by chance
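The "1 in 20" arithmetic can be spelled out. This sketch assumes 20 independent tests in which no true effect exists, each judged at the 0.05 threshold:

```python
alpha = 0.05   # significance threshold
tests = 20     # number of independent tests, all with no true effect

# Expected number of spurious "significant" results.
expected_false_positives = alpha * tests

# Chance that at least one of the 20 tests is spuriously significant.
p_at_least_one = 1 - (1 - alpha) ** tests

print(f"expected spurious results: {expected_false_positives:.1f}")
print(f"P(at least one spurious result): {p_at_least_one:.2f}")
```

So with 20 such tests, one false alarm is expected on average, and the chance of seeing at least one is well over half.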

Two types of problem

• conclude there is effect when none exists: Type I error (happens with multiple testing)

• fail to detect an effect when one exists: Type II error (happens with small studies)

The trade off

Statistical significance guards against chance findings

If want too much protection (eg p<0.01), risk missing true effects

If too little protection (eg p<0.1), then likely to get spurious effects

Another approach: 95% confidence interval

We know

i) observed treatment difference

ii) our result is affected by chance

We need to know

i) where the true value might lie

Several independent studies

[Figure: Repeated survival studies; % survival plotted against study number]

Mean 32.25

Most studies 30-34

True value likely to be about 30-34

Confidence Intervals

• repeated studies cluster round the true value

• from one study
– need to specify a range
– likely to contain the real value

• calculate 95% confidence interval
– Most of the time (95%) the confidence interval will contain the real value (but sometimes it will not).

Where does the true value lie?

• calculate the confidence interval
– Mean +/- 1.96 * Std error

• 95% confident it includes the true value

• decide what this means

Examples of confidence intervals

• percent of boy babies - 62%
– 95% confidence interval: 35% to 89%

• mean diastolic pressure - 95 mm Hg
– 95% confidence interval: 88 to 102 mmHg
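The "estimate +/- 1.96 * Std error" recipe can be sketched for a proportion. The sample behind the 62% example is not given in the slides, so this illustration uses the earlier week of 16 boys among 25 babies; the standard error formula sqrt(p(1-p)/n) is the usual normal approximation, which is only rough for samples this small.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)   # standard error of the proportion
    return p - z * se, p + z * se

# Illustrative sample: 16 boys among 25 newborns.
low, high = proportion_ci(16, 25)
print(f"observed 64%, 95% CI roughly {100 * low:.0f}% to {100 * high:.0f}%")
```

The interval is wide because the sample is small; quadrupling the sample size would halve its width.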

Interpreting the 95% c.i.

[Figure: horizontal axis showing the difference Active - Placebo from -20 to 20; negative values mean placebo better, positive values mean active better; zero corresponds to active same as placebo; the 95% c.i. is drawn as a range on this axis]

Is zero likely to be the true value?

Can we reject the Null Hypothesis?

Example: 95% c.i. from -10 to 5

Can we reject the Null Hypothesis?

Pulling it together

• for difference in mean treatment effect
– if zero within confidence interval: NOT significant

• for ratio measures eg relative risk
– no difference if ratio = 1, so if 1 within confidence interval: NOT significant
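The rule can be written as a one-line check: a result is significant at the 5% level when the 95% confidence interval excludes the no-effect value (0 for differences, 1 for ratios). The function name and the example intervals below are hypothetical.

```python
def is_significant(ci_low, ci_high, null_value):
    """True when the confidence interval excludes the no-effect value."""
    return not (ci_low <= null_value <= ci_high)

# Difference in means: no effect corresponds to 0.
difference_result = is_significant(-10, 5, null_value=0)   # interval spans 0

# Relative risk: no effect corresponds to 1 (made-up interval).
ratio_result = is_significant(1.2, 3.4, null_value=1)      # interval excludes 1

print("difference significant:", difference_result)
print("ratio significant:", ratio_result)
```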

Confounding

The Glowing Field of Confounding Fruit, Shona MacDonald

Confounding

[Diagram: Coffee drinking appears associated with Pancreatic cancer]

[Diagram: Smoking is linked to both Coffee drinking and Pancreatic cancer]

Defining confounding

• the observed association between two factors is due to the effect of a third factor
– an apparent association may be spurious
– a real association may be obscured

More confounding

• Divorced men drink more

• Alcohol caused divorce

Unhappy marriage caused both

• Derivation: Latin confundere – to mix up

• when an apparent association between a factor (F) and an outcome (O) is due to a third factor (R)

Confounding

R

F O

Ministers’ salaries

Price of beer

Confounding Ministers & Beer

Road traffic accidents

Age of driver   Mortality rate (per 100,000)
35-44 years     1.9
75-84 years     6.2

Evidence for Hells’ Grannies ?

Clarification of terms

Factor of interest: the one thought to be a new risk factor

Confounder: the one that alters the observed relation between factor of interest and outcome

Requirements for confounding

• Confounding factor must be associated with risk factor of interest

• confounder influences risk of disease

Why worry about confounding?

• Does air pollution cause bronchitis ?

Breathe polluted air

Develop bronchitis?

Have choices and power

Do seatbelts reduce crash injuries?

?Wear seatbelts

Risk averse

↓Injured in a crash

Do STD’s increase HIV transmission?

STD HIV

Risky sex

?

Does smoking lead to illicit drug use?

Factor: eg smoking

Outcome: eg drug taking

R – true risk factor: eg social deprivation

?

Dealing with confounding

• Must collect information on all known potential confounding factors

• Explore for confounding in the analysis

• Practically, difficult to know which are the important confounders

EPIET (www)

Cases of Down Syndrome by Birth Order

Cases of Down Syndrome by Age Groups

Cases of Down Syndrome by Birth Order and Maternal Age

Note on confounders

A confounder

• is the true causal factor responsible for the disease

• has to be more strongly associated with the disease than the supposed risk factor
– if smoking increased risk of lung cancer x10
– confounder would need to have a bigger effect

Dealing with confounding: in the design

• restrict recruitment
– to one level of confounding factor
– could compromise sample size

• matching
– see case-control studies

Dealing with confounding: in the analysis

• stratified analysis
– explore association between risk factor (R) and outcome (O) for each level (stratum) of the confounding factor (F)
– then calculate an overall, weighted (unconfounded) estimate

• direct and indirect standardisation

• statistical modelling (multiple regression)
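A stratified analysis can be sketched as follows. The counts are entirely hypothetical, and the Mantel-Haenszel weighting used for the overall estimate is one common choice, not necessarily the method the lecture had in mind. Within each stratum of the confounder the risk factor has no effect, yet the crude comparison suggests a doubling of risk.

```python
# Hypothetical counts per confounder stratum:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "confounder present": (80, 400, 20, 100),
    "confounder absent": (5, 100, 20, 400),
}

def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)

def crude_risk_ratio(strata):
    """Ignore the confounder: add all strata together first."""
    a, n1, c, n0 = (sum(column) for column in zip(*strata.values()))
    return risk_ratio(a, n1, c, n0)

def mantel_haenszel_risk_ratio(strata):
    """Weighted overall risk ratio across strata (Mantel-Haenszel weights)."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return numerator / denominator

for name, counts in strata.items():
    print(f"{name}: risk ratio = {risk_ratio(*counts):.2f}")
print(f"crude risk ratio    = {crude_risk_ratio(strata):.3f}")
print(f"adjusted risk ratio = {mantel_haenszel_risk_ratio(strata):.3f}")
```

The crude association arises only because the exposed are concentrated in the high-risk stratum; once strata are compared like with like, the association disappears.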

Control of confounding –hard to control unknown risk factors

• These methods can control only known potential confounders.

• Only random assignment of exposure can control for unknown potential confounders (see randomised controlled trials)

More Confounding

Is the association between obesity and mortality due to the confounding effect of hypertension?

Mortality

Obesity

?Hypertension

Hypertension is probably not a real confounder but rather the mechanism whereby obesity causes mortality

Obesity → Hypertension (mechanism or intervening variable) → Mortality

*Manson JE et al: JAMA 1987;257:353-8.

Intervening variable

Mortality

Obesity

Hypertension

*Manson JE et al: JAMA 1987;257:353-8.

Even if hypertension is a mechanism linking obesity to mortality, are there alternative mechanisms that may causally link obesity and mortality?

Mortality

Obesity

Hypertension: block by adjustment?

Requirements for confounding

• Confounding factor must be associated with both the risk factor of interest and the disease

• confounder is not an intermediate step in the causal pathway between risk factor and disease

Is maternal smoking a risk factor of perinatal death?

Is the association confounded by low birth weight?

Perinatal mortality

Maternal smoking

?Low birth

weight

Is low birth weight the mechanism by which maternal smoking leads to higher risk of perinatal death?

Low birth weight is an intervening variable

Perinatal mortality

Maternal smoking

Low birthweight

BUT THERE COULD BE AN ADDITIONAL QUESTION: Does maternal smoking cause perinatal death by mechanisms other than low birth weight?

Perinatal mortality

Maternal smoking

Low birthweight

Direct toxic effect?

Block by adjustment

Causal models

R

F O

(F confounding)

F R O

(R intervening)

Why confounding is a problem

• risk behaviours are related
– smoking, drinking, diet, exercise, health care seeking, compliance with medication

• risk factors are hard to measure fully
– eg diet, alcohol, social class

• social factors have complex associations
– social class, race, education

• physical environment complex
– air quality, noise, traffic, parks are inter-related

• many confounders are not known
– hence control in the analysis is limited

Exploring dietary advice

The Japanese eat very little fat and suffer fewer heart attacks than the British or Americans.

On the other hand, the French eat a lot of fat and also suffer fewer heart attacks than the British or Americans.

Dietary advice

The Japanese drink very little red wine and suffer fewer heart attacks than the British or Americans.

On the other hand, Italians drink lots of red wine and also suffer fewer heart attacks than the British or Americans.

Conclusion: Eat & drink what you like. It appears that speaking English is what kills you.

borrowed from Victor J. Schoenbach, PhD

Bias, Chance & Confounding

• Assess bias first: Critical Appraisal

• Then assess chance: assumes no bias

• Then assess confounding: effect real, what is the explanation?

What you should know

• bias
– selection
– information

• play of chance
– p-values and confidence intervals
– Type I and II errors

• confounding
– meaning
– reasons for
– methods to control: stratify / adjust in statistical model

• intervening variables