Data analysis considerations for (clinical) research Jarno Tuimala 2015-09-08.

Page 1

Data analysis considerations for (clinical) research

Jarno Tuimala, 2015-09-08

Page 2

Schedule 2015

LECTURES
Haartman Institute (Haartman instituutti), Haartmaninkatu 3, small lecture hall (pieni luentosali), 14:15-15:45

1. Tue 1.9.2015 Otto Helve: Introduction and curriculum of a clinical investigator

2. Wed 2.9.2015
Jussi Merenmies: Evaluating results from a randomised controlled trial
Erkki Isometsä: Clinical Epidemiology: observational studies

3. Tue 8.9.2015 Jarno Tuimala: Statistical considerations for research plans

4. Wed 9.9.2015 Ritva Loponen, Harriet Colliander: Clinical trial registrations and submissions to the authorities

5. Tue 22.9.2015 Mikael Knip: Research in international setting

Page 3

Principles of experimental design

• Ronald A. Fisher (1935)
1. Comparison (results are in relation to something)
2. Replication (several observational units per group)
3. Randomization (randomly allocate units to groups)
4. Blocking (take confounding into account)
5. Factorial experiments (study interactions)

• Originally built on a foundation of analysis of variance (ANOVA), and aimed at agricultural experiments

Page 4

Question 1

The world’s oldest clinical trial

• Bible, Book of Daniel, 1:3-16.
• Treatment group: Four boys from Israel were given just water and vegetables.
• Control group: Another group of boys received meat and wine from the king's table.
• After ten days the groups were visually compared, and the treatment group was found to be healthier than the control group.
• In addition, the treatment group was ten times better in all matters of wisdom and understanding than the control group.

Design principles:
1. Comparison
2. Replication
3. Randomization

• What design principles were used in this experiment?

• What are the response and explanatory variables?

• How would you make the experiment better?

• What statistical method(s) would you use for analyzing this?

Page 5

Explanation

• Design principles used:
  • Comparison (two groups)
  • Replication (several individuals in both groups)

• Variables:
  • Response variables: health, knowledge
  • Explanatory variables: diet

• Make a better study:
  • Measure at baseline, i.e., at the start of the study
  • Allocate individuals to groups randomly
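
The random allocation suggested above can be sketched in a few lines of Python. This is only an illustration: the participant identifiers, the group names, and the `randomize` helper are hypothetical and do not come from the lecture.

```python
import random


def randomize(participants, groups=("treatment", "control"), seed=None):
    """Randomly allocate participants to (nearly) equally sized groups."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(participants)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        # Deal the shuffled participants out to the groups in turn.
        allocation[groups[index % len(groups)]].append(participant)
    return allocation


# Hypothetical participant identifiers, for illustration only.
boys = [f"boy_{i}" for i in range(1, 9)]
allocation = randomize(boys, seed=2015)
for group, members in allocation.items():
    print(group, members)
```

Shuffling first and then dealing participants out in turn keeps the group sizes balanced, which simple coin-flipping per participant does not guarantee.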

Page 6

Things to consider

• Hypothesis
• Outcome measures

• Data sources
  • Registries
  • Experiments
  • Data management

• Study design
  • Observational and experimental design
  • Sample size

• Statistical analysis
• Reporting

Page 7

Hypothesis

Page 8

Study objectives

• Testable hypotheses?

• Primary and secondary questions?

• Example:
  • Primary: Does smoking cause lung cancer?
  • Secondary: Are old smokers in worse shape than old non-smokers?

Page 9

Outcome measures

• What will be measured?
  • Does the individual get the disease (yes/no)?
  • How long does it take for the individual to get the disease (time)?
  • How severe is the disease (laboratory tests, various scores or gradings)?
  • Proxies?

• Example
  • Do smokers get cancer more often than non-smokers?
  • Does it take longer for non-smokers to get cancer than for smokers?

Page 10

Smoking and cancer

• Objective: Find out whether smoking causes cancer
• Hypothesis: Smokers get cancer more often than non-smokers

• Next:
  • What kind of data is needed to test this?
  • Where to get data to test this?

Page 11

Data sources

Page 12

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status

Page 13

Registry or experimental study?

• Experimental
  • Expose individuals to tobacco smoke?
  • Not ethical -> registry study
  • A review by an ethics board is needed

• Registry
  • If strictly registry-based, no ethics board review is needed
  • If patients or their relatives are contacted, a review is mandatory

Page 14

Registries (examples)

• National
  • Hospital discharge registry (HILMO) [THL]
  • Cancer registry [Cancer Society / THL]
  • Causes of Death [Statistics Finland]
  • Medications [KELA]
  • New special reimbursements for medicines [KELA]
  • ASA Registry [TTL]

• Local
  • Hospital registries

• Studies
  • Health 2000 / 2011

Page 15

Registry study example

• Easy to assess whether an individual has or has had lung cancer

• Much harder to assess whether they smoked or not
  • Health 2000 / 2011 helps

• Use Health 2000 or 2011 data to pick the smokers and non-smokers

• Link with other registries (cancer registry) to assess the cancer status

• Do you need to collect other variables?

Page 16

Confounding

Page 17

Causal inference

Page 18

Causality is (often) the aim

• Causal effects?

– The amount of total damage of a fire and the number of firemen at the site are strongly correlated. Do the firemen cause the damage?

– More of the lung cancer patients are smokers than non-smokers. Does smoking cause lung cancer?

• Evidence based medicine...

Page 19

Confounding

[Diagram: cause -> outcome, with a confounder linked to both]

Page 20

Confounding

[Diagram: smoking -> lung cancer, with occupation as the confounder]

Page 21

Question 2

• The amount of total damage of a fire and the number of firemen at the site are strongly correlated. Do the firemen cause the damage?

• For the firemen example, what could be the:
1. Outcome
2. Cause
3. Confounder(s)

Page 23

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry], Age, Sex [from Health 2000 or 2011]

Page 24

Note on causality

[Diagram: time axis on which Exposure comes before Cancer]

This is the right way!

Page 25

Note on causality

[Diagram: Statins and Heart attack placed on a time axis]

Don’t do this!

Page 26

Confounding by indication, example

• Mikkola R, Heikkinen J, Lahtinen J, Paone R, Juvonen T, Biancari F. Does blood transfusion affect intermediate survival after coronary artery bypass surgery? Scandinavian Journal of Surgery 102: 110-6, 2013.

Page 27

Confounding by indication

• The patient’s condition affects the way treatments or medication are allocated (confounding by severity).

• This is business as usual in clinical practice, but it creates problems in epidemiological (observational) studies.

Page 28

Confounding by indication

• If the effect of treatment is not adjusted for the initial condition of the patient, the risk of drawing a wrong conclusion is high!

Page 29

Solutions

• The previous example is a type of confounding by indication called confounding by severity.

• Usual statistical methods, such as multivariate regression, do not adjust for unmeasured variables that are often of importance in this kind of situation.

• Even if measured, the severity of disease is a royal pain to adjust for!
– Propensity score adjustment, inverse-probability weighting (Rubin), or instrumental variable methods (factor analysis and structural equation modeling) might work better.
– If possible, it is better to use a controlled trial, where patients can be randomized to treatment and no treatment (or placebo).
– Remember natural experiments, also!
– In other words, this is not necessarily very easy...

Page 30

Causal pathway and confounders

[Diagram: causal pathway to Lung cancer with the nodes Alcohol, Tobacco, Socioeconomic status, CYP2D6, and Occupation]

Page 31

Study designs

Page 32

Study designs

• Observational studies
  • Case-control studies
  • Cohort studies

• Treatment studies
  • Randomized Controlled Trials (RCTs)

Page 33

Case-control study

Page 34

Case-control study - Initiation

[Figure: axes Time and Age, with the sampling point marked]

Page 35

Case-control study - Disease status

[Figure: axes Time and Age, with the sampling point marked. Case: has the disease. Control: doesn’t have the disease.]

Page 36

Case-control study - Sampling

[Figure: axes Time and Age, with the sampling point marked. Case: has the disease. Control: doesn’t have the disease.]

Page 37

Case-control study - Matching

[Figure: axes Time and Age, with the sampling point marked. Case: has the disease. Control: doesn’t have the disease.]

Page 38

Case-control study - Exposure?

[Figure: axes Time and Age, with the sampling point marked. Case: has the disease. Control: doesn’t have the disease.]

Page 39

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry], Age, Sex [from Health 2000 or 2011]

• Study design: Case-control study
  • Cases and controls sampled from Health 2011

Page 40

Designed experiments

Page 41

Treatment studies - Factorial design

• In designed experiments!
  • Sometimes used in clinical trials, also

• A factor is a manipulated phenomenon, or a treatment, presumed to affect the outcome of the experiment, e.g.:
  • Name of the factor: factor levels
  • Sex: male and female rats
  • Vitamin C: low and high level

• Factorial designs have at least two distinct factors

Page 42

Full factorial design, terms

• The full factorial design shown on the previous slides is often denoted $2^2$ (or 2x2) and gives 2*2=4 different combinations or treatments.

• The base is the number of factor levels and the exponent gives the number of factors. Thus, there is a family of full factorial designs that can be denoted $2^k$.

Factors: Diet, Sex
Levels: Normal, Chocolate (Diet); Male, Female (Sex)

              Diet: Normal   Diet: Chocolate
Sex: Male     Group 1        Group 2
Sex: Female   Group 3        Group 4
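
As a rough illustration of how the treatment combinations of a full factorial design arise, the sketch below enumerates them for the sex-by-diet example. The `full_factorial` helper and the dictionary layout are assumptions made for this example, not something taken from the slides.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combination)) for combination in product(*level_lists)]


# Two factors with two levels each: a 2x2 design with four groups.
factors = {"sex": ["male", "female"], "diet": ["normal", "chocolate"]}
groups = full_factorial(factors)
for number, group in enumerate(groups, start=1):
    print(f"Group {number}: {group}")
```

With k two-level factors the same function returns 2^k combinations, so adding a third two-level factor would give eight groups.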

Page 43

Question 3

• We have chosen to use a case-control study.
• Could a similar hypothesis be studied with other designs? What about:
  • Cohort study?
  • Trial?
  • Factorial design?

• And why or why not?

Page 44

Explanation

• Smoking and lung cancer can be studied by:
  • Cohort study

• But it can’t be studied with:
  • Trials
  • Factorial designs

• Why?
  • Time from exposure to cancer -> prospective cohort study

• Why not?
  • Unethical to expose individuals to tobacco smoke on purpose

Page 45

Sample size

Page 46

Sample size

• How many individuals do you need (in both groups) in order to be able to find a statistically significant difference (between the groups)?

• Essential step!
  • Many published studies are under-powered
  • R. Tsang, L. Colley, L. D. Lynd. Inadequate statistical power to detect clinically significant differences in adverse event rates in randomized controlled trials. Journal of Clinical Epidemiology, 62:609–616, 2009.

• Educated guesswork
  • Very straightforward: Go to the library, search for experiments similar to the one you are going to perform, and see how large a sample size was used in those.

• Formal power analysis
  • Should be done before the experiment is conducted.
  • Will complement the educated guesswork, or can be worked out even without it.
  • Can be used for estimating any one of the quantities listed on the following slide, if the other four are known or guessed.

Page 47

These affect the sample size

• Desired power ↑ -> sample size ↑
• Desired significance level ("p-value" cut-off) ↑ -> sample size ↓
• Effect size ↑ -> sample size ↓
  • Possibly estimated by a pilot study
• Amount of random variation ↑ -> sample size ↑
  • Possibly estimated by a pilot study
• Desired levels for Type I and Type II errors
  • Usually
    • Type I (alpha) = 0.05 (the "p-value" cut-off), frequency of false positives
    • Type II (beta) = 0.20, frequency of false negatives; power = 1 – beta = 0.80
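
To make the relationships above concrete, here is a minimal sketch of a sample size calculation for comparing two proportions. It assumes the common normal-approximation formula with unpooled variance, which the slides do not name; the proportions 0.5 and 0.3 are invented purely for illustration, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for Type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile for power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # random variation in both groups
    effect = p1 - p2                               # effect size on the risk scale
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical proportions with the outcome in the two groups.
print(n_per_group(0.5, 0.3))              # baseline: alpha 0.05, power 0.80
print(n_per_group(0.5, 0.3, power=0.90))  # higher power needs more individuals
print(n_per_group(0.5, 0.4))              # smaller effect needs more individuals
```

Dedicated power-analysis software handles more designs (including matched case-control studies) and should be preferred for a real study plan; the sketch only shows how the listed quantities trade off against each other.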

Page 48

Power for a case-control study

Page 49

Analysis

Page 50

Statistical analysis plan - Write it before you have the data!

1. Introduction
2. Data sources
3. Analysis objectives
4. Analysis sets / populations / subgroups
5. Endpoints and covariates
6. Handling of missing values
7. Other data conventions
8. Statistical procedures
9. Adjustment for confounders, etc.
10. Sensitivity analyses
11. Rationale for deviation (during the analysis) from this plan
12. Quality control plan
13. Programming plans
14. References
15. Appendices

Adapted from https://www.pfizer.com/files/research/research_clinical_trials/Clinical_Data_Access_Request_Sample_SAP.pdf

Page 51

Data manipulation

Page 52

Data manipulation

• Missing values
  • Not all individuals necessarily have values for all variables
  • For example, some individuals might lack information on age or sex

• Solutions
  • Remove from the analysis all individuals with at least one missing value
  • Impute, or estimate, the missing values using information from other variables
  • SPSS offers, for example, a pairwise deletion option, but it can bias the results

Page 53

Example

Individual   Age   Sex   Smoking   Cancer
1            64    M     S         1
2            79    M     S         0
3            ??    M     NS        0
4            91    M     NS        1
5            83    F     S         1
6            65    F     NS        0
7            90    F     NS        0

Page 54

Example - imputation

Individual   Age   Sex   Smoking   Cancer
1            64    M     S         1
2            79    M     S         0
3            90    M     NS        0
4            91    M     NS        1
5            83    F     S         1
6            65    F     NS        0
7            90    F     NS        0

Page 55

Example - case-wise deletion

Individual   Age   Sex   Smoking   Cancer
1            64    M     S         1
2            79    M     S         0
4            91    M     NS        1
5            83    F     S         1
6            65    F     NS        0
7            90    F     NS        0
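
The two ways of handling the missing age in the example tables can be sketched as follows. The slides do not say how the imputed value of 90 was obtained, so mean imputation is used here only as a simple stand-in (it is generally considered a crude method); the field names are assumptions made for this illustration.

```python
from statistics import mean

# The example data set from the slides; None marks the missing age ("??").
rows = [
    {"id": 1, "age": 64, "sex": "M", "smoking": "S", "cancer": 1},
    {"id": 2, "age": 79, "sex": "M", "smoking": "S", "cancer": 0},
    {"id": 3, "age": None, "sex": "M", "smoking": "NS", "cancer": 0},
    {"id": 4, "age": 91, "sex": "M", "smoking": "NS", "cancer": 1},
    {"id": 5, "age": 83, "sex": "F", "smoking": "S", "cancer": 1},
    {"id": 6, "age": 65, "sex": "F", "smoking": "NS", "cancer": 0},
    {"id": 7, "age": 90, "sex": "F", "smoking": "NS", "cancer": 0},
]

# Case-wise deletion: keep only individuals with no missing values.
complete_cases = [row for row in rows if all(value is not None for value in row.values())]

# Mean imputation (an assumed method): fill the gap with the mean observed age.
observed_mean_age = mean(row["age"] for row in rows if row["age"] is not None)
imputed = [dict(row, age=observed_mean_age) if row["age"] is None else row for row in rows]

print(len(complete_cases), "complete cases")
print("imputed age for individual 3:", round(imputed[2]["age"], 1))
```

Case-wise deletion loses individual 3 entirely, whereas imputation keeps all seven individuals in the analysis.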

Page 56

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry]

• Study design: Case-control study
  • Cases and controls sampled from Health 2011

• Analysis:
  • Missing values for explanatory variables are imputed

Page 57

Statistical analyses

Page 58

Odds ratio – a measure of association

• Odds for cancer | smoker: 12 / 8 = 1.5
• Odds for cancer | non-smoker: 36 / 180 = 0.2
• Odds ratio = 1.5 / 0.2 = 7.5
• The odds of lung cancer for a smoker are 7.5 times the odds for a non-smoker
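
The arithmetic above can be reproduced with a short sketch that also attaches an approximate confidence interval. The interval uses the Wald (log odds ratio) method, which is an assumption of this example and not necessarily the method behind the intervals shown elsewhere in the slides; the function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald(a, b, c, d, level=0.95):
    """Odds ratio with a Wald interval.

    a, b: exposed individuals with / without the disease
    c, d: unexposed individuals with / without the disease
    """
    estimate = (a / b) / (c / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Smokers: 12 with cancer, 8 without; non-smokers: 36 with cancer, 180 without.
or_hat, lower, upper = odds_ratio_wald(12, 8, 36, 180)
print(f"OR = {or_hat:.2f} ({lower:.2f} - {upper:.2f})")
```

A Wald interval will generally differ somewhat from an exact (conditional) interval such as the one reported by Fisher's exact test, especially when some cell counts are small.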

Page 59

Chi square test for odds ratio

• Is smoking associated with the cancer status?

	Pearson's Chi-squared test with Yates' continuity correction

data:  m
X-squared = 18.6247, df = 1, p-value = 1.591e-05

Warning message:
In chisq.test(m) : Chi-squared approximation may be incorrect
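
For readers who want to see where the test statistic comes from, the following sketch recomputes a Yates-corrected chi-squared test for a 2x2 table using only the Python standard library. It assumes that the table `m` on the slide holds the counts 12, 8, 36, and 180 from the odds ratio example; the function name is hypothetical and this is not the R implementation itself.

```python
from math import erfc, sqrt


def chi2_yates_2x2(a, b, c, d):
    """Chi-squared test with Yates' continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    statistic = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi2_yates_2x2(12, 8, 36, 180)
print(f"X-squared = {statistic:.4f}, df = 1, p-value = {p_value:.3e}")
```

The warning in the R output reflects small expected counts in some cells, which is one reason to look at Fisher's exact test as well.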

Page 60

Fisher’s exact test for odds ratio

	Fisher's Exact Test for Count Data

data:  m
p-value = 5.103e-05
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  2.575804 22.522438
sample estimates:
odds ratio
  7.407224
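
The two-sided p-value of Fisher's exact test can be sketched from the hypergeometric distribution as below. The code assumes the same 2x2 table as in the odds ratio example (12, 8, 36, 180) and sums the probabilities of all tables that are no more likely than the observed one; the function name and the small tolerance factor are choices made for this illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def probability(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    smallest = max(0, row1 - (n - col1))
    largest = min(row1, col1)
    observed = probability(a)
    # Sum over all tables at most as probable as the observed one.
    return sum(
        p
        for p in (probability(k) for k in range(smallest, largest + 1))
        if p <= observed * (1 + 1e-7)
    )


p_value = fisher_exact_two_sided(12, 8, 36, 180)
print(f"p-value = {p_value:.3e}")
```

The confidence interval and the conditional odds ratio estimate reported on the slide require additional numerical work and are not reproduced here.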

Page 61

What is a P-value?

• Technicalities:
– Null hypothesis: the odds ratio equals one
– Alternative hypothesis: the odds ratio is different from one

• The P-value gives the probability of a) getting a test statistic (here, X-squared) at least as extreme as the one observed, or b) observing a data set at least as extreme, if the null hypothesis is true.

• Usually the P-value is compared to a cut-off, say 0.05, and if the P-value is smaller than the cut-off, the result is called statistically significant.

Page 62

What is a P-value?

• P-values are used for testing hypotheses: one P-value per hypothesis!
– Does smoking predispose individuals to lung cancer?
– Does a larger exposure (more cigarettes smoked) give rise to a larger risk?

• If there is no hypothesis to be tested, do not generate a P-value!

• The P-value is not the whole story. Pay attention to the effect size, also. More on this later.

Page 63

What is a confidence interval?

• The 95% confidence interval can be thought of as the counterpart of a p-value with a cut-off of 0.05.

• If the same experiment were repeated, say, a hundred times, the confidence interval would contain the true population value (of the OR) on average 95 times out of a hundred.

• If the 95% confidence interval for an odds ratio does not include one, the result is statistically significant at the 0.05 risk level.

• Used for giving an idea of how imprecise the result is.

Page 64

Which test to use - table

Types of your dependent variable (columns): Interval/Ratio (Normality assumed) | Interval/Ratio (Normality not assumed), Ordinal | Dichotomy (Binomial)

Compare two unpaired groups: Unpaired t test | Mann-Whitney test | Fisher's test
Compare two paired groups: Paired t test | Wilcoxon test | McNemar's test
Compare more than two unmatched groups: ANOVA | Kruskal-Wallis test | Chi-square test
Compare more than two matched groups: Repeated-measures ANOVA | Friedman test | Cochran's Q test
Find relationship between two variables: Pearson correlation | Spearman correlation | Cramer's V
Predict a value with one independent variable: Linear/Non-linear regression | Non-parametric regression | Logistic regression
Predict a value with multiple independent variables or binomial variables: Multiple linear/non-linear regression | Poisson regression, survival analysis | Multiple logistic regression

Adapted from http://yatani.jp/HCIstats/HomePage

Page 65

Adjusting for confounding

Page 66

Stratification

Mantel & Haenszel, 1956

Strata (by occupation):

Occupation                     Cases: non-smokers   Cases: smokers   Controls: non-smokers   Controls: smokers
Housewives and white-collars   36                   12               180                     8
Other occupations              10                   6                56                      5

Housewives and white-collars (hw & wc):

              Lung cancer   No lung cancer
Smokers       12            8
Non-smokers   36            180

Other occupations:

              Lung cancer   No lung cancer
Smokers       6             5
Non-smokers   10            56

Page 67

Separate analysis

Housewives and white-collars (hw & wc): OR=7.4 (2.6-22.5)

              Lung cancer   No lung cancer
Smokers       12            8
Non-smokers   36            180

Other occupations: OR=6.5 (1.4-32.9)

              Lung cancer   No lung cancer
Smokers       6             5
Non-smokers   10            56

Page 68

Stratified analysis

• Mantel-Haenszel test: A Chi Square test with weighting over a stratification variable
– OR = 7.2 (3.3 – 15.9)
– Effect of smoking is significant even when the confounding variable (occupation) is adjusted for.

Occupation                     Cases: non-smokers   Cases: smokers   Controls: non-smokers   Controls: smokers
Housewives and white-collars   36                   12               180                     8
Other occupations              10                   6                56                      5
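
The pooled odds ratio reported for the stratified analysis can be approximated with the standard Mantel-Haenszel estimator, sketched below for the two occupation strata. Only the point estimate is computed; the confidence interval and the test statistic are left out, and the function name and tuple layout are assumptions made for this example.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    Each stratum is a tuple (a, b, c, d):
    a, b: exposed individuals with / without the disease
    c, d: unexposed individuals with / without the disease
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


strata = [
    (12, 8, 36, 180),  # housewives and white-collars
    (6, 5, 10, 56),    # other occupations
]
pooled = mantel_haenszel_or(strata)
print(f"Mantel-Haenszel OR = {pooled:.1f}")
```

Each stratum contributes in proportion to its size, so the pooled estimate lies between the stratum-specific odds ratios.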

Page 69

Regression modeling

Response variable   Example                                        Regression method
Continuous          Height of a person                             Linear regression
Dichotomous         Disease / no disease [case-control studies]    Logistic regression
Count               Number of naevi [cohort studies, and others]   Poisson regression
Time                Time to death [cohort studies]                 Cox’s regression

• These are very general and flexible methods
  • Several explanatory variables can be used in the model
  • Interactions between explanatory variables can be modeled

• If you know these, you seldom need anything else, since e.g., t-test, ANOVA, and ANCOVA can all be performed using linear (regression) models.

Page 70

Logistic regression

• Regression:
– Allows adjusting for several confounders and covariates at the same time
– Different types for different purposes
  • Linear, logistic, Poisson, survival time, ...

• Logistic regression:
– The response (dependent) variable has two possible values (yes / no)
– Estimates an odds ratio, confidence interval and a p-value for every variable or variable’s level.

Page 71

Age, occupation and smoking

• Effect of smoking is adjusted for both age and occupation at the same time.

• Note that after adjustment the OR is higher than the raw OR!

Variable         OR (95% CI)
Age
  <45            1
  45-54          1.91 (0.61...6.75)
  55-64          2.05 (0.68...7.24)
  >65            3.35 (1.07...12.18)
Occupation
  housewife      1
  white-collar   0.92 (0.42...1.91)
  other          0.97 (0.46...1.97)
Smoking
  no             1
  yes            9.97 (4.22...25.28)

Page 72

Causal pathway and confounders

[Diagram: causal pathway to Lung cancer with the nodes Alcohol, Tobacco, Socioeconomic status, CYP2D6, and Occupation]

Page 73

Causal pathway and confounders

[Diagram: causal pathway to Lung cancer with the nodes Alcohol, Tobacco, Socioeconomic status, and CYP2D6]

Page 74

Genotype and smoking

• Observation: Smoking is associated with lung cancer.

• Tobacco industry: The observed association between smoking and lung cancer could be explained by some cancer-predisposing genotype that also creates a craving for nicotine.

Page 75

CYP2D6 genotype and smoking

• Hypothesis: Carriers of CYP2D6 inactivating allele(s) metabolize chemicals in tobacco faster than others, which makes these individuals smoke more often than others.

• Observation: Risk of lung cancer for carriers of the inactivating mutation is 0.69 (95% CI = 0.52-0.90).

Pharmacogenetics. 1998 Jun;8(3):227-38.

Page 79

Regression modeling: Cox regression example

Page 80

Regression modeling: Cox regression example

Page 81

Question 4

• We have collected data on
  • Response variable:
    • Lung cancer
  • Explanatory variables:
    • Smoking
    • Age
    • Sex
    • Occupation

• What statistical method(s) would you use to assess the association between the explanatory variables and lung cancer?

Page 82

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry], age, sex [from Health 2011]

• Study design: Case-control study
  • Cases and controls sampled from Health 2011

• Analysis:
  • Missing values for explanatory variables are imputed
  • Confounders are adjusted for using logistic regression

Page 83

Clinical relevance

Page 84

Statistical and clinical significance

• Even if the result is statistically significant, it may not be clinically significant
– Minimal clinically important difference (MCID)

• MCID has to be decided before the study
– Sometimes it is known beforehand, sometimes not, and then it has to be based on an educated guess.

• For case-control studies, MCID can also be thought of as, e.g., how much some new predictors help in setting the diagnosis.

Page 85

COPD

• For forced expiratory volume in one second (FEV1), an increase of about 100 mL, which can be perceived by patients, is sometimes considered the MCID.

• Bronchodilators in healthy persons:
– Salbutamol: FEV1 increase of 62 mL (0 – 152 mL)

• Bronchodilators in COPD patients (FEV1 in litres):
– Pre-salbutamol: 1.29 (0.80-2.12)
– Post-salbutamol: 1.53 (1.19-2.58)
– Post-placebo: 1.40 (1.36-1.42)
– Post-caffeine: 1.36 (1.31-1.41) [5 mg / kg, for asthma]
– Post-indacaterol: 1.71 (1.63-1.78)
– Post-formoterol: 1.65 (1.59-1.70)

COPD. 2005 Mar;2(1):111-24; Chest. 2008 Aug;134(2):387-93; Caffeine for asthma (Cochrane Review)

Page 86

COPD

Page 87

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry], age, sex [from Health 2011]

• Study design: Case-control study
  • Cases and controls sampled from Health 2011

• Analysis:
  • Missing values for explanatory variables are imputed
  • Confounders are adjusted for using logistic regression
  • Clinical relevance is set at OR > 2

Page 88

Presenting results

Page 89

Odds ratio – a measure of association

• Odds for cancer | smoker: 12 / 8 = 1.5
• Odds for cancer | non-smoker: 36 / 180 = 0.2
• Odds ratio = 1.5 / 0.2 = 7.5
• The odds of lung cancer for a smoker are 7.5 times the odds for a non-smoker

Page 90

Graphical representation of the table

Page 91

Age, occupation and smoking

• Effect of smoking is adjusted for both age and occupation at the same time.

• Note that after adjustment the OR is higher than the raw OR!

Variable         OR (95% CI)
Age
  <45            1
  45-54          1.91 (0.61...6.75)
  55-64          2.05 (0.68...7.24)
  >65            3.35 (1.07...12.18)
Occupation
  housewife      1
  white-collar   0.92 (0.42...1.91)
  other          0.97 (0.46...1.97)
Smoking
  no             1
  yes            9.97 (4.22...25.28)

Page 92

Risk theatre

[Figure panels: Non-smoker cases / 10 years; Smoker cases / 10 years]

Doll & Hill 191566

Page 93

Statins – base rate (absolute risk)

Page 94

Statins – relative risk

Page 95

Statins – benefits versus adverse effects

Page 96

Statins – risk theatre

Page 97

Statins – NNI / risk theatre

Page 98

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry], age, sex [from Health 2011]

• Study design: Case-control study
  • Cases and controls sampled from Health 2011

• Analysis:
  • Missing values for explanatory variables are imputed
  • Confounders are adjusted for using logistic regression. No subgroup analyses are planned.
  • Clinical relevance is set at OR > 2
  • Results are presented as a regression table and graphically

Page 99

Reporting guidelines


Reporting guidelines

• STROBE
  • STrengthening the Reporting of OBservational studies in Epidemiology
• CONSORT
  • CONsolidated Standards of Reporting Trials
• Follow these!


STROBE: Methods

Study design (4): Present key elements of study design early in the paper

Setting (5): Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection

Participants (6):
(a) Give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls
(b) For matched studies, give matching criteria and the number of controls per case

Variables (7): Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable

Data sources/measurement (8*): For each variable of interest, give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group

Bias (9): Describe any efforts to address potential sources of bias

Study size (10): Explain how the study size was arrived at

Quantitative variables (11): Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen and why

Statistical methods (12):
(a) Describe all statistical methods, including those used to control for confounding
(b) Describe any methods used to examine subgroups and interactions
(c) Explain how missing data were addressed
(d) If applicable, explain how matching of cases and controls was addressed
(e) Describe any sensitivity analyses


STROBE: Results

Participants (13*):
(a) Report numbers of individuals at each stage of study, eg numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed
(b) Give reasons for non-participation at each stage
(c) Consider use of a flow diagram

Descriptive data (14*):
(a) Give characteristics of study participants (eg demographic, clinical, social) and information on exposures and potential confounders
(b) Indicate number of participants with missing data for each variable of interest

Outcome data (15*): Report numbers in each exposure category, or summary measures of exposure

Main results (16):
(a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (eg, 95% confidence interval). Make clear which confounders were adjusted for and why they were included
(b) Report category boundaries when continuous variables were categorized
(c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period
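Item 16(c) can be illustrated with a small calculation. The sketch below assumes a known baseline (absolute) risk over a fixed period and a relative risk under treatment; both numbers are hypothetical, and the function name is chosen only for this example. Exact fractions are used so that rounding the number needed to treat is not disturbed by floating-point error.

```python
import math
from fractions import Fraction


def absolute_effect(baseline_risk, relative_risk):
    """Translate a relative risk into absolute terms for a given baseline risk.

    Returns (risk under treatment, absolute risk reduction, number needed to treat).
    NNT is None when the treatment does not reduce risk.
    """
    treated_risk = baseline_risk * relative_risk
    arr = baseline_risk - treated_risk
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return treated_risk, arr, nnt


# Hypothetical: the same relative risk (0.75) applied to two different base rates.
high_base = absolute_effect(Fraction(1, 10), Fraction(3, 4))   # 10% risk over the period
low_base = absolute_effect(Fraction(1, 100), Fraction(3, 4))   # 1% risk over the period

for label, (treated, arr, nnt) in [("10% base rate", high_base), ("1% base rate", low_base)]:
    print(f"{label}: risk {float(treated):.4f}, ARR {float(arr):.4f}, NNT {nnt}")
```

The relative risk is identical in both rows, yet the absolute benefit differs tenfold, which is why the base rate has to be reported alongside it.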


Data management


Reproducible research

• To sum up the previous steps:
  • Data gathering
  • Data analysis
  • Data presentation
• Working habit:
  • Document everything
  • Everything is a text file
  • Save in an open file format
  • Files should be human readable
  • Tie your files together
  • Have a data management plan
    • Organization, (long-term) storage, availability
  • Use versioning on all files
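One way to "tie your files together" is a plain-text manifest of checksums, so that a result can later be traced to the exact data and code that produced it. This is a sketch under assumptions: the file names, the JSON layout, and the helper functions are hypothetical, and a version control system would normally be used alongside it.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(files, manifest_path):
    """Record one checksum per file in a human-readable JSON text file."""
    manifest = {Path(f).name: sha256_of(f) for f in files}
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    Path(manifest_path).write_text(text, encoding="utf-8")
    return manifest


def verify_manifest(files, manifest_path):
    """Return {file name: True if the file still matches its recorded checksum}."""
    recorded = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    return {Path(f).name: recorded.get(Path(f).name) == sha256_of(f) for f in files}


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"          # hypothetical data file
    script = Path(tmp) / "analysis.py"     # hypothetical analysis script
    data.write_text("id,smoker,cancer\n1,yes,no\n", encoding="utf-8")
    script.write_text("# analysis code\n", encoding="utf-8")

    manifest_file = Path(tmp) / "MANIFEST.json"
    write_manifest([data, script], manifest_file)
    unchanged = verify_manifest([data, script], manifest_file)

    # A silent edit to the data is detected on the next verification.
    data.write_text("id,smoker,cancer\n1,no,no\n", encoding="utf-8")
    after_edit = verify_manifest([data, script], manifest_file)

print(unchanged)
print(after_edit)
```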


Clinical trials at Duke

• Potti et al. studied the chemosensitivity of cancer cell lines.
• Results were going to be applied in a clinical trial.
• And so it begins...
• http://bioinformatics.mdanderson.org/Supplements/ReproRsch-All/Modified/StarterSet/index.html


Summary in two minutes

• Coombes et al. delved into the analysis...
• Doxorubicin
  • Sensitive / resistant labels were reversed in the analysis
  • Some samples in the test data were duplicated
  • Some samples are labeled both sensitive and resistant
• Cisplatin and pemetrexed
  • Gene lists were off by one; the correct list does not differentiate the cell lines
  • Some genes are not on the arrays that were used
  • Sensitive / resistant labels are again reversed
• And the list goes on, see: http://arxiv.org/pdf/1010.1092.pdf
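Several of the problems listed above are the kind that simple automated checks can flag before any modelling starts. The sketch below is illustrative only: the sample sheet, gene identifiers, and function names are hypothetical and have nothing to do with the actual Duke data or with the methods of the cited analysis.

```python
from collections import Counter, defaultdict


def check_sample_sheet(rows):
    """Find sample IDs that occur more than once and IDs carrying conflicting labels.

    rows: iterable of (sample_id, label) pairs.
    """
    counts = Counter(sample_id for sample_id, _ in rows)
    labels = defaultdict(set)
    for sample_id, label in rows:
        labels[sample_id].add(label)
    duplicated = sorted(s for s, n in counts.items() if n > 1)
    conflicting = sorted(s for s, seen in labels.items() if len(seen) > 1)
    return duplicated, conflicting


def genes_missing_from_array(gene_list, array_genes):
    """Genes in a reported signature that the array platform cannot have measured."""
    on_array = set(array_genes)
    return [g for g in gene_list if g not in on_array]


# Hypothetical sample sheet: S2 is duplicated, S3 has both labels.
rows = [
    ("S1", "sensitive"),
    ("S2", "resistant"),
    ("S2", "resistant"),
    ("S3", "sensitive"),
    ("S3", "resistant"),
]
duplicated, conflicting = check_sample_sheet(rows)

# Hypothetical gene identifiers.
missing = genes_missing_from_array(["G1", "G2", "G9"], ["G1", "G2", "G3"])

print("duplicated samples:", duplicated)
print("conflicting labels:", conflicting)
print("genes not on array:", missing)
```

Checks like these do not catch reversed labels or off-by-one indexing by themselves, but running them routinely and keeping their output with the analysis makes such errors easier to trace.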


Meticulous documentation

• Protect the individuals recruited for the study
• Protect your co-workers and co-authors
• Protect yourself
• Work openly.
• Everybody makes mistakes. Embrace and learn from them!


Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers
• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011 and cancer registry]
  • Data: Occupational exposure [from ASA registry], age, sex [from Health 2011]
• Study design: Case-control study
  • Cases and controls sampled from Health 2011
• Analysis:
  • Missing values for explanatory variables are imputed
  • Confounders are adjusted for using logistic regression. No subgroup analyses are planned.
  • Clinical relevance is set at OR > 2
  • Results are presented as a regression table and graphically
• Data management
  • Primary data is available from the mentioned registries. Code documenting data manipulation and analyses is published as supplementary information with the article.


Wrap-up


Summary

• The analysis methods are coupled to the study design, which is itself affected by the hypotheses
• Write the analysis plan before you have the data
• Prepare for small deviations, but don’t change the major themes
• Learn regression methodology; it will serve you well
• Make a data management plan. Document everything
• Learn from mistakes. Everybody makes them.
• Also, protect the innocent. It’s better to have a horrible end than horrors without end.


Question 5 - Homework

10 but the official told Daniel, "I am afraid of my lord the king, who has assigned your food and drink. Why should he see you looking worse than the other young men your age? The king would then have my head because of you." 11 Daniel then said to the guard whom the chief official had appointed over Daniel, Hananiah, Mishael and Azariah, 12 "Please test your servants for ten days: Give us nothing but vegetables to eat and water to drink. 13 Then compare our appearance with that of the young men who eat the royal food, and treat your servants in accordance with what you see." 14 So he agreed to this and tested them for ten days. 15 At the end of the ten days they looked healthier and better nourished than any of the young men who ate the royal food. 16 So the guard took away their choice food and the wine they were to drink and gave them vegetables instead. 17 To these four young men God gave knowledge and understanding of all kinds of literature and learning. And Daniel could understand visions and dreams of all kinds. 18 At the end of the time set by the king to bring them in, the chief official presented them to Nebuchadnezzar. 19 The king talked with them, and he found none equal to Daniel, Hananiah, Mishael and Azariah; so they entered the king's service. 20 In every matter of wisdom and understanding about which the king questioned them, he found them ten times better than all the magicians and enchanters in his whole kingdom.

• Does this description fulfill the STROBE guidelines? If not, what is missing?

