Biostatistics workshop
Matthew Law, Awachana Jiamsakul
APACC, Hong Kong, 29 June 2018
Basics of statistical inference
Framework to fix ideas
• Two-arm randomised trial
• X patients randomised to each of treatments A and B
• Treatments A and B compared using key endpoints
• Survival
• Proportion with detectable HIV viral load
• Changes in CD4 count
Size and power
Hypothesis testing
[Figure: t-distribution centred at 0]
Hypothesis testing
• Randomise into two groups
• Null hypothesis
• No difference between treatments
• Mean change in CD4 count is the same for A and B
• Alternative hypothesis
• There is a difference between treatments
• Under the null hypothesis
• The difference in mean change in CD4 count between A and B has a known probability distribution
• Calculate the probability of something as or more extreme than observed in our sample – p-value
• If ‘p’ is small, we can reject the null hypothesis
• If ‘p’ is not small, we cannot reject the null hypothesis
• Important point
• Failure to reject null hypothesis ≠ null hypothesis is true
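As a rough illustration of the p-value step, the sketch below computes a two-sided p-value for a difference in mean CD4 change between two arms. The data are invented, the function name is hypothetical, and a normal approximation stands in for the t-distribution so that only the Python standard library is needed.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_p_value(group_a, group_b):
    """Difference in means and its two-sided p-value (normal approximation)."""
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = diff / se
    # Probability, under the null, of a statistic as or more extreme than z.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p

# Hypothetical CD4 changes (cells/µL), invented for illustration only.
arm_a = [120, 95, 140, 80, 110, 130, 105, 90, 125, 115]
arm_b = [70, 85, 60, 100, 75, 90, 65, 80, 95, 55]

diff, p = two_sample_p_value(arm_a, arm_b)
print(f"difference = {diff:.1f} cells/µL, p = {p:.5f}")
```

With samples this small a t-distribution would normally be used, which gives a somewhat larger p-value than the normal approximation.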
Hypothesis testing
• Type 1 error (size)
• Reject the null hypothesis when it is true
• Conventionally set at 5%
• Type 2 error
• Fail to reject the null hypothesis when it is false
• Power = 1 − probability of a type 2 error
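Size and power pull against each other: a stricter significance level lowers the type 1 error but also lowers power. A minimal sketch, assuming a two-sided z-test and an invented true difference of 50 cells/µL with standard error 21.4 (the function name is hypothetical):

```python
from statistics import NormalDist

def power(true_diff, se, alpha=0.05):
    """Approximate power of a two-sided z-test at significance level alpha."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(true_diff) / se
    # Probability that the test statistic lands in either rejection region.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}  power = {power(50, 21.4, alpha):.3f}")

# With no true difference, the rejection probability equals the size.
print(f"size at alpha 0.05 = {power(0, 21.4, 0.05):.3f}")
```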
Hypothesis testing
[Figure: Trade off between significance level and power. Power (y-axis, 0 to 1) against p-value (x-axis, 0 to 0.12), with 0.05 marked]
Why 5%
• Ronald Fisher
Confidence intervals
• Estimate the difference between the treatments
• Calculate a range of values for the treatment difference which
allows for random variation in your sample
• A confidence interval
• The width of the confidence interval depends on the amount of
random variation
Confidence intervals
• Formally not a probability statement
• Probability a parameter lies in a 95% CI ≠ 0.95
• If we repeated the trial 1,000 times, we’d expect the 95% CI to
contain the parameter of interest 950 times
• 50 times (5%) won’t – type 1 error
• Working interpretation
• 95% CI gives a range of values for a parameter estimate that allows for
random variation
• NB: does not allow for bias
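The repeated-trials interpretation can be checked by simulation. The sketch below repeats a hypothetical two-arm trial 1,000 times and counts how often the 95% CI contains the true difference. The true difference (50), the standard deviation (150), treating that standard deviation as known, and the function name are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def ci_coverage(true_diff=50.0, sd=150.0, n_per_arm=100, n_trials=1000, seed=1):
    """Fraction of simulated trials whose 95% CI contains the true difference."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)
    se = sd * sqrt(2 / n_per_arm)  # SD assumed known, to keep the sketch simple
    covered = 0
    for _ in range(n_trials):
        mean_a = sum(rng.gauss(true_diff, sd) for _ in range(n_per_arm)) / n_per_arm
        mean_b = sum(rng.gauss(0.0, sd) for _ in range(n_per_arm)) / n_per_arm
        diff = mean_a - mean_b
        if diff - z_crit * se <= true_diff <= diff + z_crit * se:
            covered += 1
    return covered / n_trials

print(f"coverage = {ci_coverage():.3f}")
```

The coverage should come out close to 0.95, and it varies slightly with the random seed.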
Sample size
• Power increases with larger sample size
• Turns out that power depends on the total number of patients
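A minimal sketch of how power grows with sample size, assuming a two-sided z-test on a difference in means with equal arms. The true difference (50 cells/µL), the standard deviation (150) and the function names are invented for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def power_for_n(n_per_arm, true_diff, sd, alpha=0.05):
    """Approximate power (ignores the negligible opposite rejection tail)."""
    nd = NormalDist()
    se = sd * sqrt(2 / n_per_arm)
    return 1 - nd.cdf(nd.inv_cdf(1 - alpha / 2) - abs(true_diff) / se)

def n_per_arm_for_power(true_diff, sd, target_power=0.80, alpha=0.05):
    """Smallest n per arm reaching the target power under the same approximation."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(target_power)
    return ceil(2 * (z_sum * sd / true_diff) ** 2)

for n in (25, 50, 100, 200, 400):
    print(f"n per arm = {n:3d}  power = {power_for_n(n, 50, 150):.2f}")

print("n per arm for 80% power:", n_per_arm_for_power(50, 150))
```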
[Figure: Power by total sample size. Power (y-axis, 0.4 to 1) against sample size (x-axis)]
Hypothetical examples
Interpreting study results
Hypothetical examples
• Two arm RCT comparing A and B
• Change in CD4 count is endpoint
Trial  N            Mean difference  95% CI      p
1.     100 per arm  50 cells/µL      8 to 92     0.019
2.     40 per arm   50 cells/µL      -16 to 116  0.140
3.     40 per arm   10 cells/µL      -57 to 77   0.776
• Are these three trial results:
1. Quite consistent
2. Completely inconsistent since some are significant, and others not
3. Strong evidence that A is better than B
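One way to compare the three results is to back out the standard error from each 95% CI and recompute the p-value. The sketch below does this with a normal approximation (an assumption, as is the function name), so the p-values match the table only approximately.

```python
from statistics import NormalDist

def se_and_p_from_ci(estimate, lower, upper, level=0.95):
    """Standard error implied by a CI, and the matching two-sided p-value."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + level / 2)
    se = (upper - lower) / (2 * z_crit)
    p = 2 * (1 - nd.cdf(abs(estimate / se)))
    return se, p

# (mean difference, lower limit, upper limit) from the table of trials
trials = {1: (50, 8, 92), 2: (50, -16, 116), 3: (10, -57, 77)}

results = {label: se_and_p_from_ci(*values) for label, values in trials.items()}
for label, (se, p) in results.items():
    print(f"trial {label}: SE = {se:.1f}, p = {p:.3f}")
```

Trials 1 and 2 have the same estimate and differ only in standard error, and all three intervals overlap substantially.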
Participant questions
Key messages
• Not statistically significant ≠ no effect
• Remember type 1 error
• 5% of all tests will be significant by chance alone
• Look at the confidence intervals
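To see why type 1 error matters when many tests are run, the sketch below gives the chance of at least one significant result when every null hypothesis is true. It assumes the tests are independent, and the function name is hypothetical.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of one or more significant results when all nulls are true."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 20):
    print(f"{k:2d} tests: {prob_any_false_positive(k):.2f}")
```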
Summary
Hierarchy of study designs
Hierarchy
Study type (listed in order of decreasing ability to assess causation)
• Systematic review (meta-analysis)
• Randomised controlled trial
• Cohort study
• Case-control study
• Cross-sectional study
• Ecological study
• Case report
Causation
[Diagram: study factor and outcome factor, with a confounder; labels: direction of assessment, incidence/prospective, independent]
Case report, case series
• Not really much use for establishing causality
• Useful for “proof of principle”
Ecological study
• Compare populations
Ecological studies
Strengths
• Quick
• Generate hypotheses
Limitations
• Outcomes did not necessarily occur in individuals with exposure
• Not possible to control for confounders
• Influenced by time lags
• Associations can be entirely spurious
Cross-sectional studies
• Representative study population
• Exposure and outcome measured at the same point in time (“snapshot”)
Strengths
• Quick
• Can determine prevalence
• Assess association
Limitations
• Difficult to recruit appropriate sample
• Difficult to control for confounders
• Cannot assess causation
Case-control studies
• Study population – selected by disease status
– case
– control – independently selected, often matched
• Previous exposure investigated
Strengths
• Quicker (do not have to wait for outcome to occur)
• Large sample size not required
• Suitable for rare diseases
Limitations
• Difficult to control for bias and confounding
• Choice of controls and matching critical
• Exposure information dependent on recall, and may be biased
Case-control studies
• Doll & Bradford Hill
• BMJ 1950;2(4682):739-48
Cohort studies
• Study population – population sample
• Retrospective studies
• Go back through medical records
• Problems with case validation and missing data
Cohort studies
• Study population – population sample
• Prospective studies
• Fixed visit cohorts
• All subjects seen at regular scheduled visits and have same
standardised assessments
• Observational cohorts
• Data collected as and when patients attend clinic
• Have proved useful in HIV
Cohort studies
Strengths
• Document outcomes accurately
• Control of missing data, reduce recall bias
• Can determine timing between exposure and outcomes
Limitations
• Expensive
• Lost to follow-up
• Can be subtle, but very powerful, confounding factors and bias
Peto R, Doll R, et al. Can dietary beta-carotene materially reduce human cancer rates? Nature 1981;290:201-8
Cohort studies
• Beta-carotene, retinol and vitamin A
NEJM 1994;330:1029-35
Cohort studies
• RCT of beta-carotene on lung-cancer incidence in smokers
Randomised clinical trials
• Study population – eligible sample
• Randomised exposure
Strengths
• Scientifically rigorous
• Most convincing evidence
• Control for known and unknown confounders
Limitations
• Expensive, can be difficult to conduct
• Generalisability may be poor
• May not be ethically feasible
Wright ST, Carr A, Woolley I, Giles M, Hoy J, Cooper DA, Law MG JAIDS 2011;58(1):72.
Early versus deferred ART in CD4>500 cells/µL
• A number of analyses of cohort studies giving contradictory
results
• Modelled a 14% reduction in AIDS/death
• Starting ART >650 cells/µL versus 351-500 cells/µL
NEJM 2015;373:795-807
START
• HR = 0.43, 95% CI (0.30, 0.62), p<0.001
Meta-analysis of RCTs
• Combines results of similar randomised trials
• Highest level of evidence of causality
Teeraananchai S, et al. HIV Medicine 2017;18(4):256-266
Meta-analysis of non-randomised studies
• Life expectancy following ART
Study designs
Why are randomised controlled trials such powerful
evidence of treatment efficacy?
1. Because they are well powered
2. Because you can adjust for baseline covariates
3. Because randomisation balances both known and
unknown confounding factors
Study Endpoints
Am Jiamsakul
• Outcomes of the study research question
• Continuous endpoint
• Differences in CD4 cell count, changes in BMI,
changes in drug concentration, etc, at a specified time point.
• Linear regression - Difference in mean (Diff)
Example: CD4 at baseline (cells/µL)
Difference in mean: 226 − 210 = 16
Sex     Mean CD4 (cells/µL)  Diff  95% CI   p
Male    210                  Ref
Female  226                  16    (8, 24)  <0.001
• Continuous endpoint - univariate
Under normal distribution assumption
• Univariate linear regression = t-test (2 groups) and ANOVA (3
or more groups)
When data is not normally distributed
• Transform the variable in the linear regression (log, square
root)
• Use non-parametric test
• Continuous endpoint - univariate
Non-parametric tests for a difference between groups
2 groups
• Wilcoxon signed-rank test (or sign test) (paired)
• Mann-Whitney U test, also known as the Wilcoxon rank-sum test (independent)
3 or more groups
• Friedman test (dependent, repeated measures)
• Kruskal-Wallis test (independent)
• Binary endpoint
• Treatment failure, undetectable VL,
drug resistance, etc, at a specified time point.
• Logistic regression- Odds ratio (OR)
Example: Undetectable VL at 12 months from ART
OR = [63/13]/[84/26] = 1.5
Sex     Und VL  VL fail  OR   95% CI      p
Male    84      26       1
Female  63      13       1.5  (1.2, 2.0)  0.001
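The odds ratio in this example can be reproduced from the four counts. A minimal sketch with a hypothetical function name; it gives the point estimate only.

```python
def odds_ratio(events, non_events, ref_events, ref_non_events):
    """Odds of the event in one group divided by the odds in the reference group."""
    odds = events / non_events
    ref_odds = ref_events / ref_non_events
    return odds / ref_odds

# Counts from the example: female 63 undetectable, 13 failed; male 84 and 26.
or_female = odds_ratio(63, 13, 84, 26)
print(f"OR (female vs male) = {or_female:.2f}")
```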
• Count data
• Incidence of hospitalisation, SAEs,
non adherence, VL testing rate, etc, at any time point.
• Poisson regression- Incidence rate ratio (IRR)
Example: Glucose testing rates
IRR for age 41-50 = 43/36 = 1.19
IRR for age >50 = 59/36 = 1.64
Age (years)  Rate (/100 pys)  IRR   95% CI     p
≤40          36               1
41-50        43               1.19  1.12-1.23  <0.001
>50          59               1.64  1.15-2.01  <0.001
• Survival / time to event analysis
• Time to death, time to treatment failure, time to LTFU,
etc, at any time point.
• Cox regression – Hazard ratio (HR)
Example: Survival analysis
HR for 2006-2009 = 1.27/2.11 = 0.60
HR for 2010-2013 = 1.10/2.11 = 0.52
Year of ART initiation  Rate (/100 pys)  HR    95% CI        p
2003-2005               2.11             1
2006-2009               1.27             0.60  (0.27, 0.75)  <0.001
2010-2013               1.10             0.52  (0.17, 0.64)  <0.001
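The ratios in the glucose testing and survival examples are each a rate divided by the reference-group rate. The sketch below reproduces that arithmetic with hypothetical function names. A hazard ratio from a fitted Cox model need not equal this crude ratio of rates, so the calculation is only the simple approximation used in the examples.

```python
def rate_per_100_pys(events, person_years):
    """Crude event rate per 100 person-years."""
    return 100 * events / person_years

def rate_ratio(rate, reference_rate):
    """Ratio of a group's rate to the reference group's rate."""
    return rate / reference_rate

# Glucose testing rates by age group (reference: age <=40, 36 per 100 pys)
irr_41_50 = rate_ratio(43, 36)
irr_over_50 = rate_ratio(59, 36)

# Rates by year of ART initiation (reference: 2003-2005, 2.11 per 100 pys)
hr_2006_2009 = rate_ratio(1.27, 2.11)
hr_2010_2013 = rate_ratio(1.10, 2.11)

print(f"IRR 41-50 = {irr_41_50:.2f}, IRR >50 = {irr_over_50:.2f}")
print(f"HR 2006-2009 = {hr_2006_2009:.2f}, HR 2010-2013 = {hr_2010_2013:.2f}")
```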
Cheng et al., Bull World Health Organ 2015;93:152–160
• Graphical presentation – Survival curve
Ribaudo et al., CID 2013;57(11):1607–17
• Graphical presentation
OR, HR, RR, IRR – Forest plot
Fretts et al., Am J Clin Nutr doi: 10.3945/ajcn.114.101238
• Graphical presentation
Difference in mean – Forest plot (similar to this meta-analysis)
Study endpoints
You want to analyse factors associated with having drug resistance mutations at one year from ART initiation. Patients had resistance testing done at various times from 6 months to two years. What is the correct approach for the analysis?
1) Maximise sample size and include all mutation results to
perform survival analysis
2) Restrict to include only patients with mutation results at 1 year
and exclude all others to perform logistic regression
3) Maximise sample size and include all mutation results to
perform logistic regression
Q&A
Supplementary
A randomised trial compares treatments A and B in HIV-positive people.
The primary endpoint is undetectable HIV viral load at 48 weeks.
The study results are reported as a difference in proportions undetectable of 15%, 95% CI -5% to 35%, p=0.035.
What is wrong with these results as presented?
1. The trial is unbiased but underpowered
2. The p-value and confidence interval are inconsistent
3. Should have been analysed using a survival analysis
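A quick consistency check on reported results: a 95% CI that includes zero should go with p > 0.05, and the reverse. The sketch below applies that rule and also estimates the p-value implied by the interval, using a normal approximation. The function names are hypothetical.

```python
from statistics import NormalDist

def ci_and_p_agree(lower, upper, p_reported, alpha=0.05):
    """True when the CI and the p-value point to the same conclusion."""
    ci_excludes_zero = lower > 0 or upper < 0
    return ci_excludes_zero == (p_reported < alpha)

def implied_p(estimate, lower, upper):
    """Two-sided p-value implied by a 95% CI (normal approximation)."""
    nd = NormalDist()
    se = (upper - lower) / (2 * nd.inv_cdf(0.975))
    return 2 * (1 - nd.cdf(abs(estimate / se)))

# Reported: difference 15%, 95% CI -5% to 35%, p=0.035
agree = ci_and_p_agree(-5, 35, 0.035)
p_from_ci = implied_p(15, -5, 35)
print(f"consistent: {agree}, p implied by the CI = {p_from_ci:.2f}")
```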
Supplementary
An RCT presents results summarised in Figure 1.
What is the best interpretation of these results?
1. Treatment A works better in women
2. The subgroup analysis is inappropriate
3. The estimated treatment effect in men and women is consistent
[Figure 1. Comparison of treatment A and B on death rates, overall and by sex. x-axis: hazard ratio, 0 to 3 (left: treatment A better; right: treatment B better). Rows: Overall (N=3,000), Men (N=2,000), Women (N=1,000). p-values shown: 0.007, 0.255, 0.002]
Supplementary
An RCT presents results summarised in Figure 1.
What would help interpretation of these results?
1. A test for interaction between treatment effect and sex
2. Recruit more women
3. Adjust for age, as women might be younger than men
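A test for interaction compares the two subgroup estimates directly instead of comparing their separate p-values. The sketch below shows one simple version on the log hazard ratio scale. The hazard ratios and confidence limits are invented for illustration and are not read from Figure 1; the function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)

def se_log_hr(lower, upper):
    """Standard error of log(HR), backed out from a 95% CI."""
    return (log(upper) - log(lower)) / (2 * Z_975)

def interaction_p_value(hr_1, ci_1, hr_2, ci_2):
    """Two-sided p-value for a difference between two subgroup hazard ratios."""
    se_diff = sqrt(se_log_hr(*ci_1) ** 2 + se_log_hr(*ci_2) ** 2)
    z = (log(hr_1) - log(hr_2)) / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical subgroup results: men HR 0.85 (0.64, 1.13), women HR 0.55 (0.38, 0.80)
p_int = interaction_p_value(0.85, (0.64, 1.13), 0.55, (0.38, 0.80))
print(f"interaction p = {p_int:.3f}")
```

With these invented numbers, one subgroup is significant on its own and the other is not, yet the interaction test gives p > 0.05, so the two estimates are not clearly different from each other.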