Date post: | 14-Dec-2015 |
Upload: | randy-lamkins |
Clinical Statistics for Non-Statisticians – Part II
Kay M. Larholt, Sc.D.
Vice President, Biometrics & Clinical Operations
Abt Bio-Pharma Solutions
2
Topics
1) Review of Statistical Concepts
2) Hypothesis Testing
3) Power and Sample Size
4) Interim Analysis
3
Basic Statistical Concepts
4
Statistics
Per the American Heritage dictionary - “The mathematics of the collection, organization,
and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.”
• Two broad areas
– Descriptive: science of summarizing data
– Inferential: science of interpreting data in order to make estimates, hypothesis tests, predictions, or decisions from the sample to the target population.
5
Introduction to Clinical Statistics
• Statistics - The science of making decisions in the face of uncertainty
• Probability - The mathematics of uncertainty – The probability of an event is a measure of how
likely the event is to happen
6
Sample versus Population
7
Descriptive Statistics for Continuous Variables
Measures of central tendency: Mean, Median, Mode
Measures of dispersion: Range, Variance, Standard deviation
Measures of relative standing: Lower quartile (Q1), Upper quartile (Q3), Interquartile range (IQR)
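The measures listed on this slide can be computed with Python's standard `statistics` module. This is a minimal sketch; the data values are invented for illustration and do not come from the slides.

```python
import statistics

# Hypothetical sample of a continuous variable (values invented for illustration)
values = [2, 4, 4, 5, 7, 9, 10]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion
value_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)      # sample standard deviation

# Measures of relative standing: quantiles(n=4) returns the cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(mean, median, mode, value_range, variance, std_dev, q1, q3, iqr)
```

Note that different software packages use slightly different quartile conventions, so Q1 and Q3 may differ a little between tools on small samples.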
8
Basic Probability Concepts
Sample spaces and events
Simple probability
Joint probability
9
Probability
• Probability is the numerical measure of the likelihood that an event will occur
• Value is between 0 and 1
[Figure: probability scale running from 0 (Impossible) through .5 to 1 (Certain)]
10
Computing Probabilities
The probability of an event E:
$$P(E) = \frac{\text{Number of event outcomes}}{\text{Total number of possible outcomes in the sample space}}$$
Assumes each of the outcomes in the sample space is equally likely to occur
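As a small illustration of this counting rule, the sketch below computes the probability of rolling an even number with a fair six-sided die. The die example and the function name `classical_probability` are assumptions made for illustration; they are not from the slides.

```python
from fractions import Fraction

def classical_probability(event_outcomes, sample_space):
    """P(E) = (number of event outcomes) / (total outcomes), assuming equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if outcome in event_outcomes]
    return Fraction(len(favourable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]                      # sample space for one roll of a fair die
p_even = classical_probability({2, 4, 6}, die)
print(p_even)                                  # probability of an even number
```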
11
Gaussian or Normal Distribution aka “Bell Curve”
• Most important probability distribution in the statistical analysis of experimental data.
• Data from many different types of processes follow a “normal” distribution:
– Heights of American women
– Returns from a diversified asset portfolio
• Even when the data do not follow a normal distribution, the normal distribution often provides a good approximation
12
Gaussian or Normal Distribution aka “Bell Curve”
The Normal Distribution is specified by two parameters
– The mean, µ
– The standard deviation, σ
13
Standard Normal Distribution
[Figure: standard normal curve with σ = 1]
14
Characteristics of the Standard Normal Distribution
• Mean µ of 0 and standard deviation σ of 1.
• It is symmetric about 0 (the mean, median and the mode are the same).
• The total area under the curve is equal to one. One half of the total area under the curve is on either side of zero.
15
Area in the Tails of Distribution
• The total area under the curve that is more than 1.96 units away from zero is equal to 5%. Because the curve is symmetrical, there is 2.5% in each tail.
16
Normal Distribution
• About 68% of observations lie within ± 1 std dev of mean
• About 95% of observations lie within ± 2 std dev of mean
• About 99.7% of observations lie within ± 3 std dev of mean
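The tail area beyond 1.96 and the areas within 1, 2 and 3 standard deviations can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names are chosen for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def area_in_tails(k):
    """Total area more than k units away from zero (both tails combined)."""
    return 1 - area_within(k)

print(round(area_in_tails(1.96), 4))   # both tails beyond 1.96
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))  # area within k standard deviations
```

Running this gives roughly 5% for the two tails beyond 1.96, and roughly 68%, 95% and 99.7% for the three ranges.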
17
Study Design
18
Sample versus Population
• A population is a whole, and a sample is a fraction of the whole.
• A population is a collection of all the elements we are studying and about which we are trying to draw conclusions.
• A sample is a collection of some, but not all, of the elements of the population
19
Sample versus Population
20
Sample versus Population
• To make generalizations from a sample, it needs to be representative of the larger population from which it is taken.
• In the ideal scientific world, the individuals for the sample would be randomly selected. This requires that each member of the population has an equal chance of being selected each time a selection is made.
21
Randomisation
• To guard against any use of judgement or systematic arrangements, i.e. to avoid bias
• To provide a basis for the standard methods of statistical analysis such as significance tests
• Assures that treatment groups are balanced (on average) in all regards.
– i.e. balance occurs for known prognostic variables and for unknown or unrecorded variables
22
• Inferential statistics calculated from a clinical trial make an allowance for differences between patients, and this allowance will be correct on average if randomisation has been employed.
23
Hypothesis Testing
24
Hypothesis Testing
• Steps in hypothesis testing: state the problem, define the endpoint, formulate the hypothesis, choose the statistical test, set the decision rule, calculate, decide, and interpret
• Statistical significance: types of errors, p-value, one-tail vs. two-tail tests, confidence intervals
25
Descriptive and inferential statistics
• Descriptive statistics is devoted to the summarization and description of data (population or sample).
• Inferential statistics uses sample data to make an inference about a population.
26
Objectives and Hypotheses
• Objectives are questions that the trial was designed to answer
• Hypotheses are more specific than objectives and are amenable to explicit statistical evaluation
27
Examples of Objectives
• To determine the efficacy and safety of Product ABC in diabetic patients
• To evaluate the efficacy of Product DEF in the prevention of disease XYZ
• To demonstrate that images acquired with product GHI are comparable to images acquired with product JKL for the diagnosis of cancer
28
How do you measure the objectives?
• Endpoints need to be defined in order to measure the objectives of a study.
29
Endpoints: Examples:
• Primary Effectiveness Endpoint –
– Percentage of patients requiring intervention due to pain, where an intervention is defined as :
1. Change in pain medication
2. Early device removal
30
Endpoints: Examples:
• Primary Endpoint:
Percentage of patients with a reduction in pain:
– Reduction in the Brief Pain Inventory (BPI) worst pain scores of ≥ 2 points at 4 weeks over baseline.
31
Endpoints: Examples
• Patient Survival
– Proportion of patients surviving two years post-treatment
– Average length of survival of patients post-treatment
32
Objectives and Hypotheses
• Primary outcome measure
– greatest importance in the study
– used for sample size
– More than one primary outcome measure - multiplicity issues
33
Hypothesis Testing
• Null Hypothesis (H0)
– Status Quo
– Usually Hypothesis of no difference
– Hypothesis to be questioned/disproved
• Alternate Hypothesis (HA)
– Ultimate goal
– Usually Hypothesis of difference
– Hypothesis of interest
34
Decision Making
[Table: Decision versus “Truth”, with cells marking Type I Error and Type II Error]
35
Decision Making
                                  “Truth”
Decision                          Not Suitable to be a Physician    Suitable to be a Physician
Don’t Accept to Medical School    No Error                          Type II Error
Accept to Medical School          Type I Error                      No Error
36
Decision Making
                                           “Truth”
Decision                                   Not Suitable to be a Teacher    Suitable to be a Teacher
Don’t Accept to Teacher Training School    No Error                        Type II Error
Accept to Teacher Training School          Type I Error                    No Error
37
Decision Making
            “Truth”
Test        Cancer           Not Cancer
Positive    No Error         Type I Error
Negative    Type II Error    No Error
38
Decision Making
                               “Truth”
Decision                       New Therapy doesn’t work    New Therapy works
Not Positive Clinical Trial    No Error                    Type II Error
Positive Clinical Trial        Type I Error                No Error
39
Hypothesis Testing
                  If H0 is
Decision          True                False
Fail to reject    No Error            Type II Error (β)
Reject            Type I Error (α)    No Error
Type I Error – Society’s Risk
Type II Error – Sponsor’s Risk
40
Two Possible Errors of Hypothesis Testing
• The Type I Error occurs when we conclude from an experiment that a difference between groups exists when in truth it does not, i.e. rejecting H0 when H0 is in fact true
• Investigators conventionally reject H0 and declare that a real effect exists when the p-value is below 5%, i.e. when a result at least this extreme would occur less than 5% of the time if H0 were true.
41
Two Possible Errors of Hypothesis Testing
• The Type II Error occurs when we conclude that there is no difference between treatments when in truth there is a difference, i.e. failing to reject H0 when H0 is in fact false
42
Two Possible Errors of Hypothesis Testing
• In many circumstances a type I error is regarded as more serious than a type II error.
Example:
H0: innocent vs. H1: guilty
Type I error = declaring an innocent man guilty
Type II error = declaring a guilty man innocent
Presumption of innocence
• Negative test result means "There is not enough evidence to convict" rather than "innocence"
43
Review of errors in hypothesis testing
• One will never know whether one has committed either error unless data are available for the entire population.
• The only thing we are able to do is to assign α and β as the probabilities of making either type of error.
• It is important to keep in mind the difference between the truth and the decision that is being made as a result of the experiment.
44
Hypothesis testing
• Null Hypothesis – No difference between Treatment and Control
• Type I error, alpha, α
– The probability of declaring a difference between treatment and control groups even though one does not exist (i.e. treatment is not truly different from control)
– The p-value from the trial is compared against α
– As this is “society’s risk” it is conventionally set at 0.05 (5%)
45
Hypothesis testing
• Type II error, beta, β
– The probability of not declaring a difference between treatment and control groups even though one does exist (i.e. treatment is truly different from control)
– 1 – β is the power of the study
• Often set at 0.8 (80% power); however, many companies use 0.9 (90% power)
• Underpowered studies have less probability of showing a difference if one exists
46
Steps in Hypothesis Testing
1. Choose the null hypothesis (H0) that is to be tested
2. Choose an alternative hypothesis (HA) that is of interest
3. Select a test statistic, define the rejection region for decision making about when to reject H0
4. Draw a random sample by conducting a clinical trial
47
Steps in Hypothesis Testing
5. Calculate the test statistic and its corresponding p-value
6. Make conclusion according to the pre-determined rule specified in step 3
48
Hypothesis Testing - How to test a hypothesis
• Assume that we believe that we have a fair coin – equal chance of getting H or T when we flip the coin
• Test the hypothesis by carrying out an experiment.
49
Hypothesis Testing - How to test a hypothesis
• Flip the coin 4 times, each time is H. What is the likelihood of getting 4 H if this is a fair coin?
50
Remember the Binomial Probability Function
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
Let X be the number of H in n flips
X ~ Binomial (n = 4, p=0.5)
In this case, we want x=4
$$P(X = 4) = \frac{4!}{4!\,0!}\,(0.5)^{4}(0.5)^{0} = 0.0625 = 6.25\%$$
51
• There is a 6.25% probability of getting 4 H even if this is a completely fair coin. If we were to include 4 T then there would be a 12.5% probability of getting 4 H or 4 T with a fair coin.
52
What happens if we increase the sample size?
What is the probability of getting 10 H if you flip a fair coin 10 times?
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
X ~ Binomial (n = 10, p=0.5)
In this case, we want x=10
$$P(X = 10) = \frac{10!}{10!\,0!}\,(0.5)^{10}(0.5)^{0} = 0.000977 = 0.098\%$$
53
• There is a 0.098% probability of getting 10 H even if this is a completely fair coin. If we were to include 10 T then there would be a 0.2% probability of getting 10 H or 10 T with a fair coin tossed 10 times.
54
• How does this fit in with our decision making?
• We hypothesised that this was a fair coin (50% chance of H and 50% chance of T)
• We carried out our experiment, flipped the coin 4 times and got 4 H. We calculated the probability of getting a result like this = 6.25% under H0 (fair coin)
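The coin-flip probabilities quoted in this example can be reproduced from the binomial probability function. This is a minimal sketch using `math.comb`; the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Fair coin (p = 0.5): all heads, and all heads or all tails
p_4_heads = binomial_pmf(4, 4, 0.5)
p_4_same = p_4_heads + binomial_pmf(0, 4, 0.5)

p_10_heads = binomial_pmf(10, 10, 0.5)
p_10_same = p_10_heads + binomial_pmf(0, 10, 0.5)

print(p_4_heads, p_4_same)     # 4 flips
print(p_10_heads, p_10_same)   # 10 flips
```

With 4 flips the two values are 6.25% and 12.5%; with 10 flips they drop to about 0.098% and 0.2%, which is why the same "all heads" result is far more convincing evidence against a fair coin in the larger experiment.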
55
Test of Significance and p-value
• Statistically significant:
– Conclusion that the results of a study are not likely to be due to chance alone.
– Clinical significance is a separate question from statistical significance
56
Test of Significance and p-value
p-value
– Probability of observing a relationship (e.g., between variables) or a difference (e.g., between means) at least as extreme as the one seen in the sample, assuming that in the population from which the sample was drawn no such relationship or difference exists.
– It is not the probability that a given result is wrong.
57
Power and Sample Size
• Basic terms and concepts
• Study parameters: design, confidence level, power, acceptable error, effect size, variability
58
One day there was a fire in a wastebasket in the
Dean's office and in rushed a physicist, a chemist, and a statistician.
The physicist immediately starts to work on how much energy would have to be removed from the fire to stop the combustion.
The chemist works on which reagent would have to be added to the fire to prevent oxidation.
While they are doing this, the statistician is setting fires to all the other wastebaskets in the office.
"What are you doing?" they demanded.
"Well to solve the problem, obviously you need a large sample size" the statistician replies.
59
• Power Calculation – a guess masquerading as mathematics
Stephen Senn
Statistical Issues In Drug Development
60
Sample versus population
61
Power
• Power is the probability of finding an effect when an effect actually exists.
Power = Probability {correctly reject H0} = 1 – P (Type II Error)
• To increase power we want to decrease the Type II error
62
• In our experiment with the coin we observed that changing the sample size from 4 to 10 changed the probability of a Type I error
• If we had rejected the Fair Coin hypothesis when we got 4/4 H, the probability of a Type I error would have been 6.25%.
• If we rejected the Fair Coin hypothesis when we got 10/10 H, the probability of a Type I error would have been 0.098%
– Assuming the coin was a Fair Coin
63
Sample size
• Power = 1 – Type II error (β)
• Type I error – α
• Meaningful effect size – δ
• Variability – σ
64
Sample Size Rules of Thumb
• If variability (σ) increases, then n (sample size) increases
• If effect size (δ) increases, then n decreases
• If either α or β decreases, then n increases
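These rules of thumb can be seen in one common textbook approximation for comparing two means with a two-sided test, n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / δ². The slides do not give a formula, so treat the formula choice, the function name and the input values below as assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the Type I error
    z_beta = z.inv_cdf(power)            # power = 1 - beta
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

baseline = n_per_group_two_means(sigma=10, delta=5)
more_variable = n_per_group_two_means(sigma=20, delta=5)              # larger sigma
bigger_effect = n_per_group_two_means(sigma=10, delta=10)             # larger delta
stricter_alpha = n_per_group_two_means(sigma=10, delta=5, alpha=0.01) # smaller alpha
higher_power = n_per_group_two_means(sigma=10, delta=5, power=0.90)   # smaller beta

print(baseline, more_variable, bigger_effect, stricter_alpha, higher_power)
```

Doubling σ roughly quadruples n, doubling δ cuts n to about a quarter, and tightening either α or β pushes n up.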
65
Effect Size
• Effect size is the biologically significant difference e.g. size of the effect produced by a treatment.
• It is the generic term to describe the magnitude of the relationship between an independent variable and a dependent variable.
• Statistical significance demonstrates that the observed effect is unlikely to have occurred by chance, whereas effect size addresses the magnitude of the effect.
• Usually the symbol δ is used to refer to effect size.
66
Estimating the effect size
• Estimating δ is definitely one of the most challenging aspects of these calculations.
• Specifically, we are conducting the study because the knowledge regarding the treatment under study is incomplete.
• We end up making guesses about the δ for the proposed research based on knowledge that is by necessity incomplete.
67
Estimating the effect size
• When Clinical asks the statistician how many subjects are needed, the cautious statistician replies that ‘in order to do this, one needs the result from a study that is well designed, has ample power and tests the same hypothesis’.
• Clinical, however, replies that if such a study were available, there would be no need to do the study.
68
Estimating the effect size
• A similar study may have been done previously
• A pilot study can be done to provide an initial estimate of the δ
• A meta-analysis of prior research can be used to provide estimates
69
Estimating the effect size
Specifying the Minimum Effect of Interest
• Specify the minimum δ that would be meaningful.
– Is a 2% increase meaningful? Depends on the context.
– Is a 50% reduction meaningful? 50% of what? Relative reduction or absolute reduction?
60% vs 30% or 60% vs 10%?
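The two readings of a "50% reduction" from a 60% control rate can be made explicit. A small sketch, with a helper name chosen for illustration:

```python
def reductions(control_rate, treatment_rate):
    """Return (absolute reduction, relative reduction) between two event rates."""
    absolute = control_rate - treatment_rate
    relative = absolute / control_rate
    return absolute, relative

abs_a, rel_a = reductions(0.60, 0.30)   # 60% vs 30%
abs_b, rel_b = reductions(0.60, 0.10)   # 60% vs 10%

print(f"60% vs 30%: absolute {abs_a:.0%}, relative {rel_a:.0%}")
print(f"60% vs 10%: absolute {abs_b:.0%}, relative {rel_b:.0%}")
```

60% vs 30% is a 50% relative reduction (30 percentage points absolute), while 60% vs 10% is a 50 percentage point absolute reduction (about 83% relative), so the two interpretations imply very different effect sizes.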
70
Type I error
• In most clinical trial settings the established standard is Type I error = 5%, i.e. there is a 5% chance of rejecting the null hypothesis when it is in fact true (i.e. no difference between treatments).
71
Hypothesis Testing – Normal Distribution
72
[Figure labels: power, significance level]
73
Two-tailed test
• Usually a two-tailed test is performed, with the risk of making a Type I error set at α/2 in each tail.
• If the null hypothesis is:
H0: Trt A = Trt B
• and the alternative hypothesis is:
HA: Trt A ≠ Trt B
• Each of the two ways of making a Type I error is equally undesirable.
74
One-tailed test
• Sometimes an investigator is only interested in a difference between treatments in one direction.
• This is appropriate when the scientific reasoning behind the experiment leads to a prediction in one direction.
• However, FDA will not accept a one-tailed test at α = 0.05; it will use α = 0.025 even if the study is designed as a one-tailed test.
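The practical consequence is easiest to see in the critical values of the standard normal distribution. The sketch below uses `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()

two_tailed_05 = z.inv_cdf(1 - 0.05 / 2)   # two-tailed test, alpha = 0.05 (alpha/2 in each tail)
one_tailed_05 = z.inv_cdf(1 - 0.05)       # one-tailed test, alpha = 0.05
one_tailed_025 = z.inv_cdf(1 - 0.025)     # one-tailed test, alpha = 0.025

print(round(two_tailed_05, 3), round(one_tailed_05, 3), round(one_tailed_025, 3))
```

A one-tailed test at α = 0.025 uses the same cut-off (about 1.96) as a two-tailed test at α = 0.05, whereas a one-tailed test at α = 0.05 would use the easier cut-off of about 1.645.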
75
Calculation of sample size
• The calculation of sample size depends on the summary statistics chosen. The most common choices are
• Treatment mean
e.g. average blood pressure, average cholesterol, average days in hospital
• Treatment proportion
e.g. % of patients who die, recover, achieve some therapeutic goal or any defined state
76
Sample Size Calculations
• Janet Wittes – Sample Size Calculations for Randomized Controlled Trials. Epidemiologic Reviews Vol. 24, No. 1, 2002
“Most informed consent documents for randomized controlled trials implicitly or explicitly promise the prospective participant that the trial has a reasonable chance of answering a medically important question.”
77
• In order to fulfill that promise a clinical trial must be sized appropriately, have high enough power and long-enough follow up.
• Too many trials are designed with over-optimistic assumptions about treatment effect, inappropriate assumptions about compliance or follow-up, and inaccurate assumptions about the response in the control group.
78
Example
• A study is designed to determine whether the rate in the active treatment arm is significantly higher than the rate in the control arm.
• Compute the sample size with the following assumptions:
80% power
Rate of 35% in the active treatment arm
Rate of 15% in the control treatment arm
Level of significance is 5%.
79
How would the sample size change if we change study parameters?
Treat    Cont    Total (Power = 80%)    Total (Power = 90%)
.35      .15     160                    208
.35      .20     298                    392
.30      .15     262                    344
.30      .20     622                    820
.25      .15     532                    702
.25      .20     2262                   3002
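The slides do not say which formula or software produced these totals. As a hedged sketch, the normal-approximation formula for two proportions (unpooled variance, two-sided α) with a Fleiss-style continuity correction gives 160 and 298 for the first two rows at 80% power; other rows may differ by a patient or two, so treat the method below as an assumption rather than the author's calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm_two_proportions(p_treat, p_control, alpha=0.05, power=0.80):
    """Approximate n per arm for comparing two proportions (assumed method, see text)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    diff = abs(p_treat - p_control)
    variance_sum = p_treat * (1 - p_treat) + p_control * (1 - p_control)
    # Uncorrected sample size per arm
    n = (z_alpha + z_beta) ** 2 * variance_sum / diff ** 2
    # Continuity correction (Fleiss-style)
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n_corrected)

total_35_15 = 2 * n_per_arm_two_proportions(0.35, 0.15)
total_35_20 = 2 * n_per_arm_two_proportions(0.35, 0.20)
print(total_35_15, total_35_20)
```

The sketch also shows why the table grows so quickly: the difference between the rates enters the formula squared, so halving the difference roughly quadruples the sample size.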
80
80% Power vs 90% Power?
80% power for detecting a statistically meaningful difference is generally considered desirable; however, 90% or higher is preferable.
Example: .35 vs. .15, 80% power … n=160. What if the rates are .34 vs. .16 instead? With n=160 patients the power is only 70%!
Notice that the difference between .30 and .20 is .1 and the difference between .25 and .15 is .1 but the sample size needed to detect the same difference (0.1) is different.
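The drop in power when the true rates are .34 vs. .16 can be approximated with a normal approximation that includes a continuity correction. The slides do not state how the 70% figure was obtained, so the method and the function name here are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_treat, p_control, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions (assumed method)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Continuity-corrected difference for two arms of equal size
    diff = abs(p_treat - p_control) - 1 / n_per_arm
    se = sqrt((p_treat * (1 - p_treat) + p_control * (1 - p_control)) / n_per_arm)
    return z.cdf(diff / se - z_alpha)

power_design = power_two_proportions(0.35, 0.15, n_per_arm=80)   # design assumptions
power_actual = power_two_proportions(0.34, 0.16, n_per_arm=80)   # slightly smaller true effect
print(round(power_design, 2), round(power_actual, 2))
```

Under this approximation the design assumptions give about 80% power, and shrinking each rate by a single percentage point towards the other brings it down to roughly 70%.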
81
Power = success?
• Designing a study with 80% power does not imply that there is an 80% chance that the study will be a success.
• Other factors influence the success of a study:
– Treatment does not work
– Placebo is better than expected
– Too much variability in the data
82
Implications of an under-powered study
• The power of the study is the probability of rejecting the Null Hypothesis if the Null Hypothesis is false.
• If we under-power the study then we have put patients at risk with a reduced chance of being able to reject the null hypothesis.
83
Sample size determination
• It is important to
– Identify primary endpoint
– Explicitly formulate hypothesis to be tested
– Explicitly formulate statistical analysis of endpoint
– Account for lost to follow-up, drop-outs, compliance
84
Put science before statistics
• Studies should be designed to meet scientific goals.
• Although sometimes resources, time constraints and financial reasons may be issues, try not to estimate the number of patients who can be recruited into a trial and then ask the statistician to justify the sample size by calculating the "detectable" difference implied by the number of recruitable patients.
85
Put science before statistics (cont.)
• Clinical trials should be large enough to detect a clinically important difference between two treatments.
• The appropriate inputs to power/sample-size calculations are effect sizes that are deemed clinically important, based on careful considerations of the underlying scientific (not statistical) goals of the study.
• It is easy to get caught up in statistical significance; but statistical considerations are used to identify a plan that is effective in meeting scientific goals -- not the other way around.
86
Interim Analysis
87
What is interim analysis?
Interim analysis is analysis of the data at one or more time points prior to the official close of the study with the intention of possibly terminating the study early.
88
Interim Analysis
• A Phase III study was designed with n=600. Recruitment is going slowly and the CEO asks you to do an interim analysis after 300 patients to see if the study can be stopped with a significant result.
• What are the problems with this approach?
89
1. Why was the study designed with 600 patients if 300 would be enough?
2. Is there any new evidence from outside the study that something has changed that would mean that 300 patients are enough?
3. What are the implications for the blinding, power and Type I error of the study?
90
Interim analysis
• You might need to continue a trial, even after you have accumulated substantial evidence that the new therapy is superior, because you need the extra data to accurately characterize side effects.
• Interim analyses should be pre-specified to be valid.
• The level of evidence that you need to stop a study early is higher than what is needed at the end of the study.
91
Interim analysis
Reasons for considering an interim analysis:
• In a study where you expect the new therapy to be better than placebo, you might want to stop the study as soon as you have enough evidence that the new therapy is better.
• Ethical reasons (you want to minimize the number of subjects getting the placebo)
• Economic reasons (you don't want to spend extra money after enough evidence has been accumulated).
92
Interim analysis
• We need to be careful! There is no “free lunch” in statistics.
• If we carry out one or more interim analyses, the test at the end of the study cannot be carried out at the 0.05 level. You have to “spend” some of your α level at each interim analysis, leaving you with less at the end. This reduces the power at the final analysis unless you have designed the study appropriately.
93
Interim analysis
The two classic approaches to interim analysis are:
Pocock method and
O'Brien-Fleming method.
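Why α has to be "spent" can be illustrated with a small simulation under the null hypothesis: one interim look at half the data plus a final look, each tested against the same critical value. The setup below is a sketch under assumptions not in the slides: test statistics are simulated directly as standard normal values, and 2.178 is the commonly tabulated Pocock critical value for two equally spaced looks at an overall two-sided α of 0.05.

```python
import random
from math import sqrt

def type_i_error_rate(critical_z, n_trials=20000, seed=1):
    """Simulate trials with no true effect and two looks (interim at 50% of data, then final)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        z_first_half = rng.gauss(0, 1)    # z-statistic from the first half of the data
        z_second_half = rng.gauss(0, 1)   # z-statistic from the second half alone
        z_interim = z_first_half
        z_final = (z_first_half + z_second_half) / sqrt(2)  # z-statistic on all data
        if abs(z_interim) > critical_z or abs(z_final) > critical_z:
            rejections += 1
    return rejections / n_trials

naive_rate = type_i_error_rate(1.96)     # testing at the usual 0.05 level at both looks
pocock_rate = type_i_error_rate(2.178)   # assumed Pocock boundary for two looks

print(naive_rate, pocock_rate)
```

Testing twice at the unadjusted 1.96 cut-off pushes the overall Type I error to roughly 8%, while the stricter boundary brings it back to about 5%. The O'Brien-Fleming method achieves the same overall control with a very strict early boundary and a final boundary close to the usual one.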
94
Interim Analysis
• Procedure
– Clear details in the protocol
– Identification of independent team for blinded study
– Reporting and statistical analysis plan
– Data management strategy
• Cleaning data, database lock, etc.
95
New Approaches to Clinical Trial Design
• Group Sequential Trial Design
• Adaptive Trials
96
What have we learnt?
• Statistics is all about a way of thinking
• If you don’t have uncertainty you don’t need statistics
• p-values are probability statements that tell you something about your experiment
• The sample size of any study depends on the treatment effect you expect to see and the variability of the measurement in the sample
97
What haven’t we learnt?
• All the detailed theory and formulae that back up everything we have discussed
• How to be a statistician (for that you do have to go to graduate school)
• How to get the perfect answer each time we run a clinical trial:
– We are working with patients not widgets and human beings are incredibly complex
98
References
• ICH Guidelines E9, E3 and others
• Stephen Senn – Statistical Issues in Drug Development. John Wiley & Sons, 1997
• Janet Wittes – Sample Size Calculations for Randomized Controlled Trials. Epidemiologic Reviews Vol. 24, No. 1, 2002