Sample Size Consideration in Clinical Research
John Kwagyan, PhD
Howard University College of Medicine
GHUCCTS
What Is Statistics?
The science of collecting, organizing, analyzing and interpreting data to assist in making effective decisions.
• Summarization of large quantities of data (Descriptive/Summary Statistics)
• Making decision from sample to population (Inferential Statistics)
Types of Statistics
• Descriptive/Summary Statistics
Methods for organizing, summarizing, and presenting data in an informative way.
• Inferential Statistics
Methods for estimating and testing population parameters based on sample information.
Population
• Well defined
• Large
• Unique characteristics:
- prevalence of a disease
- variability of a measure
- response rate of therapy
- etc.
We are interested in estimating the population characteristics!
POPULATION → SAMPLE → sample data
We make inferences about population characteristics based on sample data.
Population Parameters
• Mean cholesterol level of obese individuals
• Prevalence of hypertension in Blacks
• Incidence of lung cancer among smokers
• Risk of liver disease (hepatitis) associated with drinking
• Mortality rate of heart attack among men
• Variability of heart rate in PTSD
CENTRAL IDEA: Estimate and Test for differences in parameters
Case Example
• Suppose that we plan to conduct a study comparing a treatment with a control.
• The response variable is systolic blood pressure (SBP), measured using a standard sphygmomanometer.
• The treatment is supposed to reduce blood pressure
• We set up a one-sided test
H0 : μT = μC versus H1 : μT < μC
where μT = mean SBP for the treatment group and μC = mean SBP for the control group.
• The parameter Δ = μT −μC is the effect being tested
Case Example
• Suppose the goals of the study specify that we want to be able to detect a situation where the treatment mean is 15 mmHg lower than the control group.
• The required effect size is Δ = −15.
• We specify that such an effect be detected with 80% power (1-β = .80) when the significance level is α = .05.
• Past experience with similar studies, using similar sphygmomanometers and similar subjects, suggests that the data will be approximately normally distributed with a standard deviation of SD = 20 mmHg.
• We plan to use a two-sample pooled t test with equal numbers n of subjects in each group.
Case Example
• Now we have all of the specifications needed for determining sample size using the power approach, and their values may be entered in suitable formulas, charts, or power-analysis software.
• We find that a sample size of n = 23 per group is needed to achieve the stated goals.
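The n = 23 figure can be roughly checked by hand. The sketch below is not the software calculation the slide refers to. It assumes the usual normal-approximation formula for a one-sided two-sample comparison, then adds a small-sample correction of z²/4 (a Guenther-style adjustment for using a t test instead of a z test). The function name and its arguments are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_one_sided(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a one-sided two-sample test of means.

    Returns (normal-approximation n, n with a t-test correction).
    Both are approximations, not an exact noncentral-t calculation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = z.inv_cdf(power)       # quantile matching the target power
    # Normal approximation: treats the SD as known.
    n_normal = 2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2
    # Assumed small-sample correction for estimating the SD (t test).
    n_t = n_normal + z_alpha ** 2 / 4
    return ceil(n_normal), ceil(n_t)


# Case example inputs: delta = -15 mmHg, SD = 20 mmHg, alpha = .05, power = .80
n_normal, n_t = n_per_group_one_sided(delta=-15, sd=20)
print(n_normal, n_t)  # 22 23
```

The plain normal approximation gives 22 per group; the correction for the t test brings it to 23, in line with the value quoted above.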
Basic Parameters and Concepts
• Study (Research) Hypotheses
• Type I Error Rate, α (Significance level)
• P-value
• Type II Error Rate, β
• Power, 1-β
• Effect Size, Δ
~ size of clinically meaningful change.
HYPOTHESIS, HYPOTHESIS TESTING
Hypothesis
• HYPOTHESIS: a statement about a population characteristic/parameter
• HYPOTHESIS: a prediction/idea about what the examination of appropriate data will show about a characteristic
Hypothesis
• Null (Test) Hypothesis, H0
~ Hypothesis to be questioned (disproved).
~ Hypothesis of no real (true) difference
• Alternative (Research) Hypothesis, HA
~ Hypothesis investigator wishes to establish.
~ Hypothesis of a real (true) difference
Example
• Research Hypothesis: Combination therapy is effective in the treatment of hypertension.
• Effective
~ considerable reduction in BP (1)
~ controls BP increases (2)
• Parameter
~ Mean percent reduction in BP (1)
~ Proportion controlled (2)
• Test Hypothesis: The combination therapy is not effective.
Goal
• Goal is to TEST the Null Hypothesis and decide whether to REJECT IT in favor of the Alternative, or FAIL TO REJECT it.
Test of Hypothesis
One-Tailed Tests
• A test is one-tailed when the research hypothesis, HA , specifies a direction:
HA: The incidence of lung cancer among smokers is higher than among nonsmokers.
Two-Tailed Tests
• A test is two-tailed when no direction is specified in the research hypothesis HA.
HA: The stress level in DC is different from NY.
Test & Decision
Test H0: no difference in effectiveness
Possible Outcomes
Null Hypothesis could be true (i.e., no difference)
Null Hypothesis could be false (i.e., difference)
Decision Making
Investigator rejects the null hypothesis
Investigator fails to reject the null hypothesis
Test & Decision
Test H0: therapy is not effective

Decision      H0 True (not effective)    H0 False (effective)
Accept H0     No Error                   Type II Error
Reject H0     Type I Error               No Error
Drug Trial
H0: “Miracle” drug is not effective

Decision      H0 True (not effective)    H0 False (effective)
Accept H0     No Error                   Type II Error
Reject H0     Type I Error               No Error

Type I Error: Deny a patient a “known therapy” in favor of an ineffective “miracle drug”
Type II Error: Deny a patient a better drug in favor of a less effective “known therapy”
Test & Decision
Test H0

Decision      H0 True                               H0 False
Accept H0     No Error                              Type II Error, β = P(Type II Error)
Reject H0     Type I Error, α = P(Type I Error)     No Error
Is this Familiar?
• All tests were performed two-sided at the 5% level of significance.
• Significance was defined as a value of p < 0.05.
• A value of p < 0.05 was considered statistically significant.
• ALL YOU ARE DOING IS CONTROLLING THE TYPE I ERROR RATE
Definitions
α = P{Type I Error}
= P{rejecting H0 | H0 is true}
= P{rejecting the truth}
α ~ is called the Type I Error Rate
α ~ is called the Significance Level
Definitions
β = P{Type II Error}
= P{fail to reject H0 | H0 is false}
= P{accepting a fallacy}
β ~ is called the Type II Error Rate
1-β ~ is called the Power of the study
Definitions
β = P{fail to reject H0 | H0 is false}
1-β = P{reject H0 | H0 is false}
= P{accept HA | HA is true}
1-β ~ is called the Power of the study
Power ~ quantifies the ability of the study to detect a difference, if any
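A small simulation can make α and 1-β concrete. The sketch below is only an illustration. It assumes normally distributed SBP, a known SD of 20 mmHg, 23 subjects per group, and a one-sided z test (not the pooled t test of the case example). The function name, seed, and number of trials are arbitrary choices.

```python
import random
from statistics import NormalDist, mean


def rejection_rate(true_diff, n=23, sd=20.0, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated studies that reject H0 (one-sided z test, known SD)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value
    se = sd * (2 / n) ** 0.5              # standard error of the mean difference
    rejections = 0
    for _ in range(trials):
        treat = [rng.gauss(true_diff, sd) for _ in range(n)]
        ctrl = [rng.gauss(0.0, sd) for _ in range(n)]
        z = (mean(treat) - mean(ctrl)) / se
        if z < z_crit:                    # evidence that treatment lowers SBP
            rejections += 1
    return rejections / trials


alpha_hat = rejection_rate(true_diff=0.0)    # H0 true: estimates the Type I error rate
power_hat = rejection_rate(true_diff=-15.0)  # H0 false: estimates the power
print(alpha_hat, power_hat)
```

Under these assumptions the first estimate should land near 0.05 and the second near 0.8, up to simulation noise.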
Definitions: P-value
~ probability of observing data at least as extreme as ours (i.e., a difference at least as large as the one observed) when the null hypothesis is true.
~ probability of a difference this large having arisen by chance when the null hypothesis is true.
Definitions: P-value
~ the smaller the p-value, the weaker the support for the null hypothesis
~ the smaller the p-value, the stronger the evidence for the alternative hypothesis
How do we evaluate this probability?
By calculating a test statistic
Test Statistic
Most test statistics have the form:
• Test Statistic = (observed value − expected value) / (standard error of observed value)
- a value which we can compare with a known distribution of what we expect when the null hypothesis is true
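As a worked illustration of the general form above, the sketch below computes a z statistic and converts it to a p-value using the standard normal distribution. The numbers (an observed difference of −12 mmHg and a standard error of 5.9 mmHg) are hypothetical and are not taken from the slides.

```python
from statistics import NormalDist


def z_statistic(observed, expected, standard_error):
    """(observed value - expected value) / standard error of observed value."""
    return (observed - expected) / standard_error


# Hypothetical data: observed mean SBP difference of -12 mmHg,
# H0 expects a difference of 0, standard error 5.9 mmHg.
z = z_statistic(observed=-12.0, expected=0.0, standard_error=5.9)

std_normal = NormalDist()
p_one_sided = std_normal.cdf(z)            # HA: treatment mean is lower
p_two_sided = 2 * std_normal.cdf(-abs(z))  # HA: means differ in either direction
print(round(z, 2), round(p_one_sided, 3), round(p_two_sided, 3))
```

With a t-test, the same statistic would be compared with a t distribution instead of the normal.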
Common Test Statistics
• T-test
• F-test
• Chi-square (χ²) test
How do you choose the appropriate statistic?
Statistical Significance
• Accepted values in clinical research
p ≤ 0.05 ~ significant
p ≤ 0.01 ~ highly significant
In Genetic (Linkage) Analysis:
• LOD Score = 3.0 ~ significant
• LOD Score = 3.0 ~ α = 0.0001
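The correspondence between a LOD score of 3.0 and a level of about 0.0001 can be checked numerically. The sketch below rests on assumptions not stated on the slide: the LOD score is converted to a likelihood-ratio chi-square with 1 degree of freedom (χ² = 2 ln(10) × LOD), the large-sample approximation holds, and the test is one-sided. The function name is illustrative.

```python
from math import erfc, log, sqrt


def lod_to_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score."""
    chi2 = 2 * log(10) * lod            # likelihood-ratio statistic, 1 df
    p_two_sided = erfc(sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return p_two_sided / 2              # assumed one-sided test


print(lod_to_p(3.0))  # approximately 0.0001
```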
SAMPLE SIZE CONSIDERATION
Population And Sample
Target Population → Study Population → Study Sample
Define Eligibility Criteria (subjects who do not meet them are Ineligible)
Eligibility Criteria!!!!
~ consist of inclusion criteria and exclusion criteria
• Inclusion criteria are used to outline the intended study population
• Exclusion criteria are used to fine-tune the intended population by removing expected sources of variation
Eligibility Criteria!!!!
• Inclusion Criteria
Female
Age ≥ 21 years
BMI ≥ 25 kg/m²
• Exclusion Criteria (REDUNDANT!!!! These only restate the inclusion criteria.)
Male
Age < 21 years
BMI < 25 kg/m²
Eligibility Criteria!!!!
• Inclusion Criteria        Exclusion Criteria
i. Female                   i. Male
ii. Age > 21 yrs            ii. Age < 21
iii. BMI ≥ 25 kg/m²         iii. BMI < 25
• Exclusion Criteria
i. Pregnant or breast feeding
ii. History of …….
iii. Any other condition in the opinion of the investigator(s) that would make the subject unsuitable for the study
Why Sample Size?
• Required in many grant applications (clinical research protocols, funding agencies, etc.)
• Budgetary Constraints
• Provide Statistical Justification
• Inference (decision) is based on it
How Much Data Do I Need?
• How big a difference are you trying to detect? (Effect Size)
- Absolute difference ~ say, a 5 mmHg drop in BP
- Relative difference ~ say, a 5% drop in BP
• How much variation is there in the outcome?
• How certain do you want to be that you will detect the difference of interest ?
Eliciting effect size
• How big a difference would be of clinical importance for you?
Some responses I get:
• Huh??
• What do you mean?
• What do you recommend?
• Any difference at all would be important
Finding the right variance
• Based on experience
- Range of values
- Stories behind extreme values
- Sources of variation
• Use of historical data
• Conduct a pilot study
What if you have an imposed sample size?
• Sometimes, a proposal comes with an imposed sample size.
• Sample size is but one of several quality characteristics of a study
• If n is held fixed, we simply need to focus on other characteristics, such as effect size.
Determination of Sample Size
Depends on:
1. Outcome measure (Data Endpoint)
2. Study Design
Types of Data Endpoints
• Continuous Data - BP, BMI, TC, LDL, Blood Sugar
• Categorical Data - Hypertension, Obese, Dyslipidemia, Diabetes
• Count Data
- Number of risk factors: 0, 1, 2, 3
• Survival (Time-to-Event) Data
- time-to-cardiac event, time-to-death
Putting It All Together (Power Analysis)
1-β = P{accept HA | HA is true}
= Func(α, σ²(n), Δ)
where
1-β ~ Power
α ~ Certainty
σ² ~ Variability
Δ ~ Effect Size
n ~ Sample size
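One concrete form of Func is sketched below. It is an assumption, not the slide's own formula: a normal approximation to the power of a two-sided two-sample comparison of means with n subjects per group and a known common SD. The function and argument names are illustrative.

```python
from statistics import NormalDist


def power_two_sample(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test of means (n per group)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # two-sided critical value
    shift = abs(delta) / (sd * (2 / n) ** 0.5)  # effect size in standard-error units
    # Probability of landing in either rejection region when the true shift is `shift`.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


base = power_two_sample(delta=10, sd=15, n=36)
print(round(base, 2))                                       # roughly 0.8
print(power_two_sample(delta=10, sd=15, n=72) > base)       # more subjects: more power
print(power_two_sample(delta=5, sd=15, n=36) < base)        # smaller effect: less power
print(power_two_sample(delta=10, sd=20, n=36) < base)       # more variability: less power
```

Changing one input at a time shows how power moves with sample size, effect size, and variability.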
Crude SS Estimate for Means: 2-Sample Test for Means (2-sided)
Sample Size Formula
n = 16 s² / Δ² per group, α = 0.05, β = 0.2 (Power = 80%)
where s is the estimate of the standard deviation σ and Δ is the difference in means to be detected.

sd    Δ     n
10    5     64
10    10    16
15    5     144
15    10    36
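The crude rule n = 16 s²/Δ² per group is easy to evaluate directly. The sketch below applies it to the four (sd, Δ) pairs in the table and shows where the constant 16 comes from, namely 2(z₀.₉₇₅ + z₀.₈₀)² rounded up. Function and variable names are illustrative. For sd = 10 and Δ = 5 the rule gives 64.

```python
from math import ceil
from statistics import NormalDist


def crude_n_per_group(sd, delta):
    """Crude per-group sample size: 16 * sd^2 / delta^2 (two-sided alpha = .05, power = 80%)."""
    return ceil(16 * sd ** 2 / delta ** 2)


# The constant 16 is a rounded version of 2 * (z_{1-alpha/2} + z_{1-beta})^2.
z = NormalDist()
constant = 2 * (z.inv_cdf(0.975) + z.inv_cdf(0.80)) ** 2
print(round(constant, 2))  # 15.7

table = [(sd, delta, crude_n_per_group(sd, delta))
         for sd, delta in [(10, 5), (10, 10), (15, 5), (15, 10)]]
for row in table:
    print(row)
```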
Sample Size
• A larger sample size is needed to detect a smaller meaningful difference.
• A larger sample size is needed when there is much variability in the population
• A larger sample size is required to increase the power of a study.
Other Approaches
There are several approaches to sample size.
• One can specify the desired width of a confidence interval and determine the sample size that achieves that goal.
• A Bayesian approach can be used where we optimize some utility function, perhaps one that involves both precision of estimation and cost.
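The confidence-interval approach can be sketched for the simplest case. The example below assumes a single mean, a known SD, and a normal-based interval, none of which is specified on the slide. The function name and the inputs (SD of 20 mmHg, total width of 10 mmHg) are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_for_ci_width(sd, width, confidence=0.95):
    """Smallest n so that a normal-based CI for one mean is no wider than `width`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Total width is 2 * z * sd / sqrt(n); solve for n.
    return ceil((2 * z * sd / width) ** 2)


print(n_for_ci_width(sd=20, width=10))  # 62
```

Halving the desired width roughly quadruples the required sample size.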
Avoid “canned” effect sizes: the T-shirt effect sizes
• This is an elaborate way to arrive at the same sample size that has been used in past social science studies of large, medium, and small size.
• The method uses a standardized effect size as the goal.
• Think about it: for a "medium" effect size, you'll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your population.
• Important considerations are being ignored here. "Medium" is definitely not the message!
Cohen Effect Sizes?
What are small, medium, or large effect sizes for:
• Odds Ratio
• Hazard Ratios
• Repeated Measures ANOVAs
• Regression Models
• Multivariate Models
• Sensitivity Analysis
• Adaptive Designs
Post Hoc Power Analyses
• In contrast to a priori power analyses, post hoc power analyses often make sense after a study has already been conducted.
Take Away Points
• Use power prospectively for planning future studies.
• Put science before statistics. The appropriate inputs to power/sample-size calculations should be based on careful considerations of the underlying scientific (not statistical!!) goals of the study.
• T-shirt Effect Sizes: If at all possible, avoid using “canned” effect sizes.
References
1. Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193.
2. Hoenig, J. M. and Heisey, D. M. (2001), "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician, 55, 19-24.