
Clinical Statistics for Non-Statisticians – Part II

Kay M. Larholt, Sc.D.

Vice President, Biometrics & Clinical Operations

Abt Bio-Pharma Solutions

2

Topics

1) Review of Statistical Concepts
2) Hypothesis Testing
3) Power and Sample Size
4) Interim Analysis

3

Basic Statistical Concepts

4

Statistics

Per the American Heritage dictionary - “The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.”

• Two broad areas
– Descriptive – Science of summarizing data
– Inferential – Science of interpreting data in order to make estimates, hypothesis tests, predictions, or decisions about the target population from the sample.

5

Introduction to Clinical Statistics

• Statistics - The science of making decisions in the face of uncertainty

• Probability - The mathematics of uncertainty
– The probability of an event is a measure of how likely the event is to happen

6

Sample versus Population

7

Descriptive Statistics for Continuous Variables

Measures of central tendency
– Mean, Median, Mode

Measures of dispersion
– Range, Variance, Standard deviation

Measures of relative standing
– Lower quartile (Q1)
– Upper quartile (Q3)
– Interquartile range (IQR)
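The summary measures listed on this slide can be computed with Python's standard `statistics` module. The sketch below is only an illustration: the blood pressure readings are made-up values, not data from the presentation.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg); illustrative data only.
readings = [118, 122, 125, 125, 130, 134, 141, 150]

# Measures of central tendency
mean = statistics.mean(readings)
median = statistics.median(readings)
mode = statistics.mode(readings)

# Measures of dispersion
data_range = max(readings) - min(readings)
variance = statistics.variance(readings)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(readings)       # sample standard deviation

# Measures of relative standing.
# quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3.
# Several quartile conventions exist, so other software may differ slightly.
q1, q2, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, sd={std_dev:.2f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```

Note that `variance` and `stdev` treat the data as a sample; `pvariance` and `pstdev` are the population versions.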

8

Basic Probability Concepts

Sample spaces and events

Simple probability

Joint probability

9

Probability

• Probability is the numerical measure of the likelihood that an event will occur

• Value is between 0 and 1

[Figure: probability scale running from 0 (impossible) through 0.5 to 1 (certain)]

10

Computing Probabilities

The probability of an event E:

$$P(E) = \frac{\text{Number of event outcomes}}{\text{Total number of possible outcomes in the sample space}}$$

Assumes each of the outcomes in the sample space is equally likely to occur
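As a small illustration of this counting rule, the sketch below enumerates a sample space and counts the outcomes in an event. The two-dice example and the helper name `probability` are assumptions chosen for illustration; they do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event, space):
    """P(E) = outcomes in E / total outcomes. Valid only when outcomes are equally likely."""
    favourable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favourable), len(space))

p_sum_is_seven = probability(lambda o: o[0] + o[1] == 7, sample_space)
p_double = probability(lambda o: o[0] == o[1], sample_space)

print(len(sample_space), p_sum_is_seven, p_double)
```

Using `Fraction` keeps the result exact, which makes the "count over count" structure of the formula visible.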

11

Gaussian or Normal Distribution aka “Bell Curve”

• Most important probability distribution in the statistical analysis of experimental data.

• Data from many different types of processes follow a “normal” distribution:
– Heights of American women
– Returns from a diversified asset portfolio

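• Even when the data do not follow a normal distribution, the normal distribution often provides a good approximation, for example for the distribution of sample means when the sample is large (the central limit theorem)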

12

Gaussian or Normal Distribution aka “Bell Curve”

The Normal Distribution is specified by two parameters
– The mean, µ
– The standard deviation, σ

13

Standard Normal Distribution

µ = 0, σ = 1

14

Characteristics of the Standard Normal Distribution

• Mean µ of 0 and standard deviation σ of 1.
• It is symmetric about 0 (the mean, median and the mode are the same).
• The total area under the curve is equal to one. One half of the total area under the curve is on either side of zero.

15

Area in the Tails of Distribution

• The total area under the curve that is more than 1.96 units away from zero is equal to 5%. Because the curve is symmetrical, there is 2.5% in each tail.

16

Normal Distribution

• 68% of observations lie within ± 1 std dev of mean
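The areas quoted on the last two slides (5% beyond ±1.96 and 68% within ±1 standard deviation) can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area more than 1.96 units away from zero: both tails together.
upper_tail = 1 - standard_normal.cdf(1.96)
two_tail_area = 2 * upper_tail            # close to 0.05

# Area within 1 standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)   # close to 0.68

# Going the other way: the cut-off that leaves 2.5% in the upper tail.
cutoff = standard_normal.inv_cdf(0.975)   # close to 1.96

print(f"{two_tail_area:.4f} {within_one_sd:.4f} {cutoff:.4f}")
```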



17

Study Design

18


• A population is a whole, and a sample is a fraction of the whole.

• A population is a collection of all the elements we are studying and about which we are trying to draw conclusions.

• A sample is a collection of some, but not all, of the elements of the population

19


20


• To make generalizations from a sample, it needs to be representative of the larger population from which it is taken.

• In the ideal scientific world, the individuals for the sample would be randomly selected. This requires that each member of the population has an equal chance of being selected each time a selection is made.

21

Randomisation

• To guard against any use of judgement or systematic arrangements, i.e. to avoid bias

• To provide a basis for the standard methods of statistical analysis such as significance tests

• Assures that treatment groups are balanced (on average) in all regards.

– i.e. balance occurs for known prognostic variables and for unknown or unrecorded variables
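The phrase "balanced (on average)" can be illustrated with a small simulation. The sketch below is not from the slides: it assumes a hypothetical prognostic variable (age, normally distributed with mean 60 and standard deviation 10) and simple 1:1 randomisation, and shows that a single trial can be somewhat imbalanced while the imbalance averages out over many randomisations.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

def randomise_and_compare(n_patients=200):
    """Randomly split patients 1:1 and return (mean age in arm A) - (mean age in arm B)."""
    # Hypothetical prognostic variable: age in years.
    ages = [rng.gauss(60, 10) for _ in range(n_patients)]
    rng.shuffle(ages)                      # random order stands in for random allocation
    half = n_patients // 2
    arm_a, arm_b = ages[:half], ages[half:]
    return statistics.mean(arm_a) - statistics.mean(arm_b)

# Any single trial shows some imbalance; across many trials it averages out.
differences = [randomise_and_compare() for _ in range(1000)]
average_imbalance = statistics.mean(differences)     # close to 0
typical_imbalance = statistics.pstdev(differences)   # size of imbalance in a single trial

print(f"average imbalance: {average_imbalance:.3f} years")
print(f"typical imbalance in one trial: {typical_imbalance:.3f} years")
```

The same argument applies to variables nobody recorded, which is why randomisation also protects against unknown prognostic factors.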

22

• Inferential statistics calculated from a clinical trial make an allowance for differences between patients, and this allowance will be correct on average if randomisation has been employed.

23

Hypothesis Testing

24

Hypothesis Testing

• Steps in hypothesis testing: state the problem, define the endpoint, formulate the hypothesis, choose the statistical test, set the decision rule, calculate, decide, and interpret

• Statistical significance: types of errors, p-value, one-tail vs. two-tail tests, confidence intervals

25

Descriptive and inferential statistics

• Descriptive statistics is devoted to the summarization and description of data (population or sample).

• Inferential statistics uses sample data to make an inference about a population.

26

Objectives and Hypotheses

• Objectives are questions that the trial was designed to answer

• Hypotheses are more specific than objectives and are amenable to explicit statistical evaluation

27

Examples of Objectives

• To determine the efficacy and safety of Product ABC in diabetic patients

• To evaluate the efficacy of Product DEF in the prevention of disease XYZ

• To demonstrate that images acquired with product GHI are comparable to images acquired with product JKL for the diagnosis of cancer

28

How do you measure the objectives?

• Endpoints need to be defined in order to measure the objectives of a study.

29

Endpoints: Examples:

• Primary Effectiveness Endpoint –

– Percentage of patients requiring intervention due to pain, where an intervention is defined as :

1. Change in pain medication

2. Early device removal

30

Endpoints: Examples:

• Primary Endpoint:

Percentage of patients with a reduction in pain:

– Reduction in the Brief Pain Inventory (BPI) worst pain scores of ≥ 2 points at 4 weeks over baseline.

31

Endpoints: Examples

• Patient Survival
– Proportion of patients surviving two years post-treatment
– Average length of survival of patients post-treatment

32

Objectives and Hypotheses

• Primary outcome measure

– greatest importance in the study

– used for sample size

– More than one primary outcome measure - multiplicity issues

33

Hypothesis Testing

• Null Hypothesis (H0)
– Status Quo
– Usually Hypothesis of no difference
– Hypothesis to be questioned/disproved

• Alternate Hypothesis (HA)
– Ultimate goal
– Usually Hypothesis of difference
– Hypothesis of interest

34

Decision Making

[Figure: grid of Decision against “Truth”, with cells marked Type I Error and Type II Error]

35

Decision Making

                                  “Truth”
Decision                          Not Suitable to be a Physician    Suitable to be a Physician
Don’t Accept to Medical School    No Error                          Type II Error
Accept to Medical School          Type I Error                      No Error

36

Decision Making

                                           “Truth”
Decision                                   Not Suitable to be a Teacher    Suitable to be a Teacher
Don’t Accept to Teacher Training School    No Error                        Type II Error
Accept to Teacher Training School          Type I Error                    No Error

37

Decision Making

            “Truth”
Test        Cancer           Not Cancer
Positive    No Error         Type I Error
Negative    Type II Error    No Error

38

Decision Making

                               “Truth”
Decision                       New Therapy doesn’t work    New Therapy works
Not Positive Clinical Trial    No Error                    Type II Error
Positive Clinical Trial        Type I Error                No Error

39

Hypothesis Testing

                  If H0 is
Decision          True                False
Fail to reject    No Error            Type II Error (β)
Reject            Type I Error (α)    No Error

Type I Error – Society’s Risk

Type II Error – Sponsor’s Risk

40

Two Possible Errors of Hypothesis Testing

• The Type I Error occurs when we conclude from an experiment that a difference between groups exists when in truth it does not

rejecting H0 when H0 is in fact true

• Investigators reject H0 and declare that a real effect exists when a result at least as extreme as the one observed would occur less than 5% of the time if H0 were true.

41


• The Type II Error occurs when we conclude that there is no difference between treatments when in truth there is a difference

failing to reject H0 when H0 is in fact false

42


• A type I error is often regarded as more serious than a type II error.

Example:

H0: innocent vs. H1: guilty
Type I error = declaring an innocent man guilty
Type II error = declaring a guilty man innocent

Presumption of innocence
• Negative test result means "There is not enough evidence to convict" rather than "innocence"

43

Review of errors in hypothesis testing

• One will never know whether one has committed either error unless data are available for the entire population.

• The only thing we are able to do is to assign α and β as the probabilities of making either type of error.

• It is important to keep in mind the difference between the truth and the decision that is being made as a result of the experiment.
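In a real trial the "truth" is unknown, but in a simulation we can set it ourselves and count how often each error occurs. The sketch below is an illustration under stated assumptions, not the presenter's method: it uses a two-sided two-sample z-test with known σ = 1, 50 patients per arm, and a hypothetical true difference of 0.5 when H0 is false.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(11)            # fixed seed so the sketch is reproducible
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96 for a two-sided test

def trial_rejects_h0(true_difference, n_per_arm=50, sigma=1.0):
    """Simulate one two-arm trial and apply a two-sided z-test (sigma assumed known)."""
    control = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_difference, sigma) for _ in range(n_per_arm)]
    standard_error = sigma * (2 / n_per_arm) ** 0.5
    z = (mean(treated) - mean(control)) / standard_error
    return abs(z) > Z_CRIT

n_trials = 2000
# "Truth": H0 is true (no difference). Every rejection here is a Type I error.
type_1_rate = sum(trial_rejects_h0(0.0) for _ in range(n_trials)) / n_trials
# "Truth": H0 is false (difference of 0.5). Every non-rejection is a Type II error.
type_2_rate = sum(not trial_rejects_h0(0.5) for _ in range(n_trials)) / n_trials

print(f"Type I error rate  (alpha): {type_1_rate:.3f}")
print(f"Type II error rate (beta):  {type_2_rate:.3f}")
```

The Type I rate should come out near the chosen α of 0.05, while the Type II rate depends on the assumed effect size and sample size.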

44

Hypothesis testing

• Null Hypothesis – No difference between Treatment and Control

• Type I error, alpha, α (the threshold against which the p-value is compared)
– The probability of declaring a difference between treatment and control groups even though one does not exist (i.e. treatment does not truly differ from control)

– As this is “society’s risk” it is conventionally set at 0.05 (5%)

45

Hypothesis testing

• Type II error, beta, β
– The probability of not declaring a difference between treatment and control groups even though one does exist (i.e. treatment truly differs from control)
– 1 – β is the power of the study
• Often set at 0.8 (80% power); however, many companies use 0.9 (90% power)
• Underpowered studies have less probability of showing a difference if one exists

46

Steps in Hypothesis Testing

1. Choose the null hypothesis (H0) that is to be tested

2. Choose an alternative hypothesis (HA) that is of interest

3. Select a test statistic and define the rejection region for deciding when to reject H0

4. Draw a random sample by conducting a clinical trial

47

Steps in Hypothesis Testing

5. Calculate the test statistic and its corresponding p-value

6. Draw a conclusion according to the pre-determined rule specified in step 3
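The six steps can be sketched in code for one common case. This is a hedged illustration, not the presenter's analysis: it assumes a two-sided z-test comparing two response proportions (normal approximation with a pooled standard error), and the trial counts passed in at the end are hypothetical.

```python
from statistics import NormalDist

def two_proportion_z_test(x_treat, n_treat, x_ctrl, n_ctrl, alpha=0.05):
    """Two-sided z-test of H0: p_treat == p_ctrl against HA: p_treat != p_ctrl."""
    std_normal = NormalDist()
    # Steps 1-3: H0 and HA are fixed above; the test statistic is z and the
    # rejection region is |z| > critical value.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Step 5: calculate the test statistic and its p-value.
    p_treat, p_ctrl = x_treat / n_treat, x_ctrl / n_ctrl
    pooled = (x_treat + x_ctrl) / (n_treat + n_ctrl)
    se = (pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl)) ** 0.5
    z = (p_treat - p_ctrl) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Step 6: apply the rule that was fixed in step 3.
    return {"z": z, "p_value": p_value, "reject_h0": abs(z) > critical}

# Step 4 stand-in: hypothetical trial results (responders / patients per arm).
result = two_proportion_z_test(x_treat=35, n_treat=100, x_ctrl=20, n_ctrl=100)
print(result)
```

The key design point is that the rejection rule is fixed before the data are examined; the data only enter at steps 5 and 6.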

48

Hypothesis Testing - How to test a hypothesis

• Assume that we believe that we have a fair coin – equal chance of getting H or T when we flip the coin

• Test the hypothesis by carrying out an experiment.

49

Hypothesis Testing - How to test a hypothesis

• Flip the coin 4 times and get H each time. What is the likelihood of getting 4 H if this is a fair coin?

50

Remember the Binomial Probability Function

$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

Let X be the number of H in n flips

X ~ Binomial (n = 4, p=0.5)

In this case, we want x=4

$$P(X = 4) = \frac{4!}{4!\,0!}\,(0.5)^{4}(0.5)^{0} = 0.0625 = 6.25\%$$

51

• There is a 6.25% probability of getting 4 H even if this is a completely fair coin. If we were to include 4 T then there would be a 12.5% probability of getting 4 H or 4 T with a fair coin.
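The binomial probability function can be evaluated directly. A minimal sketch follows; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_four_heads = binomial_pmf(4, n=4, p=0.5)                  # 4 H in 4 flips
p_all_same = p_four_heads + binomial_pmf(0, n=4, p=0.5)     # 4 H or 4 T

print(p_four_heads, p_all_same)
```

`math.comb(n, x)` computes n! / (x! (n − x)!), the first factor of the formula.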

52

What happens if we increase the sample size?

What is the probability of getting 10 H if you flip a fair coin 10 times?

$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

X ~ Binomial (n = 10, p=0.5)

In this case, we want x=10

$$P(X = 10) = (0.5)^{10} = 0.000977 = 0.098\%$$

53

• There is a 0.098% probability of getting 10 H even if this is a completely fair coin. If we were to include 10 T then there would be a 0.2% probability of getting 10 H or 10 T with a fair coin tossed 10 times.

54

• How does this fit in with our decision making?
• We hypothesised that this was a fair coin (50% chance of H and 50% chance of T)
• We carried out our experiment, flipped the coin 4 times and got 4 H. We calculated the probability of getting a result like this = 6.25% under H0 (fair coin)
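One way to connect the coin experiment to a decision rule is to compute a two-sided p-value under H0 (fair coin) and compare it with α. The sketch below assumes α = 0.05 and counts results in either direction (all H or all T) as equally extreme; both choices are assumptions made for illustration.

```python
from math import comb

def two_sided_p_value_fair_coin(heads, flips):
    """Exact two-sided p-value for H0: P(H) = 0.5, given `heads` in `flips` tosses.

    Sums the probabilities of every outcome at least as far from flips/2 as the
    one observed (valid because the fair-coin distribution is symmetric).
    """
    observed_distance = abs(heads - flips / 2)
    total = 0.0
    for k in range(flips + 1):
        if abs(k - flips / 2) >= observed_distance:
            total += comb(flips, k) * 0.5**flips
    return total

ALPHA = 0.05
for heads, flips in [(4, 4), (10, 10)]:
    p = two_sided_p_value_fair_coin(heads, flips)
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{heads}/{flips} heads: p = {p:.5f} -> {decision}")
```

With only 4 flips, even the most extreme possible result cannot reach significance at the 5% level, whereas 10 heads out of 10 can.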

55

Test of Significance and p-value

• Statistically significant:
– Conclusion that the results of a study are not likely to be due to chance alone.
– Clinical significance is unrelated to statistical significance

56

Test of Significance and p-value

p-value
– Probability of observing a relationship (e.g., between variables) or a difference (e.g., between means) at least as large as the one seen in the sample by pure chance, assuming that no such relationship or difference exists in the population from which the sample was drawn.

– It is not the probability that a given result is wrong.

57

Power and Sample Size

• Basic terms and concepts

• Study parameters: design, confidence level, power, acceptable error, effect size, variability

58

One day there was a fire in a wastebasket in the Dean's office and in rushed a physicist, a chemist, and a statistician.

The physicist immediately starts to work on how much energy would have to be removed from the fire to stop the combustion.

The chemist works on which reagent would have to be added to the fire to prevent oxidation.

While they are doing this, the statistician is setting fires to all the other wastebaskets in the office.

"What are you doing?" they demanded.

"Well to solve the problem, obviously you need a large sample size" the statistician replies.

59

• Power Calculation – a guess masquerading as mathematics

Stephen Senn

Statistical Issues In Drug Development

60

Sample versus population

61

Power

• Power is the probability of finding an effect when an effect actually exists.

Power = Probability {correctly reject H0} = 1 – P (Type II Error)

• To increase power we want to decrease the Type II error
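For a concrete feel of "Power = 1 – P(Type II Error)", power can be computed analytically in a simple setting. The sketch below assumes a two-sided two-sample z-test with a known common standard deviation; the function name and the numbers (difference of 5, σ of 10, 64 patients per arm) are hypothetical.

```python
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference in means; sigma: common standard deviation (assumed known).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma * (2 / n_per_arm) ** 0.5
    shift = abs(delta) / standard_error
    # Probability that the test statistic lands in either rejection tail
    # when its true mean is `shift` rather than 0.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

power = power_two_sample_z(delta=5, sigma=10, n_per_arm=64)
type_2_error = 1 - power

print(f"power = {power:.3f}, beta = {type_2_error:.3f}")
```

When `delta` is 0 the function returns α, which is the Type I error rate: with no true effect, every rejection is a false positive.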

62

• In our experiment with the coin we observed that changing the sample size from 4 to 10 changed the probability of a Type I error

• If we had rejected the Fair Coin hypothesis when we got 4/4 H, the probability of a Type I error would have been 6.25%.

• If we rejected the Fair Coin hypothesis when we got 10/10 H, the probability of a Type I error would have been 0.098%

– Assuming the coin was a Fair Coin

63

Sample size

• Power = 1 – Type II error (β)
• Type I error – α
• Meaningful effect size - δ
• Variability - σ

64

Sample Size Rules of Thumb

• If variability (σ) increases, then n (sample size) increases

• If effect size (δ) increases, then n decreases
• If either α or β decreases, then n increases
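These rules of thumb can be read off a standard normal-approximation formula for comparing two means, n per arm = 2 (z₁₋α/₂ + z₁₋β)² σ² / δ². The slides do not give this formula; it is used here as an assumption to make the relationships concrete, and the input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm_for_means(delta, sigma, alpha=0.05, power=0.80):
    """n per arm = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2, rounded up."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)              # power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2)

baseline = n_per_arm_for_means(delta=5, sigma=10)
more_variable = n_per_arm_for_means(delta=5, sigma=15)                # sigma up   -> n up
bigger_effect = n_per_arm_for_means(delta=10, sigma=10)               # delta up   -> n down
stricter_alpha = n_per_arm_for_means(delta=5, sigma=10, alpha=0.01)   # alpha down -> n up
higher_power = n_per_arm_for_means(delta=5, sigma=10, power=0.90)     # beta down  -> n up

print(baseline, more_variable, bigger_effect, stricter_alpha, higher_power)
```

Because σ and δ enter as squares, halving the effect size roughly quadruples the required sample size.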

65

Effect Size

• Effect size is the biologically significant difference e.g. size of the effect produced by a treatment.

• It is the generic term to describe the magnitude of the relationship between an independent variable and a dependent variable.

• Statistical significance demonstrates that the observed effect is unlikely to have occurred by chance, whereas effect size addresses the magnitude of the effect.

• Usually the symbol δ is used to refer to effect size.

66

Estimating the effect size

• Estimating δ is definitely one of the most challenging aspects of these calculations.

• Specifically, we are conducting the study because the knowledge regarding the treatment under study is incomplete.

• We end up making guesses about the δ for the proposed research based on knowledge that is by necessity incomplete.

67


• When clinical asks the statistician how many subjects are needed, the cautious statistician replies that ‘in order to do this, one needs the result from a study that is well designed, has ample power and tests the same hypothesis’.

• Clinical however replies that if such a study were available, there would be no need to do the study.

68


• A similar study may have been done previously

• A pilot study can be done to provide an initial estimate of the δ

• A meta-analysis of prior research can be used to provide estimates

69


Specifying the Minimum Effect of Interest

• Specify the minimum δ that would be meaningful.

– Is a 2% increase meaningful? Depends on the context.

– Is a 50% reduction meaningful? 50% of what? Relative reduction or absolute reduction?

60% vs 30% or 60% vs 10%?

70

Type I error

• In most clinical trial settings the established standard is Type I error = 5%, i.e. there is a 5% chance of rejecting the null hypothesis when it is true (i.e. no difference between treatments).

71

Hypothesis Testing – Normal Distribution

72

[Figure: labels “power” and “significance level”]

73

Two-tailed test

• Usually a 2 tailed test is performed with the risk of making a Type I error set at α/2 in each tail.

• If the null hypothesis is:

H0: Trt A = Trt B

• and the alternative hypothesis is:

HA: Trt A ≠ Trt B

• The two ways of making a Type I error are equally undesirable.

74

One-tailed test

• Sometimes an investigator is only interested in a difference between treatments in one direction.

• This is appropriate when the scientific reasoning behind the experiment leads to a prediction in one direction.

• However, FDA will not accept a one-tailed test at α = 0.05; it will use α = 0.025 even if the study is designed as a one-tailed test.
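The link between one-tailed and two-tailed tests is easiest to see in the critical values of the standard normal distribution. A minimal sketch, assuming a z-test:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

two_sided_005 = z(1 - 0.05 / 2)    # two-tailed test, alpha = 0.05: 2.5% in each tail
one_sided_005 = z(1 - 0.05)        # one-tailed test, alpha = 0.05: all 5% in one tail
one_sided_0025 = z(1 - 0.025)      # one-tailed test, alpha = 0.025

print(f"{two_sided_005:.3f} {one_sided_005:.3f} {one_sided_0025:.3f}")
```

A one-tailed test at α = 0.025 uses the same cut-off as a two-tailed test at α = 0.05, so it demands the same strength of evidence in the direction of interest.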

75

Calculation of sample size

• The calculation of sample size depends on the summary statistics chosen. The most common choices are

• Treatment mean

e.g. average blood pressure, average cholesterol, average days in hospital

• Treatment proportion

e.g. % of patients who die, recover, achieve some therapeutic goal or any defined state

76

Sample Size Calculations

• Janet Wittes – Sample Size Calculations for Randomized Controlled Trials. Epidemiologic Reviews Vol. 24, No. 1, 2002

“Most informed consent documents for randomized controlled trials implicitly or explicitly promise the prospective participant that the trial has a reasonable chance of answering a medically important question.”

77

• In order to fulfill that promise a clinical trial must be sized appropriately, have high enough power and long-enough follow up.

• Too many trials are designed with over optimistic assumptions about treatment effect, inappropriate assumptions about compliance or follow-up and inaccurate assumptions about the response in the control group.

78

Example

• A study is designed to determine whether the rate in the active treatment arm is significantly higher than the rate in the control arm.

• Compute the sample size with the following assumptions:

80% power

Rate of 35% in the active treatment arm

Rate of 15% in the control treatment arm

Level of significance is 5%.
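One way to approach this example is the normal-approximation formula for comparing two proportions with a two-sided test. The slides do not state which method or software was used, so the sketch below is an assumption, and its results are not expected to match the totals on the next slide exactly (exact tests and continuity corrections give larger numbers).

```python
from math import ceil
from statistics import NormalDist

def n_per_arm_for_proportions(p_treat, p_ctrl, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided test of two proportions."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    p_bar = (p_treat + p_ctrl) / 2
    # Variability under H0 (common rate) and under HA (two different rates).
    sd_null = (2 * p_bar * (1 - p_bar)) ** 0.5
    sd_alt = (p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl)) ** 0.5
    n = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / (p_treat - p_ctrl) ** 2
    return ceil(n)

n_80 = n_per_arm_for_proportions(0.35, 0.15, power=0.80)
n_90 = n_per_arm_for_proportions(0.35, 0.15, power=0.90)
n_small_gap = n_per_arm_for_proportions(0.25, 0.20, power=0.80)

print(f"80% power: {n_80} per arm ({2 * n_80} total)")
print(f"90% power: {n_90} per arm ({2 * n_90} total)")
print(f"0.25 vs 0.20 at 80% power: {n_small_gap} per arm")
```

Whatever the method, the pattern is the same: higher power and smaller differences between the arms both push the required sample size up sharply.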

79

How would the sample size change if we change study parameters?

Power = 80%              Power = 90%
Treat  Cont  Total       Treat  Cont  Total
.35    .15   160         .35    .15   208
.35    .20   298         .35    .20   392
.30    .15   262         .30    .15   344
.30    .20   622         .30    .20   820
.25    .15   532         .25    .15   702
.25    .20   2262        .25    .20   3002

80

80% Power vs 90% Power?

80% power for detecting a statistically meaningful difference is generally considered desirable, however 90% or higher is preferable.

Example: .35 vs. .15, 80% power … n=160. What if the rates are .34 vs. .16 instead? With n=160 patients the power is only 70%!

Notice that the difference between .30 and .20 is .1 and the difference between .25 and .15 is .1, but the sample size needed to detect the same difference (0.1) is different, because the variability of a proportion depends on the rate itself.
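The sensitivity of power to the assumed rates can be sketched with a normal approximation. This is an illustration under stated assumptions (two-sided z-test, 80 patients per arm, opposite tail ignored); the exact figures depend on the method used, so this approximation is not expected to reproduce the percentages quoted on the slide, only the direction and rough size of the drop.

```python
from statistics import NormalDist

def power_two_proportions(p_treat, p_ctrl, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p_treat + p_ctrl) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_arm) ** 0.5
    se_alt = ((p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl)) / n_per_arm) ** 0.5
    # Probability that the observed difference exceeds the rejection threshold
    # (the negligible chance of rejecting in the wrong direction is ignored).
    return std_normal.cdf((abs(p_treat - p_ctrl) - z_crit * se_null) / se_alt)

power_as_planned = power_two_proportions(0.35, 0.15, n_per_arm=80)
power_if_rates_shift = power_two_proportions(0.34, 0.16, n_per_arm=80)

print(f"planned rates: {power_as_planned:.3f}")
print(f"shifted rates: {power_if_rates_shift:.3f}")
```

A shift of only one percentage point in each arm costs several points of power, which is why optimistic assumptions about the rates are risky.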

81

Power = success?

• Designing a study with 80% power does not imply that there is an 80% chance that the study will be a success.

• Other factors influence the success of a study:
– Treatment does not work
– Placebo is better than expected
– Too much variability in the data

82

Implications of an under-powered study

• The power of the study provides us with a probability of rejecting the Null Hypothesis if the Null Hypothesis is incorrect.

• If we under-power the study then we have put patients at risk with a reduced chance of being able to reject the null hypothesis.

83

Sample size determination

• It is important to
– Identify primary endpoint
– Explicitly formulate hypothesis to be tested
– Explicitly formulate statistical analysis of endpoint
– Account for lost to follow-up, drop-outs, compliance

84

Put science before statistics

• Studies should be designed to meet scientific goals.

• Although sometimes resources, time constraints and financial reasons may be issues, try not to estimate the number of patients who can be recruited into a trial and then ask the statistician to justify the sample size by calculating the "detectable" difference implied by the number of recruitable patients.

85

Put science before statistics (cont.)

• Clinical trials should be large enough to detect a clinically important difference between two treatments.

• The appropriate inputs to power/sample-size calculations are effect sizes that are deemed clinically important, based on careful considerations of the underlying scientific (not statistical) goals of the study.

• It is easy to get caught up in statistical significance; but statistical considerations are used to identify a plan that is effective in meeting scientific goals -- not the other way around.

86

Interim Analysis

87

What is interim analysis?

Interim analysis is analysis of the data at one or more time points prior to the official close of the study with the intention of possibly terminating the study early.

88

Interim Analysis

• A Phase III study was designed with n=600. Recruitment is going slowly and the CEO asks you to do an interim analysis after 300 patients to see if the study can be stopped with a significant result.

• What are the problems with this approach?

89

1. Why was the study designed with 600 patients if 300 would be enough?

2. Is there any new evidence from outside the study that something has changed that would mean that 300 patients are enough?

3. What are the implications for the blinding, power and Type I error of the study?

90

Interim analysis

• You might need to continue a trial, even after you have accumulated substantial evidence that the new therapy is superior, because you need the extra data to accurately characterize side effects.

• Interim analyses should be pre-specified to be valid.

• The level of evidence that you need to stop a study early is higher than what is needed at the end of the study.

91

Interim analysis

Reasons for considering an interim analysis:
• You expect the new therapy to be better than placebo (for example, you might want to stop the study as soon as you have enough evidence that the new therapy is better).

• Ethical reasons (you want to minimize the number of subjects getting the placebo)

• Economic reasons (you don't want to spend extra money after enough evidence has been accumulated).

92

Interim analysis

• We need to be careful! There is no “free lunch” in statistics.

• If we carry out one or more interim analyses, the test at the end of the study can not be carried out at the 0.05 level. You have to “spend” some of your α level at each interim analysis leaving you with less at the end. This reduces the power at the final analysis unless you have designed the study appropriately.

93

Interim analysis

The two classic approaches to interim analysis are:

Pocock method and

O'Brien-Fleming method.
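Why α has to be "spent" can be seen in a simulation of trials where H0 is true. The sketch below assumes a single interim look at half the data and a z-test. The boundary 2.178 is the commonly tabulated Pocock value for two equally spaced looks at an overall two-sided α of 0.05; it is quoted from standard tables, not from the slides.

```python
import random

rng = random.Random(3)   # fixed seed so the sketch is reproducible

def overall_type_1_rate(z_boundary, n_trials=20000):
    """Share of null trials that cross +/- z_boundary at the interim or the final look.

    One interim look at half the data is assumed. Under H0 the z statistic from
    each half is standard normal, and the final z combines the two halves.
    """
    rejections = 0
    for _ in range(n_trials):
        z_first_half = rng.gauss(0, 1)
        z_second_half = rng.gauss(0, 1)
        z_final = (z_first_half + z_second_half) / 2**0.5
        if abs(z_first_half) > z_boundary or abs(z_final) > z_boundary:
            rejections += 1
    return rejections / n_trials

unadjusted = overall_type_1_rate(1.96)     # testing at 0.05 at both looks
pocock_like = overall_type_1_rate(2.178)   # stricter boundary at both looks

print(f"unadjusted boundary 1.96:  overall Type I error = {unadjusted:.3f}")
print(f"Pocock-type boundary 2.178: overall Type I error = {pocock_like:.3f}")
```

Testing twice at the unadjusted 0.05 level pushes the overall Type I error well above 5%, while the stricter boundary brings it back to about 5%. The O'Brien-Fleming approach instead uses a very strict boundary at the interim look and one close to 1.96 at the end.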

94

Interim Analysis

• Procedure
– Clear details in the protocol
– Identification of independent team for blinded study
– Reporting and statistical analysis plan
– Data management strategy
• Cleaning data, database lock, etc.

95

New Approaches to Clinical Trial Design

• Group Sequential Trial Design
• Adaptive Trials

96

What have we learnt?

• Statistics is all about a way of thinking
• If you don’t have uncertainty you don’t need statistics
• p-values are probability statements that tell you something about your experiment
• The sample size of any study depends on the treatment effect you expect to see and the variability of the measurement in the sample

97

What haven’t we learnt?

• All the detailed theory and formulae that back up everything we have discussed

• How to be a statistician (for that you do have to go to graduate school)

• How to get the perfect answer each time we run a clinical trial:
– We are working with patients not widgets and human beings are incredibly complex

98

References

• ICH Guidelines E9, E3 and others
• Statistical Issues in Drug Development – Stephen Senn 1997 John Wiley & Sons
• Janet Wittes – Sample Size Calculations for Randomized Controlled Trials. Epidemiologic Reviews Vol. 24, No. 1, 2002

99

Thank You !

