
1

Study Design and Hypothesis Testing in Clinical Research

Jonathan J. Shuster, Ph.D ([email protected])

Research Professor of Biostatistics

Univ. of Florida, College of Medicine

2

Take-home Messages

• Rely on Evidence-Based Medicine. Conventional wisdom can easily lead us astray.

• The objective of Statistics is to make informed inferences about a population, based on a sample. It is imperative to quantify the uncertainty.

• The P-value is a quantity that allows us to infer something about whether a scientific hypothesis is false.

• Non-significant results are inconclusive.

• Randomization and intent-to-treat are vital components in sound clinical research.

3

4

Topics

1. Motivating Evidence-Based Clinical Studies

2. Objective of Statistics

3. Hypothesis testing and P-values

4. Real Examples and their lessons

5

6

1. Motivating Evidence-Based Medicine

• A coin is “loaded”, with a 70% chance of landing heads. One player picks a three-outcome sequence (e.g. HTH), then the other picks a different sequence. Whoever’s sequence comes up first is the winner.

• Do you want to choose first, and if so, what sequence do you select?

7

Evidence-Based Medicine

• So you decided to go first and pick HHH, right?

• OK, I pick THH.

• HHH can only occur before THH if it is on the first three flips. (If the first time HHH occurs is flips 6, 7, 8, then flip 5 is T, so flips 5, 6, 7 are THH and I win. I make your first 2 my last 2, so I tend to stay ahead.)

• Your chance of winning = 0.7^3 = 0.343 (34.3%)
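The 34.3% figure can be checked by simulation. The sketch below is only an illustration: the function names, the seed, and the number of games are assumptions, not part of the original talk. It flips a 70%-heads coin until either HHH or THH appears and records how often HHH comes first.

```python
import random


def first_to_appear(seq_a, seq_b, p_heads, rng):
    """Flip a biased coin until seq_a or seq_b shows up; return the winner."""
    window = ""
    while True:
        flip = "H" if rng.random() < p_heads else "T"
        window = (window + flip)[-3:]  # keep only the last three flips
        if window == seq_a:
            return seq_a
        if window == seq_b:
            return seq_b


def estimate_win_rate(seq_a, seq_b, p_heads=0.7, n_games=100_000, seed=2006):
    """Fraction of simulated games in which seq_a appears before seq_b."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    wins = sum(
        first_to_appear(seq_a, seq_b, p_heads, rng) == seq_a
        for _ in range(n_games)
    )
    return wins / n_games


exact_hhh = 0.7 ** 3  # HHH wins only if the first three flips are all heads
simulated_hhh = estimate_win_rate("HHH", "THH")
print(f"exact: {exact_hhh:.3f}, simulated: {simulated_hhh:.3f}")
```

The simulated rate should sit close to 0.343, so the player who picks second with THH wins roughly two games out of three.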

8

Evidence-Based Medicine

• Lesson from this example.

• Things are not always what they seem. You need to be a healthy skeptic.

• Reference: Shuster, J. A two-player coin game paradox in the classroom. American Statistician, 2006(Feb), vol 60, pp 68-70.

9

10

2. Objective of Statistics

• To make an inference about a defined target population from a representative sample.

• That is, for us, to start from a medical hypothesis about a medical condition, help design a study that can collect data to test the question, and draw conclusions. Quantifying the uncertainty about the inference is a key part.

11

2. Comment on This

• Should we compare treatment groups statistically in a randomized study with respect to baseline parameters (e.g. age, gender, ethnicity, blood pressure)?

12

2. Provenzano: Clin J Am Soc Nephrol 4, 386-93, 2009

• “Baseline characteristics were similar except for more men in the oral iron group compared with the ferumoxytol group (62.9% versus 50.0%, P = 0.04). Mean baseline laboratory measures were similar between the two treatment groups.”

13

2. Comment on This

• For hypothesis-driven research, should we test for normality before using a t-test, and if we reject normality, try to transform the data?

14

Nissen Article

• JAMA. 2008;299(13):1561-1573. Comparison of Pioglitazone vs Glimepiride on Progression of Coronary Atherosclerosis in Patients With Type 2 Diabetes

• ‘For continuous variables with a normal distribution, the mean and 95% confidence intervals (CIs) are reported. For variables not normally distributed, median and interquartile ranges are reported and 95% CIs around median changes were computed using bootstrap resampling.’ (N=273 vs 270 in groups)

15

2. Testing Assumptions

Diagnostic Test

Passes Fails

16

17

3. Testing a Hypothesis (P-Value)

• Put a statement on Trial: “Null Hypothesis”

• ISIS #2 (Second International Study of Infarct Survival): The five-week mortality rates for Streptokinase and Placebo are equivalent in patients with recent MIs.

• Results: Strep (791/8592 = 9.2%) vs. Plac (1029/8595 = 12.0%)

18

3. P-Value

• P = 3.8 × 10^-9

• If you replicated the experiment in a population where the null hypothesis was true, there is a 3.8 in a billion chance of seeing a difference at least as extreme in either direction (2-sided).
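The slides do not say which test produced this P-value. As a rough check, the sketch below assumes a pooled two-proportion z-test with a normal approximation (the function name is hypothetical) and applies it to the ISIS #2 counts; under that assumption the two-sided P-value comes out on the order of 10^-9.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided P-value)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # event rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal
    return z, math.erfc(abs(z) / math.sqrt(2))


# Streptokinase: 791 deaths of 8592; placebo: 1029 deaths of 8595
z_stat, p_two_sided = two_proportion_z_test(791, 8592, 1029, 8595)
print(f"z = {z_stat:.2f}, two-sided P = {p_two_sided:.2e}")
```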

19

3. ISIS #2 Reference

• ISIS #2 Collaborative Group. (1988) Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of acute myocardial infarction: ISIS 2, Lancet 2: 349-360.

20

3. P-Value and Proof by Contradiction

• If you replicated your experiment in a target population where your null hypothesis is true, what is the probability that you would see differences at least as extreme as what you actually observed? If this value (the P-value) is small, it is evidence against the null hypothesis.

• The analogy is “beyond a reasonable doubt.” Science arbitrarily uses 5% as “reasonable” doubt in most cases.

21

3. Was this overkill in terms of sample size?

• Suppose the results were 79/859 vs. 103/860 (same percentages of 9.2% vs. 12.0% but with one tenth the sample size).

• Now P = 0.071 (7.1%), which would not be statistically significant. Would we be using this clot buster today? It was the biostatistician Sir Richard Peto who determined this sample size.
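The effect of sample size can be seen by running the same test on both sets of counts. The sketch below is an illustration under stated assumptions: it uses a pooled two-proportion z-test, with and without a continuity correction, and the function name is hypothetical. The slides do not state which test gave 7.1%, so the exact figure may differ, but both variants leave the one-tenth sample above the 5% threshold.

```python
import math


def z_test_p_value(events_a, n_a, events_b, n_b, continuity=False):
    """Two-sided P-value from a pooled two-proportion z-test."""
    diff = abs(events_a / n_a - events_b / n_b)
    pooled = (events_a + events_b) / (n_a + n_b)
    if continuity:
        # Yates-style correction: shrink the difference toward zero
        diff = max(0.0, diff - 0.5 * (1 / n_a + 1 / n_b))
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return math.erfc(diff / se / math.sqrt(2))


full_plain = z_test_p_value(791, 8592, 1029, 8595)    # actual trial size
small_plain = z_test_p_value(79, 859, 103, 860)       # one tenth the size
small_corrected = z_test_p_value(79, 859, 103, 860, continuity=True)

print(f"full sample:            P = {full_plain:.1e}")
print(f"one tenth, uncorrected: P = {small_plain:.3f}")
print(f"one tenth, corrected:   P = {small_corrected:.3f}")
```

The observed percentages are the same in both cases; only the amount of data, and therefore the uncertainty, has changed.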

22

3. ISIS #2:

• Any other questions about the study?

23

3. ISIS #2 Issues

• Who was watching the store? Accrual took 3.5 years, and the outcome was known for each patient within five weeks.

• Always report a sample size justification in your papers (Provenzano, slide 12, did not).

24

4. Real Example

• Coronary Drug Project

25

The Coronary Drug Project Research Group (1980)

• Influence of adherence to treatment and response of cholesterol on mortality in the Coronary Drug Project. NEJM 303: 1038-1041.

• Double blind randomized study of Clofibrate vs. Placebo in men who had prior MI.

26

Compliers vs. Not on Drug

[Bar chart: Coronary Drug Project, 5-year mortality (%) for compliers (C_Drug) vs. non-compliers (NC_Drug) on drug; vertical axis from 0 to 25]

27

Compliers vs. Not

28

Drug vs. Placebo

29

Coronary Drug Project Take-home Message

What can this study teach us about Clinical Studies?

30

Intent-to-Treat

• The gold standard for analyzing randomized clinical trials is Intent-to-treat. Patients are analyzed in the groups they were assigned to, irrespective of what they actually received.

31

32

4. Real UF Example:

• Effectiveness of Nesiritide on Dialysis or All-Cause Mortality in Patients Undergoing Cardiothoracic Surgery. Clinical Cardiology. 2006; Jan;29(1):18-24. With T. Beaver et al.

• Motivation: Shands' impression was that it was harmful and costly.

33

4. Nesiritide Example

• Study Null Hypothesis: Patients getting nesiritide within two days of surgery have the same 20-day death/dialysis rate as “similar” patients not getting it.

• Design Suggestions?

34

4. Possible Designs (+/-)

• Observational: Historical Control. Compare the period before the drug to the period after the drug started to be given to a sizable fraction of patients (leaving a gap during the ramping up of use). Must include all comers and use electronic chart review.

• Observational: Compare those getting to those not getting the drug.

• Randomized controlled prospective trial

35

4. Sources of Variation

• Within treatments, why might we not get the same result for every patient?

• Historical Control?

• Comparing concurrent nesiritide vs. not?

• Randomized prospective trial?

36

4. Sources of Bias (Confounders)

• Why might we see differences that might be totally unrelated to the treatment (nesiritide vs. not)?

• Historical Control?

• Comparing concurrent nesiritide vs. not?

• Randomized prospective trial?

37

4. Nesiritide: Propensity Scoring

• Actual Design: Compared Nesiritide vs. Not by Propensity Score Matching.

• Using 12 key covariates, we estimated the probability that a patient would get Nesiritide given these covariates. Then we matched the nesiritide patients to non-nesiritide patients for the propensity, and did a matched analysis.
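The matching step can be sketched as follows. This is a minimal illustration, not the method used in the published study: it assumes propensity scores have already been estimated (for example by logistic regression on the covariates), uses greedy 1:1 nearest-neighbor matching within a caliper, and analyzes the matched pairs with an exact McNemar test. All patient records, names, and the caliper value are hypothetical.

```python
import math


def greedy_match(treated, controls, caliper=0.05):
    """Pair each treated patient with the nearest unused control.

    treated / controls: lists of (patient_id, propensity, had_event).
    Pairs whose propensities differ by more than the caliper are dropped.
    """
    available = list(controls)
    pairs = []
    for t in sorted(treated, key=lambda row: row[1]):
        if not available:
            break
        nearest = min(available, key=lambda row: abs(row[1] - t[1]))
        if abs(nearest[1] - t[1]) <= caliper:
            pairs.append((t, nearest))
            available.remove(nearest)  # each control is used at most once
    return pairs


def exact_mcnemar(pairs):
    """Two-sided exact McNemar test based on the discordant pairs."""
    treated_only = sum(1 for t, c in pairs if t[2] and not c[2])
    control_only = sum(1 for t, c in pairs if c[2] and not t[2])
    n = treated_only + control_only
    if n == 0:
        return 1.0
    k = min(treated_only, control_only)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical records: (id, estimated propensity, death or dialysis)
treated = [("T1", 0.62, True), ("T2", 0.35, False),
           ("T3", 0.80, False), ("T4", 0.48, True)]
controls = [("C1", 0.60, False), ("C2", 0.33, False), ("C3", 0.10, True),
            ("C4", 0.50, False), ("C5", 0.79, True)]

pairs = greedy_match(treated, controls)
p_value = exact_mcnemar(pairs)
print([(t[0], c[0]) for t, c in pairs], p_value)
```

Control C3 is left unmatched because no treated patient has a comparable propensity, which is the point of the method: only patients who were about equally likely to receive the drug are compared.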

38

4. Conclusions

• Nesiritide showed no significant difference (inconclusive) within CABG patients.

• Nesiritide showed promise in aneurysm subjects with baseline elevated SCR, but was inconclusive in other such patients.

• Run a future randomized double-blind trial in aneurysm patients with elevated SCR. (Just completed and close to being in press, with an inconclusive result.)

39

4. Conclusion (continued)

• Note that the Shands study data were very important in designing the randomized follow-up study, in terms of the number of subjects needed (power analysis).

40

Take-home Messages

• Rely on Evidence-Based Medicine. Conventional wisdom can easily lead us astray.

• The objective of Statistics is to make informed inferences about a population, based on a sample. It is imperative to quantify the uncertainty.

• The P-value is a quantity that allows us to infer something about whether a scientific hypothesis is false.

• Non-significant results are inconclusive.

• Randomization and intent-to-treat are vital components in sound clinical research.

41

Design One Together

• Medical Question: Does Caffeine Withdrawal cause Headaches?

42

Eligibility

43

Design

• What are the sources of variation besides caffeine consumption?

• How do we control caffeine consumption?

• Should we use deception—hide purpose of study? Is this ethical?

44

Design

• Pre-Post?

• Double Blind Parallel Study?

• Double Blind Crossover Study?

45

Forensics for Irregularity

Phenylephrine

46

Phenylephrine Crossover Studies

47

Phenylephrine (Baseline NAR)

Study (10 mg vs. Placebo) | Std Dev | CV = 100 × SD/Mean

1 (N=16) (EB) | 2.0 | 15.3%

2 (N=10) (EB) | 0.9 | 6.7%

3 (N=16) | 7.8 | 36.3%

4 (N=15) | 9.5 | 35.6%

5 (N=16) | 6.2 | 29.3%

6 (N=16) | 9.8 | 40.4%

7 (N=14) | 9.4 | 35.3%

48

How do we test for Data Irregularities?

• Background: Baseline NAR (Nasal Airway resistance) measures are typically xx.x (e.g. 20.2), and are always based on the mean of 10 observations (5 from each nostril).

• What null hypothesis can we test to find potential irregularities? What P-value might we use to declare significance?

49

Baseline Last Digit (3rd significant digit)

Last digit | Study 1 | Study 2

0 | 2 | 5

1 | 4 | 2

2 | 2 | 1

3 | 6 | 9

4 | 2 | 4

5 | 23 | 7

6 | 8 | 5

7 | 9 | 10

8 | 3 | 3

9 | 5 | 4
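One possible null hypothesis is that the last digit is uniformly distributed over 0 to 9, which can be checked with a chi-square goodness-of-fit test on the counts above. The sketch below is an assumption about how such a check might be run, not the analysis from the talk. To stay within the standard library it uses a closed-form upper-tail formula that holds only for odd degrees of freedom (here 9), and the function names are hypothetical.

```python
import math


def chi2_upper_tail_odd_df(x, df):
    """Upper-tail probability of a chi-square variable with odd df."""
    if df % 2 != 1:
        raise ValueError("this closed form only covers odd df")
    chi = math.sqrt(x)
    normal_tail = math.erfc(chi / math.sqrt(2))      # two-sided normal tail
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    series, denom = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        denom *= 2 * r - 1                           # 1, 1*3, 1*3*5, ...
        series += chi ** (2 * r - 1) / denom
    return normal_tail + 2 * density * series


def uniform_digit_test(counts):
    """Chi-square statistic and P-value against equal expected counts."""
    expected = sum(counts) / len(counts)
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, chi2_upper_tail_odd_df(stat, len(counts) - 1)


study_1 = [2, 4, 2, 6, 2, 23, 8, 9, 3, 5]   # counts for digits 0..9
study_2 = [5, 2, 1, 9, 4, 7, 5, 10, 3, 4]

stat_1, p_1 = uniform_digit_test(study_1)
stat_2, p_2 = uniform_digit_test(study_2)
print(f"Study 1: chi-square = {stat_1:.2f}, P = {p_1:.1e}")
print(f"Study 2: chi-square = {stat_2:.2f}, P = {p_2:.3f}")
```

With expected counts near 5 the chi-square approximation is only rough, and a forensic screen like this would normally call for a stricter threshold than 5% before declaring an irregularity.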

50

• Thank You!!

51

Coronary Drug Project Data

Five Year Mortality (Clofibrate)

• Compliers: 15.0% (15.7%) (N=708)

• Non-Compliers: 24.6% (22.5%) (N=357)

• Compliers took >80% of their meds to death or to 5 years whichever was first.

• In () is 5 year mortality, adjusted for prognostic factors.

52


Five Year Mortality (Placebo)

• Compliers: 15.1% (16.4%) (N=1813)

• Non-Compliers: 28.2% (25.8%) (N=882)

• Compliers took >80% of their meds to death or to 5 years whichever was first.

• In () is 5 year mortality, adjusted for prognostic factors.

53


Five-year mortality (As randomized)

• Clofibrate: 20.0% (N=1103)

• Placebo: 20.9% (N=2789)

• NB: Compliance could not be assessed in a small number of patients.
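One way to read the three tables above is to compare the size of the compliance effect within the placebo arm against the as-randomized drug effect. The sketch below is a rough illustration only: it assumes a pooled two-proportion z-test applied to the unadjusted percentages and group sizes quoted on the slides, and the function name is hypothetical.

```python
import math


def z_test_from_rates(rate_a, n_a, rate_b, n_b):
    """Pooled two-proportion z-test from rates; returns (z, two-sided P)."""
    pooled = (rate_a * n_a + rate_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Placebo arm only: compliers (15.1%, N=1813) vs. non-compliers (28.2%, N=882)
z_compliance, p_compliance = z_test_from_rates(0.151, 1813, 0.282, 882)

# As randomized: clofibrate (20.0%, N=1103) vs. placebo (20.9%, N=2789)
z_randomized, p_randomized = z_test_from_rates(0.200, 1103, 0.209, 2789)

print(f"placebo compliers vs. non-compliers: P = {p_compliance:.1e}")
print(f"clofibrate vs. placebo (as randomized): P = {p_randomized:.2f}")
```

Under these assumptions, taking a placebo faithfully appears far more strongly associated with survival than the drug itself, which is why comparisons by compliance are confounded and the as-randomized (intent-to-treat) comparison is preferred.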