Type I and II errors


Ana Jerončić

P value is short for probability value.

P = 0.07 = 7%: if no real effect exists, there is a 7% probability that we will encounter a difference this large or larger by chance alone.

OR: if no real effect exists and we repeat the experiment 100 times, such a difference (or a more extreme one) would be found in about 7 experiments.

What is a p value?
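The "repeat the experiment many times" reading of a P value can be sketched with a permutation test: shuffle the group labels (which mimics "no real effect") and count how often a difference at least as large as the observed one turns up. This is only an illustration; the measurements and the function name `permutation_p_value` are hypothetical and not from the lecture.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation P value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel at random: the "no real effect" world
        if mean_difference(pooled) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_shuffles


# Hypothetical measurements for two treatments (illustration only).
treatment_a = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
treatment_b = [4.9, 4.4, 5.2, 5.0, 5.6, 4.7]
print(permutation_p_value(treatment_a, treatment_b))
```

The returned fraction is the share of "no effect" repetitions that produced a difference at least as extreme as the one observed, which is the interpretation given above.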

P = 0.99 = 99%: if no real effect exists, there is a 99% probability that we will encounter a difference this large or even larger by chance alone.

OR: if no real effect exists and we repeat the experiment 100 times, such a difference (or a more extreme one) would be found in about 99 experiments.

What is a p value?

What is a significance level α?

Interpretation of P-value (0.05)

P < 0.05 (5%)

Significant difference between the treatments. The null hypothesis is rejected; the alternative is accepted.

P ≥ 0.05

No significant difference between the treatments (the observed difference could have happened by chance). The null hypothesis is not rejected.

The threshold of the P value that determines when to reject the null hypothesis.

It refers to the chance that you are willing to take of being wrong, i.e. of concluding that there is a real difference when there is none.

The most common significance level: α = 0.05 = 5%

We accept the risk that 5% of our decisions are wrong when no real effect exists.

With α = 0.05, out of 40 decisions made when no real effect exists, we could expect 2 to be wrong.
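The "2 out of 40" figure is simply 40 × 0.05, and it can be checked with a small simulation. The sketch below assumes that no real effect exists in any of the 40 comparisons, so each P value is uniformly distributed between 0 and 1 (which holds for continuous test statistics). The function names are hypothetical.

```python
import random


def count_false_positives(n_decisions, alpha, rng):
    """Number of null-true comparisons that come out 'significant'."""
    return sum(rng.random() < alpha for _ in range(n_decisions))


def average_false_positives(n_decisions=40, alpha=0.05, repeats=10_000, seed=1):
    """Average count of wrong rejections over many repeated sets of decisions."""
    rng = random.Random(seed)
    total = sum(count_false_positives(n_decisions, alpha, rng) for _ in range(repeats))
    return total / repeats


print(40 * 0.05)                  # expected number of wrong decisions
print(average_false_positives())  # simulated average, close to 2
```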

α is also called the Type I error rate: the probability of erroneously rejecting the null hypothesis.

Consequence of a Type I error: putting a useless medicine on the market!

What is α (Type I error)?

Watch out for…

The sample size calculation was based on the primary outcome, BMI or BMI z-score, which was assumed to have a SD of 1.5, or 1.0 respectively. To have 80% power to detect a difference in mean BMI of 0.38, or mean BMI z-score of 0.25 units between the groups at age 2 at the two sided 5% significance level, we needed a sample size of 252 per group

Example from the literature: “Effectiveness of a home-based early intervention on children’s BMI at age two years: randomised controlled trial.” BMJ 2012;344:e3732
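The quoted numbers can be checked with the usual normal-approximation formula for comparing two means, n per group = 2 (z₁₋α/₂ + z₁₋β)² (SD / difference)². Whether the authors used exactly this formula is an assumption; the sketch only shows that the quoted inputs lead to a sample size of about 252 per group. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means
    (normal approximation, equal group sizes, common SD)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for a two-sided 5% level
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


print(n_per_group(delta=0.38, sd=1.5))  # BMI: 245
print(n_per_group(delta=0.25, sd=1.0))  # BMI z-score: 252
```

Under these assumptions the BMI z-score is the more demanding outcome, and 252 per group covers both.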


…. The higher-degree RR was deemed significantly better if the P-value for the higher-degree model was 0.01.

Example from the literature: Quantitative Trait Locus Analysis of Longitudinal Quantitative Trait Data in Complex Pedigrees. Macgregor, S, Knott, S et al. Genetics 171, 1365-1376, 2005


Hippocampal gray matter volume change was assessed statistically using a two-tailed t contrast with a significance level set to 0.05 (corrected for multiple comparisons within the ROI). Uncorrected exploratory full-brain statistics were also performed with two-tailed t contrasts at a significance level set to 0.001.

Example: The Brain-Derived Neurotrophic Factor val66met Polymorphism and Variation in Human Cortical Morphology. Lukas Pezawas, Beth A. Verchinski, et al.


The probability of erroneously failing to reject the null hypothesis.

The most common choice: β = 0.2

Consequence of a Type II error: keeping a good medicine away from patients!

What is β (Type II error)?

Power quantifies the ability of the study to find true differences.

Power = 1 - β = P(accept H1 given H1 is true), the probability of correctly identifying H1 (correctly identifying a better medicine).

If β = 0.2, power = 0.8 = 80%

What is power?

Example

Studies of drug X have shown that its use induces very serious side effects. Therefore, drug X was withdrawn from the market.

A new alternative, drug Y, was examined, and a reduction in harmful effects compared to drug X was observed.

What significance level will you use to evaluate the reduction in harmful effects of drug Y compared to drug X?

Example

The effect of alcohol on drivers’ reaction time was investigated in a simple random sample. Reaction times observed before and after alcohol intake showed an increase in average reaction time after alcohol intake.

What significance level will you use to evaluate the increase in reaction time?

1. the medical and practical consequences of the two kinds of errors

2. the desired impact of the results

The choice of α and β depends on:

α < β (the most common approach: α = 0.05 and β = 0.2), i.e. if the control treatment is already widely used and is known to be reasonably safe and effective, whereas the test treatment is new, costly, or produces serious side effects.

α > β, i.e. if there is no established control treatment and the test treatment is relatively inexpensive, easy to apply, and not known to have any serious side effects.

The choice of α and β

Choices other than α = 0.05 and β = 0.2:

α = 0.10 and β = 0.2 for preliminary trials that are likely to be replicated.

α = 0.01 and β = 0.05 for trials that are unlikely to be replicated.

The choice of α and β

A company that used to develop a clot-busting product for the indication of occluded central venous catheter, Nuvelo Pharmaceuticals, was sued by its investors for setting an extraordinarily small significance level, α = 0.00125.

http://onbiostatistics.blogspot.com/2010/01/significant-level-of-000125.html

Significance level at court

Power calculation


δ is the minimum difference between groups that is judged to be clinically important:

1. The minimal effect which has clinical relevance in the management of patients

or

2. The anticipated effect of the new treatment

What is delta (δ)?

Power depends on 4 elements:

The real difference between the two medicines, δ: big δ → big power

The variation among individuals, σ: small σ → big power

The sample size, n: large n → big power

Type I error, α: large α → big power

Power calculation (assuming we compare two medicines)
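How the four elements drive power can be sketched with the power of a two-sided z-test comparing two group means. This is a simplified model, assuming equal group sizes and a known common standard deviation; the function name `power_two_groups` and the baseline numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_groups(delta, sigma, n_per_group, alpha=0.05):
    """Power of a two-sided z-test for a difference between two means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard errors of the estimated difference.
    shift = abs(delta) / (sigma * sqrt(2 / n_per_group))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical baseline, then one element changed at a time.
print(power_two_groups(delta=0.25, sigma=1.0, n_per_group=100))               # baseline
print(power_two_groups(delta=0.50, sigma=1.0, n_per_group=100))               # bigger delta
print(power_two_groups(delta=0.25, sigma=0.5, n_per_group=100))               # smaller sigma
print(power_two_groups(delta=0.25, sigma=1.0, n_per_group=400))               # larger n
print(power_two_groups(delta=0.25, sigma=1.0, n_per_group=100, alpha=0.10))   # larger alpha
```

Each change raises the power above the baseline, in line with the four statements above.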

Sample size

The power 1 - β and N

Sample size and α, β, δ, and σ

“How large a sample do I need?”
- Very commonly asked
- Important question
- Answer not so simple

Statistical power calculations
- Use statistical software or a graphical method
- Depends on data type

Sample Size
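As a rough illustration of how the choice of α and β changes the answer to "How large a sample do I need?", the sketch below evaluates the normal-approximation sample size for the three (α, β) pairs mentioned in this lecture. It assumes a two-sided comparison of two means with equal group sizes; the standardized effect δ/σ = 0.25 is borrowed from the BMJ example, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(alpha, beta, effect_size):
    """n per group for a two-sided comparison of two means.

    effect_size is delta / sigma (the standardized difference)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


EFFECT_SIZE = 0.25  # assumed delta / sigma, for illustration only

for alpha, beta in [(0.10, 0.2), (0.05, 0.2), (0.01, 0.05)]:
    n = sample_size_per_group(alpha, beta, EFFECT_SIZE)
    print(f"alpha={alpha}, beta={beta}: n per group = {n}")
```

Stricter error rates need more participants, and because δ enters the formula squared, halving the difference to be detected roughly quadruples the required sample size.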

Braga L, Byrne R, Lorenzo A et al. Methodological quality assessment of RCTs in hypospadias literature. 23rd Annual ESPU Congress - Zurich, Switzerland - 2012

Analyses showed that publication after 2006 (p<0.01), RCT sample size >50 (p=0.03), significance level α=0.01 (p<0.01) and blinding of outcome assessor (p<0.01) were significantly associated with better quality of RCTs.

Interpret the results

Hypospadias is a birth defect of the urethra in males

Weir R. Randomised controlled trial to meta-analysis ratio: a reply from a group producing systematic reviews. The New Zealand Medical Journal 2007; 120, 1-3

Antman et al showed that recommendations for routine use of thrombolytic therapy first appeared in 1987, 14 years after a statistically significant reduction in mortality was apparent on a subsequent cumulative meta-analysis of all relevant RCTs.

The first time a significant reduction in mortality was apparent in the cumulative meta-analysis of IV streptokinase therapy (1973, p=0.01), 2432 patients had been randomised in eight small trials. The results of a further 25 studies (34,542 additional patients), published before the routine recommendation of thrombolytic therapy, reduced the P value to p=0.001 in 1979 and p=0.0001 in 1986.

Interpret the results

Based on the results presented in the abstract, write the conclusion section.

CONCLUSION: Overall advice to use steam inhalation, or ibuprofen rather than paracetamol, does not help control symptoms in patients with acute respiratory tract infections and must be balanced against the possible progression of symptoms during the next month for a minority of patients. Advice to use ibuprofen might help short term control of symptoms in those with chest infections and in children.

Little P, Moore M, et al. Ibuprofen, paracetamol, and steam for patients with respiratory tract infections in primary care: pragmatic randomised factorial trial. BMJ 2013 Oct 25;347:f6041

CONCLUSION: Our findings suggest the presence of heterogeneity in the associations between individual fruit consumption and risk of type 2 diabetes. Greater consumption of specific whole fruits, particularly blueberries, grapes, and apples, is significantly associated with a lower risk of type 2 diabetes, whereas greater consumption of fruit juice is associated with a higher risk.

Muraki I, Imamura F, et al. Fruit consumption and risk of type 2 diabetes: results from three prospective longitudinal cohort studies. BMJ. 2013 Aug 28;347:f5001

Conclusions Although limited in quantity, existing randomised trial evidence on exercise interventions suggests that exercise and many drug interventions are often potentially similar in terms of their mortality benefits in the secondary prevention of coronary heart disease, rehabilitation after stroke, treatment of heart failure, and prevention of diabetes.

Huseyin Naci, John P A Ioannidis et al. Comparative effectiveness of exercise and drug interventions on mortality outcomes: metaepidemiological study. BMJ 2013; 347

Sanjay Basu et al. Palm oil taxes and cardiovascular disease mortality in India: economic-epidemiologic model, BMJ. 2013 Oct 22;347;

Conclusions Curtailing palm oil intake through taxation may modestly reduce hyperlipidemia and cardiovascular mortality, but with potential distributional consequences differentially benefiting male and urban populations, as well as affecting food security.
