Appropriate techniques of statistical analysis

Anil C Mathew PhD

Professor of Biostatistics & General Secretary, ISMS

PSG Institute of Medical Sciences and Research

Coimbatore 641 004

Types of studies

• Case study
• Case series
• Cross sectional studies
• Case control study
• Cohort study
• Randomized controlled trials
• Screening test evaluation

Data analysis-Case series

Measures of averages
• Mean, Median, Mode
• Length of stay (days) for 5 patients: 1, 3, 2, 4, 5

Mean length of stay: 3 days
Median length of stay: 3 days
Mode of length of stay: no mode
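The three averages for the length-of-stay example can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
import statistics

# Length of stay in days for the 5 patients in the example.
length_of_stay = [1, 3, 2, 4, 5]

mean_stay = statistics.mean(length_of_stay)
median_stay = statistics.median(length_of_stay)

# multimode returns every value tied for the highest count. When all
# values are tied (each occurs once), the data have no single mode.
modes = statistics.multimode(length_of_stay)
has_unique_mode = len(modes) == 1

print("mean:", mean_stay)
print("median:", median_stay)
print("unique mode:", has_unique_mode)
```

Because every value occurs exactly once, `multimode` returns all five values, which is how "no mode" shows up in code.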

Which is the best average?

         Mean   Median   Mode
DBP      81     79       76
Height   180    180      180
SAL      7.5    7.6      8.1

Data analysis-case series

• Frequency distribution

RBC           Frequency   Relative frequency
5.95-7.95     1           0.029
7.95-9.95     8           0.229
9.95-11.95    14          0.400
11.95-13.95   9           0.257
13.95-15.95   2           0.057
15.95-17.95   1           0.029
Total         35          1.000
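Each relative frequency in the table is the class frequency divided by the total of 35. A minimal sketch of that calculation, with the class intervals and counts copied from the table (the dictionary name is an illustrative choice):

```python
# Class intervals and counts copied from the RBC frequency table.
rbc_counts = {
    "5.95-7.95": 1,
    "7.95-9.95": 8,
    "9.95-11.95": 14,
    "11.95-13.95": 9,
    "13.95-15.95": 2,
    "15.95-17.95": 1,
}

total = sum(rbc_counts.values())

# Relative frequency = class frequency / total number of observations.
relative = {interval: count / total for interval, count in rbc_counts.items()}

for interval, count in rbc_counts.items():
    print(f"{interval:<12} {count:>3} {relative[interval]:.3f}")
print(f"{'Total':<12} {total:>3} {sum(relative.values()):.3f}")
```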

Design of Cohort Study

Time and direction of inquiry: forward, from exposure to outcome

Population → People without the disease
    Exposed → disease / no disease
    Not exposed → disease / no disease

Is obesity associated with adverse pregnancy outcomes? Women with a body mass index > 30 delivering singletons. Ref: University of Udine, Italy, 2006

         Preterm birth   No preterm birth   Total   % preterm
Obese    16              35                 51      31.4
Normal   46              487                533     8.6

RR = 31.4 / 8.6 = 3.65
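The relative risk is the risk of preterm birth among obese women divided by the risk among women of normal weight. The sketch below recomputes it from the raw counts; the function name is illustrative. Working from unrounded risks gives about 3.64, while dividing the rounded percentages (31.4 / 8.6) gives the 3.65 shown above.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Counts from the table: 16 of 51 obese women and 46 of 533 women of
# normal weight had a preterm birth.
rr = relative_risk(16, 51, 46, 533)

print(f"Risk (obese)  = {16 / 51:.1%}")
print(f"Risk (normal) = {46 / 533:.1%}")
print(f"RR = {rr:.2f}")
```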

Design of Case Control Study

Disease (cases) → Exposed / Not exposed
No disease (controls) → Exposed / Not exposed

Results of a Case Control Study

                   Lung cancer (D+)   No lung cancer (D-)   Totals
Exposed (E+)       80 (a)             30 (b)                a + b
Non-exposed (E-)   20 (c)             70 (d)                c + d
Totals             100 (a + c)        100 (b + d)

Analysis of Case-control study

Odds ratio = (a × d) / (b × c) = (80 × 70) / (30 × 20) = 9.3
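The cross-product calculation can be written as a small function. This is a sketch using the a, b, c, d cell labels from the case control table; the function name is an illustrative choice.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2x2 table where
    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Lung cancer example: a = 80, b = 30, c = 20, d = 70.
lung_cancer_or = odds_ratio(80, 30, 20, 70)
print(f"Odds ratio = {lung_cancer_or:.1f}")
```

Grouping the numerator and denominator in parentheses matters: evaluated strictly left to right, `a*d/b*c` would give a different number.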

Data Analysis-Screening Test Evaluation

Can plasma levels of BCPF (breast carcinoma promoting factor) be used to diagnose breast cancer?

Positive criterion: BCPF > 150 units, compared with breast biopsy (the gold standard)

BCPF test   D+    D-     Total
T+          570   150    720
T-          30    850    880
Total       600   1000   1600

TP = 570   FN = 30
FP = 150   TN = 850

Sensitivity = P(T+ | D+) = 570/600 = 95%
Specificity = P(T- | D-) = 850/1000 = 85%
False negative rate = 1 – sensitivity = 5%
False positive rate = 1 – specificity = 15%
Prevalence = P(D+) = 600/1600 = 38%
Positive predictive value = P(D+ | T+) = 570/720 = 79%
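All of the screening measures follow from the four cell counts TP, FP, FN and TN. The sketch below derives them for the BCPF example; the function and key names are illustrative.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening-test measures from a 2x2 table of
    test result (T+/T-) against true disease status (D+/D-)."""
    diseased = tp + fn
    healthy = fp + tn
    return {
        "sensitivity": tp / diseased,          # P(T+ | D+)
        "specificity": tn / healthy,           # P(T- | D-)
        "false_negative_rate": fn / diseased,  # 1 - sensitivity
        "false_positive_rate": fp / healthy,   # 1 - specificity
        "prevalence": diseased / (diseased + healthy),
        "ppv": tp / (tp + fp),                 # P(D+ | T+)
    }

# BCPF example: TP = 570, FP = 150, FN = 30, TN = 850.
metrics = screening_metrics(tp=570, fp=150, fn=30, tn=850)
for name, value in metrics.items():
    print(f"{name:<20} {value:.1%}")
```

Note that the positive predictive value depends on prevalence, so the 79% here applies to a sample in which 600 of 1600 people have the disease.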

Tradeoffs between sensitivity and specificity

• Favour high sensitivity when the consequences of missing a case are potentially grave
• Favour high specificity when a false positive diagnosis may lead to risky treatment

Data analysis-case series

Measures of variation
• Range
• Standard deviation

Group 1   Group 2
29        25
30        30
31        35
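The two groups have the same mean but very different spread, which is what the range and standard deviation capture. A minimal sketch with the standard `statistics` module (the helper name `describe` is illustrative):

```python
import statistics

group_1 = [29, 30, 31]
group_2 = [25, 30, 35]

def describe(values):
    """Mean, range and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample SD, divisor n - 1
    }

summary_1 = describe(group_1)
summary_2 = describe(group_2)
print("Group 1:", summary_1)
print("Group 2:", summary_2)
```

Both means are 30, but Group 2 has five times the range and standard deviation of Group 1, so the mean alone would hide the difference.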

Data analysis- Analytical studies

• Tests of significance

Case Study 1: Drug A and Drug B

• Aim: Compare the efficacy of two drugs in lowering serum cholesterol levels
• Method: Drug A – 50 patients; Drug B – 50 patients
• Result: At the end of 6 months, the average serum cholesterol level is lower in those receiving drug B than in those receiving drug A

What is the conclusion? (Possible / Not possible)

A) Drug B is superior to Drug A in lowering cholesterol levels.

B) Drug B is not superior to Drug A; the difference may be due to chance.

C) The difference is not due to the drug; uncontrolled differences other than treatment between the samples of men receiving drug A and drug B account for it.

D) Drug A may have been selectively administered to patients whose serum cholesterol levels were more refractory to drug therapy.


Observed difference in a study can be due to

1) Random chance

2) Biased comparison

3) Uncontrolled confounding variables

Solutions: A and B
• Test of significance – p value
• P < 0.05 means that, if there were truly no difference, a difference at least as large as the one observed would arise by random chance less than 5% of the time
• P < 0.01 means the same, with a probability of less than 1%
• The p value does not tell us the magnitude of the difference

Solutions: C and D

• Random allocation, then compare the baseline characteristics

Figure 1

Table 1 - Baseline Characteristics

Characteristic                       Vitamin group (n = 141)   Placebo group (n = 142)
Mean age ± SD, y                     28.9 ± 6.4                29.8 ± 5.6
Smokers, n (%)                       22 (15.6)                 14 (9.9)
Mean body mass index ± SD, kg/m²     25.3 ± 6.0                25.6 ± 5.6
Mean blood pressure ± SD, mm Hg
  Systolic                           112 ± 15                  110 ± 12
  Diastolic                          67 ± 11                   68 ± 10
Parity, n (%)
  0                                  91 (65)                   87 (61)
  1                                  39 (28)                   42 (30)
  2                                  9 (6)                     8 (6)
  >2                                 2 (1)                     5 (4)
Coexisting disease, n (%)
  Essential hypertension             10 (7)                    7 (5)
  Lupus/antiphospholipid syndrome    4 (3)                     1 (1)
  Diabetes                           2 (1)                     3 (2)

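“t” Test

Ho: There is no difference in mean birth weight of children from HSE and LSE in the population

$$ CR = t = \frac{|\bar{X}_1 - \bar{X}_2|}{SD \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

$$ SD = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}} $$

$$ SD = \sqrt{\frac{14 \times 0.27^2 + 9 \times 0.22^2}{23}} = 0.25 $$

$$ t = \frac{|2.91 - 2.26|}{0.25 \sqrt{\frac{1}{15} + \frac{1}{10}}} = 6.36 $$

DF = n1 + n2 – 2 = 23

If the calculated t exceeds the table value, reject Ho.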
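The pooled-SD t calculation can be reproduced in a few lines. This is a sketch, not a replacement for a statistics package: the function name is illustrative, and group 1 is taken to be the sample with n = 15, mean 2.91 and SD 0.27. The table value of 2.069 is the two-sided 5% critical value of t for 23 degrees of freedom. Carrying the unrounded pooled SD through gives t of about 6.33, slightly below the 6.36 obtained with the SD rounded to 0.25; the conclusion is the same.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic.
    Returns (t, pooled_sd, degrees_of_freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    t = abs(mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, df

# Birth weight example: means 2.91 and 2.26, SDs 0.27 and 0.22, n = 15 and 10.
t, pooled_sd, df = pooled_t(2.91, 0.27, 15, 2.26, 0.22, 10)

T_TABLE_5_PERCENT_DF23 = 2.069  # two-sided 5% critical value for 23 df
reject_null = t > T_TABLE_5_PERCENT_DF23

print(f"pooled SD = {pooled_sd:.2f}, t = {t:.2f}, df = {df}")
print("Reject Ho" if reject_null else "Do not reject Ho")
```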

GENERAL STEPS IN HYPOTHESIS TESTING

1) State the hypothesis to be tested

2) Select a sample and collect data

3) Calculate the test statistic

4) Evaluate the evidence against the null hypothesis

5) State the conclusion

Commonly used statistical tests

• t test - compare two mean values
• Analysis of variance - compare more than two mean values
• Chi-square test - compare two proportions
• Correlation coefficient - relationship between two continuous variables

Data entry format

Treatment   Age   Weight   Diabetes   Painscore-b   Painscore-a   Vomiting
1           21    50       1          9             6             0
1           24    53       0          10            9             0
1           25    55       1          9             9             1
1           28    50       0          10            6             1
1           29    60       0          10            5             0
1           20    65       0          10            8             0
0           26    60       0          9             9             0
0           25    90       1          9             9             1
0           24    80       1          9             9             1
0           28    89       0          10            8             1
0           22    86       1          10            9             1
0           22    45       0          10            9             0
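The point of this layout is one row per patient and one column per variable, with categories entered as numeric codes, so that groups can be summarised directly. The sketch below loads the same rows and compares the two treatment groups. The column names are adapted for Python, and reading Painscore-a as the score after treatment is an assumption; the slides do not say what the 1/0 codes stand for.

```python
import csv
import io
import statistics

# Rows copied from the data entry table. Column names are adapted for
# Python; painscore_a is ASSUMED to be the score after treatment.
raw = """treatment,age,weight,diabetes,painscore_b,painscore_a,vomiting
1,21,50,1,9,6,0
1,24,53,0,10,9,0
1,25,55,1,9,9,1
1,28,50,0,10,6,1
1,29,60,0,10,5,0
1,20,65,0,10,8,0
0,26,60,0,9,9,0
0,25,90,1,9,9,1
0,24,80,1,9,9,1
0,28,89,0,10,8,1
0,22,86,1,10,9,1
0,22,45,0,10,9,0
"""

rows = [{key: int(value) for key, value in row.items()}
        for row in csv.DictReader(io.StringIO(raw))]

def mean_by_treatment(rows, column):
    """Mean of one column within each treatment code."""
    groups = {}
    for row in rows:
        groups.setdefault(row["treatment"], []).append(row[column])
    return {code: statistics.mean(values) for code, values in groups.items()}

pain_after = mean_by_treatment(rows, "painscore_a")
# For a 0/1 variable the mean is the proportion coded 1.
vomiting_rate = mean_by_treatment(rows, "vomiting")

print("Mean Painscore-a by treatment:", pain_after)
print("Proportion with vomiting by treatment:", vomiting_rate)
```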

Example t test

Body temperature (°C)   Simple febrile seizure (N = 25)   Febrile without seizure (N = 25)   P value
Mean                    39.01                             38.64                              P<0.001
SD                      0.56                              0.45

Example-Analysis of variance

• Serum zinc level in patients with simple febrile seizure, by duration of seizure

Duration (min)   n    Mean    SD     P value
< 5              3    10.27   0.25   P<0.001
5 to 10          18   9.02    0.81
> 10             4    6.90    0.98
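A one-way analysis of variance can be computed from the group sizes, means and SDs alone. The sketch below does this for the three duration groups; the function name is illustrative, and it returns only the F statistic and its degrees of freedom, leaving the p value to a table or a statistics package. For these figures F comes out at roughly 16.9 on 2 and 22 degrees of freedom, which is consistent with a very small p value.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from summary statistics.
    groups: list of (n, mean, sd) tuples, one per group.
    Returns (F, df_between, df_within)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group variation: recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Serum zinc by seizure duration: (n, mean, SD) for < 5, 5 to 10, > 10 min.
zinc_groups = [(3, 10.27, 0.25), (18, 9.02, 0.81), (4, 6.90, 0.98)]
f_stat, df_between, df_within = anova_from_summary(zinc_groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```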

Example Chi-square test

• Characteristics of patients in the two groups

Duration of fever (hours)   Simple febrile seizure   Febrile without seizure   P value
< 24                        16                       6                         P<0.05
More than 24                9                        19
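For a 2x2 table such as this one, the chi-square statistic has a shortcut formula, and with 1 degree of freedom the p value can be obtained from the complementary error function in the standard `math` module. The sketch below applies no continuity correction, which is an assumption about how the slide's P value was obtained; the function name is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
        a  b
        c  d
    Returns (chi2, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi2 is the square of a standard normal variable, so the
    # upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Fever duration example: rows are < 24 h and more than 24 h,
# columns are simple febrile seizure and febrile without seizure.
chi2, p_value = chi_square_2x2(16, 6, 9, 19)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The statistic is about 8.1, above the 3.84 needed for significance at the 5% level with 1 degree of freedom, in line with the reported P<0.05.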

Example Correlation

• We found a negative correlation between serum zinc level and simple febrile seizure event (r = -0.86, p < 0.001)

Type 1 and Type 2 Errors

            Ho True                             Ho False / H1 True
Accept Ho   Correct decision                    Type 2 error, β = P(Type 2 error)
Reject Ho   Type 1 error, α = P(Type 1 error)   Correct decision

Power = 1 - β

Multivariate problem

• Main outcome
  • Continuous variable - Linear regression
  • Dichotomous variable - Logistic regression

Bradford Hill's Questions

• Introduction - Why did you start?
• Methods - What did you do?
• Results - What did you find?
• Discussion - What does it mean?

How to begin writing?

• Data → Tables → Methods, Results → Introduction, Discussion → Abstract → Title, Key words, References

Thank you

Date post:	03-Jan-2016
