Date posted: 03-Jan-2016
Category: Documents
Uploaded by: heidi-dixon
Appropriate techniques of statistical analysis
Anil C Mathew PhD
Professor of Biostatistics &
General Secretary ISMS
PSG Institute of Medical Sciences and Research
Coimbatore 641 004
Types of studies
• Case study
• Case series
• Cross sectional studies
• Case control study
• Cohort study
• Randomized controlled trials
• Screening test evaluation
Data analysis-Case series
Measures of averages
• Mean, Median, Mode
• Length of stay (days) for 5 patients: 1, 3, 2, 4, 5
Mean length of stay: 3 days
Median length of stay: 3 days
Mode length of stay: no mode
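The three averages for the length-of-stay example can be checked with a short sketch. It uses only Python's standard library; the variable names are illustrative, and "no mode" is taken here to mean that no value occurs more than once.

```python
import statistics
from collections import Counter

# Length of stay (days) for the 5 patients on the slide
stays = [1, 3, 2, 4, 5]

mean_stay = statistics.mean(stays)      # sum of values / number of values
median_stay = statistics.median(stays)  # middle value after sorting

# Assumption: a data set has a mode only if some value repeats
counts = Counter(stays)
has_mode = max(counts.values()) > 1

print("Mean:", mean_stay)
print("Median:", median_stay)
print("Mode:", counts.most_common(1)[0][0] if has_mode else "no mode")
```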
Which is the best average?
          Mean   Median   Mode
DBP       81     79       76
Height    180    180      180
SAL       7.5    7.6      8.1
Data analysis-case series
• Frequency distribution
RBC            Frequency   Relative frequency
5.95-7.95      1           0.029
7.95-9.95      8           0.229
9.95-11.95     14          0.400
11.95-13.95    9           0.257
13.95-15.95    2           0.057
15.95-17.95    1           0.029
Total          35          1.000
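The relative frequency column is each class frequency divided by the total count. A minimal sketch that reproduces the column from the frequencies above (names are illustrative):

```python
# Class intervals and frequencies from the RBC frequency distribution
class_intervals = ["5.95-7.95", "7.95-9.95", "9.95-11.95",
                   "11.95-13.95", "13.95-15.95", "15.95-17.95"]
frequencies = [1, 8, 14, 9, 2, 1]

total = sum(frequencies)

# Relative frequency = class frequency / total number of observations
relative = [round(f / total, 3) for f in frequencies]

for interval, f, rf in zip(class_intervals, frequencies, relative):
    print(f"{interval:<12} {f:>3} {rf:.3f}")
print(f"{'Total':<12} {total:>3}")
```

Because each value is rounded to three decimals, the printed column may not sum to exactly 1.000.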
Design of Cohort Study
[Diagram: time and the direction of inquiry both run forward]
Population → People without the disease
• Exposed → disease / no disease
• Not Exposed → disease / no disease
Is obesity associated with adverse pregnancy outcomes? Women with a Body Mass Index > 30 delivering singletons. Ref: University of Udine, Italy, 2006
          Preterm birth   No preterm birth   Total   % preterm
Obese     16              35                 51      31.4
Normal    46              487                533     8.6
RR = (16/51) / (46/533) = 3.64 (3.65 if computed from the rounded percentages)
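The relative risk in a cohort study is the risk of the outcome in the exposed group divided by the risk in the unexposed group. A small sketch using the counts above; the function name `relative_risk` is illustrative, not from the slides.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Counts from the obesity and preterm birth table
risk_obese = 16 / 51     # about 31.4%
risk_normal = 46 / 533   # about 8.6%
rr = relative_risk(16, 51, 46, 533)

print(f"Risk (obese):  {risk_obese:.1%}")
print(f"Risk (normal): {risk_normal:.1%}")
print(f"Relative risk: {rr:.2f}")
```

Working from the raw counts gives a value slightly lower than dividing the two rounded percentages, which is why the figure can differ in the second decimal.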
Design of Case Control Study
[Diagram]
Disease → Exposed / Not Exposed
No Disease → Exposed / Not Exposed
Results of a Case Control Study
                    Lung Cancer (D+)   No Lung Cancer (D-)   Totals
Exposed (E+)        80 (a)             30 (b)                a + b
Non exposed (E-)    20 (c)             70 (d)                c + d
Totals              100 (a + c)        100 (b + d)
Analysis of Case-control study
Odds ratio = (a*d)/(b*c) = (80*70)/(30*20) = 9.3
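The cross-product calculation can be sketched as follows, with the cell labels a, b, c, d taken from the 2x2 table; the function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a = exposed cases,     b = exposed controls
    c = non-exposed cases, d = non-exposed controls
    """
    return (a * d) / (b * c)

# Lung cancer case-control example
or_value = odds_ratio(a=80, b=30, c=20, d=70)
print(f"Odds ratio: {or_value:.1f}")
```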
Data Analysis-Screening Test Evaluation
Can the plasma level of BCPF (breast carcinoma promoting factor) be used to diagnose breast cancer?
Positive criterion: BCPF > 150 units, compared with breast biopsy (the gold standard)
             D+     D-     Total
BCPF T+      570    150    720
BCPF T-      30     850    880
Total        600    1000   1600
TP = 570 FN = 30
FP = 150 TN = 850
Sensitivity = P(T+ | D+) = 570/600 = 95%
Specificity = P(T- | D-) = 850/1000 = 85%
False negative rate = 1 – sensitivity = 5%
False positive rate = 1 – specificity = 15%
Prevalence = P(D+) = 600/1600 ≈ 38%
Positive predictive value = P(D+ | T+) = 570/720 = 79%
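These screening measures all follow from the four cells of the table. A minimal sketch with the BCPF counts; the variable names are illustrative.

```python
# Cells of the screening table (BCPF test vs. breast biopsy)
tp, fn = 570, 30    # diseased:     test positive / test negative
fp, tn = 150, 850   # non-diseased: test positive / test negative

sensitivity = tp / (tp + fn)               # P(T+ | D+)
specificity = tn / (tn + fp)               # P(T- | D-)
false_negative_rate = 1 - sensitivity
false_positive_rate = 1 - specificity
prevalence = (tp + fn) / (tp + fn + fp + tn)   # P(D+)
ppv = tp / (tp + fp)                       # P(D+ | T+)

print(f"Sensitivity: {sensitivity:.0%}")
print(f"Specificity: {specificity:.0%}")
print(f"Prevalence:  {prevalence:.1%}")
print(f"PPV:         {ppv:.0%}")
```

Note that the positive predictive value depends on prevalence, so it applies only to populations similar to the one in this table.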
Tradeoffs between sensitivity and specificity
• When the consequences of missing a case are potentially grave, favor high sensitivity
• When a false positive diagnosis may lead to risky treatment, favor high specificity
Data analysis-case series
Measures of variation
• Range
• Standard deviation
Group 1   Group 2
29        25
30        30
31        35
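Both groups have the same mean (30) but very different spread, which is what the range and standard deviation capture. A short sketch, assuming the sample standard deviation (divisor n - 1) is intended:

```python
import statistics

group1 = [29, 30, 31]
group2 = [25, 30, 35]

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

range1, range2 = value_range(group1), value_range(group2)

# statistics.stdev uses the sample formula (divides by n - 1)
sd1, sd2 = statistics.stdev(group1), statistics.stdev(group2)

print(f"Group 1: mean={statistics.mean(group1)}, range={range1}, SD={sd1:.1f}")
print(f"Group 2: mean={statistics.mean(group2)}, range={range2}, SD={sd2:.1f}")
```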
Data analysis- Analytical studies
• Tests of significance
Case Study 1: Drug A and Drug B
• Aim: To compare the efficacy of two drugs in lowering serum cholesterol levels
• Method: Drug A – 50 patients; Drug B – 50 patients
• Result: At the end of 6 months, the average serum cholesterol level is lower in those receiving drug B than in those receiving drug A
What is the Conclusion?
A) Drug B is superior to Drug A in lowering cholesterol levels:
Possible/Not possible
B) Drug B is not superior to Drug A; the difference may be due to chance:
Possible/Not possible
C) The difference is not due to the drug; uncontrolled differences other than treatment between the patients receiving drug A and those receiving drug B account for it:
Possible/Not possible
D) Drug A may have been selectively administered to patients whose serum cholesterol levels were more refractory to drug therapy:
Possible/Not possible
Observed difference in a study can be due to
1) Random chance
2) Biased comparison
3) Uncontrolled confounding variables
Solutions: A and B
• Test of significance – p value
• P<0.05 means that, if there were truly no difference, a difference at least as large as the one observed would arise by random chance less than 5% of the time
• P<0.01 means that, if there were truly no difference, such a difference would arise by random chance less than 1% of the time
• The P value does not tell us about the magnitude of the difference
Solutions: C and D
• Random allocation, then compare the baseline characteristics of the groups
Figure 1
Table 1-Baseline Characteristics
Characteristic                            Vitamin group (n = 141)   Placebo group (n = 142)
Mean age ± SD, y                          28.9 ± 6.4                29.8 ± 5.6
Smokers, n (%)                            22 (15.6)                 14 (9.9)
Mean body mass index ± SD, kg/m2          25.3 ± 6.0                25.6 ± 5.6
Mean blood pressure ± SD, mm Hg
  Systolic                                112 ± 15                  110 ± 12
  Diastolic                               67 ± 11                   68 ± 10
Parity, n (%)
  0                                       91 (65)                   87 (61)
  1                                       39 (28)                   42 (30)
  2                                       9 (6)                     8 (6)
  >2                                      2 (1)                     5 (4)
Coexisting disease, n (%)
  Essential hypertension                  10 (7)                    7 (5)
  Lupus/antiphospholipid syndrome         4 (3)                     1 (1)
  Diabetes                                2 (1)                     3 (2)
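“t” Test
Ho: There is no difference in mean birth weight of children from HSE and LSE in the population
Critical ratio (CR):
\[ t = \frac{|\bar{X}_1 - \bar{X}_2|}{SD\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]
Pooled standard deviation:
\[ SD = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}} \]
\[ SD = \sqrt{\frac{14 \times 0.27^2 + 9 \times 0.22^2}{23}} = 0.25 \]
\[ t = \frac{|2.91 - 2.26|}{0.25\,\sqrt{\dfrac{1}{15} + \dfrac{1}{10}}} \approx 6.36 \]
DF = n1 + n2 – 2 = 23
If the calculated t exceeds the table value, reject Ho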
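The pooled-variance t statistic above can be reproduced from the summary figures alone. The sketch below assumes that group 1 (n = 15) has mean 2.91 and SD 0.27 and that group 2 (n = 10) has mean 2.26 and SD 0.22, as the worked numbers suggest; the function names are illustrative.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent samples."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, degrees of freedom) for the pooled-variance t test."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    standard_error = sp * math.sqrt(1 / n1 + 1 / n2)
    return abs(mean1 - mean2) / standard_error, n1 + n2 - 2

sp = pooled_sd(15, 0.27, 10, 0.22)
t_stat, df = two_sample_t(2.91, 0.27, 15, 2.26, 0.22, 10)

print(f"Pooled SD: {sp:.2f}")
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

Because the pooled SD is not rounded to 0.25 before it is used, the t value can differ slightly in the second decimal from the hand calculation; the conclusion is unchanged.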
GENERAL STEPS IN HYPOTHESIS TESTING
1) State the hypothesis to be tested
2) Select a sample and collect data
3) Calculate the test statistic
4) Evaluate the evidence against the null hypothesis
5) State the conclusion
Commonly used statistical tests
• T test-Compare two mean values
• Analysis of variance-Compare more than two mean values
• Chi square test-Compare two proportions
• Correlation coefficient-Relationship of two continuous variables
Data entry format
Treatment   Age   Weight   Diabetes   Painscore-b   Painscore-a   Vomiting
1           21    50       1          9             6             0
1           24    53       0          10            9             0
1           25    55       1          9             9             1
1           28    50       0          10            6             1
1           29    60       0          10            5             0
1           20    65       0          10            8             0
0           26    60       0          9             9             0
0           25    90       1          9             9             1
0           24    80       1          9             9             1
0           28    89       0          10            8             1
0           22    86       1          10            9             1
0           22    45       0          10            9             0
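Entering one row per patient and one numerically coded column per variable makes the data easy to read into software. As a hedged illustration, the sketch below stores the table above as comma-separated text and computes the mean weight in each treatment group. The column names are adapted from the slide, and what the codes 1 and 0 stand for is not stated in the source.

```python
import csv
import io
from collections import defaultdict

# The data entry table, one row per patient (codes as on the slide)
RAW = """treatment,age,weight,diabetes,painscore_b,painscore_a,vomiting
1,21,50,1,9,6,0
1,24,53,0,10,9,0
1,25,55,1,9,9,1
1,28,50,0,10,6,1
1,29,60,0,10,5,0
1,20,65,0,10,8,0
0,26,60,0,9,9,0
0,25,90,1,9,9,1
0,24,80,1,9,9,1
0,28,89,0,10,8,1
0,22,86,1,10,9,1
0,22,45,0,10,9,0
"""

# Convert every field to an integer so it can be analysed directly
rows = [{key: int(value) for key, value in record.items()}
        for record in csv.DictReader(io.StringIO(RAW))]

# Group weights by treatment code and average them
weights = defaultdict(list)
for row in rows:
    weights[row["treatment"]].append(row["weight"])
mean_weight = {group: sum(w) / len(w) for group, w in weights.items()}

for group, value in sorted(mean_weight.items()):
    print(f"Treatment {group}: mean weight = {value}")
```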
Example t test
Body temperature (°C)
        Simple febrile seizure (N = 25)   Febrile without seizure (N = 25)   P value
Mean    39.01                             38.64                              P<0.001
SD      0.56                              0.45
Example-Analysis of variance
• Serum zinc level in simple febrile seizure patients by duration of seizure
Duration (min)   n    Mean    SD     P value
< 5              3    10.27   0.25   P<0.001
5 to 10          18   9.02    0.81
> 10             4    6.90    0.98
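A one-way analysis of variance compares the variation between group means with the variation within groups. The F statistic can be computed from the summary table alone; this is a sketch from the reported n, mean and SD of each group, not the original patient-level analysis, and the function name is illustrative.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) tuples.

    Returns (F, (df_between, df_within)).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares: recovered from each group's SD
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)

# (n, mean serum zinc, SD) for seizure duration < 5, 5 to 10, > 10 minutes
zinc_groups = [(3, 10.27, 0.25), (18, 9.02, 0.81), (4, 6.90, 0.98)]
f_stat, dfs = anova_from_summary(zinc_groups)

print(f"F = {f_stat:.2f} with df = {dfs}")
```

The F value is then compared with the table value for the same degrees of freedom, following the general steps of hypothesis testing.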
Example Chi-square test
• Characteristics of patients in the two groups
Duration of fever (hours)   Simple febrile seizure   Febrile without seizure   P value
< 24                        16                       6                         P<0.05
More than 24                9                        19
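The chi-square statistic compares the observed counts with the counts expected if fever duration were unrelated to seizure status. A minimal sketch without continuity correction, which is an assumption since the slide does not say whether one was applied:

```python
# Observed counts: rows = fever < 24 h, fever > 24 h
# columns = simple febrile seizure, febrile without seizure
observed = [[16, 6],
            [9, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence = row total * column total / n
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_5_PERCENT_DF1 = 3.841  # chi-square table value for df = 1, alpha = 0.05

print(f"Chi-square = {chi2:.2f}, df = {df}")
print("Reject Ho at the 5% level:", chi2 > CRITICAL_5_PERCENT_DF1)
```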
Example Correlation
• We found a negative correlation between serum zinc level and simple febrile seizure event (r = -0.86, p < 0.001)
Type 1 and Type 2 Errors
             Ho True                                Ho False / H1 True
Accept Ho    Correct decision                       Type 2 error, β = P (Type 2 error)
Reject Ho    Type 1 error, α = P (Type 1 error)     Correct decision, Power = 1 - β
Multivariate problem
• Main outcome
• Continuous variable-Linear regression
• Dichotomous variable-Logistic regression
Bradford Hill's Questions
• Introduction- Why did you start?
• Methods- What did you do?
• Results- What did you find?
• Discussion- What does it mean?
How to begin writing?
• Data → Tables → Methods, Results → Introduction, Discussion → Abstract → Title, Key words, References
Thank you