Date posted: 13-Nov-2015
Category: Documents
Uploaded by: aiman-arifin
Describing & Presenting Data
Nooriah Mohamed Salleh, MBBS (Malaya), MPH (Tulane), DrPH (Tulane)
Recap: Variables & Data
Variables are labels. The value of a variable can vary.
Example: age, gender, occupation, ethnicity.

Data are the values you get from observation through measuring, counting, etc.
Example: number of patients, weight of baby (kg), number of doctors.
Recap: Variables & Data

Variable               Data
Age (of mother)        23 years old
Weight (of baby)       3.0 kg
Gender                 male
Ethnic group           Malay
Occupation of mother   Housewife
Types of Data
Data
  Categorical data
    Nominal data
    Ordinal data
  Numerical data
    Discrete data
    Continuous data
Nominal categorical data
Each value can be allocated to one of a number of categories.
The categories have no meaningful order.
Example: blood type (A, B, AB, O), sex (M, F).
Ordinal categorical data
Each value can be allocated to one of a number of categories arranged in a meaningful order.
Examples:
Very satisfied, satisfied, neutral, unsatisfied, very unsatisfied.
Grade I, Grade II, Grade III (tumor grading).
Moderate, severe, very severe (pain).
Discrete numerical data
Countable variables, in integer form: numbers of things.
Example: number of pregnancies, number of patients, number of teeth.
Continuous numerical data
Measurable variables. They can take any value within a range, although recorded values are often rounded (for example, to the nearest integer).
Example: weight (kg), height (metre), BP (mmHg), age (years), duration of surgery (hour).
Describing data with tables
1) frequency table
2) relative and cumulative frequency
3) grouped frequency
4) open-ended groups
5) cross-tabulation
1. Frequency table
A picture of the frequency distribution.

Mortality (%)   Tally                        No. of ICU patients
11.2-15.1       1, 1, 1, 1, 1, 1, 1, 1, 1    9
15.2-20.1       1, 1, 1, 1, 1, 1, 1, 1       8
20.2-25.1       1, 1, 1, 1, 1                5
25.2-30.1       1, 1, 1                      3
30.2-35.1       1, 1                         2

The first column holds the values of the variable; the last column holds the frequency.
2. Relative frequency, cumulative frequency
Relative frequency: the percentage of the total.
Cumulative frequency: the running total of the frequencies (or percentages) up to and including each value.

Parity   No. of women   Percentage (relative frequency)   Cumulative percentage
0        5              12.5                              12.5
1        6              15                                27.5
2        14             35                                62.5
3        10             25                                87.5
4        3              7.5                               95
7        1              2.5                               97.5
8        1              2.5                               100
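The percentage and cumulative percentage columns follow directly from the counts. The following is a minimal Python sketch of that arithmetic; the function name `frequency_table` is illustrative and not part of the slides.

```python
def frequency_table(counts):
    """Return rows of (value, count, percentage, cumulative percentage).

    `counts` maps each value of the variable to its frequency.
    """
    total = sum(counts.values())
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        percentage = 100 * counts[value] / total
        cumulative = 100 * running / total
        rows.append((value, counts[value], percentage, cumulative))
    return rows


# Parity counts for the 40 women in the table above.
parity_counts = {0: 5, 1: 6, 2: 14, 3: 10, 4: 3, 7: 1, 8: 1}

for value, n, pct, cum in frequency_table(parity_counts):
    print(f"{value:>6} {n:>5} {pct:>6.1f} {cum:>6.1f}")
```

One woman out of 40 has parity 8, so the last row comes out as 2.5% relative and 100% cumulative.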
3. Grouped frequency
Grouped frequency: for continuous metric data.

Birthweight (g)   No. of infants
2700-2999         2
3000-3299         3
3300-3599         9
3600-3899         9
3900-4199         4
4200-4499         3

Each group has a width of 300 g. The first number in each group is the class lower limit and the second is the class upper limit.
Table for display of data

Type of data                  Table
Ordinal, discrete numerical   Frequency table
Continuous numerical          Grouped frequency table
4. Open-ended group
One or two values, called outliers, may lie a long way from the general mass of the data.
Use "≤" or "≥" for the open-ended first or last group.
5) Cross-tabulation
Association between breast lump and parity

                        Breast lump diagnosis
2 or fewer children     Malignant   Benign   Totals
Yes                     4           21       25
No                      4           11       15
Totals                  8           32       40
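A cross-tabulation is easier to read once each row is converted to percentages of its own total. The sketch below does this for the breast lump counts; the helper name `row_percentages` and the choice of row (rather than column) percentages are assumptions made for illustration.

```python
# Counts from the cross-tabulation above:
# rows are "2 or fewer children" (Yes/No), columns are the diagnosis.
crosstab = {
    "Yes": {"Malignant": 4, "Benign": 21},
    "No": {"Malignant": 4, "Benign": 11},
}


def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {
            col: 100 * count / row_total for col, count in row.items()
        }
    return result


for row_label, row in row_percentages(crosstab).items():
    cells = ", ".join(f"{col} {pct:.1f}%" for col, pct in row.items())
    print(f"{row_label}: {cells}")
```

With these counts, 16% of the lumps in women with 2 or fewer children were malignant, against about 27% in the other group.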
3. Describing data with charts
1. Nominal data: (1) the pie chart, (2) the simple bar chart, (3) the cluster bar chart, (4) the stacked bar chart
2. Ordinal data: (1) the pie chart, (2) the bar chart, (3) the dotplot
3. Discrete numerical data
4. Continuous numerical data: histogram
5. Cumulative ordinal or discrete data: step chart
6. Cumulative continuous data: cumulative frequency curve or ogive
7. Time-based data: time series chart
1.1. Pie chart
[Figure: Pie chart of the hair color of children receiving d-phenothrin. Slices: blonde 18 (18%), brown 55 (57%), red 4 (4%), dark 21 (21%).]
4-5 categories
Describe 1 variable
Start at 0, in the same order as the table
1.2 Simple bar chart
[Figure: Bar chart of the hair color of the children receiving d-phenothrin. Bars: blonde 18, brown 55, red 4, dark 21; vertical axis from 0 to 60.]
Same widths, equal spaces between bars
[Figure: Bar chart for number of health professionals. Horizontal axis: Profession (Dentists, Doctors, Nurses, Pharmacists); vertical axis: Number of workers, 0 to 6000.]
1.3 Clustered bar chart
[Figure: Clustered percentage bar chart of the hair color of children receiving malathion and d-phenothrin. Legend: blonde, brown, red, dark; vertical axis from 0 to 60. Data labels as extracted: blonde 16 and 18, brown 52 and 56, red 4 and 4, dark 28 and 22.]
[Figure: Clustered bar chart for number of health professionals. Horizontal axis: Profession (Dentists, Doctors, Nurses, Pharmacists); legend: Private, Public; vertical axis: Number of workers, 0 to 4000.]
[Figure: Clustered bar charts of number of health professionals. Horizontal axis: Sector (Private, Public); legend: Dentists, Doctors, Nurses, Pharmacists; vertical axis: Number of workers, 0 to 4000.]
Plotting by sector rather than by profession:
Look at the data from a different angle
Highlight different aspects of the data
1.4 Stacked bar chart
[Figure: Stacked bar chart. Labels: Breast-fed, Bottle-fed; Non-smokers, Former smokers, Smokers; vertical axis from 0% to 100%.]
[Figure: Stacked bar chart for number of health professionals. Horizontal axis: Profession (Dentists, Doctors, Nurses, Pharmacists); legend: Private, Public; vertical axis: Number of workers, 0 to 6000.]
Variation of the basic bar chart
[Figure: Stacked bar charts by sector. Horizontal axis: Sector (Private, Public); legend: Dentists, Doctors, Nurses, Pharmacists; vertical axis: Number of workers, 0 to 6000.]
[Figure: Segmented bar charts by profession. Horizontal axis: Profession; legend: Private, Public; vertical axis: Percent by sector, 0 to 100.]
[Figures, by profession (legend: Private, Public): clustered bar chart for number of health professionals (Number of workers, 0 to 4000); stacked bar chart for number of health professionals (Number of workers, 0 to 6000); segmented bar charts by profession (Percent by sector, 0 to 100).]
[Figures, by sector (legend: Dentists, Doctors, Nurses, Pharmacists): clustered bar chart of number of health professionals (Number of workers, 0 to 4000); stacked bar charts by sector (Number of workers, 0 to 6000); percentage (segmented) bar charts by sector (Percent within sector, 0 to 100).]
Time Trend
Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person.
Stacked bar chart of yearly mortality rate per 1000 births
Pagano & Gauvreau (1999) Principles of Biostatistics, Duxbury.
Table 1: Response under two treatments

             Response to treatment
Treatment    None   Partial   Complete   Total
A            3      15        9          27
B            2      22        30         54
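Because treatment A has 27 patients and treatment B has 54, the raw counts are hard to compare; a segmented (100%) bar rescales each treatment to the same length. The sketch below draws such bars as text from the Table 1 counts. The function name `segmented_bar` and the 50-character width are illustrative choices, not part of the slides.

```python
# Counts from Table 1 (response under two treatments).
responses = {
    "A": {"None": 3, "Partial": 15, "Complete": 9},
    "B": {"None": 2, "Partial": 22, "Complete": 30},
}


def segmented_bar(counts, width=50):
    """Render one 100% bar as text, one letter per segment.

    Segment boundaries are rounded cumulative shares, so the bar is
    always exactly `width` characters long.
    """
    total = sum(counts.values())
    bar = ""
    running = 0
    for label, count in counts.items():
        start = round(width * running / total)
        running += count
        end = round(width * running / total)
        bar += label[0] * (end - start)
    return bar


for treatment, counts in responses.items():
    total = sum(counts.values())
    shares = ", ".join(f"{k} {100 * n / total:.1f}%" for k, n in counts.items())
    print(f"{treatment} |{segmented_bar(counts)}| {shares}")
```

Complete responses make up about 33% of treatment A (9 of 27) and about 56% of treatment B (30 of 54), a difference the equal-length bars make visible at a glance.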
[Figure: Segmented bar chart of response to treatment, shown in two versions. Horizontal axis: Treatment (A, B); legend: None, Partial, Complete; vertical axis: Within-treatment percentage, 0 to 100.]
Can compare the response type percentages for the two treatments
Stacked bar charts for percentage figures
Histogram
Divide the range of the data into a suitably chosen number of intervals, all of the same width.
The number of observations that fall within each interval is plotted.
Relative frequency histogram
Plot the proportions of observations that fall within the class intervals.
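The two steps described above (equal-width intervals, then proportions per interval) can be sketched in a few lines of Python. The data are hypothetical, and the function name and the rule that the maximum value falls in the last interval are assumptions of this sketch; it also assumes the values are not all identical.

```python
def relative_frequency_histogram(values, n_intervals):
    """Split the data range into equal-width intervals and return
    (lower, upper, proportion) for each interval."""
    low, high = min(values), max(values)
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    for v in values:
        # The maximum value belongs to the last interval.
        index = min(int((v - low) / width), n_intervals - 1)
        counts[index] += 1
    n = len(values)
    return [
        (low + i * width, low + (i + 1) * width, counts[i] / n)
        for i in range(n_intervals)
    ]


# Hypothetical end-systolic volumes (not the 45 patients in the slides).
volumes = [52, 61, 68, 74, 79, 83, 88, 90, 95, 101, 107, 115, 126, 140, 172, 212]

for lower, upper, proportion in relative_frequency_histogram(volumes, 4):
    print(f"{lower:6.1f}-{upper:6.1f} {'#' * round(proportion * 40)} {proportion:.2f}")
```

Plotting `counts` instead of the proportions would give the ordinary frequency histogram; the shape is identical, only the vertical scale changes.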
[Figure: Histogram of End-Systolic Volume for 45 Male Heart Attack Patients. Horizontal axis: SysVol, 40 to 220; vertical axis: Frequency, 0 to 20.]
[Figure: Relative frequency polygon for SysVol. Horizontal axis: SysVol, 40 to 220; vertical axis: Percent, 0 to 40.]
Histogram
[Figure: Exercise 3-5, histogram. Percentage age distribution of pregnant women. Age groups as extracted: 19, 20-24, 25-29, 30-34, 35; vertical axis from 0 to 40; legend: Thrombosis cases.]
Step-up chart
[Figure: Exercise 3.8, cumulative percentage of infants. Data labels: 6.67, 16.67, 36.67, 60, 90, 100; horizontal axis from 0 to 10; vertical axis: Cumulative percentage of infants, 0 to 120.]
Cumulative frequency curve
[Figure: Exercise 3.9, ogive. Percentage cumulative frequency curves of age for male suicide attempters and later succeeders. Age groups: 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, > 85; vertical axis from 0 to 120; legend: Attempting suicide, Later successful.]
4. Describing data from its distributional shape
1. Symmetric mound-shaped distributions
[Figure: Exercise 3-5, histogram (repeated). Percentage age distribution of pregnant women; legend: Thrombosis cases.]
Non-Symmetrical Histograms
These histograms are skewed.
Common Shapes of Histograms: Skewed Histograms
Skewed left (negative skew)
Skewed right (positive skew)
Note: the skew follows the tail.
Skewed distributions
[Figure: Exercise 4.2, shape. Age distribution for female suicide attempters and later succeeders. Age groups: 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, >85; axis from 0 to 160; legend: Attempting suicide.]
Shape of data distributions
Symmetrical or skewed
Symmetric: Mean = Median = Mode
Right-skewed: typically Mode < Median < Mean
Left-skewed: typically Mean < Median < Mode
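The parity counts from the relative frequency table earlier in these slides have a long right tail (one woman each with parity 7 and 8), so they make a convenient check on how the mean, median and mode line up in a right-skewed distribution. A short sketch using Python's standard `statistics` module:

```python
import statistics

# Parity counts from the relative frequency table earlier in these slides.
parity_counts = {0: 5, 1: 6, 2: 14, 3: 10, 4: 3, 7: 1, 8: 1}

# Expand the frequency table back into one value per woman.
parities = [value for value, n in parity_counts.items() for _ in range(n)]

mode = statistics.mode(parities)
median = statistics.median(parities)
mean = statistics.mean(parities)

print(f"mode={mode}, median={median}, mean={mean}")
```

Here the mean (2.275) sits above the median and mode (both 2), pulled towards the tail by the two high parities.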
Bimodal distributions
A bimodal distribution is one with two distinct humps or peaks.
Scatter Plots
Scatter plots are similar to line graphs in that each uses the horizontal (x) axis and vertical (y) axis to plot data points.
Scatter plots are most often used to show correlations or relationships among data.
Scatter Plots: Positive Correlation

Study Time   Class Grade
0            55
0.5          61
1            67
1.5          73
2            81
2.5          89
3            91
3.5          93
4            95
4.5          97

[Figure: Scatter plot "How Study Time Affects Grades". Horizontal axis: Time in hours, 0 to 5; vertical axis: Overall grade, 0 to 120.]
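The strength of a linear relationship like this one is commonly summarised by the Pearson correlation coefficient r (the slides quote an r value later for the TV example). The sketch below computes it from the study time table; the helper name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Study time (hours) and class grade from the table above.
study_time = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
grade = [55, 61, 67, 73, 81, 89, 91, 93, 95, 97]

r = pearson_r(study_time, grade)
print(f"r = {r:.2f}")
```

A value close to +1 indicates a strong positive correlation; a value close to -1 would indicate a strong negative one.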
Scatter Plots: Negative Correlation

Work out time   Weight
0               200
0.5             205
1               190
1.5             195
2               180
2.5             190
3               170
3.5             177
4               160
4.5             170
5               150
5.5             168
6               140
6.5             150
7               130
7.5             170
8               120
8.5             130
9               110
9.5             115
10              100
10.5            120
11              90
11.5            90
12              80

[Figure: Scatter plot "Weight Loss Over Time". Horizontal axis: Days worked out per month, 0 to 12; vertical axis: Weight, 0 to 250.]
Scatter Plot of the Data

Sandwich                      Total Fat (g) (x)   Total Calories (y)
Hamburger                     9                   260
Cheeseburger                  13                  320
Quarter Pounder               21                  420
Quarter Pounder with Cheese   30                  530
Big Mac                       31                  560
Arch Sandwich Special         31                  550
Arch Special with Bacon       34                  590
Crispy Chicken                25                  500
Fish Fillet                   28                  560
Grilled Chicken               20                  440
Grilled Chicken Light         5                   300

[Figure: Scatter plot "Fat Grams and Calories in Food". Horizontal axis: Total Fat Grams, 0 to 40; vertical axis: Total Calories, 0 to 700.]
Damaged for life by too much TV
N Z Herald (04/10/2005)
[Figure: Scatter plot of Health Score against TV watching; r = -0.93.]
Causal relationship?
5. Describing data with numeric summary value
1. numbers, percentages and proportions
2. summary measures of central location/central tendency
3. summary measures of spread/dispersion
5.1. Numbers, percentages and proportions
Numbers: the numerical summaries of data.
A percentage is a proportion multiplied by 100.
1) Prevalence: the number of existing cases in some population at a given time.
2) Incidence (inception): the number of new cases occurring per 100, or per 1000, of the population, during some period of time.
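Both measures reduce to a count divided by a population size and scaled to a convenient base. The sketch below uses made-up numbers purely for illustration; expressing prevalence per 100 people (rather than as a raw count) and the function names are assumptions of this example.

```python
def prevalence(existing_cases, population, per=100):
    """Existing cases at a given time, per `per` people in the population."""
    return per * existing_cases / population


def incidence(new_cases, population, per=1000):
    """New cases during a period, per `per` people in the population."""
    return per * new_cases / population


# Hypothetical numbers, for illustration only.
print(prevalence(existing_cases=150, population=5000))  # cases per 100 people
print(incidence(new_cases=20, population=5000))         # new cases per 1000 people
```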
5.2. Summary measures of central location
1) Mode: the category or value that occurs most often. Use: categorical and numerical discrete data.
2) Median: the middle value when the data are in ascending order; a measure of central-ness. Use: ordinal and numerical data.
3) Mean (average): the sum of the values divided by the number of values.
4) Percentile: the percentiles divide the total number of values into 100 equal-sized groups.
Choosing the most appropriate measure

                       Mode   Median                    Mean
Nominal                yes    no                        no
Ordinal                yes    yes                       no
Numerical discrete     yes    yes, if markedly skewed   yes
Numerical continuous   yes    yes, if markedly skewed   yes
5.3. Summary measure of spread/dispersion/variability
Range: maximum value - minimum value
IQR (interquartile range) = (75th - 25th) percentile = Q3 - Q1
Box-and-whisker plot: graphical summary of the three quartile values, the minimum and maximum values, and outliers.
Box-and-Whisker Plot
Graphical display of data using the 5-number summary
[Figure: Box-and-whisker plot on an axis from 4 to 12, marking X minimum value, Q1, Median, Q3 and X maximum value.]
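The five numbers behind a box-and-whisker plot, along with the range and IQR, can be computed with Python's standard `statistics` module. The data below are hypothetical, chosen only to span the 4 to 12 axis of the figure; the "inclusive" quartile method is one of several conventions, and other software may give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical data chosen to span the 4-12 axis of the figure above.
data = [4, 5, 6, 7, 8, 9, 10, 11, 12]

minimum, q1, median, q3, maximum = five_number_summary(data)
print("five-number summary:", minimum, q1, median, q3, maximum)
print("range:", maximum - minimum)
print("IQR:", q3 - q1)
```

For these values the box runs from Q1 = 6 to Q3 = 10 with the median at 8, giving a range of 8 and an IQR of 4.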
Standard deviation
A measure of the spread in a set of data: roughly the average distance of all the data values from the mean value.
The smaller the average distance is, the narrower the spread, and vice versa.
Use: numerical data only.
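Strictly, the standard deviation is the square root of the average squared distance from the mean, which is why it is only roughly an "average distance". The sketch below works through that calculation on hypothetical data and compares it with the standard library functions.

```python
import math
import statistics

# Hypothetical data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)

# Population standard deviation: root of the mean squared distance from the mean.
squared_distances = [(v - mean) ** 2 for v in values]
sd_by_hand = math.sqrt(sum(squared_distances) / len(values))

print(mean)                       # mean of the data
print(sd_by_hand)                 # computed step by step
print(statistics.pstdev(values))  # same quantity from the standard library
print(statistics.stdev(values))   # sample version, divides by n - 1
```

The sample version (`stdev`) is slightly larger than the population version (`pstdev`) because it divides by n - 1 instead of n.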