Posted: 17-Dec-2015
Uploaded by: kathleen-alexander
1
Introduction to Biostatistics
(BIO/EPI 540)
Data Presentation Graphs and Tables
Acknowledgement: Thanks to Professor Pagano (Harvard School of Public Health) for lecture material
2
Class Plan
• Data Presentation (Lec 2 overview)
• Example (by hand / SAS)
• Mean and variance
• Describing Data (continued next class)
• Simulating Data (continued next class)
3
Outline
• Descriptive Statistics – means of organizing and summarizing observations
• Types of data
• Data presentation and numerical summary measures
4
Types of data
•Nominal Data
•Ordinal Data
•Ranked Data
•Discrete Data
•Continuous Data
5
Types of data
•Nominal Data
e.g. 1: male, 0: female
•Nominal data values fall into unordered categories or classes
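Because the numeric codes for a nominal variable are only labels, the natural summary is a count per category. A minimal Python sketch, assuming the 1: male / 0: female coding above; the function name and the sample list are hypothetical:

```python
from collections import Counter

# Coding from the slide: 1 = male, 0 = female. The numbers are labels only.
SEX_LABELS = {1: "male", 0: "female"}

def tally_nominal(codes, labels):
    """Count how many observations fall into each unordered category."""
    return Counter(labels[code] for code in codes)

observations = [1, 0, 0, 1, 0]  # hypothetical coded responses
counts = tally_nominal(observations, SEX_LABELS)
print(dict(counts))  # {'male': 2, 'female': 3}
```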
6
Types of data
•Ordinal Data
•Observations with order among categories are referred to as ordinal
1. Mild
2. Moderate
3. Severe
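For ordinal data, comparisons such as "more severe than" are meaningful, but the distance between categories is not. A small Python sketch of the Mild/Moderate/Severe scale above; the class name and the ratings list are hypothetical:

```python
from enum import IntEnum

class Severity(IntEnum):
    """Ordinal scale: the order is meaningful, the spacing between codes is not."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3

# Hypothetical patient ratings.
ratings = [Severity.SEVERE, Severity.MILD, Severity.MODERATE, Severity.MILD]

# Ordering operations are valid for ordinal data ...
ranked = sorted(ratings)
worst = max(ratings)
print([r.name for r in ranked])  # ['MILD', 'MILD', 'MODERATE', 'SEVERE']
print(worst.name)                # SEVERE
# ... but a difference such as SEVERE - MILD == 2 has no clinical meaning,
# because the codes 1, 2, 3 are not measured distances.
```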
7
8
Example: Death of Manatees in Florida
Cause                    1999   1998
Floodgates/Canal Lock      15      9
Human Related               8      6
Natural                    43     21
Perinatal                  52     53
Watercraft                 82     66
Undetermined               69     76
Total                     263    231
Source: Florida Fish and Wildlife Conservation Commission
Nominal categories
9
Example: Death of Manatees in Florida
Cause                    1999   1998   Rank
Floodgates/Canal Lock      15      9      4
Human Related               8      6      5
Natural                    43     21      3
Perinatal                  52     53      2
Watercraft                 82     66      1
Undetermined               69     76
Total                     263    231
Source: Florida Fish and Wildlife Conservation Commission
Ranked data
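The ranks in the table can be reproduced by sorting the causes by count, assuming rank 1 goes to the largest count and "Undetermined" is left unranked as on the slide (the 1999 and 1998 counts happen to give the same order). A Python sketch; the function name is hypothetical and ties are not handled:

```python
# 1999 counts from the table; "Undetermined" is excluded from the ranking.
deaths_1999 = {
    "Floodgates/Canal Lock": 15,
    "Human Related": 8,
    "Natural": 43,
    "Perinatal": 52,
    "Watercraft": 82,
}

def rank_descending(counts):
    """Give rank 1 to the largest count, 2 to the next largest, and so on (no ties assumed)."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {cause: position for position, cause in enumerate(ordered, start=1)}

ranks = rank_descending(deaths_1999)
for cause, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, cause)
```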
10
Types of data
•Discrete Data
•Both order & magnitude important
•Data consists of restricted set of values
e.g. Data on number of children per subject
Subject   Number of children
1         2
2         3
3         1
4         2
5         4
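Because order and magnitude both matter for discrete data, the values can be tabulated in increasing order and averaged. A Python sketch using the five subjects in the table above:

```python
from collections import Counter

# Number of children for subjects 1-5, from the table above.
children = [2, 3, 1, 2, 4]

# Frequency of each value, listed in increasing order of the value.
freq = Counter(children)
for value in sorted(freq):
    print(value, freq[value])

# Magnitudes are meaningful for discrete data, so an average makes sense.
mean_children = sum(children) / len(children)
print(mean_children)  # 2.4
```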
11
Types of data
•Continuous Data
•Data represent measurable quantities and are not restricted to taking on specific values
•US adult heights
•US adult individual cholesterol measurements
12
Outline
• Descriptive Statistics – means of organizing and summarizing observations
• Types of data
• Data presentation and numerical summary measures
13
Data Presentation
• Nominal / Ordinal Data:
  – Frequency (relative frequency) tables
  – Bar charts
• Discrete / Continuous Data:
  – Histogram (Frequency Polygon)
  – One-way scatter plot
• Continuous Data:
  – Box plot
  – Two-way scatter plot
  – Line graph
14
Example: Serum cholesterol level of men aged 25–34 years
Cholesterol Level (mg/100 ml)   Number of Men
80–119                                 13
120–159                               150
160–199                               442
200–239                               299
240–279                               115
280–319                                34
320–359                                 9
360–399                                 5
Total                               1,067
Frequency Table
15
Example: Serum cholesterol level of men aged 25–34 years
Cholesterol Level (mg/100 ml)   Number of Men   Relative Frequency (%)
80–119                                 13              1.2
120–159                               150             14.1
160–199                               442             41.4
200–239                               299             28.0
240–279                               115             10.8
280–319                                34              3.2
320–359                                 9              0.8
360–399                                 5              0.5
Total                               1,067            100.0
Frequency Table
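The relative frequency column is each count divided by the total (1,067) and multiplied by 100. A Python sketch that reproduces the column from the counts in the table; the function name is hypothetical:

```python
# Counts of men aged 25-34 per cholesterol interval (mg/100 ml), from the table.
intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]
counts = [13, 150, 442, 299, 115, 34, 9, 5]

def relative_frequencies(counts):
    """Return each count as a percentage of the total count."""
    total = sum(counts)
    return [100 * c / total for c in counts]

percents = relative_frequencies(counts)
for label, n, pct in zip(intervals, counts, percents):
    print(f"{label:>8} {n:>5} {pct:5.1f}")
print(f"{'Total':>8} {sum(counts):>5} {sum(percents):5.1f}")
```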
16
Bar Chart
http://www.ncsu.edu/labwrite/res/gh/gh-bargraph.html#horizbar
Label the axes; leave space between the bars
(Figure: car defects in three factories)
17
Data Presentation
• Nominal / Ordinal Data:
  – Frequency (relative frequency) tables
  – Bar charts
• Discrete / Continuous Data:
  – Histogram (Frequency Polygon)
• Continuous Data:
  – Box plot
  – Two-way scatter plot
  – Line graph
18
Histogram Example
19
Histogram
• Choosing the number of bins depends on the range of the data
• Equal bin widths are recommended
• When the data demand unequal bin widths, make the area of each bar (not its height) proportional to the relative frequency
Key points
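Once a number of bins k has been chosen, equal-width bins follow directly from the range of the data: width = (max - min) / k. A Python sketch under that assumption; the function name and sample values are hypothetical, and at least two distinct values are assumed:

```python
def equal_width_bins(data, k):
    """Divide the range of the data into k equal-width bins and count the values in each.

    Bins are half-open [edge_i, edge_i+1), except that the maximum value is
    counted in the last bin. Assumes at least two distinct values.
    """
    low, high = min(data), max(data)
    width = (high - low) / k
    edges = [low + i * width for i in range(k + 1)]
    counts = [0] * k
    for x in data:
        index = int((x - low) / width)
        counts[min(index, k - 1)] += 1  # the maximum falls exactly on the last edge
    return edges, counts

values = [1, 2, 2, 3, 5, 6, 8, 9, 10]  # hypothetical data
edges, counts = equal_width_bins(values, k=3)
print(edges)   # [1.0, 4.0, 7.0, 10.0]
print(counts)  # [4, 2, 3]
```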
20
Histogram
• A histogram represents percentages by areas*
• Density scale (Y axis): the height of each block (bin) equals the percentage in that block (bin) divided by the bin width
• Total area of histogram = 100%
• When bin widths are equal, it is common for the histogram to show just the counts in each bin, since heights are then proportional to areas
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
Key points
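The density-scale rule (height = percentage in bin / bin width) can be checked numerically: with unequal widths, multiplying each height by its width recovers the percentages, and the total area is 100%. A Python sketch; the bin edges and percentages are hypothetical:

```python
def bin_widths(edges):
    """Width of each bin from consecutive bin edges."""
    return [b - a for a, b in zip(edges, edges[1:])]

def density_heights(percents, edges):
    """Density scale: percentage in each bin divided by the bin width."""
    return [p / w for p, w in zip(percents, bin_widths(edges))]

def total_area(heights, edges):
    """Area of the histogram: sum of height x width over all blocks."""
    return sum(h * w for h, w in zip(heights, bin_widths(edges)))

# Hypothetical bins with unequal widths: [0, 10), [10, 20), [20, 40).
edges = [0, 10, 20, 40]
percents = [20, 50, 30]

heights = density_heights(percents, edges)
print(heights)                     # [2.0, 5.0, 1.5]  (% per unit of x)
print(total_area(heights, edges))  # 100.0
```

Plotting the raw percentage 30 as the height of the wide last bin would give it three times the area of the first bar's 20%-to-30% ratio would justify; dividing by the width avoids that distortion.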
21
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
Histogram - example
22
(Figure: histogram with Percent on the vertical axis)
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
Histogram - example
23
Histogram
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
24
Histogram
Constructing a 100% area histogram
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
25
Histogram
Constructing a 100% area histogram
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
26
Histogram
(Figure: histogram drawn on the density scale)
Constructing a 100% area histogram
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
27
Serum cholesterol level of men (1976–1980 survey)
Cholesterol Level   Relative Frequency (%)   Relative Frequency (%)
(mg/100 ml)         25–34 yrs                55–64 yrs
80–119                1.2                      0.4
120–159              14.1                      3.9
160–199              41.4                     21.6
200–239              28.0                     37.3
240–279              10.8                     22.9
280–319               3.2                     10.4
320–359               0.8                      2.9
360–399               0.5                      0.6
Total               100.0                    100.0
Frequency Polygon - Example
28
(Figure: Frequency polygon of cholesterol. x-axis: cholesterol level (mg/100 ml), intervals 80–119 through 360–399; y-axis: percent, 0 to 45; one line for men aged 25–34 and one for men aged 55–64)
Frequency Polygon - Example
29
Serum cholesterol level of men aged 25–34 years
Cholesterol Level   Relative        Cumulative Relative
(mg/100 ml)         Frequency (%)   Frequency (%)
80–119                1.2              1.2
120–159              14.1             15.3
160–199              41.4             56.7
200–239              28.0             84.7
240–279              10.8             95.5
280–319               3.2             98.7
320–359               0.8             99.5
360–399               0.5            100.0
Total               100.0
Frequency Polygon - Example
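The cumulative column is a running total of the relative frequencies. A Python sketch that reproduces it from the table; the final rounding only removes floating-point noise:

```python
from itertools import accumulate

# Relative frequencies (%) for men aged 25-34, from the table above.
relative = [1.2, 14.1, 41.4, 28.0, 10.8, 3.2, 0.8, 0.5]

# Running total, rounded to one decimal place like the table.
cumulative = [round(c, 1) for c in accumulate(relative)]
print(cumulative)  # [1.2, 15.3, 56.7, 84.7, 95.5, 98.7, 99.5, 100.0]
```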
30
(Figure: Cumulative frequency polygon of cholesterol, men aged 25–34. x-axis: cholesterol level (mg/100 ml), intervals 80–119 through 360–399; y-axis: cumulative percent, 0 to 100)
Frequency Polygon - Example
31
(Figure: Cumulative frequency polygon of cholesterol. x-axis: cholesterol level (mg/100 ml), intervals 80–119 through 360–399; y-axis: cumulative percent, 0 to 100; one line for men aged 25–34 and one for men aged 55–64)
Frequency Polygon - Example
32
Data Presentation
• Nominal / Ordinal Data:
  – Frequency (relative frequency) tables
  – Bar charts
• Discrete / Continuous Data:
  – Histogram (Frequency Polygon)
• Continuous Data:
  – Box plot
  – Two-way scatter plot
  – Line graph
33
Example - Dyslipidemia in HIV Cohort
Histogram reveals an asymmetric, skewed distribution
34
Example - Dyslipidemia in HIV Cohort
Natural log transformation of the data results in a more symmetric distribution
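The triglyceride data from the cohort are not reproduced here, so the sketch below uses hypothetical right-skewed values to show the effect of the natural log. Skewness is measured with the moment coefficient (mean cubed deviation divided by the standard deviation cubed), one common choice that the slides do not specify:

```python
import math

def skewness(values):
    """Moment coefficient of skewness (population form): m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements: each value doubles the previous one.
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]  # natural log

print(skewness(raw))     # clearly positive: long right tail
print(skewness(logged))  # essentially 0: the logged values are evenly spaced
```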
35
Box plot: Dyslipidemia in HIV Cohort
(Figure: box plot of natural-log-transformed triglyceride measurements, marking the 25th, 50th, and 75th percentiles, the upper bound (UB), the lower bound (LB), and outliers)
UB (LB) = most extreme data point that is within 1.5 times the box width (IQR) of the 75th (25th) percentile
Outliers = data points beyond UB or LB, plotted individually
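The UB/LB rule above can be applied directly once the quartiles are known. A Python sketch using `statistics.quantiles` with `method="inclusive"`; quartile conventions differ between packages, so SAS or other software may report slightly different quartiles for the same data. The function name and sample values are hypothetical:

```python
import statistics

def box_plot_summary(data):
    """Quartiles, whisker bounds and outliers: UB (LB) is the most extreme
    observation within 1.5 x IQR of the 75th (25th) percentile."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # box width
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    ub = max(x for x in data if x <= upper_fence)
    lb = min(x for x in data if x >= lower_fence)
    outliers = sorted(x for x in data if x > upper_fence or x < lower_fence)
    return {"Q1": q1, "median": median, "Q3": q3, "LB": lb, "UB": ub, "outliers": outliers}

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
print(box_plot_summary(sample))
# {'Q1': 3.25, 'median': 5.5, 'Q3': 7.75, 'LB': 1, 'UB': 9, 'outliers': [30]}
```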
36
Box plot: Dyslipidemia in HIV Cohort
37
Two-way scatter plot: Dyslipidemia in HIV Cohort
Reveals the relationship between two continuous variables
38
Summary
• Data types:
  – Nominal
  – Ordinal
  – Discrete
  – Continuous
• Data presentation (Nominal/Ordinal data):
  – Tables (Frequency, Relative Frequency)
  – Bar charts
• Data presentation (Discrete/Continuous data):
  – Histogram (Frequency Polygon)
• Data presentation (Continuous data):
  – Box plot, shapes of distributions
  – Two-way scatter plot
39
In-Class Example: Distance willing to travel to a household hazardous waste site
Distance        Freq
< 1 mile          75
>1–2 miles        90
>2–5 miles        45
>5–10 miles       90
Total            300
Construct: histogram, frequency polygon, cumulative % distribution
40
In-Class Example: Distance willing to travel to a household hazardous waste site
Distance       Freq     %   %/mile
< 1 mile         75    25       25
>1–2 miles       90    30       30
>2–5 miles       45    15        5
>5–10 miles      90    30        6
Total           300   100
Construct: histogram, frequency polygon, cumulative % distribution
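The % and %/mile columns, and the cumulative % needed for the last plot, can be computed as follows. A Python sketch, assuming the "< 1 mile" bin starts at 0 miles so that the bin widths are 1, 1, 3 and 5 miles:

```python
from itertools import accumulate

# In-class data: bin edges in miles and the number of respondents per bin.
# Assumption: the "< 1 mile" bin starts at 0 miles.
edges = [0, 1, 2, 5, 10]
freq = [75, 90, 45, 90]

n = sum(freq)                                        # 300 respondents
widths = [b - a for a, b in zip(edges, edges[1:])]   # 1, 1, 3, 5 miles
percent = [100 * f / n for f in freq]                # relative frequency (%)
per_mile = [p / w for p, w in zip(percent, widths)]  # histogram height (%/mile)
cumulative = list(accumulate(percent))               # cumulative %

for a, b, f, p, d, c in zip(edges, edges[1:], freq, percent, per_mile, cumulative):
    print(f"{a:>2}-{b:<2} miles {f:>4} {p:6.1f} {d:6.1f} {c:6.1f}")
```

The %/mile values are the bar heights on the density scale; multiplying each by its bin width gives back the % column, so the bars enclose 100% in total.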
41
Histogram of Travel Distance (miles) for n=300
(Figure: y-axis: Density; x-axis: Distance (miles), ticks at 0, 1, 2, 3, 4, 5, 10)
42
Polygon of Travel Distance (miles) for n=300
(Figure: y-axis: Density; x-axis: Distance (miles), ticks at 0, 1, 2, 3, 4, 5, 10)
43
Cumulative % of Travel Distance (miles) for n=300
(Figure: y-axis: Cumulative percent, ticks at 0, 25, 50, 75, 100; x-axis: Distance (miles), ticks at 0, 1, 2, 3, 4, 5, 10)