Date post: 17-Dec-2015
Upload: kathleen-alexander

Introduction to Biostatistics

(BIO/EPI 540)

Data Presentation Graphs and Tables

Acknowledgement: Thanks to Professor Pagano (Harvard School of Public Health) for lecture material


Class Plan

• Data Presentation (Lec 2 overview)
• Example (hand/SAS)
• Mean and variance
• Describing Data (and in next class)
• Simulating Data (and in next class)


Outline

• Descriptive Statistics – means of organizing and summarizing observations

• Types of data

• Data presentation and numerical summary measures


Types of data

•Nominal Data

•Ordinal Data

•Rank Data

•Discrete Data

•Continuous Data


Types of data

•Nominal Data

e.g. 1: male, 0: female

•Nominal data values fall into unordered categories or classes


Types of data

•Ordinal Data

•Observations with order among categories are referred to as ordinal

1. Mild
2. Moderate
3. Severe


Example: Death of Manatees in Florida

Cause                    1999   1998
Floodgates/Canal Lock      15      9
Human Related               8      6
Natural                    43     21
Perinatal                  52     53
Watercraft                 82     66
Undetermined               69     76
Total                     263    231

Source: Florida Fish and Wildlife Conservation Commission

Nominal categories


Example: Death of Manatees in Florida

Cause                    1999   1998   Rank
Floodgates/Canal Lock      15      9      4
Human Related               8      6      5
Natural                    43     21      3
Perinatal                  52     53      2
Watercraft                 82     66      1
Undetermined               69     76
Total                     263    231

Source: Florida Fish and Wildlife Conservation Commission

Ranked data
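As a rough illustration of how the Rank column can be produced, the Python sketch below orders the determined causes by their 1999 counts and assigns rank 1 to the largest. The slide does not say which year the ranks are based on, so ranking by 1999 is an assumption (the 1998 counts happen to give the same order), and the function name `rank_descending` is made up for this example. "Undetermined" is left unranked, as on the slide.

```python
# Manatee deaths in 1999 by cause, copied from the table on the slide.
# "Undetermined" is omitted because the slide leaves it unranked.
deaths_1999 = {
    "Floodgates/Canal Lock": 15,
    "Human Related": 8,
    "Natural": 43,
    "Perinatal": 52,
    "Watercraft": 82,
}


def rank_descending(counts):
    """Return {category: rank}, rank 1 for the largest count (assumes no ties)."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {cause: position for position, cause in enumerate(ordered, start=1)}


ranks = rank_descending(deaths_1999)

# Print the causes in rank order with their 1999 counts.
for cause, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, cause, deaths_1999[cause])
```

Once the data are ranked, only the order is retained; the sizes of the gaps between counts are no longer visible in the rank itself.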


Types of data

•Discrete Data

•Both order & magnitude important

•Data consist of a restricted set of values (often integers or counts)

e.g. Data on number of children per subject

Subject   Number of children
1         2
2         3
3         1
4         2
5         4


Types of data

•Continuous Data

•Data represent measurable quantities and are not restricted to taking on specific values

•US adult heights

•US adult individual cholesterol measurements


Outline

• Descriptive Statistics – means of organizing and summarizing observations

• Types of data

• Data presentation and numerical summary measures


Data Presentation

• Nominal / Ordinal Data:
    – Frequency (relative frequency) tables
    – Bar charts

• Discrete / Continuous Data:
    – Histogram (Frequency Polygon)
    – One way scatter plot

• Continuous Data:
    – Box plot
    – 2 way scatter plot
    – Line Graph


Frequency Table

Example: Serum cholesterol level of men aged 25-34 years.

Cholesterol Level    Number of
(mg/100 ml)          Men
80-119                  13
120-159                150
160-199                442
200-239                299
240-279                115
280-319                 34
320-359                  9
360-399                  5
Total                1,067


Frequency Table

Example: Serum cholesterol level of men aged 25-34 years.

Cholesterol Level    Number of    Relative
(mg/100 ml)          Men          Frequency (%)
80-119                  13          1.2
120-159                150         14.1
160-199                442         41.4
200-239                299         28.0
240-279                115         10.8
280-319                 34          3.2
320-359                  9          0.8
360-399                  5          0.5
Total                1,067        100.0
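The relative frequency column is each class count divided by the total, expressed as a percentage. A minimal Python sketch of that arithmetic, using the counts from the cholesterol table (the variable names are chosen only for this illustration):

```python
# Class counts copied from the frequency table (men aged 25-34).
counts = {
    "80-119": 13,
    "120-159": 150,
    "160-199": 442,
    "200-239": 299,
    "240-279": 115,
    "280-319": 34,
    "320-359": 9,
    "360-399": 5,
}

total = sum(counts.values())  # 1,067 men in all

# Relative frequency (%) = 100 * class count / total, rounded to one decimal.
relative = {level: round(100 * n / total, 1) for level, n in counts.items()}

for level, pct in relative.items():
    print(f"{level:>8}  {counts[level]:>4}  {pct:>5.1f}")
```

Relative frequencies make groups of different sizes comparable, which is what allows two age groups to be drawn on the same percent scale in a frequency polygon.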


Bar Chart

http://www.ncsu.edu/labwrite/res/gh/gh-bargraph.html#horizbar

Label axes; Leave space between bars

Car defects in three factories


Data Presentation

• Nominal / Ordinal Data:
    – Frequency (relative frequency) tables
    – Bar charts

• Discrete / Continuous Data:
    – Histogram (Frequency Polygon)

• Continuous Data:
    – Box plot
    – 2 way scatter plot
    – Line Graph


Histogram - example


Histogram

• Choosing the number of bins – depends on range of data

• Equal widths of bins recommended

• When data demands unequal bin widths, take care to plot area proportional to relative frequency

Key points


Histogram

• A histogram represents percentages by areas*

• Density scale (Y axis): the height of each block (bin) equals the percentage in that block (bin) divided by the bin width

• Total area of histogram = 100%

• When bin widths are equal – it is common for the histogram to show just the counts in each bin

Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf

Key points
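To make the density scale concrete, the Python sketch below computes block heights as percentage divided by bin width and checks that the block areas add up to 100%. The percentages and bin edges are hypothetical values invented for this illustration, and `density_heights` is a made-up helper name, not something from the lecture.

```python
def density_heights(percents, edges):
    """Return (heights, widths), where height = percentage in bin / bin width."""
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    if len(widths) != len(percents):
        raise ValueError("need exactly one more edge than bins")
    heights = [p / w for p, w in zip(percents, widths)]
    return heights, widths


# Hypothetical data: four bins of unequal width holding 10%, 40%, 30%, 20%.
percents = [10.0, 40.0, 30.0, 20.0]
edges = [0, 5, 10, 20, 40]

heights, widths = density_heights(percents, edges)

# Area of each block = height * width; the areas should total 100%.
area = sum(h * w for h, w in zip(heights, widths))
print(heights, area)
```

In this made-up example the last bin holds twice as many observations as the first but is drawn at half the height, because it is four times as wide; plotting raw percentages as heights would overstate it.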

Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf

Histogram - example


Percent

Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf

Histogram - example


Histogram

Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf


Histogram

Constructing a 100% area histogram

Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf


Histogram

Constructing a 100% area histogram

Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf


Histogram

[Figure: histogram drawn on a density scale (Y axis: density); X axis tick labels: -2.0, -0.4, 0.40, 2.0]

Constructing a 100% area histogram

Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf


Frequency Polygon - Example

Serum cholesterol level of men (1976-1980 survey)

Cholesterol Level    Relative Frequency (%)    Relative Frequency (%)
(mg/100 ml)          25-34 yrs                 55-64 yrs
80-119                 1.2                       0.4
120-159               14.1                       3.9
160-199               41.4                      21.6
200-239               28.0                      37.3
240-279               10.8                      22.9
280-319                3.2                      10.4
320-359                0.8                       2.9
360-399                0.5                       0.6
Total                100.0                     100.0


Frequency Polygon - Example

[Figure: Frequency polygon of cholesterol. X axis: Levels (80-119 through 360-399 mg/100 ml); Y axis: Percent (0 to 45); one line each for ages 25-34 and 55-64.]


Frequency Polygon - Example

Serum cholesterol level of men aged 25-34 years.

Cholesterol Level    Relative         Cumulative Relative
(mg/100 ml)          Frequency (%)    Frequency (%)
80-119                 1.2              1.2
120-159               14.1             15.3
160-199               41.4             56.7
200-239               28.0             84.7
240-279               10.8             95.5
280-319                3.2             98.7
320-359                0.8             99.5
360-399                0.5            100.0
Total                100.0
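The cumulative column is a running total of the relative frequencies. The Python sketch below reproduces it with `itertools.accumulate` from the standard library, then reads off the class in which the cumulative percentage first reaches 50%. Rounding to one decimal is an assumption made here to match the precision of the table, and the variable names are chosen only for this illustration.

```python
from itertools import accumulate

# Relative frequencies (%) for men aged 25-34, copied from the table.
levels = ["80-119", "120-159", "160-199", "200-239",
          "240-279", "280-319", "320-359", "360-399"]
relative = [1.2, 14.1, 41.4, 28.0, 10.8, 3.2, 0.8, 0.5]

# Running total, rounded to one decimal to remove floating-point noise.
cumulative = [round(c, 1) for c in accumulate(relative)]

# First class whose cumulative percentage reaches 50%: the class containing
# the 50th percentile of the grouped data.
median_class = next(lv for lv, c in zip(levels, cumulative) if c >= 50)

for lv, r, c in zip(levels, relative, cumulative):
    print(f"{lv:>8}  {r:>5.1f}  {c:>6.1f}")
print("50th percentile falls in class", median_class)
```

A cumulative frequency polygon plots these running totals, so percentiles can be read directly from the curve.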


Frequency Polygon - Example

[Figure: Cumulative frequency polygon of cholesterol. X axis: Levels (80-119 through 360-399 mg/100 ml); Y axis: Percent (0 to 100); one line for ages 25-34.]


Frequency Polygon - Example

[Figure: Cumulative frequency polygon of cholesterol. X axis: Levels (80-119 through 360-399 mg/100 ml); Y axis: Percent (0 to 100); one line each for ages 25-34 and 55-64.]


Data Presentation

• Nominal / Ordinal Data:
    – Frequency (relative frequency) tables
    – Bar charts

• Discrete / Continuous Data:
    – Histogram (Frequency Polygon)

• Continuous Data:
    – Box plot
    – 2 way scatter plot
    – Line Graph


Example - Dyslipidemia in HIV Cohort

Histogram reveals an asymmetric, skewed distribution


Example - Dyslipidemia in HIV Cohort

Natural log transformation of the data results in a more symmetric distribution
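The effect of a log transformation on a right-skewed variable can be sketched in Python. The values below are simulated from a lognormal distribution with arbitrary parameters; they are NOT the cohort's measurements. The moment-based skewness coefficient is one common way to quantify asymmetry and is an addition for this illustration, not something defined on the slides.

```python
import math
import random


def skewness(values):
    """Moment-based skewness: mean cubed deviation / (mean squared deviation)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(540)  # fixed seed so the sketch is reproducible

# Simulated right-skewed measurements (hypothetical, lognormal by construction).
raw = [rng.lognormvariate(5.0, 0.6) for _ in range(1000)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

Because the simulated values are lognormal, their logs are normally distributed and the skewness after transformation is close to zero. Real data will usually become more symmetric but not perfectly so.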


Box plot: Dyslipidemia in HIV Cohort

[Figure: box plot of natural log transformed triglyceride measurements, with the 25th, 50th and 75th percentiles, the bounds UB and LB, and the outliers marked.]

UB (LB) = most extreme data point that is within 1.5 times box width (IQR) of the 75th (25th) percentile

Outliers = data points beyond UB or LB
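The box plot quantities can be computed directly. The Python sketch below uses `statistics.quantiles` from the standard library with the "inclusive" method; quartile conventions differ between software packages, so that choice is an assumption. The sample values are hypothetical and `box_plot_summary` is a made-up helper name.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, IQR, whisker ends (LB, UB) and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # box width
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "LB": min(inside), "UB": max(inside), "outliers": outliers,
    }


# Hypothetical sample with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)
print(summary)
```

Here the fences sit at -3.5 and 14.5, so the whiskers stop at the observed values 1 and 9, and 30 is drawn as a separate outlier point.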


Box plot: Dyslipidemia in HIV Cohort


2 way scatter plot: Dyslipidemia in HIV Cohort

Reveals relationship between 2 continuous variables


Summary

• Data Types:
    – Nominal
    – Ordinal
    – Discrete
    – Continuous

• Data presentation (Nominal/Ordinal data):
    – Tables (Frequency, Relative Frequency)
    – Bar charts

• Data presentation (Discrete/Continuous):
    – Histogram (Frequency Polygon)

• Data presentation (Continuous):
    – Box plot, shapes of distributions
    – 2 way scatter plot


In-Class Example

Distance willing to travel to a household hazardous waste site:

Distance        Freq
< 1 mile          75
>1-2 miles        90
>2-5 miles        45
>5-10 miles       90
Total            300

Histogram, Polygon, Cum % Dist.


In-Class Example

Distance willing to travel to a household hazardous waste site:

Distance        Freq     %    %/mile
< 1 mile          75    25        25
>1-2 miles        90    30        30
>2-5 miles        45    15         5
>5-10 miles       90    30         6
Total            300

Histogram, Polygon, Cum % Dist.
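The % and %/mile columns, and the cumulative percentages needed for the cumulative distribution, can be reproduced from the raw frequencies. In the Python sketch below, treating "< 1 mile" as the interval from 0 to 1 mile is an assumption; the other intervals and counts come from the table.

```python
# (lower edge, upper edge, frequency) for each distance class, in miles.
# The lower edge of 0 for the "< 1 mile" class is an assumption.
bins = [(0, 1, 75), (1, 2, 90), (2, 5, 45), (5, 10, 90)]

total = sum(count for _, _, count in bins)  # n = 300

rows = []
running = 0.0
for lower, upper, count in bins:
    percent = 100 * count / total          # relative frequency (%)
    per_mile = percent / (upper - lower)   # histogram height on the density scale
    running += percent                     # cumulative percentage
    rows.append((lower, upper, count, percent, per_mile, running))

for lower, upper, count, percent, per_mile, cum in rows:
    print(f"{lower:>2}-{upper:<2} miles  {count:>3}  {percent:>5.1f}  "
          f"{per_mile:>5.1f}  {cum:>6.1f}")
```

The 2-5 and 5-10 mile classes hold 15% and 30% of respondents but are drawn at only 5% and 6% per mile, because those percentages are spread over 3 and 5 miles respectively.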


Histogram of Travel Distance (miles) for n=300

[Figure: histogram. X axis: Distance (Miles), ticks at 0, 1, 2, 3, 4, 5, 10; Y axis: Density.]


Polygon of Travel Distance (miles) for n=300

[Figure: frequency polygon. X axis: Distance (Miles), ticks at 0, 1, 2, 3, 4, 5, 10; Y axis: Density.]


Cumulative % of Travel Distance (miles) for n=300

[Figure: cumulative percent polygon. X axis: Distance (Miles), ticks at 0, 1, 2, 3, 4, 5, 10; Y axis: Cum. Percent, ticks at 0, 25, 50, 75, 100.]

