© 2014 by Pearson Higher Education, Inc., Upper Saddle River, New Jersey 07458 • All Rights Reserved
HLTH 300 Biostatistics for Public Health Practice, Spring 2014
Raul Cruz-Cano, Ph.D.
2/3/2014
Fox/Levin/Forde, Elementary Statistics in Social Research, 12e
Chapter 2: Organizing the Data
Announcement
Regarding the homework: you are allowed to literally “copy” and “paste” the problem from the book
CHAPTER OBJECTIVES
2.1 Create frequency distributions of nominal data
2.2 Calculate proportions, percentages, ratios, and rates
2.3 Create simple and grouped frequency distributions
2.4 Create cross-tabulations
2.5 Distinguish between various forms of graphic presentations
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
2.1 Create frequency distributions of nominal data
Introduction
Formulas and statistical techniques are used by researchers to:
• Organize raw data
• Test hypotheses
Raw data are often difficult to synthesize
Frequency tables make raw data easier to understand
2.1 Frequency Distributions of Nominal Data
Responses of Young Boys to Removal of Toy
Response of Child          f
Cry                       25
Express Anger             15
Withdraw                   5
Play with another toy      5
Total                   N=50

Characteristics of a frequency distribution of nominal data:
• Title
• Consists of two columns:
  • Left column: characteristics (e.g., Response of Child)
  • Right column: frequency (f)
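As a rough illustration of how such a table is built, the sketch below tallies a list of nominal responses with Python's `collections.Counter`. The slide shows only the finished table, so the raw list `responses` is reconstructed from the frequencies and should be read as an assumption, not as the original data.

```python
from collections import Counter

# Raw observations reconstructed from the frequencies on the slide
# (the original raw data are not shown, so this list is an assumption).
responses = (
    ["Cry"] * 25
    + ["Express Anger"] * 15
    + ["Withdraw"] * 5
    + ["Play with another toy"] * 5
)

# Count how often each category occurs: this is the f column.
frequency = Counter(responses)
n_total = sum(frequency.values())  # N, the total number of cases

# Print a titled, two-column table: characteristic on the left, f on the right.
print("Responses of Young Boys to Removal of Toy")
print(f"{'Response of Child':<24}{'f':>3}")
for category, f in frequency.items():
    print(f"{category:<24}{f:>3}")
print(f"{'Total':<24}{n_total:>3}")
```

The same tally could be done by hand or in a spreadsheet; the point is only that the f column is a count per category and N is their sum.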
Comparing Distributions
Comparing distributions clarifies results, adds information, and lets groups be contrasted side by side

Response to Removal of Toy by Gender of Child
                           Gender of Child
Response of Child          Male    Female
Cry                          25        28
Express Anger                15         3
Withdraw                      5         4
Play with another toy         5        15
Total                        50        50
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
2.2 Calculate proportions, percentages, ratios, and rates
Proportions and Percentages
Allow for a comparison of groups of different sizes
Proportion – the number of cases in a category compared to the total size of the distribution
$P = \frac{f}{N}$
Percentage – the frequency of occurrence of a category per 100 cases
$\% = (100)\frac{f}{N}$
Examples
Responses of Young Boys to Removal of Toy
Response of Child          f
Cry                       25
Express Anger             15
Withdraw                   5
Play with another toy      5
Total                   N=50

Proportion of children that cried
$P = \frac{f}{N} = \frac{25}{50} = .5$
Percentage of children that cried
$\% = (100)\frac{f}{N} = (100)\frac{25}{50} = 50\%$
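The two formulas can be checked with a few lines of Python. This is a minimal sketch: the function names `proportion` and `percentage` are chosen here for illustration and do not come from the textbook.

```python
def proportion(f, n):
    """P = f / N: share of the N cases that fall in one category."""
    return f / n


def percentage(f, n):
    """% = (100) f / N: frequency of a category per 100 cases."""
    return 100 * f / n


# Boys who cried: f = 25 out of N = 50.
f_cry, n_boys = 25, 50
print(proportion(f_cry, n_boys))   # 0.5
print(percentage(f_cry, n_boys))   # 50.0
```

Because both measures divide by N, they stay comparable across groups of different sizes, which is the reason the slide gives for using them.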
Ratio and Rates
Ratio – compares the frequency of one category to another
$\text{Ratio} = \frac{f_1}{f_2}$
Rate – compares actual cases to potential cases
$\text{Rate} = (1{,}000)\frac{f\ \text{actual cases}}{f\ \text{potential cases}}$
Examples
Responses of Young Boys to Removal of Toy
Response of Child          f
Cry                       25
Express Anger             15
Withdraw                   5
Play with another toy      5
Total                   N=50

Ratio of children that cried for every child that withdrew
$\text{Ratio} = \frac{f_1}{f_2} = \frac{25}{5} = 5$
Rate of children that cried
$\text{Rate} = (1{,}000)\frac{f\ \text{actual cases}}{f\ \text{potential cases}} = (1{,}000)\frac{25}{50} = 500$ per 1,000 children
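A short sketch of the same arithmetic in Python, using the counts from the toy-removal table. The helper names `ratio` and `rate`, and the `base` parameter, are illustrative choices rather than textbook notation.

```python
def ratio(f1, f2):
    """Ratio = f1 / f2: frequency of one category relative to another."""
    return f1 / f2


def rate(actual_cases, potential_cases, base=1000):
    """Rate = (base) * actual / potential, reported per `base` potential cases."""
    return base * actual_cases / potential_cases


# 25 boys cried and 5 withdrew, out of 50 boys observed.
print(ratio(25, 5))   # 5.0   -> five boys cried for every boy who withdrew
print(rate(25, 50))   # 500.0 -> 500 boys cried per 1,000 boys
```

Note the different denominators: a ratio divides by another category's frequency, while a rate divides by all potential cases.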
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
2.3 Create simple and grouped frequency distributions
Table 2.4
Table 2.5
Not in Order!
Grouped Frequency Distribution of Interval Data
Used to clarify the presentation of interval-level scores spread over a wide range
Class Intervals
• Smaller categories or groups containing more than one score
• Class interval size is determined by the number of score values it contains
Class Limits and the Midpoint
Class Limits
• The points halfway between adjacent class intervals
• Upper and lower limits
  – The distance between the upper and lower limits determines the size of the class interval
    $i = U - L$
    where i = size of a class interval, U = upper limit of a class interval, L = lower limit of a class interval
The Midpoint
• The middlemost score value in a class interval
  – The sum of the lowest and highest values in a class interval divided by two
    $m = \frac{\text{lowest score value} + \text{highest score value}}{2}$
Careful: many times they are not as evident as they seem
More about the length of class intervals
TABLE 2.7 Grouped Frequency Distribution of Final-Examination Grades for 71 Students
Class Interval    f
50-54             4
55-59             5
60-64             5
65-69            12
70-74            17
75-79            12
80-84             7
85-89             4
90-94             2
95-99             3
Total            71

Usually the second category would be considered to run from 54.5 to 59.5
But notice that in a survey about age the respondents would consider it to run from 55.0 to 59 + (364/365)
In other words “…comes down to personal preference, feasibility and logical sense, not what is strictly right or wrong” (page 52)
Midpoint = (55 + 59)/2 = 114/2 = 57
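The sketch below works through the 55-59 interval under the usual convention for whole-number scores, where each class limit sits half a unit beyond the lowest and highest score values. The function names and the `unit` parameter are assumptions made for this illustration; the age-survey reading discussed above would give different limits.

```python
def class_limits(lowest, highest, unit=1):
    """Lower and upper class limits, assuming scores are recorded in steps of `unit`."""
    half = unit / 2
    return lowest - half, highest + half


def interval_size(lowest, highest, unit=1):
    """i = U - L, the distance between the upper and lower class limits."""
    lower, upper = class_limits(lowest, highest, unit)
    return upper - lower


def midpoint(lowest, highest):
    """m = (lowest score value + highest score value) / 2."""
    return (lowest + highest) / 2


print(class_limits(55, 59))    # (54.5, 59.5)
print(interval_size(55, 59))   # 5.0
print(midpoint(55, 59))        # 57.0
```

The interval holds five score values (55, 56, 57, 58, 59), which matches the size i = 59.5 - 54.5 = 5.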
Cumulative Distributions
Cumulative Frequencies
• Total number of cases having a given score or a score that is lower
  – Shown as cf
  – Obtained by the sum of frequencies in that category plus all lower categories’ frequencies
Cumulative Percentage
• Percentage of cases having a given score or a score that is lower
  $c\% = (100)\frac{cf}{N}$
TABLE 2.7 Grouped Frequency Distribution of Final-Examination Grades for 71 Students
Class Interval    f       %     cf      c%
50-54             4    5.63      4    5.63
55-59             5    7.04      9   12.68
60-64             5    7.04     14   19.72
65-69            12   16.90     26   36.62
70-74            17   23.94     43   60.56
75-79            12   16.90     55   77.46
80-84             7    9.86     62   87.32
85-89             4    5.63     66   92.96
90-94             2    2.82     68   95.77
95-99             3    4.23     71  100.00
Total            71  100.00
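The cf and c% columns can be reproduced from the f column alone. The following is a minimal sketch using `itertools.accumulate`; the variable names are illustrative, and each class interval is identified only by its lowest score value.

```python
from itertools import accumulate

# Lowest score value of each class interval and its frequency (Table 2.7).
lowest_scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
f = [4, 5, 5, 12, 17, 12, 7, 4, 2, 3]

n_total = sum(f)                                    # N = 71
cf = list(accumulate(f))                            # running total of f
pct = [round(100 * x / n_total, 2) for x in f]      # % = (100) f / N
c_pct = [round(100 * x / n_total, 2) for x in cf]   # c% = (100) cf / N

for low, fi, p, c, cp in zip(lowest_scores, f, pct, cf, c_pct):
    print(f"{low:>3}+  f={fi:>2}  %={p:>6.2f}  cf={c:>2}  c%={cp:>6.2f}")
```

Each cf value is the frequency of its own interval plus the frequencies of all lower intervals, so the last cf always equals N and the last c% always equals 100.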
Percentiles
The percentage of cases falling at or below a given score
• Deciles – points that divide a distribution into 10 equally sized portions
• Quartiles – points that divide a distribution into quarters
• Median – the point that divides a distribution in two, half above it and half below it
Let’s talk about it after the Frequency Polygons and Line Charts
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
2.4 Create cross-tabulations
Table 2.17
Notice that it is sometimes useful to divide the data using more than one variable, e.g., by Relationship and by Victim Sex
2.4 Percents within Cross-Tabulations
Total Percents: $\text{total}\% = (100)\frac{f}{N_{\text{total}}}$
Row Percents: $\text{row}\% = (100)\frac{f}{N_{\text{row}}}$
Column Percents: $\text{column}\% = (100)\frac{f}{N_{\text{column}}}$
The choice comes down to which is more relevant to the purpose of the analysis
• If the independent variable is on the rows, use row percents
• If the independent variable is on the columns, use column percents
• If the independent variable is unclear, use whichever percent is most meaningful
Solution
                 No New Car    New Car    Total row
Upper Class              23         10           33
Middle Class             21          6           27
Lower Class              12          1           13
Total Column             56         17           73

a) Does your class determine whether you buy a new car? Or does buying a new car determine your class?
b) Pct. New Car = 100(17/73) = 23.29%
c) Pct. Upper class with new car = 100(10/33) = 30.30%
d) Pct. Middle class with new car = 100(6/27) = 22.22%
e) Pct. Lower class with new car = 100(1/13) = 7.69%
f) Effect of social class in buying a car?
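Taking social class as the independent variable on the rows, the row percents can be computed directly from the counts in the table. This is a sketch only: the dictionary layout and the helper `row_percent` are assumptions for illustration, and the percentages are taken from the New Car column and rounded to two decimals.

```python
# Cross-tabulation counts from the table: class -> (no new car, new car).
crosstab = {
    "Upper Class": (23, 10),
    "Middle Class": (21, 6),
    "Lower Class": (12, 1),
}


def row_percent(f, n_row):
    """row% = (100) f / N_row, rounded to two decimals."""
    return round(100 * f / n_row, 2)


# Percent of each social class that bought a new car.
new_car_pct = {
    social_class: row_percent(new_car, no_new_car + new_car)
    for social_class, (no_new_car, new_car) in crosstab.items()
}

# Overall percent with a new car, from the column total and the grand total.
n_total = sum(no_new + new for no_new, new in crosstab.values())
n_new_car = sum(new for _, new in crosstab.values())
overall_pct = round(100 * n_new_car / n_total, 2)

print(new_car_pct)
print(overall_pct)
```

Comparing the three row percents against the overall percent is one way to approach question f): the share of new-car buyers falls as class goes from upper to lower.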
Solution
Simple frequency distribution
Score Value     f
39              4
38              4
35              2
32              3
31              4
27              9
26              7
25              6
21             13
20             10
17              5
15              7
Total          74

Grouped frequency distribution
Class Interval   f    Midpoint        Percentage          cf          Cum. Pct.
15-19           12    (15+19)/2=17    100(12/74)=16.22    12          100(12/74)=16.22
20-24           23    (20+24)/2=22    100(23/74)=31.08    12+23=35    100(35/74)=47.30
25-29           22    (25+29)/2=27    100(22/74)=29.73    35+22=57    100(57/74)=77.03
30-34            7    (30+34)/2=32    100(7/74)=9.46      57+7=64     100(64/74)=86.49
35-39           10    (35+39)/2=37    100(10/74)=13.51    64+10=74    100
Total           74                    100.00
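The grouping step in this solution can be sketched in Python: each score value is assigned to a class interval of size 5 starting at 15, and the midpoint, percentage, cf, and cumulative percentage follow from the grouped frequencies. The constants `LOWEST` and `SIZE` and the row layout are illustrative assumptions; percentages are rounded to two decimals.

```python
from collections import Counter

# Simple frequency distribution from the solution: score value -> f.
score_frequencies = {
    39: 4, 38: 4, 35: 2, 32: 3, 31: 4, 27: 9,
    26: 7, 25: 6, 21: 13, 20: 10, 17: 5, 15: 7,
}

LOWEST = 15  # lowest score value of the first class interval
SIZE = 5     # number of score values per class interval

# Add each score's frequency to the interval that contains it.
grouped = Counter()
for score, f in score_frequencies.items():
    lower = LOWEST + SIZE * ((score - LOWEST) // SIZE)
    grouped[lower] += f

n_total = sum(grouped.values())

# Build one row per interval: label, f, midpoint, %, cf, c%.
rows = []
cf = 0
for lower in sorted(grouped):
    upper = lower + SIZE - 1
    f = grouped[lower]
    cf += f
    rows.append((
        f"{lower}-{upper}",
        f,
        (lower + upper) / 2,
        round(100 * f / n_total, 2),
        cf,
        round(100 * cf / n_total, 2),
    ))

for row in rows:
    print(row)
```

Changing `SIZE` or `LOWEST` regroups the same scores into different intervals, which is a quick way to see how the choice of class interval shapes the table.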
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
2.5 Distinguish between various forms of graphic presentations
Figure 2.4 Pie Charts
Figure 2.6 Bar Graph & Histograms
Figure 2.9 Frequency polygons
(frequency indicated at midpoint of each class)
From Table 2.7
Class Interval   Midpoint    f       %     cf      c%
50-54                52.5    4    5.63      4    5.63
55-59                57.5    5    7.04      9   12.68
60-64                62.5    5    7.04     14   19.72
65-69                67.5   12   16.90     26   36.62
70-74                72.5   17   23.94     43   60.56
75-79                77.5   12   16.90     55   77.46
80-84                82.5    7    9.86     62   87.32
85-89                87.5    4    5.63     66   92.96
90-94                92.5    2    2.82     68   95.77
95-99                97.5    3    4.23     71  100.00
Total                       71  100.00

[Chart: cumulative percentage (vertical axis, 0 to 100) plotted at each midpoint (horizontal axis, 52.5 to 97.5)]
50th percentile = 70 approx.
The smaller the class interval, the better
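One way to read the 50th percentile off such a chart is to interpolate linearly between the two plotted points whose cumulative percentages bracket 50. The sketch below assumes the chart joins the points with straight lines and uses the midpoints and c% values exactly as listed in the table; the function name `percentile_from_points` is hypothetical.

```python
midpoints = [52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5, 92.5, 97.5]
cum_pct = [5.63, 12.68, 19.72, 36.62, 60.56, 77.46, 87.32, 92.96, 95.77, 100.0]


def percentile_from_points(target, xs, ys):
    """Score at which the cumulative-percentage line reaches `target`.

    Linear interpolation between neighbouring plotted points; this is an
    approximation that assumes straight line segments between the points.
    """
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    raise ValueError("target lies outside the plotted range")


p50 = percentile_from_points(50, midpoints, cum_pct)
print(round(p50, 2))  # close to 70, in line with the approximate reading of the chart
```

With narrower class intervals the plotted points sit closer together, so each straight-line segment approximates the true distribution over a shorter stretch of scores.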
Figure 2.11
Taller than who? Flatter than who?
Figure 2.12
Figure 2.14 Line Chart (discrete values)
Figure 2.15 Maps
Let’s work in MS Excel
CHAPTER SUMMARY
2.1 Frequency distributions can be created to help researchers visualize distributions
2.2 Proportions, percentages, ratios, and rates can be calculated as a way to describe data
2.3 Simple frequency distributions can be created using data at any level of measurement, while interval-level data are needed to create a grouped frequency distribution
2.4 Cross-tabulations can be created to illustrate the relationship between two variables
2.5 Several forms of graphs can be used to demonstrate patterns and relationships between variables
Homework
Problems: 14 and 31
I know that they are not exactly the same as those solved in class
No Excel this time, but maybe next