3
Objectives
• Organize data using frequency distributions.
• Represent data in frequency distributions graphically using histograms, frequency polygons, and ogives.
• Represent data using Pareto charts, time series graphs, and pie graphs.
• Draw and interpret a stem and leaf plot.
• Draw and interpret a scatter plot for a set of paired data.
4
Introduction
This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form.
5
Section 2.1 Organizing Data
Basic Vocabulary
• When data are collected in original form, they are called raw data.
• A frequency distribution is the organization of raw data in table form, using classes and frequencies.
• The two most common distributions are the categorical frequency distribution and the grouped frequency distribution.
6
Frequency Distributions
Categorical frequency distributions count how many times each distinct category has occurred and summarize the results in a table format.
7
Example 1: Letter grades for Math 227 Spring 2005:
C A B C D F B B A C C F C
B D A C C C F C C A A C
a) Construct a frequency distribution for the categorical data.
Answer:
Class   Tally            Frequency   Percent
A       /////                5          20
B       ////                 4          16
C       ///// ///// /       11          44
D       //                   2           8
F       ///                  3          12
Total                       25         100
8
b) What percentage of the students passed the class with a grade of C or better?
Answer: Total number of letter grades = 25
Number of grades of C or better = 5 + 4 + 11 = 20
Percentage = (20 / 25) × 100% = 80%
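The counts and percentages in this example can be checked with a short script. This is only a sketch using Python's standard library; the variable names (grades, table, pass_percent) are chosen for illustration and are not part of the slides.

```python
from collections import Counter

# Letter grades from Example 1 (25 students).
grades = "C A B C D F B B A C C F C B D A C C C F C C A A C".split()

counts = Counter(grades)
total = sum(counts.values())

# Frequency and percent for each class, listed in grade order.
table = {g: (counts[g], 100 * counts[g] / total) for g in "ABCDF"}
for grade, (freq, pct) in table.items():
    print(f"{grade}  {freq:2d}  {pct:5.1f}%")

# Part (b): share of students with a grade of C or better.
passing = sum(counts[g] for g in "ABC")
pass_percent = 100 * passing / total
print(f"C or better: {passing}/{total} = {pass_percent:.0f}%")
```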
9
Frequency Distributions
Grouped Frequency Distributions:
When the range of the data is large, the data must be grouped into classes that are more than one unit in width.
10
Grouped Frequency Distributions
The lower class limit represents the smallest value
that can be included in the class.
The upper class limit represents the largest value
that can be included in the class.
11
Lower and Upper Class Limit
Lower class limit   Upper class limit   Frequency
       24                  30               3
       31                  37               1
       38                  44               5
       45                  51               9
       52                  58               6
12
Grouped Frequency Distributions (cont.)
The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.
Class limits   Class boundaries   Frequency
24 - 30        23.5 - 30.5            3
31 - 37        30.5 - 37.5            1
38 - 44        37.5 - 44.5            5
45 - 51        44.5 - 51.5            9
52 - 58        51.5 - 58.5            6
13
Class Boundaries Significant Figures
• Rule of Thumb: Class limits should have the same decimal place value as the data, but the class boundaries have one additional place value and end in a 5.
e.g. if the data are whole numbers:
lower class boundary = lower class limit – 0.5
upper class boundary = upper class limit + 0.5
e.g. if the data have one decimal place:
lower class boundary = lower class limit – 0.05
upper class boundary = upper class limit + 0.05
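The rule of thumb can be sketched as a small helper. The function name class_boundaries and its decimals parameter are assumptions made for this illustration; Decimal is used so that values such as 2.45 are exact rather than floating-point approximations.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit, decimals=0):
    """Return (lower boundary, upper boundary) for one class.

    `decimals` (hypothetical parameter) is the number of decimal places
    in the data: 0 for whole numbers, 1 for tenths, and so on.
    """
    # Half of one unit in the last place of the data: 0.5, 0.05, ...
    half_unit = Decimal("0.5") / (10 ** decimals)
    lower = Decimal(str(lower_limit)) - half_unit
    upper = Decimal(str(upper_limit)) + half_unit
    return lower, upper

print(class_boundaries(24, 30))                # whole-number data
print(class_boundaries(2.5, 3.4, decimals=1))  # data with one decimal place
```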
14
Class Midpoints
• The class midpoint (mark) is found by adding the lower and upper boundaries (or limits) and dividing by 2.
15
Class Midpoints
Class limits   Class midpoint   Frequency
24 - 30             27              3
31 - 37             34              1
38 - 44             41              5
45 - 51             48              9
52 - 58             55              6
16
Class Width
The class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
17
Class Width
Class limits   Frequency   Class width
24 - 30            3
31 - 37            1       31 - 24 = 7
38 - 44            5       38 - 31 = 7
45 - 51            9       45 - 38 = 7
52 - 58            6       52 - 45 = 7
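As a quick check of the midpoint and class width definitions, the following sketch recomputes both for the classes above. The list name limits is an illustrative choice. Note that the width comes from consecutive lower limits; subtracting the limits of a single class (30 - 24) would give 6, not 7.

```python
# Class limits from the table above, as (lower, upper) pairs.
limits = [(24, 30), (31, 37), (38, 44), (45, 51), (52, 58)]

# Midpoint: (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in limits]

# Width: lower limit of the next class minus lower limit of this class.
widths = [nxt[0] - cur[0] for cur, nxt in zip(limits, limits[1:])]

# The boundaries give the same midpoint as the limits.
boundary_midpoint = (23.5 + 30.5) / 2

print(midpoints)          # one midpoint per class
print(widths)             # one width per pair of adjacent classes
print(boundary_midpoint)
```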
18
Class Rules
• There should be between 5 and 20 classes.
• The class width should preferably be an odd number, although this is not absolutely necessary.
• The classes must be mutually exclusive.
• The classes must be continuous.
• The classes must be exhaustive.
• The classes must be equal width.
19
Class width as an odd number
The class width being an odd number is preferable since it ensures that the midpoint of each class has the same place value as the data.
If the class width is an even number, the midpoint is in tenths. For example, if the class width is 6 and the class limits are 6 and 11, the midpoint is (6 + 11) / 2 = 8.5.
*This is only a suggestion, and it is not rigorously followed.
20
Relative Frequency
Relative frequency is the frequency of each class divided by the total number of data values.
relative frequency = (class frequency) / (sum of all frequencies)
Class limits   Frequency   Relative frequency
24 - 30            3       3/24 = 0.125
31 - 37            1       1/24 = 0.042
38 - 44            5       5/24 = 0.208
45 - 51            9       9/24 = 0.375
52 - 58            6       6/24 = 0.250
Total             24       1
21
Cumulative Frequency
Cumulative Frequency is the sum of the
frequencies accumulated up to the upper
boundary of a class.
Class limits   Frequency   Cumulative frequency
24 - 30            3            3
31 - 37            1            4
38 - 44            5            9
45 - 51            9           18
52 - 58            6           24
Total             24
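Both columns can be reproduced from the class frequencies alone. A minimal sketch, assuming the frequencies are held in a list named frequencies (an illustrative name): relative frequency divides each count by the total, and cumulative frequency is a running sum.

```python
from itertools import accumulate

# Class frequencies for 24-30, 31-37, 38-44, 45-51, 52-58.
frequencies = [3, 1, 5, 9, 6]
total = sum(frequencies)

# Relative frequency = class frequency / sum of all frequencies.
relative = [f / total for f in frequencies]

# Cumulative frequency = running total up to and including each class.
cumulative = list(accumulate(frequencies))

for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:2d}  {r:.3f}  {c:2d}")
```

The last cumulative frequency always equals the total, which is a convenient check on the arithmetic.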
22
Procedure for constructing a grouped frequency distribution
1. Decide on the number of classes you want (5 to 20 classes).
2. Calculate the class width and round up:
class width = range / (number of classes) = ((highest value) – (lowest value)) / (number of classes)
*Round the answer up to the nearest whole number if there is a remainder: e.g. a class width of 14.167 rounds up to 15.
*If there is no remainder, you will need to add an extra class to accommodate all the data.
3. Choose a number for the lower limit of the first class.
4. Use the lower limit of the first class and the class width to list the other lower class limits.
5. Enter the upper class limits.
6. Tally the frequency for each class.
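Steps 1 to 4 can be sketched in code. The helpers plan_classes and lower_limits are hypothetical names introduced here, and the sketch assumes whole-number data so that the remainder test is exact.

```python
def plan_classes(lowest, highest, num_classes):
    """Return (class width, number of classes) for whole-number data."""
    data_range = highest - lowest
    width, remainder = divmod(data_range, num_classes)
    if remainder:
        width += 1        # there is a remainder: round the width up
    else:
        num_classes += 1  # no remainder: add an extra class instead
    return width, num_classes

def lower_limits(first_lower, width, num_classes):
    """List the lower class limits, starting from the first one."""
    return [first_lower + i * width for i in range(num_classes)]

width, k = plan_classes(20, 84, 5)   # range 64, 64 / 5 = 12.8
print(width, k)
print(lower_limits(20, width, k))
```

With a range of 60 and 5 classes the division is exact (12), so the sketch keeps the width at 12 and uses 6 classes, following the second starred note.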
23
Example 1 : Construct a grouped frequency table for the
following data values.
44, 32, 35, 38, 35, 39, 42, 36, 36, 40, 51, 58
58, 62, 63, 72, 78, 81, 25, 84, 20
Tip: Consider reordering the data.
24
Answer: 20, 25, 32, 35, 35, 36, 36, 38, 39, 40, 42, 44,
51, 58, 58, 62, 63, 72, 78, 81, 84
1. Let number of classes be 5
2. Range = High – Low = 84 – 20 = 64
Class width = 64 / 5 = 12.8 ≈ 13 (Round-up)
3. Lower class limits (add 13 each time): 20, 33, 46, 59, 72
Class     Tally         Frequency
20 – 32   ///               3
33 – 45   ///// ////        9
46 – 58   ///               3
59 – 71   //                2
72 – 84   ////              4
25
Ex) Complete the table.
Class     Class         Midpoint             Tally        Frequency   Cumulative   Relative
limits    boundaries                                                  frequency    frequency
20 – 32   19.5 – 32.5   (20 + 32) / 2 = 26   ///              3           3        3/21 = 0.14
33 – 45   32.5 – 45.5   39                   ///// ////       9          12        0.43
46 – 58   45.5 – 58.5   52                   ///              3          15        0.14
59 – 71   58.5 – 71.5   65                   //               2          17        0.10
72 – 84   71.5 – 84.5   78                   ////             4          21        0.19
Total                                                        21                    1
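The completed table can be rebuilt from the raw data as a check. This is a sketch only: the dictionary keys and variable names are illustrative, and the first lower limit (20), class width (13), and number of classes (5) are taken from the worked answer.

```python
from itertools import accumulate

data = [44, 32, 35, 38, 35, 39, 42, 36, 36, 40, 51, 58,
        58, 62, 63, 72, 78, 81, 25, 84, 20]
first_lower, width, num_classes = 20, 13, 5

rows = []
for i in range(num_classes):
    lower = first_lower + i * width
    upper = lower + width - 1  # whole-number data, so limits differ by width - 1
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),
        "midpoint": (lower + upper) / 2,
        "frequency": sum(lower <= x <= upper for x in data),
    })

total = sum(row["frequency"] for row in rows)
running = accumulate(row["frequency"] for row in rows)
for row, cum in zip(rows, running):
    row["cumulative"] = cum
    row["relative"] = round(row["frequency"] / total, 2)
    print(row)
```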
26
Frequency Distributions
An ungrouped frequency distribution is used
for numerical data and when the range of
data is small.
27
Example: The number of incoming telephone calls
per day over the first 25 days of business:
4, 4, 1, 10, 12, 6, 4, 6, 9, 12, 12, 1, 1, 1,
12, 10, 4, 6, 4, 8, 8, 9, 8, 4, 1
Construct an ungrouped frequency distribution
28
Answer:
Class limits        Class                             Cumulative
(number of calls)   boundaries    Tally   Frequency   frequency
 1                  0.5 – 1.5     /////       5            5
 2                  1.5 – 2.5                 0            5
 3                  2.5 – 3.5                 0            5
 4                  3.5 – 4.5     ///// /     6           11
 5                  4.5 – 5.5                 0           11
 6                  5.5 – 6.5     ///         3           14
 7                  6.5 – 7.5                 0           14
 8                  7.5 – 8.5     ///         3           17
 9                  8.5 – 9.5     //          2           19
10                  9.5 – 10.5    //          2           21
11                  10.5 – 11.5               0           21
12                  11.5 – 12.5   ////        4           25
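An ungrouped distribution like this one can be tallied directly, since each class is a single value. The sketch below uses illustrative variable names and lists every value from the smallest to the largest, including those that never occur, as the answer table does.

```python
from collections import Counter
from itertools import accumulate

calls = [4, 4, 1, 10, 12, 6, 4, 6, 9, 12, 12, 1, 1, 1,
         12, 10, 4, 6, 4, 8, 8, 9, 8, 4, 1]

counts = Counter(calls)
values = list(range(min(calls), max(calls) + 1))

# A Counter returns 0 for values that never occur (2, 3, 5, 7, 11).
frequencies = [counts[v] for v in values]
cumulative = list(accumulate(frequencies))

for v, f, c in zip(values, frequencies, cumulative):
    print(f"{v:2d}  {v - 0.5:4.1f} - {v + 0.5:4.1f}  {f}  {c:2d}")
```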
29
Types of Frequency Distributions
(summary)
• A categorical frequency distribution is used when the data are nominal.
• A grouped frequency distribution is used when the range is large and classes of several units in width are needed.
• An ungrouped frequency distribution is used for numerical data and when the range of data is small.
30
Why Construct Frequency Distributions?
• To organize the data in a meaningful, intelligible way.
• To enable the reader to make comparisons among
different data sets.
• To facilitate computational procedures for measures of
average and spread.
• To enable the reader to determine the nature or shape of
the distribution.
• To enable the researcher to draw charts and graphs for
the presentation of data.
31
Section 2.2 Histogram, Frequency
Polygons, Ogives
32
The Role of Graphs
• The purpose of graphs in statistics is to convey
the data to the viewer in pictorial form.
• Graphs are useful in getting the audience’s
attention in a publication or a presentation.
33
Three Most Common Graphs
• The histogram displays the data by using vertical
bars of various heights to represent the
frequencies.
(Figure: histogram; x-axis: class boundaries; y-axis: frequency)
34
Three Most Common Graphs (cont’d.)
• The frequency polygon displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
(Figure: frequency polygon; x-axis: class midpoints; y-axis: frequency)
35
Three Most Common Graphs (cont’d.)
• The cumulative frequency graph, or ogive, represents the cumulative frequencies for the classes in a frequency distribution.
(Figure: ogive; x-axis: class boundaries; y-axis: cumulative frequency)
36
Relative Frequency Graphs
• A relative frequency graph is a graph that
uses proportions instead of frequencies.
Relative frequencies are used when the
proportion of data values that fall into a given
class is more important than the frequency.
37
Example 1 :
The following data are the numbers of English-language Sunday newspapers per state in the United States as of February 1, 1996.
2 3 3 4 4 4 4 4 5 6 6 6 7
7 7 8 10 11 11 11 12 12 13 14 14 14
15 15 16 16 16 16 16 16 18 18 19 21 21
23 27 31 35 37 38 39 40 44 62 85
38
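a) Using 1 as the starting value and a class width of 15, construct a grouped frequency distribution.
Class     Freq   Class         Class      Cumulative   Relative    Cumulative
limits           boundaries    midpoint   frequency    frequency   relative frequency
1 - 15     28    0.5 - 15.5        8         28          0.56        0.56
16 - 30    13    15.5 - 30.5      23         41          0.26        0.82
31 - 45     7    30.5 - 45.5      38         48          0.14        0.96
46 - 60     0    45.5 - 60.5      53         48          0.00        0.96
61 - 75     1    60.5 - 75.5      68         49          0.02        0.98
76 - 90     1    75.5 - 90.5      83         50          0.02        1.00
Total      50                                            1.00
(Class boundaries are used in part b, class midpoints in part c, cumulative frequencies in part d, and relative frequencies in part e.)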
39
b) Construct a histogram for the grouped frequency
distribution.
(x-axis: class boundaries; y-axis: frequency)
Answer:
(Figure: histogram; x-axis: class boundaries; y-axis: frequency, 0 to 30)
40
c) Construct a frequency polygon.
(x-axis: class midpoints(marks); y-axis: frequency)
Answer:
(Figure: frequency polygon; x-axis: class marks -7, 8, 23, 38, 53, 68, 83, 98; y-axis: frequency, 0 to 30)
41
d) Construct an ogive.
(x-axis: class boundaries; y-axis: cumulative frequency)
Answer:
(Figure: ogive; x-axis: class boundaries 0.5, 15.5, 30.5, 45.5, 60.5, 75.5, 90.5; y-axis: cumulative frequency, 0 to 60)
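The points plotted in parts b) to d) can be computed from the raw data. The sketch below only produces coordinates, which could then be passed to any plotting tool; the names bars, polygon, and ogive are illustrative. It follows the axis values shown in the answers: the polygon is closed with a zero-frequency point one class width beyond each end (-7 and 98), and the ogive starts at 0 at the first lower boundary (0.5).

```python
from itertools import accumulate

papers = [2, 3, 3, 4, 4, 4, 4, 4, 5, 6, 6, 6, 7,
          7, 7, 8, 10, 11, 11, 11, 12, 12, 13, 14, 14, 14,
          15, 15, 16, 16, 16, 16, 16, 16, 18, 18, 19, 21, 21,
          23, 27, 31, 35, 37, 38, 39, 40, 44, 62, 85]
start, width = 1, 15  # starting value and class width from part a)

num_classes = (max(papers) - start) // width + 1
lowers = [start + i * width for i in range(num_classes)]
freqs = [sum(lo <= x < lo + width for x in papers) for lo in lowers]

# b) Histogram: one bar per class, spanning its class boundaries.
bars = [((lo - 0.5, lo + width - 0.5), f) for lo, f in zip(lowers, freqs)]

# c) Frequency polygon: frequency at each midpoint, closed at both ends.
mids = [lo + (width - 1) / 2 for lo in lowers]
polygon = ([(mids[0] - width, 0)] + list(zip(mids, freqs))
           + [(mids[-1] + width, 0)])

# d) Ogive: cumulative frequency at each upper class boundary.
ogive = [(start - 0.5, 0)] + [
    (lo + width - 0.5, c) for lo, c in zip(lowers, accumulate(freqs))
]

print(bars, polygon, ogive, sep="\n")
```

Dividing each frequency by the total (50) gives the corresponding relative frequency versions of the same three graphs.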
42
e) Construct a (i) relative frequency histogram,
(ii) relative frequency polygon,
and (iii) relative cumulative frequency ogive.
Answer:
(i) relative frequency histogram
(x-axis: class boundaries; y-axis: relative frequency)
(Figure: relative frequency histogram; x-axis: class boundaries; y-axis: relative frequency, 0 to 0.6)
43
(ii) relative frequency polygon
(x-axis: class midpoints (marks); y-axis: relative frequency)
(Figure: relative frequency polygon; x-axis: class marks -7, 8, 23, 38, 53, 68, 83, 98; y-axis: relative frequency, 0 to 0.6)
44
(iii) Ogive using relative cumulative frequency
(x-axis: class boundaries; y-axis: relative cumulative frequency)
(Figure: relative cumulative frequency ogive; x-axis: class boundaries 0.5, 15.5, 30.5, 45.5, 60.5, 75.5, 90.5; y-axis: relative cumulative frequency, 0 to 1.2)
46
Section 2.3 Other Types of Graphs
A Pareto chart is used to represent a frequency distribution for a categorical variable. The frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
(x-axis: categories; y-axis: frequencies, with the bars arranged in order from highest to lowest)
(Figure: Pareto chart, "How People Get to Work"; x-axis: Auto, Bus, Trolley, Train, Walk; y-axis: frequency, 0 to 30)
47
Other Types of Graphs (cont.)
A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.
(Figure: pie graph, "Favorite American Snacks": potato chips 38%, tortilla chips 27%, pretzels 14%, popcorn 13%, snack nuts 8%)
48
Example 1: Grade received for Math 227
C A B B D C C C C B B A F F
a) Construct a Pareto chart.
Answer:
Grade Frequency
A 2
B 4
C 5
D 1
F 2
Next, arrange the frequencies in descending order:
Grade Frequency
C 5
B 4
A 2
F 2
D 1
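The ordering step for a Pareto chart is a sort by descending frequency. A minimal sketch with illustrative names; ties (A and F both occur twice) are broken alphabetically here, which is an assumption that happens to match the order in the table above.

```python
from collections import Counter

grades = "C A B B D C C C C B B A F F".split()
counts = Counter(grades)

# Sort by frequency, highest first; break ties by grade letter.
pareto_order = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for grade, freq in pareto_order:
    print(grade, freq)
```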
50
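b) Construct a pie chart.
Answer: Find the relative frequency of each grade, then the angle of each wedge: degrees = relative frequency × 360°.
Grade   Frequency   Relative frequency   Degrees
A           2       2/14 = 0.143          51.4°
B           4       4/14 = 0.286         102.9°
C           5       5/14 = 0.357         128.6°
D           1       1/14 = 0.071          25.7°
F           2       2/14 = 0.143          51.4°
Total      14       1                    360°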
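The wedge sizes for a pie chart follow from the relative frequencies: each category receives its share of the 360 degrees in a circle. The sketch below is one way to compute them; the names frequencies and pie are illustrative.

```python
# Grade frequencies from part a) (14 grades in total).
frequencies = {"A": 2, "B": 4, "C": 5, "D": 1, "F": 2}
total = sum(frequencies.values())

# For each grade: (relative frequency, wedge angle in degrees).
pie = {g: (f / total, 360 * f / total) for g, f in frequencies.items()}

for grade, (rel, degrees) in pie.items():
    print(f"{grade}  {rel:.3f}  {degrees:5.1f}")

# The wedge angles should add up to a full circle.
print(sum(deg for _, deg in pie.values()))
```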
52
Other Types of Graphs (cont.)
• A time series graph represents data that occur
over a specific period of time.
(Figure: time series graph, "Temperature Over a 5-hour Period"; x-axis: time, 12 to 5; y-axis: temperature, 35 to 55)
53
Example 1: The percentages of voters voting in the last 5 presidential elections are shown here. Construct a time series graph.
Year                 1984     1988     1992     1996     2000
% of voters voting   74.63%   72.48%   78.01%   65.97%   67.50%
Answer:
(Figure: time series graph; x-axis: year, 1984 to 2000; y-axis: % of voters voting, 0% to 90%)
54
Stem-and-Leaf Plots
• A stem-and-leaf plot is a data plot that uses part
of a data value as the stem and part of the data
value as the leaf to form groups or classes.
• It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphic form.
55
Stem-and-Leaf Plots (cont.)
The digits of each data value to the left of the vertical bar are called the stems.
The digits of each data value to the right of the appropriate stem are called the leaves.
Example 1: The test scores on a 100-point test were recorded for 20 students.
61 93 91 86 55 63 86 82 76 57
94 89 67 62 72 87 68 65 75 84
Construct an ordered stem-and-leaf plot.
Answer: Reorder the data:
55 57 61 62 63 65 67 68 72 75 76 82 84 86 86 87 89 91 93 94
Stem | Leaf
5 | 5 7
6 | 1 2 3 5 7 8
7 | 2 5 6
8 | 2 4 6 6 7 9
9 | 1 3 4
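An ordered stem-and-leaf plot can be produced by sorting the data and splitting each value into its tens digit (stem) and units digit (leaf). This is a sketch for two-digit scores like the ones in Example 1; the variable names are illustrative.

```python
from collections import defaultdict

scores = [61, 93, 91, 86, 55, 63, 86, 82, 76, 57,
          94, 89, 67, 62, 72, 87, 68, 65, 75, 84]

stems = defaultdict(list)
for score in sorted(scores):          # sorting first gives ordered leaves
    stem, leaf = divmod(score, 10)    # 86 -> stem 8, leaf 6
    stems[stem].append(leaf)

for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```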
56
Example 2: Use the data in Example 1 to construct a double stem-and-leaf plot.
e.g. split each stem into two parts, with leaves 0 – 4 on one part and 5 – 9 on the other.
Answer:
Leaves 0 – 4 belong to the first stem; leaves 5 – 9 belong to the second stem.
Stem | Leaf
5 |
5 | 5 7
6 | 1 2 3
6 | 5 7 8
7 | 2
7 | 5 6
8 | 2 4
8 | 6 6 7 9
9 | 1 3 4
9 |
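The double stem-and-leaf plot uses the same split into stem and leaf, with each stem listed twice. In this sketch the rows are keyed by (stem, half), where half is 0 for leaves 0 to 4 and 1 for leaves 5 to 9; that key layout is an illustrative choice, not something specified in the slides.

```python
scores = [61, 93, 91, 86, 55, 63, 86, 82, 76, 57,
          94, 89, 67, 62, 72, 87, 68, 65, 75, 84]

low_stem, high_stem = min(scores) // 10, max(scores) // 10

# Two rows per stem: half 0 holds leaves 0-4, half 1 holds leaves 5-9.
rows = {(stem, half): []
        for stem in range(low_stem, high_stem + 1)
        for half in (0, 1)}

for score in sorted(scores):
    stem, leaf = divmod(score, 10)
    rows[(stem, leaf // 5)].append(leaf)

for (stem, _half), leaves in rows.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Empty rows, such as the first row for stem 5, are kept so that the shape of the distribution is not distorted.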
57
Stem-and-Leaf Plots (cont)
A stem-and-leaf plot portrays the shape of a distribution and retains the original data values.
It is also useful for spotting outliers.
Outliers are data values that are extremely large or extremely small in comparison to the norm.
58
Misleading Graphs
Example: Graph of Automaker’s Claim
(Figures: the claim graphed using a scale from 95% to 100%, and using a scale from 0% to 100%)
59
2.4 Paired Data and Scatter Plots
• Many times researchers are interested in determining if a relationship between two variables exists.
• To do this, the researcher collects data consisting of two measures that are paired with one another.
• The variable first mentioned is called the independent variable; the second variable is the dependent variable.
60
• Scatter Plot – a graph of ordered pairs of values that is used to determine if a relationship exists between two variables.
61
Analyzing the Scatter Plot
• A positive linear relationship exists when the points fall approximately in an ascending straight line and both the x and y values increase at the same time.
• A negative linear relationship exists when the points fall approximately in a straight line descending from left to right.
• A nonlinear relationship exists when the points fall along a curve.
• No relationship exists when there is no discernible pattern in the points.
63
Example 1: A researcher wishes to determine if there is a relationship between the number of days an employee missed in a year and the person’s age. Draw a scatter plot and comment on the nature of the relationship.
Age (x)           22  30  25  35  65  50  27  53  42  58
Days missed (y)    0   4   1   2  14   7   3   8   6   4
(Figure: scatter plot; x-axis: age (x), 0 to 80; y-axis: days missed (y), 0 to 16)
The data show a positive linear relationship.
64
Summary of Graphs and Uses
• Histograms, frequency polygons, and
ogives are used when the data are
contained in a grouped frequency
distribution.
• Pareto charts are used to show
frequencies for nominal variables.
65
Summary of Graphs and Uses
(cont.)
• Time series graphs are used to show a
pattern or trend that occurs over time.
• Pie graphs are used to show the
relationship between the parts and the
whole.
• When data are collected in pairs, the
relationship, if one exists, can be determined by
looking at a scatter plot.