Chapter 4 Displaying and Summarizing Quantitative Data CHAPTER OBJECTIVES At the conclusion of this chapter you should be able to: 1) Construct graphs that appropriately describe quantitative data 2) Calculate and interpret numerical summaries of quantitative data. 3) Combine numerical methods with graphical methods to analyze a data set. 4) Apply graphical methods of summarizing data to choose appropriate numerical summaries. 5) Apply software and/or calculators to automate graphical and numerical summary procedures.
Transcript
Slide 1
Slide 2
Chapter 4: Displaying and Summarizing Quantitative Data
CHAPTER OBJECTIVES
At the conclusion of this chapter you should be able to:
1) Construct graphs that appropriately describe quantitative data.
2) Calculate and interpret numerical summaries of quantitative data.
3) Combine numerical methods with graphical methods to analyze a data set.
4) Apply graphical methods of summarizing data to choose appropriate numerical summaries.
5) Apply software and/or calculators to automate graphical and numerical summary procedures.
Slide 3
Displaying Quantitative Data: Histograms and Stem and Leaf Displays
Slide 4
Relative Frequency Histogram of Exam Grades
[Figure: histogram. Horizontal axis: Grade (40 to 100); vertical axis: Relative frequency (0 to .30)]
Slide 5
Frequency Histogram
Slide 6
Histograms
A histogram shows three general types of information:
- It provides visual indication of where the approximate center of the data is.
- We can gain an understanding of the degree of spread, or variation, in the data.
- We can observe the shape of the distribution.
Slide 7
All 200 m Races 20.2 secs or less
Slide 8
Histograms Showing Different Centers
Slide 9
Histograms Showing Different Centers (football head coach
salaries)
Slide 10
Histograms - Same Center, Different Spread (football head coach
salaries)
Frequency Distribution of Grades
Class Limits     Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total            30
Slide 15
Relative Frequency Distribution of Grades
Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
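The relative frequencies in the table can be checked with a few lines of Python. This is only a sketch: the class labels and counts are copied from the grade distribution, and the variable names are illustrative.

```python
# Class limits and frequencies copied from the grade distribution.
classes = ["40 up to 50", "50 up to 60", "60 up to 70",
           "70 up to 80", "80 up to 90", "90 up to 100"]
frequencies = [2, 6, 8, 7, 5, 2]

total = sum(frequencies)  # total number of grades

# Relative frequency = class frequency / total number of observations.
relative = [round(f / total, 3) for f in frequencies]

for label, f, r in zip(classes, frequencies, relative):
    print(f"{label:<13} {f}/{total} = {r:.3f}")
```

Before rounding, the relative frequencies always sum to 1, which is a quick check on the arithmetic.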
Slide 16
Relative Frequency Histogram of Grades
[Figure: histogram. Horizontal axis: Grade (40 to 100); vertical axis: Relative frequency (0 to .30)]
Slide 17
Based on the histogram, about what percent of the values are between 47.5 and 52.5?
1. 50%
2. 5%
3. 17%
4. 30%
Slide 18
Stem and leaf displays
- Have the following general appearance:
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
Slide 19
Stem and Leaf Displays
- Partition each number in the data into a stem and a leaf
- Constructing a stem and leaf display:
1) determine the stem and leaf partition (5-20 stems)
2) write the stems in a column with the smallest stem at the top; include all stems in the range of the data
3) only 1 digit in the leaves; drop digits or round off
4) record the leaf for each number in the corresponding stem row; ordering the leaves in each row helps

Suppose a 95 yr. old is hired
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
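The construction steps can be sketched in Python as follows. This is a minimal illustration, not a standard library routine: the function name `stem_and_leaf` and its `leaf_unit` parameter are made up here, the ages are read off the display, and the data are assumed to be non-negative.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return the display rows as strings, smallest stem at the top.

    Assumes non-negative values; digits below leaf_unit are dropped.
    """
    rows = defaultdict(list)
    for v in sorted(values):                 # sorting orders the leaves in each row
        scaled = int(v // leaf_unit)         # drop digits below the leaf unit
        stem, leaf = divmod(scaled, 10)      # the last remaining digit is the leaf
        rows[stem].append(leaf)
    lo, hi = min(rows), max(rows)
    # Include every stem in the range of the data, even stems with no leaves.
    return [f"{stem} | {' '.join(str(d) for d in rows.get(stem, []))}".rstrip()
            for stem in range(lo, hi + 1)]

# Ages read off the stem and leaf display (stems are 10s digits).
ages = [18, 19, 21, 22, 28, 29, 29, 32, 33, 38, 39, 40, 41, 56, 57, 64]

print("\n".join(stem_and_leaf(ages)))
print()
print("\n".join(stem_and_leaf(ages + [95])))  # the 95 yr. old adds empty stems 7 and 8
```

Printing the second display shows why step 2 matters: the empty rows for stems 7 and 8 make the gap between 64 and 95 visible.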
Slide 22
Number of TD passes by NFL teams: 2012-2013 season ( stems are
10s digit) stemleaf 4343 03 247 26677789 201222233444 113467889
08
Slide 23
Pulse Rates n = 138
Slide 24
Advantages/Disadvantages of Stem-and-Leaf Displays
- Advantages
1) each measurement displayed
2) ascending order in each stem row
3) relatively simple (data set not too large)
- Disadvantages
display becomes unwieldy for large data sets
Slide 25
Population of 185 US cities with between 100,000 and 500,000 residents
- Multiply stems by 100,000
Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77? Stems are 10s digits.
1. 4
2. 6
3. 8
4. 10
5. 12
Slide 28
Interpreting Graphical Displays: Shape
- Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
- Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
- Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.
Slide 29
Heights of Students in Recent Stats Class
Slide 30
Shape (cont.)
Female heart attack patients in New York state
Age: left-skewed
Cost: right-skewed
Slide 31
Shape (cont.): Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
[Figure with Alaska and Florida labeled]
The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.
A large gap in the distribution is typically a sign of an outlier.
Slide 32
Center: typical value of frozen personal pizza? ~$2.65
Slide 33
Spread: fuel efficiency, 4 and 8 cylinders
4 cylinders: more spread
8 cylinders: less spread
Slide 34
Other Graphical Methods for Economic Data
- Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis
** Time series measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)
Slide 35
Heat Maps
Slide 36
Unemployment Rate, by Educational Attainment
Slide 37
Water Use During Super Bowl
Slide 38
Winning Times 100 M Dash
Slide 39
Numerical Summaries of Quantitative Data Numerical and More
Graphical Methods to Describe Univariate Data
Slide 40
2 characteristics of a data set to measure
- center: measures where the middle of the data is located
- variability: measures how spread out the data is
Slide 41
The median: a measure of center
Given a set of n measurements arranged in order of magnitude,
Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even
- Ex. 2, 4, 6, 8, 10; n = 5; median = 6
- Ex. 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
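The odd/even rule for the median translates directly into code. The sketch below is one straightforward way to write it; the function name is illustrative, and Python's built-in `statistics.median` gives the same results.

```python
def median(values):
    """Median by the rule above: sort, then take the middle value(s)."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # n odd: the single middle value
        return ordered[mid]
    # n even: mean of the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 6, 8, 10]))   # n = 5
print(median([2, 4, 6, 8]))       # n = 4
```

Note that the data do not need to be sorted before the call; the function sorts a copy first.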
Medians are used often
- Year 2014 baseball salaries: median $1,450,000 (max = $28,000,000, Zack Greinke; min = $500,000)
- Median fan age: MLB 45; NFL 43; NBA 41; NHL 39
- Median existing home sales price: May 2011 $166,500; May 2010 $174,600
- Median household income (2008 dollars): 2009 $50,221; 2008 $52,029
Slide 44
The median splits the histogram into 2 halves of equal
area
Slide 45
Examples
- Example: n = 7
  17.5 2.8 3.2 13.9 14.1 25.3 45.8
  ordered: 2.8 3.2 13.9 14.1 17.5 25.3 45.8
  m = 14.1
- Example: n = 8
  17.5 2.8 3.2 13.9 14.1 25.3 35.7 45.8
  ordered: 2.8 3.2 13.9 14.1 17.5 25.3 35.7 45.8
  m = (14.1+17.5)/2 = 15.8
Slide 46
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 4971 5245 5546 7586
1. 5245
2. 4965.5
3. 4960
4. 4971
Slide 47
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 5245 5546 4971 5587 7586
1. 5245
2. 4965.5
3. 5546
4. 4971
Slide 48
Measures of Spread
- The range and interquartile range
Slide 49
Ways to measure variability
range = largest - smallest
- OK sometimes; in general, too crude; sensitive to one large or small data value
- The range measures spread by examining the ends of the data
- A better way to measure spread is to examine the middle portion of the data
Slide 50
Quartiles: Measuring spread by examining the middle
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).
[Figure labels: Q1 = first quartile = 2.3; m = median = 3.4; Q3 = third quartile = 4.2]
Slide 51
Quartiles and median divide data into 4 pieces
[Figure: Q1, M, and Q3 mark off four pieces, each containing 1/4 of the data]
Slide 52
Quartiles are common measures of spread
- http://oirp.ncsu.edu/ir/admit
- http://oirp.ncsu.edu/univ/peer
- University of Southern California
- Economic Value of College Majors
Slide 53
Rules for Calculating Quartiles
Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3
Important: when n is odd include the overall median in both halves; when n is even do not include the overall median in either half.
Slide 54
Example
- 2 4 6 8 10 12 14 16 18 20; n = 10
- Median: m = (10+12)/2 = 22/2 = 11
- Q1: median of lower half 2 4 6 8 10; Q1 = 6
- Q3: median of upper half 12 14 16 18 20; Q3 = 16
Slide 55
Quartile example: odd no. of data values
- HRs hit by Babe Ruth in each season as a Yankee:
  54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
Ordered values: 22 25 34 35 41 41 46 46 46 47 49 54 54 59 60
Median: value in ordered position 8; median = 46
Lower half (including overall median): 22 25 34 35 41 41 46 46; Q1 = (35+41)/2 = 38
Upper half (including overall median): 46 46 47 49 54 54 59 60; Q3 = (49+54)/2 = 51.5
Slide 56
Pulse Rates n = 138
Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70
Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
Slide 57
Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?
#     stem  leaf
2     22    5 5
4     23    5 7
6     24    2 6
7     25    7
10    26    2 5 7
12    27    5 9
(4)   28    1 5 6 7
15    29    3 5 5 9 9
10    30    3 3 3
7     31    4 5
5     32    1 5 5
2     33    6
1     34    0
1. 287
2. 257.5
3. 263.5
4. 262.5
Slide 58
Interquartile range
- lower quartile: Q1
- middle quartile: median
- upper quartile: Q3
- interquartile range (IQR): IQR = Q3 - Q1; measures spread of middle 50% of the data
Slide 59
Example: beginning pulse rates
- Q3 = 78; Q1 = 63
- IQR = 78 - 63 = 15
Slide 60
Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?
#     stem  leaf
2     22    5 5
4     23    5 7
6     24    2 6
7     25    7
10    26    2 5 7
12    27    5 9
(4)   28    1 5 6 7
15    29    3 5 5 9 9
10    30    3 3 3
7     31    4 5
5     32    1 5 5
2     33    6
1     34    0
1. 23.5
2. 39.5
3. 46
4. 69.5
Slide 61
5-number summary of data
- Minimum, Q1, median, Q3, maximum
- Pulse data: 45 63 70 78 111
Slide 62
Boxplot: display of 5-number summary
Five-number summary: min, Q1, m, Q3, max
Smallest = min = 0.6
Q1 = first quartile = 2.3
m = median = 3.4
Q3 = third quartile = 4.2
Largest = max = 6.1
Slide 63
Boxplot: display of 5-number summary
- Example: age of 66 crush victims at rock concerts 1999-2000.
  5-number summary: 13 17 19 22 47
Slide 64
Boxplot construction
1) construct box with ends located at Q1 and Q3; in the box mark the location of the median (usually with a line or a +)
2) fences are determined by moving a distance 1.5(IQR) from each end of the box;
2a) upper fence is 1.5*IQR above the upper quartile
2b) lower fence is 1.5*IQR below the lower quartile
Note: the fences only help with constructing the boxplot; they do not appear in the final boxplot display
Slide 65
Box plot construction (cont.)
3) whiskers: draw lines from the ends of the box left and right to the most extreme data values found within the fences;
4) outliers: special symbols represent each data value beyond the fences;
4a) sometimes a different symbol is used for far outliers that are more than 3 IQRs from the quartiles
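The numbers behind a boxplot (box ends, fences, whisker ends, outliers) can be computed before anything is drawn. The sketch below follows construction steps 1 through 4, with quartiles taken as medians of the lower and upper halves (the overall median is included in both halves when n is odd). The function name `boxplot_stats` and the dictionary keys are illustrative, and treating a value exactly on a fence as "within the fences" is an assumption.

```python
from statistics import median

def boxplot_stats(values):
    """Compute box ends, fences, whisker ends, and outliers for a boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    # Halves rule: include the overall median in both halves only when n is odd.
    lower = ordered[:half + 1] if n % 2 == 1 else ordered[:half]
    upper = ordered[half:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr          # step 2b
    upper_fence = q3 + 1.5 * iqr          # step 2a
    # Step 3: whiskers reach the most extreme data values within the fences.
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    # Step 4: values beyond the fences get their own symbols.
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median(ordered), "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Babe Ruth's home runs in each season as a Yankee.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
for name, value in boxplot_stats(ruth).items():
    print(f"{name}: {value}")
```

For these data no season falls beyond the fences, so the whiskers run all the way to the minimum and maximum.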
Slide 66
Boxplot: display of 5-number summary
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2
Largest = max = 7.9
Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9
Distance to Q3: 7.9 - 4.2 = 3.7
1.5 * IQR = 1.5 * 1.9 = 2.85
Individual #25 has a value of 7.9 years, which is 3.7 years above the third quartile. This is more than 2.85 = 1.5*IQR above Q3. Thus, individual #25 is a suspected outlier.
Slide 67
ATM Withdrawals by Day, Month, Holidays
Slide 68
Slide 69
Beg. of class pulses (n=138)
- Q1 = 63, Q3 = 78
- IQR = 78 - 63 = 15
- 1.5(IQR) = 1.5(15) = 22.5
- Q1 - 1.5(IQR): 63 - 22.5 = 40.5
- Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Boxplot labels: 40.5, 45, 63, 70, 78, 100.5]
Slide 70
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; axis from 0 to 1369]
1. 450
2. 750
3. 215
4. 545
Slide 71
Rock concert deaths: histogram and boxplot
Slide 72
Automating Boxplot Construction
- Excel out of the box does not draw boxplots.
- Many add-ins are available on the internet that give Excel the capability to draw box plots.
- Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
Slide 73
Statcrunch Boxplot
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2
Largest = max = 7.9
Slide 74
Tuition 4-yr Colleges
Slide 75
Statcrunch: 2012-13 NFL Salaries by Position
Slide 76
College Football Head Coach Salaries by Conference
Slide 77
2013 Major League Baseball Salaries by Team
Slide 78
End of General Numerical Summaries. Next: Numerical Summaries
of Symmetric Data