7/30/2019 Doane Chapter 04A
1/69
Descriptive Statistics (Part 1)
Numerical Description
Central Tendency
Dispersion
Chapter 4
Statistics are descriptive measures derived from a sample (n items).
Parameters are descriptive measures derived from a population (N items).
Numerical Description
Three key characteristics of numerical data:
Characteristic | Interpretation
Central Tendency | Where are the data values concentrated? What seem to be typical or middle data values?
Dispersion | How much variation is there in the data? How spread out are the data values? Are there unusual values?
Shape | Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?
Numerical Description
Numerical Description
Example: Vehicle Quality
Consider the data set of vehicle defect rates from J. D. Power and Associates.
Defect rate = (total no. of defects / no. inspected) x 100
Numerical statistics can be used to summarize this random sample of brands.
Must allow for sampling error since the analysis is based on sampling.
Numerical Description
Number of defects per 100 vehicles, 1004 models.
To begin, sort the data in Excel.
Sorted data provides insight into central tendency and dispersion.
Numerical Description
The dot plot offers a visual impression of the data.
Visual Displays
Numerical Description
Histograms with 5 bins (suggested by Sturges' Rule) and 10 bins are shown below.
Both are symmetric with no extreme values and show a modal class toward the low end.
Visual Displays
Numerical Description
Descriptive Statistics in Excel
Go to Tools | Data Analysis and select Descriptive Statistics.
Highlight the data range, specify a cell for the upper-left corner of the output range, check Summary Statistics, and click OK.
Here is the resulting analysis.
Descriptive Statistics in MegaStat
Central tendency is the middle or typical values of a distribution.
Central tendency can be assessed using a dot plot or histogram, or more precisely with numerical statistics.
Central Tendency
Statistic | Formula | Excel Formula | Pro | Con
Mean | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | =AVERAGE(Data) | Familiar and uses all the sample information. | Influenced by extreme values.
Median | Middle value in sorted array | =MEDIAN(Data) | Robust when extreme data values exist. | Ignores extremes and can be affected by gaps in data values.
Central Tendency
Six Measures of Central Tendency
Statistic | Formula | Excel Formula | Pro | Con
Mode | Most frequently occurring data value | =MODE(Data) | Useful for attribute data or discrete data with a small range. | May not be unique, and is not helpful for continuous data.
Midrange | $\frac{x_{\min} + x_{\max}}{2}$ | =0.5*(MIN(Data)+MAX(Data)) | Easy to understand and calculate. | Influenced by extreme values and ignores most data values.
Central Tendency
Six Measures of Central Tendency
Statistic | Formula | Excel Formula | Pro | Con
Geometric mean (G) | $G = \sqrt[n]{x_1 x_2 \cdots x_n}$ | =GEOMEAN(Data) | Useful for growth rates and mitigates high extremes. | Less familiar and requires positive data.
Trimmed mean | Same as the mean except omit highest and lowest k% of data values (e.g., 5%) | =TRIMMEAN(Data, %) | Mitigates effects of extreme values. | Excludes some data values that could be relevant.
Central Tendency
Six Measures of Central Tendency
A familiar measure of central tendency.
In Excel, use function =AVERAGE(Data) where Data is an array of data values.
Population formula: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Central Tendency
Mean
For the sample of n = 37 car brands:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{87 + 93 + 98 + \cdots + 159 + 164 + 173}{37} = \frac{4639}{37} = 125.38$$
Central Tendency
Mean
Arithmetic mean is the most familiar average.
Affected by every sample item.
The balancing point or fulcrum for the data.
Central Tendency
Characteristics of the Mean
Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero.
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
Central Tendency
Characteristics of the Mean
Consider the following asymmetric distribution of quiz scores whose mean = 65.
$$\sum_{i=1}^{n} (x_i - \bar{x}) = (42 - 65) + (60 - 65) + (70 - 65) + (75 - 65) + (78 - 65)$$
$$= (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0$$
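The quiz-score arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are not part of the original slides. It also shows that it is the signed deviations that cancel, while the absolute distances do not.

```python
# Quiz scores from the slide; their mean is 65.
scores = [42, 60, 70, 75, 78]

mean = sum(scores) / len(scores)

# Signed deviations keep their sign, so negatives and positives cancel.
deviations = [x - mean for x in scores]
total_deviation = sum(deviations)

# Absolute distances drop the sign, so they do not cancel.
total_distance = sum(abs(d) for d in deviations)

print(mean)             # 65.0
print(deviations)       # [-23.0, -5.0, 5.0, 10.0, 13.0]
print(total_deviation)  # 0.0
print(total_distance)   # 56.0
```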
The median (M) is the 50th percentile or midpoint of the sorted sample data.
M separates the upper and lower half of the sorted observations.
If n is odd, the median is the middle observation in the data array.
If n is even, the median is the average of the middle two observations in the data array.
Central Tendency
Median
Central Tendency
Median
For n = 8, the median is between the fourth and fifth observations in the data array.
Central Tendency
Median
For n = 9, the median is the fifth observation in the data array.
Consider the following n = 6 data values: 11 12 15 17 21 32
What is the median?
For even n, Median = $\frac{x_{n/2} + x_{(n/2)+1}}{2}$
n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
M = (x_3 + x_4)/2 = (15 + 17)/2 = 16
The median (16) lies between the third and fourth values: 11 12 15 | 17 21 32
Central Tendency
Median
Consider the following n = 7 data values: 12 23 23 25 27 34 41
What is the median?
For odd n, Median = $x_{(n+1)/2}$
(n+1)/2 = (7+1)/2 = 8/2 = 4
M = x_4 = 25
Central Tendency
Median
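The odd/even position rules above can be written as a short Python function. This is a minimal sketch; the function name `median` is chosen here for illustration (Python's standard library also provides `statistics.median`).

```python
def median(values):
    """Return the median using the odd/even position rules."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the observation at 1-based position (n + 1) / 2.
        return data[(n + 1) // 2 - 1]
    # Even n: average the observations at 1-based positions n/2 and n/2 + 1.
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median([11, 12, 15, 17, 21, 32]))      # 16.0
print(median([12, 23, 23, 25, 27, 34, 41]))  # 25
```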
Use Excel's function =MEDIAN(Data) where Data is an array of data values.
For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19.
So, the median is x_19 = 121.
When there are several duplicate data values, the median does not provide a clean 50-50 split in the data.
Central Tendency
Median
The median is insensitive to extreme data values.
For example, consider the following quiz scores for 3 students:
Tom's scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285)
Jake's scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380)
Mary's scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350)
What does the median for each student tell you?
Central Tendency
Characteristics of the Median
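As a quick check of the three students' figures, the sketch below uses Python's standard `statistics` module. The dictionary name `quiz_scores` is only illustrative. All three medians come out equal even though the means differ, which is the insensitivity the slide describes.

```python
from statistics import mean, median

# Quiz scores for the three students on the slide.
quiz_scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# Map each student to (mean, median).
summary = {name: (mean(s), median(s)) for name, s in quiz_scores.items()}

for name, (avg, mid) in summary.items():
    print(f"{name}: mean = {avg}, median = {mid}")
```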
The most frequently occurring data value.
Similar to mean and median if data values occur
often near the center of sorted data.
May have multiple modes or no mode.
Central Tendency
Mode
For example, consider the following quiz scores for 4 students:
Lee's scores: 60, 70, 70, 70, 80 (Mean = 70, Median = 70, Mode = 70)
Pat's scores: 45, 45, 70, 90, 100 (Mean = 70, Median = 70, Mode = 45)
Sam's scores: 50, 60, 70, 80, 90 (Mean = 70, Median = 70, Mode = none)
Xiao's scores: 50, 50, 70, 90, 90 (Mean = 70, Median = 70, Modes = 50, 90)
Central Tendency
Mode
What does the mode for each student tell you?
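The four cases (one mode, a mode far from the center, no mode, two modes) can be reproduced with a small helper. This is a sketch; the function name `modes` and the convention of returning an empty list when no value repeats are assumptions made here to match the slide's "Mode = none".

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treat as "no mode"
    return sorted(v for v, c in counts.items() if c == top)


print(modes([60, 70, 70, 70, 80]))   # Lee:  [70]
print(modes([45, 45, 70, 90, 100]))  # Pat:  [45]
print(modes([50, 60, 70, 80, 90]))   # Sam:  []
print(modes([50, 50, 70, 90, 90]))   # Xiao: [50, 90]
```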
Easy to define, not easy to calculate in large samples.
Use Excel's function =MODE(Array)
- will return #N/A if there is no mode.
- will return first mode found if multimodal.
May be far from the middle of the distribution and not at all typical.
Central Tendency
Mode
Generally isn't useful for continuous data since data values rarely repeat.
Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
Central Tendency
Mode
Consider the following P/E ratios for a random sample of 68 Standard & Poor's 500 stocks.
What is the mode?
Central Tendency
Example: Price/Earnings Ratios and Mode
7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14
14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19
19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26
26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91
Excel's descriptive statistics results are:
Mean 22.7206
Median 19
Mode 13
Range 84
Minimum 7
Maximum 91
Sum 1545
Count 68
The mode 13 occurs 7 times, but what does the dot plot show?
Central Tendency
Example: Price/Earnings Ratios and Mode
The dot plot shows local modes (a peak with valleys on either side) at 10, 13, 15, 19, 23, 26, 29.
These multiple modes suggest that the mode is not a stable measure of central tendency.
Central Tendency
Example: Price/Earnings Ratios and Mode
Points scored by the winning NCAA football team tend to have modes in multiples of 7 because each touchdown, with its extra point, yields 7 points.
Central Tendency
Example: Rose Bowl Winners Points
Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games.
What is the mode?
A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.
Occurs when dissimilar populations are combined in one sample. For example,
Central Tendency
Mode
Compare mean and median or look at histogram to determine degree of skewness.
Central Tendency
Skewness
Distribution's Shape | Histogram Appearance | Statistics
Skewed left (negative skewness) | Long tail of histogram points left (a few low values but most data on right) | Mean < Median
Symmetric | Tails of histogram are balanced (low/high values offset) | Mean ≈ Median
Skewed right (positive skewness) | Long tail of histogram points right (most data on left but a few high values) | Mean > Median
Central Tendency
Symptoms of Skewness
For the sample of J. D. Power quality ratings, the mean (125.38) exceeds the median (121). What does this suggest?
Central Tendency
Skewness
The geometric mean (G) is a multiplicative average.
$$G = \sqrt[n]{x_1 x_2 \cdots x_n}$$
For the J. D. Power quality data (n = 37):
$$G = \sqrt[37]{(87)(93)(98)\cdots(164)(173)} = \sqrt[37]{2.37667 \times 10^{77}} = 123.38$$
In Excel use =GEOMEAN(Array).
The geometric mean tends to mitigate the effects of high outliers.
Central Tendency
Geometric Mean
A variation on the geometric mean used to find the average growth rate for a time series.
$$G = \sqrt[n-1]{\frac{x_n}{x_1}} - 1$$
For example, from 1998 to 2002, Spirit Airlines revenues are:
Year | Revenue (mil)
1998 | 131
1999 | 227
2000 | 311
2001 | 354
2002 | 403
Central Tendency
Growth Rates
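The average growth rate is given by taking the geometric mean of the ratios of each year's revenue to the preceding year.
Due to cancellations, only the first and last years are relevant. With n = 5 years there are n - 1 = 4 ratios, so the fourth root applies:
$$G = \sqrt[4]{\left(\frac{227}{131}\right)\left(\frac{311}{227}\right)\left(\frac{354}{311}\right)\left(\frac{403}{354}\right)} - 1 = \sqrt[4]{\frac{403}{131}} - 1$$
= 1.3244 - 1 = .324 or 32.4% per year
In Excel use =(403/131)^(1/4)-1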
Central Tendency
Growth Rates
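The growth-rate formula $G = \sqrt[n-1]{x_n / x_1} - 1$ can be sketched in Python using the five Spirit Airlines revenue figures. Five yearly values give n - 1 = 4 year-over-year ratios, so the root taken is the fourth root. The variable names are illustrative only. The sketch computes the rate both ways to show that the intermediate years cancel.

```python
# Revenue in $ millions for 1998 through 2002 (n = 5 values).
revenue = [131, 227, 311, 354, 403]

# Year-over-year ratios: 227/131, 311/227, 354/311, 403/354 (n - 1 = 4 ratios).
ratios = [later / earlier for earlier, later in zip(revenue, revenue[1:])]

# Geometric mean of the ratios, minus 1.
product = 1.0
for r in ratios:
    product *= r
g_from_ratios = product ** (1 / len(ratios)) - 1

# Shortcut: intermediate years cancel, leaving only the endpoints.
g_from_endpoints = (revenue[-1] / revenue[0]) ** (1 / (len(revenue) - 1)) - 1

print(round(g_from_ratios, 4))     # 0.3244
print(round(g_from_endpoints, 4))  # 0.3244
```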
The midrange is the point halfway between the lowest and highest values of X.
Easy to use but sensitive to extreme data values.
$$\text{Midrange} = \frac{x_{\min} + x_{\max}}{2}$$
For the J. D. Power quality data (n = 37):
$$\text{Midrange} = \frac{x_1 + x_{37}}{2} = \frac{87 + 173}{2} = 130$$
Here, the midrange (130) is higher than the mean (125.38) or median (121).
Central Tendency
Midrange
To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
For example, for the n = 68 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05).
To determine how many observations to trim, multiply k x n = 0.05 x 68 = 3.4 or 3 observations.
So, we would remove the three smallest and three largest observations before averaging the remaining values.
Central Tendency
Trimmed Mean
Here is a summary of all the measures of central tendency for the n = 68 P/E values.
The trimmed mean mitigates the effects of very high values, but still exceeds the median.
Mean: 22.72 =AVERAGE(PERatio)
Median: 19.00 =MEDIAN(PERatio)
Mode: 13.00 =MODE(PERatio)
Geometric Mean: 19.85 =GEOMEAN(PERatio)
Midrange: 49.00 =(MIN(PERatio)+MAX(PERatio))/2
5% Trim Mean: 21.10 =TRIMMEAN(PERatio,0.1)
Central Tendency
Trimmed Mean
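The six measures for the 68 P/E ratios can be reproduced outside Excel. The sketch below uses Python's standard library (`statistics.geometric_mean` needs Python 3.8 or later). The helper `trimmed_mean` and the variable names are illustrative assumptions: the helper drops floor(k x n) values from each end, as the slides describe, whereas Excel's =TRIMMEAN takes the total fraction trimmed (0.1 for 5% per tail).

```python
from collections import Counter
from statistics import geometric_mean, mean, median

# The 68 P/E ratios listed on the earlier slide.
pe_ratios = [
    7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
    14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
    19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
    26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91,
]


def trimmed_mean(values, k):
    """Mean after dropping floor(k * n) values from each end of the sorted data."""
    data = sorted(values)
    cut = int(k * len(data))  # 0.05 * 68 = 3.4, truncated to 3
    kept = data[cut:len(data) - cut]
    return sum(kept) / len(kept)


mode_value, mode_count = Counter(pe_ratios).most_common(1)[0]

results = {
    "mean": mean(pe_ratios),
    "median": median(pe_ratios),
    "mode": mode_value,
    "geometric mean": geometric_mean(pe_ratios),
    "midrange": (min(pe_ratios) + max(pe_ratios)) / 2,
    "5% trimmed mean": trimmed_mean(pe_ratios, 0.05),
}

for name, value in results.items():
    print(f"{name}: {value:.2f}")
```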
Central Tendency
Trimmed Mean
The Federal Reserve uses a 16% trimmed mean to mitigate the effects of extremes in its analysis of the Consumer Price Index.
Variation is the spread of data points about the center of the distribution in a sample. Consider the following measures of dispersion:
Statistic | Formula | Excel | Pro | Con
Range | $x_{\max} - x_{\min}$ | =MAX(Data)-MIN(Data) | Easy to calculate. | Sensitive to extreme data values.
Variance (s^2) | $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ | =VAR(Data) | Plays a key role in mathematical statistics. | Non-intuitive meaning.
Dispersion
Measures of Variation
Statistic | Formula | Excel | Pro | Con
Standard deviation (s) | $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ | =STDEV(Data) | Most common measure. Uses same units as the raw data ($, etc.). | Non-intuitive meaning.
Coefficient of variation (CV) | $CV = 100 \times \frac{s}{\bar{x}}$ | None | Measures relative variation in percent so can compare data sets. | Requires non-negative data.
Dispersion
Measures of Variation
Statistic | Formula | Excel | Pro | Con
Mean absolute deviation (MAD) | $MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$ | =AVEDEV(Data) | Easy to understand. | Lacks "nice" theoretical properties.
Dispersion
Measures of Variation
The difference between the largest and smallest observation.
$$\text{Range} = x_{\max} - x_{\min}$$
For example, for the n = 68 P/E ratios,
Range = 91 - 7 = 84
Dispersion
Range
The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean divided by the population size.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
For the sample variance ($s^2$), we divide by n - 1 instead of n, otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Dispersion
Variance
The square root of the variance.
Units of measure are the same as X.
Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Explains how individual values in a data set vary from the mean.
Dispersion
Standard Deviation
Excel's built-in functions are:
Statistic | Excel population formula | Excel sample formula
Variance | =VARP(Array) | =VAR(Array)
Standard deviation | =STDEVP(Array) | =STDEV(Array)
Dispersion
Standard Deviation
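The population and sample versions differ only in the divisor (N versus n - 1). The sketch below makes that explicit. The function name `variance` and its `sample` flag are illustrative assumptions, and the data reuse the five quiz scores (mean 65) from the earlier slide on deviations. The four printed values correspond to what =VAR, =VARP, =STDEV and =STDEVP would return for the same data.

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or n (population)."""
    n = len(values)
    center = sum(values) / n
    squared = sum((x - center) ** 2 for x in values)
    return squared / (n - 1 if sample else n)


scores = [42, 60, 70, 75, 78]  # squared deviations sum to 848

sample_var = variance(scores, sample=True)       # 848 / 4
population_var = variance(scores, sample=False)  # 848 / 5

print(sample_var)                          # 212.0
print(population_var)                      # 169.6
print(round(math.sqrt(sample_var), 2))     # 14.56
print(round(math.sqrt(population_var), 2)) # 13.02
```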
Consider the following five quiz scores for Stephanie.
Dispersion
Calculating a Standard Deviation
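Now, calculate the sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{2380}{5 - 1}} = \sqrt{595} = 24.39$$
Somewhat easier, the two-sum formula can also be used:
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{28300 - \frac{(360)^2}{5}}{5 - 1}} = \sqrt{\frac{28300 - 25920}{5 - 1}} = \sqrt{595} = 24.39$$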
Dispersion
Calculating a Standard Deviation
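The two-sum formula needs only n, the sum of the values, and the sum of their squares. A minimal Python sketch follows, using the three totals quoted above (n = 5, sum = 360, sum of squares = 28300). The function name `stdev_from_sums` is an assumption made for illustration.

```python
import math


def stdev_from_sums(n, sum_x, sum_x_squared):
    """Sample standard deviation from n, the sum of x, and the sum of x squared."""
    return math.sqrt((sum_x_squared - sum_x ** 2 / n) / (n - 1))


s = stdev_from_sums(n=5, sum_x=360, sum_x_squared=28300)
print(round(s, 2))  # 24.39
```

One caution when using this shortcut on a computer: if the mean is large relative to the spread, subtracting two nearly equal sums can lose floating-point precision, so the definitional formula is usually the safer choice in code.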
The standard deviation is nonnegative because deviations around the mean are squared.
When every observation is exactly equal to the mean, the standard deviation is zero.
Standard deviations can be large or small, depending on the units of measure.
Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.
Dispersion
Calculating a Standard Deviation
Useful for comparing variables measured in different units or with different means.
A unit-free measure of dispersion.
Expressed as a percent of the mean.
Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
$$CV = 100 \times \frac{s}{\bar{x}}$$
Dispersion
Coefficient of Variation
For example:
$$CV = 100 \times \frac{s}{\bar{x}}$$
Defect rates (n = 37): s = 22.89, $\bar{x}$ = 125.38 gives CV = 100 x (22.89)/(125.38) = 18%
ATM deposits (n = 100): s = 280.80, $\bar{x}$ = 233.89 gives CV = 100 x (280.80)/(233.89) = 120%
P/E ratios (n = 68): s = 14.08, $\bar{x}$ = 22.72 gives CV = 100 x (14.08)/(22.72) = 62%
Dispersion
Coefficient of Variation
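Excel has no built-in function for the coefficient of variation, so it is computed from the standard deviation and the mean. The Python sketch below reproduces the three examples. The function name and the choice to raise an error for a non-positive mean are illustrative assumptions that follow the slide's remark that CV is undefined in that case.

```python
def coefficient_of_variation(s, mean):
    """Return CV as a percent of the mean: 100 * s / mean."""
    if mean <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * s / mean


cv_defects = coefficient_of_variation(22.89, 125.38)
cv_atm = coefficient_of_variation(280.80, 233.89)
cv_pe = coefficient_of_variation(14.08, 22.72)

print(round(cv_defects))  # 18
print(round(cv_atm))      # 120
print(round(cv_pe))       # 62
```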
The Mean Absolute Deviation (MAD) reveals the average distance from an individual data point to the mean (center of the distribution).
Uses absolute values of the deviations around the mean.
Excel's function is =AVEDEV(Array)
$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
Dispersion
Mean Absolute Deviation
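The MAD formula can be sketched in a few lines of Python. The function name is illustrative, and the data reuse the five quiz scores (mean 65) from the earlier slide on deviations from the mean.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    n = len(values)
    center = sum(values) / n
    return sum(abs(x - center) for x in values) / n


# Absolute deviations from 65 are 23, 5, 5, 10, 13, which sum to 56.
mad = mean_absolute_deviation([42, 60, 70, 75, 78])
print(mad)  # 11.2
```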
Consider the histograms of hole diameters drilled in a steel plate during manufacturing.
The desired distribution is outlined in red.
[Figure: histograms for Machine A and Machine B]
Dispersion
Central Tendency vs. Dispersion: Manufacturing
Machine A: Desired mean (5 mm) but too much variation.
Machine B: Acceptable variation but mean is less than 5 mm.
Take frequent samples to monitor quality.
Dispersion
Central Tendency vs. Dispersion: Manufacturing
Consider student ratings of four professors on eight teaching attributes (10-point scale).
Dispersion
Central Tendency vs. Dispersion: Job Performance
Jones and Wu have identical means but different standard deviations.
Dispersion
Central Tendency vs. Dispersion: Job Performance
Smith and Gopal have different means but identical standard deviations.
Dispersion
Central Tendency vs. Dispersion: Job Performance
A high mean (better rating) and low standard deviation (more consistency) is preferred. Which professor do you think is best?
Dispersion
Central Tendency vs. Dispersion: Job Performance
Applied Statistics in Business and Economics
End of Part 1 of Chapter 4