Posted on 05-Jan-2016
populations vs. samples
• we want to describe both samples and populations
• the latter is a matter of inference…
“outliers”
• minority cases, so different from the majority that they merit separate consideration
  – are they errors?
  – are they indicative of a different pattern?
• think about possible outliers with care, but beware of mechanical treatments…
• significance of outliers depends on your research interests
summaries of distributions
• graphic vs. numeric
  – graphic summaries may be better for visualization
  – numeric summaries are better for statistical/inferential purposes
• resistance to outliers is usually an advantage in either case
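As a rough illustration in R (the language of the "R:" notes in these slides), the sketch below puts a numeric summary and a graphic summary side by side for the unit 1 rim diameters tabulated later in this section. `summary()` and `stem()` are base R. `stem()` draws a single-sample display and may pick a different stem width than the back-to-back displays on the slides.

```r
# Unit 1 rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Numeric summary: minimum, quartiles, median, mean, maximum
s1 <- summary(unit1)
print(s1)

# Graphic summary: a single-sample stem-and-leaf display printed to the console
stem(unit1)
```

In the numeric summary the single 26.9 cm rim pulls the mean (about 14.0) above the median (12.85). The stem-and-leaf display should show that rim as a detached case.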
general characteristics
• kurtosis
[Figure: two density curves contrasting a ‘leptokurtic’ and a ‘platykurtic’ distribution (kurtosis = “peakedness”)]
• skew (skewness)
[Figure: density curves showing right (positive) skew and left (negative) skew]
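The slides define skew and kurtosis only graphically. A numeric counterpart can be sketched with moment coefficients: g1 for skewness and g2 for excess kurtosis. This is one common definition among several, and others adjust for sample size. Base R has no built-in skewness or kurtosis function, so the helper names below are hypothetical. The data are the unit 1 rim diameters from the table in this section.

```r
# Hypothetical helper: k-th central moment, using divisor n
moment <- function(x, k) mean((x - mean(x))^k)

# Moment coefficient of skewness (g1): > 0 means a long right (positive) tail
skewness_g1 <- function(x) moment(x, 3) / moment(x, 2)^(3 / 2)

# Moment coefficient of excess kurtosis (g2): > 0 means more peaked and
# heavier-tailed than a normal curve ("leptokurtic"), < 0 "platykurtic"
excess_kurtosis_g2 <- function(x) moment(x, 4) / moment(x, 2)^2 - 3

# Unit 1 rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

g1 <- skewness_g1(unit1)         # positive: the 26.9 cm rim creates a right tail
g2 <- excess_kurtosis_g2(unit1)  # positive: one extreme case dominates the tails
print(c(g1 = g1, g2 = g2))
```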
central tendency
• measures of central tendency
  – provide a sense of the value expressed by multiple cases, overall…
• mean
• median
• mode
mean
• center of gravity
• evenly partitions the sum of all measurements among all cases; the average of all measures
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• crucial for inferential statistics
• mean is not very resistant to outliers
• a “trimmed mean” may be better for descriptive purposes
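Here is a minimal R sketch of the mean formula and the "center of gravity" idea, assuming the unit 1 rim diameters from the table in this section. The deviations from the mean sum to zero, up to floating-point rounding. Removing one extreme case moves the mean noticeably, which is why the mean is not very resistant to outliers.

```r
# Unit 1 rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Mean from the formula: sum of all measurements divided by the number of cases
xbar <- sum(unit1) / length(unit1)

# "Center of gravity": deviations above and below the mean balance out
balance <- sum(unit1 - xbar)

# Sensitivity to outliers: drop the single 26.9 cm rim and recompute
xbar_without <- mean(unit1[unit1 != 26.9])

print(c(mean = xbar, balance = balance, mean_without_outlier = xbar_without))
```

Dropping that single rim lowers the mean from about 14.0 cm to about 12.8 cm.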
mean – pro and con
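rim diameter (cm)
          unit 1   unit 2
            12.6     16.2
            11.6     16.4
            16.3     13.8
            13.1     13.2
            12.1     11.3
            26.9     14.0
             9.7      9.0
            11.5     12.5
            14.8     15.6
            13.5     11.2
            12.4     12.2
            13.6     15.5
                     11.7
n             12       13
total      168.1    172.6
total/n     14.0     13.3
stem-and-leaf (stem = cm, leaf = mm); == marks the mean
            unit 1 | stem | unit 2
                 9 |  26  |
                   |  25  |
                   |  24  |
                   |  23  |
                   |  22  |
                   |  21  |
                   |  20  |
                   |  19  |
                   |  18  |
                   |  17  |
                 3 |  16  | 24
                   |  15  | 56
      14.0 ==    8 |  14  | 0
               651 |  13  | 28  == 13.3
               641 |  12  | 25
                65 |  11  | 237
                   |  10  |
                 7 |   9  | 0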
R: mean(x)
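The n, total and total/n rows of the table above can be reproduced in a few lines of R. This is a sketch assuming the two units' values as transcribed from that table. It makes the "con" concrete: the 0.7 cm gap between the unit means rests largely on one unit 1 rim.

```r
# Rim diameters (cm), transcribed from the table above
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# One column per unit: number of cases, total, and mean (total/n)
tab <- sapply(list(unit1 = unit1, unit2 = unit2),
              function(x) c(n = length(x), total = sum(x), mean = mean(x)))
print(round(tab, 1))
```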
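trimmed mean
rim diameter (cm), sorted; [bracketed] values are trimmed
          unit 1   unit 2
           [9.7]    [9.0]
            11.5     11.2
            11.6     11.3
            12.1     11.7
            12.4     12.2
            12.6     12.5
            13.1     13.2
            13.5     13.8
            13.6     14.0
            14.8     15.5
            16.3     15.6
          [26.9]     16.2
                   [16.4]
n             10       11
total      131.5    147.2
total/n     13.2     13.4
stem-and-leaf (stem = cm, leaf = mm); == marks the trimmed mean
            unit 1 | stem | unit 2
                 9 |  26  |
                   |  25  |
                   |  24  |
                   |  23  |
                   |  22  |
                   |  21  |
                   |  20  |
                   |  19  |
                   |  18  |
                   |  17  |
                 3 |  16  | 24
                   |  15  | 56
                 8 |  14  | 0
      13.2 ==  651 |  13  | 28  == 13.4
               641 |  12  | 25
                65 |  11  | 237
                   |  10  |
                 7 |   9  | 0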
R: mean(x, trim=.1)
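R's `mean(x, trim = .1)` sorts the data and drops `floor(n * trim)` cases from each end before averaging. For these samples that is one case per end, which matches the n = 10 and n = 11 in the table. The hypothetical helper `trimmed_mean_by_hand()` spells this out for comparison. It is only meant for `trim` below 0.5.

```r
# Rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: sort, drop k cases from EACH end, average the rest
trimmed_mean_by_hand <- function(x, trim) {
  x <- sort(x)
  k <- floor(length(x) * trim)  # n = 12 or 13 with trim = 0.1 gives k = 1
  mean(x[(k + 1):(length(x) - k)])
}

print(c(unit1 = trimmed_mean_by_hand(unit1, 0.1), builtin = mean(unit1, trim = 0.1)))
print(c(unit2 = trimmed_mean_by_hand(unit2, 0.1), builtin = mean(unit2, trim = 0.1)))
```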
median
• 50th percentile…
• less useful for inferential purposes
• more resistant to effects of outliers…
median
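rim diameter (cm), sorted
          unit 1   unit 2
             9.7      9.0
            11.5     11.2
            11.6     11.3
            12.1     11.7
            12.4     12.2
            12.6     12.5
    12.9 <--                   (unit 1 median, midway between 12.6 and 13.1)
            13.1     13.2 <--  (unit 2 median)
            13.5     13.8
            13.6     14.0
            14.8     15.5
            16.3     15.6
            26.9     16.2
                     16.4
stem-and-leaf (stem = cm, leaf = mm); == marks the median
            unit 1 | stem | unit 2
                 9 |  26  |
                   |  25  |
                   |  24  |
                   |  23  |
                   |  22  |
                   |  21  |
                   |  20  |
                   |  19  |
                   |  18  |
                   |  17  |
                 3 |  16  | 24
                   |  15  | 56
                 8 |  14  | 0
               651 |  13  | 28  == 13.20
     12.85 ==  641 |  12  | 25
                65 |  11  | 237
                   |  10  |
                 7 |   9  | 0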
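A sketch of the median in R, assuming the rim diameters from the table in this section. With an even n, as in unit 1, the median is the average of the two middle cases. With an odd n, as in unit 2, it is the middle case. The hypothetical helper `median_by_hand()` shows this. The vector `unit1_changed` is a hypothetical variant in which the 26.9 cm rim is replaced by 16.9 cm. It shows that the median does not move while the mean does.

```r
# Rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: middle case (odd n) or mean of the two middle cases (even n)
median_by_hand <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) x[(n + 1) / 2] else mean(x[c(n / 2, n / 2 + 1)])
}

print(c(unit1 = median(unit1), unit2 = median(unit2)))  # 12.85 and 13.2

# Hypothetical variant: replace the outlier with a less extreme value
unit1_changed <- replace(unit1, unit1 == 26.9, 16.9)
print(c(median = median(unit1_changed), mean = mean(unit1_changed)))
```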
mode
• the most numerous category
• for ratio data, often implies that the data have been grouped in some way
• can be more or less created by the grouping procedure
• for theoretical distributions, simply the location of the peak on the frequency distribution
[Figure: bar chart of frequencies by site category: isolated scatters, hamlets, villages, regional centers]
modal class = ‘hamlets’
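To illustrate how the grouping procedure can create the mode of ratio data, the sketch below groups the unit 2 rim diameters (from the table in this section) into classes of a chosen width and reports the modal class. `modal_class()` is a hypothetical helper. Classes are labelled by their lower bound, and ties go to the lowest class.

```r
# Unit 2 rim diameters (cm), transcribed from the rim diameter table in this section
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: lower bound of the most numerous class of the given width
modal_class <- function(x, width) {
  classes <- floor(x / width) * width  # e.g. width 2: 11.3 -> class 10 (10 to <12)
  counts <- table(classes)             # classes sorted from lowest to highest
  as.numeric(names(which.max(counts))) # first (lowest) class with the top count
}

modal_class(unit2, 1)  # 11: the 11 to <12 cm class holds 3 rims
modal_class(unit2, 2)  # 12: the 12 to <14 cm class holds 4 rims
```

Changing the class width alone moves the mode, even though the measurements are identical.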
dispersion
• measures of dispersion
  – summarize the degree of clustering of cases, esp. with respect to central tendency…
• range
• variance
• standard deviation
range
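rim diameter (cm), sorted
          unit 1   unit 2
             9.7      9.0
            11.5     11.2
            11.6     11.3
            12.1     11.7
            12.4     12.2
            12.6     12.5
            13.1     13.2
            13.5     13.8
            13.6     14.0
            14.8     15.5
            16.3     15.6
            26.9     16.2
                     16.4
stem-and-leaf (stem = cm, leaf = mm); * marks each unit's minimum and maximum
            unit 1 | stem | unit 2
               * 9 |  26  |
                   |  25  |
                   |  24  |
                   |  23  |
                   |  22  |
                   |  21  |
                   |  20  |
                   |  19  |
                   |  18  |
                   |  17  |
                 3 |  16  | 24 *
                   |  15  | 56
                 8 |  14  | 0
               651 |  13  | 28
               641 |  12  | 25
                65 |  11  | 237
                   |  10  |
               * 7 |   9  | 0 *
(unit 1: 9.7 to 26.9 cm; unit 2: 9.0 to 16.4 cm)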
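• would be better to use the midspread (interquartile range)…
R: range(x) returns the minimum and maximum; diff(range(x)) gives the range as a single value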
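A sketch contrasting the range with the midspread for the two units, assuming the rim diameters from the table in this section. `IQR()` is base R and uses R's default quantile definition (type 7). Other quantile definitions can give slightly different midspreads.

```r
# Rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

range(unit1)        # 9.7 26.9 (minimum and maximum, not their difference)
diff(range(unit1))  # 17.2: range of unit 1
diff(range(unit2))  # 7.4: range of unit 2

IQR(unit1)          # 1.925: midspread of unit 1
IQR(unit2)          # 3.8: midspread of unit 2
```

The single 26.9 cm rim makes unit 1's range more than twice that of unit 2. By the midspread, however, unit 1 is actually the more tightly clustered unit.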
variance
• analogous to average deviation of cases from mean
• in fact, based on the sum of squared deviations from the mean (the “sum-of-squares”)
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
R: var(x)
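The definitional formula translates directly into R. The sketch below assumes the rim diameters from the table in this section and uses a hypothetical helper, `sample_variance()`. It checks the result against the built-in `var()`, which also uses the n - 1 divisor.

```r
# Rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: sum-of-squares divided by n - 1
sample_variance <- function(x) {
  ss <- sum((x - mean(x))^2)  # sum of squared deviations from the mean
  ss / (length(x) - 1)        # divisor n - 1 for the sample variance
}

all.equal(sample_variance(unit1), var(unit1))  # TRUE
all.equal(sample_variance(unit2), var(unit2))  # TRUE
```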
variance
• computational form:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
• note: units of variance are squared…
• this makes variance hard to interpret
• ex.: projectile point sample: mean = 22.6 mm, variance = 38 mm²
• what does this mean?
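The computational form needs only the running sums of x and x², with no separate pass to find the mean first. Below is a short R check, assuming the unit 1 rim diameters from the table in this section, that it agrees with the built-in `var()`. `variance_computational()` is a hypothetical helper name.

```r
# Unit 1 rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Hypothetical helper: (sum of squares - (sum)^2 / n) / (n - 1)
variance_computational <- function(x) {
  n <- length(x)
  (sum(x^2) - sum(x)^2 / n) / (n - 1)
}

all.equal(variance_computational(unit1), var(unit1))  # TRUE (result is in cm^2)
```

In floating-point arithmetic the computational form can lose precision when the values are large relative to their spread, because the two sums nearly cancel. For that reason the deviation form is usually preferred in software.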
standard deviation
• square root of variance:
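$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$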
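In R the standard deviation is simply the square root of `var()`, available directly as `sd()`. The sketch below assumes the unit 1 rim diameters from the table in this section. It also recovers the projectile-point example: a variance of 38 mm² corresponds to a standard deviation of about 6.2 mm.

```r
# Unit 1 rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

s_unit1 <- sqrt(var(unit1))
all.equal(s_unit1, sd(unit1))  # TRUE; result is in cm, like the data

# Projectile-point example: variance of 38 mm^2
proj_sd <- sqrt(38)
round(proj_sd, 1)              # 6.2 (mm)
```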
standard deviation
• expressed in the same units as the base measurements
• ex.: projectile point sample: mean = 22.6 mm, standard deviation = 6.2 mm
• mean +/- sd (16.4 to 28.8 mm)
  – should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
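A sketch of the mean +/- sd interval in R. It reproduces the projectile-point interval and then applies the same idea to the unit 1 rim diameters from the table in this section. `mean_pm_sd()` is a hypothetical helper. For unit 1 the outlier inflates the standard deviation so much that every other rim falls inside the interval, which illustrates why the interval should be read "barring major effects of outliers".

```r
# Hypothetical helper: the interval from mean - sd to mean + sd
mean_pm_sd <- function(m, s) c(lower = m - s, upper = m + s)

# Projectile-point example from the slide (mm)
proj <- mean_pm_sd(22.6, 6.2)
print(proj)  # lower 16.4, upper 28.8

# Unit 1 rim diameters (cm), transcribed from the rim diameter table in this section
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
iv <- mean_pm_sd(mean(unit1), sd(unit1))
within <- unit1 >= iv[["lower"]] & unit1 <= iv[["upper"]]

mean(within)    # proportion of rims inside mean +/- sd
unit1[!within]  # 26.9: the only rim outside the interval
```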