Distribution and Outliers
Screening
(Significant Effects)
Hadlum vs Hadlum
A univariate example that illustrates deviation from a normal pattern.
Duration of Pregnancy
Barnett (1978) Appl. Statist. 27, 242-250
[Figure: normal duration of pregnancy; vertical axis: percentage (n = 13634)]
[Figure: the same distribution with Hadlum Jr. marked]
Comparison of Hadlum Jr. to normal pattern
Model validation
Deviation = observed value - predicted value
residual = measurement - model: $y - \hat{y}$
Normally distributed population
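$p(y) = \mathrm{const} \cdot e^{-\frac{(y-\mu)^2}{2\sigma^2}}$
$P(y_i) = \int_{-\infty}^{y_i} p(y)\,dy$
[Figure: normal density curve with P(y_i) marked]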
Normal Population - Cumulative plots
Traditional Graphical paper
Normal distribution paper
$\%P(y_i) = 100 \cdot P(y_i)$
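The relation between the density, the cumulative probability and the percent scale can be checked numerically. The following is a minimal sketch, assuming a standard normal population (mu = 0, sigma = 1 are hypothetical values chosen for illustration) and writing the constant out as 1/(sigma*sqrt(2*pi)). The function names are illustrative and not from the slides.

```python
import math
from statistics import NormalDist


def density(y, mu, sigma):
    """Normal density p(y), with the normalising constant written out."""
    const = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return const * math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2))


def cumulative(y_i, mu, sigma, steps=20000):
    """P(y_i): trapezoidal integral of p(y) up to y_i.

    The lower limit mu - 10*sigma stands in for minus infinity.
    """
    lower = mu - 10.0 * sigma
    if y_i <= lower:
        return 0.0
    h = (y_i - lower) / steps
    total = 0.5 * (density(lower, mu, sigma) + density(y_i, mu, sigma))
    for k in range(1, steps):
        total += density(lower + k * h, mu, sigma)
    return total * h


mu, sigma = 0.0, 1.0  # hypothetical population parameters
for y_i in (-1.0, 0.0, 1.0, 2.0):
    p_numeric = cumulative(y_i, mu, sigma)
    p_library = NormalDist(mu, sigma).cdf(y_i)  # reference value
    print(f"y_i={y_i:5.1f}  %P={100.0 * p_numeric:7.3f}  (library: {100.0 * p_library:7.3f})")
```

Normal distribution paper rescales the percent axis so that these cumulative percentages fall on a straight line when plotted against y.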
Normal plot
1) Sort the observations in increasing order
2) Let each observation represent a percent interval of the normal distribution equal to 100 / (number of observations)
If the observations are normally distributed, they plot like a straight line in the normal plot!
Deviation from straight line implies outlying observations or non-normal distribution
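The two steps of the normal plot can be sketched in a few lines. This is a rough illustration, not the construction used on the slides: it assumes each observation is plotted at the midpoint of its percent interval (a common convention, but an assumption here), and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(observations):
    """Return (value, cumulative percent, normal quantile) for each observation."""
    ordered = sorted(observations)          # step 1: increasing order
    n = len(ordered)
    width = 100.0 / n                       # step 2: percent interval per observation
    points = []
    for rank, value in enumerate(ordered, start=1):
        percent = (rank - 0.5) * width      # assumption: midpoint of the interval
        quantile = NormalDist().inv_cdf(percent / 100.0)
        points.append((value, percent, quantile))
    return points


sample = [4.1, 5.3, 4.8, 5.0, 6.2, 4.6, 5.5, 5.1]  # hypothetical data
for value, percent, quantile in normal_plot_points(sample):
    print(f"{value:5.1f}  {percent:6.2f} %  z = {quantile:6.3f}")
```

Plotting the values against the quantiles gives the same picture as normal distribution paper: roughly normal data fall near a straight line.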
Skull capacity of the Maoris
Skulls from a cemetery
1230 1380 1364 1630 1410 1348 1260 1420 1360 1540 1380 1445 1545 1318 1470 1410 1378
Karl Pearson (1931) Tables for Statisticians and Biometricians, Biometric Lab., London
Maximum: 1630
Is the largest skull from a Maori?
Hypothesis:
The Maoris have less skull capacity than the whites - the largest skull is a contaminant: shipwrecked sailor or missionary?
Probability plot
Skull Capacity
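As a numerical companion to the probability plot, one can ask how far the largest skull lies from the remaining sixteen in units of their standard deviation. This is only a sketch of one possible check, not the method used on the slides, and it does not by itself decide whether the point is a contaminant.

```python
from statistics import mean, stdev

# Skull capacities as listed above (17 skulls).
skull_capacity = [
    1230, 1380, 1364, 1630, 1410, 1348, 1260, 1420, 1360,
    1540, 1380, 1445, 1545, 1318, 1470, 1410, 1378,
]

largest = max(skull_capacity)
rest = sorted(skull_capacity)[:-1]   # all skulls except the largest

# Distance of the largest value from the others, in their standard deviations.
distance = (largest - mean(rest)) / stdev(rest)

print(f"largest = {largest}")
print(f"mean of the rest = {mean(rest):.1f}, sd of the rest = {stdev(rest):.1f}")
print(f"distance = {distance:.2f} standard deviations")
```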
What to do with the damned point destroying the curve?
The easy way: Erase it!
Example
P. Garrigues
R. De Sury
M. L. Angelin
J. Bellocq
J. L. Oudin
M. Ewald
Geochemica et Cosmochimica Acta, 52, (1988) 375-384
Data
Sample No.   Predictor (Depth)   Response (MP/P)
 2           1290                0.92
 3           1590                1.16
 4           1770                1.30
 5           1920                2.09
 6           2250                1.80
 7           2480                1.94
 8           2670                1.50
 9           2805                2.38
10           3015                2.61
11           3180                2.57
r² = 0.98
[Figure: response plotted against depth; two points are marked "?"]
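To see which samples a straight line has trouble with, one can fit an ordinary least-squares line to the ten samples and inspect the residuals. The sketch below is an illustration under that assumption. The helper `fit_line` and the choice to set aside the two largest residuals are not taken from the slides.

```python
import math

# (sample no., depth, MP/P ratio) as listed in the data table.
samples = [
    (2, 1290, 0.92), (3, 1590, 1.16), (4, 1770, 1.30), (5, 1920, 2.09),
    (6, 2250, 1.80), (7, 2480, 1.94), (8, 2670, 1.50), (9, 2805, 2.38),
    (10, 3015, 2.61), (11, 3180, 2.57),
]


def fit_line(points):
    """Ordinary least squares for (x, y) pairs: slope, intercept, correlation r."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


slope, intercept, r_all = fit_line([(d, y) for _, d, y in samples])

# Residual = measurement - model, per sample number.
residuals = {no: y - (intercept + slope * d) for no, d, y in samples}

# Illustrative choice: treat the two largest absolute residuals as suspects.
suspects = sorted(residuals, key=lambda no: abs(residuals[no]), reverse=True)[:2]
kept = [(d, y) for no, d, y in samples if no not in suspects]
_, _, r_kept = fit_line(kept)

print(f"r (all ten samples)      = {r_all:.3f}")
print(f"suspect samples          = {sorted(suspects)}")
print(f"r (suspects set aside)   = {r_kept:.3f}")
```

A higher correlation after setting points aside says nothing about why they deviate; that question still needs the chemistry.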
Robust regression?
Two outliers
Useful tool to avoid thinking?
A sloppy data analyst can find relief in robust regression
Result of “pooled” regression
r=0.995
Observation
r=0.865
Two phenomena influencing the ratio (predictor) (MP/P)
No prediction possible!
Parallel displacement - perfect result for the one who wants to be “straight-lined”
Let the computer restore harmony and beauty