Date post: | 01-Jun-2018 |
Upload: | dyonisius-h-s-jewaru |
8/9/2019 DISTRIBUSI Multivariate Normal
THE MULTIVARIATE NORMAL DISTRIBUTION
Session 5 (mtv)
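Multivariate Normal Distribution
• X is a random vector that follows a p-variate normal distribution with mean vector μ and variance-covariance matrix Σ if its joint pdf can be written as
$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{-\tfrac{1}{2}(\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu)\right\}$
with μ a p × 1 vector and Σ a p × p symmetric positive definite matrix.
• Notation: X ~ N_p(μ, Σ) (p-variate normal)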
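The joint pdf above can be evaluated directly. The following is a minimal sketch, assuming NumPy is available; the function name `mvn_pdf` and the numeric values of `mu` and `sigma` are illustrative assumptions, not part of the slides.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N_p(mu, sigma) at x, computed directly from the formula."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Squared generalized distance (x - mu)' Sigma^{-1} (x - mu)
    d2 = diff @ np.linalg.solve(sigma, diff)
    norm_const = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * d2) / norm_const

# Illustrative values only (not taken from the slides)
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
peak = mvn_pdf(mu, mu, sigma)  # the density is largest at x = mu
```

For p = 1 and Σ = [1] the same function reduces to the familiar univariate N(0, 1) density.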
Multivariate Normal Distribution
• n = 1: univariate normal
• n = 2: bivariate normal
N2 (bivariate normal)
Surface Plots of the bivariate
Normal distribution
Contours of the Bivariate Normal Distribution
All pairs of points (x, y) that share the same value of f(x, y) form a contour. In p-dimensional space, a contour is the set of all x such that
$(\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu) = c^2$
[Figure: bivariate normal response surface f(X1, X2) over the (X1, X2) plane, with its contours]
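The Contours
The contour for a constant c,
$(\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu) = c^2$
forms an ellipsoid centered at μ with axes
$\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$, where $\Sigma\mathbf{e}_i = \lambda_i\mathbf{e}_i$ for $i = 1, \ldots, p$
[Figure: contour of f(X1, X2) in the (X1, X2) plane, centered at μ = (μ1, μ2)', with axes $\pm c\sqrt{\lambda_1}\,\mathbf{e}_1$ and $\pm c\sqrt{\lambda_2}\,\mathbf{e}_2$]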
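The axes of a contour follow from the eigen-decomposition of Σ. A small sketch, assuming NumPy; the helper name `contour_axes` and the example matrix are hypothetical and only illustrate the rule ±c√λ_i e_i.

```python
import numpy as np

def contour_axes(sigma, c):
    """Half-axis lengths c*sqrt(lambda_i) and directions e_i of the contour
    (x - mu)' Sigma^{-1} (x - mu) = c^2."""
    # eigh is meant for symmetric matrices; eigenvalues are returned in ascending order
    eigvals, eigvecs = np.linalg.eigh(np.asarray(sigma, dtype=float))
    half_lengths = c * np.sqrt(eigvals)
    return half_lengths, eigvecs  # column i of eigvecs pairs with half_lengths[i]

sigma = np.array([[2.0, 1.0], [1.0, 2.0]])  # illustrative values, not from the slides
half_lengths, directions = contour_axes(sigma, c=1.0)

# The end point of the major axis (measured from mu) lies on the contour itself
end_point = half_lengths[1] * directions[:, 1]
on_contour = end_point @ np.linalg.solve(sigma, end_point)  # should equal c^2 = 1
```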
Contour Plots of the bivariate
Normal distribution
Scatter Plots of data from the
bivariate Normal distribution
The general form of the contours for a bivariate normal probability distribution with equal variances (σ11 = σ22) can be derived as follows.
First find the eigenvalues of Σ:
$|\Sigma - \lambda I| = 0$, or
$0 = \begin{vmatrix} \sigma_{11}-\lambda & \sigma_{12} \\ \sigma_{12} & \sigma_{11}-\lambda \end{vmatrix} = (\sigma_{11}-\lambda)^2 - \sigma_{12}^2 = (\lambda - \sigma_{11} - \sigma_{12})(\lambda - \sigma_{11} + \sigma_{12})$
so $\lambda_1 = \sigma_{11} + \sigma_{12}$ and $\lambda_2 = \sigma_{11} - \sigma_{12}$.
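Then find the (normalized) eigenvectors of Σ:
$\Sigma\mathbf{e}_i = \lambda_i\mathbf{e}_i$, or, for $\lambda_1 = \sigma_{11} + \sigma_{12}$,
$\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{11} \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = (\sigma_{11}+\sigma_{12})\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$
or
$\sigma_{11}e_1 + \sigma_{12}e_2 = (\sigma_{11}+\sigma_{12})\,e_1$
$\sigma_{12}e_1 + \sigma_{11}e_2 = (\sigma_{11}+\sigma_{12})\,e_2$
which implies $e_1 = e_2$, or $\mathbf{e}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$,
and $\lambda_2 = \sigma_{11} - \sigma_{12}$ similarly leads to $\mathbf{e}_2 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$.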
- For a positive covariance σ12, λ1 = σ11 + σ12 is the largest (first) eigenvalue, and its eigenvector lies along the 45° line through the centroid μ.
[Figure: contour $c^2 = (\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu)$ for constant c in the (X1, X2) plane; the half-axis of length $c\sqrt{\sigma_{11}+\sigma_{12}}$ lies along the 45° line and the half-axis of length $c\sqrt{\sigma_{11}-\sigma_{12}}$ is perpendicular to it]
What happens if the covariance is negative? Why?
- For a negative covariance σ12, the second eigenvalue λ2 = σ11 - σ12 is the largest, and its associated eigenvector lies at right angles to the 45° line running through the centroid μ.
[Figure: contour $c^2 = (\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu)$ for constant c in the (X1, X2) plane; the half-axis of length $c\sqrt{\sigma_{11}-\sigma_{12}}$ is perpendicular to the 45° line and the half-axis of length $c\sqrt{\sigma_{11}+\sigma_{12}}$ lies along it]
What do you suppose happens when the covariance is zero? Why?
- For a covariance σ12 of zero, the two eigenvalues are equal (λ1 = λ2 = σ11), so the contours are circles and the eigenvectors are no longer unique. One valid choice has one eigenvector along the 45° line running through the centroid μ and the other perpendicular to it.
[Figure: circular contour $c^2 = (\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu)$ for constant c in the (X1, X2) plane, with both half-axes of length $c\sqrt{\sigma_{11}}$]
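The three covariance cases can be checked numerically against the closed-form eigenpairs for equal variances. This is a sketch under assumed values (unit variances, covariances of ±0.6 and 0); the function name `equal_variance_eigen` is hypothetical.

```python
import numpy as np

def equal_variance_eigen(s11, s12):
    """Closed-form eigenpairs of [[s11, s12], [s12, s11]] for the equal-variance case."""
    lam1, lam2 = s11 + s12, s11 - s12
    e1 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # along the 45-degree line
    e2 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # perpendicular to it
    return (lam1, e1), (lam2, e2)

for s12 in (0.6, -0.6, 0.0):  # illustrative covariances
    sigma = np.array([[1.0, s12], [s12, 1.0]])
    (lam1, e1), (lam2, e2) = equal_variance_eigen(1.0, s12)
    # Each pair must satisfy Sigma e = lambda e
    assert np.allclose(sigma @ e1, lam1 * e1)
    assert np.allclose(sigma @ e2, lam2 * e2)
    if lam1 > lam2:
        major = "along the 45-degree line"
    elif lam2 > lam1:
        major = "perpendicular to the 45-degree line"
    else:
        major = "none (the contour is a circle)"
    print(f"s12 = {s12:+.1f}: lambda1 = {lam1:.1f}, lambda2 = {lam2:.1f}, major axis {major}")
```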
2. The density
$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu)}$
is symmetric along its constant density contours and is centered at μ, i.e., the mean is equal to the median.
3. Conditional distributions of the components of X are (multivariate) normal.
D. Some Important Results Regarding the Multivariate Normal Distribution
1. If X ~ N_p(μ, Σ), then any linear combination
$\mathbf{a}'\mathbf{X} = \sum_{i=1}^{p} a_i X_i \sim N(\mathbf{a}'\mu,\ \mathbf{a}'\Sigma\mathbf{a})$
Furthermore, if a'X ~ N(a'μ, a'Σa) for every a, then X ~ N_p(μ, Σ).
2. If X ~ N_p(μ, Σ), then any set of q linear combinations
$\mathbf{A}\mathbf{X} = \begin{bmatrix} \sum_{i=1}^{p} a_{1i}X_i \\ \sum_{i=1}^{p} a_{2i}X_i \\ \vdots \\ \sum_{i=1}^{p} a_{qi}X_i \end{bmatrix} \sim N_q(\mathbf{A}\mu,\ \mathbf{A}\Sigma\mathbf{A}')$
Furthermore, if d is a conformable vector of constants, then X + d ~ N_p(μ + d, Σ).
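The parameters of a set of linear combinations are just matrix products. A minimal sketch, assuming NumPy; `linear_map_params` and all numeric values below are illustrative assumptions.

```python
import numpy as np

def linear_map_params(A, mu, sigma):
    """Mean vector and covariance matrix of AX when X ~ N_p(mu, sigma)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    return A @ mu, A @ sigma @ A.T

# Illustrative values only (not taken from the slides)
mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, -1.0, 0.0],    # X1 - X2
              [0.0, 1.0, -1.0]])   # X2 - X3
mean_AX, cov_AX = linear_map_params(A, mu, sigma)
```

With a single row a' in place of A, the same call returns a'μ and the 1 × 1 matrix a'Σa of result 1.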
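3. If X ~ N_p(μ, Σ), then all subsets of X are (multivariate) normally distributed, i.e., for any partition
$\mathbf{X}_{(p\times 1)} = \begin{bmatrix} \mathbf{X}_1\ (q\times 1) \\ \mathbf{X}_2\ ((p-q)\times 1) \end{bmatrix},\quad \mu_{(p\times 1)} = \begin{bmatrix} \mu_1\ (q\times 1) \\ \mu_2\ ((p-q)\times 1) \end{bmatrix},\quad \Sigma_{(p\times p)} = \begin{bmatrix} \Sigma_{11}\ (q\times q) & \Sigma_{12}\ (q\times(p-q)) \\ \Sigma_{21}\ ((p-q)\times q) & \Sigma_{22}\ ((p-q)\times(p-q)) \end{bmatrix}$
then X1 ~ N_q(μ1, Σ11) and X2 ~ N_{p-q}(μ2, Σ22).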
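4. If X1 ~ N_{q1}(μ1, Σ11) and X2 ~ N_{q2}(μ2, Σ22) are independent, then
Cov(X1, X2) = Σ12 = 0
and if
$\begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{bmatrix} \sim N_{q_1+q_2}\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$
then X1 and X2 are independent if and only if Σ12 = 0,
and if X1 ~ N_{q1}(μ1, Σ11) and X2 ~ N_{q2}(μ2, Σ22) are independent, then
$\begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{bmatrix} \sim N_{q_1+q_2}\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \mathbf{0} \\ \mathbf{0} & \Sigma_{22} \end{bmatrix}\right)$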
5. If X ~ N_p(μ, Σ) and |Σ| > 0, then
$(\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu) \sim \chi^2_p$
and the N_p(μ, Σ) distribution assigns probability 1 - α to the solid ellipsoid
$\left\{\mathbf{x} : (\mathbf{x}-\mu)'\Sigma^{-1}(\mathbf{x}-\mu) \le \chi^2_p(\alpha)\right\}$
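The coverage statement can be checked by simulation. The sketch below assumes NumPy and is restricted to p = 2, where the upper-α chi-square point has the closed form -2 ln(α); the function name `inside_ellipsoid`, the seed, and the parameter values are illustrative assumptions.

```python
import math
import numpy as np

def inside_ellipsoid(x, mu, sigma, alpha):
    """True where (x - mu)' Sigma^{-1} (x - mu) <= chi2_p(alpha), for p = 2 only."""
    # For 2 degrees of freedom the upper-alpha chi-square point is -2 ln(alpha)
    cutoff = -2.0 * math.log(alpha)
    diff = np.atleast_2d(x) - mu
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
    return d2 <= cutoff

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])                   # illustrative parameters
sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
sample = rng.multivariate_normal(mu, sigma, size=20000)
coverage = inside_ellipsoid(sample, mu, sigma, alpha=0.10).mean()  # close to 0.90
```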
E. Sampling From a Multivariate Normal Distribution and Maximum Likelihood Estimation
For a k × k symmetric matrix A and a k × 1 vector x:
- x'Ax = tr(x'Ax) = tr(Axx')
- tr(A) = $\sum_{i=1}^{k}\lambda_i$, where λ_i, i = 1, …, k, are the eigenvalues of A
These two results can be used to simplify the joint density of n mutually independent random observations X_j, each with distribution N_p(μ, Σ). We first rewrite
$(\mathbf{x}_j-\mu)'\Sigma^{-1}(\mathbf{x}_j-\mu) = \mathrm{tr}\left[(\mathbf{x}_j-\mu)'\Sigma^{-1}(\mathbf{x}_j-\mu)\right] = \mathrm{tr}\left[\Sigma^{-1}(\mathbf{x}_j-\mu)(\mathbf{x}_j-\mu)'\right]$
Then we rewrite
$\sum_{j=1}^{n}(\mathbf{x}_j-\mu)'\Sigma^{-1}(\mathbf{x}_j-\mu) = \sum_{j=1}^{n}\mathrm{tr}\left[(\mathbf{x}_j-\mu)'\Sigma^{-1}(\mathbf{x}_j-\mu)\right]$
$= \sum_{j=1}^{n}\mathrm{tr}\left[\Sigma^{-1}(\mathbf{x}_j-\mu)(\mathbf{x}_j-\mu)'\right]$
$= \mathrm{tr}\left[\Sigma^{-1}\sum_{j=1}^{n}(\mathbf{x}_j-\mu)(\mathbf{x}_j-\mu)'\right]$
since the trace of the sum of matrices is equal to the sum of their individual traces.
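The trace rewriting is easy to confirm numerically. A sketch assuming NumPy, with randomly generated placeholder data (not from the slides) and a seeded generator for repeatability.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 3
x = rng.normal(size=(n, p))           # rows play the role of x_1, ..., x_n
mu = np.array([0.5, -0.2, 1.0])
B = rng.normal(size=(p, p))
sigma = B @ B.T + p * np.eye(p)       # a symmetric positive definite Sigma
sigma_inv = np.linalg.inv(sigma)

diff = x - mu
# Left side: the sum of the n quadratic forms (x_j - mu)' Sigma^{-1} (x_j - mu)
lhs = sum(d @ sigma_inv @ d for d in diff)
# Right side: tr[Sigma^{-1} * sum_j (x_j - mu)(x_j - mu)']; diff.T @ diff is that sum
rhs = np.trace(sigma_inv @ (diff.T @ diff))
```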
We can further state that
$\sum_{j=1}^{n}(\mathbf{x}_j-\mu)(\mathbf{x}_j-\mu)' = \sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}}+\bar{\mathbf{x}}-\mu)(\mathbf{x}_j-\bar{\mathbf{x}}+\bar{\mathbf{x}}-\mu)'$
$= \sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + \sum_{j=1}^{n}(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'$
$= \sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'$
because the crossproduct terms
$\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\bar{\mathbf{x}}-\mu)'$ and $\sum_{j=1}^{n}(\bar{\mathbf{x}}-\mu)(\mathbf{x}_j-\bar{\mathbf{x}})'$
are both matrices of zeros.
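Substitution of these two results yields an alternative expression of the joint density of a random sample from a p-dimensional normal population:
$f(\mathbf{x}_1,\ldots,\mathbf{x}_n) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'\right)\right]\right\}$
Substitution of the observed values x1, …, xn into the joint density yields the likelihood function for the corresponding sample X, which is often denoted as L(μ, Σ).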
So for observed values x1, …, xn that comprise a random sample X drawn from a p-dimensional normally distributed population, the likelihood function is
$L(\mu,\Sigma) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'\right)\right]\right\}$
Finally, note that we can express the exponent of the likelihood function in many ways. One alternate expression will be particularly convenient:
$\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'\right)\right]$
$= \mathrm{tr}\left[\Sigma^{-1}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\right] + n\,\mathrm{tr}\left[\Sigma^{-1}(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'\right]$
$= \mathrm{tr}\left[\Sigma^{-1}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\right] + n(\bar{\mathbf{x}}-\mu)'\Sigma^{-1}(\bar{\mathbf{x}}-\mu)$
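which, by another substitution, yields the likelihood function
$L(\mu,\Sigma) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma^{-1}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\right] - \tfrac{n}{2}(\bar{\mathbf{x}}-\mu)'\Sigma^{-1}(\bar{\mathbf{x}}-\mu)\right\}$
Again, keep in mind that we are pursuing estimates of μ and Σ that maximize the likelihood function.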
This result will also be helpful in deriving the maximum likelihood estimates of μ and Σ.
For a p × p symmetric positive definite matrix B and scalar b > 0, it follows that
$\frac{1}{|\Sigma|^{b}}\,e^{-\mathrm{tr}(\Sigma^{-1}\mathbf{B})/2} \le \frac{1}{|\mathbf{B}|^{b}}\,(2b)^{pb}\,e^{-bp}$
for all positive definite Σ of dimension p × p, with equality holding only for
$\Sigma = \frac{1}{2b}\,\mathbf{B}$
Now we are ready for maximum likelihood estimation of μ and Σ.
For a random sample X1, …, Xn from a normal population with mean μ and covariance Σ, the maximum likelihood estimators μ̂ and Σ̂ of μ and Σ are
$\hat{\mu} = \bar{\mathbf{X}},\qquad \hat{\Sigma} = \frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})' = \frac{n-1}{n}\,\mathbf{S}$
Their observed values for observed data x1, …, xn,
$\bar{\mathbf{x}}$ and $\frac{1}{n}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'$
are the maximum likelihood estimates of μ and Σ.
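The estimators above amount to a sample mean and a covariance matrix with divisor n. A minimal sketch, assuming NumPy; `mvn_mle` is a hypothetical helper and the four data rows are made up for illustration.

```python
import numpy as np

def mvn_mle(x):
    """Maximum likelihood estimates (mu_hat, sigma_hat); rows of x are observations."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    mu_hat = x.mean(axis=0)
    centered = x - mu_hat
    sigma_hat = centered.T @ centered / n   # divisor n, not n - 1
    return mu_hat, sigma_hat

x = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])  # made-up data
mu_hat, sigma_hat = mvn_mle(x)
S = np.cov(x, rowvar=False)   # sample covariance matrix, divisor n - 1
n = x.shape[0]
# sigma_hat should equal ((n - 1) / n) * S
```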
Note that the maximum of the likelihood is achieved at
$L(\hat{\mu},\hat{\Sigma}) = \frac{1}{(2\pi)^{np/2}}\,\frac{1}{|\hat{\Sigma}|^{n/2}}\,e^{-np/2}$
and since
$|\hat{\Sigma}| = \left(\frac{n-1}{n}\right)^{p}|\mathbf{S}|$
we have that
$L(\hat{\mu},\hat{\Sigma}) = (2\pi)^{-np/2}\,e^{-np/2}\left(\frac{n-1}{n}\right)^{-np/2}|\mathbf{S}|^{-n/2} = \text{constant}\times(\text{generalized variance})^{-n/2}$
where the generalized variance is |S|.
It can be shown that maximum likelihood estimators (or M…
It can also be shown that
$\bar{\mathbf{x}}$ and $(n-1)\mathbf{S}$ (or $\mathbf{S}$)
are sufficient for the multivariate normal joint density
$f(\mathbf{x}) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\,e^{-\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'\right)\right]/2}$
$= (2\pi)^{-np/2}|\Sigma|^{-n/2}\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\mu)(\bar{\mathbf{x}}-\mu)'\right)\right]\right\}$
i.e., the density depends on the entire set of observations x1, …, xn only through $\bar{\mathbf{x}}$ and $(n-1)\mathbf{S}$ (or $\mathbf{S}$).
Thus, we refer to X̄ and S as the sufficient statistics for the multivariate normal distribution.
Sufficient statistics contain all information necessary to evaluate a particular density for a given sample.
F. The Sampling Distributions of X̄ and S
The assumption that X1, …, Xn constitute a random sample from a normal population with mean μ and covariance Σ completely determines the sampling distributions of X̄ and S.
For a univariate normal distribution, X̄ is normal with
mean μ and variance $\frac{1}{n}\sigma^2 = \frac{\text{population variance}}{\text{sample size}}$
Analogously, for the multivariate (p ≥ 2) case (i.e., X is normal with mean μ and covariance Σ), X̄ is normal with
mean μ and covariance matrix $\frac{1}{n}\Sigma$
Similarly, for a random sample X1, …, Xn from a univariate normal distribution with mean μ and variance σ²,
$(n-1)s^2 = \sum_{j=1}^{n}(X_j-\bar{X})^2 \sim \sigma^2\chi^2_{n-1}$, the distribution of $\sum_{j=1}^{n-1}Z_j^2$
where $Z_j \sim N(0,\sigma^2),\ j = 1,\ldots,n-1$.
Analogously, for the multivariate (p ≥ 2) case (i.e., X is normal with mean μ and covariance Σ), (n - 1)S is Wishart distributed with n - 1 degrees of freedom (denoted $W_{n-1}(\cdot\,|\,\Sigma)$), where
$W_m(\cdot\,|\,\Sigma)$ = Wishart distribution with m degrees of freedom = distribution of $\sum_{j=1}^{m}\mathbf{Z}_j\mathbf{Z}_j'$
where $\mathbf{Z}_j \sim N_p(\mathbf{0},\Sigma)$ independently.
Some important properties of the Wishart distribution:
- The Wishart distribution exists only if n > p
- If $\mathbf{A}_1 \sim W_{m_1}(\mathbf{A}_1\,|\,\Sigma)$ independently of $\mathbf{A}_2 \sim W_{m_2}(\mathbf{A}_2\,|\,\Sigma)$ (common covariance matrix), then
$\mathbf{A}_1+\mathbf{A}_2 \sim W_{m_1+m_2}(\mathbf{A}_1+\mathbf{A}_2\,|\,\Sigma)$
- and
$\mathbf{C}\mathbf{A}_1\mathbf{C}' \sim W_{m_1}(\mathbf{C}\mathbf{A}_1\mathbf{C}'\,|\,\mathbf{C}\Sigma\mathbf{C}')$
- When it exists, the Wishart distribution has a density of
$w_{n-1}(\mathbf{A}\,|\,\Sigma) = \frac{|\mathbf{A}|^{(n-p-2)/2}\,e^{-\mathrm{tr}(\mathbf{A}\Sigma^{-1})/2}}{2^{p(n-1)/2}\,\pi^{p(p-1)/4}\,|\Sigma|^{(n-1)/2}\prod_{i=1}^{p}\Gamma\left(\tfrac{1}{2}(n-i)\right)}$
for a symmetric positive definite matrix A.
- The …
Multivariate implications of the …
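These statements are sometimes written as
$P\left[-\varepsilon \le \bar{\mathbf{X}}-\mu \le \varepsilon\right] \to 1$ as $n \to \infty$
and
$P\left[-\varepsilon \le \mathbf{S}-\Sigma \le \varepsilon\right] \to 1$ as $n \to \infty$
or similarly
$P\left[-\varepsilon \le \mathbf{S}_n-\Sigma \le \varepsilon\right] \to 1$ as $n \to \infty$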
- These results can be used to support the (Multivariate) Central …
Because the sample covariance matrix S (or S_n) converges to the population covariance matrix Σ so quickly (i.e., at relatively small values of n - p), we often substitute the sample covariance for the population covariance with little concern for the ramifications, so we have
$\bar{\mathbf{X}} \sim N_p\left(\mu,\ \tfrac{1}{n}\mathbf{S}\right)$ (approximately)
for n large relative to p.
This can be restated as
$\sqrt{n}\,(\bar{\mathbf{X}}-\mu) \sim N_p(\mathbf{0},\ \mathbf{S})$ (approximately)
again for n large relative to p.
One final important result due to the C…
G. Assessing the Assumption of Normality
There are two general circumstances in multivariate statistics under which the assumption of multivariate normality is crucial:
- the technique to be used relies directly on the raw observations X_j
- the technique to be used relies directly on the sample mean vector X̄ (including those which rely on distances of the form n(X̄ - μ)'S⁻¹(X̄ - μ))
In either of these situations, the quality of inferences to be made depends on how closely the true parent population resembles the assumed multivariate normal form.
Based on the properties of the Multivariate Normal Distribution, we know
- all linear combinations of the components of a multivariate normal vector are normal
- the contours of the multivariate normal density are concentric ellipsoids
These facts suggest investigation of the following questions (in one or two dimensions):
- Do the marginal distributions of the elements of X appear normal? What about a few linear combinations?
- Do the bivariate scatterplots appear ellipsoidal?
- Are there any unusual looking observations (outliers)?
Tools frequently used for assessing univariate normality include
- the empirical rule
  $P(\mu - 1\sigma \le x \le \mu + 1\sigma) \approx 0.68$
  $P(\mu - 2\sigma \le x \le \mu + 2\sigma) \approx 0.95$
  $P(\mu - 3\sigma \le x \le \mu + 3\sigma) \approx 0.997$
- dot plots (for small sample sets) and histograms or stem and leaf plots (for larger samples)
- goodness-of-fit tests such as the Chi-Square GOF Test and the Kolmogorov-Smirnov Test
- the test developed by Shapiro and Wilk [1965], called the Shapiro-Wilk test
- Q-Q plots (of the sample quantiles against the expected quantile for each observation given normality)
Example: suppose we had the following fifteen (ordered) sample observations on some random variable X:
Ordered observations x(j):
1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76, 6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42
Do these data support the assertion that they were drawn from a normal parent population?
In order to assess normality by the empirical rule, we need to compute the generalized distance from the centroid (convert the data to a standard normal random variable). For our data we have
x̄ = …
so the corresponding standard normal values for our data are …
Nine of the observations (or 60%) lie within one standard deviation of the mean, and all fifteen of the observations lie within two standard deviations of the mean. Does this support the assertion that they were drawn from a normal parent population?
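The empirical-rule count can be reproduced with the standard library. This is a sketch, not the slides' own computation: `empirical_rule_counts` is a hypothetical helper, and the data are the fifteen ordered observations as read from the example table (the eighth value is a reading of a damaged entry).

```python
from statistics import mean, stdev

def empirical_rule_counts(xs):
    """Count observations within 1 and within 2 sample standard deviations of the mean."""
    m, s = mean(xs), stdev(xs)            # stdev uses the n - 1 divisor
    z = [(x - m) / s for x in xs]         # standardized values
    within_1 = sum(abs(v) <= 1 for v in z)
    within_2 = sum(abs(v) <= 2 for v in z)
    return within_1, within_2

# Observations as read from the example (assumed transcription)
x = [1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76,
     6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42]
within_1, within_2 = empirical_rule_counts(x)
share_within_1 = within_1 / len(x)        # compare with the nominal 0.68
```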
A simple dot plot could look like this:
[Figure: dot plot of the fifteen observations on an axis running from -1 to 11]
This doesn't seem to tell us much (of course, fifteen data points isn't much to go on).
How about a histogram?
[Figure: histogram of absolute frequency by class: 0-2, 2-4, 4-6, 6-8, 8-10]
This doesn't seem to tell us much either!
We could use SAS to calculate the Shapiro-Wilk test statistic and corresponding p-value:
    DATA stuff;
    INPUT x;
    LABEL x='Observed Values of X';
    CARDS;
    1.43
    …
SAS output (tests for normality):
    Test                  Statistic           p Value
    Shapiro-Wilk          W      0.935851     Pr < W      0.3331
    Kolmogorov-Smirnov    D      0.159493     Pr > D      >0.1500
    Cramer-von Mises      W-Sq   0.058767     Pr > W-Sq   >0.2500
    Anderson-Darling      A-Sq   0.36615      Pr > A-Sq   >0.2500
[Figure: SAS stem-and-leaf display, boxplot, and normal probability plot of the observations]
Or a Q-Q plot
- put the observed values in ascending order; call these the x(j)
- calculate the continuity corrected cumulative probability level (j - 0.5)/n for the sample data
- find the standard normal quantiles (values of the N(0,1) distribution) that have a cumulative probability of level (j - 0.5)/n; call these the q(j), i.e., find q(j) such that
$P\left(Z \le q_{(j)}\right) = \int_{-\infty}^{q_{(j)}}\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz = p_{(j)} = \frac{j-\frac{1}{2}}{n}$
- plot the pairs (q(j), x(j)). If the points lie on or near a straight line, the observations support the contention that they could have been drawn from a normal parent population.
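The four steps above can be sketched with the Python standard library, where `statistics.NormalDist().inv_cdf` supplies the standard normal quantiles. The function name `qq_pairs` and the five-value sample are illustrative assumptions.

```python
from statistics import NormalDist

def qq_pairs(xs):
    """Return the (q_(j), x_(j)) pairs for a normal Q-Q plot."""
    ordered = sorted(xs)                       # step 1: ascending order
    n = len(ordered)
    std_normal = NormalDist()                  # N(0, 1)
    pairs = []
    for j, x in enumerate(ordered, start=1):
        level = (j - 0.5) / n                  # step 2: continuity-corrected level
        q = std_normal.inv_cdf(level)          # step 3: standard normal quantile
        pairs.append((q, x))                   # step 4: pair to be plotted
    return pairs

pairs = qq_pairs([3.1, 0.4, 2.2, 1.7, 5.0])    # made-up sample
```

Passing the pairs to any plotting tool and looking for a roughly straight line completes the check.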
The results of calculations for the Q-Q plot look like this:
Ordered observations x(j)   Adjusted probability level (j - 0.5)/n   Standard normal quantiles q(j)
1.43                        0.033                                    -1.834
1.62                        0.100                                    -1.282
2.46                        0.167                                    -0.967
2.48                        0.233                                    -0.728
2.97                        0.300                                    -0.524
4.03                        0.367                                    -0.341
4.47                        0.433                                    -0.168
5.76                        0.500                                     0.000
6.61                        0.567                                     0.168
6.68                        0.633                                     0.341
6.79                        0.700                                     0.524
7.46                        0.767                                     0.728
7.88                        0.833                                     0.967
8.92                        0.900                                     1.282
9.42                        0.967                                     1.834
…and the resulting Q-Q plot looks like this:
[Figure: Q-Q plot; horizontal axis: standard normal quantiles q(j); vertical axis: observed values x(j)]
There don't appear to be great departures from the straight line drawn through the points, but it doesn't fit terribly well, either…
For our previous example, the intermediate calculations are given in the table below:
x(j) - x̄    (x(j) - x̄)²    q(j) - q̄    (q(j) - q̄)²    (x(j) - x̄)(q(j) - q̄)
-3.83       14.697         -1.834      3.363          7.031
-3.65       13.314         -1.282      1.642          4.676
-2.80        7.855         -0.967      0.936          2.711
-2.79        7.777         -0.728      0.530          2.030
-2.30        5.270         -0.524      0.275          1.204
-1.23        1.514         -0.341      0.116          0.419
-0.80        0.637         -0.168      0.028          0.134
 0.49        0.244          0.000      0.000          0.000
 1.35        1.810          0.168      0.028          0.226
 1.41        2.001          0.341      0.116          0.482
 1.52        2.315          0.524      0.275          0.798
 2.19        4.808          0.728      0.530          1.596
 2.62        6.860          0.967      0.936          2.534
 3.66       13.387          1.282      1.642          4.689
 4.15       17.235          1.834      3.363          7.613
Total:
 0.00       99.724          0.000     13.781         36.143
Evaluation of Pearson's correlation coefficient between q(j) and x(j) yields
$r_Q = \frac{\sum_{j=1}^{n}(x_{(j)}-\bar{x})(q_{(j)}-\bar{q})}{\sqrt{\sum_{j=1}^{n}(x_{(j)}-\bar{x})^2}\,\sqrt{\sum_{j=1}^{n}(q_{(j)}-\bar{q})^2}} = \frac{36.143}{\sqrt{99.724}\,\sqrt{13.781}} = 0.975$
The sample size is n = 15, so critical points for the test of normality are 0.9503 at α = 0.10, 0.9389 at α = 0.05, and 0.9126 at α = 0.01. The hypothesis of normality is rejected when r_Q falls below the critical point; since r_Q exceeds all three critical points, we do not reject the hypothesis of normality at any of these significance levels.
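The Q-Q correlation test can be sketched with the standard library. The function name `qq_correlation` is hypothetical; the data are the fifteen observations as read from the example (the eighth value is a reading of a damaged entry), and the critical points are those quoted on the slide for n = 15.

```python
from math import sqrt
from statistics import NormalDist, mean

def qq_correlation(xs):
    """Pearson correlation between ordered data and standard normal quantiles."""
    ordered = sorted(xs)
    n = len(ordered)
    q = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    x_bar, q_bar = mean(ordered), mean(q)
    sxq = sum((x - x_bar) * (qj - q_bar) for x, qj in zip(ordered, q))
    sxx = sum((x - x_bar) ** 2 for x in ordered)
    sqq = sum((qj - q_bar) ** 2 for qj in q)
    return sxq / sqrt(sxx * sqq)

# Observations as read from the example (assumed transcription)
x = [1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76,
     6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42]
r_q = qq_correlation(x)

# Critical points for n = 15 as quoted on the slide; reject normality if r_q is smaller
critical = {0.10: 0.9503, 0.05: 0.9389, 0.01: 0.9126}
reject = {alpha: r_q < point for alpha, point in critical.items()}
```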
When addressing the issue of multivariate normality, these tools aid in assessment of normality for the univariate marginal distributions. However, we should also consider bivariate marginal distributions (each of which should be normal if the overall joint distribution is multivariate normal).
The methods most commonly used for assessing bivariate normality are
- scatter plots
- Chi-Square plots
Example: suppose we had the following fifteen (ordered) sample observations on some random variables X1 and X2:
x_j1     x_j2
1.43    -0.69
1.62    -5.00
2.46    -1.13
2.48    -5.20
2.97    -6.39
4.03     2.87
4.47    -7.88
5.76    -3.97
6.61     2.32
6.68    -3.24
6.79    -3.56
7.46     1.61
7.88    -1.87
8.92    -6.60
9.42    -7.64
Do these data support the assertion that they were drawn from a bivariate normal parent population?
The scatter plot of pairs (x1, x2) supports the assertion that these data were drawn from a bivariate normal distribution (and that they have little or no correlation).
[Figure: scatter plot of X2 against X1]
To create a Chi-Square plot, we will need to calculate the squared generalized distance from the centroid for each observation x_j:
$d_j^2 = (\mathbf{x}_j-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}_j-\bar{\mathbf{x}}),\quad j = 1,\ldots,n$
For our bivariate data we have …
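The squared generalized distances can be computed in a few lines. A sketch assuming NumPy; `squared_generalized_distances` is a hypothetical helper, and the data pairs are a reading of the damaged example table, so the resulting distances may differ slightly from the rounded values on the slides.

```python
import numpy as np

def squared_generalized_distances(x):
    """d_j^2 = (x_j - x_bar)' S^{-1} (x_j - x_bar) for each row x_j of x."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    S = np.cov(x, rowvar=False)                  # sample covariance, divisor n - 1
    solved = np.linalg.solve(S, centered.T).T    # rows are S^{-1} (x_j - x_bar)
    return np.einsum("ij,ij->i", centered, solved)

# Bivariate observations as read from the example (assumed transcription)
data = np.array([
    [1.43, -0.69], [1.62, -5.00], [2.46, -1.13], [2.48, -5.20], [2.97, -6.39],
    [4.03,  2.87], [4.47, -7.88], [5.76, -3.97], [6.61,  2.32], [6.68, -3.24],
    [6.79, -3.56], [7.46,  1.61], [7.88, -1.87], [8.92, -6.60], [9.42, -7.64],
])
d2 = squared_generalized_distances(data)
```

A useful sanity check is that the distances always sum to (n - 1)p, here 14 × 2 = 28.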
…so the squared generalized distances from the centroid are:
x_j1     x_j2     d_j²
1.43    -0.69    2.400
1.62    -5.00    2.279
2.46    -1.13    1.336
2.48    -5.20    1.548
2.97    -6.39    1.739
4.03     2.87    2.955
4.47    -7.88    2.005
5.76    -3.97    0.090
6.61     2.32    2.737
6.68    -3.24    0.281
6.79    -3.56    0.333
7.46     1.61    2.622
7.88    -1.87    1.138
8.92    -6.60    2.686
9.42    -7.64    3.819
If we order the observations relative to their squared generalized distances:
x_j1     x_j2     d_j²
5.76    -3.97    0.090
6.68    -3.24    0.281
6.79    -3.56    0.333
7.88    -1.87    1.138
2.46    -1.13    1.336
2.48    -5.20    1.548
2.97    -6.39    1.739
4.47    -7.88    2.005
1.62    -5.00    2.279
1.43    -0.69    2.400
7.46     1.61    2.622
8.92    -6.60    2.686
6.61     2.32    2.737
4.03     2.87    2.955
9.42    -7.64    3.819
We then find the corresponding 100(j - 0.5)/n percentile
$q_{c,2}\left(\frac{j-0.5}{n}\right)$
of the Chi-Square distribution with p degrees of freedom.
Now we create a scatterplot of the pairs (d²(j), q_{c,p}((j - 0.5)/n)). If these points lie on a straight line, the data support the assertion that they were drawn from a bivariate normal parent population.
x_j1     x_j2     d²(j)    (j - 0.5)/n    q_{c,2}((j - 0.5)/n)
5.76    -3.97    0.090    0.033          0.068
6.68    -3.24    0.281    0.100          0.211
6.79    -3.56    0.333    0.167          0.365
7.88    -1.87    1.138    0.233          0.531
2.46    -1.13    1.336    0.300          0.713
2.48    -5.20    1.548    0.367          0.914
2.97    -6.39    1.739    0.433          1.136
4.47    -7.88    2.005    0.500          1.386
1.62    -5.00    2.279    0.567          1.672
1.43    -0.69    2.400    0.633          2.007
7.46     1.61    2.622    0.700          2.408
8.92    -6.60    2.686    0.767          2.911
6.61     2.32    2.737    0.833          3.584
4.03     2.87    2.955    0.900          4.605
9.42    -7.64    3.819    0.967          6.802
These data don't seem to support the assertion that they were drawn from a bivariate normal parent population…
[Figure: Chi-Square plot of the ordered distances d²(j) against the quantiles q_{c,2}((j - 0.5)/n), with possible outliers marked]
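The coordinates of a Chi-Square plot for p = 2 need no statistical library, because the chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - prob). The helper names below are hypothetical, and the distances are the ordered d² values as read from the example table.

```python
import math

def chi2_quantile_2df(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - prob)

def chi_square_plot_pairs(d2):
    """Pairs (d^2_(j), q_{c,2}((j - 0.5)/n)) for a Chi-Square plot with p = 2."""
    ordered = sorted(d2)
    n = len(ordered)
    return [(dj, chi2_quantile_2df((j - 0.5) / n))
            for j, dj in enumerate(ordered, start=1)]

# Squared generalized distances as read from the example (assumed transcription)
d2 = [0.090, 0.281, 0.333, 1.138, 1.336, 1.548, 1.739, 2.005,
      2.279, 2.400, 2.622, 2.686, 2.737, 2.955, 3.819]
pairs = chi_square_plot_pairs(d2)
```

For p > 2 the closed form no longer applies and a chi-square quantile function from a statistics library would be needed instead.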
Some suggest also looking to see if roughly half the squared distances d_j² are less than or equal to q_{c,p}(0.50) (i.e., lie within the ellipsoid containing 50% of all potential p-dimensional observations).
For our example, 7 of our fifteen observations (about 46.67%) have squared distances less than q_{c,p}(0.50) = 1.386 standardized units from the centroid.
Note that the Chi-Square plot can easily be extended to p > 2 dimensions.
Note also that some researchers also calculate the correlation between d²(j) and q_{c,p}((j - 0.5)/n). For our example this is 0.895.
H. Outlier Detection
Detecting outliers (extreme or unusual observations) in p > 2 dimensions is very tricky. Consider the following situation:
[Figure: 90% confidence ellipsoid in the (X1, X2) plane together with the separate 90% confidence intervals for X1 and X2]
A strategy for multivariate outlier detection: …
Here are calculated standardized values (z_ji's) and squared generalized distances (d_j²'s) for our previous data. One observation is flagged on the slide as looking a little unusual in p = 2 dimensions.
x_j1     z_j1      x_j2     z_j2      d_j²
5.76     0.185    -3.97    -0.250    0.090
6.68     0.530    -3.24    -0.043    0.281
6.79     0.570    -3.56    -0.131    0.333
7.88     0.981    -1.87     0.347    1.138
2.46    -1.050    -1.13     0.556    1.336
2.48    -1.045    -5.20    -0.599    1.548
2.97    -0.860    -6.39    -0.936    1.739
4.47    -0.299    -7.88    -1.359    2.005
1.62    -1.367    -5.00    -0.541    2.279
1.43    -1.436    -0.69     0.681    2.400
7.46     0.822     1.61     1.333    2.622
8.92     1.371    -6.60    -0.994    2.686
6.61     0.504     2.32     1.536    2.737
4.03    -0.461     2.87     1.691    2.955
9.42     1.556    -7.64    -1.291    3.819
I. Transformations to Near Normality
Transformations to make nonnormal data approximately normal are usually suggested by
- theory
- the raw data
Some common transformations include
Original scale        Transformed scale
Counts y              $\sqrt{y}$
Proportions p̂         $\mathrm{logit}(\hat{p}) = \frac{1}{2}\log\left(\frac{\hat{p}}{1-\hat{p}}\right)$
Correlations r        Fisher's $z(r) = \frac{1}{2}\log\left(\frac{1+r}{1-r}\right)$
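The three transformations are one-liners. A sketch using only the standard library; the function names are hypothetical, and the logit keeps the factor 1/2 used in the table above.

```python
import math

def sqrt_count(y):
    """Square-root transformation for counts."""
    return math.sqrt(y)

def half_logit(p_hat):
    """Logit transformation for proportions, with the factor 1/2 used in the table."""
    return 0.5 * math.log(p_hat / (1.0 - p_hat))

def fisher_z(r):
    """Fisher's z transformation for correlations."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

# Illustrative inputs only
examples = (sqrt_count(9), half_logit(0.5), fisher_z(0.5))
```

Fisher's z coincides with the inverse hyperbolic tangent, so `math.atanh(r)` gives the same value.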
For continuous random variables, an appropriate transformation can usually be found among the family of power transformations. Box and Cox [1964] suggest an approach to finding an appropriate transformation from this family.
Box and Cox consider the slightly modified family of power transformations
$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda}-1}{\lambda} & \lambda \ne 0 \\ \ln x & \lambda = 0 \end{cases}$
For observations x1, …, xn, the Box-Cox choice of appropriate power λ for the normalizing transformation is that which maximizes
$\ell(\lambda) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_{j=1}^{n}\left(x_j^{(\lambda)}-\bar{x}^{(\lambda)}\right)^2\right] + (\lambda-1)\sum_{j=1}^{n}\ln x_j$
where
$x_j^{(\lambda)} = \begin{cases} \dfrac{x_j^{\lambda}-1}{\lambda} & \lambda \ne 0 \\ \ln x_j & \lambda = 0 \end{cases}$
and
$\bar{x}^{(\lambda)} = \frac{1}{n}\sum_{j=1}^{n}x_j^{(\lambda)} = \frac{1}{n}\sum_{j=1}^{n}\frac{x_j^{\lambda}-1}{\lambda}$
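The criterion ℓ(λ) can be evaluated on a grid with the standard library alone. This is a sketch under stated assumptions: the function names, the grid from -2 to 2 in steps of 0.1, and the artificial sample (chosen so that the logarithms are symmetric about zero) are all illustrative, not taken from the slides.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transformation of a single positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def box_cox_loglik(xs, lam):
    """The criterion l(lambda) for positive observations xs."""
    n = len(xs)
    t = [box_cox(x, lam) for x in xs]
    t_bar = sum(t) / n
    var_n = sum((v - t_bar) ** 2 for v in t) / n   # divisor n, as in the formula
    return -0.5 * n * math.log(var_n) + (lam - 1.0) * sum(math.log(x) for x in xs)

def best_lambda(xs, grid):
    """Grid value of lambda with the largest l(lambda)."""
    return max(grid, key=lambda lam: box_cox_loglik(xs, lam))

grid = [i / 10 for i in range(-20, 21)]            # -2.0, -1.9, ..., 2.0
xs = [math.exp(y) for y in (-1.0, -0.5, 0.0, 0.5, 1.0)]   # artificial sample
lam_star = best_lambda(xs, grid)                   # the log transform suits this sample
```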
We then evaluate ℓ(λ) at many points on a short interval (say [-1, 1] or [-2, 2]), plot the pairs (λ, ℓ(λ)), and look for a maximum point.
Often a logical value of λ near the maximizing value is chosen.
[Figure: plots of ℓ(λ) against λ]
Unfortunately, ℓ is very volatile as λ changes (which creates some other analytic problems to overcome). Thus we consider another transformation to avoid this additional problem:
$x_j^{(\lambda)} = \begin{cases} \dfrac{x_j^{\lambda}-1}{\lambda\left[\prod_{i=1}^{n}x_i\right]^{(\lambda-1)/n}} = \dfrac{x_j^{\lambda}-1}{\lambda\,\dot{x}^{\lambda-1}} & \text{for } \lambda \ne 0 \\ \dot{x}\,\ln x_j & \text{for } \lambda = 0 \end{cases}$
where
$\dot{x} = \left[\prod_{i=1}^{n}x_i\right]^{1/n}$
is the geometric mean of the responses and is frequently calculated as the antilog of
$\ln\dot{x} = n^{-1}\sum_{i=1}^{n}\ln x_i$
The λ that results in minimum variance of this transformed variable also maximizes our previous criterion
$\ell(\lambda) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_{j=1}^{n}\left(x_j^{(\lambda)}-\bar{x}^{(\lambda)}\right)^2\right] + (\lambda-1)\sum_{j=1}^{n}\ln x_j$
and $\dot{x}^{\lambda-1}$ is the nth root of the appropriate Jacobian of the transformation (which converts the responses x_i into the $x_i^{(\lambda)}$).
From this point forward, proceed by substituting the $x_j^{(\lambda)}$ for the x_j in the previous analysis.
Note that
- the value of λ generated by the Box-Cox transformation is only optimal in a mathematical sense; use something close that has some meaning.
- an approximate confidence interval for λ can be found
- other means for estimating λ exist
- if we are dealing with a response variable, transformations are often used to "stabilize" the variance
- for a p-dimensional sample, transformations are considered independently for each of the p variables
- while the Box-Cox methodology may help convert each marginal distribution to near normality, it does not guarantee the resulting transformed set of p variables will have a multivariate normal distribution.