The Multivariate Gaussian
Jian Zhao
Outline
What is the multivariate Gaussian?
Parameterizations
Mathematical preparation
Joint distributions, marginalization and conditioning
Maximum likelihood estimation
What is the multivariate Gaussian?
The density is

$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$$

where x is an n×1 vector and ∑ is an n×n symmetric, positive-definite matrix. For example, in the case n = 2,

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}$$
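As a concrete illustration (a minimal sketch, not part of the original slides; the helper name gaussian_pdf is our own), the density can be evaluated numerically with NumPy:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate Gaussian density p(x | mu, Sigma)."""
    n = len(mu)
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu); solve() avoids
    # forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = (2 * np.pi) ** (n / 2) * np.linalg.det(sigma) ** 0.5
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # arbitrary symmetric PD example
print(gaussian_pdf(np.array([1.0, -1.0]), mu, sigma))
```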
Geometrical interpretation
Excluding the normalization factor and the exponential function, we look at the quadratic form

$$(x-\mu)^T\Sigma^{-1}(x-\mu)$$

If we set it to a constant value c and consider the case n = 2, we obtain

$$a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 = c$$

where the $a_{ij}$ are the entries of $\Sigma^{-1}$.
Geometrical interpretation (continued)
This is an ellipse in the coordinates x1 and x2. Thus we can easily imagine that as n increases the ellipse becomes a higher-dimensional ellipsoid.
Geometrical interpretation (continued)
Thus we get a Gaussian bump in n+1 dimensions, where the level surfaces of the Gaussian bump are ellipsoids oriented along the eigenvectors of ∑.
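To make the eigenvector picture concrete, here is a small NumPy sketch (our own illustration, with an arbitrary 2×2 covariance): the axes of a level-set ellipse point along the eigenvectors of ∑, with semi-axis lengths proportional to the square roots of the eigenvalues.

```python
import numpy as np

# Level surfaces (x - mu)^T Sigma^{-1} (x - mu) = c are ellipses whose
# axes point along the eigenvectors of Sigma.
sigma = np.array([[2.0, 0.8], [0.8, 1.0]])  # arbitrary symmetric PD example
eigvals, eigvecs = np.linalg.eigh(sigma)    # eigh: for symmetric matrices

c = 1.0  # level-set constant
for lam, v in zip(eigvals, eigvecs.T):
    print(f"axis direction {v}, semi-axis length {np.sqrt(c * lam):.3f}")
```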
Parameterization
The moment parameterization uses the mean and the covariance matrix:

$$\mu = E(x)$$

$$\Sigma = E[(x-\mu)(x-\mu)^T]$$
Another type of parameterization puts the density into the form of an exponential family:

$$p(x \mid \eta, \Lambda) = \exp\left\{a + \eta^T x - \frac{1}{2}x^T\Lambda x\right\}$$

where $\Lambda = \Sigma^{-1}$, $\eta = \Sigma^{-1}\mu$, and

$$a = -\frac{1}{2}\left(n\log(2\pi) - \log|\Lambda| + \eta^T\Lambda^{-1}\eta\right)$$
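A minimal sketch of the moment-to-canonical conversion, assuming the definitions above (the helper name to_canonical is our own), with a spot check that both forms give the same density value:

```python
import numpy as np

def to_canonical(mu, sigma):
    """Convert moment parameters (mu, Sigma) to canonical (eta, Lambda, a)."""
    n = len(mu)
    lam = np.linalg.inv(sigma)          # Lambda = Sigma^{-1}
    eta = lam @ mu                      # eta = Sigma^{-1} mu
    a = -0.5 * (n * np.log(2 * np.pi) - np.log(np.linalg.det(lam))
                + eta @ np.linalg.solve(lam, eta))
    return eta, lam, a

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.5]])  # arbitrary example
eta, lam, a = to_canonical(mu, sigma)
x = np.array([0.5, 0.5])
# Canonical-form density; should match the moment-form density at x.
print(np.exp(a + eta @ x - 0.5 * x @ lam @ x))
```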
Mathematical preparation
In order to obtain the marginalization and conditioning of the partitioned multivariate Gaussian distribution, we need the theory of block diagonalizing a partitioned matrix.
In order to maximize the likelihood, we need some facts about traces of matrices.
Partitioned matrices
Consider a general partitioned matrix
$$M = \begin{bmatrix} E & F \\ G & H \end{bmatrix}$$
To zero out the upper-right-hand and lower-left-hand corners of M, we can premultiply and postmultiply by matrices of the following form:
$$\begin{bmatrix} I & -FH^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} E & F \\ G & H \end{bmatrix}\begin{bmatrix} I & 0 \\ -H^{-1}G & I \end{bmatrix} = \begin{bmatrix} E - FH^{-1}G & 0 \\ 0 & H \end{bmatrix}$$
Partitioned matrices (continued)
Define the Schur complement of the matrix M with respect to H, denoted M/H, as the term $E - FH^{-1}G$.
Since $(XYZ)^{-1} = Z^{-1}Y^{-1}X^{-1}$, writing $W = XYZ$ gives $Y^{-1} = Z\,W^{-1}X$, so

$$\begin{bmatrix} E & F \\ G & H \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ -H^{-1}G & I \end{bmatrix}\begin{bmatrix} (M/H)^{-1} & 0 \\ 0 & H^{-1} \end{bmatrix}\begin{bmatrix} I & -FH^{-1} \\ 0 & I \end{bmatrix}$$

$$= \begin{bmatrix} (M/H)^{-1} & -(M/H)^{-1}FH^{-1} \\ -H^{-1}G(M/H)^{-1} & H^{-1} + H^{-1}G(M/H)^{-1}FH^{-1} \end{bmatrix}$$
Partitioned matrices (continued)
Note that we could alternatively have decomposed the matrix M in terms of E and M/E, yielding the following expression for the inverse:
$$\begin{bmatrix} E & F \\ G & H \end{bmatrix} = \begin{bmatrix} I & 0 \\ GE^{-1} & I \end{bmatrix}\begin{bmatrix} E & 0 \\ 0 & M/E \end{bmatrix}\begin{bmatrix} I & E^{-1}F \\ 0 & I \end{bmatrix}$$

where $M/E = H - GE^{-1}F$, so

$$M^{-1} = \begin{bmatrix} I & -E^{-1}F \\ 0 & I \end{bmatrix}\begin{bmatrix} E^{-1} & 0 \\ 0 & (M/E)^{-1} \end{bmatrix}\begin{bmatrix} I & 0 \\ -GE^{-1} & I \end{bmatrix}$$

$$= \begin{bmatrix} E^{-1} + E^{-1}F(M/E)^{-1}GE^{-1} & -E^{-1}F(M/E)^{-1} \\ -(M/E)^{-1}GE^{-1} & (M/E)^{-1} \end{bmatrix}$$
Partitioned matrices (continued)
Thus we get
$$(E - FH^{-1}G)^{-1} = E^{-1} + E^{-1}F(H - GE^{-1}F)^{-1}GE^{-1}$$

$$(E - FH^{-1}G)^{-1}FH^{-1} = E^{-1}F(H - GE^{-1}F)^{-1}$$
At the same time we get the conclusion
$$|M| = |M/H|\,|H|$$
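These identities are easy to sanity-check numerically. A minimal NumPy sketch (our own, using an arbitrary well-conditioned random test matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3
# Arbitrary well-conditioned test matrix (diagonal shift keeps it invertible).
M = rng.standard_normal((p + q, p + q)) + (p + q) * np.eye(p + q)
E, F = M[:p, :p], M[:p, p:]
G, H = M[p:, :p], M[p:, p:]

# Schur complement M/H = E - F H^{-1} G.
M_H = E - F @ np.linalg.solve(H, G)

Minv = np.linalg.inv(M)
# Upper-left block of M^{-1} equals (M/H)^{-1}.
print(np.allclose(Minv[:p, :p], np.linalg.inv(M_H)))
# Determinant identity |M| = |M/H| |H|.
print(np.isclose(np.linalg.det(M), np.linalg.det(M_H) * np.linalg.det(H)))
```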
Theory of traces
Define

$$\mathrm{tr}[A] = \sum_i a_{ii}$$

It has the following properties:

$$\mathrm{tr}[ABC] = \mathrm{tr}[CAB] = \mathrm{tr}[BCA]$$

$$x^TAx = \mathrm{tr}[x^TAx] = \mathrm{tr}[xx^TA]$$
Theory of traces (continued)
$$\frac{\partial}{\partial a_{ij}}\mathrm{tr}[AB] = \frac{\partial}{\partial a_{ij}}\sum_k\sum_l a_{kl}b_{lk} = b_{ji}$$

so

$$\frac{\partial}{\partial A}\mathrm{tr}[BA] = B^T$$

and therefore

$$\frac{\partial}{\partial A}x^TAx = \frac{\partial}{\partial A}\mathrm{tr}[xx^TA] = [xx^T]^T = xx^T$$
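A quick numeric check of these trace identities (our own sketch, not in the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

# Cyclic property: tr[AB] = tr[BA].
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))
# Quadratic form as a trace: x^T A x = tr[x x^T A].
print(np.isclose(x @ A @ x, np.trace(np.outer(x, x) @ A)))
```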
Theory of traces (continued)
We want to show that

$$\frac{\partial}{\partial A}\log|A| = A^{-T}$$

Since

$$\frac{\partial}{\partial a_{ij}}\log|A| = \frac{1}{|A|}\frac{\partial|A|}{\partial a_{ij}}$$

this is equivalent to proving

$$\frac{\partial|A|}{\partial a_{ij}} = |A|\,[A^{-1}]_{ji}$$

Noting the expansion by minors

$$|A| = \sum_j (-1)^{i+j}a_{ij}\tilde{M}_{ij}$$

where $\tilde{M}_{ij}$ is the minor of $a_{ij}$, we have $\partial|A|/\partial a_{ij} = (-1)^{i+j}\tilde{M}_{ij} = [\mathrm{adj}(A)]_{ji} = |A|\,[A^{-1}]_{ji}$, as required.
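This gradient identity can also be verified by finite differences. A minimal sketch (our own; the near-identity test matrix is arbitrary and keeps $|A|$ positive):

```python
import numpy as np

rng = np.random.default_rng(1)
# Small off-diagonal perturbation keeps A diagonally dominant, so |A| > 0.
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))

# Finite-difference gradient of log|A|, entry by entry.
eps = 1e-6
grad = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        Ap = A.copy()
        Ap[i, j] += eps
        grad[i, j] = (np.log(np.linalg.det(Ap)) - np.log(np.linalg.det(A))) / eps

# d log|A| / dA = A^{-T}
print(np.allclose(grad, np.linalg.inv(A).T, atol=1e-4))
```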
Joint distributions, Marginalization and conditioning
We partition the n×1 vector x into a p×1 block $x_1$ and a q×1 block $x_2$, where n = p + q:

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

The joint density is then

$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{(p+q)/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}^T\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}^{-1}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}\right\}$$
Marginalization and conditioning
Applying the block diagonalization of the partitioned matrix to $\Sigma^{-1}$ in the exponent:

$$\exp\left\{-\frac{1}{2}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}^T\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}^{-1}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}\right\}$$

$$= \exp\left\{-\frac{1}{2}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}^T\begin{bmatrix} I & 0 \\ -\Sigma_{22}^{-1}\Sigma_{21} & I \end{bmatrix}\begin{bmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{bmatrix}\begin{bmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}\right\}$$

$$= \exp\left\{-\frac{1}{2}\bigl(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\bigr)^T(\Sigma/\Sigma_{22})^{-1}\bigl(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\bigr)\right\}$$

$$\quad\times\exp\left\{-\frac{1}{2}(x_2-\mu_2)^T\Sigma_{22}^{-1}(x_2-\mu_2)\right\}$$
Normalization factor
$$\frac{1}{(2\pi)^{(p+q)/2}\bigl(|\Sigma/\Sigma_{22}|\,|\Sigma_{22}|\bigr)^{1/2}} = \frac{1}{(2\pi)^{p/2}|\Sigma/\Sigma_{22}|^{1/2}}\cdot\frac{1}{(2\pi)^{q/2}|\Sigma_{22}|^{1/2}}$$

using $|\Sigma| = |\Sigma/\Sigma_{22}|\,|\Sigma_{22}|$.
Marginalization and conditioning
Thus
$$p(x_2) = \frac{1}{(2\pi)^{q/2}|\Sigma_{22}|^{1/2}}\exp\left\{-\frac{1}{2}(x_2-\mu_2)^T\Sigma_{22}^{-1}(x_2-\mu_2)\right\}$$

$$p(x_1 \mid x_2) = \frac{1}{(2\pi)^{p/2}|\Sigma/\Sigma_{22}|^{1/2}}\exp\left\{-\frac{1}{2}\bigl(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\bigr)^T(\Sigma/\Sigma_{22})^{-1}\bigl(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\bigr)\right\}$$
Marginalization & Conditioning
In the moment parameterization:

Marginalization

$$\mu_2^m = \mu_2, \qquad \Sigma_2^m = \Sigma_{22}$$

Conditioning

$$\mu_{1|2}^c = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2), \qquad \Sigma_{1|2}^c = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

In another form, using the canonical parameterization:

Conditioning

$$\eta_{1|2}^c = \eta_1 - \Lambda_{12}x_2, \qquad \Lambda_{1|2}^c = \Lambda_{11}$$

Marginalization

$$\eta_2^m = \eta_2 - \Lambda_{21}\Lambda_{11}^{-1}\eta_1, \qquad \Lambda_2^m = \Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12}$$
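A minimal sketch of the moment-form formulas, assuming the partitioning above (the helper name condition is our own):

```python
import numpy as np

def condition(mu, sigma, x2, p):
    """Moment parameters of p(x1 | x2) for a Gaussian partitioned at index p."""
    mu1, mu2 = mu[:p], mu[p:]
    s11, s12 = sigma[:p, :p], sigma[:p, p:]
    s21, s22 = sigma[p:, :p], sigma[p:, p:]
    gain = s12 @ np.linalg.inv(s22)          # Sigma12 Sigma22^{-1}
    mu_c = mu1 + gain @ (x2 - mu2)           # conditional mean
    sigma_c = s11 - gain @ s21               # Schur complement Sigma/Sigma22
    return mu_c, sigma_c

mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])  # arbitrary symmetric PD example
print(condition(mu, sigma, x2=np.array([0.5, -0.5]), p=1))
```

The marginal of $x_2$ needs no computation at all: its moment parameters are just the corresponding blocks $\mu_2$ and $\Sigma_{22}$.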
Maximum likelihood estimation
Given data $D = \{x_1, \ldots, x_N\}$, the log likelihood is, up to an additive constant,

$$l(\mu, \Sigma \mid D) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{N}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)$$

and its derivative with respect to µ is

$$\frac{\partial l}{\partial \mu} = \sum_{i=1}^{N}\Sigma^{-1}(x_i-\mu)$$
Calculating µ
Taking the derivative with respect to µ and setting it to zero gives

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N}x_i$$
Estimate ∑
We need to take the derivative with respect to ∑. Using the properties of traces, rewrite the log likelihood in terms of $\Sigma^{-1}$:

$$l(\Sigma \mid D) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_n(x_n-\mu)^T\Sigma^{-1}(x_n-\mu)$$

$$= \frac{N}{2}\log|\Sigma^{-1}| - \frac{1}{2}\sum_n\mathrm{tr}\bigl[(x_n-\mu)^T\Sigma^{-1}(x_n-\mu)\bigr]$$

$$= \frac{N}{2}\log|\Sigma^{-1}| - \frac{1}{2}\sum_n\mathrm{tr}\bigl[(x_n-\mu)(x_n-\mu)^T\Sigma^{-1}\bigr]$$

Differentiating with respect to $\Sigma^{-1}$:

$$\frac{\partial l}{\partial\Sigma^{-1}} = \frac{N}{2}\Sigma - \frac{1}{2}\sum_n(x_n-\mu)(x_n-\mu)^T$$
Estimate ∑ (continued)
Thus the maximum likelihood estimator is

$$\hat{\Sigma}_{ML} = \frac{1}{N}\sum_n(x_n-\hat{\mu})(x_n-\hat{\mu})^T$$

The maximum likelihood estimators of the canonical parameters are

$$\hat{\Lambda}_{ML} = \hat{\Sigma}_{ML}^{-1}, \qquad \hat{\eta}_{ML} = \hat{\Sigma}_{ML}^{-1}\hat{\mu}_{ML}$$
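Putting the estimators together, a minimal NumPy sketch (our own; the "true" parameters are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(2)
true_mu = np.array([1.0, -1.0])
true_sigma = np.array([[1.5, 0.4], [0.4, 0.8]])
X = rng.multivariate_normal(true_mu, true_sigma, size=5000)  # rows are x_n

# MLE of the moment parameters.
mu_hat = X.mean(axis=0)               # (1/N) sum_n x_n
diff = X - mu_hat
sigma_hat = diff.T @ diff / len(X)    # (1/N) sum_n (x_n - mu)(x_n - mu)^T

# MLE of the canonical parameters.
lambda_hat = np.linalg.inv(sigma_hat)
eta_hat = lambda_hat @ mu_hat
print(mu_hat, sigma_hat, sep="\n")    # close to true_mu, true_sigma
```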