Quadratic Forms, Characteristic Roots and Characteristic Vectors
Mohammed Nasser, Professor, Dept. of Statistics, RU, Bangladesh
“The use of matrix theory is now widespread. … are essential in … modern treatment of univariate and multivariate statistical methods.” (C. R. Rao)

Contents
• Linear Map and Matrices
• Quadratic Forms and Their Applications in MM
• Classification of Quadratic Forms
• Quadratic Forms and Inner Product
• Definitions of Characteristic Roots and Characteristic Vectors
• Geometric Interpretations
• Properties of Grammian Matrices
• Spectral Decomposition and Applications
• Matrix Inequalities and Maximization
• Computations
Relation between MM (ML) and Vector Space
Statistical Concepts/Techniques | Concepts in Vector Space
Variance | Length of a vector, quadratic forms
Covariance | Dot product of two vectors
Correlation | Angle between two vectors
Regression and Classification | Mapping between two vector spaces
PCA/LDA/CCA | Orthogonal/oblique projection onto a lower-dimensional space
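Some Vector Concepts
• Dot product = scalar: $x^Ty = \begin{bmatrix} x_1 & x_2 & x_3\end{bmatrix}\begin{bmatrix} y_1\\ y_2\\ y_3\end{bmatrix} = x_1y_1 + x_2y_2 + x_3y_3 = \sum_{i=1}^{3} x_iy_i$
• Length of a vector: $\|x\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$
• Inner product of a vector with itself = (vector length)$^2$: $x^Tx = x_1^2 + x_2^2 + x_3^2 = \|x\|^2$
• In two dimensions, $\|x\| = (x_1^2 + x_2^2)^{1/2}$ is the hypotenuse of the right-angle triangle with legs $x_1$ and $x_2$ (Pythagoras’ theorem).
• In n dimensions: $x^Ty = \sum_{i=1}^{n} x_iy_i$ and $\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2} = (x^Tx)^{1/2}$.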
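• Angle between two vectors: let $x = (x_1, x_2)^T$ and $y = (y_1, y_2)^T$ make angles $\theta_1$ and $\theta_2$ with the first axis, so that
$\cos\theta_1 = \frac{x_1}{\|x\|},\quad \sin\theta_1 = \frac{x_2}{\|x\|},\quad \cos\theta_2 = \frac{y_1}{\|y\|},\quad \sin\theta_2 = \frac{y_2}{\|y\|}.$
The angle $\theta = \theta_2 - \theta_1$ between them satisfies
$\cos\theta = \cos(\theta_2 - \theta_1) = \cos\theta_2\cos\theta_1 + \sin\theta_2\sin\theta_1 = \frac{x_1y_1 + x_2y_2}{\|x\|\,\|y\|} = \frac{x^Ty}{\|x\|\,\|y\|}.$
• Orthogonal vectors: $x^Ty = 0$, i.e. $\theta = \pi/2$.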
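The dot product, length and angle formulas above can be checked numerically. A minimal R sketch, assuming arbitrary illustrative vectors x, y, u and v (they are not taken from the slides):

```r
# Illustrative vectors (hypothetical values)
x <- c(1, 2, 2)
y <- c(2, 0, 1)

dot <- sum(x * y)                  # x^T y = x1*y1 + x2*y2 + x3*y3
len_x <- sqrt(sum(x^2))            # ||x|| = (x^T x)^(1/2)
len_y <- sqrt(sum(y^2))
cos_theta <- dot / (len_x * len_y) # cos(theta) = x^T y / (||x|| ||y||)
theta <- acos(cos_theta)           # angle between x and y, in radians

# Orthogonal vectors: zero dot product, so the angle is pi/2
u <- c(1, 0)
v <- c(0, 3)
theta_uv <- acos(sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2))))

c(dot = dot, len_x = len_x, theta = theta, theta_uv = theta_uv)
```

Base R's crossprod(x, y) returns the same dot product as a 1 × 1 matrix.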
Linear Map and Matrices
Linear mappings are almost omnipresent.
If both the domain and the co-domain are finite-dimensional vector spaces, each linear mapping can be uniquely represented by a matrix with respect to a specific pair of bases.
We intend to study the properties of a linear mapping through the properties of its matrix.
Linear map and Matrices
This isomorphism is basis-dependent.
Linear map and Matrices
Let A be similar to B, i.e. $B = P^{-1}AP$ for some nonsingular P.
Similarity defines an equivalence relation on the vector space of square matrices of order n, i.e. it partitions that space into disjoint equivalence classes.
Each equivalence class represents a unique linear operator.
How can we choose
i) the simplest matrix in each equivalence class, and
ii) the one of special interest?
A major concern of ours is to make the best choice of basis, so that the linear operator with which we are working will have a representing matrix in the chosen basis that is as simple as possible.
Linear map and Matrices
Two matrices representing the same linear transformation with respect to different bases must be similar.
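A diagonal matrix is very useful; for example, if $D = P^{-1}AP$ is diagonal, then
$D^n = P^{-1}A^nP$, so $A^n = PD^nP^{-1}$ and only the diagonal entries of D need to be raised to the nth power.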
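The usefulness of a diagonal D shows up when computing matrix powers. A minimal R sketch, assuming an illustrative diagonalizable 2 × 2 matrix A and using base R's eigen() to obtain P and the diagonal of D:

```r
A <- matrix(c(2, 1, 1, 2), nrow = 2)     # illustrative matrix, eigenvalues 3 and 1
ev <- eigen(A)
P <- ev$vectors                          # columns are eigenvectors of A

n <- 5
# A^n = P D^n P^{-1}: only the diagonal entries of D are raised to the power n
A_n_diag <- P %*% diag(ev$values^n) %*% solve(P)

# Direct repeated multiplication, for comparison
A_n_direct <- diag(2)
for (k in seq_len(n)) A_n_direct <- A_n_direct %*% A

A_n_diag
round(solve(P) %*% A %*% P, 10)          # D = P^{-1} A P is diagonal
```

For large n the diagonal route needs one decomposition instead of n matrix products.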
Linear map and Matrices
Each equivalence class represents a unique linear operator.
Can we characterize the class in a simpler way?
Yes, we can, under extra conditions.
The concept of characteristic roots plays an important role in this regard.
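Quadratic Form
Definition: The quadratic form in n variables $x_1, x_2, \ldots, x_n$ is the general homogeneous function of second degree in the variables:
$Y = f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j$
In terms of matrix notation, the quadratic form is given by
$Y = x^TAx = \begin{bmatrix} x_1 & x_2 & \cdots & x_n\end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}$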
Examples of Some Quadratic Forms
1. $Y = x^TAx = 2x_1^2 + 6x_1x_2 + 7x_2^2$
13322123
22
21 1283021121 xxxxxxxxxAxxY T
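3. $Y = x^TAx = a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2$, with $A = \mathrm{diag}(a_1, a_2, \ldots, a_n)$
Many matrices A give the same quadratic form. To make A unique, it is customary to take A symmetric.
Example 3, which has no cross-product terms, is called the standard form. What is it used for?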
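In Fact, Infinitely Many A’s
• In example 1, any $a_{12}$ and $a_{21}$ with $a_{12} + a_{21} = 6$ will do.
• This can be done in infinitely many ways.
• The symmetric choice is unique: $a_{12} = a_{21} = 3$, i.e. $A = \begin{bmatrix} 2 & 3\\ 3 & 7\end{bmatrix}$.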
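The non-uniqueness of A in example 1 is easy to check numerically. A minimal R sketch; the helper qform() and the two non-symmetric matrices A1 and A2 are hypothetical illustrations, not part of the slides:

```r
# Value of the quadratic form x^T A x (hypothetical helper)
qform <- function(A, x) drop(t(x) %*% A %*% x)

# Example 1: Y = 2 x1^2 + 6 x1 x2 + 7 x2^2 needs a11 = 2, a22 = 7, a12 + a21 = 6
A1 <- matrix(c(2, 0, 6, 7), nrow = 2)   # a12 = 6, a21 = 0
A2 <- matrix(c(2, 4, 2, 7), nrow = 2)   # a12 = 2, a21 = 4
A_sym <- (A1 + t(A1)) / 2               # symmetric choice: a12 = a21 = 3

x <- c(1.5, -2)                         # arbitrary test point
c(qform(A1, x), qform(A2, x), qform(A_sym, x),
  2 * x[1]^2 + 6 * x[1] * x[2] + 7 * x[2]^2)
```

Averaging A with its transpose, (A + A^T)/2, always yields the unique symmetric matrix of the same form.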
Its Importance in Statistics
Variance is a fundamental concept in statistics. It is nothing but a quadratic form with an idempotent matrix of rank n − 1:
$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}x^TAx, \quad \text{where } A = I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T.$
Quadratic forms play a central role in multivariate statistical analysis, for example in principal component analysis, factor analysis and discriminant analysis.
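The variance identity above can be verified directly. A minimal R sketch, assuming an arbitrary illustrative data vector x; note that base R's var() divides by n − 1 rather than n:

```r
x <- c(4, 7, 1, 8, 5)                  # illustrative data
n <- length(x)
one <- rep(1, n)
A <- diag(n) - outer(one, one) / n     # centering matrix I_n - (1/n) 1 1^T

v_qform <- drop(t(x) %*% A %*% x) / n  # (1/n) x^T A x
v_direct <- mean((x - mean(x))^2)      # (1/n) sum (x_i - xbar)^2

c(v_qform, v_direct)
all.equal(A %*% A, A)                  # A is idempotent
qr(A)$rank                             # rank n - 1
```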
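Its Importance in Statistics
Multivariate Gaussian: the density $f(x) = (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\}$ depends on x only through the quadratic form $(x-\mu)^T\Sigma^{-1}(x-\mu)$.
[Figure: bivariate Gaussian density]
[Figure: density contours for spherical, diagonal and full covariance matrices]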
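A minimal R sketch of evaluating the multivariate Gaussian density through its quadratic form. The function name dmvnorm_qf and the parameter values are hypothetical; with a diagonal Σ the result can be checked against a product of base R dnorm() values:

```r
# Multivariate normal density via the quadratic form (hypothetical helper)
dmvnorm_qf <- function(x, mu, Sigma) {
  p <- length(mu)
  d <- x - mu
  q <- drop(t(d) %*% solve(Sigma) %*% d)     # (x - mu)^T Sigma^{-1} (x - mu)
  exp(-q / 2) / sqrt((2 * pi)^p * det(Sigma))
}

mu <- c(1, -1)
Sigma <- diag(c(4, 9))      # diagonal covariance: independent coordinates
x <- c(2, 0.5)

# With a diagonal Sigma the density factorizes into univariate normals (sd 2 and 3)
c(dmvnorm_qf(x, mu, Sigma),
  dnorm(x[1], mu[1], 2) * dnorm(x[2], mu[2], 3))
```

Points with equal values of the quadratic form have equal density, which is why the contours in the figures are ellipses.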
Quadratic Form as Inner Product
$X^TAX = (A^TX)^TX = X^T(AX) = X^TY$, where $Y = AX$.
• Let $A = C^TC$. Then
$X^TAX = X^TC^TCX = (CX)^T(CX) = W^TW$ and $X^TAY = X^TC^TCY = (CX)^T(CY) = W^TZ$, where $W = CX$ and $Z = CY$.
What is its geometric meaning?
Different nonsingular C’s represent different inner products, and different inner products give different geometries:
Length of Y: $\|Y\| = (Y^TY)^{1/2}$;
$X^TY$: dot product of X and Y.
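A minimal R sketch of the factorization $A = C^TC$, assuming an illustrative positive definite A and taking C from base R's chol(), which returns an upper-triangular factor with t(C) %*% C equal to A. The chosen x and y happen to be orthogonal in the A-geometry but not in the Euclidean one:

```r
A <- matrix(c(4, 2, 2, 3), nrow = 2)    # illustrative positive definite matrix
C <- chol(A)                            # upper triangular, t(C) %*% C == A

x <- c(1, 2)
y <- c(-1, 1)

ip_A <- drop(t(x) %*% A %*% y)          # <x, y>_A = x^T A y
ip_C <- sum((C %*% x) * (C %*% y))      # (Cx)^T (Cy) = W^T Z
len_A <- sqrt(drop(t(x) %*% A %*% x))   # length of x in the A-geometry

c(ip_A, ip_C, euclid = sum(x * y), len_A)
```

Any other nonsingular C with $C^TC = A$ (for example QC with Q orthogonal) defines the same inner product.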
Euclidean Distance and Mathematical Distance
• The usual human concept of distance is Euclidean distance.
• Each coordinate contributes equally to the distance:
$P = (x_1, x_2, \ldots, x_p),\quad Q = (y_1, y_2, \ldots, y_p)$
$d(P,Q) = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + \cdots + (x_p-y_p)^2}$
Mathematicians, generalizing its three properties,
1) $d(P,Q) = d(Q,P)$,
2) $d(P,Q) = 0$ if and only if $P = Q$, and
3) $d(P,Q) \le d(P,R) + d(R,Q)$ for all R,
define distance on any set.
Statistical Distance
• Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable.
Which point is nearer to the data set, if the data set were a point?
Statistical Distance for Uncorrelated Data
$P = (x_1, x_2),\quad O = (0, 0)$
$x_1^* = x_1/\sqrt{s_{11}},\quad x_2^* = x_2/\sqrt{s_{22}}$
$d(O,P) = \sqrt{(x_1^*)^2 + (x_2^*)^2} = \sqrt{\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}}}$
Ellipse of Constant Statistical Distance for Uncorrelated Data
[Figure: ellipse centred at 0 in the (x1, x2) plane, crossing the x1 axis at $\pm c\sqrt{s_{11}}$ and the x2 axis at $\pm c\sqrt{s_{22}}$]
Scatter Plot for Correlated Measurements
[Figure: scatter plot]
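Statistical Distance under Rotated Coordinate System
With $O = (0, 0)$ and $P = (\tilde{x}_1, \tilde{x}_2)$ in the rotated coordinates,
$d(O,P) = \sqrt{\frac{\tilde{x}_1^2}{\tilde{s}_{11}} + \frac{\tilde{x}_2^2}{\tilde{s}_{22}}}$,
where $\tilde{s}_{11}, \tilde{s}_{22}$ are the sample variances along the rotated axes and
$\tilde{x}_1 = x_1\cos\theta + x_2\sin\theta,\quad \tilde{x}_2 = -x_1\sin\theta + x_2\cos\theta.$
In the original coordinates this becomes
$d(O,P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$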
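A minimal R sketch showing that the rotated-coordinate distance is a quadratic form in the original coordinates, with $A = R^T\,\mathrm{diag}(1/\tilde{s}_{11}, 1/\tilde{s}_{22})\,R$ for the rotation matrix R. The angle, the variances and the point P are hypothetical values:

```r
theta <- pi / 6                    # illustrative rotation angle
s11t <- 4; s22t <- 1               # illustrative variances along the rotated axes

x <- c(2, 1)                       # P = (x1, x2); O = (0, 0)
x1t <-  x[1] * cos(theta) + x[2] * sin(theta)
x2t <- -x[1] * sin(theta) + x[2] * cos(theta)
d_rot <- sqrt(x1t^2 / s11t + x2t^2 / s22t)

# R maps (x1, x2) to (x1~, x2~); A collects the coefficients a11, a12, a22
R <- matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), nrow = 2)
A <- t(R) %*% diag(c(1 / s11t, 1 / s22t)) %*% R
d_qf <- sqrt(A[1, 1] * x[1]^2 + 2 * A[1, 2] * x[1] * x[2] + A[2, 2] * x[2]^2)

c(d_rot, d_qf)
```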
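General Statistical Distance
$P = (x_1, x_2, \ldots, x_p),\quad O = (0, 0, \ldots, 0),\quad Q = (y_1, y_2, \ldots, y_p)$
$d(O,P) = \sqrt{a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{pp}x_p^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + \cdots + 2a_{p-1,p}x_{p-1}x_p}$
$d(P,Q) = \sqrt{a_{11}(x_1-y_1)^2 + a_{22}(x_2-y_2)^2 + \cdots + a_{pp}(x_p-y_p)^2 + 2a_{12}(x_1-y_1)(x_2-y_2) + 2a_{13}(x_1-y_1)(x_3-y_3) + \cdots + 2a_{p-1,p}(x_{p-1}-y_{p-1})(x_p-y_p)}$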
Necessity of Statistical Distance
[Figure]
Mahalanobis Distance
Population version: $MD(x, \mu, \Sigma) = (x-\mu)^T\Sigma^{-1}(x-\mu)$
Sample version: $MD(x, \bar{x}, S) = (x-\bar{x})^TS^{-1}(x-\bar{x})$
We can robustify it by replacing $\bar{x}$ and S with robust estimators of the location and scatter functionals.
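A minimal R sketch of the sample version, computed directly and with stats::mahalanobis(), which returns the same (squared) quadratic form. The data are simulated purely for illustration:

```r
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
X[, 2] <- X[, 1] + 0.5 * X[, 2]          # induce correlation between the columns
xbar <- colMeans(X)
S <- cov(X)

x_new <- c(1, -1)                        # a point that goes against the correlation
d <- x_new - xbar
md_direct <- drop(t(d) %*% solve(S) %*% d)
md_stats <- mahalanobis(x_new, center = xbar, cov = S)

c(md_direct, md_stats)
```

A robust variant would pass robust estimates of location and scatter as center and cov; which estimator to use is left open here.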
Classification of Quadratic Form
• Chart: Quadratic Form
– Definite: Positive Definite, Positive Semi-definite, Negative Definite, Negative Semi-definite
– Indefinite
Classification of Quadratic Form: Definitions
1. Positive Definite: A quadratic form $Y = x^TAx$ is said to be positive definite iff $Y = x^TAx > 0$ for all $x \ne 0$. Then the matrix A is said to be a positive definite matrix.
2. Positive Semi-definite: A quadratic form $Y = x^TAx$ is said to be positive semi-definite iff $Y = x^TAx \ge 0$ for all $x \ne 0$ and there exists $x \ne 0$ such that $x^TAx = 0$. Then the matrix A is said to be a positive semi-definite matrix.
3. Negative Definite: A quadratic form $Y = x^TAx$ is said to be negative definite iff $Y = x^TAx < 0$ for all $x \ne 0$. Then the matrix A is said to be a negative definite matrix.
4. Negative Semi-definite: A quadratic form $Y = x^TAx$ is said to be negative semi-definite iff $Y = x^TAx \le 0$ for all $x \ne 0$ and there exists $x \ne 0$ such that $x^TAx = 0$. The matrix A is said to be a negative semi-definite matrix.
Indefinite: Quadratic forms and their associated symmetric matrices need not be definite or semi-definite in any of the above senses. In this case the quadratic form is said to be indefinite; that is, it can be negative, zero or positive depending on the values of x.
Two Theorems on Quadratic Forms
Theorem 1: A quadratic form can always be expressed with respect to a given coordinate system as $Y = x^TAx$, where A is a unique symmetric matrix.
Theorem 2: Two symmetric matrices A and B represent the same quadratic form (in coordinate systems related by a nonsingular change of variables $x = Pz$) if and only if $B = P^TAP$, where P is a non-singular matrix.
Classification of Quadratic Form: Importance of Standard Form
From the standard form we can easily classify a quadratic form. $x^TAx = \sum_{i=1}^{n} a_ix_i^2$ is
• positive definite if $a_i > 0$ for all i;
• positive semi-definite if $a_i > 0$ for some i and $a_i = 0$ for the others;
• negative definite if $a_i < 0$ for all i;
• negative semi-definite if $a_i < 0$ for some i and $a_i = 0$ for the others;
• indefinite if some $a_i$ are positive and some are negative.
That is why, using a suitable nonsingular transformation (why nonsingular?), we try to transform a general $x^TAx$ into a standard form. If we can find a nonsingular matrix P such that $P^TAP = D$, a diagonal matrix, we can easily classify it. We can do this i) by a congruent transformation, and ii) using eigenvalues and eigenvectors.
Method ii) is mostly used in MM.
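Classification of Quadratic Form: Importance of Determinant, Eigenvalues and Diagonal Elements
1. Positive Definite: (a) A quadratic form $Y = x^TAx$ is positive definite iff the nested principal minors of A are all positive:
$a_{11} > 0,\quad \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} > 0,\quad \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{vmatrix} > 0,\quad \ldots,\quad \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{vmatrix} > 0$
Evidently a matrix A is positive definite only if $\det(A) > 0$.
(b) A quadratic form $Y = x^TAx$ is positive definite iff all the eigenvalues of A are positive.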
2. Positive Semi-definite: (a) A quadratic form $Y = x^TAx$ is positive semi-definite iff all principal minors of A (not only the nested leading ones) are non-negative and $\det(A) = 0$:
$a_{ii} \ge 0,\quad \begin{vmatrix} a_{ii} & a_{ij}\\ a_{ji} & a_{jj}\end{vmatrix} \ge 0,\quad \ldots,\quad \det(A) = 0.$
Non-negative nested minors alone are not sufficient: $A = \mathrm{diag}(0, -1)$ has nested minors 0 and 0 but is negative semi-definite.
(b) A quadratic form $Y = x^TAx$ is positive semi-definite iff at least one eigenvalue of A is zero while the remaining eigenvalues are positive.
Continued
3. Negative Definite: (a) A quadratic form $Y = x^TAx$ is negative definite iff the nested principal minors of A alternate in sign, starting negative:
$a_{11} < 0,\quad \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} > 0,\quad \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{vmatrix} < 0,\quad \ldots$
Evidently a matrix A is negative definite only if $(-1)^n\det(A) > 0$, so the sign of det(A) depends on the order n of A.
(b) A quadratic form $Y = x^TAx$ is negative definite iff all the eigenvalues of A are negative.
4. Negative Semi-definite: (a) A quadratic form $Y = x^TAx$ is negative semi-definite iff every principal minor $M_k$ of order k satisfies $(-1)^kM_k \ge 0$ (i.e. $a_{ii} \le 0$, the 2 × 2 principal minors are $\ge 0$, the 3 × 3 ones are $\le 0$, and so on) and $\det(A) = 0$.
Evidently a matrix A is negative semi-definite only if $(-1)^n\det(A) \ge 0$, that is, $\det(A) \le 0$ ($\det(A) \ge 0$) when n is odd (even).
(b) A quadratic form $Y = x^TAx$ is negative semi-definite iff at least one eigenvalue of A is zero while the remaining eigenvalues are negative.
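The eigenvalue criteria (b) translate directly into code. A minimal R sketch; classify_qf() and leading_minors() are hypothetical helpers, and tol is an assumed numerical tolerance for treating an eigenvalue as zero:

```r
# Classify a symmetric matrix by the signs of its eigenvalues (criteria (b))
classify_qf <- function(A, tol = 1e-10) {
  stopifnot(isSymmetric(A))
  lam <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
  pos <- any(lam > tol); neg <- any(lam < -tol); zero <- any(abs(lam) <= tol)
  if (pos && neg) return("indefinite")
  if (pos && !zero) return("positive definite")
  if (pos && zero) return("positive semi-definite")
  if (neg && !zero) return("negative definite")
  if (neg && zero) return("negative semi-definite")
  "zero form"
}

# Nested (leading) principal minors; all > 0 is equivalent to positive definiteness
leading_minors <- function(A)
  sapply(seq_len(nrow(A)), function(k) det(A[1:k, 1:k, drop = FALSE]))

classify_qf(matrix(c(2, 3, 3, 7), 2))   # example 1 with symmetric A
classify_qf(matrix(c(1, 2, 2, 4), 2))   # eigenvalues 5 and 0
classify_qf(diag(c(0, -1)))
classify_qf(matrix(c(1, 3, 3, 1), 2))   # eigenvalues 4 and -2
leading_minors(diag(c(0, -1)))          # 0 0: not enough to conclude PSD
```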
Theorem on Quadratic Form (Congruent Transformation)
If $Y = x^TAx$ is a real quadratic form in n variables $x_1, x_2, \ldots, x_n$ of rank r, i.e. $\rho(A) = r$, then there exists a non-singular matrix P of order n such that $x = Pz$ converts Y into the canonical form
$\lambda_1z_1^2 + \lambda_2z_2^2 + \cdots + \lambda_rz_r^2$,
where $\lambda_1, \lambda_2, \ldots, \lambda_r$ are all different from zero.
That implies
$x^TAx = z^TP^TAPz = z^TDz,\quad D = \mathrm{diag}\{\lambda_1, \ldots, \lambda_r, 0, \ldots, 0\}.$
Grammian (Gram) Matrix
If A is an m × n matrix, then the matrix $S = A^TA$ is called the Grammian matrix of A; S is a symmetric n-rowed matrix.
Properties
a. Every positive definite or positive semi-definite matrix can be represented as a Grammian matrix.
b. The Grammian matrix $A^TA$ is always positive definite or positive semi-definite according as the rank of A is equal to or less than the number of columns of A.
c. $\rho(A^TA) = \rho(AA^T) = \rho(A) = r$.
d. If $A^TA = 0$ then $A = 0$.
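A minimal R sketch of properties b and c, assuming an illustrative 4 × 3 matrix whose third column is the sum of the first two (so its rank is 2, less than its 3 columns):

```r
A <- cbind(c(1, 0, 2, 1), c(0, 1, 1, 3), c(1, 1, 3, 4))  # column 3 = column 1 + column 2
S <- t(A) %*% A                          # Grammian matrix: 3 x 3, symmetric

c(qr(A)$rank, qr(S)$rank, qr(A %*% t(A))$rank)   # 2 2 2
eigen(S, symmetric = TRUE)$values        # all >= 0, the last one numerically 0

B <- A[, 1:2]                            # full column rank
eigen(t(B) %*% B, symmetric = TRUE)$values       # all > 0: positive definite
```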
What are eigenvalues?
• Given a matrix A, x is an eigenvector and λ is the corresponding eigenvalue if $Ax = \lambda x$.
– A must be square, and the determinant of $A - \lambda I$ must be equal to zero:
$Ax - \lambda x = 0 \Rightarrow (A - \lambda I)x = 0$
• The trivial solution is $x = 0$.
• A non-trivial solution occurs when $\det(A - \lambda I) = 0$.
• Are eigenvectors unique?
– No: if x is an eigenvector, then $\beta x$ ($\beta \ne 0$) is also an eigenvector with the same eigenvalue λ:
$A(\beta x) = \beta(Ax) = \beta(\lambda x) = \lambda(\beta x)$
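Calculating the Eigenvectors/values
• Expand $\det(A - \lambda I) = 0$ for a 2 × 2 matrix:
$\det(A - \lambda I) = \det\left(\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix} - \lambda\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}\right) = \det\begin{bmatrix} a_{11}-\lambda & a_{12}\\ a_{21} & a_{22}-\lambda\end{bmatrix} = 0$
$(a_{11}-\lambda)(a_{22}-\lambda) - a_{12}a_{21} = \lambda^2 - (a_{11}+a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$
$\lambda = \frac{1}{2}\left\{(a_{11}+a_{22}) \pm \sqrt{(a_{11}+a_{22})^2 - 4(a_{11}a_{22} - a_{12}a_{21})}\right\}$
• For a 2 × 2 matrix, this is a simple quadratic equation with two solutions (maybe complex).
• This “characteristic equation” is solved for λ; each root λ then gives x from $(A - \lambda I)x = 0$.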
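A minimal R sketch of the closed-form 2 × 2 solution, checked against base R's eigen(). The function name eig2 is hypothetical; the discriminant is converted to complex so that complex roots are returned instead of NaN:

```r
# Eigenvalues of a 2 x 2 matrix from the characteristic equation (hypothetical helper)
eig2 <- function(A) {
  tr <- A[1, 1] + A[2, 2]                      # a11 + a22
  dt <- A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]  # a11 a22 - a12 a21
  disc <- as.complex(tr^2 - 4 * dt)            # may be negative
  (tr + c(1, -1) * sqrt(disc)) / 2
}

A <- matrix(c(1, 2, 2, 4), nrow = 2)
eig2(A)                              # 5+0i 0+0i
eigen(A)$values                      # 5 and (numerically) 0
eig2(matrix(c(0, 1, -1, 0), 2))      # rotation by 90 degrees: 0+1i 0-1i
```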
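Eigenvalue example
• Consider $A = \begin{bmatrix} 1 & 2\\ 2 & 4\end{bmatrix}$:
$\det(A - \lambda I) = \lambda^2 - (a_{11}+a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = \lambda^2 - (1+4)\lambda + (1\cdot4 - 2\cdot2) = \lambda^2 - 5\lambda = 0 \Rightarrow \lambda = 0, 5$
• The corresponding eigenvectors can be computed as
– For λ = 0: $\begin{bmatrix} 1 & 2\\ 2 & 4\end{bmatrix}\begin{bmatrix} x\\ y\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix} \Rightarrow x + 2y = 0$; one possible solution is $x = (2, -1)$.
– For λ = 5: $\begin{bmatrix} 1-5 & 2\\ 2 & 4-5\end{bmatrix}\begin{bmatrix} x\\ y\end{bmatrix} = \begin{bmatrix} -4 & 2\\ 2 & -1\end{bmatrix}\begin{bmatrix} x\\ y\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix} \Rightarrow y = 2x$; one possible solution is $x = (1, 2)$.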
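The eigenvectors found above can be verified with exact integer arithmetic in R. A minimal sketch; the names v0 and v5 are illustrative:

```r
A <- matrix(c(1, 2, 2, 4), nrow = 2)
v0 <- c(2, -1)        # eigenvector for lambda = 0
v5 <- c(1, 2)         # eigenvector for lambda = 5

A %*% v0              # (0, 0) = 0 * v0
A %*% v5              # (5, 10) = 5 * v5
A %*% (3 * v5)        # any nonzero multiple is again an eigenvector
sum(v0 * v5)          # 0: orthogonal, as expected for a symmetric A
```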
Geometric Interpretation of eigen roots and vectors
We know from the definition of eigenvalues and eigenvectors that $Ax = \lambda x$; (**)
where A is an m × m matrix, x is an m-tuple vector and λ is a scalar.
The right side of (**) shows the vector multiplied by a scalar; hence x and λx lie on the same line.
The left side of (**) shows the effect of multiplying x by the matrix A (a matrix operator). In general, a matrix operator may change both the direction and the magnitude of a vector.
Geometric Interpretation of eigen roots and vectors
Hence our goal is to find those vectors that may change in magnitude but remain on the same line after multiplication by A.
Now the question arises: do these eigenvectors, together with their respective changes in magnitude, characterize the matrix?
The answer is given by the DECOMPOSITION THEOREMS.
Geometric Interpretation of eigen Roots and Vectors
[Figure: vectors x1 and x2 in the X–Y plane and their images Ax1 and Ax2 under A]
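More to Notice
$\begin{bmatrix} a & 0\\ 0 & a\end{bmatrix}\begin{bmatrix} x_1\\ x_2\end{bmatrix} = \begin{bmatrix} ax_1\\ ax_2\end{bmatrix} = a\begin{bmatrix} x_1\\ x_2\end{bmatrix}$: every nonzero vector is an eigenvector, with eigenvalue a.
$\begin{bmatrix} a & 0\\ 0 & b\end{bmatrix}\begin{bmatrix} x_1\\ 0\end{bmatrix} = a\begin{bmatrix} x_1\\ 0\end{bmatrix},\quad \begin{bmatrix} a & 0\\ 0 & b\end{bmatrix}\begin{bmatrix} 0\\ x_2\end{bmatrix} = b\begin{bmatrix} 0\\ x_2\end{bmatrix}$: for $a \ne b$, only vectors along the coordinate axes are eigenvectors.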
Properties of Eigen Values and Vectors
• If $B = CAC^{-1}$, where A, B and C are all n × n, then A and B have the same eigenvalues. If x is an eigenvector of A, then Cx is an eigenvector of B.
• The eigenvalues of A and $A^T$ are the same.
• An eigenvector $x \ne 0$ cannot be associated with more than one eigenvalue.
• Eigenvectors of a matrix A corresponding to distinct eigenvalues are linearly independent.
• Let A be a square matrix of order m and suppose all its eigenvalues are distinct. Then A is similar to a diagonal matrix Λ, i.e. $P^{-1}AP = \Lambda$.
• The eigenvalues of any real symmetric matrix A are real, and its eigenvectors can be chosen real.
• If $\lambda_i$ and $\lambda_j$ are two distinct eigenvalues of a real symmetric matrix A, then the corresponding eigenvectors $x_i$ and $x_j$ are orthogonal.
• If $\lambda_1, \lambda_2, \ldots, \lambda_m$ are the eigenvalues of the non-singular matrix A, then $\lambda_1^{-1}, \lambda_2^{-1}, \ldots, \lambda_m^{-1}$ are the eigenvalues of $A^{-1}$.
• Let A, B be two square matrices of order m. Then the eigenvalues of AB are exactly the eigenvalues of BA.
• Let A, B be m × n and n × m matrices respectively, where m ≤ n. Then the eigenvalues of $(BA)_{n\times n}$ consist of n − m zeros and the m eigenvalues of $(AB)_{m\times m}$.
• Let A be a square matrix of order m with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. Then $|A| = \prod_{i=1}^{m}\lambda_i$.
• Let A be an m × m matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. Then $\mathrm{tr}(A) = \mathrm{tr}(\Lambda) = \lambda_1 + \lambda_2 + \cdots + \lambda_m$.
• If A has eigenvalues $\lambda_1, \ldots, \lambda_m$, then $A - kI$ has eigenvalues $\lambda_1 - k, \ldots, \lambda_m - k$ and kA has eigenvalues $k\lambda_1, \ldots, k\lambda_m$, where k is a scalar.
• If A is an orthogonal matrix, then all its eigenvalues have absolute value 1.
• Let A be a square matrix of order m; suppose further that A is idempotent. Then its eigenvalues are either 0 or 1.
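Several of these properties can be checked numerically in a few lines. A minimal R sketch, assuming small illustrative matrices (A with eigenvalues 5 and 2, a unit upper-triangular B, a rotation Q and the idempotent averaging matrix H):

```r
A <- matrix(c(4, 1, 2, 3), nrow = 2)   # [[4, 2], [1, 3]]: eigenvalues 5 and 2
B <- matrix(c(1, 0, 2, 1), nrow = 2)   # [[1, 2], [0, 1]]
lam <- eigen(A)$values

c(det(A), prod(lam))                   # |A| = product of eigenvalues (10)
c(sum(diag(A)), sum(lam))              # tr(A) = sum of eigenvalues (7)
sort(eigen(solve(A))$values)           # 0.2 0.5: reciprocals for A^{-1}
eigen(A - 2 * diag(2))$values          # about 3 and 0: shifted by k = 2
sort(eigen(A %*% B)$values)            # AB and BA share eigenvalues
sort(eigen(B %*% A)$values)

theta <- pi / 3                        # orthogonal (rotation) matrix
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
Mod(eigen(Q)$values)                   # 1 1: absolute value one

H <- matrix(1, 3, 3) / 3               # idempotent: H %*% H equals H
hv <- eigen(H, symmetric = TRUE)$values
round(hv, 12)                          # 1 0 0
```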
Eigen/diagonal Decomposition
• Let S be a square matrix with m linearly independent eigenvectors (a “non-defective” matrix).
• Theorem: there exists an eigen decomposition $S = U\Lambda U^{-1}$ with Λ diagonal
– (cf. matrix diagonalization theorem)
• Columns of U are eigenvectors of S.
• Diagonal elements of Λ are eigenvalues of S.
• The decomposition is unique (up to the ordering and scaling of the eigenvectors) for distinct eigenvalues.
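Diagonal decomposition: why/how
Let U have the eigenvectors as columns: $U = [v_1 \cdots v_m]$.
Then SU can be written
$SU = S[v_1 \cdots v_m] = [\lambda_1v_1 \cdots \lambda_mv_m] = [v_1 \cdots v_m]\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_m\end{bmatrix}$
Thus $SU = U\Lambda$, or $U^{-1}SU = \Lambda$.
And $S = U\Lambda U^{-1}$.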
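Diagonal decomposition - example
Recall $S = \begin{bmatrix} 2 & 1\\ 1 & 2\end{bmatrix}$; $\lambda_1 = 1,\ \lambda_2 = 3$.
The eigenvectors $\begin{bmatrix} 1\\ -1\end{bmatrix}$ and $\begin{bmatrix} 1\\ 1\end{bmatrix}$ form $U = \begin{bmatrix} 1 & 1\\ -1 & 1\end{bmatrix}$.
Inverting, we have $U^{-1} = \begin{bmatrix} 1/2 & -1/2\\ 1/2 & 1/2\end{bmatrix}$.
Then $S = U\Lambda U^{-1} = \begin{bmatrix} 1 & 1\\ -1 & 1\end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & 3\end{bmatrix}\begin{bmatrix} 1/2 & -1/2\\ 1/2 & 1/2\end{bmatrix}$.
Recall $UU^{-1} = I$.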
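Example continued
Let’s divide U (and multiply $U^{-1}$) by $\sqrt{2}$. Then
$S = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2}\\ -1/\sqrt{2} & 1/\sqrt{2}\end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & 3\end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2}\\ 1/\sqrt{2} & 1/\sqrt{2}\end{bmatrix} = Q\Lambda Q^T$, with $Q^{-1} = Q^T$.
Why? …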
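The worked example can be reproduced in R. A minimal sketch, assuming the S, U and Λ of the example; solve() supplies $U^{-1}$ and dividing the columns of U by √2 gives the orthogonal Q:

```r
S <- matrix(c(2, 1, 1, 2), nrow = 2)
U <- cbind(c(1, -1), c(1, 1))          # eigenvectors for lambda = 1 and 3
Lambda <- diag(c(1, 3))
U_inv <- solve(U)                      # [[1/2, -1/2], [1/2, 1/2]]

U %*% Lambda %*% U_inv                 # reproduces S
Q <- U / sqrt(2)                       # normalize each column to length one
Q %*% Lambda %*% t(Q)                  # reproduces S, since Q^{-1} = t(Q)
t(Q) %*% Q                             # identity matrix
```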
Symmetric Eigen Decomposition
• If S is a symmetric matrix:
• Theorem: there exists an eigen decomposition $S = Q\Lambda Q^T$ (unique up to the signs and ordering of the eigenvectors when the eigenvalues are distinct), where Q is orthogonal:
– $Q^{-1} = Q^T$
– Columns of Q are normalized eigenvectors
– Columns are orthogonal
– (everything is real)
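Spectral Decomposition Theorem
• If A is a symmetric m × m matrix with $(\lambda_i, e_i)$, i = 1, …, m, being the m eigenvalue–eigenvector pairs (with normalized, mutually orthogonal $e_i$), then
$A_{m\times m} = \lambda_1e_1e_1^T + \lambda_2e_2e_2^T + \cdots + \lambda_me_me_m^T = \sum_{i=1}^{m}\lambda_ie_ie_i^T = P\Lambda P^T$,
where $P = [e_1\ e_2\ \cdots\ e_m]$ and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$.
– This is also called the eigen (spectral) decomposition theorem.
• Any symmetric matrix can be reconstructed using its eigenvalues and eigenvectors.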
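Example for Spectral Decomposition
• Let A be a symmetric, positive definite matrix: $A = \begin{bmatrix} 2.2 & 0.4\\ 0.4 & 2.8\end{bmatrix}$
$\det(A - \lambda I) = \lambda^2 - 5\lambda + 6.16 - 0.16 = (\lambda - 3)(\lambda - 2) = 0$, so $\lambda_1 = 3$, $\lambda_2 = 2$.
• The eigenvectors for the corresponding eigenvalues are
$e_1^T = \left(\frac{1}{\sqrt5}, \frac{2}{\sqrt5}\right),\quad e_2^T = \left(\frac{2}{\sqrt5}, -\frac{1}{\sqrt5}\right)$
• Consequently,
$A = \begin{bmatrix} 2.2 & 0.4\\ 0.4 & 2.8\end{bmatrix} = 3\begin{bmatrix} 1/\sqrt5\\ 2/\sqrt5\end{bmatrix}\begin{bmatrix} 1/\sqrt5 & 2/\sqrt5\end{bmatrix} + 2\begin{bmatrix} 2/\sqrt5\\ -1/\sqrt5\end{bmatrix}\begin{bmatrix} 2/\sqrt5 & -1/\sqrt5\end{bmatrix} = \begin{bmatrix} 0.6 & 1.2\\ 1.2 & 2.4\end{bmatrix} + \begin{bmatrix} 1.6 & -0.8\\ -0.8 & 0.4\end{bmatrix}$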
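A minimal R sketch that rebuilds the example's A from its two rank-one terms and compares with base R's eigen(); the columns of ev$vectors may differ from e1 and e2 by a sign, which does not affect $e_ie_i^T$:

```r
A <- matrix(c(2.2, 0.4, 0.4, 2.8), nrow = 2)
e1 <- c(1, 2) / sqrt(5)          # eigenvector for lambda1 = 3
e2 <- c(2, -1) / sqrt(5)         # eigenvector for lambda2 = 2

term1 <- 3 * outer(e1, e1)       # lambda1 e1 e1^T = [[0.6, 1.2], [1.2, 2.4]]
term2 <- 2 * outer(e2, e2)       # lambda2 e2 e2^T = [[1.6, -0.8], [-0.8, 0.4]]
term1 + term2                    # reproduces A

ev <- eigen(A, symmetric = TRUE)
ev$values                        # 3 2
P <- ev$vectors                  # same eigenvectors, possibly with flipped signs
P %*% diag(ev$values) %*% t(P)   # P Lambda P^T reproduces A as well
```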
The Square Root of a Matrix
The spectral decomposition allows us to express a square matrix in terms of its eigenvalues and eigenvectors. This expression enables us to conveniently create a square root matrix.
Let A be a p × p positive definite matrix with the spectral decomposition
$A = \sum_{i=1}^{p}\lambda_ie_ie_i' = P\Lambda P'$, $P = [e_1\ e_2\ e_3\ \cdots\ e_p]$,
where $P'P = PP' = I$ and $\Lambda = \mathrm{diag}(\lambda_i)$.
Let $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_p})$. This implies
$(P\Lambda^{1/2}P')(P\Lambda^{1/2}P') = P\Lambda^{1/2}\Lambda^{1/2}P' = P\Lambda P' = A$.
The matrix
$A^{1/2} = P\Lambda^{1/2}P' = \sum_{i=1}^{p}\sqrt{\lambda_i}\,e_ie_i'$
is called the square root of A.
Matrix Square Root Properties
The square root of A has the following properties (prove them):
1. $(A^{1/2})' = A^{1/2}$
2. $A^{1/2}A^{1/2} = A$
3. $A^{1/2}A^{-1/2} = A^{-1/2}A^{1/2} = I$
4. $A^{-1/2}A^{-1/2} = A^{-1}$, where $A^{-1/2} = (A^{1/2})^{-1}$
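Before proving the four properties, they can be checked numerically. A minimal R sketch, assuming the positive definite A of the spectral decomposition example; A_half and A_neg_half are illustrative names:

```r
A <- matrix(c(2.2, 0.4, 0.4, 2.8), nrow = 2)   # positive definite example
ev <- eigen(A, symmetric = TRUE)
P <- ev$vectors
lam <- ev$values

A_half <- P %*% diag(sqrt(lam)) %*% t(P)           # A^{1/2} = P Lambda^{1/2} P'
A_neg_half <- P %*% diag(1 / sqrt(lam)) %*% t(P)   # A^{-1/2} = (A^{1/2})^{-1}

all.equal(t(A_half), A_half)                    # property 1: symmetric
all.equal(A_half %*% A_half, A)                 # property 2
all.equal(A_half %*% A_neg_half, diag(2))       # property 3
all.equal(A_neg_half %*% A_neg_half, solve(A))  # property 4
```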
Physical Interpretation of SPD (Spectral Decomposition)
[Figure: ellipse $x^TAx = c^2$ with its axes along e1 and e2]
Suppose $x^TAx = c^2$. For p = 2, all x that satisfy this equation form an ellipse, since by the SPD of the p.d. matrix A, $c^2 = \lambda_1(x^Te_1)^2 + \lambda_2(x^Te_2)^2$.
Let $x_1 = c\lambda_1^{-1/2}e_1$ and $x_2 = c\lambda_2^{-1/2}e_2$. Both satisfy the above equation and lie in the directions of the eigenvectors. Note that the length of $x_1$ is $c\lambda_1^{-1/2}$: the half-axis lengths are inversely proportional to the square roots of the eigenvalues of A.
What will be the case if we replace A by $A^{-1}$?
$\mathrm{Var}(e_i^TX) = e_i^T\mathrm{Var}(X)e_i = \lambda_i$, the ith eigenvalue of Var(X).
All points on the same ellipse are at the same (statistical) distance.
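A minimal R sketch of both facts, assuming the A of the spectral decomposition example, a hypothetical constant c0 playing the role of c (named c0 to avoid masking R's c()), and simulated data for the variance identity, where a sample covariance stands in for Var(X):

```r
A <- matrix(c(2.2, 0.4, 0.4, 2.8), nrow = 2)   # eigenvalues 3 and 2
ev <- eigen(A, symmetric = TRUE)
c0 <- 2                                        # illustrative value of c

# Points where the ellipse x^T A x = c^2 meets the eigenvector directions
x1 <- c0 * ev$values[1]^(-1/2) * ev$vectors[, 1]
x2 <- c0 * ev$values[2]^(-1/2) * ev$vectors[, 2]
c(drop(t(x1) %*% A %*% x1), drop(t(x2) %*% A %*% x2))  # both equal c^2
c(sqrt(sum(x1^2)), sqrt(sum(x2^2)))  # c / sqrt(lambda_i): shorter for larger lambda

# Var(e_i^T X) = e_i^T Var(X) e_i = lambda_i, using a sample covariance
set.seed(7)
X <- matrix(rnorm(600), ncol = 2) %*% chol(A)  # simulated data, covariance near A
es <- eigen(cov(X), symmetric = TRUE)
c(var(drop(X %*% es$vectors[, 1])), es$values[1])
```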
Matrix Inequalities and Maximization
- Extended Cauchy–Schwarz Inequality: Let b and d be any two p × 1 vectors and B a p × p positive definite matrix. Then
$(b'd)^2 \le (b'Bb)(d'B^{-1}d)$
with equality iff $b = cB^{-1}d$ (or $d = cBb$) for some constant c.
- Maximization Lemma: Let d be a given p × 1 vector and B a p × p positive definite matrix. Then, for an arbitrary nonzero vector x,
$\max_{x \ne 0}\frac{(x'd)^2}{x'Bx} = d'B^{-1}d$
with the maximum attained when $x = kB^{-1}d$ for any constant $k \ne 0$.
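A minimal R sketch of the Maximization Lemma, assuming an illustrative positive definite B and vector d. Random directions never exceed the bound $d'B^{-1}d$, and $x = kB^{-1}d$ attains it; the helper name ratio is hypothetical:

```r
B <- matrix(c(2.2, 0.4, 0.4, 2.8), nrow = 2)   # positive definite
d <- c(1, -2)                                  # illustrative vector
B_inv <- solve(B)
bound <- drop(t(d) %*% B_inv %*% d)            # d' B^{-1} d

# (x'd)^2 / (x'Bx) for a nonzero x (hypothetical helper)
ratio <- function(x) drop(t(x) %*% d)^2 / drop(t(x) %*% B %*% x)

set.seed(3)
trials <- replicate(1000, ratio(rnorm(2)))
max(trials) <= bound + 1e-12                   # no random direction exceeds the bound
ratio(drop(B_inv %*% d))                       # attains the bound at x = B^{-1} d
ratio(5 * drop(B_inv %*% d))                   # and at any multiple k B^{-1} d
```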
- Maximization of Quadratic Forms for Points on the Unit Sphere: Let B be a p × p positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ and associated normalized eigenvectors $e_1, e_2, \ldots, e_p$. Then
$\max_{x \ne 0}\frac{x'Bx}{x'x} = \lambda_1$ (attained when $x = e_1$),
$\min_{x \ne 0}\frac{x'Bx}{x'x} = \lambda_p$ (attained when $x = e_p$).
Moreover,
$\max_{x \perp e_1, \ldots, e_k}\frac{x'Bx}{x'x} = \lambda_{k+1}$ (attained when $x = e_{k+1}$, k = 1, 2, …, p − 1).
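A minimal R sketch of these bounds for the Rayleigh quotient x'Bx / x'x, assuming an illustrative 3 × 3 positive definite B; the helper rq is hypothetical, and random vectors are projected orthogonal to e1 to illustrate the k = 1 case:

```r
B <- matrix(c(4, 1, 0, 1, 3, 1, 0, 1, 2), nrow = 3)   # symmetric, positive definite
ev <- eigen(B, symmetric = TRUE)                       # values in decreasing order
rq <- function(x) drop(t(x) %*% B %*% x) / sum(x^2)    # x'Bx / x'x

set.seed(11)
vals <- replicate(2000, rq(rnorm(3)))
c(min(vals), max(vals))                  # inside [lambda_p, lambda_1]
c(rq(ev$vectors[, 1]), ev$values[1])     # maximum attained at e1
c(rq(ev$vectors[, 3]), ev$values[3])     # minimum attained at e_p

# Restricting x to be orthogonal to e1 caps the quotient at lambda_2
e1 <- ev$vectors[, 1]
proj <- replicate(2000, { x <- rnorm(3); rq(x - sum(x * e1) * e1) })
max(proj) <= ev$values[2] + 1e-12
```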
Calculation in R
t <- sqrt(2)
x <- c(3.0046, t, t, 16.9967)
A <- matrix(x, nrow = 2)
eigen(A)

Console output:
> x
[1]  3.004600  1.414214  1.414214 16.996700
> A <- matrix(x, nrow = 2)
> A
         [,1]      [,2]
[1,] 3.004600  1.414214
[2,] 1.414214 16.996700
> eigen(A)
$values
[1] 17.138207  2.863093

$vectors
           [,1]        [,2]
[1,] 0.09956317 -0.99503124
[2,] 0.99503124  0.09956317
Thank you