Regularized Discriminant Analysis and Reduced-Rank LDA
Jia Li
Department of Statistics
The Pennsylvania State University
Email: [email protected]
http://www.stat.psu.edu/∼jiali
Regularized Discriminant Analysis
- A compromise between LDA and QDA.
- Shrink the separate covariances of QDA toward a common covariance, as in LDA.
- Regularized covariance matrices:

  $$\Sigma_k(\alpha) = \alpha\,\Sigma_k + (1-\alpha)\,\Sigma\,.$$

- The quadratic discriminant function δ_k(x) is defined using the shrunken covariance matrices Σ_k(α).
- The parameter α ∈ [0, 1] controls the complexity of the model: α = 1 gives QDA and α = 0 gives LDA. A sketch of the shrinkage step is given below.
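A minimal NumPy sketch of the shrinkage step. The function name `regularized_covariances` and the use of class-proportion weights for the pooled covariance are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def regularized_covariances(X, y, alpha):
    """Shrink each per-class covariance toward the pooled covariance:
    Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma."""
    classes = np.unique(y)
    n = len(y)
    # Per-class sample covariances (the QDA ingredient).
    covs = {k: np.cov(X[y == k], rowvar=False) for k in classes}
    # Common covariance, here weighted by class proportions (the LDA ingredient).
    pooled = sum((np.sum(y == k) / n) * covs[k] for k in classes)
    return {k: alpha * covs[k] + (1 - alpha) * pooled for k in classes}
```

With `alpha = 0` every class uses the common covariance (LDA); with `alpha = 1` each class keeps its own covariance (QDA).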
Computations for LDA
- Discriminant function:

  $$\delta_k(x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k) + \log\pi_k\,.$$
- Eigen-decomposition of Σ_k: Σ_k = U_k D_k U_k^T, where D_k is diagonal with elements d_{kl}, l = 1, 2, ..., p, and U_k is p × p orthonormal.
- We then have

  $$(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k) = [U_k^T(x-\mu_k)]^T D_k^{-1}[U_k^T(x-\mu_k)] = [D_k^{-1/2}U_k^T(x-\mu_k)]^T[D_k^{-1/2}U_k^T(x-\mu_k)]\,.$$

- $\log|\Sigma_k| = \sum_l \log d_{kl}$.
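These identities translate directly into code. A sketch, assuming NumPy; the helper name `discriminant_score` is hypothetical:

```python
import numpy as np

def discriminant_score(x, mu_k, Sigma_k, pi_k):
    """delta_k(x) computed via the eigen-decomposition Sigma_k = U_k D_k U_k^T."""
    d, U = np.linalg.eigh(Sigma_k)        # eigenvalues d_kl and eigenvectors U_k
    z = (U.T @ (x - mu_k)) / np.sqrt(d)   # D_k^{-1/2} U_k^T (x - mu_k)
    # log|Sigma_k| = sum_l log d_kl, and the quadratic form is z^T z.
    return -0.5 * np.sum(np.log(d)) - 0.5 * z @ z + np.log(pi_k)
```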
- LDA, with Σ = UDU^T:
  - Sphere the data: D^{-1/2}U^T X → X* and D^{-1/2}U^T µ_k → µ_k*.
  - For the transformed data and class centroids, classify x* to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities π_k (see the sketch below).
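A minimal sketch of this rule, assuming the rows of `X` are observations; the helper name `lda_predict` is illustrative:

```python
import numpy as np

def lda_predict(X, centroids, Sigma, priors):
    """Sphere with Sigma = U D U^T, then assign each point to the closest
    sphered centroid after adjusting for the class priors.
    Returns class indices 0, ..., K-1."""
    d, U = np.linalg.eigh(Sigma)
    S = U / np.sqrt(d)                    # right-multiplying by S applies D^{-1/2} U^T
    Xs, Ms = X @ S, centroids @ S         # sphered data X* and centroids mu_k*
    # Squared distance to each centroid, with the prior correction -2 log pi_k.
    d2 = ((Xs[:, None, :] - Ms[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2 - 2.0 * np.log(priors)[None, :], axis=1)
```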
The geometric illustration of LDA. Left: original data in the two classes; the ellipses represent the two estimated covariance matrices. Right: the class-mean-removed data and the estimated common covariance matrix.
The geometric illustration of LDA. Left: the sphered, mean-removed data. Right: the sphered data in the two classes, the sphered means, and the decision boundary.
Reduced-Rank LDA
Binary classification
- The decision boundary is given by the following linear equation:

  $$\log\frac{\pi_1}{\pi_2} - \frac{1}{2}(\mu_1+\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2) + x^T\Sigma^{-1}(\mu_1-\mu_2) = 0\,.$$

- Only the projection of X on the direction Σ^{-1}(µ_1 − µ_2) matters.
- If the data are sphered, only the projection of X* on µ_1* − µ_2* is needed (a code sketch follows).
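A small sketch of this reduction; the names `w` (for the direction) and `c` (for the threshold) are chosen here for illustration:

```python
import numpy as np

def binary_lda_direction(mu1, mu2, Sigma, pi1, pi2):
    """For two classes, only x^T w matters, with w = Sigma^{-1}(mu1 - mu2)."""
    w = np.linalg.solve(Sigma, mu1 - mu2)            # Sigma^{-1}(mu1 - mu2)
    c = 0.5 * (mu1 + mu2) @ w - np.log(pi1 / pi2)    # boundary: x^T w = c
    return w, c                                      # classify to class 1 iff x @ w > c
```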
- Suppose the data are sphered.
- The subspace spanned by the K centroids has rank at most K − 1; denote it by H_{K−1}.
- The data can be viewed in H_{K−1} without losing any information.
- When K > 3, we might want to find a subspace H_L ⊆ H_{K−1} that is optimal for LDA in some sense.
Optimization Criterion
- Fisher's optimization criterion: the projected centroids should be spread out as much as possible relative to the variance.
- Find the linear combination Z = a^T X such that the between-class variance is maximized relative to the within-class variance, where a = (a_1, a_2, ..., a_p)^T.
- Assume the within-class covariance matrix of X is W, i.e., the common covariance matrix of the classes.
- The between-class covariance matrix is B. Suppose µ_k is a column vector denoting the mean vector of class k:

  $$\mu = \sum_{k=1}^K \pi_k\,\mu_k\,, \qquad B = \sum_{k=1}^K \pi_k(\mu_k-\mu)(\mu_k-\mu)^T\,.$$

  Note that π_k is the proportion of class-k samples in the entire data set.
- For the linear combination Z, the between-class variance is a^T B a and the within-class variance is a^T W a.
- Fisher's optimization becomes

  $$\max_a \frac{a^T B a}{a^T W a}\,.$$
- Eigen-decomposition of W: W = V_W D_W V_W^T.
- W = (W^{1/2})^T W^{1/2}, where W^{1/2} = D_W^{1/2} V_W^T.
- Define b = W^{1/2} a, so that a = W^{-1/2} b. The optimization becomes

  $$\max_b \frac{b^T (W^{-1/2})^T B\, W^{-1/2}\, b}{b^T b}\,.$$

- Define B* = (W^{-1/2})^T B W^{-1/2}.
- Eigen-decomposition of B*: B* = V* D_B V*^T, where V* = (v_1*, v_2*, ..., v_p*).
- The maximization is achieved by b = v_1*, the first eigenvector of B*.
- Similarly, one can find the next direction b_2 = v_2*, which is orthogonal to b_1 = v_1* and maximizes b_2^T B* b_2 / b_2^T b_2.
- Since a = W^{-1/2} b, converting back to the original problem gives a_l = W^{-1/2} v_l*.
- The a_l (also denoted v_l in the textbook) are referred to as discriminant coordinates or canonical variates.
- Summary of obtaining the discriminant coordinates (a code sketch follows this list):
  - Find the centroids of all the classes.
  - Find the between-class covariance matrix B using the centroid vectors.
  - Find the within-class covariance matrix W, i.e., Σ.
  - By eigen-decomposition,

    $$W = (W^{1/2})^T W^{1/2} = (D_W^{1/2} V_W^T)^T D_W^{1/2} V_W^T\,.$$

  - Compute

    $$B^* = (W^{-1/2})^T B\, W^{-1/2} = D_W^{-1/2} V_W^T B\, V_W D_W^{-1/2}\,.$$

  - Eigen-decomposition of B*: B* = V* D_B V*^T.
  - The discriminant coordinates are a_l = W^{-1/2} v_l*.
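The steps above as a NumPy sketch. The function name and the input conventions (centroids as rows of `mu`, class proportions in `priors`) are assumptions made for illustration:

```python
import numpy as np

def discriminant_coordinates(mu, priors, W):
    """Discriminant coordinates a_l = W^{-1/2} v_l*, given the class
    centroids mu (K x p), class proportions priors (length K), and the
    within-class covariance W (p x p)."""
    mu_bar = priors @ mu                          # overall mean
    C = mu - mu_bar                               # centered centroids
    B = (C * priors[:, None]).T @ C               # between-class covariance
    dW, VW = np.linalg.eigh(W)                    # W = V_W D_W V_W^T
    W_inv_half = VW / np.sqrt(dW)                 # V_W D_W^{-1/2}, i.e. (W^{1/2})^{-1}
    B_star = W_inv_half.T @ B @ W_inv_half        # D_W^{-1/2} V_W^T B V_W D_W^{-1/2}
    dB, V_star = np.linalg.eigh(B_star)           # eigenvectors v_l*
    order = np.argsort(dB)[::-1]                  # largest eigenvalue first
    return W_inv_half @ V_star[:, order]          # columns are the a_l
```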
Simulation
- Three classes with equal prior probabilities 1/3.
- The input is two dimensional.
- The class-conditional density of X is a normal distribution.
- The common covariance matrix is

  $$\Sigma = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix}.$$

- The three mean vectors are

  $$\mu_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \mu_2 = \begin{pmatrix} -3 \\ 2 \end{pmatrix}, \quad \mu_3 = \begin{pmatrix} -1 \\ -3 \end{pmatrix}.$$

- A total of 450 samples is drawn, with 150 in each class, for training.
- Another set of 450 samples, 150 per class, is drawn for testing.
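This setup is easy to reproduce. A sketch (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)                   # arbitrary seed
mus = np.array([[0.0, 0.0], [-3.0, 2.0], [-1.0, -3.0]])
Sigma = np.eye(2)                                # common covariance

def draw_set(rng, n_per_class=150):
    X = np.vstack([rng.multivariate_normal(m, Sigma, n_per_class) for m in mus])
    y = np.repeat([1, 2, 3], n_per_class)
    return X, y

X_train, y_train = draw_set(rng)                 # 450 training samples
X_test, y_test = draw_set(rng)                   # 450 test samples
```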
The scatter plot of the test data. Red: class 1. Blue: class 2. Magenta: class 3.
LDA Result
- Priors: π_1 = π_2 = π_3 = 150/450 = 0.3333.
- The three mean vectors are

  $$\mu_1 = \begin{pmatrix} -0.0757 \\ -0.0034 \end{pmatrix}, \quad \mu_2 = \begin{pmatrix} -2.8310 \\ 1.9847 \end{pmatrix}, \quad \mu_3 = \begin{pmatrix} -0.9992 \\ -2.9005 \end{pmatrix}.$$

- Estimated covariance matrix:

  $$\Sigma = \begin{pmatrix} 0.9967 & 0.0020 \\ 0.0020 & 1.0263 \end{pmatrix}.$$

- Decision boundaries:
  - Between class 1 (red) and class 2 (blue): 5.9480 + 2.7684 X_1 − 1.9427 X_2 = 0.
  - Between class 1 (red) and class 3 (magenta): 4.5912 + 0.9209 X_1 + 2.8211 X_2 = 0.
  - Between class 2 (blue) and class 3 (magenta): −1.3568 − 1.8475 X_1 + 4.7639 X_2 = 0.
Classification error rate on the test data set: 7.78%.
Discriminant Coordinates
- Between-class covariance matrix:

  $$B = \begin{pmatrix} 1.3111 & -1.3057 \\ -1.3057 & 4.0235 \end{pmatrix}.$$

- Within-class covariance matrix:

  $$W = \begin{pmatrix} 0.9967 & 0.0020 \\ 0.0020 & 1.0263 \end{pmatrix}.$$

- $$W^{1/2} = \begin{pmatrix} -0.0686 & -1.0108 \\ 0.9960 & -0.0676 \end{pmatrix}.$$

- $$B^* = (W^{-1/2})^T B\, W^{-1/2} = \begin{pmatrix} 3.7361 & 1.4603 \\ 1.4603 & 1.5050 \end{pmatrix}.$$
- Eigen-decomposition of B*: B* = V* D_B V*^T, with

  $$V^* = \begin{pmatrix} 0.8964 & 0.4432 \\ 0.4432 & -0.8964 \end{pmatrix}, \quad D_B = \begin{pmatrix} 4.4582 & 0 \\ 0 & 0.7830 \end{pmatrix}.$$
- The two discriminant coordinates are

  $$v_1 = W^{-1/2} v_1^* = \begin{pmatrix} -0.0668 & 0.9994 \\ -0.9848 & -0.0678 \end{pmatrix}\begin{pmatrix} 0.8964 \\ 0.4432 \end{pmatrix} = \begin{pmatrix} 0.3831 \\ -0.9128 \end{pmatrix},$$

  $$v_2 = W^{-1/2} v_2^* = \begin{pmatrix} -0.9255 \\ -0.3757 \end{pmatrix}.$$

- Project the data onto v_1 and classify using only this 1-D data.
- The projected data are x_i^T v_1, i = 1, ..., N.
Solid line: first DC. Dashed line: second DC.
Projection on the First DC
Projection of the training data on the first discriminant coordinate.
- Perform LDA on the projected data.
- The classification rule is

  $$G(x) = \begin{cases} 1 & -1.4611 \le x^T v_1 \le 1.1195 \\ 2 & x^T v_1 \le -1.4611 \\ 3 & x^T v_1 \ge 1.1195 \end{cases}$$

  (a code sketch follows).
- Error rate on the test data: 12.67%.
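A sketch of this 1-D rule, with the thresholds taken from above:

```python
import numpy as np

v1 = np.array([0.3831, -0.9128])      # first discriminant coordinate

def classify_on_first_dc(X):
    """Project rows of X onto v1 and apply the interval rule; returns 1, 2, or 3."""
    z = X @ v1
    return np.where(z <= -1.4611, 2, np.where(z >= 1.1195, 3, 1))
```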
Principal Component Direction
- Find the covariance matrix of X, or do a singular value decomposition of the mean-removed X, to find the principal component directions.
- Denote the covariance matrix by T:

  $$T = \begin{pmatrix} 2.3062 & -1.3066 \\ -1.3066 & 5.0542 \end{pmatrix}.$$

- Eigen-decomposition of T = V_T D_T V_T^T:

  $$V_T = \begin{pmatrix} 0.3710 & -0.9286 \\ -0.9286 & -0.3710 \end{pmatrix}, \quad D_T = \begin{pmatrix} 5.5762 & 0 \\ 0 & 1.7842 \end{pmatrix}.$$
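A sketch of computing the principal component directions this way:

```python
import numpy as np

def pc_directions(X):
    """Principal component directions via eigen-decomposition of the
    total covariance T of the data."""
    T = np.cov(X, rowvar=False)          # total covariance matrix T
    dT, VT = np.linalg.eigh(T)           # T = V_T D_T V_T^T
    order = np.argsort(dT)[::-1]         # largest variance first
    return VT[:, order], dT[order]
```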
Solid line: first PCD. Dashed line: second PCD.
Results Based on the First PC
Projection of the data on the first PC. The boundaries between classes are shown.
- Perform LDA on the projected data.
- The classification rule is

  $$G(x) = \begin{cases} 1 & -1.4592 \le x^T v_1 \le 1.1489 \\ 2 & x^T v_1 \le -1.4592 \\ 3 & x^T v_1 \ge 1.1489 \end{cases}$$

  where v_1 here denotes the first principal component direction.
- Error rate on the test data: 13.11%.
Comparison
- It is generally true that T = B + W (see the numerical check below).
- For the given example, W ≈ I, and the true within-class covariance matrix is I.
- Ideally, for this example, both the discriminant coordinates and the principal component directions are simply the eigenvectors of B.
- In general, discriminant coordinates and principal component directions are different.
- To compute PC directions, class information is not needed; hence PCs have more flexible applications.
- For classification, DCs tend to be better.
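As a quick sanity check, the matrices reported earlier satisfy T ≈ B + W up to rounding:

```python
import numpy as np

# Matrices as reported in the slides above.
B = np.array([[1.3111, -1.3057], [-1.3057, 4.0235]])
W = np.array([[0.9967, 0.0020], [0.0020, 1.0263]])
T = np.array([[2.3062, -1.3066], [-1.3066, 5.0542]])
print(np.allclose(B + W, T, atol=5e-3))   # True: T = B + W up to rounding
```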
A New Simulation
- Change the common covariance matrix Σ to

  $$\Sigma = \begin{pmatrix} 4.0898 & -0.8121 \\ -0.8121 & 0.5900 \end{pmatrix}.$$

- The scatter plot of the test data set is shown.
LDA Result
The classification boundaries obtained by LDA. The error rate for the test data is 6%.
DCs and PC Directions
The solid line indicates the first DC or PC; the dashed line indicates the second DC or PC. Left: discriminant coordinates. Right: principal component directions.
Projection on 1-D
The LDA results obtained using the data projected onto the first discriminant coordinate and the first principal component direction. Left: projection on the first DC (test-set error rate: 7.78%). Right: projection on the first PCD (test-set error rate: 32.44%).