Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal
Continuous Parameter Modeling
Jerome R. Bellegarda
Outline
• Introduction
• LSM
• Applications
• Conclusions
Introduction
• LSA originated in information retrieval (IR):
– matches the words of queries against the words of documents
– improves both recall and precision
• Assumption: there is some underlying latent semantic structure in the data
– the latent structure is conveyed by correlation patterns
– documents are treated under the bag-of-words model
• LSA improves separability among different topics
• Successes of LSA:
– word clustering
– document clustering
– language modeling
– automated call routing
– semantic inference for spoken interface control
• These solutions all leverage LSA’s ability to expose global relationships in context and meaning
• Three unique factors of LSA:
– the mapping of discrete entities
– the dimensionality reduction
– the intrinsically global outlook
• The terminology is changed to latent semantic mapping (LSM) to convey increased reliance on these general properties
Latent Semantic Mapping
• LSM defines a mapping between two discrete sets and a continuous vector space:
– M: an inventory of M individual units, such as words
– N: a collection of N meaningful compositions of units, such as documents
– L: a continuous vector space
– r_i: unit in M; c_j: composition in N
Feature Extraction
• Construct a matrix W of co-occurrences between units and compositions
• The (i, j) cell of W:

  w_ij = (1 − ε_i) c_ij / n_j        (1)

– c_ij: the number of times r_i occurs in c_j
– n_j: the total number of units present in c_j
– ε_i: the normalized entropy of r_i in the collection N
• The normalized entropy of r_i:

  ε_i = − (1 / log N) Σ_{j=1..N} (c_ij / t_i) log (c_ij / t_i),   where t_i = Σ_{j=1..N} c_ij

  0 ≤ ε_i ≤ 1, with ε_i = 0 if and only if r_i occurs in a single composition, and ε_i = 1 if and only if r_i is spread evenly across all compositions
• A value of the entropy close to 0 means that the unit is present only in a few specific compositions
• The global weight 1 − ε_i is therefore a measure of the indexing power of the unit r_i
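The weighting scheme above can be sketched in a few lines of NumPy. This is a toy illustration, not Bellegarda's implementation; the function name `lsm_matrix` is ours, and it assumes every unit occurs at least once in the collection.

```python
import numpy as np

def lsm_matrix(counts):
    """Weighted unit-composition matrix W from raw co-occurrence counts.

    counts[i, j] = number of times unit r_i occurs in composition c_j.
    Applies Eq. (1): w_ij = (1 - eps_i) * c_ij / n_j, with eps_i the
    normalized entropy of unit r_i across the collection.
    Assumes each unit occurs at least once (t_i > 0).
    """
    counts = np.asarray(counts, dtype=float)
    M, N = counts.shape
    t = counts.sum(axis=1, keepdims=True)      # t_i: total count of unit i
    n = counts.sum(axis=0, keepdims=True)      # n_j: total units in composition j
    p = counts / t                             # c_ij / t_i
    with np.errstate(divide="ignore", invalid="ignore"):
        h = np.where(p > 0, p * np.log(p), 0.0)
    eps = -h.sum(axis=1) / np.log(N)           # normalized entropy, in [0, 1]
    return (1.0 - eps)[:, None] * counts / n
```

A unit concentrated in one composition keeps its full weight (ε_i = 0), while a unit spread evenly across all compositions is zeroed out (ε_i = 1), which is exactly the "indexing power" interpretation above.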
Singular Value Decomposition
• The M×N unit-composition matrix W defines two vector representations for the units and the compositions
– each unit r_i is a row vector of dimension N
– each composition c_j is a column vector of dimension M
• This is impractical:
– M and N can be extremely large
– the vectors r_i, c_j are typically very sparse
– the two spaces are distinct from each other
• Employ the (truncated) SVD:

  W ≈ Ŵ = U S Vᵀ

– U: M×R left singular matrix with row vectors u_i
– S: R×R diagonal matrix of singular values s_1 ≥ s_2 ≥ … ≥ s_R > 0
– V: N×R right singular matrix with row vectors v_j
– U, V are column-orthonormal: UᵀU = VᵀV = I_R
– R ≪ min(M, N)
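As a sketch of the decomposition step, using NumPy's `linalg.svd` on an invented 4×3 toy matrix truncated to R = 2:

```python
import numpy as np

# Toy weighted matrix W (M = 4 units, N = 3 compositions) and its rank-R SVD.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
R = 2

U_full, s_full, Vt_full = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :R]                 # M x R, row vectors u_i
S = np.diag(s_full[:R])           # R x R, s_1 >= s_2 > 0
V = Vt_full[:R].T                 # N x R, row vectors v_j

W_hat = U @ S @ V.T               # best rank-R approximation of W

# Column-orthonormality: U^T U = V^T V = I_R
assert np.allclose(U.T @ U, np.eye(R))
assert np.allclose(V.T @ V, np.eye(R))
```

Note that `np.linalg.svd` returns the singular values already sorted in decreasing order, so truncation is just a slice.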
• Ŵ captures the major structural associations in W and ignores higher-order effects
• The closeness of vectors in L supports:
– unit-unit comparisons
– composition-composition comparisons
– unit-composition comparisons
Closeness Measure
• WWᵀ characterizes the co-occurrences between units; WᵀW characterizes the co-occurrences between compositions
• Units r_i, r_j are close if they have similar patterns of occurrence across the compositions
• Compositions c_i, c_j are close if they have similar patterns of occurrence across the units
• Unit-unit comparisons: W Wᵀ = U S² Uᵀ
• Cosine measure:

  K(r_i, r_j) = cos(u_i S, u_j S) = (u_i S² u_jᵀ) / (‖u_i S‖ ‖u_j S‖)

• Distance in [0, π]:

  D(r_i, r_j) = cos⁻¹ K(r_i, r_j)
• Composition-composition comparisons: Wᵀ W = V S² Vᵀ
• Cosine measure:

  K(c_i, c_j) = cos(v_i S, v_j S) = (v_i S² v_jᵀ) / (‖v_i S‖ ‖v_j S‖)

• Distance in [0, π]:

  D(c_i, c_j) = cos⁻¹ K(c_i, c_j)
• Unit-composition comparisons: W = U S Vᵀ
• Cosine measure:

  K(r_i, c_j) = cos(u_i S^1/2, v_j S^1/2) = (u_i S v_jᵀ) / (‖u_i S^1/2‖ ‖v_j S^1/2‖)

• Distance in [0, π]:

  D(r_i, c_j) = cos⁻¹ K(r_i, c_j)
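The three cosine measures can be sketched together in one hypothetical helper (our function name; it assumes W has already been weighted as in Eq. (1)):

```python
import numpy as np

def closeness(W, R):
    """Return the three LSM cosine closeness matrices for a weighted matrix W:
    unit-unit, composition-composition, and unit-composition."""
    Uf, sf, Vtf = np.linalg.svd(W, full_matrices=False)
    U, s, V = Uf[:, :R], sf[:R], Vtf[:R].T

    def cosine(A, B):
        # Pairwise cosines between the rows of A and the rows of B.
        return (A @ B.T) / np.outer(np.linalg.norm(A, axis=1),
                                    np.linalg.norm(B, axis=1))

    US, VS = U * s, V * s                        # rows u_i S and v_j S
    Ur, Vr = U * np.sqrt(s), V * np.sqrt(s)      # rows u_i S^1/2 and v_j S^1/2
    return cosine(US, US), cosine(VS, VS), cosine(Ur, Vr)
```

The corresponding distance is `np.arccos` of any entry, landing in [0, π] as stated above.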
LSM Framework Extension
• Observe a new composition c̃_p, p > N; the tilde reflects the fact that the composition was not part of the original collection of N
• c̃_p, a column vector of dimension M, can be thought of as an additional column of the matrix W
• Assuming U and S do not change: c̃_p = U S ṽ_pᵀ
• c̃_p: pseudo-composition
• ṽ_p: pseudo-composition vector, obtained from

  ṽ_p = c̃_pᵀ U S⁻¹

• ṽ_p behaves like any other composition vector in L
• If the addition of c̃_p causes the major structural associations in W to shift in some substantial manner, the original singular vectors become inadequate
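The fold-in relation ṽ_p = c̃_pᵀ U S⁻¹ is one line of NumPy. As a sanity check under our toy setup, folding an existing column of W back in recovers the corresponding row of V exactly (possible here only because the toy W has full rank, so no singular value is zero):

```python
import numpy as np

def fold_in(c_tilde, U, s):
    """Pseudo-composition vector: v_p = c_p^T U S^{-1}."""
    return (np.asarray(c_tilde, dtype=float) @ U) / s

# Toy full-rank matrix: fold column 0 back in and recover row 0 of V.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
U_f, s_f, Vt_f = np.linalg.svd(W, full_matrices=False)
v0 = fold_in(W[:, 0], U_f, s_f)
assert np.allclose(v0, Vt_f[:, 0])
```

For a genuinely new composition the result is only an approximation in the existing space, which is exactly why a substantial structural shift eventually forces a re-computation of the SVD.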
• It would then be necessary to re-compute the SVD to find a proper representation for c̃_p
Salient Characteristics of LSM
• A single vector embedding for both units and compositions in the same continuous vector space L
• A relatively low dimensionality, which makes operations such as clustering meaningful and practical
• An underlying structure reflecting globally meaningful relationships, with natural similarity metrics to measure the distance between units, between compositions, or between units and compositions in L
Applications
• Semantic classification
• Multi-span language modeling
• Junk e-mail filtering
• Pronunciation modeling
• TTS Unit Selection
Semantic Classification
• Semantic classification refers to determining which one of several predefined topics a given document is most closely aligned with
• The centroid of each topic cluster can be viewed as the semantic representation of that outcome in LSM space: a semantic anchor
• A newly observed word sequence is classified by computing the distance between its document vector and each semantic anchor, and picking the minimum:

  D(c_i, c_j) = cos⁻¹ K(c_i, c_j)
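A minimal sketch of anchor-based classification. The helper names and the 2-D vectors are invented for illustration, and plain cosine is used on the document vectors (the full system would compare the S-scaled LSM vectors as defined earlier):

```python
import numpy as np

def semantic_anchors(doc_vectors, labels):
    """Semantic anchor of each topic: the centroid of its document vectors."""
    doc_vectors, labels = np.asarray(doc_vectors, dtype=float), np.asarray(labels)
    return {t: doc_vectors[labels == t].mean(axis=0) for t in np.unique(labels)}

def classify(v, anchors):
    """Pick the anchor with the largest cosine (i.e. the smallest distance D)."""
    v = np.asarray(v, dtype=float)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(anchors, key=lambda t: cos(v, anchors[t]))
```

A new document is folded into the space (as on the previous slides) and handed to `classify`.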
• Domain knowledge is automatically encapsulated in the LSM space in a data-driven fashion
• For desktop interface control: semantic inference
Multi-Span Language Modeling
• In a standard n-gram, the history is the string

  H_{q−1}^{(n)} = r_{q−1} r_{q−2} … r_{q−n+1}

• In LSM language modeling, the history is the current document up to word r_{q−1}:

  H_{q−1}^{(l)} = c̃_{q−1}

• The pseudo-document c̃_{q−1} is continually updated as q increases; when word r_i is observed (n_q = n_{q−1} + 1, e_i the i-th basis vector):

  c̃_q = [ (n_q − 1) c̃_{q−1} + (1 − ε_i) e_i ] / n_q,   ṽ_q = c̃_qᵀ U S⁻¹
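The incremental pseudo-document update can be sketched as follows. The update rule is derived from the cell definition of W in Eq. (1) rather than copied from the paper, so treat it as an assumption; the function name is ours:

```python
import numpy as np

def update_pseudo_comp(c_prev, n_prev, i, eps):
    """Fold one more occurrence of unit r_i into the pseudo-composition.

    Follows from the cell definition w_kj = (1 - eps_k) * c_kj / n_j:
    c_q = [(n_q - 1) * c_{q-1} + (1 - eps_i) * e_i] / n_q, with n_q = n_{q-1} + 1.
    """
    n_q = n_prev + 1
    e_i = np.zeros_like(c_prev)
    e_i[i] = 1.0
    return ((n_q - 1) * c_prev + (1.0 - eps[i]) * e_i) / n_q, n_q
```

Because the update is a running average, each word costs O(M) work instead of rebuilding the whole pseudo-document column.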
• An integrated n-gram + LSM formulation for the overall language model probability:

  Pr(r_q | H_{q−1}^{(n+l)}) = Pr(r_q | H_{q−1}^{(n)}, H_{q−1}^{(l)})
  = Pr(r_q | r_{q−1} … r_{q−n+1}) Pr(c̃_{q−1} | r_q) / Σ_{r_i ∈ M} Pr(r_i | r_{q−1} … r_{q−n+1}) Pr(c̃_{q−1} | r_i)

– different syntactic constructs can be used to carry the same meaning (content words)
Derivation:

  Pr(r_q | H_{q−1}^{(n+l)}) = Pr(r_q | H_{q−1}^{(n)}, H_{q−1}^{(l)})
  = Pr(r_q, H_{q−1}^{(l)} | H_{q−1}^{(n)}) / Σ_{r_i ∈ M} Pr(r_i, H_{q−1}^{(l)} | H_{q−1}^{(n)})
  = Pr(r_q | H_{q−1}^{(n)}) Pr(H_{q−1}^{(l)} | r_q, H_{q−1}^{(n)}) / Σ_{r_i ∈ M} Pr(r_i | H_{q−1}^{(n)}) Pr(H_{q−1}^{(l)} | r_i, H_{q−1}^{(n)})
  = Pr(r_q | r_{q−1} … r_{q−n+1}) Pr(c̃_{q−1} | r_q, r_{q−1} … r_{q−n+1}) / Σ_{r_i ∈ M} Pr(r_i | r_{q−1} … r_{q−n+1}) Pr(c̃_{q−1} | r_i, r_{q−1} … r_{q−n+1})
  = Pr(r_q | r_{q−1} … r_{q−n+1}) Pr(c̃_{q−1} | r_q) / Σ_{r_i ∈ M} Pr(r_i | r_{q−1} … r_{q−n+1}) Pr(c̃_{q−1} | r_i)

• The last step assumes that the probability of the document history given the current word is not affected by the immediate context preceding it
• By Bayes' rule, Pr(c̃_{q−1} | r_q) = Pr(r_q | c̃_{q−1}) Pr(c̃_{q−1}) / Pr(r_q); the Pr(c̃_{q−1}) factors cancel between numerator and denominator, giving

  Pr(r_q | H_{q−1}^{(n+l)}) = [ Pr(r_q | r_{q−1} … r_{q−n+1}) Pr(r_q | c̃_{q−1}) / Pr(r_q) ] / Σ_{r_i ∈ M} [ Pr(r_i | r_{q−1} … r_{q−n+1}) Pr(r_i | c̃_{q−1}) / Pr(r_i) ]
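The final formula above can be sketched directly, taking the three probability distributions over the inventory M as given arrays (the function name and the toy numbers are ours):

```python
import numpy as np

def integrated_prob(p_ngram, p_lsm, p_unigram):
    """Pr(r_q | H) proportional to Pr(r_q | n-gram history) * Pr(r_q | c_{q-1}) / Pr(r_q),
    renormalized over the whole inventory M."""
    score = (np.asarray(p_ngram, dtype=float) * np.asarray(p_lsm, dtype=float)
             / np.asarray(p_unigram, dtype=float))
    return score / score.sum()
```

When the LSM component carries no information beyond the unigram prior, the combination collapses back to the plain n-gram, which is a useful sanity check on the formula.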
Junk E-mail Filtering
• Junk e-mail filtering can be viewed as a degenerate case of semantic classification with two categories:
– legitimate
– junk
• M: an inventory of words and symbols
• N: a binary collection of e-mail messages
• Two semantic anchors, one per category
Pronunciation Modeling
• Also called grapheme-to-phoneme conversion (GPC)
• Orthographic anchors: one for each in-vocabulary word
• Orthographic neighborhood: the in-vocabulary words with high closeness to an out-of-vocabulary word
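A toy sketch of orthographic neighborhoods, with single letters as units and in-vocabulary words as compositions (the vocabulary, counts, and function name are invented for illustration):

```python
import numpy as np

def orthographic_neighbors(oov_counts, U, s, V_rows, vocab, k=2):
    """Fold an out-of-vocabulary word in via its letter-unit counts, then rank
    the in-vocabulary words (orthographic anchors) by cosine closeness of v S."""
    v = (np.asarray(oov_counts, dtype=float) @ U) / s   # pseudo-composition vector
    VS, vS = V_rows * s, v * s                          # compare rows v_j S against v S
    sims = VS @ vS / (np.linalg.norm(VS, axis=1) * np.linalg.norm(vS))
    return [vocab[j] for j in np.argsort(-sims)[:k]]
```

The neighbors' known pronunciations would then be combined to hypothesize a pronunciation for the out-of-vocabulary word.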
Conclusions
• Descriptive power: forgoing local constraints is not acceptable in some situations
• Domain sensitivity: results depend on the quality of the training data; polysemy remains problematic
• Updating the LSM space: re-computing the SVD on the fly is not practical
• The success of LSM stems from its three salient characteristics