Date posted: 15-Dec-2015
Multi-Relational Latent Semantic Analysis.
Kai-Wei Chang
Joint work with Scott Wen-tau Yih, Chris Meek
Microsoft Research
Natural Language Understanding
Build an intelligent system that can interact with humans using natural language
Research challenge
  Meaning representation of text
  Support useful inferential tasks
Semantic word representation is the foundation
  Language is compositional
  Word is the basic semantic unit
Continuous Semantic Representations
A lot of popular methods for creating word vectors!
Vector Space Model [Salton & McGill 83]
Latent Semantic Analysis [Deerwester+ 90]
Latent Dirichlet Allocation [Blei+ 01]
Deep Neural Networks [Collobert & Weston 08]
Encode term co-occurrence information
Measure semantic similarity well
Continuous Semantic Representations
(Figure: words plotted in the vector space, with related words grouped together: sunny, rainy, windy, cloudy; car, wheel, cab; sad, joy, emotion, feeling.)
Semantics Needs More Than Similarity
Tomorrow will be rainy.
Tomorrow will be sunny.
How are rainy and sunny related: similar, or opposite?
Leverage Linguistic Resources
Can’t we just use the existing linguistic resources?
  Knowledge in these resources is never complete
  Often lack the degree of relations
Create a continuous semantic representation that
  Leverages existing rich linguistic resources
  Discovers new relations
  Enables us to measure the degree of multiple relations (not just similarity)
Roadmap
Introduction
Background:
  Latent Semantic Analysis (LSA)
  Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
  Encoding multi-relational data in a tensor
  Tensor decomposition & measuring degree of a relation
Experiments
Conclusions
Latent Semantic Analysis [Deerwester+ 1990]
Data representation
  Encode single-relational data in a matrix
    Co-occurrence (e.g., from a general corpus)
    Synonyms (e.g., from a thesaurus)
Factorization
  Apply SVD to the matrix to find latent components
Measuring degree of relation
  Cosine of latent vectors (cosine score)
Encode Synonyms in Matrix
Input: Synonyms from a thesaurus
  Joyfulness: joy, gladden
  Sad: sorrow, sadden

                         joy  gladden  sorrow  sadden  goodwill
Group 1: “joyfulness”     1      1       0       0        0
Group 2: “sad”            0      0       1       1        0
Group 3: “affection”      0      0       0       0        1

Target word: row-vector
Term: column-vector
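Mapping to Latent Space via SVD
$\mathbf{W} \approx \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T$, where $\mathbf{W}$ is $d \times n$ (one column per term), $\mathbf{U}$ is $d \times k$, $\boldsymbol{\Sigma}$ is $k \times k$, and $\mathbf{V}^T$ is $k \times n$
SVD generalizes the original data
  Uncovers relationships not explicit in the thesaurus
  Term vectors projected to $k$-dim latent space
Word similarity: cosine of two column vectors in $\boldsymbol{\Sigma} \mathbf{V}^T$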
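The LSA steps above (encode a matrix, truncate its SVD, take cosines of the latent term vectors) can be sketched with NumPy on the toy thesaurus matrix. This is a minimal illustration, not the authors' implementation: the function names and the choice k=2 are assumptions made for the example.

```python
import numpy as np

# Toy matrix from the slide: rows are target-word groups, columns are terms.
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
W = np.array([
    [1, 1, 0, 0, 0],   # Group 1: "joyfulness"
    [0, 0, 1, 1, 0],   # Group 2: "sad"
    [0, 0, 0, 0, 1],   # Group 3: "affection"
], dtype=float)

def latent_term_vectors(matrix, k):
    """Return the k x n matrix Sigma V^T of a rank-k truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return np.diag(s[:k]) @ vt[:k, :]

def cosine(a, b):
    """Cosine of two vectors; 0.0 if either one is (numerically) zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0

def lsa_similarity(latent, word_a, word_b):
    """Cosine of the latent column vectors of two terms."""
    i, j = terms.index(word_a), terms.index(word_b)
    return cosine(latent[:, i], latent[:, j])

latent = latent_term_vectors(W, k=2)   # k=2 is an arbitrary illustrative choice
print(lsa_similarity(latent, "joy", "gladden"))   # synonyms: close to 1
print(lsa_similarity(latent, "joy", "sorrow"))    # unrelated here: close to 0
```

With only synonym information in the matrix, "joy" and "sorrow" come out as merely unrelated; nothing in this encoding can mark them as opposites.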
Problem: Handling Two Opposite Relations
Synonyms & Antonyms
LSA cannot distinguish antonyms [Landauer 2002]
“Distinguishing synonyms and antonyms is still perceived as a difficult open problem.” [Poon & Domingos 09]
Polarity Inducing LSA [Yih, Zweig, Platt 2012]
Data representation
  Encode two opposite relations in a matrix using “polarity”
    Synonyms & antonyms (e.g., from a thesaurus)
Factorization
  Apply SVD to the matrix to find latent components
Measuring degree of relation
  Cosine of latent vectors
Encode Synonyms & Antonyms in Matrix
  Joyfulness: joy, gladden; sorrow, sadden
  Sad: sorrow, sadden; joy, gladden

Inducing polarity: synonyms are encoded as 1, antonyms as -1

                         joy  gladden  sorrow  sadden  goodwill
Group 1: “joyfulness”     1      1      -1      -1        0
Group 2: “sad”           -1     -1       1       1        0
Group 3: “affection”      0      0       0       0        1

Target word: row-vector
Cosine score: positive for synonyms, negative for antonyms
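The polarity encoding can be sketched as follows: build the signed matrix from thesaurus entries, apply the same truncated SVD as in LSA, and read the sign of the cosine. This is a hedged sketch rather than the PILSA implementation; `build_polarity_matrix`, `pilsa_score`, and k=2 are names and choices assumed for illustration.

```python
import numpy as np

# Thesaurus entries from the slide: target word -> (synonyms, antonyms).
thesaurus = {
    "joyfulness": (["joy", "gladden"], ["sorrow", "sadden"]),
    "sad":        (["sorrow", "sadden"], ["joy", "gladden"]),
    "affection":  (["goodwill"], []),
}
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]

def build_polarity_matrix(entries, vocabulary):
    """Rows are target words; synonyms get +1 and antonyms get -1."""
    col = {term: j for j, term in enumerate(vocabulary)}
    matrix = np.zeros((len(entries), len(vocabulary)))
    for i, (synonyms, antonyms) in enumerate(entries.values()):
        for term in synonyms:
            matrix[i, col[term]] = 1.0
        for term in antonyms:
            matrix[i, col[term]] = -1.0
    return matrix

def cosine(a, b):
    """Cosine of two vectors; 0.0 if either one is (numerically) zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0

M = build_polarity_matrix(thesaurus, terms)
_, s, vt = np.linalg.svd(M, full_matrices=False)
k = 2                                    # illustrative latent dimension
latent = np.diag(s[:k]) @ vt[:k, :]      # columns are latent term vectors

def pilsa_score(word_a, word_b):
    """Signed cosine: positive suggests synonymy, negative antonymy."""
    i, j = terms.index(word_a), terms.index(word_b)
    return cosine(latent[:, i], latent[:, j])

print(pilsa_score("joy", "gladden"))   # close to +1
print(pilsa_score("joy", "sadden"))    # close to -1
```

The sign of an individual singular vector is arbitrary, but the cosine between two columns is unaffected by it, so the scores are stable.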
Problem: How to Handle More Relations?
Limitation of the matrix representation
  Each entry captures a particular type of relation between two entities, or
  Two opposite relations with the polarity trick
Encoding other binary relations
  Is-A (hyponym) – ostrich is a bird
  Part-whole – engine is a part of car
Encode multiple relations in a 3-way tensor (3-dim array)!
Multi-Relational LSA
Data representation
  Encode multiple relations in a tensor
    Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
Factorization
  Apply tensor decomposition to the tensor to find latent components
Measuring degree of relation
  Cosine of latent vectors after projection
Encode Multiple Relations in Tensor
Represent word relations using a tensor
  Each slice encodes a relation between terms and target words.
Construct a tensor with two slices

Synonym layer
              joy  gladden  sadden  feeling
joyfulness     1      1       0        0
gladden        1      1       0        0
sad            0      0       1        0
anger          0      0       0        0

Antonym layer
              joy  gladden  sadden  feeling
joyfulness     0      0       0        0
gladden        0      0       1        0
sad            1      0       0        0
anger          0      0       0        0
Encode Multiple Relations in Tensor
Can encode multiple relations in the tensor

Hyponym layer (a third slice, stacked with the synonym and antonym layers)
              joy  gladden  sadden  feeling
joyfulness     0      0       0        1
gladden        0      0       0        0
sad            0      0       0        1
anger          0      0       0        1
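One way to picture the encoding is to fill a 3-way NumPy array from (target word, term) pairs, one slice per relation. The pairs below are read off the slide's three layers; the function name and the relation keys are assumptions made for this sketch.

```python
import numpy as np

targets = ["joyfulness", "gladden", "sad", "anger"]    # rows
terms = ["joy", "gladden", "sadden", "feeling"]        # columns
relations = ["synonym", "antonym", "hyponym"]          # slices (layer names as on the slide)

# (target word, term) pairs that hold for each relation, as in the toy layers.
pairs = {
    "synonym": [("joyfulness", "joy"), ("joyfulness", "gladden"),
                ("gladden", "joy"), ("gladden", "gladden"),
                ("sad", "sadden")],
    "antonym": [("gladden", "sadden"), ("sad", "joy")],
    "hyponym": [("joyfulness", "feeling"), ("sad", "feeling"),
                ("anger", "feeling")],
}

def build_tensor(pairs_by_relation):
    """Return a d x n x m array with 1.0 wherever a relation holds."""
    tensor = np.zeros((len(targets), len(terms), len(relations)))
    for k, relation in enumerate(relations):
        for target, term in pairs_by_relation[relation]:
            tensor[targets.index(target), terms.index(term), k] = 1.0
    return tensor

W = build_tensor(pairs)
print(W.shape)                                # (4, 4, 3)
print(W[:, :, relations.index("antonym")])    # the antonym layer
```

Adding a new relation only adds a slice; the rows and columns (target words and terms) are shared across all slices.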
Tensor Decomposition – Analogy to SVD
Derive a low-rank approximation to generalize the data and to discover unseen relations
Apply Tucker decomposition and reformulate the results
$\mathcal{W}_{:,:,k} \approx \mathbf{U}\,\mathbf{R}_k\,\mathbf{V}^T$ for $k = 1, \dots, m$
  $\mathcal{W}$: $d \times n \times m$ tensor (target words $w_1, \dots, w_d$; terms $t_1, \dots, t_n$; $m$ relations)
  $\mathbf{U}$: $d \times r$
  $\mathbf{V}^T$ ($r \times n$): latent representation of words
  $\mathbf{R}_1, \dots, \mathbf{R}_m$ (each $r \times r$): latent representation of a relation
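To make the analogy to SVD concrete, here is a minimal sketch that factors each slice as U R_k V^T. It uses a simple truncated higher-order SVD (left singular vectors of the mode-1 and mode-2 unfoldings), which is an assumption of this example: the slides do not say which Tucker algorithm was used, and the function names are hypothetical.

```python
import numpy as np

# Toy tensor: synonym, antonym, and hyponym layers stacked along axis 2.
syn = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]], dtype=float)
ant = np.array([[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
hyp = np.array([[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=float)
W = np.stack([syn, ant, hyp], axis=2)          # shape (d, n, m)

def decompose(tensor, r):
    """Return U (d x r), R (r x r x m), V (n x r) with W[:, :, k] ~ U R[:, :, k] V^T."""
    d, n, m = tensor.shape
    mode1 = tensor.reshape(d, n * m)                       # rows: target words
    mode2 = np.moveaxis(tensor, 1, 0).reshape(n, d * m)    # rows: terms
    U = np.linalg.svd(mode1, full_matrices=False)[0][:, :r]
    V = np.linalg.svd(mode2, full_matrices=False)[0][:, :r]
    # R[:, :, k] = U^T W[:, :, k] V for every relation k.
    R = np.einsum("di,dnk,nj->ijk", U, tensor, V)
    return U, R, V

def reconstruct(U, R, V):
    """Rebuild the low-rank tensor whose k-th slice is U R[:, :, k] V^T."""
    return np.einsum("di,ijk,nj->dnk", U, R, V)

U, R, V = decompose(W, r=2)                    # r=2 is an illustrative rank
approx = reconstruct(U, R, V)
print(U.shape, R.shape, V.shape)               # (4, 2) (2, 2, 3) (4, 2)
print(np.round(approx[:, :, 1], 2))            # smoothed antonym layer
```

At full rank (r=4 on this toy tensor) the reconstruction is exact; a smaller r is what forces the model to generalize across slices.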
Measure Degree of Relation
Similarity
  Cosine of the latent vectors
Other relation (both symmetric and asymmetric)
  Take the latent matrix of the pivot relation (synonym)
  Take the latent matrix of the relation
  Cosine of the latent vectors after projection
Measure Degree of Relation: Raw Representation
$ant(\text{joy}, \text{sadden}) = \cos\left(\mathcal{W}_{:,\text{joy},syn},\ \mathcal{W}_{:,\text{sadden},ant}\right)$

Synonym layer
              joy  gladden  sadden  feeling
joyfulness     1      1       0        0
gladden        1      1       0        0
sad            0      0       1        0
anger          0      0       0        0

Antonym layer
              joy  gladden  sadden  feeling
joyfulness     0      0       0        0
gladden        0      0       1        0
sad            1      0       0        0
anger          0      0       0        0
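As a worked example of the raw-representation score, the sketch below compares the "joy" column of the synonym layer with the "sadden" column of the antonym layer. The helper names are assumptions for illustration; the matrices are the toy layers from the slide.

```python
import numpy as np

terms = ["joy", "gladden", "sadden", "feeling"]        # columns
# Rows are the target words joyfulness, gladden, sad, anger.
syn = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]], dtype=float)
ant = np.array([[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]], dtype=float)

def cosine(a, b):
    """Cosine of two vectors; 0.0 if either one is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def raw_relation_score(pivot_slice, relation_slice, word_i, word_j):
    """cos(W[:, w_i, syn], W[:, w_j, rel]) on the raw, undecomposed slices."""
    i, j = terms.index(word_i), terms.index(word_j)
    return cosine(pivot_slice[:, i], relation_slice[:, j])

# joy (synonym layer) = (1, 1, 0, 0); sadden (antonym layer) = (0, 1, 0, 0)
print(raw_relation_score(syn, ant, "joy", "sadden"))    # 1/sqrt(2), about 0.707
print(raw_relation_score(syn, ant, "joy", "gladden"))   # 0.0: gladden's antonym column is empty
```

The score is non-zero only when the two words share a target word across the two slices, which is why the raw representation cannot say anything about pairs the resource never lists together.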
Estimate the Degree of a Relation: Raw Representation
$Hyper(\text{joy}, \text{feeling}) = \cos\left(\mathcal{W}_{:,\text{joy},syn},\ \mathcal{W}_{:,\text{feeling},hyper}\right)$

Synonym layer
              joy  gladden  sadden  feeling
joyfulness     1      1       0        0
gladden        1      1       0        0
sad            0      0       1        0
anger          0      0       0        0

Hypernym layer
              joy  gladden  sadden  feeling
joyfulness     0      0       0        1
gladden        0      0       0        0
sad            0      0       0        1
anger          0      0       0        1
Measure Degree of Relation: Raw Representation
$rel(w_i, w_j) = \cos\left(\mathcal{W}_{:,w_i,syn},\ \mathcal{W}_{:,w_j,rel}\right)$
(Figure: column $w_i$ of the synonym layer is compared with column $w_j$ of the slice of the specific relation.)
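Measure Degree of Relation: Latent Representation
$rel(w_i, w_j) = \cos\left(\mathbf{S}_{:,:,syn}\,\mathbf{V}_{i,:}^T,\ \mathbf{S}_{:,:,rel}\,\mathbf{V}_{j,:}^T\right)$
(Figure: the decomposed tensor, with latent term vectors $v_1, \dots, v_n$ and $r \times r$ relation matrices $\mathbf{S}_1, \dots, \mathbf{S}_m$; $w_i$ and $w_j$ map to $v_i$ and $v_j$, which are projected by the synonym matrix and the matrix of the specific relation.)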
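The latent-representation score can be sketched end to end: decompose the toy tensor, project each term's latent vector with the relation matrix of its slice, and take the cosine. The decomposition used here is a simple truncated higher-order SVD, chosen as an assumption for the example rather than taken from the slides, and all function names are hypothetical.

```python
import numpy as np

terms = ["joy", "gladden", "sadden", "feeling"]
relations = ["syn", "ant"]
syn = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]], dtype=float)
ant = np.array([[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
W = np.stack([syn, ant], axis=2)               # shape (d, n, m)

def decompose(tensor, r):
    """Return S (r x r x m) and V (n x r) with W[:, :, k] ~ U S[:, :, k] V^T."""
    d, n, m = tensor.shape
    U = np.linalg.svd(tensor.reshape(d, n * m), full_matrices=False)[0][:, :r]
    mode2 = np.moveaxis(tensor, 1, 0).reshape(n, d * m)
    V = np.linalg.svd(mode2, full_matrices=False)[0][:, :r]
    S = np.einsum("di,dnk,nj->ijk", U, tensor, V)
    return S, V

def cosine(a, b):
    """Cosine of two vectors; 0.0 if either one is (numerically) zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0

def latent_relation_score(S, V, relation, word_i, word_j, pivot="syn"):
    """cos(S[:, :, pivot] V[i, :]^T, S[:, :, relation] V[j, :]^T)."""
    i, j = terms.index(word_i), terms.index(word_j)
    left = S[:, :, relations.index(pivot)] @ V[i, :]
    right = S[:, :, relations.index(relation)] @ V[j, :]
    return cosine(left, right)

S_full, V_full = decompose(W, r=4)             # full rank: no generalization
print(latent_relation_score(S_full, V_full, "ant", "joy", "sadden"))

S_low, V_low = decompose(W, r=2)               # lower rank: smoothed scores
print(latent_relation_score(S_low, V_low, "ant", "joy", "sadden"))
```

At full rank the latent score coincides with the raw-representation cosine, because the factor matrices are then orthogonal and preserve angles; lowering r is what lets the score change for pairs the resource does not list.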
Experiment: Data for Building MRLSA Model
Encarta Thesaurus
  Records synonyms and antonyms of target words
  Vocabulary of 50k terms and 47k target words
WordNet
  Has synonym, antonym, hyponym, hypernym relations
  Vocabulary of 149k terms and 117k target words
Goals:
  MRLSA generalizes LSA to model multiple relations
  Improve performance by combining heterogeneous data
Example Antonyms Output by MRLSA
Target      High Score Words
inanimate   alive, living, bodily, in-the-flesh, incarnate
alleviate   exacerbate, make-worse, in-flame, amplify, stir-up
relish      detest, abhor, abominate, despise, loathe
* Words in blue are antonyms listed in the Encarta thesaurus.
Results – GRE Antonym Test
Task: GRE closest-opposite questions
  Which is the closest opposite of adulterate?
  (a) renounce (b) forbid (c) purify (d) criticize (e) correct

Accuracy (bar chart):
Method               Accuracy
Mohammad et al. 08   0.64
Lookup               0.56
PILSA                0.74
MRLSA                0.77
Example Hyponyms Output by MRLSA
Target       High Score Words
bird         ostrich, gamecock, nighthawk, amazon, parrot
automobile   minivan, wagon, taxi, minicab, gypsy cab
vegetable    buttercrunch, yellow turnip, romaine, chipotle, chilli
Results – Relational Similarity (SemEval-2012)
Task: Class-Inclusion Relation (is-a kind of)
  Most/least illustrative word pairs
  (a) art:abstract (b) song:opera (c) footwear:boot (d) hair:brown

Accuracy (bar chart):
Method   Accuracy
UTD      0.34
Lookup   0.37
MRLSA    0.56
Problem: Use Relational Domain Knowledge
Relational domain knowledge – the entity type
  Relation can only hold between the right types of entities
    Words having is-a relation have the same part-of-speech
    For relation born-in, the entity types are: (person, location)
Leverage type information to improve MRLSA
  Idea #3: Change the objective function
Conclusions
Continuous semantic representation that
  Leverages existing rich linguistic resources
  Discovers new relations
  Enables us to measure the degree of multiple relations
Approaches
  Better data representation
  Matrix/Tensor decomposition
Challenges & Future Work
  Capture more types of knowledge in the model
  Support more sophisticated inferential tasks