Typed Tensor Decomposition of Knowledge Bases for Relation Extraction
Kai-Wei Chang, Scott Wen-tau Yih, Bishan Yang & Chris Meek, Microsoft Research
• Useful resources for NLP applications
  • Semantic Parsing & Question Answering [e.g., Berant+, 2014]
  • Information Extraction [Riedel+, 2013]

Knowledge Base

• Examples: Freebase, DBpedia, YAGO, NELL, OpenIE/ReVerb
• Captures world knowledge by storing properties of millions of entities, as well as relations among them
• A knowledge base is never complete!
  • Extract previously unknown facts from new corpora
  • Predict new facts via inference
• Modeling multi-relational data
  • Statistical relational learning [Getoor & Taskar, 2007]
  • Path ranking methods (e.g., random walk) [e.g., Lao+ 2011]
  • Knowledge base embedding
    • Very efficient
    • Better prediction accuracy

Reasoning with Knowledge Base
• Each entity e_i in a KB is represented by a vector a_i
• Predict whether a triple (e_i, r_k, e_j) is true by a scoring function
  • Linear (over the concatenated entity vectors) or Bilinear: a_iᵀ R_k a_j

Knowledge Base Embedding
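As a concrete illustration of the bilinear score, here is a minimal numpy sketch with made-up random embeddings; the entity and relation names are purely illustrative, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
r = 10  # latent dimension (illustrative)

# Hypothetical embeddings: one vector per entity, one matrix per relation.
a_obama = rng.normal(size=r)         # entity vector for "Obama"
a_hawaii = rng.normal(size=r)        # entity vector for "Hawaii"
R_born_in = rng.normal(size=(r, r))  # relation matrix for "born-in"

# Bilinear score: f(e_i, r_k, e_j) = a_i^T R_k a_j
score = a_obama @ R_born_in @ a_hawaii
print(float(score))
```

A higher score indicates the model considers the triple more likely to be true.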
• Example – type constraint: born-in(x, y) can be true only if x is a person and y is a location
• Example – common sense: spouse-of(x, y) can be true only if x ≠ y

Relational Domain Knowledge
• KB embedding via Tensor Decomposition
  • Entity vector, Relation matrix
• Relational domain knowledge
  • Type information and constraints
  • Only legitimate entities are included in the loss
• Benefits of leveraging type information
  • Faster model training time
  • Highly scalable to large KBs
  • Higher prediction accuracy
• Application to Relation Extraction

Typed Tensor Decomposition – TRESCAL
• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
• Discussion & Conclusions

Road Map
• Collection of subj-pred-obj triples

Knowledge Base Representation (1/2)

Subject         Predicate     Object
Obama           Born-in       Hawaii
Bill Gates      Nationality   USA
Bill Clinton    Spouse-of     Hillary Clinton
Satya Nadella   Work-at       Microsoft
…               …             …
Knowledge Base Representation (2/2)

• n: # entities, m: # relations
• The KB is encoded as an n × n × m binary tensor 𝒳 over entities e_1, …, e_n
• The k-th slice 𝒳_k holds the k-th relation
  • Example – r_k: born-in ⇒ the (Obama, Hawaii) entry of 𝒳_k is 1
• A zero entry means either:
  • Incorrect (false)
  • Unknown
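The triple-to-tensor encoding above can be sketched in a few lines of numpy. The toy entities, relations, and triples below are illustrative; a KB of realistic size would require sparse storage rather than a dense array:

```python
import numpy as np

# Toy KB: entity and relation vocabularies (names are illustrative).
entities = ["Obama", "Hawaii", "Bill Gates", "USA"]
relations = ["born-in", "nationality"]
e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: k for k, r in enumerate(relations)}

triples = [("Obama", "born-in", "Hawaii"),
           ("Bill Gates", "nationality", "USA")]

# Dense n x n x m tensor for illustration only.
n, m = len(entities), len(relations)
X = np.zeros((n, n, m))
for subj, pred, obj in triples:
    X[e_idx[subj], e_idx[obj], r_idx[pred]] = 1.0  # observed fact

print(X[e_idx["Obama"], e_idx["Hawaii"], r_idx["born-in"]])  # 1.0
```

Every entry not set by a triple stays 0, which, as noted above, conflates "false" with "unknown".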
Tensor Decomposition Objective

• Objective: approximate each slice as 𝒳_k ≈ A R_k Aᵀ

    min_{A, {R_k}}  (1/2) Σ_k ‖𝒳_k − A R_k Aᵀ‖²_F + (λ/2) (‖A‖²_F + Σ_k ‖R_k‖²_F)

  (first term: reconstruction error; second term: regularization)

RESCAL [Nickel+, ICML-11]
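The objective above can be evaluated directly in numpy. This is a sketch on random toy data, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, lam = 5, 2, 3, 0.1  # entities, relations, latent dim, regularizer

X = (rng.random((m, n, n)) < 0.2).astype(float)  # toy binary slices
A = rng.normal(size=(n, r))                      # entity embeddings
R = rng.normal(size=(m, r, r))                   # one matrix per relation

def rescal_objective(X, A, R, lam):
    # (1/2) sum_k ||X_k - A R_k A^T||_F^2 + (lam/2)(||A||_F^2 + sum_k ||R_k||_F^2)
    recon = sum(np.linalg.norm(X[k] - A @ R[k] @ A.T, "fro") ** 2
                for k in range(len(R)))
    reg = np.linalg.norm(A, "fro") ** 2 + sum(
        np.linalg.norm(R[k], "fro") ** 2 for k in range(len(R)))
    return 0.5 * recon + 0.5 * lam * reg

print(rescal_objective(X, A, R, lam))
```

If the factors reconstruct every slice exactly, only the regularization term remains.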
Measure the Degree of a Relationship

• The degree of the k-th relation between two entities is the corresponding entry of A R_k Aᵀ
  • Example: born-in(Obama, Hawaii) is scored by a_Obamaᵀ R_born-in a_Hawaii
• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
  • Basic idea
  • Training procedure
  • Complexity analysis
• Experiments
• Discussion & Conclusions

Road Map
Typed Tensor Decomposition Objective

• Reconstruction error (RESCAL): (1/2) Σ_k ‖𝒳_k − A R_k Aᵀ‖²_F, with 𝒳_k ≈ A R_k Aᵀ
• Example – for the relation born-in, only entries whose subject is typed "people" and whose object is typed "locations" can be non-zero
• Reconstruction error (TRESCAL): (1/2) Σ_k ‖𝒳′_k − A_kl R_k A_krᵀ‖²_F, with 𝒳′_k ≈ A_kl R_k A_krᵀ
  • A_kl and A_kr keep only the entities satisfying the left and right type constraints of relation k, and 𝒳′_k is the corresponding sub-slice
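A minimal sketch of the typed reconstruction error; which entities are "people" vs. "locations" is made up here, as are all the numeric values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 3
A = rng.normal(size=(n, r))    # all entity embeddings
R_k = rng.normal(size=(r, r))  # relation matrix for relation k

# Hypothetical type constraint for relation k (e.g., born-in):
left_ids = np.array([0, 1, 2])  # entities typed "person"
right_ids = np.array([3, 4])    # entities typed "location"

A_kl, A_kr = A[left_ids], A[right_ids]                # typed embedding blocks
X_k_prime = (rng.random((3, 2)) < 0.5).astype(float)  # typed sub-slice

# Typed reconstruction error: (1/2) ||X'_k - A_kl R_k A_kr^T||_F^2
err = 0.5 * np.linalg.norm(X_k_prime - A_kl @ R_k @ A_kr.T, "fro") ** 2
print(err)
```

The loss only ever touches the 3 × 2 type-compatible block rather than the full 6 × 6 slice, which is where the training-time savings come from.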
Training Procedure – Alternating Least-Squares (ALS) Method

• Fix {R_k}, update A:

    A ← [Σ_k 𝒳_k A R_kᵀ + 𝒳_kᵀ A R_k] [Σ_k B_k + C_k + λI]⁻¹

  where B_k = R_k Aᵀ A R_kᵀ and C_k = R_kᵀ Aᵀ A R_k.

• Fix A, update R_k:

    vec(R_k) ← (Zᵀ Z + λI)⁻¹ Zᵀ vec(𝒳_k)

  where Z = A ⊗ A, vec(·) is vectorization, and ⊗ is the Kronecker product.
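The alternating updates can be sketched as a small numpy loop on toy data. This is an illustrative sketch of the closed-form updates, not the paper's implementation (which exploits sparsity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, lam = 6, 2, 3, 0.1  # entities, relations, latent dim, regularizer
X = [(rng.random((n, n)) < 0.3).astype(float) for _ in range(m)]
A = rng.normal(size=(n, r))
R = [rng.normal(size=(r, r)) for _ in range(m)]

def recon_err(X, A, R):
    return sum(np.linalg.norm(X[k] - A @ R[k] @ A.T, "fro") ** 2
               for k in range(len(X)))

err0 = recon_err(X, A, R)
for _ in range(20):
    # Fix {R_k}, update A.
    num = sum(X[k] @ A @ R[k].T + X[k].T @ A @ R[k] for k in range(m))
    den = sum(R[k] @ A.T @ A @ R[k].T + R[k].T @ A.T @ A @ R[k]
              for k in range(m)) + lam * np.eye(r)
    A = num @ np.linalg.inv(den)
    # Fix A, update each R_k: vec(R_k) = (Z^T Z + lam*I)^-1 Z^T vec(X_k), Z = A (x) A.
    G = np.kron(A.T @ A, A.T @ A) + lam * np.eye(r * r)
    for k in range(m):
        rhs = (A.T @ X[k] @ A).flatten(order="F")
        R[k] = np.linalg.solve(G, rhs).reshape(r, r, order="F")

print(err0, recon_err(X, A, R))
```

Note that Zᵀ Z = AᵀA ⊗ AᵀA and Zᵀ vec(𝒳_k) = vec(Aᵀ 𝒳_k A), so the r² × r² system never materializes Z itself.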
Training Procedure – ALS Method for TRESCAL

• Fix {R_k}, update A:

    A ← [Σ_k 𝒳′_k A_kr R_kᵀ + 𝒳′_kᵀ A_kl R_k] [Σ_k B_kr + C_kl + λI]⁻¹

  where B_kr = R_k A_krᵀ A_kr R_kᵀ and C_kl = R_kᵀ A_klᵀ A_kl R_k.

• Fix A, update R_k:

    vec(R_k) ← (A_krᵀ A_kr ⊗ A_klᵀ A_kl + λI)⁻¹ vec(A_klᵀ 𝒳′_k A_kr)
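The typed R_k update is an ordinary regularized least-squares solve. The sketch below checks the closed form on random data (column-major vec convention, so vec(A_kl R A_krᵀ) = (A_kr ⊗ A_kl) vec(R)); all sizes and values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
nl, nr, r, lam = 4, 3, 2, 0.1
A_kl = rng.normal(size=(nl, r))  # embeddings of left-typed entities
A_kr = rng.normal(size=(nr, r))  # embeddings of right-typed entities
X_k = rng.normal(size=(nl, nr))  # typed sub-slice X'_k

# Closed-form update:
# vec(R_k) = (A_kr^T A_kr (x) A_kl^T A_kl + lam*I)^-1 vec(A_kl^T X'_k A_kr)
G = np.kron(A_kr.T @ A_kr, A_kl.T @ A_kl) + lam * np.eye(r * r)
rhs = (A_kl.T @ X_k @ A_kr).flatten(order="F")
R_k = np.linalg.solve(G, rhs).reshape(r, r, order="F")

# Sanity check: R_k minimizes ||X'_k - A_kl R A_kr^T||_F^2 + lam*||R||_F^2
loss = lambda R: (np.linalg.norm(X_k - A_kl @ R @ A_kr.T, "fro") ** 2
                  + lam * np.linalg.norm(R, "fro") ** 2)
print(loss(R_k))
```

Because the regularized objective is strictly convex in R_k, the solve above yields its unique global minimizer.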
Complexity Analysis

• Without type information (RESCAL), the per-iteration cost is governed by:
  • n: # entities
  • p: # non-zero entries
  • r: # dimensions of projected entity vectors
• With type information (TRESCAL), n is effectively replaced by:
  • ñ: average # entities satisfying the type constraint
• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
  • KB Completion
  • Application to Relation Extraction
• Discussion & Conclusions

Road Map
Experiments – KB Completion

• KB – Never-Ending Language Learning (NELL)
  • Training: version 165
  • Development: new facts between v.166 and v.533
  • Testing: new facts between v.534 and v.745
• Data statistics of the training set:

  # Entities                  753k
  # Relation Types            229
  # Entity Types              300
  # Entity-Relation Triples   1.8M
Tasks & Baselines

• Entity Retrieval:
  • One positive entity ranked against 100 negative entities
• Relation Retrieval:
  • Positive entity pairs with an equal number of negative pairs
• Baselines: RESCAL [Nickel+, ICML-11] and TransE [Bordes+, NIPS-13]
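Since both retrieval tasks are scored with mean average precision, a minimal sketch of average precision over one ranked candidate list may help; the ranking below is made up:

```python
def average_precision(ranked_labels):
    """ranked_labels: 1 for a positive item, 0 for a negative, best-first."""
    hits, precisions = 0, []
    for i, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / i)  # precision at each positive's rank
    return sum(precisions) / max(hits, 1)

# Toy entity-retrieval query: the single positive entity lands at rank 3.
ranked = [0, 0, 1, 0, 0]
print(average_precision(ranked))  # 1/3 = 0.333...
```

MAP is then the mean of this quantity over all test queries (entity or relation retrieval instances).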
Training Time Reduction

• Both models finish training in 10 iterations
• TRESCAL filters 96% of entity triples with incompatible types
• [Chart: model training time – TRESCAL: 4.46 hours vs. RESCAL: 20.5 hours, a 4.6× speed-up]
Training Time Reduction

• # iterations for TransE is set to 500 (the default value)
• [Chart: model training time – TRESCAL: 4.46 hours vs. TransE: 96 hours, a 21.5× speed-up]
Entity Retrieval

• [Chart: Mean Average Precision (MAP) of the three models – 67.56%, 62.91%, and 69.26%]
Relation Retrieval

• [Chart: Mean Average Precision (MAP) of the three models – 70.71%, 73.08%, and 75.70%]
Experiments – Relation Extraction

• Example: "Satya Nadella is the CEO of Microsoft." → (Satya Nadella, work-at, Microsoft)

Relation Extraction as Matrix Factorization [Riedel+ 13]

• Row: entity pair; Column: relation
• [Fig. 1 of Riedel+ 13]
Data & Task Description

• Raw data: NY Times corpus & Freebase
  • Entities in NY Times and Freebase are aligned
• Raw tensor construction
  • 80,698 entities & 1,652 relations
  • Type information from Freebase & NER
  • Type constraints are derived from training data
• Task – identify FB relations of entity pairs in text
  • 10,000 entity pairs; 2,048 have both entities in FB
• Evaluation metric – weighted mean average precision (MAP) on 19 relations
Relation Extraction [updated version]

• Evaluated using only 2,048 FB entity pairs
• [Chart: MAP of the five models – 0.49, 0.52, 0.58, 0.70, and 0.72]
Relation Extraction

• Evaluated using all 10,000 entity pairs
• [Chart: MAP of the five models – 0.33, 0.36, 0.39, 0.47, and 0.57]
Conclusions

• TRESCAL: a KB embedding model via tensor decomposition
  • Leverages entity type constraints
  • Faster model training time
  • Highly scalable to large KBs
  • Higher prediction accuracy
  • Application to relation extraction
• Challenges & Future Work
  • Capture more types of relational domain knowledge
  • Support more sophisticated inferential tasks