Date post: 12-Mar-2020
Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling

Robert L. Logan IV*, Nelson F. Liu†§, Matthew E. Peters§, Matt Gardner§, Sameer Singh*

∗University of California, Irvine, CA, USA, †University of Washington, Seattle, WA, USA, §Allen Institute for Artificial Intelligence, Seattle, WA, USA

For any questions, email: [email protected]

Resources

Dataset: rloganiv.github.io/linked-wikitext-2

Code: github.com/rloganiv/kglm-model

Summary

● Traditional language models have limited ability to generate factually correct text.

● We introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for generating information from a knowledge graph.

● We collect the Linked WikiText-2 dataset, which aligns WikiText-2 to the Wikidata knowledge graph.

● Experiments show that the KGLM has better perplexity than AWD-LSTM-LM, and better fact-completion capabilities than GPT-2 small despite being trained on less data.

Motivating Example

Generative Story & Model

Linked WikiText-2 Dataset

Dataset Statistics

                   Train   Dev    Test
Documents          600     60     60
Tokens             2M      200K   236K
Vocabulary Size    33K     -      -
Mention Tokens     207K    21K    24K
Mention Spans      123K    12K    15K
Unique Entities    41K     5.4K   5.6K
Unique Relations   1.2K    484    504

Experiments

Perplexity

Model            PPL    UPP
ENTITYNLM*       85.4   189.2
EntityCopyNet*   76.1   144.0
AWD-LSTM         74.8   165.8
KGLM*            44.1   88.5
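The two columns of the perplexity table can be illustrated with a small calculation. The sketch below assumes that UPP (unknown-penalized perplexity) replaces the probability assigned to the unknown token with that probability divided evenly among the word types that map to it; the poster does not spell out the formula, so treat this as an assumption. The function names and the toy probabilities are hypothetical and are not taken from the KGLM code base.

```python
import math


def perplexity(log_probs):
    """Perplexity: exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(log_probs) / len(log_probs))


def unknown_penalized_perplexity(log_probs, is_unk, num_unk_types):
    """Perplexity after dividing p(UNK) by the number of types mapped to UNK.

    ASSUMPTION: this is one common definition of an unknown-penalized
    perplexity; it is not confirmed by the poster itself.
    """
    penalty = math.log(num_unk_types)
    adjusted = [lp - penalty if unk else lp for lp, unk in zip(log_probs, is_unk)]
    return perplexity(adjusted)


# Toy example (made-up numbers): three tokens, the second one is UNK.
toy_log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
toy_is_unk = [False, True, False]

ppl = perplexity(toy_log_probs)
upp = unknown_penalized_perplexity(toy_log_probs, toy_is_unk, num_unk_types=4)
print(f"PPL = {ppl:.3f}, UPP = {upp:.3f}")
```

Under this definition UPP can never be lower than PPL, which matches the direction of the gap in every row of the table: a model that leans on the unknown token is penalized more.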

Fact Completion Examples

Category        Input Sentence                                Gold             GPT-2      KGLM
Both Correct    Paris Hilton was born in ____                 New York City    New        New
Both Correct    Arnold Schwarzenegger was born on ____        1947-07-30       July       30
KGLM Correct    Bob Dylan was born in ____                    Duluth           New        Duluth
KGLM Correct    Ulysses is a book that was written by ____    James Joyce      a          James
GPT-2 Correct   St. Louis is a city in the state of ____      Missouri         Missouri   Oldham
GPT-2 Correct   Kanye West is married to ____                 Kim Kardashian   Kim        the
Both Wrong      The capital of India is ____                  New Delhi        the        a
Both Wrong      Madonna is married to ____                    Carlos Leon      a          Alex

* In the perplexity table, results marked with an asterisk were obtained using importance sampling.

[Figure: Unknown Penalty]

Example Annotation (Linked WikiText-2)

Tokens: Super Mario Land is a 1989 side - scrolling platform video game developed and published by Nintendo as a launch title for their Game Boy

Mention               Mention Type   Entity Mentioned   Relation          Parent Entity
Super Mario Land      new            SML                -                 -
1989                  related        4-21-1989          pub. date         SML
side - scrolling      new            SIDE_SCROLL        -                 -
platform video game   related        PVG                genre             SML
Nintendo              related        NIN                publisher         SML
launch title          new            LT                 -                 -
Game Boy              related        GAME_BOY           manuf./platform   NIN/SML
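The Super Mario Land example annotation can be pictured as one record per mention, where "related" mentions also carry the relation and the parent entity they were reached from. The following is a minimal sketch of that idea only: the `MentionAnnotation` class, its field names, and the token indices are illustrative assumptions, and they do not describe the file format of the released dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MentionAnnotation:
    """Hypothetical record for one annotated mention (not the dataset's real schema)."""
    span: Tuple[int, int]                # [start, end) token indices, assumed here
    mention_type: str                    # "new" or "related"
    entity: str                          # entity mentioned
    relation: Optional[str] = None       # set only for "related" mentions
    parent_entity: Optional[str] = None  # set only for "related" mentions


tokens: List[str] = (
    "Super Mario Land is a 1989 side - scrolling platform video game"
).split()

# First half of the poster's example; entity labels are copied from the poster.
annotations = [
    MentionAnnotation((0, 3), "new", "SML"),
    MentionAnnotation((5, 6), "related", "4-21-1989", "pub. date", "SML"),
    MentionAnnotation((6, 9), "new", "SIDE_SCROLL"),
    MentionAnnotation((9, 12), "related", "PVG", "genre", "SML"),
]


def mention_text(ann: MentionAnnotation) -> str:
    """Return the surface text covered by an annotation's span."""
    start, end = ann.span
    return " ".join(tokens[start:end])


for ann in annotations:
    print(mention_text(ann), "->", ann.mention_type, ann.entity)
```

Keeping the relation and parent entity optional mirrors the poster's layout, where those two rows are filled in only under "related" mentions.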

Fact Completion Results (Hits@1 / Hits@5)

Relation         AWD-LSTM   GPT-2         KGLM (Oracle)   KGLM (NEL)
nation-capital   0 / 0      6 / 7         0 / 0           0 / 4
birthloc         0 / 9      14 / 14       94 / 95         85 / 92
birthdate        0 / 25     8 / 9         65 / 68         61 / 67
spouse           0 / 0      2 / 3         2 / 2           1 / 19
city-state       0 / 13     62 / 62       9 / 59          4 / 59
book-author      0 / 2      0 / 0         61 / 62         25 / 28
Average          0 / 8.2    15.3 / 15.8   38.5 / 47.7     29.3 / 44.8
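Hits@k counts a completion as correct when the gold token appears among a model's k highest-ranked predictions. The sketch below shows that computation on a toy input and then recomputes one entry of the Average row as an unweighted mean over the six relations. The helper names and the toy rankings are hypothetical, and matching on a single token is an assumption; only the per-relation scores are copied from the results table.

```python
def hits_at_k(ranked_predictions, gold, k):
    """Return 1 if the gold token is among the top-k ranked predictions, else 0."""
    return int(gold in ranked_predictions[:k])


def hits_rate(examples, k):
    """Percentage of (ranked_predictions, gold) pairs that score a hit at k."""
    hits = sum(hits_at_k(preds, gold, k) for preds, gold in examples)
    return 100.0 * hits / len(examples)


# Toy rankings (made up for illustration), best guess first.
toy_examples = [
    (["New", "the", "a"], "New"),      # gold is ranked first
    (["a", "James", "the"], "James"),  # gold is ranked second
]
print(hits_rate(toy_examples, k=1), hits_rate(toy_examples, k=5))

# Hits@1 for KGLM (Oracle), copied from the results table.
kglm_oracle_hits_at_1 = {
    "nation-capital": 0,
    "birthloc": 94,
    "birthdate": 65,
    "spouse": 2,
    "city-state": 9,
    "book-author": 61,
}
average = sum(kglm_oracle_hits_at_1.values()) / len(kglm_oracle_hits_at_1)
print(average)
```

The recomputed mean agrees with the 38.5 reported in the Average row, which suggests that row is a plain average over relations rather than one weighted by the number of test sentences.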
