
UWN: A Large Multilingual Lexical Knowledge Base

Date posted: 19-Jun-2015
Uploaded by: gerard-de-melo
Description:
We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
Transcript
Page 1: UWN: A Large Multilingual Lexical Knowledge Base

Gerard de Melo and Gerhard Weikum
ICSI Berkeley / Max Planck Institute for Informatics

Better NLP Features using Lexical Semantics

• Goal: Richer, Less Sparse Features
• How: Model Synonymy, Polysemy, Semantic Relatedness, Taxonomy (within and across languages)

More Information: www.lexvo.org/gdm/

• Downloadable API available
• Web User Interface

UWN's Multilingual Graph

[Figure: excerpt of the UWN graph. Only the node and word labels are recoverable from the figure; they are listed below.]

Meanings and entities: Entity; Institution; Educational institution; University; University of California, Berkeley; Berkeley, CA; George Berkeley; City; Geopolitical Entity; school (group of fish); school (institution); school (building); Chuvash; Georgian; Cyrillic (Script); Russia (Country)

Words and names:
por: “entidade”
cmn: “制度”
heb: “ישות”
ara: “وجود، كينونة”
tha: “สถาบัน”
deu: “Bildungseinrichtung”
fin: “oppilaitos”
fin: “yliopisto”
srp: “универзитете”
eng: “Berkeley”
eng: “UC Berkeley”
eng: “Cal”
cmn: “柏克萊加州大學”
deu: “Schulgebäude”
deu: “Schulhaus”
deu: “Fischschwarm”
ces: “hejno”
fra: “banc”
chv: “шкул”
jpn: “学校”
kor: “학교”
lao: “ໂຮງຮຽນ”
kat: “სკოლა”
...

• Over 16 million words and names in over 200 languages semantically connected
• Ambiguity and synonymy captured

Lexvo.org Language Descriptions: Languages, Scripts, Characters, Countries
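The "ambiguity and synonymy captured" point can be made concrete with a small sketch. The Python below is a toy in-memory index, not the UWN API. The meaning labels and most words are taken from the figure, but the term-to-meaning edges, the English word "school" as a term, and the helper names `build_index`, `is_ambiguous` and `synonyms` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: ((language code, word), meaning label).
# Labels come from the figure; which word attaches to which meaning is assumed.
EDGES = [
    (("eng", "school"), "school (institution)"),
    (("eng", "school"), "school (building)"),
    (("eng", "school"), "school (group of fish)"),
    (("deu", "Schulgebäude"), "school (building)"),
    (("deu", "Schulhaus"), "school (building)"),
    (("deu", "Fischschwarm"), "school (group of fish)"),
    (("ces", "hejno"), "school (group of fish)"),
    (("fra", "banc"), "school (group of fish)"),
    (("jpn", "学校"), "school (institution)"),
    (("kor", "학교"), "school (institution)"),
]


def build_index(edges):
    """Build two lookup tables: term -> meanings and meaning -> terms."""
    meanings_of = defaultdict(set)
    words_for = defaultdict(set)
    for term, meaning in edges:
        meanings_of[term].add(meaning)
        words_for[meaning].add(term)
    return meanings_of, words_for


def is_ambiguous(meanings_of, term):
    """A term is ambiguous (polysemous) if it is linked to more than one meaning."""
    return len(meanings_of.get(term, ())) > 1


def synonyms(meanings_of, words_for, term, lang=None):
    """Other terms sharing at least one meaning with `term`, optionally in one language."""
    found = set()
    for meaning in meanings_of.get(term, ()):
        for other in words_for[meaning]:
            if other != term and (lang is None or other[0] == lang):
                found.add(other)
    return found


meanings_of, words_for = build_index(EDGES)
print(is_ambiguous(meanings_of, ("eng", "school")))
print(sorted(synonyms(meanings_of, words_for, ("deu", "Schulhaus"), lang="deu")))
```

Under these assumed edges, "school" is reported as ambiguous because it reaches three meaning nodes, and "Schulgebäude" is returned as a German synonym of "Schulhaus" because both reach the same meaning node. Leaving out the `lang` argument gives cross-lingual equivalents in the same way.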

UWN:

• Meaning Distinctions
• Ontological Taxonomy
• Encyclopedic Knowledge, Pictures, Video, Sounds, Maps
• Etymological and other word relationships
• Millions of Named Entities (People, Places, Proteins, Asteroids, Companies, etc.)
• 200+ languages

Step 1: Link Prediction

• Link multilingual words to WordNet
• Connect Wikipedia with WordNet (equivalence and taxonomic links)

Step 2: Entity Integration

• LP for constraint-based computation of equivalence classes of entities
• Region Growing approximation algorithm

[Figure: graph of cross-lingual labels for television, with nodes marked V1,u and V1,v. Labels: es: Televisor; es: Televisión; ru: Телевизор; hi: दूरदर्शन; ja: テレビ; en: Television; en: Television set; zh: 电视机; ja: テレビ受像機; en: TV set; en: T.V.]

Step 3: Taxonomy Induction

• Markov Chain to rank taxonomic parents
• 270 Wikipedia taxonomies integrated with WordNet's hypernym hierarchy

Extras

• FrameNet Linking
• Common-Sense Knowledge Extraction
• Multilingual Roget's Thesaurus
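The entity integration step mentions equivalence classes computed under constraints. The sketch below does not implement the LP or the region growing algorithm named on the poster. It only shows the naive baseline, connected components over cross-lingual links via union-find, and why a constraint is needed. The links are assumed (the figure's edges are not recoverable), and the constraint used here, that two different titles in the same language should not end up in one class, is an assumption about what "constraint-based" refers to.

```python
from collections import defaultdict

# Assumed cross-lingual links between (language, title) nodes.
# Titles come from the figure; the edges themselves are illustrative.
LINKS = [
    (("en", "Television"), ("es", "Televisión")),
    (("en", "Television"), ("ja", "テレビ")),
    (("en", "Television"), ("ru", "Телевизор")),
    (("en", "Television set"), ("es", "Televisor")),
    (("en", "Television set"), ("ja", "テレビ受像機")),
    (("en", "Television set"), ("zh", "电视机")),
    (("ru", "Телевизор"), ("es", "Televisor")),  # imprecise link that merges both groups
]


def find(parent, node):
    """Return the representative of node's set, compressing the path on the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def components(links):
    """Naive equivalence classes: connected components of the link graph."""
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    groups = defaultdict(set)
    for node in parent:
        groups[find(parent, node)].add(node)
    return list(groups.values())


def violations(component):
    """Languages that contribute more than one distinct title to a component."""
    by_lang = defaultdict(set)
    for lang, title in component:
        by_lang[lang].add(title)
    return {lang: titles for lang, titles in by_lang.items() if len(titles) > 1}


comps = components(LINKS)
for comp in comps:
    print(len(comp), sorted(violations(comp)))
```

With these assumed links, a single imprecise edge collapses "Television" and "Television set" into one component that contains two English, two Spanish and two Japanese titles. A constraint-based method would have to separate such nodes, for example by deciding which links to cut.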
