
Lecture 4: Words, Lexicons and Ontologies (compling.hss.ntu.edu.sg/courses/hg8003/pdf/wk-04.pdf)

Date post: 12-Oct-2018

HG8003 Technologically Speaking: The intersection of language and technology.

Words, Lexicons and Ontologies

Francis Bond
Division of Linguistics and Multilingual Studies
http://www3.ntu.edu.sg/home/fcbond/

[email protected]

Lecture 4
Location: LT8

HG8003 (2014)


Schedule

Lec.  Date   Topic
1     01-16  Introduction, Organization: Overview of NLP; Main Issues
2     01-23  Representing Language
3     02-06  Representing Meaning
4     02-13  Words, Lexicons and Ontologies
5     02-20  Text Mining and Knowledge Acquisition (Quiz)
6     02-27  Structured Text and the Semantic Web
      Recess
7     03-13  Citation, Reputation and PageRank
8     03-20  Introduction to MT, Empirical NLP
9     03-27  Analysis, Tagging, Parsing and Generation (Quiz)
10    Video  Statistical and Example-based MT
11    04-03  Transfer and Word Sense Disambiguation
12    04-10  Review and Conclusions
Exam  05-06  17:00

➣ Video week 10


Review of Meaning


Review of Representing Meaning

➣ Three ways of defining meaning

➢ Attributional (Compositional)
➢ Relational
➢ Distributional

➣ the Syntax-Semantic Interface

➢ Usage ⇌ Meaning


Attributional Meaning

➣ Give a semantic description of word use in isolation from the categorisation of other lexical items

➢ definitions
➢ decompositional semantics (break down into primitives)

➣ Easy for humans to understand

➣ Hard to decide on sense boundaries (granularity: splitters vs. lumpers)

➣ Definitions are circular (the grounding problem)

➣ Hard to be consistent


Relational Meaning

➣ Capture correspondences between lexical items by way of a finite set of pre-defined semantic relations

➣ Methodologies:

➢ lexical relations
➢ constructional relations

➣ Captures many generalizations usefully

➣ Hard to make complete

➣ Leads to large, complex graphs


Distributional Meaning

➣ Capture word meanings as collections of contexts in which words appear

➢ n-grams
➢ syntactic relations
➢ sentences
➢ documents

➣ Good for synonymy, not so good for antonymy

➣ Computationally tractable
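As a minimal sketch of the distributional idea, assuming a hypothetical five-sentence corpus and a context window of one word on each side, each word can be represented by counts of its neighbours and words compared by cosine similarity:

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus; real distributional models use very large corpora.
CORPUS = [
    "the big dog barked",
    "the large dog barked",
    "the big cat slept",
    "the large cat slept",
    "the small cat slept",
]

def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words occurring within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

vecs = context_vectors(CORPUS)
print(cosine(vecs["big"], vecs["large"]))
print(cosine(vecs["big"], vecs["small"]))
```

Here *big* and *large* get identical context vectors, but the antonym *small* also scores about 0.87, which is the antonymy weakness noted above: opposites tend to occur in the same contexts.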


Why are dictionaries important?

➣ For humans

➢ find meaning of unknown words
➢ find more information about known words
➢ codify knowledge about word usage (glossaries)

➣ For machines

➢ store information about words
➢ link between text and knowledge


Words


Introduction to Words, Lexicons and Ontologies

➣ Design and implementation

➢ Machine Readable Dictionaries
➢ Morphological lexicons
➢ Syntactic lexicons
➢ Semantic lexicons
➢ Ontologies

➣ Construction and Maintenance

➢ Construction from scratch
➢ Boot-strapping from existing resources
➢ Ensuring consistency


Machine Readable Dictionaries (MRDs)

➣ Human dictionaries made available on computers

➢ Electronic Dictionaries
➢ Dictionary Applications
∗ often with automatic word lookup
➢ On-line dictionaries
∗ sometimes with glosses


A typical entry

definition (n) a concise explanation of the meaning of a word or phrase or symbol

➣ Headword: definition

➣ Part of Speech: n (noun)

➣ Definition:

➢ genus: explanation
➢ differentia: concise; of the meaning of a word or phrase or symbol

? Implied: countable (a), regular plural
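The parts of this entry map naturally onto a small record. The sketch below assumes a hypothetical one-line format `headword (pos) definition`; the split into genus and differentia is filled in by hand, since extracting it automatically is a harder problem.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Entry:
    headword: str
    pos: str
    definition: str
    genus: str = ""                                   # the "kind of" term heading the definition
    differentia: list = field(default_factory=list)   # what distinguishes it from other members of the genus

# Hypothetical line format: "headword (pos) definition text".
ENTRY_PATTERN = re.compile(r"^(?P<headword>\S+) \((?P<pos>\w+)\) (?P<definition>.+)$")

def parse_entry(line):
    """Parse one dictionary line into an Entry; raise ValueError if the format does not match."""
    m = ENTRY_PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    return Entry(m["headword"], m["pos"], m["definition"])

entry = parse_entry("definition (n) a concise explanation of the meaning of a word or phrase or symbol")
entry.genus = "explanation"
entry.differentia = ["concise", "of the meaning of a word or phrase or symbol"]
print(entry)
```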


Parts-of-Speech (POS)

➣ Traditional Grammar has eight:
Noun, Verb, Adjective, Adverb (open class)
Conjunction, Preposition, Pronoun, Interjection (closed class)

➣ In the US, the Penn Treebank POS set is the de facto standard:

➢ http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html
➢ 45 tags (including punctuation)

➣ In Europe, the CLAWS tagset is popular

➢ http://ucrel.lancs.ac.uk/claws7tags.html
➢ 137 tags (without punctuation)


Penn Treebank Examples (14/45)

Tag   Description
NN    Noun, singular or mass
NNS   Noun, plural
NNP   Proper noun, singular
NNPS  Proper noun, plural
PRP   Personal pronoun
IN    Preposition
TO    to
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBP   Verb, non-3rd person singular present
VBZ   Verb, 3rd person singular present
.     Sentence-final punctuation (. ? !)

➣ The tags include inflectional information

➢ If you know the tag, you can generally find the lemma

➣ Some tags are very specialized: I/PRP wanted/VBD to/TO go/VB ./.
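The tagged sentence uses the common word/TAG notation. As a sketch of how the tag helps recover the lemma, assuming a hypothetical handful of suffix rules for regular forms plus a small exception list (a real lemmatizer needs a full lexicon):

```python
# Hypothetical exception list: irregular forms must be stored, not computed.
IRREGULAR = {("made", "VBD"): "make", ("went", "VBD"): "go", ("children", "NNS"): "child"}

# Hypothetical suffix rules for regular inflection, keyed by Penn Treebank tag.
SUFFIX_BY_TAG = {"NNS": "s", "VBZ": "s", "VBD": "ed", "VBN": "ed", "VBG": "ing"}

def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs (split on the last slash)."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]

def lemma(word, tag):
    """Guess the lemma from the tag: exceptions first, then suffix stripping."""
    key = (word.lower(), tag)
    if key in IRREGULAR:
        return IRREGULAR[key]
    suffix = SUFFIX_BY_TAG.get(tag)
    if suffix and word.endswith(suffix):
        return word[: -len(suffix)]
    return word  # tags such as VB, PRP, TO: the form is already the lemma

for word, tag in parse_tagged("I/PRP wanted/VBD to/TO go/VB ./."):
    print(word, tag, lemma(word, tag))
```

Naive stripping fails on forms like *liked* (giving *lik*), which is one reason lexicons also record each word's inflectional class.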


Good Definitions

➣ a definition should be simpler than the word being explained

➣ the definition should match the part of speech
definition (n) a concise explanation of the meaning of a word or phrase
define (v) give a definition for the meaning of a word; “Define ‘sadness’”

➣ the definition should not be circular

➣ all words in the definition should be defined (somewhere)

➢ prefer small defining vocabulary
➢ only use metalanguage (NSM: Natural Semantic Metalanguage)


Circular definitions

beauty the state of being beautiful

beautiful full of beauty

bobcat a lynx

lynx a bobcat
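Circularity can be checked mechanically by treating the dictionary as a graph from each headword to the words used in its definition and searching for cycles. This is a sketch assuming hand-picked definition words (a hypothetical `DEFINITIONS` table); a real check would first lemmatize the definitions and drop function words.

```python
# Hypothetical: each headword maps to the defined words used in its definition.
DEFINITIONS = {
    "beauty": ["state", "beautiful"],
    "beautiful": ["beauty"],
    "bobcat": ["lynx"],
    "lynx": ["bobcat"],
    "state": [],  # assume this one bottoms out in a defining vocabulary
}

def find_cycle(defs, start):
    """Depth-first search from start; return the first definition cycle found, or None."""
    path, on_path, safe = [], set(), set()

    def visit(word):
        if word in on_path:                    # back on the current path: a cycle
            return path[path.index(word):] + [word]
        if word in safe or word not in defs:   # already cleared, or outside the table
            return None
        path.append(word)
        on_path.add(word)
        for used in defs[word]:
            cycle = visit(used)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(word)
        safe.add(word)                         # fully explored without finding a cycle
        return None

    return visit(start)

print(find_cycle(DEFINITIONS, "bobcat"))  # ['bobcat', 'lynx', 'bobcat']
```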


Other useful information

http://en.wiktionary.org/wiki/lynx

➣ Pronunciation

➣ Usage Examples

➣ Illustrations

➣ Etymology (history of the word)

➣ Links to other resources

Easier to do without the space restrictions of a paper dictionary.


Dictionaries for NLP

Minimize content in order to minimize acquisition problem.

Declarativity and human readability with compilation into a machine-friendly representation.

Modularity so components are reusable: e.g. distinct monolingual and transfer lexicons in an MT system.

Capture generalizations with inheritance (and lexical rules etc). Avoids errors, easier to maintain and expand.

Underspecification to reduce disambiguation for a particular application.

(Copestake, 1992)

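As a sketch of capturing generalizations with inheritance, assuming a hypothetical type hierarchy in which each type names its parent and only the properties it adds, a full entry is assembled by walking up to the root, with more specific values overriding defaults:

```python
# Hypothetical type hierarchy: each type names its parent and the properties it adds or overrides.
TYPES = {
    "verb": {"parent": None, "props": {"pos": "v", "past": "+ed"}},
    "trans-verb": {"parent": "verb", "props": {"args": ["subj", "obj"]}},
    "intrans-verb": {"parent": "verb", "props": {"args": ["subj"]}},
}

# Entries name a type and list only what is idiosyncratic about them.
LEXICON = {
    "devour": {"type": "trans-verb", "props": {}},
    "make": {"type": "trans-verb", "props": {"past": "made"}},    # irregular past overrides the default
    "sleep": {"type": "intrans-verb", "props": {"past": "slept"}},
}

def expand(word):
    """Collect properties from the root type down to the entry; later (more specific) values win."""
    entry = LEXICON[word]
    chain = []
    t = entry["type"]
    while t is not None:
        chain.append(TYPES[t]["props"])
        t = TYPES[t]["parent"]
    full = {}
    for props in reversed(chain):  # root first, so subtypes override their parents
        full.update(props)
    full.update(entry["props"])
    return full

print(expand("make"))
```

A default such as the regular past is stated once, so fixing it in `verb` fixes every verb that does not override it.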

Morphological Analysis

森 永 前 日 銀 総 裁
Character readings: rin ei zen hi gin sou sai (alternatives: mori, mae, nichi)
Segmentation A: morinaga zennichi gin sousai
Segmentation B: morinaga zen nichigin sousai

➣ 森永 前 日銀 総裁
Morinaga former Bank-of-Japan President

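The competing segmentations above come from dictionary lookup over the character string. A minimal sketch, assuming a hypothetical mini-dictionary containing only the words on this slide, enumerates every path through the word lattice and then applies a simple fewest-words heuristic:

```python
# Hypothetical mini-dictionary built from the words on the slide.
DICTIONARY = {"森", "永", "森永", "前", "日", "前日", "銀", "日銀", "総", "裁", "総裁"}

def segmentations(text, dictionary, max_len=4):
    """Return every way of splitting text into dictionary words (all paths through the lattice)."""
    if not text:
        return [[]]
    results = []
    for n in range(1, min(max_len, len(text)) + 1):
        head = text[:n]
        if head in dictionary:
            for rest in segmentations(text[n:], dictionary, max_len):
                results.append([head] + rest)
    return results

all_segs = segmentations("森永前日銀総裁", DICTIONARY)
fewest = min(len(s) for s in all_segs)
best = [s for s in all_segs if len(s) == fewest]
for s in best:
    print(" ".join(s))
```

The fewest-words heuristic leaves a tie between 森永 前 日銀 総裁 and 森永 前日 銀 総裁, exactly the zen nichigin / zennichi gin ambiguity; analyzers typically break such ties with word and connection costs estimated from corpora.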

Morphological Lexicons

➣ Stem

➣ Inflectional Class

➣ Part of Speech (often 1-200)

➣ Arguments (?)

➣ For example

➢ Relations: 前 - 総裁
➢ Arguments: 前(総裁)
➢ Abstraction: 前(title); 総裁 ⊂ title


Morphological Lexicon

➣ I fabricate for a living

➣ I make things for a living

➣ I fabricated yesterday

➣ I made things for a living

➣ These are differences in the inflectional class


Inflection

➣ Inflection: In many languages, words appear in different forms to show small differences in meaning: for example number (dog/dogs; child/children) or tense/aspect (make/made/making/made; take/took/taking/taken)

➣ Many words pattern the same way; this is called an inflectional class (or paradigm). For example, one class of plurals in English is words that end in y: fly/flies; sky/skies.

➣ The inflectional class is normally not predictable from the meaning or syntax, and so must be stored for each word

➣ The root form (lemma) and the inflected form have the same meaning modulo the number/tense/. . . and the same basic part-of-speech

➣ Normally a word only undergoes one inflection
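Because the class must be stored per word, a morphological lexicon can simply record a class label for each lemma and apply that class's rule. A minimal sketch, assuming hypothetical class names and covering only plurals:

```python
# Hypothetical inflectional classes for English plurals.
PLURAL_RULES = {
    "add_s": lambda stem: stem + "s",
    "y_to_ies": lambda stem: stem[:-1] + "ies",
}

# Each lemma stores its class explicitly: form alone is not a reliable guide (boy vs. fly).
NOUNS = {"dog": "add_s", "boy": "add_s", "fly": "y_to_ies", "sky": "y_to_ies"}

# Truly irregular forms are listed outright.
IRREGULAR_PLURALS = {"child": "children"}

def plural(lemma):
    """Generate the plural from the stored irregular form or the lemma's inflectional class."""
    if lemma in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[lemma]
    return PLURAL_RULES[NOUNS[lemma]](lemma)

print([plural(w) for w in ("dog", "boy", "fly", "child")])
```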


Derivation

➣ New words can also be created by changing the form. If the part of speech or meaning changes, we call it derivation (happy/happiness; happy/unhappy; happy/happily)

➣ You can also get zero derivation, where the meaning changes without a change in form (I butter the bread / I like butter)

➣ The root form in derivation is called the stem, and the process of stripping off derivational affixes is stemming

➣ You can have multiple derivations: anti-dis-establish-ment-arian-ism: the stem is establish

➣ Derivation is largely but not entirely productive: employer, teacher, *studier, actor, contractor

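Stemming can be sketched as repeatedly stripping known affixes while the remainder stays long enough. The affix lists and minimum stem length below are hypothetical and deliberately tiny; real stemmers use much larger rule sets with conditions on each rule.

```python
# Hypothetical, deliberately tiny affix lists.
PREFIXES = ("anti", "dis", "un")
SUFFIXES = ("ism", "arian", "ment", "ness")

def stem(word, min_stem=4):
    """Strip suffixes first, then prefixes, repeating until no affix can be removed."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                changed = True
                break
        if changed:
            continue
        for prefix in PREFIXES:
            if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
                word = word[len(prefix):]
                changed = True
                break
    return word

print(stem("antidisestablishmentarianism"))
```

The output need not be a word: *unhappiness* comes out as *happi*, since this sketch knows nothing about the y/i spelling alternation.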

Syntactic Lexicon

➣ I fabricated the results

➣ I made up the results

➣ = I made the results up

➣ I walked down the road

➣ ≠ I walked the road down

➣ These are differences in the syntactic lexical type
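One way to encode this difference is to give each verb + particle/preposition pair a syntactic type that decides which word orders are licensed. This is a sketch with hypothetical type names, keyed on lemmas:

```python
# Hypothetical syntactic lexical types for the slide's examples.
LEXICON = {
    ("make", "up"): "particle-verb",         # the particle may follow the object
    ("walk", "down"): "prepositional-verb",  # "down" heads a PP, so it cannot shift
}

def orders(verb, particle, obj):
    """Return the word orders licensed for verb + particle/preposition + object."""
    kind = LEXICON[(verb, particle)]
    allowed = [f"{verb} {particle} {obj}"]
    if kind == "particle-verb":
        allowed.append(f"{verb} {obj} {particle}")
    return allowed

print(orders("make", "up", "the results"))
print(orders("walk", "down", "the road"))
```

This ignores the further constraint that a pronoun object must precede the particle (make it up, *make up it).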


Differences in Argument Structure

➣ These are also differences in the syntactic lexical type

➢ I gave the book to him
➢ I gave him the book
➢ Cats eat mice
➢ Cats eat
➢ Cats devour mice
➢ *Cats devour

➣ The information about what arguments a verb can take is also called subcategorization, valence or argument frame

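Subcategorization can be stored as a list of permitted argument frames per verb. A minimal sketch, assuming hypothetical category labels (NP, PP[to]) for the complements after the verb:

```python
# Hypothetical argument frames: the complement categories each verb allows after it.
FRAMES = {
    "give": [("NP", "PP[to]"), ("NP", "NP")],  # gave the book to him / gave him the book
    "eat": [("NP",), ()],                      # eat mice / eat
    "devour": [("NP",)],                       # devour mice / *devour
}

def licensed(verb, complements):
    """True if the complement sequence matches one of the verb's stored frames."""
    return tuple(complements) in FRAMES.get(verb, [])

print(licensed("eat", []), licensed("devour", []))
```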

Semantic Lexicon

➣ I deposited the money in the bank (financial)

➣ The river overflowed its bank (riverside)

➣ I had lunch by the bank (???)

➣ These are differences in the semantic class
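A simple way to choose between semantic classes is to count overlaps between the sentence and a set of cue words for each sense, in the spirit of the Lesk algorithm. The cue-word sets here are hypothetical; in practice they might come from dictionary glosses or corpora.

```python
# Hypothetical cue words for each sense of "bank".
SENSES = {
    "bank/financial": {"money", "deposit", "deposited", "account", "loan"},
    "bank/riverside": {"river", "overflowed", "water", "shore", "fishing"},
}

def disambiguate(sentence, senses=SENSES):
    """Pick the sense with the most cue-word overlap; None if nothing overlaps or the top scores tie."""
    words = set(sentence.lower().split())
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return max(scores, key=scores.get)

for s in ("I deposited the money in the bank",
          "The river overflowed its bank",
          "I had lunch by the bank"):
    print(s, "->", disambiguate(s))
```

The third sentence gets no answer, mirroring the (???) on the slide: nothing in the local context decides it.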


All the possibilities combine

➣ I saw her duck

➢ see/saw, saw/sawed
➢ duck_N, duck_V
➢ duck_N:cloth, duck_N:bird

➣ Still useful to keep separate

➢ inflectional paradigm
➢ arguments (subcategorization)
➢ semantics (selectional preferences)


Transfer Lexicons

➣ bank ↔ 銀行 ginkou

➣ bank ↔ 土手 dote

➣ 鼻 hana ↔ nose

➢ trunk [of elephant]
➢ muzzle [of horse]
➢ snout [of boar]

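A transfer lexicon can be sketched as a mapping from a source word plus some condition (a sense, or a context feature such as the possessor) to a target word, with a default. The table layout and condition keys below are hypothetical:

```python
# Hypothetical English-to-Japanese choices keyed on the sense of "bank".
EN_JA = {("bank", "financial"): "銀行", ("bank", "riverside"): "土手"}

# Hypothetical Japanese-to-English choices for 鼻 (hana) keyed on whose nose it is.
JA_EN = {"鼻": {"default": "nose", "elephant": "trunk", "horse": "muzzle", "boar": "snout"}}

def ja_to_en(word, possessor=None):
    """Prefer a possessor-specific translation, falling back to the default."""
    options = JA_EN[word]
    return options.get(possessor, options["default"])

print(EN_JA[("bank", "riverside")], ja_to_en("鼻", "elephant"), ja_to_en("鼻"))
```

Note that the conditions live on the transfer side: the monolingual Japanese lexicon need not split 鼻 into senses just because English does.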

Dictionaries in Processing

➣ Lexical lookup is slow (disk-based)

➢ Compile dictionaries into compressed format
➢ Index
➢ Cache the index
∗ cache = load it into memory
➢ Cache already accessed entries
➢ Keep a list of frequent entries and cache them
∗ the most frequent words are very frequent

➣ Batch check for consistency off-line
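Caching accessed entries and pre-loading frequent ones can be sketched with Python's `functools.lru_cache`. The "disk" here is a hypothetical in-memory stand-in that counts reads, so the effect of the cache is visible:

```python
from functools import lru_cache

# Hypothetical stand-in for a slow, disk-based dictionary.
_ON_DISK = {"the": "DT", "bank": "NN", "river": "NN"}
disk_reads = 0

def read_from_disk(word):
    """Pretend disk access; counts how often we actually go to 'disk'."""
    global disk_reads
    disk_reads += 1
    return _ON_DISK.get(word)

@lru_cache(maxsize=10_000)
def lookup(word):
    """Cached lookup: a word is read from 'disk' only on a cache miss (misses are cached too)."""
    return read_from_disk(word)

# Pre-load frequent entries: the most frequent words are very frequent.
for frequent in ("the", "bank"):
    lookup(frequent)

tags = [lookup(w) for w in "the bank by the river the bank".split()]
print(tags, disk_reads)
```

Seven lookups cost only four disk reads, one per distinct word; with real text the saving is much larger because of the skewed word frequencies.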


Dictionaries and Intellectual Property Rights (IPR)

➣ Lexicography has a long tradition of extending others’ work

➢ Johnson, Murray, . . .
➢ Language itself should not be restricted
➢ Dictionaries describe language

➣ Restricted resources lose to open resources

➢ Work on restricted resources is wasted

➣ It is hard to fund maintenance

➢ Users of the lexicons are the best developers


Online Dictionaries

➣ Mouse-over lookup (http://www.polarcloud.com/rikaichan/)

➣ No space restrictions (JMdict)

➣ Collaborative Construction (Wiktionary)

➣ Easy cross-referencing (WordNet)

➣ Easy to link dictionaries (Open Multilingual WordNet)
http://compling.hss.ntu.edu.sg/omw/


Ontologies


What is an Ontology

➣ A set of statements in a formal language that describes/conceptualizes knowledge in a given domain

➢ What kinds of entities exist (in that domain)
➢ What kinds of relationships hold among them

➣ Ontologies usually assume a particular level of granularity

➢ doesn’t capture all details


In Other Words

➣ In theory

“An ontology is a formal, explicit specification of a shared conceptualisation.” (Gruber, 1993)

➣ More generally

An ontology provides a shared vocabulary, which can be used to model a domain, that is, the type of objects and/or concepts that exist, and their properties and relations


Why use Ontologies?

➣ To make different markup terminologies transparent

➣ Placing the focus on the meaning of the markup (not its form)

➣ To make implicit knowledge explicit

➣ To ensure that the knowledge is consistent, or at least consistently formatted


What does an Ontology consist of?

Classes: abstract groups, sets, or collections of objects
e.g. country, language

Individuals: actual things or concepts
e.g. Japan, Japanese

Attributes: properties of classes or individuals
e.g. Japan AREA 377,873 km²

Relations: relations between classes or individuals
e.g. Japanese SPOKEN-IN Japan

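These four components can be stored uniformly as (subject, relation, object) triples and queried by pattern. This is a sketch; the relation name `INSTANCE-OF` linking individuals to classes is an assumption, not taken from the slide.

```python
# The slide's example ontology as (subject, relation, object) triples.
# INSTANCE-OF is an assumed relation name linking individuals to their classes.
TRIPLES = [
    ("Japan", "INSTANCE-OF", "country"),
    ("Japanese", "INSTANCE-OF", "language"),
    ("Japan", "AREA", "377,873 km²"),
    ("Japanese", "SPOKEN-IN", "Japan"),
]

def query(subject=None, relation=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

print(query(relation="SPOKEN-IN"))
print(query(subject="Japan"))
```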

Objects can be complex

➣ relation LOCATED-IN has the attribute TRANSITIVE

ADD3 LOCATED-IN Bangkok
Bangkok LOCATED-IN Thailand
⇒ ADD3 LOCATED-IN Thailand

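Marking a relation as transitive lets a reasoner derive new facts. A minimal sketch computes the transitive closure by repeatedly chaining facts until nothing new appears (adequate for small fact sets; `ADD3` is the slide's placeholder address):

```python
FACTS = {("ADD3", "LOCATED-IN", "Bangkok"), ("Bangkok", "LOCATED-IN", "Thailand")}
TRANSITIVE = {"LOCATED-IN"}  # relations carrying the TRANSITIVE attribute

def infer(facts, transitive=TRANSITIVE):
    """Add facts implied by transitive relations until a fixed point is reached."""
    closed = set(facts)
    while True:
        new = {(a, r, d)
               for (a, r, b) in closed
               for (c, r2, d) in closed
               if r == r2 and r in transitive and b == c} - closed
        if not new:
            return closed
        closed |= new

print(sorted(infer(FACTS)))
```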

Some Examples

➣ Disease Ontology
http://diseaseontology.sourceforge.net

➣ Dublin Core
http://dublincore.org/

➣ GOLD (General Ontology for Linguistic Description)
http://www.linguistics-ontology.org/gold.html

➣ WordNet (Fellbaum, 1998)
http://wordnet.princeton.edu/


Disease Ontology

➣ A controlled medical vocabulary

➣ developed at the Bioinformatics Core Facility

➣ designed to map diseases to medical codes such as ICD9CM, SNOMED and others

➣ an early version of the Disease Ontology

➢ doubled concept coverage
➢ reduced the overall misclassification error percentage


Ontology linked to Terminology

Concept   Term      Strings
C316301   T657210   bovine spongiform encephalopathy, BSE
          T657211   mad cow disease, Mad Cow Disease, MCD
          T734567   encephalopathy spongiforme bovine, ESB
          T734566   maladie de la vache folle, MVF
          T700345   encefalopatia espongiforma bovina, EEB
          T700346   enfermedad de la vaca loca, EVL

➣ Disease Ontology V2.1 2005
➢ diseases and injuries
∗ Unspecified infectious and parasitic diseases
· poliomyelitis and other non-arthropod-borne viral diseases
+ Unspecified slow virus infection of central nervous system

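Linking terminology to the ontology means any term string, in any language, can be resolved to one concept ID. A sketch using the strings from the table above, with case-insensitive matching as an assumed normalization step:

```python
# Term strings from the table, grouped by term ID; all belong to concept C316301.
TERMS = {
    "T657210": ["bovine spongiform encephalopathy", "BSE"],
    "T657211": ["mad cow disease", "Mad Cow Disease", "MCD"],
    "T734567": ["encephalopathy spongiforme bovine", "ESB"],
    "T734566": ["maladie de la vache folle", "MVF"],
    "T700345": ["encefalopatia espongiforma bovina", "EEB"],
    "T700346": ["enfermedad de la vaca loca", "EVL"],
}
TERM_TO_CONCEPT = {term_id: "C316301" for term_id in TERMS}

# Index every string (case-folded) so any variant finds the concept.
INDEX = {s.casefold(): TERM_TO_CONCEPT[t] for t, strings in TERMS.items() for s in strings}

def concept_for(text):
    """Return the concept ID for a term string, or None if it is unknown."""
    return INDEX.get(text.casefold())

print(concept_for("Mad Cow Disease"), concept_for("enfermedad de la vaca loca"))
```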

Disease Ontology

➣ A lightweight ontology with a direct application

➣ Maintained by cooperative editing

➣ One of several linked medical ontologies

➣ Successful Application


Dublin Core

➣ Goals

➢ Provide a semantic vocabulary for describing the “core” information properties of resources (electronic and “real” physical objects)

➢ Provide enough information to enable intelligent resource discovery systems

➣ History

➢ A collaborative effort started in 1995
➢ Initiated by people from computer science, librarianship, on-line information services, abstracting and indexing, imaging and geospatial data, museum and archive control.

http://www.tutorialsonline.info/Common/DublinCore.html


Dublin Core - 15 Elements

➣ Content (7)

➢ Title, Subject, Description, Type, Source, Relation and Coverage

➣ Intellectual property (4)

➢ Creator, Publisher, Contributor, Rights

➣ Instantiation (4)

➢ Date, Language, Format, Identifier
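A Dublin Core description is essentially a set of element/value pairs drawn from these fifteen names. A minimal sketch that flags field names outside the element set; the example record describes this lecture, and its `Format` value and deliberately wrong `Author` field are illustrative assumptions:

```python
DC_ELEMENTS = {
    "content": ["Title", "Subject", "Description", "Type", "Source", "Relation", "Coverage"],
    "intellectual property": ["Creator", "Publisher", "Contributor", "Rights"],
    "instantiation": ["Date", "Language", "Format", "Identifier"],
}
ALLOWED = {e for group in DC_ELEMENTS.values() for e in group}

def validate(record):
    """Return (sorted) the field names in record that are not Dublin Core elements."""
    return sorted(k for k in record if k not in ALLOWED)

lecture = {
    "Title": "Words, Lexicons and Ontologies",
    "Creator": "Francis Bond",
    "Date": "2014",
    "Language": "en",
    "Format": "application/pdf",
    "Author": "Francis Bond",  # not a Dublin Core element: the DC name is Creator
}
print(validate(lecture))
```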


Dublin Core – discussion

➣ Widely used to catalog web data

➢ OLAC: Open Language Archives Community
➢ The MusicBrainz Project: http://www.musicbrainz.org

The MusicBrainz project is run by volunteers who are defining a metadata standard for music recordings. This metadata standard is an extension of the Dublin Core. The goal of the project is to define the metadata standard for music and to create a metadata catalog of all music recordings around the world.

➢ Australian Government Locator Service (AGLS) http://www.agls.gov.au/
AGLS was developed in late 1997 as the resource discovery metadata standard for Australian governments and was endorsed for use by all levels of government in Australia in November 1998.

➢ . . .


GOLD

➣ an upper ontology for descriptive linguistics

➣ providing a set of linguistic notions

➣ designed to link different grammatical descriptions

➣ PART-OF E-MELD ‘Electronic Metastructure for Endangered Languages Data’

➣ USED-IN ODIN http://www.csufresno.edu/odin/


Singular Number (concept)

Definition:

A value of numberFeature. Singular quantifies the denotation of the nominal element so that:

1. it specifies that there is exactly one. In the English example below, singularNumber is shown by both the noun and the verb in (1): see example (Corbett 2000: 5)

2. additionally, but not necessarily, this value may be assigned on the basis of formal properties (e.g. singularia tantum such as health / *healths).

3. if singularNumber functions as generalNumber, it may specify a lack of commitment with regard to quantification. In the Japanese (jpn) example below, ‘inu’ (dog) is not specified for number: see example


Sg: Usage note

➣ On terminology: the term ‘singulative’ is sometimes used for the concept singularNumber, especially when singularNumber is overtly expressed. ‘Singulative’ has sometimes been used for singularNumber in systems where singularNumber is distinct from generalNumber.

➣ It is worth bearing in mind that the expression of number can differ cross-linguistically according to the animacy hierarchy. See numberAssignmentSystem.

➣ A note on minimal/augmented systems (and also minimal/unit-augmented/augmented). In some languages which have an inclusive/exclusive distinction in the first person, the firstPersonInclusive may use the morphology which otherwise expresses singularNumber, even though the semantics of firstPersonInclusive entail that it cannot be singular. There is an analysis of this in which the morphology is seen as representing the minimal number associated with the particular person value. Under such a system, the label ‘minimal’ can be mapped onto the concept singularNumber, except if one is dealing with firstPersonInclusive minimal, which would be mapped onto the concept dualNumber. (Corbett 2000: 166-169) (Conklin 1962)


➣ There is an important theoretical question about whether minimal/unit-augmented/augmented should be considered separate concepts in the GOLD ontology. The main argument for this is that under such systems, the number values dual and trial are expressed only on the firstPersonInclusive by using the morphology otherwise associated with singular and dual respectively. However, as it is possible to specify a mapping from one system onto the other, we allow for a COPE to deal with this substantive issue while ensuring interoperability.


Sg: Cross references

Parent: Number Feature

Siblings: Dual Number, General Number, Greater Paucal Number, Greater Plural Number, Paucal Number, Plural Number, Trial Number


GOLD — Discussion

➣ Main emphasis on definitions and references

➣ Not yet widely used

➣ Linked to by some grammars (e.g. JACY, ERG)

➣ Ontology framework makes it easy to

➢ validate
➢ refer-to


WordNet

➣ Princeton WordNet® is a large lexical database of English.

➣ Nouns, verbs, adjectives and adverbs grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

➣ Synsets interlinked

➢ hypernym/hyponym/instance (is-a)
➢ meronym/part (has-a)
➢ domain

➣ Free license


Multilingual WordNets

➣ Over twenty languages (Polish, Serbian, Croatian, Hindi, Telugu, Tamil, Malayalam to come)

➣ Linking many different projects with a common core

➣ Created at NTU

http://casta-net.jp/ kuribayashi/multi/


Noun Relations (WordNet)

hypernyms: Y is a hypernym of X if every X is a (kind of) Y

hyponyms: Y is a hyponym of X if every Y is a (kind of) X

coordinate terms: Y is a coordinate term of X if X and Y share a hypernym

holonym: Y is a holonym of X if X is a part of Y

meronym: Y is a meronym of X if Y is a part of X
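These definitions translate directly into lookups over two links, is-a and part-of. The sketch below uses a hypothetical toy hierarchy over plain words; WordNet itself links synsets rather than words, and with the real data NLTK's `nltk.corpus.wordnet` interface offers comparable methods such as `Synset.hypernyms()` and `Synset.part_meronyms()`.

```python
# Hypothetical toy hierarchy: each word points to its direct hypernym.
HYPERNYM = {"dog": "canine", "wolf": "canine", "canine": "carnivore",
            "cat": "feline", "feline": "carnivore", "carnivore": "animal"}
# Hypothetical part-of links: part -> whole.
PART_OF = {"tail": "dog", "paw": "dog"}

def hypernyms(x):
    """All Y such that every X is a (kind of) Y, nearest first."""
    chain = []
    while x in HYPERNYM:
        x = HYPERNYM[x]
        chain.append(x)
    return chain

def hyponyms(x):
    """Direct hyponyms: words whose hypernym is x."""
    return sorted(y for y, h in HYPERNYM.items() if h == x)

def coordinate_terms(x):
    """Words sharing x's direct hypernym."""
    parent = HYPERNYM.get(x)
    return sorted(y for y, h in HYPERNYM.items() if h == parent and y != x) if parent else []

def meronyms(x):
    """Parts of x."""
    return sorted(p for p, whole in PART_OF.items() if whole == x)

def holonyms(x):
    """Wholes that x is a part of."""
    return [PART_OF[x]] if x in PART_OF else []

print(hypernyms("dog"), coordinate_terms("dog"), meronyms("dog"))
```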


Example: driver

1. (17) driver – (the operator of a motor vehicle)

2. driver – (someone who drives animals that pull a vehicle)

3. driver – (a golfer who hits the golf ball with a driver)

4. driver, device driver – ((computer science) a program that determines how a computer will communicate with a peripheral device)

5. driver, number one wood – (a golf club (a wood) with a near vertical face that is used for hitting long shots from the tee)


Verb Relations (WordNet)

hypernym: the verb Y is a hypernym of the verb X if the activity X is a (kind of) Y (travel to movement)

troponym: the verb Y is a troponym of the verb X if the activity Y is doing X in some manner (lisp to talk)

entailment: the verb Y is entailed by X if by doing X you must be doing Y (sleeping by snoring)

coordinate terms: those verbs sharing a common hypernym


Adjective and Adverb Relations (WN)

➣ Adjectives

antonymy
related nouns
similar to
participle of verb

➣ Adverbs

root adjectives


Other Relations

domain: driver#n#3 “device driver” DOMAIN computing#n#1

derivationally related form: driver#n#3 “device driver” RELATED TO drive#n#20 “cause to function by supplying the force or power for or by controlling”


Usability and Accessibility

Usability:

➣ Originally designed for psycholinguistic experiments
➣ Widely used in NLP

➢ PP attachment
➢ WSD (Senseval)

Accessibility:

➣ downloadable
➣ redistributable
➣ actively maintained


There are many wordnets

➣ Because WordNet is both usable and accessible, it has inspired the creation of wordnets in many languages.

➢ There are over 60 wordnet projects: http://globalwordnet.org/wordnets-in-the-world/

➢ Many have released open data (22 languages in 2013): http://compling.hss.ntu.edu.sg/omw/

➣ Here at NTU we are building several:

➢ Japanese Wordnet (Isahara et al., 2008) http://nlpwww.nict.go.jp/wn-ja/index.en.html

➢ Wordnet Bahasa (Nurril Hirfana et al., 2011) http://wn-msa.sourceforge.net/

➢ Chinese Open Wordnet (Wang & Bond, 2013) http://compling.hss.ntu.edu.sg/cow/


The Japanese WordNet

http://nlpwww.nict.go.jp/wn-ja/index.en.html

(Isahara et al., 2008)

SUMO

➣ Suggested Upper Merged Ontology (Niles & Pease, 2001; Pease, 2006) http://www.ontologyportal.org/

➣ IEEE-sponsored free ontology (with domain ontologies)

➢ 20,000 terms
➢ 70,000 axioms

➣ All WordNet synsets are mapped to the ontology

➣ Used as an upper ontology for many projects


Curating Knowledge


Lexicon Construction and Maintenance

➣ Hand construction

➣ Reuse of existing resources

➣ Machine readable dictionaries

➣ Corpus-based approaches


Hand-coding

➣ Someone adds all the information about a word by hand

➣ Expensive (estimates from my time at NTT)

➢ 2,000 yen/verb
➢ 200 yen/noun

➣ The traditional technique for full-blown, linguistically motivated systems.

➣ For applications, hand-coding should be corpus-driven.

➣ Techniques exist to help the grammar engineer construct the lexicon, or to (partially) allow non-expert users to build lexicons.


Reusing resources

The primary problem is building usable lexicons: if it’s usable then it’s reusable!

But there can be many problems:

➣ Lack of documentation, especially for semantics

➣ Cost/benefit: it is not worth reusing a 100-word lexicon

➣ Domain specificity

➣ Legal issues

There are many existing resources: wordlists (e.g. from software companies), COMLEX, MRDs (machine-readable dictionaries), WordNet, EDR, . . .


Machine-readable dictionaries

➣ Long history of research

➣ Limited explicit syntactic information, except in English learners’ dictionaries (OALD (Oxford), LDOCE (Longman), Cobuild, CIDE (Cambridge International Dictionary of English))

➣ Noun definitions can be analysed to derive taxonomies: inheritance hierarchies for semantic information.

➣ Less published work on verbs

➣ Aquilex, MindNet, Lexeed


Deriving semantic information

Categorise lexical entries using predefined, linguistically motivated classes, e.g. automatically derived taxonomies.

Sauternes: a type of sweet gold-coloured French wine

1. find the genus term (wine)

2. disambiguate the genus term with respect to the MRD senses: wine1

3. (possibly) refine classification using differentia
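Step 1 can be approximated for English definitions with a simple heuristic. The sketch below is a toy under stated assumptions, not the method used in the work cited in this lecture: it skips a leading "a type of"/"a kind of", stops at words that typically introduce the differentia (the stop list `BOUNDARY` is hypothetical), and takes the last word before that point as the genus. Steps 2 and 3 are not attempted.

```python
import re

# Hypothetical stop list: words that usually begin the differentia.
BOUNDARY = {"who", "that", "which", "with", "for", "used", "in", "from", "of"}
# Empty heads to look past, so "a type of X" yields X.
SKIP_PREFIXES = ("a type of", "a kind of", "a sort of")


def find_genus(definition):
    """Guess the genus term: the last word before the differentia starts."""
    text = definition.lower().strip().rstrip(".")
    for prefix in SKIP_PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    head = []
    for word in re.findall(r"[a-z-]+", text):
        if word in BOUNDARY:
            break
        head.append(word)
    return head[-1] if head else None


print(find_genus("a type of sweet gold-coloured French wine"))  # wine
print(find_genus("somebody who drives a car"))                 # somebody
```

Taking the last word of the first noun phrase works for simple English definitions; it will fail on many real ones, which is why the case study later in this lecture parses definitions instead.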


MRDs: pros and cons

➣ Advantages:

➢ dictionaries are used for manual lexicon construction
➢ broad coverage
➢ a labour-saving resource to augment a manually constructed core lexicon

➣ Disadvantages:

➢ time-consuming to process initially
➢ highly variable quality
➢ different dictionaries require different strategies
➢ combining senses is non-trivial
➢ dictionaries inadequate even for human learners (ambiguities, little frequency information, obsolete or offensive terms, poor coverage of idioms)


➢ publishers’ IPR (intellectual property rights)
➢ restricted by the printed medium

➣ Many problems partially solved by Wiktionary

➢ Collaborative Lexical Construction


Corpus-based acquisition techniques

➣ Test corpus essential for hand-coded lexicons

➣ Semi-automatic acquisition techniques:

➢ show examples in context (concordances)
➢ filter automatically built entries

➣ You can learn monolingual information (typically not 100% accurate):

➢ part of speech of unknown words
➢ syntactic class (subcategorisation frames)

➣ Bilexicon acquisition from aligned corpora

➢ learn translations
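Concordances are simple to produce. A minimal keyword-in-context (KWIC) sketch over a whitespace-tokenised sentence is shown below; the example sentence and the `width` parameter are illustrative. It lines up two different senses of "driver" side by side, which is the kind of evidence a lexicographer inspects when deciding on entries.

```python
def concordance(tokens, keyword, width=3):
    """Keyword-in-context lines as (left, keyword, right) triples."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


tokens = "the driver hit the ball with a driver from the tee".split()
for left, kw, right in concordance(tokens, "driver", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} [{kw}] {right}")
```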

More next week

Corpora: pros and cons

➣ Advantages:

➢ essential tool for human lexicographers
➢ domain-specific terminology and translations
➢ frequency information (some approaches)
➢ possibility of addressing the ambiguity problem


➣ Disadvantages:

➢ automatically processing corpora is a research challenge in itself
➢ a large-scale corpus may not exist for a particular domain or language
– parallel corpora are especially difficult
➢ For MT: large-scale results have so far only been demonstrated with one-to-one mappings; automatic extraction is unproven in full systems
➢ It is also unclear how well results transfer to different classes of text


Lexicons: Ensuring Consistency

➣ Documentation is essential

➢ Lexicons/ontologies are built by groups of people
➢ Combine documentation with examples

∗ Automatically test examples (e.g. COMLEX: Thursday)
∗ Link documentation to annotated corpora

➣ Exploit redundancy rules if there is a correlation between two different classes

➢ uncountable ⇒ singular (test it or enforce it)
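A redundancy rule can be either tested or enforced. The sketch below assumes a hypothetical entry format with `countability` and `number` fields (no particular lexicon's schema is implied): `violations` tests the rule "uncountable ⇒ singular" and reports offending entries, while `enforce` simply overwrites the dependent value.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    lemma: str
    countability: str  # "countable" or "uncountable" (hypothetical field)
    number: str        # "singular", "plural" or "both" (hypothetical field)


def violations(lexicon):
    """Test the redundancy rule uncountable => singular."""
    return [e.lemma for e in lexicon
            if e.countability == "uncountable" and e.number != "singular"]


def enforce(lexicon):
    """Enforce the rule instead: uncountable entries are made singular."""
    for e in lexicon:
        if e.countability == "uncountable":
            e.number = "singular"


lexicon = [
    Entry("furniture", "uncountable", "singular"),
    Entry("information", "uncountable", "both"),  # inconsistent entry
    Entry("dog", "countable", "both"),
]
print(violations(lexicon))  # ['information']
enforce(lexicon)
print(violations(lexicon))  # []
```

Testing is safer when the rule might have genuine exceptions; enforcing is cheaper when it is known to be exceptionless.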


➣ Allow different views

➢ All words with a certain property or properties
∗ POS
∗ Semantic Class
∗ Countability

➣ Add words class-by-class so that the meta-data is constant (POS, syntactic/semantic class)

➢ add all soccer players’ names (Diego, Best, . . . )
➢ add all quotative verbs (say, tell, think, know, . . . )
➢ add all time expressions (today, this morning, yesterday morning, last night, tonight, tomorrow night, . . . )
➢ add all classifiers (匹, 人, 台, 個, 枚, 本, . . . )
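Both ideas, views by property and class-by-class addition, fit in a few lines. This is a minimal sketch assuming entries are plain dictionaries with hypothetical `pos` and `sem` keys: `view` returns every lemma matching all requested properties, and `add_class` gives every word in a batch the same meta-data.

```python
# Illustrative entries; the attribute names are assumptions.
LEXICON = [
    {"lemma": "say", "pos": "verb", "sem": "quotative"},
    {"lemma": "think", "pos": "verb", "sem": "quotative"},
    {"lemma": "run", "pos": "verb", "sem": "motion"},
    {"lemma": "today", "pos": "noun", "sem": "time"},
]


def view(lexicon, **props):
    """All lemmas whose entries match every given property."""
    return [e["lemma"] for e in lexicon
            if all(e.get(k) == v for k, v in props.items())]


def add_class(lexicon, lemmas, **meta):
    """Add a whole class at once, so every new entry gets identical meta-data."""
    for lemma in lemmas:
        lexicon.append({"lemma": lemma, **meta})


add_class(LEXICON, ["tell", "know"], pos="verb", sem="quotative")
print(view(LEXICON, pos="verb", sem="quotative"))  # ['say', 'think', 'tell', 'know']
```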


Some very concrete Examples

➣ Redefining the dictionary (by Erin McKean; TED Talk 2007) (http://blog.ted.com/2007/08/30/redefining_the/)

➣ Building a simple NLP Lexicon from an MRD

➣ Building a new bilingual dictionary


Case Study: NLP Lexicon from MRD

➣ Build an ontology of relations between word senses

➢ Information comes from machine readable dictionary

driver2: somebody who drives a car

➣ Extract genus term by parsing the definition sentence

➢ headword w_h ⊂ genus term w_g

〈HYPERNYM, somebody, driver2〉

➣ Evaluate by comparing to Goi-Taikei

Nichols et al. (2005)

A Sample Entry: Driver 1

Index ドライバー doraiba-

POS noun

Familiarity 6.5 [1–7]

Sense 1

Lexical-type noun-lex

Definition

S1 ねじ/まわし/。

screw turn (screwdriver)

S1′ ねじ/を/差し入れ/たり/、/抜き取っ/た/する/道具/。

a tool for inserting and removing screws .

Hypernym 道具1 equipment “tool”

Sem. Class 〈942:tool〉 (⊂ 893:equipment)


A Sample Entry: Driver 2,3

Sense 2

Definition

S1 自動車/を/運転/する/人/。

Someone who drives a car

Hypernym 人1 hito “person”

Sem. Class 〈292:driver〉 (⊂ 5:person)

Sense 3

Definition

S1 ゴルフ/で/、/遠/距離/用/の/クラブ/。

In golf, a long-distance club.

S2 一番/ウッド/。/

A number one wood .

Hypernym クラブ2 kurabu “club”

Sem. Class (〈921:leisure equipment〉 (⊂ 921))

Domain ゴルフ1 gorufu “golf”


Parse Results for Driver 2 (MRS)

〈h, x1, {h: prpstn_rel(h1), h1: hito(x1), h2: jidosha(x2), h3: unten(u1, x1, x2)}〉

〈h, x1, {h: prpstn_rel(h0), h0: person(x1), h1: some(x1, h0, h4), h2: car(x2), h3: drive(u1, x1, x2)}〉

「自動車を運転する人」 somebody who drives a car
Minimal Recursion Semantics (simplified)

➣ Generally language independent

➣ Genus term is normally the highest scoping word (x1): doraiba2 ⊂ hito(x1) or driver2 ⊂ person(x1)
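Picking the genus from a simplified MRS can be sketched as follows. The sketch assumes each elementary predication is a `(label, predicate, arguments)` tuple and that the genus is the non-quantifier predicate whose first argument is the index variable x1; the `QUANTIFIERS` set is hypothetical, and a real system would consult the grammar instead.

```python
# Hypothetical list of quantifier predicates to skip.
QUANTIFIERS = {"some", "the", "a"}


def genus_from_mrs(index, eps):
    """Return the predicate whose characteristic (first) argument is `index`."""
    for _label, pred, args in eps:
        if args and args[0] == index and pred not in QUANTIFIERS:
            return pred
    return None


def hypernym_triple(headword, index, eps):
    """Build a <HYPERNYM, genus, headword> triple, as on the earlier slide."""
    genus = genus_from_mrs(index, eps)
    return ("HYPERNYM", genus, headword) if genus else None


ja = [("h", "prpstn_rel", ["h1"]), ("h1", "hito", ["x1"]),
      ("h2", "jidosha", ["x2"]), ("h3", "unten", ["u1", "x1", "x2"])]
en = [("h", "prpstn_rel", ["h0"]), ("h0", "person", ["x1"]),
      ("h1", "some", ["x1", "h0", "h4"]), ("h2", "car", ["x2"]),
      ("h3", "drive", ["u1", "x1", "x2"])]

print(hypernym_triple("doraiba2", "x1", ja))  # ('HYPERNYM', 'hito', 'doraiba2')
print(hypernym_triple("driver2", "x1", en))   # ('HYPERNYM', 'person', 'driver2')
```

Because the same selection works on both the Japanese and English structures, the sketch also illustrates why the MRS approach is largely language independent.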


Extracting more from the MRS

ア: アルプス 、 または 日本アルプス の 略
a: arupusu , matawa nihon-arupusu no ryaku
a: alps , or japan alps ADN abbreviation

a: an abbreviation for the Alps or the Japanese Alps

➣ Sometimes the highest scoping word is an explicit relation, e.g. 〈abbreviation, kind, name, general term〉

➣ Sometimes there is coordination

➢ 〈ABBREVIATION, ア “a”, アルプス “Alps”〉
➢ 〈ABBREVIATION, ア “a”, 日本アルプス “Japanese alps”〉
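The two cases above can be sketched together. This minimal example assumes the definition has already been parsed into its highest scoping word (`head`) and the coordinated words it points to (`targets`); the relational heads come from the slide, and the triple orders follow the slides' own examples.

```python
# Relational heads listed on the slide; anything else is treated as a genus.
RELATIONAL = {"abbreviation", "kind", "name", "general term"}


def extract_relations(headword, head, targets=()):
    """Turn a parsed definition head into triples.

    `targets` are the (possibly coordinated) words a relational head
    points to, e.g. both conjuncts of "the Alps or the Japanese Alps".
    """
    if head in RELATIONAL:
        rel = head.upper().replace(" ", "_")
        # One triple per conjunct, so coordination yields several links.
        return [(rel, headword, t) for t in targets]
    return [("HYPERNYM", head, headword)]


print(extract_relations("ア", "abbreviation", ["アルプス", "日本アルプス"]))
print(extract_relations("driver2", "somebody"))
```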


Example of class condiment

トマトケチャップ1 tomato ketchup
ホワイトソース1 white sauce
ミートソース1 meat sauce
ソース2 sauce
トマトソース1 tomato sauce
ケチャップ1 ketchup
調味料1 condiment
塩1 salt
カレー粉1 curry powder
カレー1 curry
香辛料1 spice
スパイス1 spice

You can only extract what is in the dictionary definitions


Case Study: Transfer Lexicons

➣ I (or my system) speak language S

➣ I (or my system) want to understand language T

Q: What do I do if I have no S ⇔ T lexicon?

A: Look up S ⇔ I ⇔ T

➢ How can I do this accurately, especially if I don’t understand T?

Bond & Ogura (2007)

Example

S: 印 in → I: mark, seal, stamp → T: mohor, tera, anjing laut

Figure 1: Matching through I


Our Specific Problem

➣ Make a Japanese → Malay bilingual lexicon by crossing J → E and M → E

➢ Largest existing lexicons ≈ 7,000 words

➣ The resulting lexicon will be used by a Ja-Ms MT system. The lexicon should have:

➢ Appropriate Translation Equivalents
∗ Especially the first one

➢ Parts of Speech
➢ Semantic Classes (Semantic Transfer System)


Two Kinds of Sense Distinctions

➣ Homonyms (Must disambiguate)

➢ Clearly different meanings (different Semantic Classes): seal ⇔ あざらし azarashi 〈animal〉 vs seal ⇔ 印 in 〈tool〉

➢ Distinguish using semantic classes

➣ Variants (near synonyms) (Should disambiguate)

➢ Finer grained differences (same Semantic Classes): 鳩 hato → doves or pigeons

➢ Distinguish using domains, collocations, n-grams, . . .
➢ As a fall-back, use ranked preferences: 鳩 hato → (1) pigeon; (2) dove


Shared Translations

S: 印 in → I: mark, seal, stamp, imprint, gauge → T: mohor, tera, anjing laut

Shared translation scores:
mohor 0.4 = 2/(3+2)
tera 0.286 = 2/(3+4)
anjing laut 0.25 = 1/(3+1)


Our Approach

➣ Lexicons

➢ Japanese-English Lexicon (Goi-Taikei)
➢ Malay-English-Chinese Lexicon (KAMI)
➢ Japanese-Chinese Lexicon (Ri-Zhong Cidian)

➣ Scoring

➢ Syntactic matching (POS)
➢ Shared Translations
➢ Semantic matching (Semantic Classes)
➢ Second-language matching (Chinese)

➣ Finally hand checking


Japanese-English lexicon

➣ 380,000 Japanese-English word pairs

➣ 3,000 semantic categories (human, inanimate, etc.)

➢ 2,710 common-noun classes
➢ 200 proper-noun classes
➢ 108 verb, event, and state classes

➣ Subcat frames and selectional restrictions for 15,000 verb senses

➣ Available as a book (five volumes) or CD-ROM

➢ Goi-Taikei — A Japanese Lexicon


Semantic Transfer Dictionary

Japanese あざらし (azarashi)

English seal

POS noun

Sem Classes 〈animal〉

Rank 1

➣ In the noun dictionary:

➢ 63,926 Japanese index words
➢ 71,818 Japanese-English pairs
➢ 49,205 different English entries
∗ 90% with 1 translation, 6.5% with 2, 2% with 3
➢ average number of translations is 1.12


The Malay-English lexicon: KAMI

Malay anjing laut
POS 〈noun〉
Classifier ekor (27%)
Sem Classes 〈animal〉 (30%)
English seal
Chinese 海豹 (hai3 bao4) (25%)

➢ 67,658 Malay index words
➢ 91,426 Malay-English word pairs
∗ 79% with 1 translation, 14% with 2, 4% with 3
➢ average number of translations is 1.35


Adding Semantic Classes to KAMI

1. Original syntactic-semantic codes → Goi-Taikei semantic classes (10,000), e.g. noun.city → city

2. CICC Indonesian dictionary classes (found for 14,784 entries) mapped to Goi-Taikei semantic classes (hand-made partial mapping)


3. Malay numeral classifiers (found for 18,303 nouns) mapped onto Goi-Taikei semantic classes (hand-made partial mapping), e.g. ekor → animal; orang → human

4. Known word lists (ISO 639 languages; ISO 4217 currencies), e.g. ISO 4217 entry → currency

5. Manual addition
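Steps 3 and 4 amount to table lookups. The sketch below is illustrative only: the classifier mapping holds just the two examples from the slide, `ISO_4217_NAMES` is a tiny hypothetical stand-in for a real currency list, and the entry layout (`english`, `classifier`, `classes`) is assumed. Any classes already assigned by earlier steps are kept.

```python
# Hand-made partial mapping from numeral classifiers to semantic classes
# (only the two examples from the slide).
CLASSIFIER_TO_CLASS = {"ekor": "animal", "orang": "human"}
# Tiny hypothetical stand-in for an ISO 4217 currency-name list.
ISO_4217_NAMES = {"ringgit", "yen", "rupiah"}


def add_classes(entry):
    """Return the entry's semantic classes after applying steps 3 and 4."""
    classes = set(entry.get("classes", ()))  # classes from earlier steps
    clf = entry.get("classifier")
    if clf in CLASSIFIER_TO_CLASS:
        classes.add(CLASSIFIER_TO_CLASS[clf])
    if entry["english"] in ISO_4217_NAMES:
        classes.add("currency")
    return sorted(classes)


print(add_classes({"malay": "anjing laut", "english": "seal", "classifier": "ekor"}))
# ['animal']
```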


Ja-Cn lexicon: Ri-Zhong Cidian

➣ Example:

Japanese あざらし (azarashi)

Japanese Kanji 海豹

Chinese Hanzi 海豹

Pronunciation hai bao

➢ 83,000 Japanese-Chinese word pairs


Crossing

➣ For each pair in the Japanese-English lexicon

➢ Look up the Malay equivalent of the English; if an entry with the same coarse POS exists:
∗ create a Japanese-Malay pair
∗ store the intermediate English
∗ calculate scores
· shared translations
· semantic matching
· second-language matching

➢ else mark the Japanese-English pair

➣ For each Japanese index: rank Ja-Ms then Ja-En
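The crossing loop can be sketched as below. The lexicon format (lists of `(word, english, coarse_pos)` triples) and the toy entries are assumptions for illustration. `rank` is only a stand-in that orders candidates by how many English words they share; the scoring described on the next slides would replace it.

```python
from collections import defaultdict


def cross(ja_en, ms_en):
    """Cross two bilingual lexicons through English.

    Returns (ja_ms, unmatched): ja_ms[ja][ms] is the set of intermediate
    English words; unmatched lists Japanese-English pairs with no Malay match.
    """
    ms_by_en = defaultdict(list)
    for ms, en, pos in ms_en:
        ms_by_en[(en, pos)].append(ms)

    ja_ms = defaultdict(lambda: defaultdict(set))
    unmatched = []
    for ja, en, pos in ja_en:
        malay = ms_by_en.get((en, pos))  # same English and same coarse POS
        if malay:
            for ms in malay:
                ja_ms[ja][ms].add(en)  # store the intermediate English
        else:
            unmatched.append((ja, en))  # mark the Japanese-English pair
    return ja_ms, unmatched


def rank(candidates):
    """Stand-in ranking: more shared English first, then alphabetical."""
    return sorted(candidates, key=lambda ms: (-len(candidates[ms]), ms))


ja_en = [("印", "mark", "n"), ("印", "seal", "n"), ("印", "stamp", "n"),
         ("印", "sign", "n")]
ms_en = [("mohor", "seal", "n"), ("mohor", "stamp", "n"),
         ("tera", "stamp", "n"), ("tera", "mark", "n"),
         ("anjing laut", "seal", "n"), ("tanda", "mark", "v")]
ja_ms, unmatched = cross(ja_en, ms_en)
print(rank(ja_ms["印"]))  # ['mohor', 'tera', 'anjing laut']
print(unmatched)          # [('印', 'sign')]
```

Note that the verb entry for "tanda" is never linked, because its coarse POS differs from the Japanese noun.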


Example

S: 印 in → I: mark, seal, stamp, imprint, gauge → T: mohor, tera, anjing laut, with the semantic classes stationery and tool also used for matching

Figure 2: Matching through I and Sem


Calculating the Scores

➣ Shared translations for pair J and M, where E(W) is the set of English translations of W:

$$\text{shared translation score} = \frac{|E(J) \cap E(M)|}{|E(J)| + |E(M)|} \qquad (1)$$

➣ Semantic matching score for word pair J and M: the number of semantic classes of J which subsume or are subsumed by a semantic class of M

➣ Total score = 10× semantic score + shared translation score − rank
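The three formulas can be checked with a short sketch. Several parts are assumptions: the class hierarchy is a toy (not Goi-Taikei), the English translation sets are chosen so that equation (1) reproduces the scores in the shared-translations example earlier (0.4, 0.286, 0.25), and `rank` is taken to be an integer supplied by the caller.

```python
# Toy class hierarchy: child -> parent (illustrative, not Goi-Taikei).
PARENT = {"tool": "equipment", "stationery": "equipment", "animal": "creature"}


def ancestors(c):
    """The class itself plus every class above it."""
    out = {c}
    while c in PARENT:
        c = PARENT[c]
        out.add(c)
    return out


def shared_translation_score(e_j, e_m):
    """Equation (1): |E(J) & E(M)| / (|E(J)| + |E(M)|)."""
    return len(e_j & e_m) / (len(e_j) + len(e_m))


def semantic_score(sem_j, sem_m):
    """Classes of J that subsume, or are subsumed by, some class of M."""
    return sum(1 for cj in sem_j
               if any(cj in ancestors(cm) or cm in ancestors(cj) for cm in sem_m))


def total_score(e_j, e_m, sem_j, sem_m, rank):
    return 10 * semantic_score(sem_j, sem_m) + shared_translation_score(e_j, e_m) - rank


E_IN = {"mark", "seal", "stamp"}  # assumed English translations of 印 in
print(round(shared_translation_score(E_IN, {"seal", "stamp"}), 3))  # 0.4
print(round(shared_translation_score(E_IN, {"mark", "stamp", "imprint", "gauge"}), 3))  # 0.286
print(shared_translation_score(E_IN, {"seal"}))  # 0.25
```

With a weight of 10, a single semantic match outweighs any shared translation score, which can never exceed 0.5.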


Results

➣ Crossed the Japanese-English common nouns with the Malay-English nouns

➢ 22,658 out of 63,926 Japanese words linked
➢ 16,974 out of 67,658 Malay words linked
➢ 75,872 Japanese-Malay pairs
➢ Average number of translations was 3.4

➣ Tested 65 randomly selected linked Japanese index words (232 translations)


Results (2)

[Figure: judgements of the crossed translations, % of translations]

Judgement   All Pairs   First Ranked Pair
Good        40.1        46.2
OK          25          33.8
Error       12.1        9.2
Bad         22.8        10.8


Matching through two languages

[Figure: Japanese 印 in linked to Malay (mohor, tera, anjing laut) through both English (mark, seal, stamp, imprint, gauge) and Chinese (印章)]

Figure 3: Matching through two languages


Results (two pivots)

➣ 5,238 pairs matched both English and Chinese

➢ 97% good translations
➢ 8.1% of the original Japanese index words
➢ 1 in 4 matched Japanese index words

➣ High Precision/Low Recall
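Matching through two pivots is essentially a set intersection, which also explains the trade-off: a candidate must survive both paths, so precision rises while recall falls. The dictionaries below are toy examples and their links are illustrative assumptions; in the setting described here the Chinese-Malay side would come from KAMI.

```python
def candidates_via(src_to_pivot, pivot_to_tgt, word):
    """Target words reachable from `word` through one pivot language."""
    out = set()
    for p in src_to_pivot.get(word, ()):
        out |= pivot_to_tgt.get(p, set())
    return out


def two_pivot_matches(word, ja_en, en_ms, ja_zh, zh_ms):
    """Keep only Malay candidates reachable through BOTH English and Chinese."""
    return (candidates_via(ja_en, en_ms, word)
            & candidates_via(ja_zh, zh_ms, word))


# Toy dictionaries (illustrative links only).
ja_en = {"印": {"mark", "seal", "stamp"}}
en_ms = {"mark": {"tera"}, "seal": {"mohor", "anjing laut"},
         "stamp": {"mohor", "tera"}}
ja_zh = {"印": {"印章"}}
zh_ms = {"印章": {"mohor"}}

print(candidates_via(ja_en, en_ms, "印"))  # three candidates via English alone
print(two_pivot_matches("印", ja_en, en_ms, ja_zh, zh_ms))  # {'mohor'}
```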


Further Work

➣ Use a British/American spelling filter

➢ Japanese-English dictionary uses American spelling
➢ Malay-English dictionary uses British spelling
➢ armor/armour don’t match

➣ Lemmatize more before matching

➢ especially singular/plural

➣ Use an English thesaurus to increase matches (Sanfilippo and Steinberger, 1997)
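The first two suggestions amount to normalising the pivot English before matching. The sketch below is deliberately naive and fully hypothetical: `BRIT_TO_AMER` holds only a few example spellings, and plural stripping just drops a final "s", which a real lemmatizer would do far better.

```python
# Tiny hypothetical British -> American table; a real filter needs a full list.
BRIT_TO_AMER = {"armour": "armor", "colour": "color", "centre": "center"}


def normalise(word):
    """Lower-case, naively singularise, then map British to American spelling."""
    w = word.lower()
    # Naive plural stripping (assumption): drop a final "s" unless "ss".
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    return BRIT_TO_AMER.get(w, w)


ja_side = {"armor", "seal"}     # English as found in the Japanese-English lexicon
ms_side = {"armours", "seals"}  # English as found in the Malay-English lexicon
print(ja_side & ms_side)        # set()
print({normalise(w) for w in ja_side} & {normalise(w) for w in ms_side})
```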


Conclusions - building a bilingual lexicon

➣ Number of shared translations tells you something

➣ Semantic classes are even more useful in linking bilingual dictionaries

➢ word pairs with matching semantic classes are better translations

➣ Using two (or more) pivot languages gives even higher accuracy

➣ More information gives higher precision

➢ Link through a pivot language (≈ 65% precision)
➢ Add in semantic links (≈ 80% precision)
➢ Link through two pivot languages (≈ 97% precision)


The Secret of Lexical Acquisition

➣ For a given unknown word w_u, find the most similar known word w_k and describe it in the same way

➣ Similarity can be

➢ Distributional
➢ Translation Equivalence
➢ Semantic Class
➢ Burstiness (appears on the same date)
➢ Sub-morpheme (same character)
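For the distributional case, the idea can be sketched with cosine similarity over context counts. The vectors, descriptions and field names below are made-up assumptions; only the principle (copy the description of the nearest known word) comes from the slide, and other similarity measures from the list could be plugged in instead of `cosine`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse context-count vectors (dicts)."""
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def describe_unknown(unknown_vec, known):
    """known maps word -> (context vector, lexical description).
    Copy the description of the most similar known word."""
    best = max(known, key=lambda w: cosine(unknown_vec, known[w][0]))
    return best, dict(known[best][1])


known = {
    "car": ({"drive": 5, "park": 3, "wheel": 2}, {"pos": "noun", "sem": "vehicle"}),
    "eat": ({"food": 4, "lunch": 3}, {"pos": "verb", "sem": "consume"}),
}
print(describe_unknown({"drive": 2, "park": 1}, known))
# ('car', {'pos': 'noun', 'sem': 'vehicle'})
```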


How to build Resources?

➣ Bootstrap ontologies from MRDs

1. Parse definitions to find the genus
2. Take it as hypernym or parse further if it is relational

abbreviation [of x], nickname [for x], kind [of x], polite form [of x], . . .

➣ Bootstrap bilingual dictionaries from other bilingual dictionaries

➢ Link through a pivot language (≈ 65% precision)
➢ Add in semantic links (≈ 80% precision)
➢ Link through two pivot languages (≈ 97% precision)

➣ Find people to build it

➢ Wiktionary, lexicographers, fans, . . .

bootstrap – help oneself, often through improvised means

References

Bond, Francis & Kentaro Ogura. 2007. Combining linguistic resources to create a machine-tractable Japanese-Malay dictionary. Language Resources and Evaluation 42(2). 127–136. URL http://dx.doi.org/10.1007/s10579-007-9038-4. (Special issue on Asian language technology).

Copestake, Ann. 1992. The representation of lexical semantic information. Brighton: University of Sussex dissertation.

Fellbaum, Christine (ed.). 1998. WordNet: An electronic lexical database. MIT Press.

Gruber, Thomas R. 1993. A translation approach to portable ontology specifications. Knowledge Acquisition 5(2). 199–220.

Isahara, Hitoshi, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama & Kyoko Kanzaki. 2008. Development of the Japanese WordNet. In Sixth international conference on language resources and evaluation (LREC 2008), Marrakech.

Mohamed Noor, Nurril Hirfana, Suerya Sapuan & Francis Bond. 2011. Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia conference on language, information and computation (PACLIC 25), 258–267. Singapore.

Nichols, Eric, Francis Bond & Daniel Flickinger. 2005. Robust ontology acquisition from machine-readable dictionaries. In Proceedings of the international joint conference on artificial intelligence IJCAI-2005, 1111–1116. Edinburgh.

Niles, Ian & Adam Pease. 2001. Towards a standard upper ontology. In Chris Welty & Barry Smith (eds.), Proceedings of the 2nd international conference on formal ontology in information systems (FOIS-2001), Maine.

Pease, Adam. 2006. Formal representation of concepts: The suggested upper merged ontology and its use in linguistics. In Andrea C. Schalley & D. Zaefferer (eds.), Ontolinguistics. How ontological status shapes the linguistic coding of concepts, Mouton de Gruyter. URL http://www.adampease.org/Articulate/publications/Ontolinguist%ics04.pdf.

Wang, Shan & Francis Bond. 2013. Building the Chinese Open Wordnet (COW): Starting from core synsets. In Proceedings of the 11th workshop on Asian language resources, a workshop at IJCNLP-2013, 10–18. Nagoya.