Distributional Semantics
Lecture "Computerlinguistische Techniken" (Computational Linguistics Techniques)
Alexander Koller
January 12, 2016
World Knowledge and Word Knowledge
• Semantic inferences require formalized knowledge about the world and about word meanings.
Which genetically caused connective tissue disorder has severe symptoms and complications regarding the aorta and skeletal features, and, very characteristically, ophthalmologic subluxation?
Marfan's is caused by a defect in the gene that determines the structure of fibrillin-1. One of the symptoms is displacement of one or both of the eyes' lenses. The most serious complications affect the cardiovascular system, especially the heart valves and the aorta.
The Knowledge Bottleneck
• The importance of formalized knowledge for CL applications has been accepted for decades. ‣ e.g. Bar-Hillel 1960: how to translate "the box is in the pen"?
• Broad formalization is impractical. ‣ even so, e.g. Cyc: several million facts
‣ world knowledge is extremely extensive
‣ is predicate logic even a suitable formalism?
• Current perspective: formalize lexical knowledge, by hand or automatically.
Query Expansion
searched for this
found that
Lexical Semantics
He's not pining! He's passed on! This parrot is no more! He has ceased to be! He's expired and gone to meet his maker! He's a stiff! Bereft of life, he rests in peace! His metabolic processes are now history! He's off the twig! He's kicked the bucket, he's shuffled off his mortal coil, run down the curtain and joined the bleedin' choir invisible!! THIS IS AN EX-PARROT!!
Relations between the meanings of words: e.g. synonymy
Semantic Relations
• Lexical semantics describes possible semantic relations between words: ‣ Synonymy: the words mean the same thing.
Apfelsine/Orange; Bildschirm/Monitor; etc.
‣ Hyponymy: one word is a more general term for the other. Auto/Fahrzeug (car/vehicle); Blume/Pflanze (flower/plant); etc.
‣ Antonymy: the words describe opposites. gewinnen/verlieren (win/lose); heiß/kalt (hot/cold); etc.
WordNet

entity
  physical object
    artifact
      structure
        building complex
          plant#1, works, industrial plant
    living thing
      organism
        plant#2, flora, plant life

Edge = hyponymy; same node = synonymy. http://wordnet.princeton.edu/
Lexical Ambiguities
• Polysemy: a word has two different meanings that are related to each other. ‣ Schule #1: institution in which pupils learn
‣ Schule #2: building in which Schule #1 operates
• Homonymy: a word has two different meanings that are not related. ‣ Bank #1: financial institution
‣ Bank #2: bench for sitting on
Word sense disambiguation
• Word sense disambiguation is the problem of tagging each word token with its word sense.
• The accuracy of WSD depends on the sense inventory. State of the art: 90% on coarse-grained senses.
• Typical techniques perform supervised training on smaller data sets and extend the model with unsupervised methods.
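A minimal illustration of the WSD task itself is the simplified Lesk heuristic: pick the sense whose dictionary gloss overlaps most with the context. This is only a sketch of the problem setting, not of the supervised systems mentioned above; the sense ids and glosses below are invented toy data.

```python
# Simplified Lesk: choose the sense whose gloss shares the most
# words with the context of the ambiguous token.
# Toy glosses for the two homonymous senses of German "Bank".
SENSES = {
    "Bank#1": "financial institution that accepts deposits and makes loans",
    "Bank#2": "long seat for several people in a park",
}

def lesk(context_words, senses=SENSES):
    """Return the sense id whose gloss has the largest word overlap
    with the context."""
    context = {w.lower() for w in context_words}
    return max(senses, key=lambda s: len(context & set(senses[s].split())))

print(lesk("she sat on the Bank in the park".split()))           # Bank#2 (shares "in", "park")
print(lesk("the Bank approved the loans and deposits".split()))  # Bank#1
```

Real systems replace the raw gloss overlap with learned features, which is where the supervised training comes in.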
Problem
• Hand-built thesauri are far too small. ‣ English WordNet: 117,000 synsets
‣ GermaNet: 85,000 synsets
• The number of English words in the English Google n-gram corpus is > 1 million.
• So this does not solve the query-expansion problem.
• Can we learn semantic relations automatically?
Experiment 1
Doc 1: Guantanamo USA Verschlüsselung Yahoo Enthüllungen rechtsstaatliche → spiegel.de on PRISM
Doc 2: Raumschiff Macht Imperator Todesstern Vater → Wikipedia on "Star Wars"
Doc 3: kontext-freie Algorithmus dynamische Tabelle Chomsky-Normalform → Wikipedia on the CKY parser
Doc 4: Erntebemühungen Anbaufläche Sie Gurken Pflänzchen Zentimeter → www.gartenbau.org
(after slides by Katrin Erk)
Experiment 2
• What is "bardiwac"? In the corpus you find: ‣ He handed her a glass of bardiwac.
‣ Nigel staggered to his feet, face flushed from too much bardiwac.
‣ Malbec, one of the lesser-known bardiwac grapes, responds well to Australia's sunshine.
‣ The drinks were delicious: blood-red bardiwac as well as light, sweet Rhenish.
→ Bardiwac is a red wine.
(Stefan Evert, tutorial at NAACL 2010)
Distributional Semantics
• An approach for learning the semantic similarity of words from unannotated data. ‣ similarity as an approximation of synonymy
‣ the lexicon can automatically grow arbitrarily large
• Meaning of a word ≈ distribution of the other words that occur together with it.
• Basic idea from the 1950s (Harris 1951): "You shall know a word by the company it keeps." (the quote is from Firth)
Co-occurrence
• What does it mean for two words to "occur together"?
• Simplest approach: count in a corpus how often word w1 occurs in a k-word window around word w2.
see who can grow the biggest flower. Can we buy some fibre, please
Abu Dhabi grow like a hot-house flower, but decided themselves to follow the
as a physical level. The Bach Flower Remedies are prepared from non-poisonous wild
a seed from which a strong tree will grow. This is the finest
(k = 6, British National Corpus)
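The k-word-window count can be sketched in a few lines of Python. This assumes a symmetric window of k words on each side of the target, which the slide leaves open, and uses a one-sentence toy corpus:

```python
from collections import Counter

def cooccurrences(tokens, k=6):
    """Count how often each word w1 occurs within a k-word window
    around each occurrence of each word w2."""
    counts = Counter()
    for i, w2 in enumerate(tokens):
        # k tokens to the left and k tokens to the right of position i
        window = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
        for w1 in window:
            counts[(w2, w1)] += 1
    return counts

corpus = "see who can grow the biggest flower".split()
counts = cooccurrences(corpus, k=6)
print(counts[("flower", "grow")])  # 1: "grow" lies within 6 words of "flower"
```

Run over a large corpus, these counts fill exactly the kind of matrix shown on the next slide.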
Co-occurrence

             factory  flower  tree  plant  water  fork
grow              15     147   330    517    106     3
garden             5     200   198    316    118    17
worker           279       0     5     84     18     0
production       102       6     9    130     28     0
wild               3     216    35     96     30     0

Figure 108.4: Some co-occurrence vectors from the British National Corpus.

[Figure 108.5: Graphical illustration of co-occurrence vectors (factory, flower, tree, plant).]

through counts of context words occurring in the neighborhood of target word instances. Take, as in the WSD example above, the n (e.g., 2000) most frequent content words in a corpus as the set of relevant context words; then count, for each word w, how often each of these context words occurred in a context window of n before or after each occurrence of w. Fig. 108.4 shows the co-occurrence counts for a number of target words (columns), and a selection of context words (rows) obtained from a 10% portion of the British National Corpus (Clear 1993).

The resulting frequency pattern encodes information about the meaning of w. According to the Distributional Hypothesis, we can model the semantic similarity between two words by computing the similarity between their co-occurrences with the context words. In the example of Fig. 108.4, the target flower co-occurs frequently with the context words grow and garden, and infrequently with production and worker. The target word tree has a similar distribution, but the target factory shows the opposite co-occurrence pattern with these four context words. This is evidence that trees and flowers are more similar to each other than to factories.

Technically, we represent each word w as a vector in a high-dimensional…
Co-occurrence matrix for the BNC, from Koller & Pinkal 2012 (rows: context words)
Vector Space Model
Vectors in a high-dimensional vector space: one dimension per context word (here: 6 dimensions).
The picture is simplified to 2 dimensions and is only schematic.
Similarity
• From the vector space model we can now derive the similarity between words.
• 1st attempt: similar = the Euclidean distance is small
dist(\vec{v}, \vec{w}) = \sqrt{\sum_{i=1}^{n} (v_i - w_i)^2}

→ not particularly useful
Cosine Similarity
• 2nd attempt: similar = the angle is small. ‣ ignores the length of the vectors = absolute word frequencies
(that is good)
‣ context words occur proportionally about equally often
• Easy to compute is the cosine of the angle: ‣ cos = 1 means angle = 0°, i.e. very similar
‣ cos = 0 means angle = 90°, i.e. very dissimilar
\cos(\vec{v}, \vec{w}) = \frac{\sum_{i=1}^{n} v_i \cdot w_i}{\sqrt{\sum_{i=1}^{n} v_i^2} \cdot \sqrt{\sum_{i=1}^{n} w_i^2}}
cos(tree, flower) = 0.75, i.e. 40° cos(tree, factory) = 0.05, i.e. 85°
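These values can be checked directly against the counts in the BNC table, reading each target word column-wise as a vector over the five context words grow, garden, worker, production, wild. Small differences against the slide's figures are expected, since the slide presumably uses more context dimensions:

```python
from math import sqrt

# Column vectors from the BNC co-occurrence table
# (dimensions: grow, garden, worker, production, wild).
flower  = [147, 200, 0, 6, 216]
tree    = [330, 198, 5, 9, 35]
factory = [15, 5, 279, 102, 3]

def cosine(v, w):
    dot = sum(vi * wi for vi, wi in zip(v, w))
    return dot / (sqrt(sum(vi * vi for vi in v)) * sqrt(sum(wi * wi for wi in w)))

print(round(cosine(tree, flower), 2))   # 0.75 -> angle of about 41 degrees
print(round(cosine(tree, factory), 2))  # 0.07 -> nearly orthogonal
```

As expected, tree is much closer to flower than to factory.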
What Have We Achieved?
• A measure of semantic similarity ‣ compute the co-occurrence matrix for all word pairs from unannotated
text.
‣ on this basis, a similarity measure, e.g. cosine.
‣ easy to compute for arbitrarily large amounts of text.
• Possible extensions: ‣ more complex features and feature weights
‣ dimensionality reduction
‣ compositionality
Uninformative Dimensions
• Not all context words are equally informative. ‣ co-occurrence with "grow" vs. with "the"
• Simplest approach: list certain frequent words by hand and ignore them when computing similarity. ‣ in information retrieval, such words are called
"stop words".
• More generally: learn the weighting of dimensions automatically.
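One common automatic weighting is positive pointwise mutual information (PPMI), which down-weights dimensions for frequent, uninformative context words like "the". This is a sketch of that general idea, not of a specific scheme endorsed by the slide; the counts are invented:

```python
from math import log2

def ppmi(counts):
    """Positive PMI reweighting of a co-occurrence matrix given as
    {(target, context): count}.  Frequent context words such as "the"
    get low weights because their high marginal count cancels out."""
    total = sum(counts.values())
    t_marg, c_marg = {}, {}
    for (t, c), n in counts.items():
        t_marg[t] = t_marg.get(t, 0) + n
        c_marg[c] = c_marg.get(c, 0) + n
    weights = {}
    for (t, c), n in counts.items():
        pmi = log2((n / total) / ((t_marg[t] / total) * (c_marg[c] / total)))
        weights[(t, c)] = max(0.0, pmi)   # clip negative PMI to zero
    return weights

counts = {("flower", "grow"): 147, ("flower", "the"): 1000,
          ("factory", "grow"): 15, ("factory", "the"): 1000}
w = ppmi(counts)
print(w[("flower", "grow")] > w[("flower", "the")])  # True: "the" is uninformative
```

Here "the" co-occurs with everything, so its PMI with any particular target is low even though its raw count is the highest in the matrix.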
More Complex Features
• Plain word co-occurrence overestimates "joint occurrence".
• One solution: more complex features that also capture syntactic relations between words (Lin 98). ‣ no longer count: does "flower" occur in a window of
7 words around "Abu Dhabi"?
‣ but instead: does "flower" occur as the subject of "grow"?
the Qataris had watched Abu Dhabi grow like a hot-house flower, but decided
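With a dependency parser in place, only the definition of "context" changes: instead of window co-occurrences, one counts syntactic triples. A sketch over hand-written triples; in a real system they would come from parsing the corpus, and the (dependent, relation, head) format is an illustrative assumption loosely following Lin 98:

```python
from collections import Counter

# Hand-written dependency triples (dependent, relation, head);
# in practice these are produced by a parser.
parsed_corpus = [
    ("flower", "subj", "grow"),
    ("flower", "obj", "pick"),
    ("tree", "subj", "grow"),
    ("Qataris", "subj", "watch"),
]

features = Counter()
for dep, rel, head in parsed_corpus:
    features[(dep, (rel, head))] += 1

# "flower" now carries the feature "subject of grow"; mere window
# adjacency to "Abu Dhabi" contributes no feature at all.
print(features[("flower", ("subj", "grow"))])  # 1
```

The resulting feature vectors are then compared with cosine exactly as before.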
Geometric Interpretation
• the row vector x_dog describes the usage of the word dog in the corpus
• it can be seen as the coordinates of a point in n-dimensional Euclidean space
• illustrated for two dimensions: get and use
• x_dog = (115, 10)
[Scatter plot: "Two dimensions of English V-Obj DSM"; axes (get, obj) and (use, obj); points for cat, dog, knife, boat]
© Evert/Baroni/Lenci (CC-by-sa), DSM Tutorial, wordspace.collocations.de
Result
Semantic distances
• the main result of distributional analysis is "semantic" distances between words
• typical applications: nearest neighbours, clustering of related words, constructing semantic maps
[Dendrogram: "Word space clustering of concrete nouns (V-Obj from BNC)"; y-axis: cluster size]
[Scatter plot: "Semantic map (V-Obj from BNC)"; clusters labeled bird, groundAnimal, fruitTree, green, tool, vehicle]
© Evert/Baroni/Lenci (CC-by-sa), DSM Tutorial, wordspace.collocations.de
(Evert, NAACL Tutorial 2010)
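Nearest-neighbour lookup, a typical application of these semantic distances, is just a sort by cosine similarity. A sketch reusing the five-dimensional toy vectors from the BNC co-occurrence table:

```python
from math import sqrt

vectors = {
    # Columns of the BNC co-occurrence table
    # (dimensions: grow, garden, worker, production, wild).
    "factory": [15, 5, 279, 102, 3],
    "flower":  [147, 200, 0, 6, 216],
    "tree":    [330, 198, 5, 9, 35],
    "plant":   [517, 316, 84, 130, 96],
    "water":   [106, 118, 18, 28, 30],
    "fork":    [3, 17, 0, 0, 0],
}

def cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (sqrt(sum(a * a for a in v)) * sqrt(sum(b * b for b in w)))

def neighbours(word, vectors=vectors):
    """All other words, sorted by decreasing cosine similarity."""
    return sorted((w for w in vectors if w != word),
                  key=lambda w: cosine(vectors[word], vectors[w]),
                  reverse=True)

print(neighbours("flower"))
```

On these five toy dimensions, flower's closest neighbours are water, plant, and tree, with factory last; real systems run the same computation over thousands of dimensions and targets.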
Results
(results from Lin 98, from J&M)
hope (N): optimism 0.141, chance 0.137, expectation 0.136, prospect 0.126, dream 0.119, desire 0.118, fear 0.116, effort 0.111, confidence 0.109, promise 0.108
hope (V): would like 0.158, wish 0.140, plan 0.139, say 0.137, believe 0.135, think 0.133, agree 0.130, wonder 0.130, try 0.127, decide 0.125
brief (N): legal brief 0.139, affidavit 0.103, filing 0.098, petition 0.086, document 0.083, argument 0.083, letter 0.079, rebuttal 0.078, memo 0.077, article 0.076
brief (A): lengthy 0.256, hour-long 0.191, short 0.173, extended 0.163, frequent 0.162, recent 0.158, short-lived 0.155, prolonged 0.149, week-long 0.149, occasional 0.146
Problems
• similarity = synonymy? ‣ synonyms are distributionally very similar.
‣ but antonyms and (to a lesser extent) hyponyms are also distributionally very similar.
• Distributional similarity is not referential similarity. Recognizing antonyms is a notoriously hard problem.
brief (A): lengthy 0.256, hour-long 0.191, short 0.173, extended 0.163, frequent 0.162, recent 0.158, short-lived 0.155, prolonged 0.149, week-long 0.149, occasional 0.146
Compositional Distributional Semantics
• Current trend: compositionally computing representations of larger phrases from the distributional representations of words.
• E.g. Mitchell & Lapata 08: compute the co-occurrence vector of a phrase by adding up the word vectors.
• Seems linguistically dubious, but correlates with human similarity judgments.
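The additive model of Mitchell & Lapata can be stated in one line. The vectors below are invented toy values; Mitchell & Lapata also test other composition functions, such as componentwise multiplication:

```python
# Additive composition (Mitchell & Lapata 2008): the vector of a
# phrase is the componentwise sum of its word vectors.
# The 4-dimensional vectors are invented for illustration.
black = [10, 2, 0, 5]
cat   = [1, 8, 3, 0]

def compose_add(v, w):
    return [a + b for a, b in zip(v, w)]

black_cat = compose_add(black, cat)
print(black_cat)  # [11, 10, 3, 5]
```

The phrase vector can then be compared to word and phrase vectors with cosine, exactly like a single word's vector.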
Compositional Distributional Semantics
• Baroni & Zamparelli (2010): "Nouns are vectors, adjectives are matrices" (= functions). ‣ learns a matrix A for each adjective such that A · N approximates the co-occurrence
vector of "A N" (for all nouns N)
• Cf. the application of adjectives to nouns in Montague grammar.
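The learning step can be sketched with numpy: given training pairs of noun vectors and observed "A N" vectors, the adjective matrix is fit by least squares. The 2-dimensional training data below is invented; in Baroni & Zamparelli the vectors are corpus-derived and high-dimensional:

```python
import numpy as np

# Toy training data: noun vectors and the observed co-occurrence
# vectors of the corresponding "red N" phrases (invented values).
nouns   = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
red_ANs = np.array([[2.0, 1.0],
                    [0.0, 3.0],
                    [2.0, 4.0]])

# Fit the matrix RED such that RED @ n approximates the vector of
# "red n": solve nouns @ RED.T = red_ANs in the least-squares sense.
RED = np.linalg.lstsq(nouns, red_ANs, rcond=None)[0].T

# Apply the learned adjective to an unseen noun vector.
new_noun = np.array([2.0, 1.0])
print(RED @ new_noun)  # [4. 5.]
```

The learned matrix generalizes to nouns that were never observed with the adjective, which is the point of the approach.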
…related to the definition of the adjective (mental activity, historical event, green colour, quick and little cost for easy N), and so on.

Table 1: Nearest 3 neighbors of centroids of ANs that share the same adjective.
American N:   Am. representative, Am. territory, Am. source
black N:      black face, black hand, black (n)
easy N:       easy start, quick, little cost
green N:      green (n), red road, green colour
historical N: historical, hist. event, hist. content
mental N:     mental activity, mental experience, mental energy
necessary N:  necessary, necessary degree, sufficient
nice N:       nice, good bit, nice break
young N:      youthful, young doctor, young staff

How about the neighbors of specific ANs? Table 2 reports the nearest 3 neighbors of 9 randomly selected ANs involving different adjectives (we inspected a larger random set, coming to similar conclusions to the ones emerging from this table).

Table 2: Nearest 3 neighbors of specific ANs.
bad luck:                 bad, bad weekend, good spirit
electronic communication: elec. storage, elec. transmission, purpose
historical map:           topographical, atlas, hist. material
important route:          important transport, important road, major road
nice girl:                good girl, big girl, guy
little war:               great war, major war, small war
red cover:                black cover, hardback, red label
special collection:       general collection, small collection, archives
young husband:            small son, small daughter, mistress
The nearest neighbors of the corpus-based AN vectors in Table 2 make in general intuitive sense. Importantly, the neighbors pick up the composite meaning rather than that of the adjective or noun alone. For example, cover is an ambiguous word, but the hardback neighbor relates to its "front of a book" meaning that is the most natural one in combination with red. Similarly, it makes more sense that a young husband (rather than an old one) would have small sons and daughters (not to mention the mistress!).

We realize that the evidence presented here is of a very preliminary and intuitive nature. Indeed, we will argue in the next section that there are cases in which the corpus-derived AN vector might not be a good approximation to our semantic intuitions about the AN, and a model-composed AN vector is a better semantic surrogate. One of the most important avenues for further work will be to come to a better characterization of the behaviour of corpus-observed ANs, where they work and where they don't. Still, the neighbors of average and AN-specific vectors of Tables 1 and 2 suggest that, for the bulk of ANs, such corpus-based co-occurrence vectors are semantically reasonable.

6 Study 2: Predicting AN vectors

Having tentatively established that the sort of vectors we can harvest for ANs by directly collecting their corpus co-occurrences are reasonable representations of their composite meaning, we move on to the core question of whether it is possible to reconstruct the vector for an unobserved AN from information about its components. We use nearness to the corpus-observed vectors of held-out ANs as a very direct way to evaluate the quality of model-generated ANs, since we just saw that the observed ANs look reasonable (but see the caveats at the end of this section). We leave it to further work to assess the quality of the generated ANs in an applied setting, for example adapting Mitchell and Lapata's paraphrasing task to ANs. Since the observed vectors look like plausible representations of composite meaning, we expect that the closer the model-generated vectors are to the observed ones, the better they should also perform in any task that requires access to the composite meaning, and thus that the results of the current evaluation should correlate with applied performance.

More in detail, we evaluate here the composition methods (and the adjective and noun baselines) by computing, for each of them, the cosine of the test set AN vectors they generate (the "predicted" ANs) with the 41K vectors representing our extended vocabulary in semantic space, and looking at the position of the corresponding observed ANs (that were not used for training, in the supervised approaches)…
Summary
• The "knowledge bottleneck" is a very serious problem in semantic processing.
• Important topic in current research: distributional methods for the semantic similarity of words.
• Current trend: combination with compositional methods.