7/27/2019 english-unl-analysis-hyderabad-10jun08
Semantic Parsing
Pushpak Bhattacharyya, Computer Science and Engineering Department, IIT Bombay
[email protected]
With contributions from Rajat Mohanty, S. Krishna, Sandeep Limaye
Motivation
Semantics extraction has many applications: MT, IR, IE
It does not come free; it is resource intensive:
properties of words
conditions of relation establishment between words
disambiguation at many levels
Current computational parsing is less than satisfactory for deep semantic analysis
Roadmap
Current important parsers: experimental observations, handling of difficult language phenomena
Brief introduction to the adopted semantic representation: Universal Networking Language (UNL)
Two-stage process for UNL generation: approach 1
Use of a better parser: approach 2
Consolidating statement of resources
Observations on the treatment of verbs
Conclusions and future work
Current parsers
Categorization of parsers
Method \ Output   Constituency                             Dependency
Rule based        Earley Chart (1970), CYK (1965-70),      Link (1991), Minipar (1993)
                  LFG (1970), HPSG (1985)
Probabilistic     Charniak (2000), Collins (1999),         MST (2005), MALT (2007)
                  Stanford (2006)
Observations on some well-known probabilistic constituency parsers
Parsers investigated
Charniak: probabilistic lexicalized bottom-up chart parser
Collins: head-driven statistical beam-search parser
Stanford: probabilistic A* parser
RASP: probabilistic GLR parser
Investigations based on
Robustness to Ungrammaticality
Ranking in case of multiple parses
Handling of embeddings
Handling of multiple POS
Words repeated with multiple POS
Complexity
Handling ungrammatical sentences
Charniak
Joe has reading the book

(S (NP (NNP Joe))
   (VP (AUX has)
       (VP (VBG reading)
           (NP (DT the) (NN book)))))

has labelled as AUX
Collins
has should have been AUX
Stanford
has is treated as VBZ and not AUX.
RASP
Confuses this as a case of sentence embedding
Ranking in case of multiple parses
Charniak
John said Marry sang the song with Max

(S (NP (NNP John))
   (VP (VBD said)
       (SBAR (S (NP (NNP Marry))
                (VP (VBD sang)
                    (NP (DT the) (NN song))
                    (PP (IN with) (NP (NNP Max))))))))

The semantically correct parse is chosen from among the possible multiple parse trees.
Collins
Wrong attachment
Stanford
Same as Charniak
RASP

Different POS tags, but the parse trees are comparable
Time complexity
Time taken
54 instances of the sentence "This is just to check the time" were used to check the time.

Time taken:
Collins: 40s, Stanford: 14s, Charniak: 8s, RASP: 5s

Reported complexity:
Charniak: O(n^5), Collins: O(n^5), Stanford: O(n^3), RASP: not known
Embedding Handling
Charniak
The cat that killed the rat that stole the milk that spilled on the floor that was slippery escaped.

(Deeply right-branching parse: each relative clause headed by that is embedded as an SBAR inside the preceding NP, with the main verb escaped attached at the top.)
Collins
Stanford
RASP
Handling words with multiple POS tags
Charniak

Time flies like an arrow

(S (NP (NNP Time))
   (VP (VBZ flies)
       (PP (IN like)
           (NP (DT an) (NN arrow)))))
Collins
Stanford
RASP
Flies tagged as noun!
Repeated Word handling
Charniak
Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes

(Recursive parse: S nodes embedded via SBAR, with the buffalo/Buffalo words variously tagged NN, NNP and VBZ.)
Collins
Stanford
RASP
Tags all words as nouns!
Sentence Length
Sentence with 394 words
One day, Sam left his small, yellow home to head towards the meat-packing plant where he worked, a task which was never completed, as on his way, he tripped, fell, and went careening off of a cliff, landing on and destroying Max, who, incidentally, was also heading to his job at the meat-packing plant, though not the same plant at which Sam worked, which he would be heading to, if he had been aware that that the plant he was currently heading towards had been destroyed just this morning by a mysterious figure clad in black, who hailed from the small, remote country of France, and who took every opportunity he could to destroy small meat-packing plants, due to the fact that as a child, he was tormented, and frightened, and beaten savagely by a family of meat-packing plants who lived next door, and scarred his little mind to the point where he became a twisted and sadistic creature, capable of anything, but specifically capable of destroying meat-packing plants, which he did, and did quite often, much to the chagrin of the people who worked there, such as Max, who was not feeling quite so much chagrin as most others would feel at this point, because he was dead as a result of an individual named Sam, who worked at a competing meat-packing plant, which was no longer a competing plant, because the plant that it would be competing against was, as has already been mentioned, destroyed in, as has not quite yet been mentioned, a massive, mushroom cloud of an explosion, resulting from a heretofore unmentioned horse manure bomb manufactured from manure harvested from the farm of one farmer J. P. Harvenkirk, and more specifically harvested from a large, ungainly, incontinent horse named Seabiscuit, who really wasn't named Seabiscuit, but was actually named Harold, and it completely baffled him why anyone, particularly the author of a very long sentence, would call him Seabiscuit; actually, it didn't baffle him, as he was just a stupid, manure-making horse, who was incapable of cognitive thought for a variety of reasons, one of which was that he was a horse, and the other of which was that he was just knocked unconscious by a flying chunk of a meat-packing plant, which had been blown to pieces just a few moments ago by a shifty character from France.
Partial RASP Parse
(|One_MC1| |day_NNT1| |,_,| |Sam_NP1| |leave+ed_VVD| |his_APP$| |small_JJ| |,_,| |yellow_JJ| |home_NN1| |to_TO| |head_VV0| |towards_II| |the_AT| |meat-packing_JJ| |plant_NN1| |where_RRQ| |he_PPHS1| |work+ed_VVD| |,_,| |a_AT1| |task_NN1| |which_DDQ| |be+ed_VBDZ| |never_RR| |complete+ed_VVN| |,_,| |as_CSA||on_II| |his_APP$| |way_NN1| |,_,| |he_PPHS1| |trip+ed_VVD| |,_,| |fall+ed_VVD| |,_,| |and_CC| |go+ed_VVD| |careen+ing_VVG| |off_RP| |of_IO| |a_AT1| |cliff_NN1| |,_,||land+ing_VVG| |on_RP| |and_CC| |destroy+ing_VVG| |Max_NP1| |,_,| |who_PNQS| |,_,| |incidentally_RR| |,_,| |be+ed_VBDZ| |also_RR| |head+ing_VVG| |to_II||his_APP$| |job_NN1| |at_II| |the_AT| |meat-packing_JB| |plant_NN1| |,_,| |though_CS| |not+_XX| |the_AT| |same_DA| |plant_NN1| |at_II| |which_DDQ| |Sam_NP1||work+ed_VVD| |,_,| |which_DDQ| |he_PPHS1| |would_VM| |be_VB0| |head+ing_VVG| |to_II| |,_,| |if_CS| |he_PPHS1| |have+ed_VHD| |be+en_VBN| |aware_JJ||that_CST| |that_CST| |the_AT| |plant_NN1| |he_PPHS1| |be+ed_VBDZ| |currently_RR| |head+ing_VVG| |towards_II| |have+ed_VHD| |be+en_VBN| |destroy+ed_VVN||just_RR| |this_DD1| |morning_NNT1| |by_II| |a_AT1| |mysterious_JJ| |figure_NN1| |clothe+ed_VVN| |in_II| |black_JJ| |,_,| |who_PNQS| |hail+ed_VVD| |from_II| |the_AT||small_JJ| |,_,| |remote_JJ| |country_NN1| |of_IO| |France_NP1| |,_,| |and_CC| |who_PNQS| |take+ed_VVD| |every_AT1| |opportunity_NN1| |he_PPHS1| |could_VM||to_TO| |destroy_VV0| |small_JJ| |meat-packing_NN1| |plant+s_NN2| |,_,| |due_JJ| |to_II| |the_AT| |fact_NN1| |that_CST| |as_CSA| |a_AT1| |child_NN1| |,_,| |he_PPHS1||be+ed_VBDZ| |torment+ed_VVN| |,_,| |and_CC| |frighten+ed_VVD| |,_,| |and_CC| |beat+en_VVN| |savagely_RR| |by_II| |a_AT1| |family_NN1| |of_IO| |meat-packing_JJ||plant+s_NN2| |who_PNQS| |live+ed_VVD| |next_MD| |door_NN1| |,_,| |and_CC| |scar+ed_VVD| |his_APP$| |little_DD1| |mind_NN1| |to_II| |the_AT| |point_NNL1||where_RRQ| |he_PPHS1| |become+ed_VVD| |a_AT1| |twist+ed_VVN| |and_CC| |sadistic_JJ| |creature_NN1| |,_,| |capable_JJ| 
|of_IO| |anything_PN1| |,_,| |but_CCB||specifically_RR| |capable_JJ| |of_IO| |destroy+ing_VVG| |meat-packing_JJ| |plant+s_NN2| |,_,| |which_DDQ| |he_PPHS1| |do+ed_VDD| |,_,| |and_CC| |do+ed_VDD||quite_RG| |often_RR| |,_,| |much_DA1| |to_II| |the_AT| |chagrin_NN1| |of_IO| |the_AT| |people_NN| |who_PNQS| |work+ed_VVD| |there_RL| |,_,| |such_DA| |as_CSA||Max_NP1| |,_,| |who_PNQS| |be+ed_VBDZ| |not+_XX| |feel+ing_VVG| |quite_RG| |so_RG| |much_DA1| |chagrin_NN1| |as_CSA| |most_DAT| |other+s_NN2| |would_VM|
|feel_VV0| |at_II| |this_DD1| |point_NNL1| |,_,| |because_CS| |he_PPHS1| |be+ed_VBDZ| |dead_JJ| |as_CSA| |a_AT1| |result_NN1| |of_IO| |an_AT1| |individual_NN1||name+ed_VVN| |Sam_NP1| |,_,| |who_PNQS| |work+ed_VVD| |at_II| |a_AT1| |compete+ing_VVG| |meat-packing_JJ| |plant_NN1| |,_,| |which_DDQ| |be+ed_VBDZ||no_AT| |longer_RRR| |a_AT1| |compete+ing_VVG| |plant_NN1| |,_,| |because_CS| |the_AT| |plant_NN1| |that_CST| |it_PPH1| |would_VM| |be_VB0| |compete+ing_VVG||against_II| |be+ed_VBDZ| |,_,| |as_CSA| |have+s_VHZ| |already_RR| |be+en_VBN| |mention+ed_VVN| |,_,| |destroy+ed_VVN| |in_RP| |,_,| |as_CSA| |have+s_VHZ||not+_XX| |quite_RG| |yet_RR| |be+en_VBN| |mention+ed_VVN| |,_,| |a_AT1| |massive_JJ| |,_,| |mushroom_NN1| |cloud_NN1| |of_IO| |an_AT1| |explosion_NN1| |,_,||result+ing_VVG| |from_II| |a_AT1| |heretofore_RR| |unmentioned_JJ| |horse_NN1| |manure_NN1| |bomb_NN1| |manufacture+ed_VVN| |from_II| |manure_NN1||harvest+ed_VVN| |from_II| |the_AT| |farm_NN1| |of_IO| |one_MC1| |farmer_NN1| J._NP1 P._NP1 |Harvenkirk_NP1| |,_,| |and_CC| |more_DAR| |specifically_RR||harvest+ed_VVN| |from_II| |a_AT1| |large_JJ| |,_,| |ungainly_JJ| |,_,| |incontinent_NN1| |horse_NN1| |name+ed_VVN| |Seabiscuit_NP1| |,_,| |who_PNQS| |really_RR||be+ed_VBDZ| |not+_XX| |name+ed_VVN| |Seabiscuit_NP1| |,_,| |but_CCB| |be+ed_VBDZ| |actually_RR| |name+ed_VVN| |Harold_NP1| |,_,| |and_CC| |it_PPH1||completely_RR| |baffle+ed_VVD| |he+_PPHO1| |why_RRQ| |anyone_PN1| |,_,| |particularly_RR| |the_AT| |author_NN1| |of_IO| |a_AT1| |very_RG| |long_JJ||sentence_NN1| |,_,| |would_VM| |call_VV0| |he+_PPHO1| |Seabiscuit_NP1| |;_;| |actually_RR| |,_,| |it_PPH1| |do+ed_VDD| |not+_XX| |baffle_VV0| |he+_PPHO1| |,_,||as_CSA| |he_PPHS1| |be+ed_VBDZ| |just_RR| |a_AT1| |stupid_JJ| |,_,| |manure-making_NN1| |horse_NN1| |,_,| |who_PNQS| |be+ed_VBDZ| |incapable_JJ| |of_IO||cognitive_JJ| |thought_NN1| |for_IF| |a_AT1| |variety_NN1| |of_IO| |reason+s_NN2| |,_,| |one_MC1| |of_IO| |which_DDQ| |be+ed_VBDZ| |that_CST| 
|he_PPHS1||be+ed_VBDZ| |a_AT1| |horse_NN1| |,_,| |and_CC| |the_AT| |other_JB| |of_IO| |which_DDQ| |be+ed_VBDZ| |that_CST| |he_PPHS1| |be+ed_VBDZ| |just_RR||knock+ed_VVN| |unconscious_JJ| |by_II| |a_AT1| |flying_NN1| |chunk_NN1| |of_IO| |a_AT1| |meat-packing_JJ| |plant_NN1| |,_,| |which_DDQ| |have+ed_VHD||be+en_VBN| |blow+en_VVN| |to_II| |piece+s_NN2| |just_RR| |a_AT1| |few_DA2| |moment+s_NNT2| |ago_RA| |by_II| |a_AT1| |shifty_JJ| |character_NN1| |from_II||France_NP1| ._.) -1 ; ()
What do we learn?
All parsers have problems dealing with long sentences
Complex language phenomena cause them to falter
Good as starting points for structure detection, but they need output correction very often
Needs of high-accuracy parsing (difficult language phenomena)

Language Phenomena                      Link  Charniak  Stanford  Machinese  MiniPar  Collins  Our System
Empty-PRO detection                     No    No        No        No         Yes      No       Yes
Empty-PRO resolution                    No    No        No        No         Yes      No       Yes
WH-trace detection                      Yes   No        No        No         Yes      No       Yes
Relative pronoun resolution             No    No        No        No         Yes      No       Yes
PP attachment resolution                No    No        Yes       No         No       No       Yes
Clausal attachment resolution           Yes   No        No        No         Yes      No       Yes
Distinguishing arguments from adjuncts  No    No        No        No         No       No       Yes
Small clause detection                  No    Yes       Yes       No         Yes      No       Yes
Context of our work: Universal Networking Language (UNL)
A vehicle for machine translation
Much more demanding than the transfer approach or the direct approach

(Diagram: English, French, Hindi and Chinese connected to the Interlingua (UNL) hub; analysis goes into UNL, generation comes out of it.)
A United Nations project
Started in 1996 as a 10-year programme
15 research groups across continents
First goal: generators
Next goal: analysers (needs solving various ambiguity problems)
Currently active groups: UNL-Spanish, UNL-Russian, UNL-French, UNL-Hindi
IIT Bombay concentrating on UNL-Hindi and UNL-English

Dave, Parikh and Bhattacharyya, Journal of Machine Translation, 2002
UNL represents knowledge: John eats rice with a spoon

(Diagram: a UNL graph whose nodes are universal words, whose labelled arcs are semantic relations, and whose node annotations are attributes.)
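The graph just described can be written out as machine-readable relation triples. A minimal sketch, assuming the relation names agt/obj/ins for this sentence; the restriction labels on rice and spoon (icl>food, icl>tool) are illustrative assumptions, not taken from the deck:

```python
# A UNL graph as (relation, head, dependent) triples; rendering them gives
# the usual UNL relation lines. Restrictions on rice/spoon are assumptions.

def render(triples):
    """Render (rel, head, dep) triples as UNL-style relation lines."""
    return ["%s(%s, %s)" % (rel, head, dep) for rel, head, dep in triples]

eat = "eat(icl>do).@entry.@present"
graph = [
    ("agt", eat, "John(iof>person)"),   # agent of eating
    ("obj", eat, "rice(icl>food)"),     # thing eaten
    ("ins", eat, "spoon(icl>tool)"),    # instrument
]
for line in render(graph):
    print(line)
```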
Sentence embeddings: Mary claimed that she had composed a poem

agt(claim(icl>do).@entry.@past, Mary(iof>person))
obj(claim(icl>do).@entry.@past, :01)
:01 agt(compose(icl>do).@entry.@past.@complete, she)
:01 obj(compose(icl>do).@entry.@past.@complete, poem(icl>art))
Relation repository
Number: 39. Groups:
Agent-object-instrument: agt, obj, ins, met
Time: tim, tmf, tmt
Place: plc, plf, plt
Restriction: mod, aoj
Prepositions taking object: gol, frm
Ontological: icl, iof, equ
etc.
Semantically Relatable Sequences (SRS)

Mohanty, Dutta and Bhattacharyya, Machine Translation Summit, 2005
Semantically Relatable Sequences (SRS)

Definition: A semantically relatable sequence (SRS) of a sentence is a group of unordered words in the sentence (not necessarily consecutive) that appear in the semantic graph of the sentence as linked nodes or nodes with speech-act labels.
Example to illustrate SRS

The man bought a new car in June

(Semantic graph: agent(bought, man); object(bought, car); time(bought, June); modifier(car, new); bought carries past tense; function-word labels: in: modifier, a: indefinite, the: definite.)
SRSs from the man bought a new car in June

a. {man, bought}
b. {bought, car}
c. {bought, in, June}
d. {new, car}
e. {the, man}
f. {a, car}
Basic questions
What are the SRSs of a given sentence?
What semantic relations can link the words in an SRS?
Postulate
A sentence needs to be broken into sets of at most three forms:

{CW, CW}
{CW, FW, CW}
{FW, CW}

where CW refers to a content word or a clause and FW to a function word
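The postulate can be sketched as a small classifier. A minimal sketch: the CW/FW tag sets below are illustrative assumptions, not the authors' full inventory:

```python
# Sketch: classify tokens as content words (CW) or function words (FW) from
# their POS tags, then check a word group against the three admissible SRS
# shapes of the postulate. The tag sets are illustrative assumptions.

CW_TAGS = {"NN", "NNS", "NNP", "VB", "VBD", "VBZ", "VBG", "JJ", "RB"}
FW_TAGS = {"DT", "IN", "TO", "AUX", "CC"}

def word_class(tag):
    return "FW" if tag in FW_TAGS else "CW"

ADMISSIBLE = {("CW", "CW"), ("CW", "FW", "CW"), ("FW", "CW")}

def srs_shape(tags):
    """Return the CW/FW shape of a tag sequence, or None if not admissible."""
    shape = tuple(word_class(t) for t in tags)
    return shape if shape in ADMISSIBLE else None

print(srs_shape(["VBD", "IN", "NNP"]))  # shape of {bought, in, June}
```

For the earlier example, {man, bought} comes out as {CW, CW}, {bought, in, June} as {CW, FW, CW}, and {the, man} as {FW, CW}.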
Language Phenomena and SRS
Clausal constructs
Clausal constructs
Sentence: The boy said that he was reading a novel

a. {the, boy}
b. {boy, said}
c. {said, that, SCOPE}
d. SCOPE: {he, reading}
e. SCOPE: {reading, novel}
f. SCOPE: {a, novel}
g. SCOPE: {was, reading}

scope: umbrella for clauses or compounds
Prepositional Phrase (PP) Attachment

John published the article in June

{John, published}: {CW,CW}
{published, article}: {CW,CW}
{published, in, June}: {CW,FW,CW}
{the, article}: {FW,CW}

Contrast with

The article in June was published by John

{The, article}: {FW,CW}
{article, in, June}: {CW,FW,CW}
{article, was, published}: {CW,CW}
{published, by, John}: {CW,CW}
To-Infinitival

PRO element co-indexed with the object him:
I forced John_i [PRO]_i to throw a party

PRO element co-indexed with the subject I:
I_i promised John [PRO]_i to throw a party

SRSs are:
{I, forced}: {CW,CW}
{forced, John}: {CW,CW}
{forced, SCOPE}: {CW,CW}
SCOPE: {John, to, throw}: {CW,FW,CW}
SCOPE: {throw, party}: {CW,CW}
SCOPE: {a, party}: {FW,CW}

(John is replaced with I in the 2nd sentence; we must go deeper than surface phenomena.)
Complexities of that
Embedded clausal constructs, as opposed to relative clauses, need to be resolved:
Mary claimed that she had composed a poem
The poem that Mary composed was beautiful

Dangling that:
I told the child that I know that he played well
Two possibilities
told
I the child
that hePlayed
well
that Iknow
told
I the child that I knowthat that he
Playedwell
SRS Implementation
Syntactic constituents to semantic constituents

Used a probabilistic parser (Charniak, 04)
Output of the Charniak parser: tags give indications of CW and FW
NP, VP, ADJP and ADVP: CW
PP (prepositional phrase), IN (preposition) and DT (determiner): FW
Observation: headwords of sibling nodes form SRSs

John has bought a car.

(C) VP bought
  (F) AUX has    (C) VP bought
                   (C) VBD bought    (C) NP car
                                       (F) DT a    (C) NN car

SRS: {has, bought}, {a, car}, {bought, car}
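The sibling-headword observation can be sketched directly on a head-annotated tree. A minimal sketch, assuming a (label, head, children) tuple encoding invented here for illustration:

```python
# Sketch: collect SRSs by pairing the headwords of sibling nodes in a
# head-annotated parse tree. The tuple encoding (label, head, children)
# is an assumption made for illustration.

def srs_from_tree(node, out):
    label, head, children = node
    heads = [c[1] for c in children]
    # every pair of sibling heads with distinct headwords is a candidate SRS
    for i in range(len(heads)):
        for j in range(i + 1, len(heads)):
            if heads[i] != heads[j]:
                out.append({heads[i], heads[j]})
    for c in children:
        srs_from_tree(c, out)
    return out

# head-annotated tree of "John has bought a car"
tree = ("S", "bought",
        [("NP", "John", []),
         ("VP", "bought",
          [("AUX", "has", []),
           ("VP", "bought",
            [("VBD", "bought", []),
             ("NP", "car",
              [("DT", "a", []),
               ("NN", "car", [])])])])])

srs = srs_from_tree(tree, [])
```

Run on the tree above, this yields {John, bought}, {has, bought}, {bought, car} and {a, car}, matching the slide's SRSs plus the subject pair.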
Work needed on the parse
tree
Correction of wrong PP attachment
Correction of wrong PP attachment

John has published an article on linguistics

Use PP-attachment heuristics to get {article, on, linguistics}

(Parse tree fragment: (C) VP published dominates (C) VBD published, (C) NP article with (F) DT an and (C) NN article, and (F) PP on with (F) IN on and (C) NP linguistics; the PP is re-attached to the NP article.)
To-Infinitival

The clause boundary is the VP node, labelled with SCOPE. Its tag is modified to TO, an FW tag, indicating that it heads a to-infinitival clause.

The duplication and insertion of the NP node with head him (depicted by shaded nodes in the original figure) as a sibling of the VBD node with head forced is done to bring out the existence of a semantic relation between force and him.

(Parse tree fragment: (C) VP forced dominates (C) VBD forced, (C) NP him, and (C) S SCOPE, whose (F) TO to introduces the embedded VP.)
Linking of clauses: John said that he was reading a novel

Head of the S node is marked as SCOPE.
SRS: {said, that, SCOPE}

Adverbial clauses have similar parse-tree structures, except that the subordinating conjunctions are different from that.

(Parse tree fragment: (C) VP said dominates (C) VBD said and (F) SBAR that, which dominates (F) IN that and (C) S SCOPE.)
Implementation: block diagram of the system

Input sentence → Charniak Parser → parse tree → parse-tree modification and augmentation with head and scope information (Scope Handler and Attachment Resolver, drawing on WordNet 2.0 and a sub-categorization database) → augmented parse tree → Semantically Relatable Sets Generator (noun classification; THAT clause as subcat property; preposition as subcat property; time and place features) → semantically relatable sets
Evaluation
Used the Penn Treebank (LDC, 1995) as the test bed
The un-annotated sentences, actually from the WSJ corpus (Charniak et al., 1987), were passed through the SRS generator
Results were compared with the Treebank's annotated sentences
Results on SRS generation

(Bar chart: recall and precision, on a 0-100 scale, for the parameters Total SRSs, (FW,CW), (CW,FW,CW) and (CW,CW).)
Results on sentence constructs

(Bar chart: recall and precision, on a 0-100 scale, for to-infinitival clause resolution, complement-clause resolution, clause linkings and PP resolution.)
SRS to UNL
Features of the system
High-accuracy resolution of different kinds of attachment
Precise and fine-grained semantic relations between sentence constituents
Empty-pronominal detection and resolution
Exhaustive knowledge bases of sub-categorization frames, verb knowledge bases and rule templates for establishing semantic relations and speech-act-like attributes, using:
Oxford Advanced Learner's Dictionary (Hornby, 2001)
VerbNet (Schuler, 2005)
WordNet 2.1 (Miller, 2005)
Penn Tree Bank (LDC, 1995) and
XTAG lexicon (XTAG, 2001)
Side effect: high accuracy parsing
Side effect: high-accuracy parsing (comparison with other parsers)

Language Phenomena                      Link  Charniak  Stanford  Machinese  MiniPar  Collins  Our System
Empty-PRO detection                     No    No        No        No         Yes      No       Yes
Empty-PRO resolution                    No    No        No        No         Yes      No       Yes
WH-trace detection                      Yes   No        No        No         Yes      No       Yes
Relative pronoun resolution             No    No        No        No         Yes      No       Yes
PP attachment resolution                No    No        Yes       No         No       No       Yes
Clausal attachment resolution           Yes   No        No        No         Yes      No       Yes
Distinguishing arguments from adjuncts  No    No        No        No         No       No       Yes
Small clause detection                  No    Yes       Yes       No         Yes      No       Yes
Rules for generating semantic relations

A {CW1, FW, CW2} sequence matching a rule on syntactic and semantic features yields REL(UW1, UW2):

CW1 (SynCat POS SemCat Lex)   FW (Lex)   CW2 (SynCat POS SemCat Lex)   Rel  UW1  UW2
-  -  V020  -                 into       N  -  -     -                 gol  1    3
V  -  -     -                 within     N  -  TIME  -                 dur  1    3

e.g., turn water into steam: gol
e.g., finish within a week: dur
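Rule matching of this kind can be sketched as follows. A minimal sketch encoding only the two rules from the slide; the feature names and dict encoding are simplifying assumptions:

```python
# Sketch: match a {CW1, FW, CW2} sequence against rules on syntactic and
# semantic features to produce REL(UW1, UW2). Only the two slide rules are
# encoded; None means "don't care".

RULES = [
    # (cw1_syncat, fw_lex, cw2_syncat, cw2_semfeat, relation)
    (None, "into",   "N", None,   "gol"),  # e.g. turn water into steam
    ("V",  "within", "N", "TIME", "dur"),  # e.g. finish within a week
]

def relate(cw1, fw, cw2):
    """cw1/cw2 are dicts with 'word', 'syncat' and optional 'semfeat'."""
    for c1_cat, f_lex, c2_cat, c2_sem, rel in RULES:
        if ((c1_cat is None or cw1["syncat"] == c1_cat)
                and fw == f_lex
                and cw2["syncat"] == c2_cat
                and (c2_sem is None or cw2.get("semfeat") == c2_sem)):
            return "%s(%s, %s)" % (rel, cw1["word"], cw2["word"])
    return None

r1 = relate({"word": "turn", "syncat": "V"}, "into",
            {"word": "steam", "syncat": "N"})
r2 = relate({"word": "finish", "syncat": "V"}, "within",
            {"word": "week", "syncat": "N", "semfeat": "TIME"})
```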
Rules for generating attributes
String of FWs   CW tag   UNL attribute list generated
has_been        VBG      @present @complete @progress
has_been        VBN      @present @complete @passive
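A table like this is naturally a lookup keyed on the FW string and the CW tag. A minimal sketch encoding only the two rows shown:

```python
# Sketch: map (string of FWs, CW tag) to a UNL attribute list.
# Only the two rows from the slide's table are encoded.

ATTR_RULES = {
    ("has_been", "VBG"): ["@present", "@complete", "@progress"],
    ("has_been", "VBN"): ["@present", "@complete", "@passive"],
}

def attributes(fw_string, cw_tag):
    """Return the UNL attributes for an FW string + CW tag, or [] if no rule."""
    return ATTR_RULES.get((fw_string, cw_tag), [])

print(attributes("has_been", "VBG"))
```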
System architecture
Evaluation: scheme
Evaluation: example
Input: He worded the statement carefully.

[unlGenerated:76]
agt(word.@entry, he)
obj(word.@entry, statement.@def)
man(word.@entry, carefully)
[\unl]

[unlGold:76]
agt(word.@entry.@past, he)
obj(word.@entry.@past, statement.@def)
man(word.@entry.@past, carefully)
[\unl]

F1-Score = 0.945

Not heavily punished, since attributes are not crucial to the meaning!
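The comparison of generated and gold relations can be sketched as set-based precision/recall/F1. The deck's actual scorer evidently gives partial credit for relations that differ only in attributes (hence 0.945); the sketch below just shows the two extremes, strict string match and attribute-blind match, as an assumption about the scoring idea rather than the authors' exact metric:

```python
# Sketch: score generated UNL relations against gold. Strict scoring
# compares full strings; lenient scoring strips ".@..." attributes first.
import re

def f1(generated, gold):
    tp = len(set(generated) & set(gold))
    p = tp / len(generated) if generated else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def strip_attrs(rel):
    # drop ".@entry", ".@past", ".@def", ... inside each argument
    return re.sub(r"\.[^,)]*", "", rel)

gen = ["agt(word.@entry, he)",
       "obj(word.@entry, statement.@def)",
       "man(word.@entry, carefully)"]
gold = ["agt(word.@entry.@past, he)",
        "obj(word.@entry.@past, statement.@def)",
        "man(word.@entry.@past, carefully)"]

strict = f1(gen, gold)  # 0.0: every generated relation misses @past
lenient = f1([strip_attrs(x) for x in gen],
             [strip_attrs(x) for x in gold])  # 1.0: relations all agree
```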
Approach 2: switch to rule-based parsing: LFG
Using Functional Structure from an LFG Parser

Sentence: John eats a pastry

Functional structure (transfer facts):
SUBJ(eat, John)
OBJ(eat, pastry)
VTYPE(eat, main)

UNL:
agt(eat, John)
obj(eat, pastry)
Lexical Functional Grammar
Considers two aspects:
Lexical: considers lexical structures and relations
Functional: considers grammatical functions of different constituents, like SUBJECT, OBJECT

Two structures:
C-structure (constituent structure)
F-structure (functional structure)

Languages vary in C-structure (word order, phrasal structure) but have the same functional structure (SUBJECT, OBJECT, etc.)
LFG structures: example

Sentence: He gave her a kiss.

(Figure: the C-structure and F-structure of the sentence.)
XLE Parser
Developed by Xerox Corporation
Gives C-structures, F-structures and the morphology of the sentence constituents
Supports a packed rewriting system converting F-structure to transfer facts, used by our system
Works on Solaris, Linux and Mac OS X
Notion of Transfer Facts
Serialized representation of the functional structure
Particularly useful for transfer-based MT systems
We use it as the starting point for UNL generation
Transfer Facts - Example
Sentence:
The boy ate the apples hastily.
Transfer facts (selected):
ADJUNCT,eat:2,hastily:6
ADV-TYPE,hastily:6,vpadv
DET,apple:5,the:4
DET,boy:1,the:0
DET-TYPE,the:0,def
DET-TYPE,the:4,def
NUM,apple:5,pl
NUM,boy:1,sg
OBJ,eat:2,apple:5
PASSIVE,eat:2,-
PERF,eat:2,-_
PROG,eat:2,-_
SUBJ,eat:2,boy:1
TENSE,eat:2,past
VTYPE,eat:2,main
_SUBCAT-FRAME,eat:2,V-SUBJ-OBJ
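Since each transfer fact is a comma-separated triple, parsing them and grouping them by the word they describe is the natural first step toward the word-entry collection built in phase 2. A minimal sketch under that assumption:

```python
# Sketch: parse transfer-fact lines into (FEATURE, arg1, arg2) triples and
# group them by the word they describe (phase 2 in miniature).

def parse_facts(lines):
    facts = []
    for line in lines:
        feature, arg1, arg2 = line.strip().split(",", 2)
        facts.append((feature, arg1, arg2))
    return facts

facts = parse_facts([
    "SUBJ,eat:2,boy:1",
    "OBJ,eat:2,apple:5",
    "TENSE,eat:2,past",
    "NUM,apple:5,pl",
])

entries = {}
for feature, head, value in facts:
    entries.setdefault(head, {})[feature] = value
```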
Workflow in detail
Phase 1: Sentence to transfer facts
Input: Sentence: The boy ate the apples hastily.
Output: Transfer facts (selected are shown here):
ADJUNCT,eat:2,hastily:6
ADV-TYPE,hastily:6,vpadv
DET,apple:5,the:4
DET,boy:1,the:0
DET-TYPE,the:0,def
DET-TYPE,the:4,def
NUM,apple:5,pl
NUM,boy:1,sg
OBJ,eat:2,apple:5
PASSIVE,eat:2,-
PERF,eat:2,-_
PROG,eat:2,-_
SUBJ,eat:2,boy:1
TENSE,eat:2,past
VTYPE,eat:2,main
_SUBCAT-FRAME,eat:2,V-SUBJ-OBJ
Phase 2: Transfer facts to word entry collection

Input: transfer facts as in the previous example
Output: word entry collection
Word entry eat:2, lex item: eat
(PERF:-_ PASSIVE:- _SUBCAT-FRAME:V-SUBJ-OBJ VTYPE:main SUBJ:boy:1 OBJ:apple:5 ADJUNCT:hastily:6 CLAUSE-TYPE:decl TENSE:past PROG:-_ MOOD:indicative)

Word entry boy:1, lex item: boy
(CASE:nom _LEX-SOURCE:countnoun-lex COMMON:count DET:the:0 NSYN:common PERS:3 NUM:sg)

Word entry apple:5, lex item: apple
(CASE:obl _LEX-SOURCE:morphology COMMON:count DET:the:4 NSYN:common PERS:3 NUM:pl)

Word entry hastily:6, lex item: hastily
(DEGREE:positive _LEX-SOURCE:morphology ADV-TYPE:vpadv)

Word entry the:0, lex item: the
(DET-TYPE:def)

Word entry the:4, lex item: the
(DET-TYPE:def)
Phase 3(1): UW and attribute generation

Input: word entry collection
Output: universal words with (some) attributes generated

In our example:
UW (eat:2.@entry.@past)   UW (hastily:6)
UW (boy:1)                UW (the:0)
UW (apple:5.@pl)          UW (the:4)
Example transfer facts and their mapping to UNL attributes
Digression: Subcat Frames, Arguments and Adjuncts

Subcat frames and arguments:
A predicate subcategorizes for its arguments; equivalently, arguments are governed by the predicate.
Example: the predicate eat subcategorizes for a SUBJECT argument and an OBJECT argument. The corresponding subcat frame is V-SUBJ-OBJ.
Arguments are mandatory for a predicate.

Adjuncts:
Give additional information about the predicate
Not mandatory
Example: hastily in The boy ate the apples hastily.
Phase 3(1): Handling of Subcat Frames
Input: word entry collection
Mapping of subcat frames to transfer facts
Mapping of transfer facts to relations or attributes
Output: relations and/or attributes

Example: for our sentence, the relations agt(eat,boy) and obj(eat,apple) are generated in this phase.
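This phase can be sketched by walking the subcat frame and mapping each grammatical function to a UNL relation. A minimal sketch: the SUBJ→agt, OBJ→obj mapping is an assumption that holds for do-type verbs like eat, not a general rule:

```python
# Sketch of phase 3(1): the subcat frame names the argument functions;
# mapping each function to a UNL relation turns the word entry into
# relations. SUBJ->agt / OBJ->obj is a simplification for do-type verbs.

FUNC_TO_REL = {"SUBJ": "agt", "OBJ": "obj"}

def subcat_relations(verb, entry):
    frame = entry["_SUBCAT-FRAME"]          # e.g. "V-SUBJ-OBJ"
    rels = []
    for func in frame.split("-")[1:]:       # skip the leading "V"
        filler = entry[func].split(":")[0]  # "boy:1" -> "boy"
        rels.append("%s(%s,%s)" % (FUNC_TO_REL[func], verb, filler))
    return rels

entry = {"_SUBCAT-FRAME": "V-SUBJ-OBJ", "SUBJ": "boy:1", "OBJ": "apple:5"}
rels = subcat_relations("eat", entry)
# rels == ["agt(eat,boy)", "obj(eat,apple)"]
```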
Rule bases for subcat handling: examples (1)

Mapping subcat frames to transfer facts
Rule bases for subcat handling: examples (2)

Mapping subcat frames and transfer facts to relations / attributes (some simplified rules)
Phase 3(2): Handling of adjuncts

Input: word entry collection
List of transfer facts to be considered for adjunct handling
Rules for relation generation based on transfer facts and word properties
Output: relations and/or attributes

Example: for our sentence, the relation man(eat,hastily) and the @def attributes for boy and apple are generated in this phase.
Rule bases for adjunct handling: examples (1)

Mapping adjunct transfer facts to relations / attributes (some simplified rules)
Rule bases for adjunct handling: examples (2)

Mapping adjuncts to relations / attributes based on prepositions (some example rules)
Final UNL Expression

Sentence: The boy ate the apples hastily.

UNL expression:
[unl:1]
agt(eat:2.@entry.@past,boy:1.@def)
man(eat:2.@entry.@past,hastily:6)
obj(eat:2.@entry.@past,apple:5.@pl.@def)
[\unl]
Design of Relation Generation Rules: an example

The relation linking the subject to the verb depends on subject animacy and the UNL verb type:
ANIMATE subject: do → agt, be → aoj, occur → aoj
INANIMATE subject: do → aoj, be → aoj, occur → obj
Summary of Resources
Mohanty and Bhattacharyya, LREC 2008
Lexical Resources
(Diagram of lexical resources: a verb knowledgebase with semantic argument frames, syntactic-to-semantic argument mappings and verb senses; functional elements with grammatical attributes (auxiliary verbs, determiners, tense-aspect morphemes); a lexical knowledgebase with semantic attributes for N, V, A and Adv; and a syntactic argument database covering PPs and clauses as syntactic arguments. These feed SRS generation and UNL expression generation.)
Use of a number of lexical data
We have created these resources over a long period of time from:
Oxford Advanced Learner's Dictionary (OALD) (Hornby, 2001)
VerbNet (Schuler, 2005)
Princeton WordNet 2.1 (Miller, 2005)
LCS database (Dorr, 1993)
Penn Tree Bank (LDC, 1995), and
XTAG lexicon (XTAG Research Group, 2001)
Verb Knowledge Base (VKB) Structure
VKB statistics

4115 unique verbs
22000 rows (different senses)
189 verb groups
Verb categorization in UNL and its relationship to traditional verb categorization
UNL (semantic) \ Traditional (syntactic)   Transitive (has direct object)   Intransitive
Do (action)                                Ram pulls the rope               Ram goes home (ergative languages)
Be (state)                                 Ram knows mathematics            Ram sleeps
Occur (event)                              Ram forgot mathematics           Earth cracks

Unergative: syntactic subject = semantic agent
Unaccusative: syntactic subject ≠ semantic agent
Accuracy on various phenomena and corpora
Applications
MT and IR
Smriti Singh, Mrugank Dalal, Vishal Vachani, Pushpak Bhattacharyya and Om Damani, Hindi Generation from Interlingua, Machine Translation Summit (MTS 07), Copenhagen, September 2007.

Sanjeet Khaitan, Kamaljeet Verma and Pushpak Bhattacharyya, Exploiting Semantic Proximity for Information Retrieval, IJCAI 2007 Workshop on Cross Lingual Information Access, Hyderabad, India, January 2007.

Kamaljeet Verma and Pushpak Bhattacharyya, Context-Sensitive Semantic Smoothing using Semantically Relatable Sequences, submitted.
Conclusions and future work
Presented two approaches to UNL generation
Demonstrated the need for resources
Working on handling difficult language phenomena
WSD for choosing the correct UW
URLs
For resources: www.cfilt.iitb.ac.in
For publications: www.cse.iitb.ac.in/~pb