Slide 1

    Semantic Parsing

Pushpak Bhattacharyya
Computer Science and Engineering Department, IIT Bombay
[email protected]

With contributions from Rajat Mohanty, S. Krishna, Sandeep Limaye

Slide 2

    Motivation

- Semantics extraction has many applications: MT, IR, IE
- It does not come free; it is resource intensive:
  - Properties of words
  - Conditions for establishing relations between words
- Disambiguation at many levels
- Current computational parsing is less than satisfactory for deep semantic analysis

Slide 3

    Roadmap

- Current important parsers: experimental observations, handling of difficult language phenomena
- Brief introduction to the adopted semantic representation: Universal Networking Language (UNL)
- Two-stage process of UNL generation: approach 1
- Use of a better parser: approach 2
- Consolidating statement of resources
- Observations on the treatment of verbs
- Conclusions and future work

Slide 4

    Current parsers

Slide 5

    Categorization of parsers

Method \ Output   Constituency                           Dependency
Rule Based        Earley Chart (1970), CYK (1965-70),    Link (1991), Minipar (1993)
                  LFG (1970), HPSG (1985)
Probabilistic     Charniak (2000), Collins (1999),       Stanford (2006), MST (2005),
                  Stanford                               MALT (2007)

Slide 6

Observations on some well-known probabilistic constituency parsers

Slide 7

    Parsers investigated

- Charniak: probabilistic lexicalized bottom-up chart parser
- Collins: head-driven statistical beam-search parser
- Stanford: probabilistic A* parser
- RASP: probabilistic GLR parser

Slide 8

    Investigations based on

- Robustness to ungrammaticality
- Ranking in case of multiple parses
- Handling of embeddings
- Handling of multiple POS (words repeated with multiple POS)
- Complexity

Slide 9

Handling ungrammatical sentences

Slide 10

    Charniak

(S (NP (NNP Joe))
   (VP (AUX has)
       (VP (VBG reading)
           (NP (DT the) (NN book)))))

Joe has reading the book

Note: has labelled as AUX

Slide 11

    Collins

has should have been AUX

Slide 12

    Stanford

has is treated as VBZ and not AUX.

Slide 13

    RASP

Confuses this as a case of sentence embedding

Slide 14

Ranking in case of multiple parses

Slide 15

    Charniak

(S (NP (NNP John))
   (VP (VBD said)
       (SBAR (S (NP (NNP Mary))
                (VP (VBD sang)
                    (NP (DT the) (NN song))
                    (PP (IN with) (NP (NNP Max))))))))

John said Mary sang the song with Max

Note: the semantically correct parse is chosen from among the possible multiple parse trees

Slide 16

    Collins

Wrong attachment

Slide 17

    Stanford

Same as Charniak

Slide 18

RASP

Different POS tags, but parse trees are comparable

Slide 19

    Time complexity

Slide 20

    Time taken

54 instances of the sentence "This is just to check the time" are used to check the time taken.

Time taken: Collins: 40s, Stanford: 14s, Charniak: 8s, RASP: 5s

Reported complexity: Charniak: O(n^5), Collins: O(n^5), Stanford: O(n^3), RASP: not known

Slide 21

    Embedding Handling

Slide 22

    Charniak

(Simplified parse structure, with each relative clause correctly nested under the preceding NP:

(S (NP (NP the cat)
       (SBAR that (VP killed
             (NP (NP the rat)
                 (SBAR that (VP stole
                       (NP (NP the milk)
                           (SBAR that (VP spilled
                                 (PP on (NP (NP the floor)
                                            (SBAR that (VP was slippery)))))))))))))
   (VP escaped)))

The cat that killed the rat that stole the milk that spilled on the floor that was slippery escaped.

Slide 23

    Collins

Slide 24

    Stanford

Slide 25

    RASP

Slide 26

Handling words with multiple POS tags

Slide 27

Charniak

(S (NP (NNP Time))
   (VP (VBZ flies)
       (PP (IN like)
           (NP (DT an) (NN arrow)))))

Time flies like an arrow

Slide 28

    Collins

Slide 29

    Stanford

Slide 30

    RASP

Flies tagged as noun!

Slide 31

    Repeated Word handling

Slide 32

    Charniak

(Parse tree: the parser forces a clause analysis, tagging Buffalo as NNP and buffaloes as VBZ at each level, with stacked SBAR embeddings.)

Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes

Slide 33

    Collins

Slide 34

    Stanford

Slide 35

    RASP

    Tags all words as nouns!

Slide 36

    Sentence Length

Slide 37

    Sentence with 394 words

One day, Sam left his small, yellow home to head towards the meat-packing plant where he worked, a task which was never completed, as on his way, he tripped, fell, and went careening off of a cliff, landing on and destroying Max, who, incidentally, was also heading to his job at the meat-packing plant, though not the same plant at which Sam worked, which he would be heading to, if he had been aware that the plant he was currently heading towards had been destroyed just this morning by a mysterious figure clad in black, who hailed from the small, remote country of France, and who took every opportunity he could to destroy small meat-packing plants, due to the fact that as a child, he was tormented, and frightened, and beaten savagely by a family of meat-packing plants who lived next door, and scarred his little mind to the point where he became a twisted and sadistic creature, capable of anything, but specifically capable of destroying meat-packing plants, which he did, and did quite often, much to the chagrin of the people who worked there, such as Max, who was not feeling quite so much chagrin as most others would feel at this point, because he was dead as a result of an individual named Sam, who worked at a competing meat-packing plant, which was no longer a competing plant, because the plant that it would be competing against was, as has already been mentioned, destroyed in, as has not quite yet been mentioned, a massive, mushroom cloud of an explosion, resulting from a heretofore unmentioned horse manure bomb manufactured from manure harvested from the farm of one farmer J. P. Harvenkirk, and more specifically harvested from a large, ungainly, incontinent horse named Seabiscuit, who really wasn't named Seabiscuit, but was actually named Harold, and it completely baffled him why anyone, particularly the author of a very long sentence, would call him Seabiscuit; actually, it didn't baffle him, as he was just a stupid, manure-making horse, who was incapable of cognitive thought for a variety of reasons, one of which was that he was a horse, and the other of which was that he was just knocked unconscious by a flying chunk of a meat-packing plant, which had been blown to pieces just a few moments ago by a shifty character from France.

Slide 38

    Partial RASP Parse

    (|One_MC1| |day_NNT1| |,_,| |Sam_NP1| |leave+ed_VVD| |his_APP$| |small_JJ| |,_,| |yellow_JJ| |home_NN1| |to_TO| |head_VV0| |towards_II| |the_AT| |meat-packing_JJ| |plant_NN1| |where_RRQ| |he_PPHS1| |work+ed_VVD| |,_,| |a_AT1| |task_NN1| |which_DDQ| |be+ed_VBDZ| |never_RR| |complete+ed_VVN| |,_,| |as_CSA||on_II| |his_APP$| |way_NN1| |,_,| |he_PPHS1| |trip+ed_VVD| |,_,| |fall+ed_VVD| |,_,| |and_CC| |go+ed_VVD| |careen+ing_VVG| |off_RP| |of_IO| |a_AT1| |cliff_NN1| |,_,||land+ing_VVG| |on_RP| |and_CC| |destroy+ing_VVG| |Max_NP1| |,_,| |who_PNQS| |,_,| |incidentally_RR| |,_,| |be+ed_VBDZ| |also_RR| |head+ing_VVG| |to_II||his_APP$| |job_NN1| |at_II| |the_AT| |meat-packing_JB| |plant_NN1| |,_,| |though_CS| |not+_XX| |the_AT| |same_DA| |plant_NN1| |at_II| |which_DDQ| |Sam_NP1||work+ed_VVD| |,_,| |which_DDQ| |he_PPHS1| |would_VM| |be_VB0| |head+ing_VVG| |to_II| |,_,| |if_CS| |he_PPHS1| |have+ed_VHD| |be+en_VBN| |aware_JJ||that_CST| |that_CST| |the_AT| |plant_NN1| |he_PPHS1| |be+ed_VBDZ| |currently_RR| |head+ing_VVG| |towards_II| |have+ed_VHD| |be+en_VBN| |destroy+ed_VVN||just_RR| |this_DD1| |morning_NNT1| |by_II| |a_AT1| |mysterious_JJ| |figure_NN1| |clothe+ed_VVN| |in_II| |black_JJ| |,_,| |who_PNQS| |hail+ed_VVD| |from_II| |the_AT||small_JJ| |,_,| |remote_JJ| |country_NN1| |of_IO| |France_NP1| |,_,| |and_CC| |who_PNQS| |take+ed_VVD| |every_AT1| |opportunity_NN1| |he_PPHS1| |could_VM||to_TO| |destroy_VV0| |small_JJ| |meat-packing_NN1| |plant+s_NN2| |,_,| |due_JJ| |to_II| |the_AT| |fact_NN1| |that_CST| |as_CSA| |a_AT1| |child_NN1| |,_,| |he_PPHS1||be+ed_VBDZ| |torment+ed_VVN| |,_,| |and_CC| |frighten+ed_VVD| |,_,| |and_CC| |beat+en_VVN| |savagely_RR| |by_II| |a_AT1| |family_NN1| |of_IO| |meat-packing_JJ||plant+s_NN2| |who_PNQS| |live+ed_VVD| |next_MD| |door_NN1| |,_,| |and_CC| |scar+ed_VVD| |his_APP$| |little_DD1| |mind_NN1| |to_II| |the_AT| |point_NNL1||where_RRQ| |he_PPHS1| |become+ed_VVD| |a_AT1| |twist+ed_VVN| |and_CC| |sadistic_JJ| |creature_NN1| |,_,| |capable_JJ| |of_IO| |anything_PN1| |,_,| |but_CCB||specifically_RR| |capable_JJ| |of_IO| |destroy+ing_VVG| |meat-packing_JJ| |plant+s_NN2| |,_,| |which_DDQ| |he_PPHS1| |do+ed_VDD| |,_,| |and_CC| |do+ed_VDD||quite_RG| |often_RR| |,_,| |much_DA1| |to_II| |the_AT| |chagrin_NN1| |of_IO| |the_AT| |people_NN| |who_PNQS| |work+ed_VVD| |there_RL| |,_,| |such_DA| |as_CSA||Max_NP1| |,_,| |who_PNQS| |be+ed_VBDZ| |not+_XX| |feel+ing_VVG| |quite_RG| |so_RG| |much_DA1| |chagrin_NN1| |as_CSA| |most_DAT| |other+s_NN2| |would_VM|

    |feel_VV0| |at_II| |this_DD1| |point_NNL1| |,_,| |because_CS| |he_PPHS1| |be+ed_VBDZ| |dead_JJ| |as_CSA| |a_AT1| |result_NN1| |of_IO| |an_AT1| |individual_NN1||name+ed_VVN| |Sam_NP1| |,_,| |who_PNQS| |work+ed_VVD| |at_II| |a_AT1| |compete+ing_VVG| |meat-packing_JJ| |plant_NN1| |,_,| |which_DDQ| |be+ed_VBDZ||no_AT| |longer_RRR| |a_AT1| |compete+ing_VVG| |plant_NN1| |,_,| |because_CS| |the_AT| |plant_NN1| |that_CST| |it_PPH1| |would_VM| |be_VB0| |compete+ing_VVG||against_II| |be+ed_VBDZ| |,_,| |as_CSA| |have+s_VHZ| |already_RR| |be+en_VBN| |mention+ed_VVN| |,_,| |destroy+ed_VVN| |in_RP| |,_,| |as_CSA| |have+s_VHZ||not+_XX| |quite_RG| |yet_RR| |be+en_VBN| |mention+ed_VVN| |,_,| |a_AT1| |massive_JJ| |,_,| |mushroom_NN1| |cloud_NN1| |of_IO| |an_AT1| |explosion_NN1| |,_,||result+ing_VVG| |from_II| |a_AT1| |heretofore_RR| |unmentioned_JJ| |horse_NN1| |manure_NN1| |bomb_NN1| |manufacture+ed_VVN| |from_II| |manure_NN1||harvest+ed_VVN| |from_II| |the_AT| |farm_NN1| |of_IO| |one_MC1| |farmer_NN1| J._NP1 P._NP1 |Harvenkirk_NP1| |,_,| |and_CC| |more_DAR| |specifically_RR||harvest+ed_VVN| |from_II| |a_AT1| |large_JJ| |,_,| |ungainly_JJ| |,_,| |incontinent_NN1| |horse_NN1| |name+ed_VVN| |Seabiscuit_NP1| |,_,| |who_PNQS| |really_RR||be+ed_VBDZ| |not+_XX| |name+ed_VVN| |Seabiscuit_NP1| |,_,| |but_CCB| |be+ed_VBDZ| |actually_RR| |name+ed_VVN| |Harold_NP1| |,_,| |and_CC| |it_PPH1||completely_RR| |baffle+ed_VVD| |he+_PPHO1| |why_RRQ| |anyone_PN1| |,_,| |particularly_RR| |the_AT| |author_NN1| |of_IO| |a_AT1| |very_RG| |long_JJ||sentence_NN1| |,_,| |would_VM| |call_VV0| |he+_PPHO1| |Seabiscuit_NP1| |;_;| |actually_RR| |,_,| |it_PPH1| |do+ed_VDD| |not+_XX| |baffle_VV0| |he+_PPHO1| |,_,||as_CSA| |he_PPHS1| |be+ed_VBDZ| |just_RR| |a_AT1| |stupid_JJ| |,_,| |manure-making_NN1| |horse_NN1| |,_,| |who_PNQS| |be+ed_VBDZ| |incapable_JJ| |of_IO||cognitive_JJ| |thought_NN1| |for_IF| |a_AT1| |variety_NN1| |of_IO| |reason+s_NN2| |,_,| |one_MC1| |of_IO| |which_DDQ| |be+ed_VBDZ| |that_CST| |he_PPHS1||be+ed_VBDZ| |a_AT1| |horse_NN1| |,_,| |and_CC| |the_AT| |other_JB| |of_IO| |which_DDQ| |be+ed_VBDZ| |that_CST| |he_PPHS1| |be+ed_VBDZ| |just_RR||knock+ed_VVN| |unconscious_JJ| |by_II| |a_AT1| |flying_NN1| |chunk_NN1| |of_IO| |a_AT1| |meat-packing_JJ| |plant_NN1| |,_,| |which_DDQ| |have+ed_VHD||be+en_VBN| |blow+en_VVN| |to_II| |piece+s_NN2| |just_RR| |a_AT1| |few_DA2| |moment+s_NNT2| |ago_RA| |by_II| |a_AT1| |shifty_JJ| |character_NN1| |from_II||France_NP1| ._.) -1 ; ()

Slide 39

    What do we learn?

- All parsers have problems dealing with long sentences
- Complex language phenomena cause them to falter
- Good as starting points for structure detection, but need output correction very often

Slide 40

Needs of high-accuracy parsing (difficult language phenomena)

Language Phenomena \ Systems               Link   Charniak   Stanford   Machinese   MiniPar   Collins   Our System
                                                                        Syntax
Empty-PRO Detection                        No     No         No         No          Yes       No        Yes
Empty-PRO Resolution                       No     No         No         No          Yes       No        Yes
WH-Trace Detection                         Yes    No         No         No          Yes       No        Yes
Relative Pronoun Resolution                No     No         No         No          Yes       No        Yes
PP Attachment Resolution                   No     No         Yes        No          No        No        Yes
Clausal Attachment Resolution              Yes    No         No         No          Yes       No        Yes
Distinguishing Arguments from Adjuncts     No     No         No         No          No        No        Yes
Small Clause Detection                     No     Yes        Yes        No          Yes       No        Yes

Slide 41

Context of our work: Universal Networking Language (UNL)

Slide 42

A vehicle for machine translation

Much more demanding than the transfer approach or the direct approach

(Diagram: English, French, Hindi and Chinese connected to the Interlingua (UNL) hub, with analysis going into the interlingua and generation coming out of it)

Slide 43

    A United Nations project

- Started in 1996; a 10-year programme
- 15 research groups across continents
- First goal: generators
- Next goal: analysers (needs solving various ambiguity problems)
- Current active groups: UNL-Spanish, UNL-Russian, UNL-French, UNL-Hindi
- IIT Bombay concentrating on UNL-Hindi and UNL-English

Dave, Parikh and Bhattacharyya, Journal of Machine Translation, 2002

Slide 44

UNL represents knowledge: John eats rice with a spoon

(Diagram: a UNL graph whose nodes are universal words carrying attributes, linked by labelled semantic relations)
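As a textual sketch of what such a graph encodes (the exact UWs and attributes on the slide are not recoverable, so the restrictions and attributes below are assumed), the corresponding UNL expression would be roughly:

[unl]
agt(eat(icl>do).@entry.@present, John(iof>person))
obj(eat(icl>do).@entry.@present, rice(icl>food))
ins(eat(icl>do).@entry.@present, spoon(icl>artifact))
[\unl]

Here agt, obj and ins are the semantic relations, the parenthesized restrictions identify the universal words, and @entry/@present are attributes.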

Slide 45

Sentence embeddings: Mary claimed that she had composed a poem

(UNL graph, written out as an expression:)
agt(claim(icl>do).@entry.@past, Mary(iof>person))
obj(claim(icl>do).@entry.@past, :01)
agt:01(compose(icl>do).@entry.@past.@complete, she)
obj:01(compose(icl>do).@entry.@past.@complete, poem(icl>art))

Slide 46

    Relation repository

Number: 39. Groups:
- Agent-object-instrument: agt, obj, ins, met
- Time: tim, tmf, tmt
- Place: plc, plf, plt
- Restriction: mod, aoj
- Prepositions taking object: gol, frm
- Ontological: icl, iof, equ
- Etc.
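A few constructed illustrations of how such relations are written (these examples are not from the slides; UWs are simplified):

tim(leave(icl>do).@entry.@past, yesterday)
plc(meet(icl>do).@entry.@past, Delhi(iof>city))
gol(turn(icl>do).@entry, steam)   (as in "turn water into steam", used again in a later rule example)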

Slide 47

Semantically Relatable Sequences (SRS)

Mohanty, Dutta and Bhattacharyya, Machine Translation Summit, 2005

Slide 48

    Semantically Relatable Sequences(SRS)

Definition: A semantically relatable sequence (SRS) of a sentence is a group of unordered words in the sentence (not necessarily consecutive) that appear in the semantic graph of the sentence as linked nodes or nodes with speech-act labels.

Slide 49

    Example to illustrate SRS

The man bought a new car in June

(Semantic graph: bought (past tense) with agent man (the: definite), object car (a: indefinite) and time June (in: modifier); new is a modifier of car)

Slide 50

SRSs from "The man bought a new car in June"

a. {man, bought}
b. {bought, car}
c. {bought, in, June}
d. {new, car}
e. {the, man}
f. {a, car}

Slide 51

    Basic questions

- What are the SRSs of a given sentence?
- What semantic relations can link the words in an SRS?

Slide 52

    Postulate

A sentence needs to be broken into sets of at most three forms:

{CW, CW}
{CW, FW, CW}
{FW, CW}

where CW refers to a content word or a clause, and FW to a function word.

Slide 53

Language phenomena and SRS

    Clausal constructs

Slide 54

    Clausal constructs

Sentence: The boy said that he was reading a novel

a. {the, boy}
b. {boy, said}
c. {said, that, SCOPE}
d. SCOPE:{he, reading}
e. SCOPE:{reading, novel}
f. SCOPE:{a, novel}
g. SCOPE:{was, reading}

scope: umbrella for clauses or compounds
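For comparison with the SRS view, a rough UNL rendering of the same sentence, patterned on the "Mary claimed..." graph shown earlier (UWs simplified and attributes assumed, not taken from the slides):

[unl]
agt(say(icl>do).@entry.@past, boy(icl>person).@def)
obj(say(icl>do).@entry.@past, :01)
agt:01(read(icl>do).@entry.@past.@progress, he)
obj:01(read(icl>do).@entry.@past.@progress, novel(icl>book).@indef)
[\unl]

The :01 scope plays exactly the role that SCOPE plays in the SRSs above.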

Slide 55

Preposition Phrase (PP) Attachment

John published the article in June
{John, published}: {CW,CW}
{published, article}: {CW,CW}
{published, in, June}: {CW,FW,CW}
{the, article}: {FW,CW}

Contrast with:

The article in June was published by John
{The, article}: {FW,CW}
{article, in, June}: {CW,FW,CW}
{article, was, published}: {CW,CW}
{published, by, John}: {CW,CW}

Slide 56

    To-Infinitival

PRO element co-indexed with the object him:
  I forced John_i [PRO]_i to throw a party

PRO element co-indexed with the subject I:
  I_i promised John [PRO]_i to throw a party

SRSs are:
{I, forced}: {CW,CW}
{forced, John}: {CW,CW}
{forced, SCOPE}: {CW,CW}
SCOPE:{John, to, throw}: {CW,FW,CW}
SCOPE:{throw, party}: {CW,CW}
SCOPE:{a, party}: {FW,CW}

In the second sentence, John is replaced with I inside the scope: the analysis goes deeper than surface phenomena.

Slide 57

    Complexities of that

- Embedded clausal constructs, as opposed to relative clauses, need to be resolved:
  Mary claimed that she had composed a poem
  The poem that Mary composed was beautiful
- Dangling that:
  I told the child that I know that he played well

Slide 58

    Two possibilities

(Two attachments for the dangling that:

1. told: I, [the child that I know], [that he played well]
2. told: I, [the child], [that I know that he played well])

Slide 59

    SRS Implementation

Slide 60

Syntactic constituents to semantic constituents

- Used a probabilistic parser (Charniak, 2004)
- Output of the Charniak parser: tags give indications of CW and FW
  - NP, VP, ADJP and ADVP: CW
  - PP (prepositional phrase), IN (preposition) and DT (determiner): FW

Slide 61

Observation: Headwords of sibling nodes form SRSs

John has bought a car.

SRS: {has, bought}, {a, car}, {bought, car}

(Head-annotated parse tree:
(C) VP bought
  (F) AUX has
  (C) VP bought
    (C) VBD bought
    (C) NP car
      (F) DT a
      (C) NN car)
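A minimal sketch of this observation in code, assuming a toy (label, headword, children) tuple format for head-annotated parse trees; this is a hypothetical illustration, not the system's implementation:

```python
# Collect SRSs as pairs of headwords of sibling nodes.
from itertools import combinations

def srss(node, out=None):
    out = [] if out is None else out
    _label, _head, children = node
    for a, b in combinations(children, 2):   # sibling pairs
        out.append((a[1], b[1]))             # their headwords form an SRS
    for child in children:
        srss(child, out)                     # recurse into subtrees
    return out

# "John has bought a car" VP subtree from the slide:
tree = ("VP", "bought",
        [("AUX", "has", []),
         ("VP", "bought",
          [("VBD", "bought", []),
           ("NP", "car",
            [("DT", "a", []), ("NN", "car", [])])])])
print(srss(tree))  # [('has', 'bought'), ('bought', 'car'), ('a', 'car')]
```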

Slide 62

Work needed on the parse tree

Slide 63

Correction of wrong PP attachment

John has published an article on linguistics

- Use PP attachment heuristics
- Get {article, on, linguistics}

(Head-annotated parse tree fragment:
(C) VP published
  (C) VBD published
  (C) NP article
    (F) DT an
    (C) NN article
  (F) PP on
    (F) IN on
    (C) NP linguistics
      (C) NNS linguistics)

Slide 64

    To-Infinitival

- Clause boundary is the VP node, labeled with SCOPE.
- The tag of to is modified to TO, a FW tag, indicating that it heads a to-infinitival clause.
- The duplication and insertion of the NP node with head him (depicted by shaded nodes in the original figure) as a sibling of the VBD node with head forced is done to bring out the existence of a semantic relation between force and him.

(Modified parse tree fragment:
(C) VP forced
  (C) VBD forced
  (C) NP him
    (C) PRP him
  (C) S SCOPE
    (C) NP him   [duplicated node]
      (C) PRP him
    (C) VP ...
      (F) TO to)

Slide 65

Linking of clauses: John said that he was reading a novel

- Head of the S node is marked as SCOPE
- SRS: {said, that, SCOPE}
- Adverbial clauses have similar parse tree structures, except that the subordinating conjunctions are different from that

(Parse tree fragment:
(C) VP said
  (C) VBD said
  (F) SBAR that
    (F) IN that
    (C) S SCOPE)

Slide 66

Implementation: block diagram of the system

(Block diagram: the input sentence goes through the Charniak Parser to produce a parse tree; the tree is modified and augmented with head and scope information by the Scope Handler and Attachment Resolver, which consult WordNet 2.0 and a sub-categorization database; the augmented parse tree feeds the Semantically Relatable Sets Generator, which uses noun classification, THAT-clause and preposition subcat properties, and time and place features, and outputs the Semantically Relatable Sets.)

Slide 67

    Evaluation

- Used the Penn Treebank (LDC, 1995) as the test bed
- The un-annotated sentences, actually from the WSJ corpus (Charniak et al., 1987), were passed through the SRS generator
- Results were compared with the Treebank's annotated sentences

Slide 68

    Results on SRS generation

(Bar chart: recall and precision, on a 0-100 scale, for total SRSs and for the (FW,CW), (CW,FW,CW) and (CW,CW) forms.)

Slide 69

    Results on sentence constructs

(Bar chart: recall and precision, on a 0-100 scale, for to-infinitival clause resolution, complement-clause resolution, clause linkings and PP resolution.)

Slide 70

    SRS to UNL

Slide 71

    Features of the system

- High-accuracy resolution of different kinds of attachment
- Precise and fine-grained semantic relations between sentence constituents
- Empty-pronominal detection and resolution
- Exhaustive knowledge bases of sub-categorization frames, verb knowledge bases and rule templates for establishing semantic relations and speech-act-like attributes, built using:
  - Oxford Advanced Learner's Dictionary (Hornby, 2001)
  - VerbNet (Schuler, 2005)
  - WordNet 2.1 (Miller, 2005)
  - Penn Tree Bank (LDC, 1995), and
  - XTAG lexicon (XTAG, 2001)

Slide 72

Side effect: high-accuracy parsing (comparison with other parsers)

Language Phenomena \ Systems               Link   Charniak   Stanford   Machinese   MiniPar   Collins   Our System
                                                                        Syntax
Empty-PRO Detection                        No     No         No         No          Yes       No        Yes
Empty-PRO Resolution                       No     No         No         No          Yes       No        Yes
WH-Trace Detection                         Yes    No         No         No          Yes       No        Yes
Relative Pronoun Resolution                No     No         No         No          Yes       No        Yes
PP Attachment Resolution                   No     No         Yes        No          No        No        Yes
Clausal Attachment Resolution              Yes    No         No         No          Yes       No        Yes
Distinguishing Arguments from Adjuncts     No     No         No         No          No        No        Yes
Small Clause Detection                     No     Yes        Yes        No          Yes       No        Yes

Slide 73

Rules for generating semantic relations

A rule tests syntactic and semantic features of each element of a {CW1, FW, CW2} sequence and outputs REL(UW1, UW2):

CW1 (SynCat POS SemCat Lex)   FW (Lex)   CW2 (SynCat POS SemCat Lex)   Rel(UW1, UW2)
-  -  V020  -                 into       N  -  -     -                 gol(1, 3)     e.g., turn water into steam
V  -  -     -                 within     N  -  TIME  -                 dur(1, 3)     e.g., finish within a week

Slide 74

    Rules for generating attributes

String of FWs   CW    UNL attribute list generated
has_been        VBG   @present @complete @progress
has_been        VBN   @present @complete @passive

Slide 75

    System architecture

Slide 76

    Evaluation: scheme

Slide 77

    Evaluation: example

Input: He worded the statement carefully.

[unlGenerated:76]
agt(word.@entry, he)
obj(word.@entry, statement.@def)
man(word.@entry, carefully)
[\unl]

[unlGold:76]
agt(word.@entry.@past, he)
obj(word.@entry.@past, statement.@def)
man(word.@entry.@past, carefully)
[\unl]

F1-Score = 0.945

Not heavily punished, since attributes are not crucial to the meaning!

Slide 78

Approach 2: switch to rule-based parsing: LFG

Slide 79

Using functional structure from an LFG parser

Sentence: John eats a pastry

Functional structure (transfer facts):    UNL:
SUBJ(eat, John)                           agt(eat, John)
OBJ(eat, pastry)                          obj(eat, pastry)
VTYPE(eat, main)

Slide 80

    Lexical Functional Grammar

Considers two aspects:
- Lexical: considers lexical structures and relations
- Functional: considers grammatical functions of different constituents, like SUBJECT and OBJECT

Two structures:
- C-structure (constituent structure)
- F-structure (functional structure)

Languages vary in C-structure (word order, phrasal structure) but have the same functional structure (SUBJECT, OBJECT, etc.)

Slide 81

LFG structures: example

Sentence: He gave her a kiss.

(Figure: the sentence's C-structure tree and F-structure attribute-value matrix)

Slide 82

    XLE Parser

- Developed by Xerox Corporation
- Gives C-structures, F-structures and morphology of the sentence constituents
- Supports a packed rewriting system converting F-structures to transfer facts, used by our system
- Works on Solaris, Linux and MacOS X

Slide 83

    Notion of Transfer Facts

- Serialized representation of the functional structure
- Particularly useful for transfer-based MT systems
- We use it as the starting point for UNL generation

Slide 84

    Transfer Facts - Example

Sentence: The boy ate the apples hastily.

Transfer facts (selected):
ADJUNCT,eat:2,hastily:6
ADV-TYPE,hastily:6,vpadv
DET,apple:5,the:4
DET,boy:1,the:0
DET-TYPE,the:0,def
DET-TYPE,the:4,def
NUM,apple:5,pl
NUM,boy:1,sg
OBJ,eat:2,apple:5
PASSIVE,eat:2,-
PERF,eat:2,-_
PROG,eat:2,-_
SUBJ,eat:2,boy:1
TENSE,eat:2,past
VTYPE,eat:2,main
_SUBCAT-FRAME,eat:2,V-SUBJ-OBJ

Slide 85

Workflow in detail

Slide 86

    Phase 1: Sentence to transfer facts

Input: sentence
The boy ate the apples hastily.

Output: transfer facts (selected are shown here)
ADJUNCT,eat:2,hastily:6
ADV-TYPE,hastily:6,vpadv
DET,apple:5,the:4
DET,boy:1,the:0
DET-TYPE,the:0,def
DET-TYPE,the:4,def
NUM,apple:5,pl
NUM,boy:1,sg
OBJ,eat:2,apple:5
PASSIVE,eat:2,-
PERF,eat:2,-_
PROG,eat:2,-_
SUBJ,eat:2,boy:1
TENSE,eat:2,past
VTYPE,eat:2,main
_SUBCAT-FRAME,eat:2,V-SUBJ-OBJ

Slide 87

Phase 2: Transfer facts to word entry collection

Input: transfer facts as in the previous example
Output: word entry collection

Word entry eat:2, lex item: eat
(PERF:-_ PASSIVE:- _SUBCAT-FRAME:V-SUBJ-OBJ VTYPE:main SUBJ:boy:1 OBJ:apple:5 ADJUNCT:hastily:6 CLAUSE-TYPE:decl TENSE:past PROG:-_ MOOD:indicative)

Word entry boy:1, lex item: boy
(CASE:nom _LEX-SOURCE:countnoun-lex COMMON:count DET:the:0 NSYN:common PERS:3 NUM:sg)

Word entry apple:5, lex item: apple
(CASE:obl _LEX-SOURCE:morphology COMMON:count DET:the:4 NSYN:common PERS:3 NUM:pl)

Word entry hastily:6, lex item: hastily
(DEGREE:positive _LEX-SOURCE:morphology ADV-TYPE:vpadv)

Word entry the:0, lex item: the
(DET-TYPE:def)

Word entry the:4, lex item: the
(DET-TYPE:def)
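A compact sketch of this grouping step, assuming the transfer facts arrive as the comma-separated "FEATURE,word:index,value" lines shown on the previous slides; hypothetical code, not the actual system:

```python
# Group transfer facts by the word id in their second field.
from collections import defaultdict

facts = [
    "SUBJ,eat:2,boy:1", "OBJ,eat:2,apple:5", "TENSE,eat:2,past",
    "DET,boy:1,the:0", "NUM,boy:1,sg", "DET-TYPE,the:0,def",
]

entries = defaultdict(dict)            # word id -> {feature: value}
for fact in facts:
    feature, word, value = fact.split(",")
    entries[word][feature] = value

for word, feats in entries.items():
    lex = word.rsplit(":", 1)[0]       # "eat:2" -> lex item "eat"
    print(f"Word entry {word}, lex item: {lex}", feats)
```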

Slide 88

Phase 3(1): UW and attribute generation

Input: word entry collection
Output: Universal Words with (some) attributes generated

In our example:
UW (eat:2.@entry.@past)   UW (hastily:6)
UW (boy:1)                UW (the:0)
UW (apple:5.@pl)          UW (the:4)

(Table: example transfer facts and their mapping to UNL attributes)

Slide 89

Digression: subcat frames, arguments and adjuncts

Subcat frames and arguments:
- A predicate subcategorizes for its arguments; equivalently, arguments are governed by the predicate.
- Example: the predicate eat subcategorizes for a SUBJECT argument and an OBJECT argument. The corresponding subcat frame is V-SUBJ-OBJ.
- Arguments are mandatory for a predicate.

Adjuncts:
- Give additional information about the predicate
- Not mandatory
- Example: hastily in The boy ate the apples hastily.

Slide 90

    Phase 3(1): Handling of Subcat Frames

- Input: word entry collection
- Mapping of subcat frames to transfer facts
- Mapping of transfer facts to relations or attributes
- Output: relations and/or attributes

Example: for our sentence, the agt(eat, boy) and obj(eat, apple) relations are generated in this phase.
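A hedged sketch of this mapping for the single V-SUBJ-OBJ case of the running example, using the toy word-entry dicts from the Phase 2 sketch; the real system drives this from the rule bases shown on the next slides:

```python
# Map a V-SUBJ-OBJ subcat frame to agt/obj relations.
def subcat_relations(verb_id, entry):
    rels = []
    if entry.get("_SUBCAT-FRAME") == "V-SUBJ-OBJ" and entry.get("VTYPE") == "main":
        # agt assumes an animate subject; cf. the relation-design slide later
        rels.append(("agt", verb_id, entry["SUBJ"]))
        rels.append(("obj", verb_id, entry["OBJ"]))
    return rels

eat = {"_SUBCAT-FRAME": "V-SUBJ-OBJ", "VTYPE": "main",
       "SUBJ": "boy:1", "OBJ": "apple:5"}
print(subcat_relations("eat:2", eat))
# [('agt', 'eat:2', 'boy:1'), ('obj', 'eat:2', 'apple:5')]
```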

Slide 91

Rule bases for subcat handling: examples (1)

(Table: mapping subcat frames to transfer facts)

Slide 92

Rule bases for subcat handling: examples (2)

(Table: mapping subcat frames and transfer facts to relations/attributes; some simplified rules)

Slide 93

    Phase 3(2): Handling of adjuncts

- Input: word entry collection
- List of transfer facts to be considered for adjunct handling
- Rules for relation generation based on transfer facts and word properties
- Output: relations and/or attributes

Example: for our sentence, the man(eat, hastily) relation and the @def attributes for boy and apple are generated in this phase.
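A companion sketch for the adjunct and attribute side, again over toy word-entry dicts; the two rules below are simplifications assumed from the example, not the system's actual rule base:

```python
# A vpadv ADJUNCT yields man(); a definite determiner yields @def.
def adjunct_output(word_id, entry, entries):
    rels, attrs = [], []
    adj = entry.get("ADJUNCT")
    if adj and entries.get(adj, {}).get("ADV-TYPE") == "vpadv":
        rels.append(("man", word_id, adj))
    det = entry.get("DET")
    if det and entries.get(det, {}).get("DET-TYPE") == "def":
        attrs.append((word_id, "@def"))
    return rels, attrs

entries = {"eat:2": {"ADJUNCT": "hastily:6"}, "hastily:6": {"ADV-TYPE": "vpadv"},
           "boy:1": {"DET": "the:0"}, "the:0": {"DET-TYPE": "def"}}
for wid, e in entries.items():
    print(wid, adjunct_output(wid, e, entries))
# eat:2 -> man(eat:2, hastily:6); boy:1 -> @def on boy:1
```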

Slide 94

Rule bases for adjunct handling: examples (1)

(Table: mapping adjunct transfer facts to relations/attributes; some simplified rules)

Slide 95

Rule bases for adjunct handling: examples (2)

(Table: mapping adjuncts to relations/attributes based on prepositions; some example rules)

Slide 96

    Final UNL Expression

Sentence: The boy ate the apples hastily.

UNL expression:

[unl:1]
agt(eat:2.@entry.@past,boy:1.@def)
man(eat:2.@entry.@past,hastily:6)
obj(eat:2.@entry.@past,apple:5.@pl.@def)
[\unl]

Slide 97

Design of relation generation rules: an example

Relation assigned to the subject, by subject animacy and verb type:

Subject      do    be    occur
ANIMATE      agt   aoj   aoj
INANIMATE    aoj   aoj   obj

Slide 98

    Summary of Resources

Mohanty and Bhattacharyya, LREC 2008

Slide 99

Lexical Resources

(Diagram: a verb knowledgebase (semantic argument frames, syntactic-to-semantic argument mappings, verb senses), functional elements with grammatical attributes (auxiliary verbs, determiners, tense-aspect morphemes), a lexical knowledgebase with semantic attributes for N, V, A and Adv, and a syntactic argument database (PPs and clauses as syntactic arguments), all feeding SRS generation and UNL expression generation.)

Slide 100

Use of a number of lexical data

We have created these resources over a long period of time from:
- Oxford Advanced Learner's Dictionary (OALD) (Hornby, 2001)
- VerbNet (Schuler, 2005)
- Princeton WordNet 2.1 (Miller, 2005)
- LCS database (Dorr, 1993)
- Penn Tree Bank (LDC, 1995), and
- XTAG lexicon (XTAG Research Group, 2001)

Slide 101

    Verb Knowledge Base (VKB) Structure

Slide 102

    VKB statistics

- 4115 unique verbs
- 22000 rows (different senses)
- 189 verb groups

Slide 103

Verb categorization in UNL and its relationship to traditional verb categorization

                   Traditional (syntactic)
UNL (semantic)     Transitive (has direct object)   Intransitive
Do (action)        Ram pulls the rope               Ram goes home (ergative languages)
Be (state)         Ram knows mathematics            Ram sleeps
Occur (event)      Ram forgot mathematics           Earth cracks

Intransitives split into: Unergative (syntactic subject = semantic agent) and Unaccusative (syntactic subject ≠ semantic agent)

Slide 104

Accuracy on various phenomena and corpora

Slide 105

    Applications

Slide 106

    MT and IR

- Smriti Singh, Mrugank Dalal, Vishal Vachani, Pushpak Bhattacharyya and Om Damani, Hindi Generation from Interlingua, Machine Translation Summit (MTS 07), Copenhagen, September 2007.

- Sanjeet Khaitan, Kamaljeet Verma and Pushpak Bhattacharyya, Exploiting Semantic Proximity for Information Retrieval, IJCAI 2007 Workshop on Cross Lingual Information Access, Hyderabad, India, January 2007.

- Kamaljeet Verma and Pushpak Bhattacharyya, Context-Sensitive Semantic Smoothing using Semantically Relatable Sequences, submitted.

Slide 107

    Conclusions and future work

- Presented two approaches to UNL generation
- Demonstrated the need for resources
- Working on handling difficult language phenomena
- WSD for choosing the correct UW

Slide 108

    URLs

For resources: www.cfilt.iitb.ac.in
For publications: www.cse.iitb.ac.in/~pb
