A Multilanguage Non-Projective Dependency Parser
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa
Language and Intelligence
“Understanding cannot be measured by external behavior; it is an internal metric of how the brain remembers things and uses its memories to make predictions”.
“The difference between the intelligence of humans and other mammals is that we have language”.
Jeff Hawkins, “On Intelligence”, 2004
Hawkins’ Memory-Prediction Framework
The brain uses vast amounts of memory to create a model of the world. Everything you know and have learned is stored in this model. The brain uses this memory-based model to make continuous predictions of future events. It is the ability to make predictions about the future that is the crux of intelligence.
More …
“Spoken and written words are just patterns in the world…
The syntax and semantics of language are not different from the hierarchical structure of everyday objects.
We associate spoken words with our memory of their physical and semantic counterparts.
Through language one human can invoke memories and create new juxtapositions of mental objects in another human.”
Conclusion
The ability to process language should be essential in many computer applications
Why is NLP not needed in IR?
Document retrieval as the primary measure of information retrieval success
Document retrieval reduces the need for NLP techniques
– Discourse factors can be ignored
– Query words perform word-sense disambiguation
Lack of robustness:
– NLP techniques are typically not as robust as word indexing
Question Answering
Question Answering from Open-Domain Text
Search engines return a list of (possibly) relevant documents
Users still have to dig through the returned list to find the answer
QA: give the user a (short) answer to their question, perhaps supported by evidence
The Google answer #1
Include question words (why, who, etc.) in the stop-list
Do standard IR
Sometimes this (sort of) works:
– Question: Who was the prime minister of Australia during the Great Depression?
– Answer: James Scullin (Labor) 1929–31
Page about Curtin (WW II Labor Prime Minister) (can deduce answer)
Page about Curtin (WW II Labor Prime Minister) (lacks answer)
Page about Chifley (Labor Prime Minister) (can deduce answer)
But often it doesn’t…
Question: How much money did IBM spend on advertising in 2002?
Answer: I dunno, but I’d like to …
The Google answer #2
Take the question and try to find it as a string on the web
Return the next sentence on that web page as the answer
Works brilliantly if this exact question appears as a FAQ question, etc.
Works lousily most of the time
But, wait …
AskJeeves
AskJeeves was the most hyped example of “question answering”
– It has basically given up now: just web search, except when there are factoid answers of the sort MSN also does
It largely did pattern matching to match your question against their own knowledge base of questions
If that works, you get the human-curated answers to that known question
If that fails, it falls back to regular web search
A potentially interesting middle ground, but a fairly weak shadow of real QA
Question Answering at TREC
Consists of answering a set of 500 fact-based questions, e.g. “When was Mozart born?”
Systems were allowed to return 5 ranked answer snippets per question.
– IR think
– Mean Reciprocal Rank (MRR) scoring:
  • 1, 0.5, 0.33, 0.25, 0.2, 0 points for ranks 1, 2, 3, 4, 5, 6+
– Mainly Named Entity answers (person, place, date, …)
Since 2002, systems have been allowed to return only a single exact answer
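The MRR scheme above can be sketched in a few lines of Python (a minimal illustration, not TREC's official scorer; the `cutoff` parameter reflects the 5-snippet limit):

```python
def mean_reciprocal_rank(ranks, cutoff=5):
    """MRR as used at TREC: each question scores 1/rank of the first
    correct answer snippet, or 0 if none appears within the cutoff.
    `ranks` holds the 1-based rank of the first correct answer for each
    question, or None when no returned snippet was correct."""
    scores = [1.0 / r if r is not None and r <= cutoff else 0.0
              for r in ranks]
    return sum(scores) / len(scores)

# Ranks 1, 2, 3 plus one unanswered question:
print(mean_reciprocal_rank([1, 2, 3, None]))  # (1 + 0.5 + 0.333...) / 4
```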
TREC 2000 Results (long)
[Bar chart: MRR (0–0.8) of TREC 2000 systems: SMU, Queens, Waterloo, IBM, LIMSI, NTT, IC, Pisa, with SMU well ahead]
Falcon
The Falcon system from SMU was by far the best performing system at TREC 2000
It used NLP and performed deep semantic processing
Question parse
Who was the first Russian astronaut to walk in space
WP VBD DT JJ NNP NP TO VB IN NN
[Constituent parse tree: NP, PP, VP and S nodes over the tagged question]
Question semantic form
[Semantic graph: walk connects astronaut (modifiers: first, Russian; answer type PERSON) and space]
Question logic form: first(x) astronaut(x) Russian(x) space(z) walk(y, z, x) PERSON(x)
Answer type: PERSON
TREC 2001: no NLP
The best system, from Insight Software, used surface patterns
AskMSR uses a Web Mining approach, retrieving suggestions from Web searches
Insight Software: Surface Patterns Approach
Best at TREC 2001: 0.68 MRR
Use of characteristic phrases: “When was <person> born”
– Typical answers:
  • “Mozart was born in 1756.”
  • “Gandhi (1869-1948)...”
– Suggests phrases (regular expressions) like:
  • “<NAME> was born in <BIRTHDATE>”
  • “<NAME> ( <BIRTHDATE>-”
– Use of regular expressions can help locate the correct answer
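The surface-pattern idea can be sketched as follows (a hypothetical illustration: the pattern templates mirror the two answer phrasings above, and the function name is invented):

```python
import re

# Templates derived from typical answer phrasings; {name} is filled in
# with the question's person, and the capture group grabs the year.
PATTERNS = [
    r"{name} was born in (\d{{4}})",   # "Mozart was born in 1756."
    r"{name} \((\d{{4}})-",            # "Gandhi (1869-1948)..."
]

def find_birth_year(name, text):
    """Scan text with each instantiated pattern; return the first match."""
    for template in PATTERNS:
        match = re.search(template.format(name=re.escape(name)), text)
        if match:
            return match.group(1)
    return None

print(find_birth_year("Mozart", "Mozart was born in 1756."))   # 1756
print(find_birth_year("Gandhi", "Gandhi (1869-1948) led..."))  # 1869
```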
AskMSR: Web Mining
Step 1: Rewrite queries
Intuition: the user’s question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
– The Louvre Museum is located in Paris
– Who created the character of Scrooge?
– Charles Dickens created the character of Scrooge.
Query rewriting
Classify the question into seven categories
– Who is/was/are/were …?
– When is/did/will/are/were …?
– Where is/are/were …?
a. Category-specific transformation rules, e.g. “For Where questions, move ‘is’ to all possible locations”:
“Where is the Louvre Museum located”
“is the Louvre Museum located”
“the is Louvre Museum located”
“the Louvre is Museum located”
“the Louvre Museum is located”
“the Louvre Museum located is”
b. Expected answer “datatype” (e.g. Date, Person, Location, …)
“When was the French Revolution?” → DATE
Hand-crafted classification/rewrite/datatype rules
Nonsense, but who cares? It’s only a few more queries to Google.
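The “move ‘is’ to all possible locations” rule can be sketched in Python (a minimal illustration of one category; the real AskMSR rules are hand-crafted and cover seven categories):

```python
def rewrite_where_question(question):
    """Generate rewrites for a 'Where is ...?' question by moving 'is'
    into every possible position of the remaining words."""
    words = question.rstrip("?").split()
    if words[0].lower() != "where" or "is" not in [w.lower() for w in words]:
        return []
    rest = [w for w in words[1:] if w.lower() != "is"]
    rewrites = []
    for i in range(len(rest) + 1):
        candidate = rest[:i] + ["is"] + rest[i:]
        rewrites.append(" ".join(candidate))
    return rewrites

for q in rewrite_where_question("Where is the Louvre Museum located?"):
    print(q)
# "is the Louvre Museum located", "the is Louvre Museum located", ...
```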
Step 2: Query search engine
Send all rewrites to a Web search engine
Retrieve top N answers
For speed, rely just on the search engine’s “snippets”, not the full text of the actual document
Nevertheless …
NLP technologies are used
Question Analysis:
– identify the semantic type of the expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper nouns and numeric amounts in text
Parsing in QA
Top systems at TREC 2005 perform parsing of queries and answer paragraphs
Some use a specially built parser
Parsers are slow: ~1 min/sentence
Parsing Technology
Constituent Parsing
Requires a phrase structure grammar
– CFG, PCFG, Unification Grammar
Produces a phrase structure parse tree
Rolls-Royce Inc. said it expects its sales to remain steady
[Constituent parse tree with NP, VP, ADJP and S nodes over the sentence]
Statistical Methods in NLP
Some NLP problems:
– Information extraction
  • Named entities, relationships between entities, etc.
– Finding linguistic structure
  • Part-of-speech tagging, chunking, parsing
These can be cast as learning a mapping:
– Strings to hidden state sequences
  • NE extraction, POS tagging
– Strings to strings
  • Machine translation
– Strings to trees
  • Parsing
– Strings to relational data structures
  • Information extraction
Techniques
– Log-linear (Maximum Entropy) taggers
– Probabilistic context-free grammars (PCFGs)
– Discriminative methods:
  • Conditional MRFs, Perceptron, kernel methods
Learning mapping
Strings to hidden state sequences
– NE extraction, POS tagging
Strings to strings
– Machine translation
Strings to trees
– Parsing
Strings to relational data structures
– Information extraction
POS as Tagging
INPUT:
Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT:
Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ./.
NE as Tagging
INPUT:
Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT:
Profits/O soared/O at/O Boeing/BC Co./IC ,/O easily/O topping/O forecasts/O on/NA Wall/BL Street/IL ./O
Statistical Parsers
Probabilistic generative models of language which include parse structure (e.g. Collins 1997)
– Learning consists in estimating the parameters of the model with simple likelihood-based techniques
Conditional parsing models (Charniak 2000; McDonald 2005)
Results
Method Accuracy
PCFGs (Charniak 97) 73.0%
Conditional Models – Decision Trees (Magerman 95) 84.2%
Lexical Dependencies (Collins 96) 85.5%
Conditional Models – Logistic (Ratnaparkhi 97) 86.9%
Generative Lexicalized Model (Charniak 97) 86.7%
Generative Lexicalized Model (Collins 97) 88.2%
Logistic-inspired Model (Charniak 99) 89.6%
Boosting (Collins 2000) 89.8%
Linear Models for Parsing and Tagging
Three components:
– GEN is a function from a string to a set of candidates
– Φ maps a candidate to a feature vector
– W is a parameter vector
Component 1: GEN
GEN enumerates a set of candidates for a sentence:
She announced a program to promote safety in trucks and vans
[GEN maps the sentence to a set of candidate parse trees]
Examples of GEN
A context-free grammar
A finite-state machine
Top N most probable analyses from a probabilistic grammar
Component 2: Φ
Φ maps a candidate to a feature vector ∈ Rd
Φ defines the representation of a candidate
[Example: a candidate parse tree is mapped to the vector <1, 0, 2, 0, 0, 15, 5>]
Feature
A “feature” is a function on a structure, e.g.,
h(x) = number of times the tree fragment [A → B C] is seen in x
Feature vector:
A set of functions h1…hd define a feature vector
Φ(x) = <h1(x), h2(x), …, hd(x)>
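As an illustrative sketch of Φ: each feature h_i counts how often some structural fragment occurs in the candidate. The fragment names and the list-of-rules representation of a candidate are assumptions made for the example, not the deck's actual feature set.

```python
def make_counting_feature(fragment):
    """Build a feature h: count how many times `fragment` is seen in x."""
    def h(x):
        return x.count(fragment)
    return h

# d = 3 features, one per rule fragment of interest.
features = [make_counting_feature(f)
            for f in ["S -> NP VP", "NP -> DT NN", "VP -> VBD NP"]]

def phi(x):
    """Map a candidate x (here, its list of rule fragments) to R^d."""
    return [h(x) for h in features]

tree = ["S -> NP VP", "NP -> DT NN", "NP -> DT NN", "VP -> VBD NP"]
print(phi(tree))  # [1, 2, 1]
```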
Component 3: W
W is a parameter vector ∈ Rd
Φ · W maps a candidate to a real-valued score
Putting it all together
X is the set of sentences, Y the set of possible outputs (e.g. trees)
Need to learn a function F: X → Y
GEN, Φ and W define
F(x) = argmax_{y ∈ GEN(x)} Φ(y) · W
Choose the highest scoring tree as the most plausible structure
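A toy sketch of F(x) = argmax over y in GEN(x) of Φ(y)·W, with GEN, Φ and W stubbed out by hand (the candidate names and feature vectors are invented for illustration):

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

# Two toy candidates with fixed feature vectors standing in for Φ(y).
candidates = {"tree_a": [1, 0, 2], "tree_b": [0, 1, 1]}
GEN = lambda x: list(candidates)      # enumerate candidates for x
phi = lambda y: candidates[y]         # feature vector of a candidate
W = [0.5, 2.0, 1.0]                   # learned parameter vector

def F(x):
    """Return the highest-scoring candidate for input x."""
    return max(GEN(x), key=lambda y: dot(phi(y), W))

print(F("some sentence"))  # tree_b (score 3.0 beats tree_a's 2.5)
```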
Constituent Parsing
Requires a grammar
– CFG, PCFG, Unification Grammar
Produces a phrase structure parse tree
Rolls-Royce Inc. said it expects its sales to remain steady
[Constituent parse tree with NP, VP, ADJP and S nodes over the sentence]
Dependency Tree
Word-word dependency relations
Far easier to understand and to annotate
Rolls-Royce Inc. said it expects its sales to remain steady
Inductive Dependency Parser
Traditional statistical parsers are trained directly on the task of tagging a sentence
Instead, an inductive parser is trained on, and learns, the sequence of parse actions required to build the parse tree
Grammar Not Required
A traditional parser requires a grammar for generating candidate trees
An inductive parser needs no grammar
Parsing as Classification
Inductive dependency parsing
Parsing based on Shift/Reduce actions
Learn from an annotated corpus which action to perform at each step
Parser Actions
[Figure: parsing “Ho visto una ragazza con gli occhiali .” (VER:aux VER:pper DET NOM PRE DET NOM POS), with Shift, Left and Right actions applied to the top of the stack and the next input token]
Dependency Graph
Let R = {r1, …, rm} be the set of permissible dependency types
A dependency graph for a string of words W = w1 … wn is a labeled directed graph D = (W, A), where:
(a) W is the set of nodes, i.e. word tokens in the input string,
(b) A is a set of labeled arcs (wi, r, wj), with wi, wj ∈ W, r ∈ R,
(c) for every wj ∈ W, there is at most one arc (wi, r, wj) ∈ A.
Parser State
The parser state is a quadruple ⟨S, I, T, A⟩, where:
– S is a stack of partially processed tokens
– I is a list of (remaining) input tokens
– T is a stack of temporary tokens
– A is the arc relation for the dependency graph
(w, r, h) ∈ A represents an arc w → h, tagged with dependency r
Parser Actions
Shift   ⟨S, n|I, T, A⟩ → ⟨n|S, I, T, A⟩
Right   ⟨s|S, n|I, T, A⟩ → ⟨S, n|I, T, A ∪ {(s, r, n)}⟩
Left    ⟨s|S, n|I, T, A⟩ → ⟨S, s|I, T, A ∪ {(n, r, s)}⟩
Parser Algorithm
The parsing algorithm is fully deterministic and works as follows:

Input Sentence: (w1, p1), (w2, p2), … , (wn, pn)
S = <>
T = <(w1, p1), (w2, p2), … , (wn, pn)>
L = <>
while T != <> do begin
  x = getContext(S, T, L);
  y = estimateAction(model, x);
  performAction(y, S, T, L);
end
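A runnable sketch of this deterministic loop, using the Shift/Right/Left rules from the Parser Actions slide (the temporary stack T of the non-projective actions is omitted, and estimateAction is replaced by replaying a given action sequence):

```python
def perform_action(action, S, I, A):
    kind, rel = action
    if kind == "Shift":
        S.append(I.pop(0))
    elif kind == "Right":            # add arc (s, rel, n): s depends on n
        s = S.pop()
        A.append((s, rel, I[0]))
    elif kind == "Left":             # add arc (n, rel, s): n depends on s
        s = S.pop()
        n = I.pop(0)
        A.append((n, rel, s))
        I.insert(0, s)               # s goes back to the front of the input

def parse(tokens, actions):
    """Apply a sequence of actions; `actions` stands in for the
    classifier calls estimateAction(model, x)."""
    S, I, A = [], list(tokens), []
    for action in actions:
        perform_action(action, S, I, A)
    return A

# "Ho visto": the auxiliary "Ho" depends on the verb "visto".
print(parse(["Ho", "visto"], [("Shift", None), ("Right", "aux")]))
# [('Ho', 'aux', 'visto')]
```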
Learning Phase
Learning Features
Feature  Value
W    word
L    lemma
P    part of speech (POS) tag
M    morphology: e.g. singular/plural
W<   word of the leftmost child node
L<   lemma of the leftmost child node
P<   POS tag of the leftmost child node, if present
M<   whether the leftmost child node is singular/plural
W>   word of the rightmost child node
L>   lemma of the rightmost child node
P>   POS tag of the rightmost child node, if present
M>   whether the rightmost child node is singular/plural
Learning Event
[Figure: parser configuration on the fragment “Sosteneva che le leggi anti Serbia che erano discusse …”, showing the left context, the target nodes and the right context]
context:
(-3, W, che), (-3, P, PRO),
(-2, W, leggi), (-2, P, NOM), (-2, M, P), (-2, W<, le), (-2, P<, DET), (-2, M<, P),
(-1, W, anti), (-1, P, ADV),
(0, W, Serbia), (0, P, NOM), (0, M, S),
(+1, W, che), (+1, P, PRO), (+1, W>, erano), (+1, P>, VER), (+1, M>, P),
(+2, W, ,), (+2, P, PON)
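A hedged sketch of how such (offset, feature, value) events might be extracted around a focus token; the function name, the dict-based token representation and the simplified fragment are assumptions, and the child features (W<, P>, …) are omitted:

```python
def extract_context(tokens, focus, window=(-3, 2)):
    """Emit (offset, feature, value) tuples for a window of tokens
    around position `focus`; tokens are dicts with 'W' (word) and
    'P' (POS tag) keys."""
    feats = []
    for off in range(window[0], window[1] + 1):
        i = focus + off
        if 0 <= i < len(tokens):
            feats.append((off, "W", tokens[i]["W"]))
            feats.append((off, "P", tokens[i]["P"]))
    return feats

# Simplified fragment "le leggi anti Serbia" with Serbia as focus.
sent = [{"W": "le", "P": "DET"}, {"W": "leggi", "P": "NOM"},
        {"W": "anti", "P": "ADV"}, {"W": "Serbia", "P": "NOM"}]
print(extract_context(sent, 3))
```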
Parser Architecture
Modular learner architecture:
– Maximum Entropy, MBL, SVM, Winnow, Perceptron
Features can be selected
Features Used in Experiments
LemmaFeatures     -2 -1 0 1 2 3
PosFeatures       -2 -1 0 1 2 3
MorphoFeatures    -1 0 1 2
DepFeatures       -1 0
PosLeftChildren   2
PosLeftChild      -1 0
DepLeftChild      -1 0
PosRightChildren  2
PosRightChild     -1 0
DepRightChild     -1
PastActions       1
Projectivity
An arc wi→wk is projective iff ∀j, i < j < k or i > j > k, wi →* wj
A dependency tree is projective iff every arc is projective
Intuitively: arcs can be drawn on a plane without intersections
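The "no intersections" intuition yields a simple check (a sketch phrased as "no crossing arcs", which for dependency trees matches the definition above; `heads[i]` is the index of word i's head, with -1 marking the root, and a virtual root node at position -1 makes arcs spanning the root count too):

```python
def is_projective(heads):
    """True iff the tree given by the head indices has no crossing arcs."""
    # Each arc becomes an interval (min, max); the root's head -1 acts
    # as a virtual node to the left of the sentence.
    arcs = [(min(d, h), max(d, h)) for d, h in enumerate(heads)]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:        # intervals (i, j) and (k, l) cross
                return False
    return True

print(is_projective([1, -1, 1]))     # True: a simple chain
print(is_projective([2, 3, -1, 2]))  # False: arcs 0-2 and 1-3 cross
```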
Non-Projective
Většinu těchto přístrojů lze take používat nejen jako fax , ale
Actions for non-projective arcs
Right2   ⟨s1|s2|S, n|I, T, A⟩ → ⟨s1|S, n|I, T, A ∪ {(s2, r, n)}⟩
Left2    ⟨s1|s2|S, n|I, T, A⟩ → ⟨s2|S, s1|I, T, A ∪ {(n, r, s2)}⟩
Right3   ⟨s1|s2|s3|S, n|I, T, A⟩ → ⟨s1|s2|S, n|I, T, A ∪ {(s3, r, n)}⟩
Left3    ⟨s1|s2|s3|S, n|I, T, A⟩ → ⟨s2|s3|S, s1|I, T, A ∪ {(n, r, s3)}⟩
Extract  ⟨s1|s2|S, n|I, T, A⟩ → ⟨n|s1|S, I, s2|T, A⟩
Insert   ⟨S, I, s1|T, A⟩ → ⟨s1|S, I, T, A⟩
Example
Right2 (nejen → ale) and Left3 (fax → Většinu)
Většinu těchto přístrojů lze take používat nejen jako fax , ale
Examples
zou gemaakt moeten worden in
zou moeten worden gemaakt in
Extract followed by Insert
Experiments
Three classifiers: one to decide between Shift/Reduce, one to decide which Reduce action, and a third one to choose the dependency in case of a Left/Right action
Two classifiers: one to decide which action to perform and a second one to choose the dependency in case of a Left/Right action
CoNLL-X Shared Task
To assign labeled dependency structures for a range of languages by means of a fully automatic dependency parser
Input: tokenized and tagged sentences
Tags: token, lemma, POS, morpho features, ref. to head, dependency label
For each token, the parser must output its head and the corresponding dependency relation
CoNLL-X: Data Format
N   WORD          LEMMA        CPOS  POS       FEATS                HEAD  DEPREL  PHEAD  PDEPREL
1   A             o            art   art       <artd>|F|S           2     >N      _      _
2   direcção      direcção     n     n         F|S                  4     SUBJ    _      _
3   já            já           adv   adv       _                    4     ADVL    _      _
4   mostrou       mostrar      v     v-fin     PS|3S|IND            0     STA     _      _
5   boa_vontade   boa_vontade  n     n         F|S                  4     ACC     _      _
6   ,             ,            punc  punc      _                    4     PUNC    _      _
7   mas           mas          conj  conj-c    <co-vfin>|<co-fmc>   4     CO      _      _
8   a             o            art   art       <artd>|F|S           9     >N      _      _
9   greve         greve        n     n         F|S                  10    SUBJ    _      _
10  prossegue     prosseguir   v     v-fin     PR|3S|IND            4     CJT     _      _
11  em            em           prp   prp       _                    10    ADVL    _      _
12  todas_as      todo_o       pron  pron-det  <quant>|F|P          13    >N      _      _
13  delegações    delegaçõo    n     n         F|P                  11    P<      _      _
14  de            de           prp   prp       <sam->               13    N<      _      _
15  o             o            art   art       <-sam>|<artd>|M|S    16    >N      _      _
16  país          país         n     n         M|S                  14    P<      _      _
17  .             .            punc  punc      _                    4     PUNC    _      _
CoNLL-X: Languages
The same parser should handle all languages
13 languages:
– Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, Japanese, German, Portuguese, Slovene, Spanish, Swedish, Turkish
CoNLL-X: Collections
Ar Cn Cz Dk Du De Jp Pt Sl Sp Se Tr Bu
K tokens 54 337 1,249 94 195 700 151 207 29 89 191 58 190
K sents 1.5 57.0 72.7 5.2 13.3 39.2 17.0 9.1 1.5 3.3 11.0 5.0 12.8
Tokens/sentence 37.2 5.9 17.2 18.2 14.6 17.8 8.9 22.8 18.7 27.0 17.3 11.5 14.8
CPOSTAG 14 22 12 10 13 52 20 15 11 15 37 14 11
POSTAG 19 303 63 24 302 52 77 21 28 38 37 30 53
FEATS 19 0 61 47 81 0 4 146 51 33 0 82 50
DEPREL 27 82 78 52 26 46 7 55 25 21 56 25 18
% non-proj. relations 0.4 0.0 1.9 1.0 5.4 2.3 1.1 1.3 1.9 0.1 1.0 1.5 0.4
% non-proj. sentences 11.2 0.0 23.2 15.6 36.4 27.8 5.3 18.9 22.2 1.7 9.8 11.6 5.4
CoNLL: Evaluation Metrics
Labeled Attachment Score (LAS)
– proportion of “scoring” tokens that are assigned both the correct head and the correct dependency relation label
Unlabeled Attachment Score (UAS)
– proportion of “scoring” tokens that are assigned the correct head
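Both metrics can be sketched over per-token (head, deprel) pairs (a minimal illustration, not the official CoNLL-X scorer, which also defines which tokens "score"):

```python
def attachment_scores(gold, predicted):
    """gold, predicted: parallel lists of (head, deprel) per scoring token.
    Returns (LAS, UAS)."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return las, uas

gold = [(2, "SUBJ"), (0, "STA"), (2, "ACC")]
pred = [(2, "SUBJ"), (0, "STA"), (2, "ADVL")]  # right head, wrong label
las, uas = attachment_scores(gold, pred)
print(las, uas)  # LAS 2/3, UAS 1.0
```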
CoNLL-X Shared Task Results
Language     Maximum Entropy                      MBL
             LAS%  UAS%  Train s  Parse s         LAS%  UAS%  Train s  Parse s
Arabic 56.43 70.96 181 2.6 59.70 74.69 24 950
Bulgarian 81.15 86.71 452 1.5 79.17 85.92 88 353
Chinese 81.19 86.10 1,156 1.8 72.17 83.08 540 478
Czech 62.10 73.44 13,800 12.8 69.20 80.22 496 13,500
Danish 75.25 80.96 386 3.2 76.13 83.65 52 627
Dutch 67.79 72.71 679 3.3 68.97 74.73 132 923
Japanese 84.17 87.15 129 0.8 83.39 86.73 44 97
German 75.88 80.25 9,315 4.3 79.79 84.31 1,399 3,756
Portuguese 79.40 87.58 1,044 4.9 80.97 87.74 160 670
Slovene 61.97 73.18 98 3.0 62.67 76.60 16 547
Spanish 72.35 76.06 204 2.4 74.37 79.70 54 769
Swedish 75.20 83.03 1,424 2.9 74.85 83.73 96 1,177
Turkish 49.27 65.29 177 2.3 47.58 65.25 43 727
CoNLL-X: Overall Results
Language     LAS Average  LAS Ours  UAS Average  UAS Ours
Arabic       59.94        59.70     73.48        74.69
Bulgarian    79.98        81.15     85.89        86.71
Chinese      78.32        81.19     84.85        86.10
Czech        67.17        69.20     77.01        80.22
Danish       78.31        76.13     84.52        83.65
Dutch        70.73        68.97     75.07        74.73
Japanese     85.86        84.17     89.05        87.15
German       78.58        79.79     82.60        84.31
Portuguese   80.63        80.97     86.46        87.74
Slovene      65.16        62.67     76.53        76.60
Spanish      73.52        74.37     77.76        79.70
Swedish      76.44        74.85     84.21        83.73
Turkish      55.95        49.27     69.35        65.29
Average scores from 36 participant submissions
Well-formed Parse Tree
A graph D = (W, A) is well-formed iff it is acyclic, projective and connected
Multiple Heads
Examples include:
– verb coordination, in which the subject or object is an argument of several verbs
– relative clauses, in which words must satisfy dependencies both inside and outside the clause
Examples
Il governo garantirà sussidi a coloro che cercheranno lavoro
He designs and develops programs
Solution
Il governo garantirà sussidi a coloro che cercheranno lavoro
He designs and develops programs
[Dependency trees with multiple heads, using the labels N<PRED, SUBJ and ACC]
Italian Treebank
Using the SI-TAL collection from CNR ILC
Annotations split into separate morpho & functional files
Not all tokens have relations, some have more than one, no accents, …
Implemented some heuristics to generate a corpus in CoNLL format
Tool for visualization and annotation
DgAnnotator
A GUI tool for:
– Annotating texts with dependency relations
– Visualizing and comparing trees
– Generating corpora in XML or CoNLL format
– Exporting DG trees to PNG
Demo available at: http://medialab.di.unipi.it/Project/QA/Parser/DgAnnotator/
Future Directions
Opinion Extraction
– Finding opinions (positive/negative)
– Blog track in TREC 2006
Intent Analysis
– Determine author intent, such as: problem (description, solution), agreement (assent, dissent), preference (likes, dislikes), statement (claim, denial)
References
G. Attardi. 2006. Experiments with a Multilanguage Non-projective Dependency Parser. In Proc. of CoNLL-X.
H. Yamada, Y. Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proc. of IWPT.
M. T. Kromann. 2001. Optimality parsing and local cost functions in discontinuous grammars. In Proc. of FG-MOL.