
Progress update

Lin Ziheng


System overview

Components – Connective classifier

• Features from Pitler and Nenkova (2009):
  – Connective: because
  – Self category: IN
  – Parent category: SBAR
  – Left sibling category: none
  – Right sibling category: S
  – Right sibling contains a VP: yes
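The syntactic features above can be read off a constituency parse. The following is a minimal sketch, assuming a single-word connective and a toy tree encoded as nested tuples; the tree, the sentence, and the function names are illustrative and not taken from the system itself.

```python
# Toy constituency tree: (label, child, child, ...); a leaf is (POS, "word").
# The sentence and its parse are made up for illustration.
TREE = (
    "S",
    ("NP", ("PRP", "He")),
    ("VP",
        ("VBD", "left"),
        ("SBAR",
            ("IN", "because"),
            ("S",
                ("NP", ("PRP", "he")),
                ("VP", ("VBD", "was"), ("ADJP", ("JJ", "tired")))))),
)

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def find_with_parent(node, word, parent=None):
    """Return (leaf, parent) for the first leaf whose token equals `word`."""
    if is_leaf(node):
        return (node, parent) if node[1].lower() == word else None
    for child in node[1:]:
        hit = find_with_parent(child, word, node)
        if hit:
            return hit
    return None

def contains_label(node, label):
    """True if `node` or any of its descendants carries `label`."""
    if node[0] == label:
        return True
    if is_leaf(node):
        return False
    return any(contains_label(child, label) for child in node[1:])

def syntactic_features(tree, connective):
    """Self, parent, and sibling categories for a single-word connective."""
    hit = find_with_parent(tree, connective)
    if hit is None:
        raise ValueError(f"connective {connective!r} not found")
    leaf, parent = hit
    siblings = list(parent[1:]) if parent else [leaf]
    i = next(k for k, s in enumerate(siblings) if s is leaf)
    left = siblings[i - 1] if i > 0 else None
    right = siblings[i + 1] if i + 1 < len(siblings) else None
    return {
        "connective": connective,
        "self_category": leaf[0],
        "parent_category": parent[0] if parent else "none",
        "left_sibling": left[0] if left else "none",
        "right_sibling": right[0] if right else "none",
        "right_sibling_contains_VP": bool(right) and contains_label(right, "VP"),
    }
```

For a multi-word connective, the self category would be the highest node covering only the connective words; that case is not handled in this sketch.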


Components – Connective classifier

• New features
  – Conn POS
  – Prev word + conn: even though, particularly since
  – Prev word POS
  – Prev word POS + conn POS
  – Conn + Next word
  – Next word POS
  – Conn POS + Next word POS
  – All lemmatized verbs in the sentence containing conn
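A sketch of how the lexical and POS context features might be built from a tokenized, POS-tagged sentence. The function name, the feature keys, and the "NONE" padding value are assumptions for illustration; the lemmatized-verb feature is omitted because it needs a lemmatizer.

```python
def connective_context_features(tokens, pos_tags, start, end):
    """Context features for a connective spanning tokens[start:end].

    `tokens` and `pos_tags` are parallel lists for one sentence.
    "NONE" marks a missing neighbour (an assumed padding value).
    """
    conn = " ".join(tokens[start:end]).lower()
    conn_pos = "_".join(pos_tags[start:end])
    prev_word = tokens[start - 1].lower() if start > 0 else "NONE"
    prev_pos = pos_tags[start - 1] if start > 0 else "NONE"
    next_word = tokens[end].lower() if end < len(tokens) else "NONE"
    next_pos = pos_tags[end] if end < len(tokens) else "NONE"
    return {
        "conn_pos": conn_pos,
        "prev+conn": f"{prev_word} {conn}",
        "prev_pos": prev_pos,
        "prev_pos+conn_pos": f"{prev_pos} {conn_pos}",
        "conn+next": f"{conn} {next_word}",
        "next_pos": next_pos,
        "conn_pos+next_pos": f"{conn_pos} {next_pos}",
    }
```

Pairing the connective with its neighbours lets a classifier tell "particularly since" apart from a bare "since".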


Components – Argument labeler


Argument labeler – Argument position classifier

• Relative positions of Arg1
  – Arg1 and Arg2 in the same sentence: SS (60.9%)
  – Arg1 in the immediately previous sentence: IPS (30.1%)
  – Arg1 in some non-adjacent previous sentence: NAPS (9.0%)
  – Arg1 in some following sentence: FS (0%, only 8 instances)

• FS ignored

Argument labeler – Argument position classifier

• Features:
  – Connective string
  – Conn POS
  – Conn position in the sentence: first, second, third, third last, second last, or last
  – Prev word
  – Prev word POS
  – Prev word + conn
  – Prev word POS + conn POS
  – Second prev word
  – Second prev word POS
  – Second prev word + conn
  – Second prev word POS + conn POS
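The connective-position feature can be sketched as a small bucketing function. The slide lists six values only; the "none" value for all other positions, and the choice to prefer the from-start label in very short sentences, are assumptions made here.

```python
def conn_position(conn_index, sentence_length):
    """Bucket a 0-based connective token index into a position label."""
    if not 0 <= conn_index < sentence_length:
        raise ValueError("conn_index must lie inside the sentence")
    from_start = {0: "first", 1: "second", 2: "third"}
    from_end = {1: "last", 2: "second_last", 3: "third_last"}
    # Assumption: in short sentences the from-start label wins.
    if conn_index in from_start:
        return from_start[conn_index]
    # Anything in the middle of the sentence gets the assumed "none" label.
    return from_end.get(sentence_length - conn_index, "none")
```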


Argument labeler – Argument extractor

• SS cases: handcrafted a set of syntactically motivated rules to extract Arg1 and Arg2



• An example:



• IPS cases: label the sentence containing the connective as Arg2 and the immediately previous sentence as Arg1

• NAPS cases:
  – Arg1 is located in the second previous sentence in 45.8% of the NAPS cases
  – Use the majority decision and assume Arg1 is always in the second previous sentence
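The IPS and NAPS heuristics amount to picking whole sentences by offset. A minimal sketch, assuming sentences are addressed by 0-based index and that "second previous" means two sentences before the one containing the connective; the function name is hypothetical.

```python
def locate_arguments(position, conn_sentence):
    """Return sentence indices for Arg1 and Arg2 under the IPS/NAPS heuristics.

    position: "IPS" or "NAPS" (SS cases use the rule-based extractor instead).
    conn_sentence: 0-based index of the sentence containing the connective.
    """
    offsets = {"IPS": 1, "NAPS": 2}  # NAPS: majority decision, second previous sentence
    if position not in offsets:
        raise ValueError(f"unsupported position class: {position}")
    arg1 = conn_sentence - offsets[position]
    if arg1 < 0:
        raise ValueError("not enough preceding sentences for this position class")
    return {"Arg1": arg1, "Arg2": conn_sentence}
```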


Components – Explicit classifier

• Prasad et al. (2008) reported human agreement of 94% on Level 1 classes and 84% on Level 2 types

• A baseline using only connectives as features gives 95.7% and 86%, respectively, on Sec. 23
  – Difficult to improve accuracy on the test section

• 3 types of features:
  – Connective string
  – Conn POS
  – Conn + prev word


Components – Non-explicit classifier

• Non-explicit: Implicit, AltLex, EntRel, NoRel
  – 11 Level 2 types for Implicit/AltLex, plus EntRel and NoRel: 13 types in total
• 4 feature sets from Lin et al. (2009)
  – Contextual features
  – Constituent parse features
  – Dependency parse features
  – Word-pair features

• 3 features to capture AltLex: Arg2_word1, Arg2_word2, Arg2_word3
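A sketch of the three AltLex features, assuming the names refer to the first three words of Arg2 (the slide gives only the feature names). The "NONE" padding for short arguments is also an assumption.

```python
def altlex_features(arg2_tokens):
    """Map the first three tokens of Arg2 to Arg2_word1..Arg2_word3."""
    head = list(arg2_tokens[:3])
    head += ["NONE"] * (3 - len(head))  # pad short arguments (assumed convention)
    return {f"Arg2_word{i + 1}": word for i, word in enumerate(head)}
```

The intuition is that an alternative lexicalization such as "That compares with" tends to sit at the very start of Arg2.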


Components – Attribution span labeler

• Two steps: split the text into clauses, and decide which clauses are attribution spans

• Rule-based clause splitter:
  – first split a sentence into clauses by punctuation
  – for each clause, further split it if one of the following production links is found: VP→SBAR, S→SINV, S→S, SINV→S, S→SBAR, VP→S
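The first step of the clause splitter can be sketched with a regular expression. The punctuation set (comma, semicolon, colon) is an assumption, since the slides do not list the marks used; the second step needs a parse tree and is not shown.

```python
import re

def split_by_punctuation(sentence):
    """Split a sentence into candidate clauses at commas, semicolons, and colons."""
    parts = re.split(r"[,;:]", sentence)
    # Drop surrounding whitespace and any empty fragments.
    return [part.strip() for part in parts if part.strip()]
```

Each resulting clause would then be checked against the production links and split further where one is found.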


Components – Attribution span labeler

• Attr span classifier features: (curr, prev and next clauses)
  – Unigrams of curr
  – Lowercased and lemmatized verbs in curr
  – The first and last terms of curr
  – The last term of prev
  – The first term of next
  – The last term of prev + the first term of curr
  – The last term of curr + the first term of next
  – The position of curr in the sentence
  – Punctuation rules extracted from curr
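A sketch of the purely lexical part of this feature set, assuming each sentence arrives as a list of tokenized clauses. The feature keys, the "NONE" padding, and the use of the clause index as the position are assumptions; the verb and rule features are left out because they need a lemmatizer and a parse.

```python
def attribution_features(clauses, i):
    """Lexical features for clause i; `clauses` is a list of token lists."""
    def first(clause):
        return clause[0].lower() if clause else "NONE"

    def last(clause):
        return clause[-1].lower() if clause else "NONE"

    curr = clauses[i]
    prev = clauses[i - 1] if i > 0 else []
    nxt = clauses[i + 1] if i + 1 < len(clauses) else []
    return {
        "unigrams": sorted({token.lower() for token in curr}),
        "curr_first": first(curr),
        "curr_last": last(curr),
        "prev_last": last(prev),
        "next_first": first(nxt),
        "prev_last+curr_first": f"{last(prev)}_{first(curr)}",
        "curr_last+next_first": f"{last(curr)}_{first(nxt)}",
        "position": i,  # clause index within the sentence (assumed definition)
    }
```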


Evaluation

• Train: 02-21, dev: 22, test: 23
• Each component is tested
  – without and with error propagation (EP) from the previous component
  – with gold standard (GS) parse trees and sentence boundaries, and with an automatic (Auto) parser and sentence splitter


Evaluation – Connective classifier

• GS: increased acc and F1 by 2.05% and 3.05%
• Auto: increased acc and F1 by 1.71% and 2.54%
• Contextual info is helpful


Evaluation – Argument position classifier

• Able to accurately label SS
• But performs badly on the NAPS class
  – Due to the similarity between the IPS and NAPS classes


Evaluation – Argument extractor

• Human agreements on partial and exact matches: 94.5% and 90.2%

• Exact F1 much lower than partial F1
  – Due to small portions of text deleted


Evaluation – Explicit classifier

• Baseline: using only connective strings
  – 86%

• GS + no EP F1 increased by 0.44%


Evaluation – Non-explicit classifier

• Majority baseline: all classified as EntRel
• Adding EP degrades F1 by ~13%, but still outperforms baseline by ~6%


Evaluation – Attribution span labeler

• When EP added: the decrease of F1 is largely due to the drop in precision

• When Auto added: the decrease of F1 is largely due to the drop in recall


Evaluation – The whole pipeline

• Definition: a relation is correct if its relation type is classified correctly, and both Arg1 and Arg2 are partially or exactly matched

• GS + EP
  – Partial: 46.38% F1
  – Exact: 31.72% F1
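The correctness definition above can be turned into a relation-level F1 as sketched below. Arguments are represented as sets of token indices, and a partial match is assumed here to mean that the predicted and gold spans share at least one token; the slides do not spell out the partial-match criterion, so this definition and all names are illustrative.

```python
def args_match(predicted, gold, exact):
    """Exact: identical token sets. Partial (assumed): at least one shared token."""
    return predicted == gold if exact else bool(predicted & gold)

def relation_f1(predicted, gold, exact=False):
    """F1 over relations; each relation is a dict with type, arg1, arg2."""
    unmatched = list(gold)
    correct = 0
    for p in predicted:
        for g in unmatched:
            if (p["type"] == g["type"]
                    and args_match(p["arg1"], g["arg1"], exact)
                    and args_match(p["arg2"], g["arg2"], exact)):
                correct += 1
                unmatched.remove(g)  # each gold relation can be matched once
                break
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this sketch a relation whose spans overlap the gold spans without matching them exactly counts toward partial F1 but not exact F1, which is consistent with the gap between the two figures.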


On-going changes

• Joint learning
• Change rule-based argument extractor to a machine learning approach

