
Overview of the KBP 2013 Slot Filler Validation Track

Date post: 31-Jan-2016
Overview of the KBP 2013 Slot Filler Validation Track Hoa Trang Dang National Institute of Standards and Technology
Transcript
Page 1: Overview of the KBP 2013 Slot Filler Validation Track

Overview of the KBP 2013 Slot Filler Validation Track

Hoa Trang Dang
National Institute of Standards and Technology

Page 2: Overview of the KBP 2013 Slot Filler Validation Track

Slot Filler Validation (SFV)

• Track Goals
  ▫ Allow teams without a full slot-filling system to participate, focus on answer validation rather than document retrieval
  ▫ Evaluate the contribution of RTE systems on KBP slot-filling
  ▫ Allow teams to experiment with system voting and global
• SFV input:
  ▫ Candidate slot filler
  ▫ Possibly additional information about candidate slot fillers
• SFV output:
  ▫ Binary classification (Correct / Incorrect) of each candidate slot filler
• Can only improve precision, not recall of full slot-filling systems
• Evaluation metrics depend on SFV use case and availability of additional information about candidate fillers
• TAC RTE KBP Validation task (2011)
• TAC KBP Slot Filler Validation task (2012)

Page 3: Overview of the KBP 2013 Slot Filler Validation Track

TAC RTE KBP Validation task (2011)

Each slot filler returned by SF systems → 1 RTE evaluation pair, where:
• T is the entire document supporting the slot filler
• H is a set of synonymous sentences, representing different realizations of the slot filler

Page 4: Overview of the KBP 2013 Slot Filler Validation Track

Use Case 1: SFV as Textual Entailment (2011)

• SFV input:
  ▫ All regular English slot filling input (slot definitions, queries, source documents)
  ▫ Individual candidate slot fillers (filler, provenance)
• Local Approach:
  ▫ Generic textual entailment: H is the relation implied by the candidate slot filler (e.g., “Barack Obama has lived in Chicago”), T is the provenance (entire document, or smaller regions defined by justification offsets)
  ▫ Tailored textual entailment: train on different slot types; could be a validation module for a full slot filling system.
• Evaluation:
  ▫ F score on entire pool of candidate slot fillers (unique slot filler, provenance)
  ▫ Baseline: All T’s classified as entailing the corresponding H: P=R=percentage of entailing pairs in the pooled SF responses
  ▫ Weak baseline, easily beaten by all SFV systems; not a direct measure of utility of SFV to SF
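An F score over a pool of candidates can be sketched as follows. This is a minimal illustration, assuming each pooled candidate is a unique (filler, provenance) pair with a gold entailment label; the function name `pooled_f_score` and the toy data are hypothetical and are not part of the track's scoring tools.

```python
from typing import Dict, Hashable, Tuple


def pooled_f_score(gold: Dict[Hashable, bool],
                   predicted: Dict[Hashable, bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the 'entailing' class over a candidate pool.

    Both dicts map a unique (filler, provenance) key to True (T entails H)
    or False. Hypothetical helper, for illustration only.
    """
    true_pos = sum(1 for key, label in predicted.items()
                   if label and gold.get(key, False))
    pred_pos = sum(1 for label in predicted.values() if label)
    gold_pos = sum(1 for label in gold.values() if label)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy pool (made-up fillers and document ids).
gold = {("Chicago", "doc1"): True, ("Paris", "doc2"): False,
        ("Honolulu", "doc3"): True, ("Boston", "doc4"): False}
predicted = {("Chicago", "doc1"): True, ("Paris", "doc2"): True,
             ("Honolulu", "doc3"): False, ("Boston", "doc4"): False}
scores = pooled_f_score(gold, predicted)  # (0.5, 0.5, 0.5)
```

Because every pair in the pool is scored once, a run that repeats a popular filler does not get extra weight under this scheme.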

Page 5: Overview of the KBP 2013 Slot Filler Validation Track

Use Case 2: SFV impact on single SF systems

• SFV input:
  ▫ All regular English slot filling input (slot definitions, queries, source documents)
  ▫ Individual candidate slot fillers (filler, provenance, confidence)
    Broken out into individual slot filling runs
• Global Approach:
  ▫ System Voting, leveraging features across multiple SF runs
• Evaluation:
  ▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
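One simple form of system voting can be sketched as below: a candidate filler is labelled Correct only if enough distinct runs returned the same filler for the same query and slot. This is an illustrative sketch, not the method of any participating team; the tuple layout, the `min_votes` threshold, the lowercase normalization, and the query id are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical candidate layout: (run_id, query_id, slot, filler)
Candidate = Tuple[str, str, str, str]


def vote(candidates: Iterable[Candidate], min_votes: int = 2) -> Dict[Candidate, bool]:
    """Label a candidate Correct (True) if at least `min_votes` distinct runs
    returned the same normalized filler for the same query and slot."""
    candidates = list(candidates)
    supporters: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for run_id, query_id, slot, filler in candidates:
        supporters[(query_id, slot, filler.strip().lower())].add(run_id)
    return {
        c: len(supporters[(c[1], c[2], c[3].strip().lower())]) >= min_votes
        for c in candidates
    }


# Toy input: two runs agree on "Chicago", one run proposes "Boston".
cands = [
    ("runA", "SF500", "per:cities_of_residence", "Chicago"),
    ("runB", "SF500", "per:cities_of_residence", "chicago"),
    ("runC", "SF500", "per:cities_of_residence", "Boston"),
]
labels = vote(cands)
```

A plain vote like this treats every run as equally reliable; weighting runs by confidence or by past performance would be a natural variation.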

Page 6: Overview of the KBP 2013 Slot Filler Validation Track

Slot Filler Validation (SFV) 2012

• SFV input:
  ▫ All regular English slot filling input (slot definitions, queries, source documents)
  ▫ Individual candidate slot fillers (filler, provenance, confidence)
    Broken out into individual slot filling runs
  ▫ System profile for each SF run
  ▫ Preliminary assessment of 10% of KBP 2013 Slot Filling queries
• SFV output:
  ▫ Binary classification (Correct / Incorrect) of each candidate slot filler
• Evaluation:
  ▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run

Page 7: Overview of the KBP 2013 Slot Filler Validation Track

Slot Filler Validation (SFV) 2012

• SFV input:
  ▫ All regular English slot filling input (slot definitions, queries, source documents)
  ▫ Individual candidate slot fillers (filler, provenance, confidence)
    Broken out into individual slot filling runs
  ▫ System profile for each SF run
  ▫ Preliminary assessment of 10% of KBP 2013 Slot Filling queries
• SFV output:
  ▫ Binary classification (Correct / Incorrect) of each candidate slot filler
• Evaluation:
  ▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
• One SFV submission; it decreased the F1 of almost all SF runs except the poorest-performing SF runs.

Page 8: Overview of the KBP 2013 Slot Filler Validation Track

Slot Filler Validation (SFV) 2013

• SFV input:
  ▫ All regular English slot filling input (slot definitions, queries, source documents)
  ▫ Individual candidate slot fillers (filler, provenance, confidence)
    Broken out into individual slot filling runs
• SFV output:
  ▫ Binary classification (Correct / Incorrect) of each candidate slot filler
• Evaluation:
  ▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run

Page 9: Overview of the KBP 2013 Slot Filler Validation Track

Slot Filler Validation (SFV) 2013

• SFV input:
  ▫ All regular English slot filling input (slot definitions, queries, source documents)
  ▫ Individual candidate slot fillers (filler, provenance, confidence)
    Broken out into individual slot filling runs
  ▫ System profile for each SF run
  ▫ Preliminary assessment of 10% of KBP 2013 Slot Filling queries
• SFV output:
  ▫ Binary classification (Correct / Incorrect) of each candidate slot filler
• Evaluation:
  ▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
  ▫ Score only on the 90% of KBP 2013 slot filling queries that didn’t have preliminary assessments released as part of SFV input
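The filter-and-compare evaluation can be sketched as follows. This simplified sketch assumes exact-match scoring against a set of gold (query, slot, filler) triples, which is much cruder than the official English SF scorer; all function names, variable names, and toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # hypothetical layout: (query_id, slot, filler)


def f1(run: Set[Triple], gold: Set[Triple]) -> float:
    """Simplified F1: a returned triple is correct if it appears in gold."""
    correct = len(run & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(run)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def sfv_impact(run: Set[Triple], gold: Set[Triple],
               sfv_labels: Dict[Triple, bool],
               released_queries: Set[str]) -> Tuple[float, float]:
    """Return (F1 of original run, F1 after SFV filtering), scored only on
    queries whose preliminary assessments were NOT released as SFV input."""
    held_out_run = {t for t in run if t[0] not in released_queries}
    held_out_gold = {t for t in gold if t[0] not in released_queries}
    # Drop fillers the validator labelled Incorrect; unlabelled ones are kept.
    filtered = {t for t in held_out_run if sfv_labels.get(t, True)}
    return f1(held_out_run, held_out_gold), f1(filtered, held_out_gold)


# Toy data: q1 had its assessments released, so only q2 is scored.
gold = {("q1", "s", "a"), ("q2", "s", "b"), ("q2", "s", "c")}
run = {("q1", "s", "a"), ("q2", "s", "b"), ("q2", "s", "x")}
labels = {("q2", "s", "x"): False}
before, after = sfv_impact(run, gold, labels, released_queries={"q1"})
```

In the toy data the filter removes one wrong filler, so precision rises while recall stays the same; a filter can never add a filler the run missed.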

Page 10: Overview of the KBP 2013 Slot Filler Validation Track

SF System Profile

• SF Team ranks in KBP 2009-2012
• Did the system extract fillers from the KBP 2013 source corpus?
• Do the Confidence Values have meaning?
• Is the Confidence Value a probability?
• Tools or methods for:
  ▫ Query expansion
  ▫ Document retrieval
  ▫ Sentence retrieval
  ▫ NER nominal tagging
  ▫ Coreference resolution
  ▫ Third-party relation/event extraction
  ▫ Dependency/Constituent parsing
  ▫ POS tagging
  ▫ Chunking
  ▫ Main slot filling algorithm
  ▫ Learning algorithm
  ▫ Ensemble model
  ▫ External resources

Page 11: Overview of the KBP 2013 Slot Filler Validation Track

Slot Filler Validation Teams and Approaches

• BIT: Beijing Institute of Technology [local]
  ▫ Generic RTE approach based on word overlap, cosine similarity, and token edit distance
• Stanford: Stanford University [local]
  ▫ Based on Stanford’s full slot-filling system, especially component for checking consistency and validity of candidate fillers
• UI_CCG: University of Illinois at Urbana-Champaign [local]
  ▫ Tailored RTE approach; check candidate for slot-specific constraints
• jhuapl: Johns Hopkins University Applied Physics Laboratory [weak global]
  ▫ Consider only the confidence value associated with each candidate filler and aggregate confidence values across systems.
• RPI_BLENDER: Rensselaer Polytechnic Institute [strong global]
  ▫ Based on RPI_BLENDER full slot-filling system (like Stanford), but also leveraged full set of SFV input (including SF system profile and preliminary assessments) to rank systems and apply tier-specific filtering.
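The three lexical signals named for the generic RTE approach (word overlap, cosine similarity, token edit distance) can be sketched as below. This is not BIT's implementation; whitespace tokenization, the bag-of-words cosine, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercased whitespace tokens (an assumed, very simple tokenizer)."""
    return text.lower().split()


def word_overlap(hyp: str, txt: str) -> float:
    """Fraction of distinct hypothesis tokens that also occur in the text."""
    h, t = set(tokens(hyp)), set(tokens(txt))
    return len(h & t) / len(h) if h else 0.0


def cosine_similarity(hyp: str, txt: str) -> float:
    """Cosine between bag-of-words count vectors."""
    a, b = Counter(tokens(hyp)), Counter(tokens(txt))
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def token_edit_distance(hyp: str, txt: str) -> int:
    """Levenshtein distance computed over tokens rather than characters."""
    h, t = tokens(hyp), tokens(txt)
    prev = list(range(len(t) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, tw in enumerate(t, start=1):
            cost = 0 if hw == tw else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


H = "Barack Obama has lived in Chicago"
T = "Obama lived in Chicago for years"
features = (word_overlap(H, T), cosine_similarity(H, T), token_edit_distance(H, T))
```

A validator built on such features would still need a threshold or a trained classifier to turn them into a Correct / Incorrect decision.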

Page 12: Overview of the KBP 2013 Slot Filler Validation Track
Page 13: Overview of the KBP 2013 Slot Filler Validation Track

Impact of RPI_BLENDER2 SFV on SF Runs

SF Run          F1 of original SF run    Change in F1 after applying SFV filter

Top 10 SF runs
lsv1            0.371212                  0.012212
lsv5            0.368462                  0.025411
lsv3            0.367438                  0.029463
ARPANI1         0.364683                 -0.01695
lsv4            0.363441                  0.041238
RPI_BLENDER3    0.336694                  0.025749
RPI_BLENDER1    0.333909                  0.027718
lsv2            0.333333                  0.008259
RPI_BLENDER5    0.332866                  0.017108
PRIS20133       0.327384                  0.021544

Negatively impacted SF runs
NYU1            0.253842                 -0.00105
UWashington1    0.184026                 -0.011544
UWashington2    0.156271                 -0.004999
UWashington3    0.140677                 -0.013133
SAFT_KRes3      0.134615                 -0.004458
CMUML3          0.098274                 -0.002241
TALP_UPC3       0.036237                 -0.007019

Page 14: Overview of the KBP 2013 Slot Filler Validation Track

Conclusion

• Leveraging global features boosts scores of individual SF runs… if done discriminately
  ▫ Don’t treat all slot filling systems the same
• Even weak global features (e.g., raw confidence values) may help in some cases
• Caveat: other evaluation metrics are also valid, depending on use case.
  ▫ RTE KBP validation (2011) metric may be appropriate if the goal is to make assessment more efficient

