Page 1:

Automating Slot Filling Validation to Assist Human Assessment

Suzanne Tamang and Heng Ji

Computer Science Department and Linguistics Department, Queens College and the Graduate Center

City University of New York

November 5, 2012

Page 2:

Overview
- KBP SF validation task
- Two-step validation
  - Logistic regression based reranking
  - Predicted confidence adjustment and filtering
- Validation features
  - Shallow, Contextual, Emergent (voting)
- System combination
  - Perfect setting
  - Limiting conditions
- Evaluation results
- Opportunities

Page 3:

SF Validation Task
- Standard answer format
  - id, slot, run, docid, filler, start and end offset for filler, start and end offset for justification, confidence
  - Richmond Flowers, per:title, SFV_10_1, APW_ENG_20070810.1457.LDC2009T13, Attorney General, 336, 351, 321, 44, 1.0
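The answer format above can be read into a small record type. This is a minimal sketch, assuming the comma-separated layout shown on the slide and that no field contains a comma; the class name `SFAnswer`, the field names, and `parse_answer` are illustrative and not part of the task definition.

```python
from dataclasses import dataclass


@dataclass
class SFAnswer:
    # Field order follows the slide: id, slot, run, docid, filler,
    # filler offsets, justification offsets, confidence.
    id: str
    slot: str
    run: str
    docid: str
    filler: str
    filler_start: int
    filler_end: int
    just_start: int
    just_end: int
    confidence: float


def parse_answer(line: str) -> SFAnswer:
    """Split one comma-separated answer line into its ten fields."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    return SFAnswer(
        parts[0], parts[1], parts[2], parts[3], parts[4],
        int(parts[5]), int(parts[6]), int(parts[7]), int(parts[8]),
        float(parts[9]),
    )


# The example line from the slide, copied as given.
example = ("Richmond Flowers, per:title, SFV_10_1, "
           "APW_ENG_20070810.1457.LDC2009T13, Attorney General, "
           "336, 351,321,44,1.0")
answer = parse_answer(example)
```

A naive split like this would break on fillers that themselves contain commas, so a real loader would need whatever delimiter and escaping the submitted runs actually use.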

- Validation goal
  - Use post-processing methods to label 1 or -1
  - Step one:
    - Combine runs, and rerank using a probabilistic classifier
    - Identify a threshold for filtering best candidates
  - Step two:
    - Automatically assess system quality
    - When available, use deeper contextual information
    - Adjust confidence values to dampen noisy system contribution

Page 4:

Features

| Feature | Description | Value | Type |
|---|---|---|---|
| document type | provided by document collection as news wire, broadcast news, web log | category | shallow |
| *number of tokens | count of white spaces (+1) between contiguous character string | integer | shallow |
| *acronym | identify and concatenate first letter of each token | binary | shallow |
| *url | structural rules to determine if a valid url | binary | shallow |
| named entity type | label with gazetteer | category | shallow |
| city, *state, *country, *title, ethnicity, religion | appears in specific slot-related gazetteer | binary | shallow |
| *alphanumeric | indicate if numbers and letters appear | binary | shallow |
| date | structural rules to determine if an acceptable date format | binary | shallow |
| capitalized | first character of token(s) caps | binary | shallow |
| same | if query and fill strings match | binary | shallow |
| keywords | used primarily for spouse and residence slots | binary | context |
| dependency parse | length from query to answer | integer | context |
| **system votes | proportion of systems with answer agreement | 0-1 | emergent |
| **answer votes | proportion of answers with answer agreement | 0-1 | emergent |

* statistically significant predictor in select models
** statistically significant predictor in almost all models
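A few of the features in the table can be sketched directly from their descriptions. The code below is one possible reading, not the authors' implementation: "capitalized" is taken to mean every token starts with an upper-case letter, "system votes" the fraction of runs answering a query/slot pair that returned the same filler, and "answer votes" the fraction of all answers for that pair with the same filler. Function names and the tuple layout are hypothetical.

```python
from collections import defaultdict


def shallow_features(query: str, filler: str) -> dict:
    """Compute a few of the shallow, string-level features."""
    tokens = filler.split()
    return {
        # whitespace-delimited token count
        "number_of_tokens": len(tokens),
        # both digits and letters occur somewhere in the filler
        "alphanumeric": any(c.isdigit() for c in filler)
        and any(c.isalpha() for c in filler),
        # assumption: every token starts with an upper-case character
        "capitalized": bool(tokens) and all(t[0].isupper() for t in tokens),
        # query and filler are the same string, ignoring case
        "same": query.strip().lower() == filler.strip().lower(),
    }


def vote_features(answers):
    """answers: iterable of (run, query, slot, filler) tuples.

    Returns {(query, slot, filler): (system_votes, answer_votes)},
    with fillers lower-cased before comparison.
    """
    runs_all = defaultdict(set)    # (query, slot) -> runs that answered
    runs_agree = defaultdict(set)  # (query, slot, filler) -> runs agreeing
    n_all = defaultdict(int)       # (query, slot) -> number of answers
    n_agree = defaultdict(int)     # (query, slot, filler) -> matching answers
    for run, query, slot, filler in answers:
        key = (query, slot)
        full = key + (filler.strip().lower(),)
        runs_all[key].add(run)
        runs_agree[full].add(run)
        n_all[key] += 1
        n_agree[full] += 1
    return {
        k: (len(runs_agree[k]) / len(runs_all[k[:2]]), n_agree[k] / n_all[k[:2]])
        for k in n_agree
    }


feats = shallow_features("Richmond Flowers", "Attorney General")
votes = vote_features([
    ("r1", "q1", "per:title", "Attorney General"),
    ("r2", "q1", "per:title", "attorney general"),
    ("r3", "q1", "per:title", "Judge"),
    ("r3", "q1", "per:title", "Attorney General"),
])
```

Because both vote features depend on what the other runs submitted, they are properties of the pooled run set rather than of any single answer, which is why the slides call them emergent.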

Page 5:

Two-Phase Validation Approach
- Step 1: Classification
  - Training with 2011 KBP SF data
  - Using features extracted from the 2011 KBP results:
    - Model selection using stepwise procedure and AIC
    - Threshold tuning on predicted confidence estimates
- Step 2: Adjustment and filtering
  - Automatic assessment of system quality
  - Adjustment of predicted confidence using quality/DP
  - Contextual analysis with answer provenance offsets
- Features: answer, system and group level
  - Shallow, Contextual, Emergent
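The threshold-tuning part of Step 1 can be illustrated with a small sketch. The slides do not say which criterion was tuned, so maximising F1 on training data is an assumption here, as are the function names and the toy scores; the classifier that produces the scores is out of scope.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every candidate scoring >= threshold is accepted as correct."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, labels):
    """Try each observed score as a cut-off and keep the one with best F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


def validate(scores, threshold):
    """Label each candidate 1 (keep) or -1 (reject), as in the task definition."""
    return [1 if s >= threshold else -1 for s in scores]


# Toy training data: predicted probabilities and gold labels (1 / -1).
train_scores = [0.95, 0.80, 0.60, 0.40, 0.20]
train_labels = [1, 1, -1, 1, -1]
best = tune_threshold(train_scores, train_labels)
predicted = validate(train_scores, best)
```

On this toy data the best cut-off is 0.40: it admits one wrong answer but recovers all three correct ones, which beats the stricter cut-offs on F1.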

Page 6:

Attribute Distribution in Automatic Slot Filling

Page 7:

PER Attribute Distribution

Page 8:

ORG Attribute Distribution

Page 9:

SF Performance: Training and Testing

Page 10:

Performance, Mean Confidence & Set Size

27 distinct runs; variable F1, size, confidence, & offset use.

Page 11:

Results: Slot Filling Validation

Page 12:

Pre-Post Validation Results:

| | R | P | F1 |
|---|---|---|---|
| LDC | 0.72 | 0.77 | 0.75 |
| w/o validation | 0.71 | 0.03 | 0.06 |
| validation P1 | 0.12 | 0.07 | 0.09 |
| validation P2 | 0.35 | 0.08 | 0.13 |
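The F1 column is the harmonic mean of precision and recall, so the system rows can be re-derived from the reported R and P. A minimal sketch (the helper name `f1` and the dictionary are illustrative):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs as reported in the table above.
reported = {
    "w/o validation": (0.71, 0.03),
    "validation P1": (0.12, 0.07),
    "validation P2": (0.35, 0.08),
}
recomputed = {name: round(f1(p, r), 2) for name, (r, p) in reported.items()}
```

The three recomputed values match the table (0.06, 0.09, 0.13). The LDC row recomputes to 0.74 from the rounded R and P, so the 0.75 on the slide was presumably derived from unrounded figures.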

Page 13:

Reranking multi-systems
- Ideal case
  - Diversity of systems
  - Comparable performance
  - Rich information
    - Reliable answer context
    - System approach / intermediate system results
- KBP SF Task
  - Twenty-seven runs, limited intermediate results, unknown strategies, and variable performance
  - Inconsistencies paired with 'rigid' framework
    - Provenance: unavailable, unreliable (off a little and a lot)
    - Confidence may or may not be available

Page 14:

What have we learned that translates to more efficient assessment?

Confidence, provenance, approximating system quality, and flexibility

Page 15:

Challenges and Solutions
- Labor intensive
  - Training, quality control, tedious and unfulfilling
  - 22% of total answers were redundant
  - 1% gain on recall over systems
- Validation
  - Inconsistencies in reporting (provenance / confidence)
  - Lack of intermediate output
- Confidence
  - Uniform weighting
  - Automatic assessment quality: inconsistency, confidence distributions

| | R | P | F | TP |
|---|---|---|---|---|
| LDC | 0.72 | 0.77 | 0.75 | 1119 |
| Systems | 0.71 | 0.03 | 0.06 | 1081 |
| Answer Key | ? | 1 | ? | 1543 |

Page 16:

Naïve Estimation of System Quality

Page 17:

Confidence of High and Low Performers

Shallow/emergent features reduce noise at the expense of better systems

Page 18:

Confidence-based Reranking

Confidence is an important factor to a validator
- informative at the >90 threshold
- paired with quality estimates, cull more valid answers
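One way to pair confidence with a quality estimate is to scale each answer's reported confidence by a per-run weight before applying the cut-off. The sketch below is only an illustration of that idea: the slides do not specify the adjustment formula, the run names and weights are made up, and the ">90" threshold is read as 0.90 on a 0-1 confidence scale, which is an assumption.

```python
def adjust_and_filter(answers, run_quality, threshold=0.9):
    """Scale each answer's confidence by its run's quality estimate,
    then keep answers whose adjusted confidence exceeds the threshold.

    answers: list of (run, filler, confidence) tuples.
    run_quality: hypothetical per-run weights in [0, 1]; unknown runs get 0.
    """
    kept = []
    for run, filler, confidence in answers:
        adjusted = confidence * run_quality.get(run, 0.0)
        if adjusted > threshold:
            kept.append((run, filler, adjusted))
    # Highest adjusted confidence first.
    return sorted(kept, key=lambda a: a[2], reverse=True)


answers = [
    ("run_a", "Attorney General", 0.97),
    ("run_b", "Governor", 1.00),
    ("run_a", "Senator", 0.50),
]
run_quality = {"run_a": 1.0, "run_b": 0.4}  # illustrative values only
kept = adjust_and_filter(answers, run_quality)
```

Here the run that reports a uniform confidence of 1.0 is damped by its low quality weight, so only the high-confidence answer from the stronger run survives the filter.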

Page 19:

Summary
- Evaluation of a two-phase SF Validation approach for KBP 2012
  - Improves overall F1: before (0.06) / after (0.13)
  - Helps low performers at the expense of better systems
- Key observations
  - Shallow features contribute to establishing a baseline
  - Voting features did not generalize, and are susceptible to system noise
  - Contextual features are helpful (P1 to P2 gains)
- Opportunities
  - Incorporating confidence as a classifier feature or for filtering
  - More flexible frameworks for using provenance information
  - Improved methods for naively estimating low and high performers in the multi-system setting

Page 20:

Thank you

