Date posted: 15-Dec-2015
Using Query Patterns to Learn the Durations of Events
Andrey Gusev
joint work with Nate Chambers, Pranav Khaitan, Divye Khilnani, Steven Bethard, Dan Jurafsky
Examples of Event Durations
• Talk to a friend – minutes
• Driving – hours
• Study for an exam – days
• Travel – weeks
• Run a campaign – months
• Build a museum – years
Why are we interested in durations?
• Event Understanding
  • Duration is an important aspectual property
  • Can help build timelines and events
• Event coreference
  • Duration may be a cue that events are coreferent
  • Gender (learned from the web) helps nominal coreference
• Integration into search products
  • Query: “healthy sleep time for age groups”
  • Query: “president term length in [country x]”
Dataset (Pan et al., 2006)
• Labeled 58 documents from TimeBank with event durations
  • Average of minimum and maximum labeled durations
• “A Brooklyn woman who was watching her clothes dry in a laundromat.”
  • Min duration – 5 min
  • Max duration – 1 hour
  • Average – 1950 seconds
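The averaging step in the laundromat example can be checked with a few lines of Python. This is only a sketch: the function names and the unit table are illustrative assumptions, not part of the Pan et al. (2006) tooling.

```python
# Seconds per unit; an illustrative lookup table (assumption, not from the dataset).
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def to_seconds(value, unit):
    """Convert a (value, unit) duration to seconds."""
    return value * SECONDS_PER_UNIT[unit]


def average_duration(min_dur, max_dur):
    """Mean of the annotated minimum and maximum durations, in seconds."""
    return (to_seconds(*min_dur) + to_seconds(*max_dur)) / 2


# "watching her clothes dry": min 5 minutes, max 1 hour
avg = average_duration((5, "minute"), (1, "hour"))
print(avg)  # 1950.0
```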
Original Features (Pan et al., 2006)
• Event Properties
  • Event token, lemma, POS tag
• Subject and Object
  • Head word of syntactic subject and objects of the event, along with their lemmas and POS tags.
• Hypernyms
  • WordNet hypernyms for the event, its subject and its object.
  • Starting from the first synset of each lemma, three hypernyms were extracted from the WordNet hierarchy.
New Features
• Event Attributes
  • Tense, aspect, modality, event class
• Named Entity Class of Subjects and Objects
  • Person, organization, location, or other.
• Typed Dependencies
  • Binary feature for each typed dependency
• Reporting Verbs
  • Binary feature for reporting verbs (say, report, reply, etc.)
Limitations of the Supervised Approach
Need explicitly annotated datasets
• Sparse and limited data
• Limited to the annotated domain
• Low inter-annotator agreement
  • More than a Day and Less Than a Day – 87.7%
  • Duration Buckets – 44.4%
  • Approximate Duration Buckets – 79.8%
Overcoming Supervised Limitations
Statistical Web Count approach
• Lots of text/data that can be used
• Not limited to the annotated domain
• Implicit annotations from many sources
• Hearst (1998), Ji and Lin (2009)
Terms - Duration Buckets and Distributions
Each time unit is a duration bucket; the hit counts over all buckets form the distribution.
• “talked for * seconds” - 1638 hits
• “talked for * minutes” - 61816 hits
• “talked for * hours” - 68370 hits
• “talked for * days” - 4361 hits
• “talked for * weeks” - 3754 hits
• “talked for * months” - 5157 hits
• “talked for * years” - 103336 hits
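As a rough illustration, raw hit counts like these can be normalized into a probability distribution over buckets. The sketch below assumes the counts pair with the patterns in the order listed on the slide; the function name is hypothetical.

```python
BUCKETS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

# Hit counts for "talked for * <bucket>", paired with buckets in slide order
# (the pairing is an assumption based on that order).
talked_hits = dict(zip(BUCKETS, [1638, 61816, 68370, 4361, 3754, 5157, 103336]))


def to_distribution(hits):
    """Normalize raw hit counts so that they sum to 1."""
    total = sum(hits.values())
    if total == 0:
        return {bucket: 0.0 for bucket in hits}
    return {bucket: count / total for bucket, count in hits.items()}


talked_dist = to_distribution(talked_hits)
for bucket in BUCKETS:
    print(f"{bucket:8s} {talked_dist[bucket]:.3f}")
```

On the raw counts the "years" bucket dominates, which is one reason the counts are later divided by a base distribution.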
Two Duration Prediction Tasks
• Coarse grained prediction
  • “Less than a day” or “Longer than a day”
• Fine grained prediction
  • Second, minute, hour, etc.
Yesterday Pattern for Coarse Grained Task
• <eventpast> yesterday
• <eventpastp> yesterday
• eventpast = past tense
• eventpastp= past progressive tense
• Normalize yesterday event pattern counts with counts of event occurrence in general
• Average the two ratios
• Find threshold on the training set
Example: “to say” with Yesterday Pattern
• “said yesterday” – 14,390,865 hits
• “said” – 1,693,080,248 hits
• “was saying yesterday” – 29,626 hits
• “was saying” – 14,167,103 hits

$$\mathrm{Ratio}_{past} = \frac{14{,}390{,}865}{1{,}693{,}080{,}248} = 0.0085$$

$$\mathrm{Ratio}_{pastp} = \frac{29{,}626}{14{,}167{,}103} = 0.0021$$

• Average Ratio = 0.0053
Threshold for Yesterday Pattern
[Figure: accuracy (y-axis, 0.65 to 0.75) as a function of the ratio threshold (x-axis, 0.0005 to 0.0055); chosen threshold t = 0.002]
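The yesterday-pattern classifier can be sketched in Python as follows. The hit counts are those of the “to say” example and the threshold is the reported t = 0.002. The direction of the decision (a higher ratio meaning “less than a day”) is an assumption here, since the slides do not state it explicitly, and the function names are hypothetical.

```python
def yesterday_ratio(past_yesterday, past, pastp_yesterday, pastp):
    """Average of the two normalized 'yesterday' ratios (past and past progressive)."""
    ratio_past = past_yesterday / past
    ratio_pastp = pastp_yesterday / pastp
    return (ratio_past + ratio_pastp) / 2


def classify_coarse(avg_ratio, threshold=0.002):
    # Assumption: events that co-occur with "yesterday" more often are short.
    return "less than a day" if avg_ratio > threshold else "longer than a day"


# "said yesterday" / "said" and "was saying yesterday" / "was saying"
said_ratio = yesterday_ratio(14_390_865, 1_693_080_248, 29_626, 14_167_103)
print(round(said_ratio, 4))         # 0.0053
print(classify_coarse(said_ratio))  # less than a day
```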
Fine Grained Durations from Web Counts
• How long does the event “X” last?
• Ask the web:
  • “X for * seconds”
  • “X for * minutes”
  • …
• Output distribution over time units
[Figure: distribution over time units for “said”]
Not All Time Units are Equal
• Need to look at the base distribution
  • “for * seconds”
  • “for * minutes”
  • …
• In habituals and similar constructions, people like to say “for years”
Conditional Frequencies for Buckets
• Divide
  • “X for * seconds”
• By
  • “for * seconds”
• Reduce credit for seeing “X for years”
[Figure: conditional distribution over time units for “said”]
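A minimal sketch of the conditional-frequency step: each event count is divided by the count of the bare pattern for the same bucket, and the result is renormalized. All numbers below are invented toy values chosen only to show the effect; they are not web counts from the talk.

```python
def conditional_distribution(event_hits, base_hits):
    """Divide "X for * <bucket>" hits by "for * <bucket>" hits, then renormalize."""
    ratios = {
        bucket: event_hits[bucket] / base_hits[bucket]
        for bucket in event_hits
        if base_hits.get(bucket, 0) > 0
    }
    total = sum(ratios.values())
    if total == 0:
        return {bucket: 0.0 for bucket in ratios}
    return {bucket: ratio / total for bucket, ratio in ratios.items()}


# Toy counts (invented for illustration): "years" is frequent for the event,
# but even more frequent in the base pattern.
event_hits = {"minutes": 60, "hours": 60, "years": 100}
base_hits = {"minutes": 1000, "hours": 1000, "years": 10000}

cond = conditional_distribution(event_hits, base_hits)
print(cond)
```

In this toy case "years" has the most raw hits but ends up with the smallest conditional share, which is the intended reduction of credit.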
Double Peak Distribution
• Two interpretations
  • Durative
  • Iterative
• Distributions show that with two peaks
[Figure: distributions for “to smile” and “to run”; x-axis: time units (S, M, H, D, W, M, Y, D), y-axis: 0.0 to 0.5]
Merging Patterns
• Multiple patterns
• Distributions averaged
• Reduces noise from individual patterns
• A pattern needs more than 100 and fewer than 100,000 hits
[Figure: merged distribution over time units for “said”]
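The merging step might look like the sketch below: patterns outside the hit thresholds are dropped and the remaining distributions are averaged. Applying the thresholds to a pattern's total hits across buckets is an assumption, as the slide does not say how the hits are counted. The counts are invented toy values.

```python
MIN_HITS, MAX_HITS = 100, 100_000  # thresholds from the slide


def normalize(hits):
    total = sum(hits.values())
    return {bucket: count / total for bucket, count in hits.items()}


def merge_patterns(pattern_hits):
    """Average the distributions of all patterns that pass the hit thresholds.

    pattern_hits: list of dicts mapping bucket -> hit count, one per pattern.
    Returns None when no pattern qualifies.
    """
    kept = [
        normalize(hits)
        for hits in pattern_hits
        if MIN_HITS < sum(hits.values()) < MAX_HITS  # assumption: total hits
    ]
    if not kept:
        return None
    return {bucket: sum(d[bucket] for d in kept) / len(kept) for bucket in kept[0]}


# Toy counts (invented): the third pattern has too few hits and is dropped.
patterns = [
    {"minutes": 300, "hours": 100},
    {"minutes": 100, "hours": 300},
    {"minutes": 5, "hours": 5},
]
merged = merge_patterns(patterns)
print(merged)  # {'minutes': 0.5, 'hours': 0.5}
```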
Fine Grained Patterns
• Used Patterns
  • <eventpast> for * <bucket>
  • <eventpastp> for * <bucket>
  • spent * <bucket> <eventger>
• Patterns not used
  • <eventpast> in * <bucket>
  • takes * <bucket> to <event>
  • <eventpast> last <bucket>
Evaluation
• TimeBank annotations (Pan, Mulkar and Hobbs 2006)
• Coarse Task: Greater or less than a day
• Fine Task: Time units (seconds, minutes, hours, …, years)
  • Counted as correct if within 1 time unit
• Baseline: Majority Class
  • Fine Grained – months
  • Coarse Grained – greater than a day
• Compare with a re-implementation of the supervised system (Pan, Mulkar and Hobbs 2006)
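The fine-grained scoring rule (“correct if within 1 time unit”) can be written down as a short sketch. The unit list follows the seven buckets used in the query patterns; the function names and the example predictions are hypothetical.

```python
UNITS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]


def within_one_unit(predicted, gold):
    """A prediction counts as correct if it is at most one bucket away from gold."""
    return abs(UNITS.index(predicted) - UNITS.index(gold)) <= 1


def fine_accuracy(predictions, golds):
    correct = sum(within_one_unit(p, g) for p, g in zip(predictions, golds))
    return correct / len(golds)


# Invented example: the first two predictions are one bucket off, the third is far off.
preds = ["hours", "days", "years"]
golds = ["minutes", "weeks", "seconds"]
acc = fine_accuracy(preds, golds)
print(acc)
```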
New Split for TimeBank Dataset
• Train – 1664 events (714 unique verbs)
• Test – 471 events (274 unique verbs)
• TestWSJ – 147 events (84 unique verbs)
• Split info is available at http://cs.stanford.edu/~agusev/durations/
Web Counts System Scoring
• Fine grained
  • Smooth over the adjacent buckets and select top bucket
  • $score(b_i) = b_{i-1} + b_i + b_{i+1}$
• Coarse grained
  • “Yesterday” classifier with a threshold (t = 0.002)
  • Use fine grained approach
    • Select coarse grained bucket based on fine grained bucket
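A sketch of the scoring step, under two stated assumptions: edge buckets treat a missing neighbor as 0, and seconds, minutes and hours map to “less than a day” while all longer buckets map to “longer than a day”. The example distribution is invented.

```python
UNITS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]


def smoothed_scores(dist):
    """score(b_i) = b_{i-1} + b_i + b_{i+1}; missing neighbors count as 0 (assumption)."""
    n = len(dist)
    return [
        (dist[i - 1] if i > 0 else 0.0) + dist[i] + (dist[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def top_bucket(dist):
    scores = smoothed_scores(dist)
    return UNITS[scores.index(max(scores))]


def coarse_from_fine(bucket):
    # Assumed mapping from fine buckets to the two coarse classes.
    if bucket in ("seconds", "minutes", "hours"):
        return "less than a day"
    return "longer than a day"


# Invented distribution: "minutes" has the highest raw mass,
# but "hours" wins after smoothing over its neighbors.
dist = [0.05, 0.40, 0.30, 0.10, 0.05, 0.05, 0.05]
fine = top_bucket(dist)
print(fine, "->", coarse_from_fine(fine))
```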
Results
                   Coarse - Test   Fine - Test   Coarse - WSJ   Fine - WSJ
Baseline           62.4            59.2          57.1           52.4
Supervised         73.0            62.4          74.8           66.0
Bucket Counts      72.4            66.5          73.5           68.7
Yesterday Counts   70.7            N/A           74.8           N/A
Web counts perform as well as the fully supervised system
Backoff Statistics (“Spent” Pattern)
Both   Subject   Object   None
356    446       195      548
• Events in training dataset
• Had at least 10 hits
Both   Subject   Object   None
3      86        84       1372
Effect of the Event Context
• Supervised classifiers use context in their features
• Web counts system doesn’t use context of the events
  • Significantly fewer hits when including context
  • Better accuracy with more hits than with context
• What is the effect of subject/object context on the understanding of event duration?
MTurk Setup
• 10 MTurk workers for each event
• Without the context
  • Event – choice for each duration bucket
• With the context
  • Event with subject/object – choice for each duration bucket
• Compare accuracy
  • Event with context
  • Event without context
Results: Mechanical Turk Annotations
                    Coarse - Test   Fine - Test   Coarse - WSJ   Fine - WSJ
Baseline            62.4            59.2          57.1           52.4
Event only          52.0            42.1          49.4           43.8
Event and context   65.0            56.7          70.1           59.9
Context significantly improves accuracy of MTurk annotations
Event Duration Lexicon
• Distributions for the 1000 most frequent verbs from the NYT portion of Gigaword, each with its 10 most frequent grammatical objects
• Due to the hit thresholds, not all events have distributions
EVENT=to use,
ID=e13-7,
OBJ=computer,
PATTERNS=2,
DISTR=[0.009;0.337;0.238;0.090;0.130;0.103;0.092;0.002;]
http://cs.stanford.edu/~agusev/durations/
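An entry like the one shown could be parsed with a few lines of Python. The sketch assumes the textual layout seen on the slide (comma-separated KEY=value fields and a bracketed, semicolon-separated DISTR list); the actual file format of the released lexicon may differ, and `parse_entry` is a hypothetical helper.

```python
import re


def parse_entry(text):
    """Parse one lexicon entry laid out as on the slide (assumed format)."""
    # A value is either a bracketed list or anything up to the next comma/newline.
    fields = dict(re.findall(r"(\w+)=(\[[^\]]*\]|[^,\n]+)", text))
    entry = {key: value.strip() for key, value in fields.items()}
    entry["PATTERNS"] = int(entry["PATTERNS"])
    entry["DISTR"] = [
        float(x) for x in entry["DISTR"].strip("[]").split(";") if x.strip()
    ]
    return entry


raw = """EVENT=to use,
ID=e13-7,
OBJ=computer,
PATTERNS=2,
DISTR=[0.009;0.337;0.238;0.090;0.130;0.103;0.092;0.002;]"""

entry = parse_entry(raw)
print(entry["EVENT"], entry["OBJ"], entry["DISTR"])
```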
Summary
• We learned aspectual information from the web
• Event durations from the web counts are as accurate as a supervised system
• Web counts are domain-general, work well even without context
• New lexicon with 1000 most frequent verbs with 10 most frequent objects
• MTurk suggests that context can improve accuracy of event duration annotation