
Exploiting Timelines to Enhance Multi-document Summarization

Date post: 23-Feb-2016
Page 1: Exploiting Timelines to Enhance Multi-document Summarization

Exploiting Timelines to Enhance Multi-document Summarization

Jun-Ping Ng, Yan Chen, Min-Yen Kan, Zhoujun Li

DSO National Laboratories
National University of Singapore
Beihang University


Outline

• Overview
• Approach
• Experiments and Results
• Discussion


OVERVIEW


Multi-document Summarization


Extractive Summarization

• Find the most salient sentences in source collection

• Top-k sentences are extracted to compose final summary

• <Graphic>
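
The two steps above can be sketched in Python. This is only an illustration, not the SWING implementation: the word-frequency `score` is a hypothetical stand-in for a real salience model, and `top_k_sentences` and `k` are names assumed here.

```python
from collections import Counter


def top_k_sentences(sentences, k):
    """Return the k highest-scoring sentences, kept in source order."""
    # Hypothetical salience: average collection-wide frequency of a sentence's words.
    freq = Counter(w.lower() for s in sentences for w in s.split())

    def score(sentence):
        words = sentence.split()
        return sum(freq[w.lower()] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (highest first), then restore source order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Returning the selected sentences in source order is a design choice made for readability of the summary; a real system may order them differently.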


Two Storms

(1) A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years.

(2) More than 100,000 coastal villagers have been evacuated before the cyclone made landfall.

(3) The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP.


Timeline


APPROACH


Merging Timelines Into Summarization


Temporal Processing

• Based on TimeML (Pustejovsky et al., 2003)
• Basic temporal units: events + timexes
• Three steps
  – Event-timex temporal relation classification
  – Event-event temporal relation classification
  – Timex normalization
• Merge to obtain timelines
• <TODO>
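
As a rough sketch of the final merge step, the Python below groups events by the normalized timex they were linked to and orders the resulting time spans. It assumes a strong simplification that the slides do not state: every event is anchored to exactly one normalized date, and event-event relations are ignored. The function name, the input format, and the dates in the usage comment are all illustrative.

```python
from collections import defaultdict


def build_timeline(anchored_events):
    """Group events by their normalized time expression and order the spans.

    anchored_events: iterable of (event, iso_date) pairs, where iso_date is the
    normalized timex (for example "1991" or "2007-11-15") the event was linked to.
    """
    spans = defaultdict(list)
    for event, iso_date in anchored_events:
        spans[iso_date].append(event)
    # ISO 8601 strings of the same granularity sort chronologically as plain strings.
    return sorted(spans.items())


# Illustrative input only; the dates are placeholders, not taken from the slides.
timeline = build_timeline([
    ("smashed", "2007-11-15"),
    ("killed", "1991"),
    ("wiping", "2007-11-15"),
])
```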


Timelines


Summarization --- SWING
https://github.com/WING-NUS/SWING


Sentence Scoring
• Time span importance
• Contextual time span importance
• Sentence temporal coverage density


Defining Timeline Features


Time Span Importance (TSI)
• Time spans which contain many events are more salient
• Sentences which reference events in these time spans are thus better candidates for a summary


Scoring TSI
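
The scoring formula from this slide is not preserved in the transcript. The sketch below is one possible reading of the TSI idea, under assumptions made here: a time span's importance is its event count normalized by the busiest span, and a sentence's score is the average importance of the spans its events fall into. The function names and the dict-based timeline format are hypothetical.

```python
def time_span_importance(timeline):
    """Map each time span to its event count, normalized by the busiest span.

    timeline: non-empty dict mapping a time-span label to its list of events.
    """
    busiest = max(len(events) for events in timeline.values())
    return {span: len(events) / busiest for span, events in timeline.items()}


def sentence_tsi(sentence_spans, importance):
    """Average importance of the time spans a sentence's events fall into."""
    if not sentence_spans:
        return 0.0
    return sum(importance[s] for s in sentence_spans) / len(sentence_spans)
```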


Contextual Time Span Importance (CTSI)

• Time spans near to “important” time spans may also be important


Scoring CTSI
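
The CTSI formula is likewise missing from the transcript. As a hedged illustration of "nearby spans may also be important", the sketch below lets each time span inherit importance from its neighbours on the ordered timeline. The `window` and `decay` parameters and the exponential weighting are assumptions introduced here, not the authors' definition.

```python
def contextual_importance(ordered_tsi, window=1, decay=0.5):
    """Spread importance from each time span to its neighbours on the timeline.

    ordered_tsi: TSI values listed in chronological order of their time spans.
    Neighbours up to `window` positions away contribute their TSI scaled by
    decay ** distance (both parameters are assumed, not from the slides).
    """
    scores = []
    for i in range(len(ordered_tsi)):
        total = 0.0
        for j in range(max(0, i - window), min(len(ordered_tsi), i + window + 1)):
            if j != i:
                total += ordered_tsi[j] * decay ** abs(i - j)
        scores.append(total)
    return scores
```

With this weighting, a sparsely populated span that sits next to a busy one still receives a non-zero score.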


Sentence Temporal Coverage Density (TCD)

• Number of sentences in a summary is limited
• Favour sentences which
  – contain more events
  – cover a wide variety of time spans


Scoring TCD
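
The TCD formula is not preserved in the transcript either. One plausible form, assumed here purely for illustration, is the number of distinct time spans a sentence covers divided by the number of events it mentions. This rewards variety of time spans for a given number of events; it may differ from the definition on the original slide.

```python
def temporal_coverage_density(event_spans):
    """Distinct time spans covered per event mentioned in a sentence.

    event_spans: one time-span label per event in the sentence.
    """
    if not event_spans:
        return 0.0
    return len(set(event_spans)) / len(event_spans)
```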


Sentence Re-ordering
• SWING makes use of the Maximal Marginal Relevance (MMR) algorithm to identify redundancies in selected sentences
• MMR is heavily biased towards lexical and surface similarities
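
To make the lexical bias concrete, here is a generic greedy MMR selection in Python. It is a textbook-style sketch, not SWING's code: the Jaccard word-overlap similarity, the `relevance` dict, and the `lam` trade-off weight are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Lexical similarity: shared words over all words of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def mmr_select(candidates, relevance, k, lam=0.7):
    """Greedily pick k sentences, trading relevance against lexical redundancy."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(s):
            # Redundancy is the highest similarity to any already selected sentence.
            redundancy = max((jaccard(s, t) for t in selected), default=0.0)
            return lam * relevance[s] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the redundancy term only compares surface words, two sentences that describe the same event in different words incur no penalty.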


Beyond Lexical Penalties


An official in Barisal, 120 kilometres south of Dhaka, spoke of severe destruction as the 500 kilometre-wide mass of cloud passed overhead.

“Many trees have been uprooted and houses and schools blown away,” Mostofa Kamal, a district relief and rehabilitation officer, told AFP by telephone.

“Mud huts have been damaged and the roofs of several houses blown off,” said the state’s relief minister, Mortaza Hossain.


TimeMMR
• Novel dimension to redundancy detection
• Beyond lexical similarities, identify sentences which contain substantial time span overlaps
• Candidate sentences which share many time spans with selected sentences are penalised
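
The time-span penalty described above might look like the following sketch. The overlap measure (fraction of the candidate's spans already covered) and the `weight` parameter are assumptions made here; the slides do not give the exact penalty.

```python
def span_overlap(spans_a, spans_b):
    """Fraction of a candidate's time spans already covered by another sentence."""
    a, b = set(spans_a), set(spans_b)
    return len(a & b) / len(a) if a else 0.0


def time_penalty(candidate_spans, selected_spans, weight=0.5):
    """Penalty for a candidate given the time spans of already selected sentences.

    weight is an assumed tuning parameter; the penalty would be subtracted
    from the candidate's redundancy-aware score.
    """
    worst = max((span_overlap(candidate_spans, s) for s in selected_spans), default=0.0)
    return weight * worst
```

A candidate whose events all fall into time spans already present in the summary receives the full penalty even when it shares no words with the selected sentences.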


EXPERIMENTS AND RESULTS


Results
• TAC-2010 data set to train regression model
• TAC-2011 data set to test
• Using timelines leads to better summaries!

System        ROUGE-2
SWING         0.1339
+ Timelines   0.1394*
+ TimeMMR     0.1389
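
For readers unfamiliar with the metric in the table, ROUGE-2 is based on bigram overlap between a system summary and a reference summary. The sketch below computes a simplified single-reference recall with whitespace tokenization. It is not the official ROUGE toolkit, which supports multiple references and further options, so its numbers are not comparable to those reported above.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs in a lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2_recall(candidate, reference):
    """Clipped bigram overlap divided by the number of reference bigrams."""
    cand, ref = bigrams(candidate), bigrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference bigram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total
```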


Overcoming Errors
• Timelines contain errors
  – Errors from underlying temporal processing systems
  – Simplifying assumptions made in timeline construction
  – Lack of consistency checking and validation


Reliability Filtering
• Identify timelines which potentially contain more errors
• Exclude these when performing summarization


Length as a Metric
• Use the length of a timeline as a gauge of its “accuracy”
• Drop timelines which are shorter than the average length, computed over the whole input document collection
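
The length-based filter can be sketched in a few lines of Python. Two details are assumptions made here: a timeline's length is taken to be its number of entries, and timelines are keyed by a document-set identifier.

```python
def reliable_timelines(timelines):
    """Keep timelines at least as long as the average over the collection.

    timelines: dict mapping a document-set id to its timeline; a timeline's
    length is taken to be its number of entries (an assumption).
    """
    if not timelines:
        return {}
    average = sum(len(t) for t in timelines.values()) / len(timelines)
    return {name: t for name, t in timelines.items() if len(t) >= average}
```

Document sets whose timelines are dropped would then be summarized without the timeline features.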


Results
• Experiments repeated with reliability filtering
• Significant improvement obtained
• After filtering, timelines are used in 21 out of 44 document sets

System                    ROUGE-2
SWING                     0.1339
+ Timelines               0.1394*
+ Timelines + Filtering   0.1418**
+ TimeMMR                 0.1389
+ TimeMMR + Filtering     0.1402**


DISCUSSION


Text Example


The Army’s surgeon general criticized stories in The Washington Post disclosing problems at Walter Reed Army Medical Center, saying the series unfairly characterized the living conditions and care for soldiers recuperating from wounds at the hospital’s facilities.

The Army’s surgeon general criticized stories in The Washington Post disclosing problems at Walter Reed Army Medical Center, saying the series unfairly characterized the living conditions and care for soldiers recuperating from wounds at the hospital’s facilities.

Defense Secretary Robert Gates says people found to have been responsible for allowing substandard living conditions for soldier outpatients at Walter Reed Army Medical Center in Washington will be “held accountable,” although so far no one in the Army chain of command has offered to resign.

A top Army general vowed to personally oversee the upgrading of Walter Reed Army Medical Center’s Building 18, a dilapidated former hotel that houses wounded soldiers as outpatients.

Top Army officials visited Building 18, the decrepit former hotel housing more than 80 recovering soldiers, outside

“I’m not sure it was an accurate representation,” Lt. Gen. Kevin Kiley, chief of the Army Medical Command which oversees Walter Reed and all Army health care, told reporters during a news conference.

Timelines Used SWING


Future Work
• Study the use of alternative evaluation metrics, especially for TimeMMR
• Look at better metrics for reliability filtering
• Expand the scope of the timelines that are used for more flexibility


Conclusion

• The use of time is useful for summarization!
• Sentence Scoring
  – Derive features from a timeline
  – Combine features with a supervised learning summarization framework
• Sentence Re-ordering
  – Use overlapping time spans to identify redundancies


Thank you!
