
Modeling Novelty and feature combination using Support Vector Regression for Update Summarization

Date post: 13-Jan-2016
Page 1: Modeling Novelty and feature combination using Support Vector Regression for  Update Summarization

Modeling Novelty and feature combination using Support Vector Regression for Update Summarization

Praveen Bysani, Vijay Yaram, Vasudeva Varma

Search and Information Extraction Lab LTRC, IIIT Hyderabad

Page 2

Outline

• Text Summarization
• Update Summarization
• Support Vector Regression
• Sentence Scoring Features
• Novelty Factor (NF)
• Experimental Results

Page 3

Text Summarization

• Condensing a piece of text while retaining important information
• Types of summarization
– Extractive vs. Abstractive
– Single Document vs. Multi Document
– Query Focused vs. Query Independent
– Personalized vs. Generic
• Focus on Extractive Multi Document Summarization
• Dynamic Summarization, Update Summarization

Page 4

Update Summarization

• Emerging area in summarization
• Summarization with a sense of prior knowledge
• Challenging: detecting information that is both relevant and novel
• Practical usage: monitoring changes in temporally evolving topics, especially newswire

Page 5

An example

Topic: YSR Chopper Missing

A helicopter carrying Andhra Pradesh chief minister Y S Rajashekhar Reddy, two of his staff and two pilots went missing in pouring rain Wednesday morning over the Naxal and tiger-infested Nalamalla forests and with no contact until early Thursday, experts and officials feared the worst. Multiple agencies of the state launched a massive hunt for possible wreckage in the desolate terrain. Apart from Reddy, the chopper was carrying principal secretary to CM S Subrahmanyam and YSR's chief security officer ASC Wesley.

Andhra Pradesh chief minister Y S Rajasekhara Reddy has died in an air crash. The bodies of 60-year-old Reddy, his special secretary P Subramanyam, chief security officer A S C Wesley, pilot Group Captain S K Bhatia and co-pilot M S Reddy were found on Rudrakonda Hill, 40 nautical miles east of here, besides the mangled remains of the helicopter. The central leadership of the Congress is understood to have cleared the name of Andhra Pradesh finance minister K Rosaiah as the caretaker CM of the state.

Page 6

[Figure: summarization pipeline. Documents (doc1, doc, doc3) → Pre Processing → Sentences → Sentence Scorers (feature1, feature2, ..., feature n) → Ranker → Ranked Set of Sentences → Summary Generator → Summary]

Page 7

Sentence Ranking

• 4 stages of sentence-extractive summarization
– Pre Processing
– Sentence Scoring
– Sentence Ranking
– Summary Generation
• Scores from features are typically weighted manually to compute sentence rank
• Instead, use a Machine Learning algorithm to estimate rank from features
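The manually weighted combination mentioned on this slide can be sketched as a weighted sum of per-feature scores. This is only an illustration: the function name `manual_rank`, the feature names, the scores, and the weights are all hypothetical and do not come from the slides.

```python
from typing import Dict


def manual_rank(feature_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-feature sentence scores using hand-picked weights."""
    return sum(weights[name] * score for name, score in feature_scores.items())


# Hypothetical scores of one sentence under two features, with hypothetical weights.
scores = {"position": 0.9, "frequency": 0.4}
weights = {"position": 0.5, "frequency": 0.5}
rank = manual_rank(scores, weights)
```

The weights here must be chosen by hand. The learning-based approach on this slide replaces that step with a function estimated from training data.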

Page 8

Support Vector Regression (SVR)

• Regression analysis: modeling values of a dependent variable from one or more independent variables
• Regression is a function Y = f(X, β)
– X: the independent variables
– Y: the dependent variable
– β: unknown parameters
• Regression using Support Vectors is Support Vector Regression (SVR)
• Estimate sentence rank (dependent variable) using scoring features (independent variables) through SVR
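To make Y = f(X, β) concrete, the sketch below shows a linear regression function together with the epsilon-insensitive loss that standard epsilon-SVR minimizes. The slides do not say which SVR implementation, kernel, or parameters were used, so the linear form, the function names, the epsilon value, and the numbers are all assumptions for illustration.

```python
from typing import Sequence


def predict_rank(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear f(X, beta): the weights and bias together play the role of beta."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.1) -> float:
    """Loss of epsilon-SVR: errors no larger than epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical feature vector for one sentence and hypothetical parameters.
x = [0.5, 0.25]
beta_w, beta_b = [0.5, 1.0], 0.0
y_hat = predict_rank(x, beta_w, beta_b)
```

In practice the parameters would be fitted by an SVR library on (feature vector, sentence rank) pairs rather than set by hand.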

Page 9

Estimating Sentence Rank

• ROUGE: a recall-oriented metric that evaluates summaries by word overlap with model summaries
• ROUGE-2 and ROUGE-SU4 correlate highly with human evaluation
• Sentence rank (is) of a sentence s is computed from |Bigram_m ∩ Bigram_s|, the number of bigrams shared by the model m and the sentence s
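The bigram-overlap count that the sentence rank is built on can be sketched as follows. Whitespace tokenization, lowercasing, treating bigrams as sets, and summing over the model summaries are assumptions. The slide's full formula, including any normalization, is not reproduced here.

```python
from typing import List, Set, Tuple


def bigrams(text: str) -> Set[Tuple[str, str]]:
    """Set of adjacent word pairs in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def shared_bigrams(sentence: str, model_summary: str) -> int:
    """|Bigram_m ∩ Bigram_s| for one model summary m and one sentence s."""
    return len(bigrams(sentence) & bigrams(model_summary))


def total_shared_bigrams(sentence: str, models: List[str]) -> int:
    """Overlap summed over all model summaries (summation is an assumption)."""
    return sum(shared_bigrams(sentence, m) for m in models)
```

For example, "the helicopter went missing" shares two bigrams, ("helicopter", "went") and ("went", "missing"), with "a helicopter went missing today".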

Page 10

Oracle summaries

• Each oracle summary is the best summary that any sentence-extractive summarization system could generate
• Sentences are ranked using the ROUGE-2 score described above
• Used to depict the gap and the scope for improvement
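The slides do not say how ranked sentences are assembled into an oracle summary. One plausible sketch takes sentences in descending score order until a word budget is exhausted. The greedy strategy, the skipping of sentences that do not fit, the name `build_summary`, and the 100-word default are assumptions.

```python
from typing import List, Tuple


def build_summary(scored_sentences: List[Tuple[str, float]], max_words: int = 100) -> List[str]:
    """Pick sentences by descending score while staying within max_words."""
    summary: List[str] = []
    used = 0
    ordered = sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
    for sentence, _score in ordered:
        length = len(sentence.split())
        if used + length <= max_words:  # skip sentences that would overflow
            summary.append(sentence)
            used += length
    return summary
```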

Page 11

Sentence Scoring Features

• Sentence position is a popular and well-studied feature
• Sentence Location 1 (SL1)
– The first 3 sentences of a document contain the most informative content (also confirmed by analysis of oracle summaries)
– Score of a sentence s at position n in document d is

$$\mathrm{Score}(s_{nd}) = \begin{cases} 1 - \dfrac{n}{1000} & \text{if } n \le 3 \\[4pt] \dfrac{n}{1000} & \text{otherwise} \end{cases}$$
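The SL1 score translates directly into a small function. Treating n as a 1-based position is an assumption, and the name `sl1_score` is illustrative.

```python
def sl1_score(n: int) -> float:
    """SL1 score of the sentence at (assumed 1-based) position n in its document."""
    return 1 - n / 1000 if n <= 3 else n / 1000
```

Under this definition the first three sentences score close to 1, while every later sentence scores close to 0 for any realistic document length.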

Page 12

• Sentence Location 2 (SL2)
– Positional index of the sentence itself as the value
– Model will learn the optimum sentence position based on genre
– Not inclined to top or bottom as SL1 is

$$\mathrm{Score}(s_{nd}) = n$$

• Sentence Frequency Score (SFS)
– Ratio of the number of sentences in which a word occurred to the total number of sentences in the cluster
– SFS of word w is

$$\mathrm{SFS}(w) = \frac{|\{\, s : w \in s \,\}|}{\text{total number of sentences in the cluster}}$$
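The verbal definition of SFS can be sketched as below, assuming each sentence in the cluster is already tokenized into a list of words. The function name `sfs` and the empty-cluster fallback are illustrative choices.

```python
from typing import List


def sfs(word: str, sentences: List[List[str]]) -> float:
    """Fraction of sentences in the cluster that contain the word."""
    if not sentences:
        return 0.0  # assumption: an empty cluster yields a score of 0
    containing = sum(1 for sentence in sentences if word in sentence)
    return containing / len(sentences)


# Hypothetical four-sentence cluster.
cluster = [
    ["reddy", "missing"],
    ["helicopter", "missing"],
    ["rescue", "teams"],
    ["missing", "chopper"],
]
```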

Page 13

TF-IDF

• Popular measure of the relevance of a document in IR
• The same analogy is used to find the relevance of a sentence
• Term Frequency (TF_ij) is computed for a term t_i in a document d_j
• Inverse Document Frequency (IDF_i) is computed for a term t_i
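The slide's own TF and IDF formulas are not preserved in this transcript. The sketch below uses the common textbook definitions: term count divided by document length for TF, and the log of the total number of documents over the number containing the term for IDF. Whether the authors used exactly these variants is an assumption, as are the function names.

```python
import math
from typing import List


def term_frequency(term: str, document: List[str]) -> float:
    """Occurrences of the term divided by the document length."""
    return document.count(term) / len(document) if document else 0.0


def inverse_document_frequency(term: str, documents: List[List[str]]) -> float:
    """log(N / df); returns 0.0 for unseen terms (a choice made for this sketch)."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0


def tf_idf(term: str, document: List[str], documents: List[List[str]]) -> float:
    return term_frequency(term, document) * inverse_document_frequency(term, documents)
```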

Page 14

Document Frequency Score (DFS)

• Inverse of IDF
• Ratio of the number of docs in which a term occurred to the total number of docs
• Average DFS of the words in a sentence is its score
• Probabilistic Hyperspace Analogue to Language (PHAL) and Kullback-Leibler Divergence (KL) are also available as features
• Baseline summarizer generates a summary by picking the first 100 words of the last document
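The DFS definition and the baseline summarizer can be sketched as follows, assuming documents are tokenized word lists given in chronological order. The function names and the handling of empty inputs are illustrative assumptions.

```python
from typing import List


def dfs(term: str, documents: List[List[str]]) -> float:
    """Fraction of documents in which the term occurs."""
    if not documents:
        return 0.0
    return sum(1 for doc in documents if term in doc) / len(documents)


def dfs_sentence_score(sentence: List[str], documents: List[List[str]]) -> float:
    """Average DFS over the words of the sentence."""
    if not sentence:
        return 0.0
    return sum(dfs(word, documents) for word in sentence) / len(sentence)


def baseline_summary(documents: List[List[str]], max_words: int = 100) -> List[str]:
    """First max_words words of the last document (assumed chronological order)."""
    return documents[-1][:max_words] if documents else []
```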

Page 15

Novelty Factor

• An ad-hoc feature for Update Summarization
• Consider a stream of articles published on a topic over a time period T
• All articles published from time 0 to t are considered to have been read previously (prior knowledge)
• Articles published from t to T are new and contain new information
• Let td represent the chronological time stamp of document d

Page 16

Novelty Factor

NF of a word w is defined over three sets of documents:

nd_t = { d : w in d and td > t }
pd_t = { d : w in d and td < t }
D = { d : td > t }

• |nd_t| signifies the importance of the term in the new cluster
• |pd_t| penalizes any term that occurs frequently in previous clusters
• |D| is used for smoothing
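The NF formula itself is not preserved in this transcript. As a rough sketch, one form consistent with the three roles described above is NF(w) = |nd_t| / (|pd_t| + |D|). This exact form is an assumption, as are the function name `novelty_factor` and the (timestamp, word set) representation of a document.

```python
from typing import List, Set, Tuple

Document = Tuple[float, Set[str]]  # (time stamp td, set of words in d)


def novelty_factor(word: str, documents: List[Document], t: float) -> float:
    """Assumed form: |nd_t| / (|pd_t| + |D|)."""
    nd = sum(1 for td, words in documents if td > t and word in words)
    pd = sum(1 for td, words in documents if td < t and word in words)
    new_docs = sum(1 for td, _ in documents if td > t)  # |D|
    denominator = pd + new_docs
    return nd / denominator if denominator else 0.0


# Hypothetical stream: two previously read articles, two new ones.
stream = [
    (1, {"missing", "chopper"}),
    (2, {"missing"}),
    (3, {"crash", "missing"}),
    (4, {"crash"}),
]
```

With t = 2.5, a word such as "crash" that appears only in the new articles scores higher than "missing", which also occurs frequently in the earlier ones.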

Page 17

Training Data

• DUC 2007 Main task data for training
– 45 topics
– Each topic with 25 documents and a query
– 4 associated model summaries, each 250 words
• DUC 2007 Update task data for training update-specific features
– 10 topics
– Each topic divided into clusters A, B, C in chronological order with 10, 8, 7 docs respectively
– 4 associated model summaries, each 100 words

Page 18

Test Dataset

• TAC 2008 Update Summarization data for testing
– 48 topics
– Each topic divided into clusters A and B, with 10 documents each
– Summary for cluster A is a normal summary and summary for cluster B is an update summary

Page 19

Experiments and Results

• For individual features

Feature     ROUGE-2     ROUGE-SU4
KL          0.09285     0.132325
DFS         0.092225    0.13281
NF          0.086155    0.126455
SL1         0.086245    0.12163
SL2         0.08599     0.12147
SFS         0.077745    0.12419
TF-IDF      0.07317     0.12604
PHAL        0.06505     0.10712
baseline    0.05865     0.09333

Page 20

Experiments and Results

• For combinations of features

Combination       ROUGE-2     ROUGE-SU4
DFS+SL1           0.102195    0.139205
NF+SL1            0.100845    0.13742
DFS+SL2           0.10126     0.13943
NF+SL2            0.0978      0.134925
DFS+TFIDF         0.0993      0.1383
PHAL+KL           0.094035    0.134275
DFS+SP+PHAL+KL    0.09749     0.13705

• Non-complementing features for KL

Page 21

At cluster level: Cluster A

System       ROUGE-2    ROUGE-SU4
DFS+SL2      0.10604    0.13936
DFS+TFIDF    0.10633    0.14415
System-43    0.11137    0.14297
System-13    0.11045    0.13987
Oracle       0.17041    0.19616

Page 22

At cluster level: Cluster B

System       ROUGE-2    ROUGE-SU4
DFS+SL1      0.10343    0.14267
NF+SL1       0.10055    0.13791
System-14    0.10111    0.13669
System-65    0.09675    0.13381
Oracle       0.17610    0.19877

Page 23

Discussion

• Quality of training data for NF is poor compared to that for other features, which explains its lower performance relative to DFS
• Huge gap between oracle summaries and best peers

Page 24

Contribution

• Ad-hoc feature (Novelty Factor) for modeling novelty along with relevance
• Analyzing the effect of various feature combinations on the quality of summaries using Support Vector Regression
• Depicting the scope for improvement in summarization

Page 25

Thank You

