A Trainable Document Summarizer
Julian Kupiec, Jan Pedersen & Francine Chen
ACM SIGIR '95
Presented by Mat Kelly
CS895 – Web-based Information Retrieval
Old Dominion University
November 22, 2011
The Automatic Creation of Literature Abstracts
H. P. Luhn
IBM Journal of R&D, 1958
Luhn’s Objectives
• Exploration into automatic methods of obtaining abstracts
• Selects sentences that are most representative of pertinent info
• Citations of author’s own statements constitute “auto-abstract”
Which sentences are best?
• Establish a significance factor
– Frequency of word occurrence is a measure of word significance
– Relative position of signif. words within a sentence is a measure for determining signif. of the sentence
• Why does this work?
– Writer repeats certain words as he elaborates
Over-Simplification
• Method does not differentiate words with same stem
– Letter-by-letter analysis to determine P() of same stem
• While an author will opt for synonymous word choices, s/he’ll eventually run out and resort to repetition.
polic = { policing, policy, police }
Premise
• No consideration given to meaning of words.
• Instead, the closer certain words are associated, the more specifically an aspect of the subject is being treated
• Where the greatest number of freq. occurring different words are found close to each other, the prob. is high that information is most representative of the article.
• Criterion is relationship of signif words to each other rather than distrib. over whole sentence.
• Consider only portions of sentences that are bracketed by signif word, disregard those beyond limit from consideration of current bracket.
• Useful limit found is 4-5 non-signif words between signif words
Computing Significance Factor
1. Determine extent of cluster by bracketing
2. Count # signif words in cluster
3. Divide square of # by total # words in cluster
Tested on 50 articles of 300-4500 words each, compared against 100-person manual generation
[Figure: a portion of a sentence is bracketed by significant words (*); the example cluster spans 7 words, 4 of them significant. If signif. words are not more than 4 apart, whole sentence is cited.]
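The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Luhn's original procedure: the function name `luhn_significance`, the `max_gap` parameter (set to 4 non-significant words, following the limit quoted earlier) and the choice to score a sentence by its best cluster are assumptions made for the example.

```python
def luhn_significance(tokens, significant, max_gap=4):
    """Return the best cluster score for one tokenized sentence.

    tokens      -- list of words in the sentence
    significant -- set of lower-cased significant words (assumed given)
    max_gap     -- max non-significant words allowed between two
                   significant words of the same cluster (assumption: 4)
    """
    # Positions of significant words within the sentence.
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in significant]
    if not positions:
        return 0.0

    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            # Still inside the current bracket.
            count += 1
        else:
            # Close the cluster: (# significant)^2 / (total words in cluster).
            best = max(best, count ** 2 / (prev - start + 1))
            start = pos
            count = 1
        prev = pos
    # Score the last open cluster.
    return max(best, count ** 2 / (prev - start + 1))
```

For a 7-word cluster containing 4 significant words this gives 4²/7 ≈ 2.3.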
• Resolving power depends on total # words in article and decreases as total # of words increases
• Overcome by running on subdivisions of article, highest ranking sentences combined to form abstract
– Divisions might already exist with paper’s organization
– Otherwise, divided arbitrarily and overlapping
Procedures
• Abstracts prepared by first punching on cards(!)
• Pronouns & prepositions deleted from lookup routine
• Rest of words sorted alphabetically
• Words with common beginnings consolidated (rudimentary form of stemming)
– Produced errors up to 5% but did not affect results
• Words with low frequency removed, remaining were marked as significant
• Sentence signif then computed with prev formula
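A rough sketch of this word-selection procedure follows. It swaps in fixed-length prefix truncation for the consolidation of words with common beginnings, so it is not the original routine. The stop list, `prefix_len` and `min_freq` are illustrative assumptions, not values from the paper.

```python
from collections import Counter

# Tiny illustrative stop list (assumption); the original removed
# pronouns and prepositions via a lookup routine.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and",
             "it", "he", "she", "they", "is"}

def significant_stems(words, prefix_len=5, min_freq=2):
    """Return the set of crude stems that occur at least min_freq times."""
    # Drop stop words, then truncate each word to a common beginning.
    stems = [w.lower()[:prefix_len] for w in words if w.lower() not in STOPWORDS]
    counts = Counter(stems)
    # Low-frequency stems are removed; the rest are marked significant.
    return {stem for stem, n in counts.items() if n >= min_freq}
```

With the earlier example, "police", "policy" and "policing" all collapse to the stem `polic`, which then passes the frequency threshold.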
Abstract Creation with Result
• Apply cutoff value of sentence significance
• Fixed number of sentences required irrespective of document length
• Sentences could be weighted by assigning premium value to predetermined set of words if article is of special interest
• If no sentences meet threshold, reject article as too general for purpose of auto-abstracting
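The selection options above can be sketched as a single function. This is an illustration under stated assumptions: `select_sentences`, `cutoff` and `top_n` are hypothetical names, and an empty result stands in for rejecting the article.

```python
def select_sentences(scored, cutoff=None, top_n=None):
    """Pick abstract sentences from (sentence, score) pairs in document order.

    cutoff -- keep only sentences scoring at least this value (optional)
    top_n  -- keep at most this many of the highest-scoring sentences (optional)
    An empty list means no sentence qualified, i.e. reject the article.
    """
    keep = [(i, sent, score) for i, (sent, score) in enumerate(scored)
            if cutoff is None or score >= cutoff]
    if top_n is not None:
        # Take the best top_n, then restore document order.
        keep = sorted(keep, key=lambda t: t[2], reverse=True)[:top_n]
        keep.sort(key=lambda t: t[0])
    return [sent for _, sent, _ in keep]
```

Calling it with only `cutoff` gives the threshold variant; calling it with only `top_n` gives the fixed-length variant.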
Example
Two major recent developments have called the attention of chemists, physiologists, physicists and other scientists to mental diseases: It has been found that extremely minute quantities of chemicals can induce hallucinations and bizarre psychic disturbances in normal people, and mood-altering drugs (tranquilizers, for instance) have made long-institutionalized people amenable to therapy. (4.0)
This poses new possibilities for studying brain chemistry changes in health and sickness and their alleviation, the California researchers emphasized. (5.4) The new studies of brain chemistry have provided practical therapeutic results and tremendous encouragement to those who must care for mental patients. (5.4)
Generated Abstract
Conclusions
• Method proved feasible
• Highly reliable, consistent and stable unlike manual creation
• Possibility that author’s style causes inferior sentences to be promoted
• Method helps to realize savings in human effort
Significant Words → Significant Sentences → Inclusion in Abstract
Kupiec’s Objective
• Motive: provide intermediate point between document title and full text (i.e. abstract)
• Extracts as short as 20% of the original can be as informative as the full text*
• Extracts can be non-unique
• Combination by numerous methods (including Luhn’s) would have the best performance.
* A.H. Morris, G.M. Kasper, and D.A. Adams. The effects and limitations of automated text condensing on reading comprehension performance. Information Systems Research, pages 17-35, March 1992
A Statistical Classification Problem
• Have training set of documents w/ manually extracted abstracts
• Develop classification function that est. prob. that a given sentence is included in abstract
• From this, generate new abstracts by ranking sentences according to this prob. and selecting user-specified # of top scoring sentences.
[Diagram: Given Sentences → Feature 1, Feature 2, …, Feature n (each contributes to S’s score) → Determine P() of abstract inclusion using Bayesian classifier → Inclusion Threshold]
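The train-then-rank pipeline can be sketched as below. It assumes a naive Bayes formulation over independent binary features. The function names, the tuple-based data layout and the add-one smoothing are assumptions for illustration, not details taken from the paper.

```python
from math import prod

def train(labeled):
    """labeled: list of (features, in_summary), features a tuple of bools."""
    n = len(labeled)
    k = len(labeled[0][0])
    positives = [f for f, in_summary in labeled if in_summary]
    prior = len(positives) / n  # P(s in S)
    # Add-one smoothing (assumption) so no probability is exactly 0 or 1.
    p_f_given_s = [(sum(f[j] for f in positives) + 1) / (len(positives) + 2)
                   for j in range(k)]
    p_f = [(sum(f[j] for f, _ in labeled) + 1) / (n + 2) for j in range(k)]
    return prior, p_f_given_s, p_f

def score(features, model):
    """Naive Bayes score: P(s in S) * prod P(Fj | s in S) / prod P(Fj)."""
    prior, p_f_given_s, p_f = model
    num = prior * prod(p if on else 1 - p for p, on in zip(p_f_given_s, features))
    den = prod(p if on else 1 - p for p, on in zip(p_f, features))
    return num / den

def summarize(candidates, model, n):
    """candidates: list of (text, features); return the n top-scoring texts."""
    ranked = sorted(candidates, key=lambda c: score(c[1], model), reverse=True)
    return [text for text, _ in ranked[:n]]
```

Because of the independence assumption the score is best read as a ranking value rather than a calibrated probability.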
• Evaluation criterion: classification success rate/precision
• Requires corpus (expensive)
– Acquired from non-profit Engineering Information Co. – used as basis for experiments
• All previous methods assume that documents exist in isolation
Features
Experimentally Obtained
• Sentence Length Cutoff – short sentences (threshold of 5 words) are not usually included in summaries
• Fixed-Phrase – sentences containing phrases from a fixed list, or following headings such as “Summary”, “Conclusions”, etc., are likely to be in summaries
• Paragraph – consider first 10 ¶ and last 5 ¶
• Thematic Word – score sentences by the thematic words they contain
• Uppercase Word – e.g. proper names, scored similarly to thematic words; a sentence in which such a word first appears scores twice as much as later occurrences
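A simplified sketch of how these five features might be computed per sentence. Everything here is illustrative: the phrase list, the binary treatment of the thematic and uppercase features, and the function and key names are assumptions, and the double score for first occurrences is not modeled.

```python
# Illustrative phrase list (assumption); not the list used in the paper.
FIXED_PHRASES = ("in summary", "in conclusion")

def extract_features(sentence, para_index, n_paragraphs, thematic_words):
    """Return a dict of binary features for one sentence.

    para_index     -- 0-based index of the paragraph holding the sentence
    n_paragraphs   -- total paragraphs in the document
    thematic_words -- set of lower-cased thematic words (assumed given)
    """
    raw = sentence.split()
    words = [w.lower().strip(".,;:") for w in raw]
    return {
        # Sentence length cutoff: longer than 5 words.
        "long_enough": len(words) > 5,
        # Fixed phrase: contains one of the listed cue phrases.
        "fixed_phrase": any(p in sentence.lower() for p in FIXED_PHRASES),
        # Paragraph: within the first 10 or last 5 paragraphs.
        "lead_or_final_paragraph": para_index < 10 or para_index >= n_paragraphs - 5,
        # Thematic word: contains at least one thematic word (simplified).
        "thematic": any(w in thematic_words for w in words),
        # Uppercase word: capitalized word beyond the sentence start (simplified).
        "uppercase": any(w[0].isupper() for w in raw[1:]),
    }
```

Each dict can be flattened to a tuple of booleans for a classifier that expects discrete features.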
Classifier
• For each sentence s, determine prob. that it will be included in summary S given k features
– Since all features are discrete, equation can be put in terms of probs rather than likelihoods.
– Results in simple Bayesian classification function that assigns each sentence s a score, used to select sentences for inclusion in summary
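For reference, the classification function can be written out as Bayes' rule, where s is a sentence, S the summary and F_1, …, F_k the features. The second line assumes the features are statistically independent; the notation is a reconstruction and may differ in detail from the paper's.

```latex
\begin{align}
P(s \in S \mid F_1, \dots, F_k)
  &= \frac{P(F_1, \dots, F_k \mid s \in S)\, P(s \in S)}{P(F_1, \dots, F_k)} \\
  &= \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\; P(s \in S)}{\prod_{j=1}^{k} P(F_j)}
\end{align}
```

Here P(s ∈ S) is a constant, and P(F_j | s ∈ S) and P(F_j) can be estimated by counting occurrences in the training set.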
About the Corpus
• Articles w/o abstracts; summaries created manually after the fact
• 188 document/summary pairs from 21 publications in scientific/technical domains
• Summary avg length is 3 sentences
Sentence Matching
• Using manually created abstracts, match to sentences in orig. document
• Direct match – verbatim or w/ minor modifications
• Direct join – 2 or more sentences used to make summary sentence
• Unmatchable – suspected fabrication without using sentences in document
• Incomplete
– Some overlap exists but content is not preserved in summary
– Summary sentence includes content from original but contains other information that is not covered by a direct join
[Figure labels: Abstract, Direct match, Direct join]
Evaluation
• Insufficient data for separate test corpus, used cross-validation strategy for evaluation
• Documents from a journal were selected for testing one at a time, all other document summary pairs were used for training
• Results were summed over journals
• Unmatchable/incomplete sentences were excluded from training and testing, leaving 498 unique sentences
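The cross-validation strategy amounts to holding out one document at a time. A minimal sketch, assuming `pairs` is a list of (document, summary) tuples; the function name is hypothetical:

```python
def leave_one_out(pairs):
    """Yield (training_pairs, held_out_pair) for every document in turn."""
    for i, held_out in enumerate(pairs):
        # Train on everything except the i-th document/summary pair.
        yield pairs[:i] + pairs[i + 1:], held_out
```

Each held-out document is summarized by a model trained on the remaining pairs, and the per-document results are then summed.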
Evaluating Performance
• Fraction of manual summary sentences that were reproduced, limited by text excerpting: (451+19)/568 = 83%

                                  # Sentences   Fract of Corpus
Direct Sentence Matches               451            79%
Direct Joins                           19             3%
Unmatchable Sentences                  50             9%
Incomplete Single Sentences            21             4%
Incomplete Joins                       21             4%
Total Manual Summary Sentences        568
• Sentence produced is correct if:
– Has direct sentence match & present in manual summary
– or –
– Is in manual summary as part of direct join and all other components of join have been produced
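The correctness criterion can be expressed with set operations. A sketch under assumptions: sentences are identified by hashable ids, `direct_matches` holds the ids of document sentences that directly match a manual summary sentence, and each entry of `direct_joins` is a tuple of the ids forming one join. All names are hypothetical.

```python
def count_correct(produced, direct_matches, direct_joins):
    """Count produced sentences that are correct under the two rules."""
    produced = set(produced)
    # Rule 1: sentence has a direct match in the manual summary.
    correct = produced & set(direct_matches)
    # Rule 2: sentence is part of a direct join whose components
    # have all been produced.
    for join in direct_joins:
        if set(join) <= produced:
            correct |= set(join)
    return len(correct)
```

A join with a missing component contributes nothing, which is what makes the second rule stricter than the first.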
Distribution of Correspondence in Training Corpus
[Plot: percent sentences correct (y-axis, 0 to 100) vs. number of sentences (x-axis, 1 to 40)]
Results
• Of 568 sentences:
– 195 direct matches + 6 direct joins = 201 correctly ident. summary sentences (35% replication)
• Manual summary generation has only 25% overlap between people and 55% for the same person over time.
• 211/498 (42%) sentences correctly identified by the summarizer
Conclusions
• For summaries 25% the size of the document
– 84% of sentences selected were also selected by professionals
• For smaller summaries, improvement of 74% observed vs. simply presenting beginning of document.
Comparing the Processes
• Luhn: Significant Words → Significant Sentences → Inclusion in Abstract
• Kupiec: Given Sentences → features, each contributing to S’s score (|Sentence| < 5; ↑ After fixed phrase; Prior. 1st & Last ¶s; ↑ Thematic Words; ↑ Capitalized, non-unit words) → Determine P() of abstract inclusion using Bayesian Classifier → Inclusion Threshold
References
• H.P. Luhn. The automatic creation of literature abstracts. IBM J. Res. Develop., 2:159-165, 1958
• Julian Kupiec, Jan Pedersen, and Francine Chen. A trainable document summarizer. In Proc. of the 18th Annual International ACM/SIGIR Conference, pages 68-73, Seattle, WA, 1995