
A Machine Reading-based Approach to Update Summarization

Page 1: A Machine Reading-based Approach to Update Summarization

A Machine Reading-based Approach to Update Summarization

Andrew Hickl, Kirk Roberts, and Finley Lacatusu

Language Computer Corporation
April 26, 2007

Page 2: A Machine Reading-based Approach to Update Summarization

Overview

• Introduction
• Why Machine Reading?
• System Overview
  – Question Processing
  – Sentence Retrieval and Ranking
  – Recognizing Textual Entailment
  – Sentence Selection
  – Summary Generation
• Results
  – Main Task
  – Update Task

• Conclusions and Future Considerations

Page 3: A Machine Reading-based Approach to Update Summarization

Update Summarization

• As currently defined, the task of update summarization requires systems to maximize the amount of new information included in a summary that is not available from any previously-considered document.

• Don’t consider identical content (term overlap, etc.).
• Don’t consider textually entailed content.
• Consider contradictory content.
• Potentially consider inferable content.
• Consider new information.
• These distinctions require access to models of the knowledge available from texts!
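
A minimal sketch of this content triage, assuming hypothetical `overlaps`, `entails`, and `contradicts` predicates that stand in for the term-overlap, textual entailment, and contradiction components described later in the talk:

```python
def triage(candidate, kb_commitments, overlaps, entails, contradicts):
    """Classify a candidate sentence's content against previously seen knowledge.

    `overlaps`, `entails`, and `contradicts` are placeholder predicates for the
    term-overlap, textual entailment, and contradiction checks; `kb_commitments`
    is the set of commitments already stored in the knowledge base.
    """
    if any(overlaps(c, candidate) for c in kb_commitments):
        return "identical"       # don't consider: duplicate content
    if any(entails(c, candidate) for c in kb_commitments):
        return "entailed"        # don't consider: already implied by the KB
    if any(contradicts(c, candidate) for c in kb_commitments):
        return "contradictory"   # consider: the information has changed
    return "new"                 # consider: new information
```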

Page 4: A Machine Reading-based Approach to Update Summarization

What is Machine Reading?

• Machine Reading (MR) applications seek to promote the understanding of texts by providing a representation of the knowledge available from a corpus.

• Three important components:
  – Knowledge Acquisition: How can we automatically extract the semantic/pragmatic content of a text?
  – Knowledge Representation: How do we encode the propositional content of a text in a regular manner?
  – Stability/Control: How do we ensure that the knowledge acquired from text is consistent with previous commitments stored in a knowledge base?

• We believe that the recognition of knowledge that’s consistent with a KB is an important prerequisite for performing update summarization:
  – Identify content that’s already stored in the KB
  – Identify content that’s inferable from the KB
  – Identify content that contradicts content in the KB

Consistency: Assume that knowledge is consistent with respect to a particular model M iff the truth of a proposition can be reasonably inferred from the other knowledge commitments of M.

Page 5: A Machine Reading-based Approach to Update Summarization

From Textual Inference to Machine Reading

• The recent attention paid to the task of recognizing textual entailment (Dagan et al. 2006) and textual contradiction (Harabagiu et al. 2006) has led to the development of systems capable of accurately recognizing different types of textual inference relationships in natural language texts.

Textual Entailment (RTE-3 Test Set):
  Text: A revenue cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan’s White House hostess.
  Hyp: Lane worked at the White House.

Textual Contradiction:
  Text: A revenue cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan’s White House hostess.
  Hyp: Lane never set foot in White House.

• Despite still being a relatively new evaluation area for NLP, statistical knowledge-lean approaches are achieving near human-like performance:
  – PASCAL RTE-2 (2006): 75.38% accuracy (Hickl et al. 2006) [max: 86.5%]
  – PASCAL RTE-3 (2007): 81.75% accuracy (Hickl and Bensley 2007) [max: 85.75%]
  – Contradiction: 66.5% accuracy (Harabagiu, Hickl, and Lacatusu, 2006)
  – Human Agreement: 86% (entailment), 81% (contradiction)

Page 6: A Machine Reading-based Approach to Update Summarization

The Machine Reading Cycle

[Figure: The Machine Reading cycle. Document Ingestion → Commitment Extraction → Knowledge Selection → Knowledge Consolidation into a Text Repository; text probes (hypotheses) are checked against KB commitments (“texts”) via Textual Entailment and Textual Contradiction (Yes/No decisions).]

Page 7: A Machine Reading-based Approach to Update Summarization

Overview

• Introduction
• Why Machine Reading?
• System Overview
  – Question Processing
  – Sentence Retrieval and Ranking
  – Recognizing Textual Entailment
  – Sentence Selection
  – Summary Generation
• Results
  – Main Task
  – Update Task

• Conclusions and Future Considerations

Page 8: A Machine Reading-based Approach to Update Summarization

Architecture of GISTexter

[Figure: GISTexter architecture. A Complex Question feeds Question Processing (Keyword Extraction, Syntactic Q Decomp, Semantic Q Decomp), which drives Sentence Retrieval and Ranking (Question Answering, Multi-Document Summarization). For the Update task, a Machine Reading loop (Commitment Extraction, Textual Entailment, Textual Contradiction) produces Corrections and New Knowledge that feed Sentence Selection; Summary Generation then produces the Summary Answer for both the Main and Update tasks.]

Page 9: A Machine Reading-based Approach to Update Summarization

Question Processing

• GISTexter uses three different Question Processing modules in order to represent the information need of complex questions.

Example question: What are the long term and short term implications of Israel’s continuing military action against Lebanon, including airstrikes on Hezbollah positions in Southern Lebanon?

Keyword Extraction/Alternation:
  – Extracted: implications, Israel, military, action, (Southern) Lebanon, airstrikes, Hezbollah, positions
  – Alternations: implications, effects, outcomes, disaster, scandal, crisis; Israel, Israeli, Jewish state; military action, attack, operation, onslaught, invasion; Lebanon, Lebanese; positions, locations, facilities, targets, bunkers, areas, situations

Syntactic Question Decomposition:
  – What are the long term implications of Israel’s action against Lebanon?
  – What are the short term implications of Israel’s action against Lebanon?
  – What are the long term implications of Israeli airstrikes on Hezbollah positions in Southern Lebanon?
  – What are the short term implications of Israeli airstrikes on Hezbollah positions in Southern Lebanon?

Semantic Question Decomposition:
  – What ramifications could another round of airstrikes have on relations?
  – What could foment anti-Israeli sentiment among the Lebanese population?
  – What kinds of humanitarian assistance has Hezbollah provided in Lebanon?
  – How much damage resulted from the Israeli airstrikes on Lebanon?
  – Who has re-built roads, schools, and hospitals in Southern Lebanon?
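
The following toy sketch illustrates the general idea of keyword alternation and coordination-based syntactic decomposition; the alternation table and the splitting heuristic are illustrative assumptions, not GISTexter’s actual modules:

```python
# Illustrative alternation table (a real system would derive these automatically).
ALTERNATIONS = {
    "implications": ["effects", "outcomes", "crisis"],
    "military action": ["attack", "operation", "invasion"],
    "positions": ["locations", "facilities", "targets"],
}

def expand_keywords(keywords):
    """Expand each extracted keyword with its known alternations."""
    expanded = set(keywords)
    for kw in keywords:
        expanded.update(ALTERNATIONS.get(kw, []))
    return expanded

def split_coordination(template, conjuncts, slot="X"):
    """Crude stand-in for syntactic decomposition: produce one simpler
    question per conjunct of a coordinated phrase."""
    return [template.replace(slot, c) for c in conjuncts]

subquestions = split_coordination(
    "What are the X implications of Israel's action against Lebanon?",
    ["long term", "short term"])
keywords = expand_keywords(["implications", "airstrikes", "positions"])
```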

Page 10: A Machine Reading-based Approach to Update Summarization

Question Processing

Decomposed questions, answers, and the relations extracted from them:

Q0. What are the long-term ramifications of Israeli airstrikes against Hezbollah?
A0. Security experts warn that this round of airstrikes could have serious ramifications for Israel, including fomenting anti-Israeli sentiment among most of the Lebanese population for generations to come.

Q1. What ramifications could this round of airstrikes have?
A1. The most recent round of Israeli airstrikes has caused significant damage to the Lebanese civilian infrastructure, resulting in more than an estimated $900 million in damage in the Lebanese capital of Beirut alone.

Q2. What could foment anti-Israeli sentiment among the Lebanese population?
A2. Hezbollah has provided humanitarian assistance to the people of Southern Lebanon following recent airstrikes; a surprising move to many who believe Hezbollah’s sole purpose was to foment unrest in Lebanon and Israel.

Q3. What kinds of humanitarian assistance has Hezbollah provided in Lebanon?
A3. Following the widespread destruction caused by Israeli airstrikes, Hezbollah has moved quickly to provide humanitarian assistance, including rebuilding roads, schools, and hospitals and ensuring that water and power is available in metropolitan areas.

Follow-up questions:
Q4. How much damage resulted from the airstrikes?
Q5. What is Hezbollah’s sole purpose?
Q6. Who has re-built roads, schools, and hospitals?

Extracted relations: R1. ramifications-airstrikes; R2. fomenting-unrest; R3. provide-humanitarian assistance; R4. result-COST; R5. ORG-purpose; R6. ORG-rebuild

Page 11: A Machine Reading-based Approach to Update Summarization

Sentence Retrieval and Ranking

• As with our DUC 2006 system, we used two different mechanisms to retrieve relevant sentences for a summary:
  – Question Answering (Hickl et al. 2006):
    • Keywords extracted from subquestions and automatically expanded
    • Sentences retrieved and ranked based on number and proximity of keywords in each sentence
    • Top 10 answers from each subquestion are combined and re-ranked in order to produce a ranked list of sentences for a summary
  – Multi-Document Summarization (Harabagiu et al. 2006, Harabagiu & Lacatusu 2005):
    • Computed topic signatures (Lin and Hovy 2000) and enhanced topic signatures (Harabagiu 2004) for each relevant set of documents
    • Sentences retrieved based on keywords; re-ranked based on combined topic score derived from topic signatures

• All retrieved sentences were then re-ranked based on a number of features, including:
  – Relevance score assigned by retrieval engine
  – Position in document
  – Number of topical terms / named entities
  – Length of original document

• Feature weights were determined using a hill-climber trained on “human” summaries from the DUC 2005 and 2006 main tasks.
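
A minimal sketch of the feature-weighted re-ranking and a toy hill-climber; the feature names and the `quality` callback (which would score a ranking against the DUC 2005/2006 reference summaries) are assumptions for illustration, not the weights or procedure actually used:

```python
import random

FEATURES = ["retrieval_score", "doc_position", "topic_terms", "doc_length"]

def sentence_score(features, weights):
    """Weighted linear combination of the re-ranking features listed above."""
    return sum(weights[f] * features[f] for f in FEATURES)

def hill_climb(quality, steps=1000, step_size=0.05, seed=0):
    """Toy hill-climber over feature weights. `quality(weights)` is an assumed
    callback that scores the ranking induced by `weights` against reference
    ("human") summaries, e.g. with a ROUGE-style measure."""
    rng = random.Random(seed)
    weights = {f: 1.0 for f in FEATURES}
    best = quality(weights)
    for _ in range(steps):
        trial = dict(weights)
        feat = rng.choice(FEATURES)
        trial[feat] += rng.uniform(-step_size, step_size)
        trial_quality = quality(trial)
        if trial_quality > best:
            weights, best = trial, trial_quality
    return weights
```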

Page 12: A Machine Reading-based Approach to Update Summarization

Architecture of GISTexter

[Figure: GISTexter architecture (repeated from the earlier Architecture of GISTexter slide).]

Page 13: A Machine Reading-based Approach to Update Summarization

Recognizing Textual Entailment

[Figure: RTE architecture. The text and hypothesis pass through Preprocessing, Commitment Extraction, Commitment Alignment, Lexical Alignment, and Entailment Classification; Contradiction Recognition then separates entailed knowledge (+TE) from extracted new knowledge (-TE), based on Yes/No decisions over the commitments extracted from t and h.]

Page 14: A Machine Reading-based Approach to Update Summarization

Recognizing Textual Entailment


• Step 1. Preprocessing of text-hypothesis pairs
  – POS Tagging, Syntactic Parsing, Morphological Stemming, Collocation Detection
  – Annotation of Tense/Aspect, Modality, Polarity
  – Semantic Parsing (PropBank, NomBank, FrameNet)
  – Named Entity Recognition (~300 named entity types)
  – Temporal Normalization
  – Temporal Relation Detection (t-link, s-link, a-link)
  – Pronominal Co-reference
  – Nominal Co-reference
  – Synonymy and Antonymy Detection
  – Predicate Alternation (based on pre-cached corpus of predicate paraphrases)

Page 15: A Machine Reading-based Approach to Update Summarization

Recognizing Textual Entailment


• Step 2. Commitment Extraction

Text: A revenue cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan’s White House hostess.
Hyp: Harriet Lane worked at the White House.

Commitments extracted from the text:
  1. The ship was named for Harriet Lane.
  2. A revenue cutter was named for Harriet Lane.
  3. The ship was named for the niece of Buchanan.
  4. Buchanan had the title of President.
  5. Buchanan had a niece.
  6. A revenue cutter was named for the niece of Buchanan.
  7. Harriet Lane was the niece of Buchanan.
  8. Harriet Lane was related to Buchanan.
  9. Harriet Lane served as Buchanan’s White House hostess.
  10. Buchanan had a White House hostess.
  11. There was a hostess at the White House.
  12. The niece of Buchanan served as White House hostess.
  13. Harriet Lane served as White House hostess.
  14. Harriet Lane served as a hostess.
  15. Harriet Lane served at the White House.

Commitment extracted from the hypothesis:
  16. Harriet Lane worked at the White House.

Phenomena handled during extraction: Conjunction, Subordination, Reported Speech, Appositives, Relative Clauses, Titles and Epithets, Co-reference Resolution, Ellipsis Resolution, Pre-Nominal Modifiers, Possessives.
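
A toy sketch of why a single sentence yields so many commitments: once the appositive, title, and co-referring phrases are recognized, simple templates multiply out into most of the list above. The alias/template representation below is an assumption for illustration, not LCC’s extraction machinery:

```python
# Phrases recognized as co-referring (via appositive and relative-clause resolution).
ship_aliases = ["The ship", "A revenue cutter"]
lane_aliases = ["Harriet Lane", "the niece of Buchanan"]

commitments = []
for ship in ship_aliases:                      # commitments 1-3 and 6
    for lane in lane_aliases:
        commitments.append(f"{ship} was named for {lane}.")
for lane in lane_aliases:                      # commitments 12-13
    commitments.append(f"{lane} served as White House hostess.")
commitments += [
    "Buchanan had the title of President.",    # from the title "President"
    "Buchanan had a niece.",                   # from the appositive
    "Harriet Lane was related to Buchanan.",   # weakened form of "niece of"
]
```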

Page 16: A Machine Reading-based Approach to Update Summarization

Recognizing Textual Entailment


• Step 3. Commitment Alignment
  – Used Taskar et al. (2005)’s discriminative matching approach to word alignment
    • Cast alignment prediction as maximum weight bipartite matching
    • Used large-margin estimation to learn parameters w which predict:

        ŷ_i = argmax_{y_i ∈ Y_i} w^T f(x_i, y_i)

      where y_i is the correct alignment, ŷ_i the predicted alignment, x_i the sentence pair, w the parameter vector, and f the feature mapping
  – Used reciprocal best-hit match to ensure that best commitment alignments were considered

Example: text commitments 13 (“Harriet Lane served as White House hostess.”), 14 (“Harriet Lane served as a hostess.”), and 15 (“Harriet Lane served at the White House.”) are aligned against the hypothesis “Harriet Lane worked at the White House.”
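
A minimal sketch of alignment cast as maximum-weight bipartite matching, using SciPy’s assignment solver in place of the learned large-margin scorer; `sim(t, h)` is an assumed scoring function standing in for w^T f(x, y), and the reciprocal best-hit filter follows the last bullet above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_commitments(text_commitments, hyp_commitments, sim):
    """Maximum-weight bipartite matching between text and hypothesis commitments.
    `sim(t, h)` is an assumed similarity score standing in for the learned model."""
    weights = np.array([[sim(t, h) for h in hyp_commitments]
                        for t in text_commitments])
    rows, cols = linear_sum_assignment(weights, maximize=True)
    return [(text_commitments[r], hyp_commitments[c], weights[r, c])
            for r, c in zip(rows, cols)]

def reciprocal_best_hits(weights):
    """Keep only (text, hyp) index pairs that are each other's best match."""
    best_h_for_t = weights.argmax(axis=1)
    best_t_for_h = weights.argmax(axis=0)
    return [(t, int(h)) for t, h in enumerate(best_h_for_t)
            if best_t_for_h[h] == t]
```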

Page 17: A Machine Reading-based Approach to Update Summarization

Recognizing Textual Entailment


• Step 4. Lexical Alignment
  – Used Maximum Entropy Classifier to identify best possible token-wise alignment for each phrase chunk found in t-h pair
    • Morphological Stemming / Levenshtein Edit Distance
    • Numeric/Date Comparators (second, 2; 1920’s, 1928)
    • Named Entity Categories (350+ types from LCC’s CiceroLite)
    • WordNet synonymy/antonymy distance
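
A small sketch of the kind of token-pair features such an alignment classifier might use; only simple string features are shown, and a real system would add the named-entity and WordNet features listed above:

```python
def edit_distance(a, b):
    """Levenshtein edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def token_pair_features(tok_t, tok_h):
    """Assumed feature set for one text/hypothesis token pair."""
    t, h = tok_t.lower(), tok_h.lower()
    return {
        "exact_match": t == h,
        "edit_distance": edit_distance(t, h),
        "shared_prefix": t[:4] == h[:4],          # crude stand-in for stemming
        "both_numeric": t.isdigit() and h.isdigit(),
    }
```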

Page 18: A Machine Reading-based Approach to Update Summarization

Recognizing Textual Entailment


• Step 5. Entailment Classification and Contradiction Recognition
  – Used Decision Tree classifier (C4.5)
    • 2006 (RTE-2): Trained on 100K+ Entailment Pairs
    • 2007 (RTE-3): Trained only on RTE-3 Development Set (800 Pairs)
  – If NO judgment returned:
    • Consider all other commitment-hypothesis pairs whose alignment probability is at least the threshold (0.85)
    • Return NO as RTE judgment
  – If YES judgment returned:
    • Used system for recognizing textual contradiction (Harabagiu et al. 2006) to determine whether the hypothesis contradicted any other extracted commitment
    • If no contradiction can be found → positive instance of textual entailment
    • If contradiction → negative instance of textual entailment
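
A minimal sketch of this classification and contradiction-checking flow; `classify`, `align_prob`, and `contradicts` are assumed callbacks standing in for the C4.5 classifier, the alignment probability, and the contradiction recognizer:

```python
def judge_entailment(hyp, text_commitments, classify, align_prob, contradicts,
                     threshold=0.85):
    """Return "YES" or "NO" for a hypothesis against a set of text commitments."""
    # Consider commitment-hypothesis pairs whose alignment probability clears
    # the threshold, best-aligned first.
    candidates = sorted((c for c in text_commitments
                         if align_prob(c, hyp) >= threshold),
                        key=lambda c: align_prob(c, hyp), reverse=True)
    for commitment in candidates:
        if classify(commitment, hyp) == "YES":
            # Keep a YES only if no other extracted commitment contradicts it.
            if any(contradicts(c, hyp) for c in text_commitments):
                return "NO"
            return "YES"
    return "NO"   # no commitment yielded a YES judgment
```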

Page 19: A Machine Reading-based Approach to Update Summarization

Architecture of GISTexter

[Figure: GISTexter architecture (repeated from the earlier Architecture of GISTexter slide).]

Page 20: A Machine Reading-based Approach to Update Summarization

Sentence Selection and KB Update

• Step 1. Entailment confidence scores assigned to commitments are then used to re-rank the sentences that they were extracted from:

  – Textual Entailment:
    • Entailed Commitments (known information): Negative Weight
    • Non-Entailed Commitments (new information): Positive Weight
  – Textual Contradiction:
    • Contradicted Commitments (changed information): Positive Weight
    • Non-Contradicted Commitments (no change): No Contribution
  – Confidence scores are normalized for textual entailment and textual contradiction

• Step 2. After each round of summarization, GISTexter’s knowledge base is updated to include:
  – All non-entailed commitments
  – All contradicted commitments

• Step 3. Fixed-length summaries were generated as in (Lacatusu et al. 2006):
  – Top-ranked sentences clustered based on topic signatures to promote coherence
  – Heuristics used to insert paragraph breaks and drop words until the word limit was met
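
A minimal sketch of Steps 1 and 2, assuming hypothetical `commitments_of`, `entailed_by_kb`, and `contradicted_by_kb` helpers and illustrative weights (the actual weights and normalization are not those used by GISTexter):

```python
def rerank_and_update(sentences, kb, commitments_of, entailed_by_kb,
                      contradicted_by_kb, w_new=1.0, w_known=-1.0, w_changed=1.0):
    """Re-rank sentences by the status of their commitments, then extend the KB."""
    scored = []
    for sent in sentences:
        score = 0.0
        for c in commitments_of(sent):
            if contradicted_by_kb(c, kb):
                score += w_changed      # changed information: positive weight
            elif entailed_by_kb(c, kb):
                score += w_known        # known information: negative weight
            else:
                score += w_new          # new information: positive weight
        scored.append((score, sent))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Step 2: add all non-entailed and all contradicted commitments to the KB.
    for sent in sentences:
        for c in commitments_of(sent):
            if contradicted_by_kb(c, kb) or not entailed_by_kb(c, kb):
                kb.add(c)

    return [sent for _, sent in scored]
```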

Page 21: A Machine Reading-based Approach to Update Summarization

Results: Main Task

• Two differences between 2006 and 2007 versions of GISTexter:
  – Sentence Ranking:
    • 2006: Used textual entailment to create Pyramids from 6 candidate summaries
    • 2007: Learned sentence weights based on 2005, 2006 summaries
  – Coreference Resolution:
    • 2006: Used heuristics to select sentences with “resolvable” pronouns
    • 2007: Used coreference resolution system to resolve all pronouns

Page 22: A Machine Reading-based Approach to Update Summarization

Non-Redundancy vs. Referential Clarity

• Using output from a pronoun resolution system can boost referential clarity, but at what price?
  – Only a modest gain in referential clarity: 3.71 → 4.09
  – Marked loss in non-redundancy: 4.60 → 3.89
  – Same redundancy filtering techniques used in 2006 and 2007

• Summaries appear to be incurring a “repeat mention penalty”; we need to know:
  – When pronouns should be resolved
  – When pronouns should not be resolved

[Example: resolved output vs. original context]

Need to revisit our heuristics!

Page 23: A Machine Reading-based Approach to Update Summarization

Results: Update Task

• Evaluation results from the Update Task were encouraging: GISTexter produced some of the most responsive summaries evaluated in DUC 2007.

[Figure: Sum of ranks by Peer ID for Content Responsiveness and Modified Pyramid on the Update Task.]

Page 24: A Machine Reading-based Approach to Update Summarization

Results: Update Task

• On average, “B” summaries were judged to be significantly worse than either “A” or “C” summaries on both Content Responsiveness and Modified Pyramid.

– Unclear as to exactly why this was the case

  – Not due to “over-filtering”: KB_A was always smaller than KB_A+B, so there was less knowledge available to potentially entail commitments extracted from the text.

[Figure: Content Responsiveness and Modified Pyramid scores for the A, B, and C summaries.]

Page 25: A Machine Reading-based Approach to Update Summarization

Future Considerations

• What’s the right way to deal with contradictory information?
  – Do users want to be notified when information changes?
    • When any information changes?
    • When relevant information changes?
  – How do you incorporate updates into a coherent text?

• How can we evaluate the quality of updates?
  – Current approaches only measure the responsiveness of individual summaries
  – Is it possible to create “gold standard” lists of the facts (propositions?) that are available from a reading of a text?
  – Isn’t it enough just to be responsive?
    • For Q/A or QDS – yes. For database update tasks – maybe not.

• How much recall do readers have?
  – Is it fair to assume that a reader of a text has access to all of the knowledge stored in a knowledge repository?
  – What level of “recap” is needed?

Page 26: A Machine Reading-based Approach to Update Summarization

Ensuring Stability and Control

• In order to take full advantage of the promise of machine reading for summarization, systems need to take steps to provide greater stability and control over the knowledge being added to a KB.
  – Control: How do we keep from introducing erroneous knowledge into our knowledge bases?
  – Stability: How do we keep from removing accurate knowledge from our knowledge bases?

[Figure: Quality of knowledge in the KB over time, comparing including perfect knowledge, machine reading (introducing errors), and introducing errors while removing accurate knowledge.]

Page 27: A Machine Reading-based Approach to Update Summarization

Thank you!

Page 28: A Machine Reading-based Approach to Update Summarization

Semantic Question Decomposition

• Method for decomposing questions operates as a Markov Chain (MC) performing a random walk on a bipartite graph of:
  – Sequences of operators on relations (Addition(R1), Remove(R1), Replace(R1, R2))
  – Previous questions created by previous sequences of operators

• The Markov Chain alternates between selecting a sequence of operations ({O_i}) and generating a question decomposition (Q_i):

    O_0 → Q_1 → O_1 → Q_2 → O_2, with transition probabilities p(Q_1|O_0), p(O_1|Q_1), p(Q_2|O_1), p(O_2|Q_2)

• Assume the initial state of the MC depends on the initial sequence of operators available ({O_0})
• Defining {O_0} depends on access to a knowledge mapping function M1(KB, T, TC):
  – KB: available knowledge base
  – T: available text in corpus
  – TC: concepts extracted from T
• Assume that {O_0} represents the set of operators that maximizes the value of M1.

Page 29: A Machine Reading-based Approach to Update Summarization

Semantic Question Decomposition

• Following (Lapata and Lascarides 2003), the role of M1 is to coerce knowledge from a conceptual representation of a text that can be used in question decomposition.

• State transition probabilities also depend on a second mapping function, M2, defined as M2(KB, T) = {CL, RL}:
  – CL: set of related concepts stored in a KB
  – RL: set of relations that exist between concepts in CL

• Both CL and RL are assumed to be discovered using M1.

• This notation allows us to define a random walk for hypothesis generation using a matrix notation:
  – Given N = |CL| and M = |RL|, we define:
    • a stochastic matrix A (with dimensions N × M) with entries a_i,j = p(r_i|h_j), where r_i is a sequence of relations and h_j a partial hypothesis generated
    • a second matrix B (with dimensions M × N) with entries b_i,j = p(h_i|r_j)
  – We can estimate the probabilities a_i,j and b_i,j by applying the Viterbi algorithm to the maximum likelihood estimations resulting from the knowledge mappings for M1 and M2.

• Several possibilities for M1 and M2, including:
  – Density functions introduced by (Resnik 1995)
  – Probabilistic framework for taxonomy representation (Snow et al. 2006)
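
A toy numeric sketch of the A and B matrices and one alternation of the random walk; the dimensions and entries are made up for illustration (a real system would estimate them from the M1/M2 mappings as described above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                          # N = |CL| concepts, M = |RL| relations
A = rng.random((N, M))
A /= A.sum(axis=1, keepdims=True)    # stochastic rows, playing the role of p(r | h)
B = rng.random((M, N))
B /= B.sum(axis=1, keepdims=True)    # stochastic rows, playing the role of p(h | r)

# One alternation of the walk: a distribution over partial hypotheses induces a
# distribution over relation sequences, which induces the next hypothesis distribution.
h = np.full(N, 1.0 / N)              # uniform start over partial hypotheses
r = h @ A                            # distribution over relation sequences
h_next = r @ B                       # hypothesis distribution at the next step
```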

