CS276B Text Information Retrieval, Mining, and Exploitation
Lecture 13: Text Mining II
Feb 27, 2003
(includes slides borrowed from J. Allan, G. Doddington, G. Neumann, M. Venkataramani, and D. Radev)

Today’s Topics

First story detection (FSD)
Summarization
Coreference resolution


First Story Detection

First Story Detection

Automatically identify the first story on a new event from a stream of text.

Topic Detection and Tracking (TDT): a “bake-off” sponsored by US government agencies

Applications:
Intelligence services
Finance: be the first to trade a stock

Examples

2002 Presidential Elections
Thai Airbus Crash (11.12.98)
On topic: stories reporting details of the crash, injuries and deaths; reports on the investigation following the crash; policy changes due to the crash (new runway lights were installed at airports).

Euro Introduced (1.1.1999)
On topic: stories about the preparation for the common currency (negotiations about exchange rates and financial standards to be shared among the member nations); official introduction of the Euro; economic details of the shared currency; reactions within the EU and around the world.

First Story Detection

Other technologies don’t work for this:
Information retrieval
Text classification
Why?

The First-Story Detection Task

To detect the first story that discusses a topic, for all topics.

There is no supervised topic training (as in the Topic Detection task).

[Figure: a timeline of incoming stories; the first story on each of Topic 1 and Topic 2 is flagged, and later stories on those topics are not.]

Definitions

Event: a reported occurrence at a specific time and place, and its unavoidable consequences. Examples: specific elections, accidents, crimes, natural disasters.

Activity: a connected set of actions that have a common focus or purpose. Examples: campaigns, investigations, disaster relief efforts.

Topic: a seminal event or activity, along with all directly related events and activities.

Story: a topically cohesive segment of news that includes two or more DECLARATIVE independent clauses about a single topic.

TDT Tasks

First story detection (FSD): detect the first story on a new topic.

Topic tracking: once a topic has been detected, identify subsequent stories about it.
A standard text classification task.
However, a very small training set (initially just one story!)

First Story Detection (FSD)

First story detection is an unsupervised learning task.

On-line vs. retrospective:
On-line: flag the onset of new events from live news feeds as stories come in.
Retrospective: identify each first story looking back over a longer period.

No advance knowledge of new events, but access to unlabeled historical data as a contrast set.

FSD input: a stream of stories in chronological order, simulating a real-time incoming document stream.
FSD output: a YES/NO decision per document.

Patterns in Event Distributions

News stories discussing the same event tend to be temporally proximate.

A time gap between bursts of topically similar stories is often an indication of different events (different earthquakes, separate airplane accidents).

A significant vocabulary shift and rapid changes in term frequency are typical of stories reporting a new event, including previously unseen proper nouns.

Events are typically reported in a relatively brief time window of 1-4 weeks.


Similar Events over Time

TDT: The Corpus

TDT evaluation corpora consist of text and transcribed news from the 1990s.

A set of target events (e.g., 119 in TDT2) is used for evaluation; the corpus is tagged for these events (including the first story on each).

TDT2 consists of 60,000 news stories, Jan-June 1998; about 3,000 are “on topic” for one of the 119 topics.

Stories are arranged in chronological order.


Ideas?

Approach 1: KNN

On-line processing of each incoming story: compute its similarity to all previous stories, using
Cosine similarity
Language model
Prominent terms
Extracted entities

If similarity is below a threshold: new story.
If similarity is above the threshold for a previous document d: assign to the topic of d.

The optimal threshold can be chosen based on historical data.
The threshold is not topic specific!
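A minimal sketch of this scheme, assuming bag-of-words cosine similarity over raw term frequencies and an illustrative threshold of 0.2 (a real system would use tf-idf weights and a tuned threshold):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def online_fsd(stories, theta=0.2):
    """Yield True (first story) or False for each story in arrival order."""
    seen = []  # vectors of all previously seen stories
    for text in stories:
        vec = Counter(text.lower().split())  # toy tokenization
        nearest = max((cosine(vec, old) for old in seen), default=0.0)
        yield nearest < theta  # no sufficiently similar predecessor: new event
        seen.append(vec)
```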

Variant: Single-Pass Clustering

Assign each incoming document to one of a set of topic clusters.
A topic cluster is represented by its centroid (the vector average of its members).
For each incoming story, compute the similarity s with each centroid.
As before:
s > θ: add the document to the corresponding cluster
s < θ: first story!
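A corresponding sketch of the single-pass variant, reusing the `cosine` helper and toy tokenization from the sketch above (the threshold is again illustrative):

```python
def single_pass(stories, theta=0.2):
    """Yield True when a story starts a new cluster (first story), else False."""
    centroids, sizes = [], []
    for text in stories:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        if scores and max(scores) > theta:
            k = scores.index(max(scores))
            n = sizes[k]
            # maintain the centroid as a running vector average of members
            for t in set(centroids[k]) | set(vec):
                centroids[k][t] = (centroids[k][t] * n + vec[t]) / (n + 1)
            sizes[k] = n + 1
            yield False
        else:
            centroids.append(Counter({t: float(w) for t, w in vec.items()}))
            sizes.append(1)
            yield True  # below threshold for every cluster: first story!
```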

Approach 2: KNN + Time

Only consider documents in a (short) time window.

Compute similarity in a time-weighted fashion:
m: number of documents in the window; d_i: the ith document in the window
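The formula itself did not survive in this transcript; a plausible reconstruction, consistent with the variables defined above and with the linear-decay variant in the Yang, Pierce, and Carbonell paper listed under Resources, is:

```latex
% Assumed reconstruction: stories later in the window (larger i) weigh more.
\mathrm{sim}_{time}(d, d_i) = \frac{i}{m} \, \mathrm{sim}(d, d_i),
\qquad \mathrm{score}(d) = \max_{1 \le i \le m} \mathrm{sim}_{time}(d, d_i)
```

As before, d is flagged as a first story when score(d) falls below the threshold.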

Time weighting significantly increases performance.

FSD - Results

UMass, CMU: single-pass clustering


FSD Error vs. Classification Error

Discussion

A hard problem.
Becomes harder the more topics need to be tracked. Why?
Second story detection is much easier than first story detection.
Example: retrospective detection of the first 9/11 story is easy; on-line detection is hard.


Summarization

What is a Summary?

Informative summary
Purpose: replace the original document
Example: executive summary

Indicative summary
Purpose: support a decision: do I want to read the original document, yes or no?
Example: headline, scientific abstract

Why Automatic Summarization?

The algorithm for reading in many genres is:
1) read the summary
2) decide whether it is relevant or not
3) if relevant: read the whole document

The summary is a gate-keeper for a large number of documents.

Information overload: often the summary is all that is read.
Example from last quarter: summaries of search engine hits.

Human-generated summaries are expensive.


Summary Length (Reuters)

Goldstein et al. 1999


Summary Compression (Reuters)

Goldstein et al. 1999


Summarization Algorithms

Natural language understanding / generation
Build a knowledge representation of the text
Generate sentences summarizing the content
Hard to do well

Keyword summaries
Display the most significant keywords
Easy to do
Hard to read; poor representation of content

Sentence extraction
Extract key sentences
Medium hard
Summaries often don’t read well
Good representation of content

Sentence Extraction

Represent each sentence as a feature vector.
Compute a score based on the features.
Select the n highest-ranking sentences.
Present them in the order in which they occur in the text.
Postprocess to make the summary more readable/concise:
Eliminate redundant sentences
Resolve anaphors/pronouns
Delete subordinate clauses, parentheticals

Example system: Oracle Context
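The core selection loop is simple; a sketch, with `score` standing in for the feature-based scorer described on the next slides:

```python
def extract_summary(sentences, score, n=3):
    """Pick the n highest-scoring sentences, then restore document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```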

Sentence Extraction: Example

SIGIR ’95 paper on summarization by Kupiec, Pedersen, Chen
Trainable sentence extraction
The proposed algorithm is applied to its own description (the paper)



Feature Representation

Fixed-phrase feature
Certain phrases indicate a summary, e.g. “in summary”

Paragraph feature
Paragraph-initial/final sentences are more likely to be important

Thematic word feature
Repetition is an indicator of importance

Uppercase word feature
Uppercase often indicates named entities (Taylor)

Sentence length cut-off
A summary sentence should be > 5 words

Feature Representation (cont.)

Sentence length cut-off
Summary sentences have a minimum length

Fixed-phrase feature
True for sentences with an indicator phrase: “in summary”, “in conclusion”, etc.

Paragraph feature
Paragraph initial/medial/final

Thematic word feature
Do any of the most frequent content words occur?

Uppercase word feature
Is an uppercase thematic word introduced?
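A sketch of these five features as a feature-vector function (the phrase list, the >5-word cut-off, and the crude capitalization test are illustrative simplifications):

```python
def sentence_features(sentence, para_position, frequent_words):
    """Kupiec-style features for one sentence (simplified)."""
    words = sentence.split()
    return {
        "length_ok": len(words) > 5,            # sentence length cut-off
        "fixed_phrase": any(p in sentence.lower()
                            for p in ("in summary", "in conclusion")),
        "paragraph": para_position,             # 'initial' | 'medial' | 'final'
        "thematic": any(w.lower() in frequent_words for w in words),
        "uppercase": any(w[:1].isupper() for w in words[1:]),  # named-entity cue
    }
```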

Training

Hand-label sentences in the training set (good/bad summary sentences).
Train a classifier to distinguish good from bad summary sentences.
Model used: Naïve Bayes.
Can rank sentences by score and show the top n to the user.
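Kupiec et al. score each sentence s with Naïve Bayes over the features F_1, ..., F_k, assuming the features are independent; sentences are ranked by the posterior probability of belonging to the summary set S:

```latex
P(s \in S \mid F_1, \dots, F_k)
  = \frac{P(s \in S) \, \prod_{j=1}^{k} P(F_j \mid s \in S)}{\prod_{j=1}^{k} P(F_j)}
```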


Evaluation

Compare extracted sentences with sentences in abstracts

Baseline (choose the first n sentences): 24%
Overall performance (42-44%) is not very good.
However, there is more than one good summary.

Multi-Document (MD) Summarization

Summarize more than one document.
Why is this harder?
But the benefit is large (you can’t scan 100s of docs).
To do well, need to adopt a more specific strategy depending on the document set.
Other components are needed for a production system, e.g., manual postediting.
DUC: a government-sponsored bake-off
200- or 400-word summaries
Longer -> easier

Types of MD Summaries

A single event/person tracked over a long time period
Elizabeth Taylor’s bout with pneumonia
Give extra weight to the character/event
May need to include the outcome (dates!)

Multiple events of a similar nature
Marathon runners and races
More broad brush; ignore dates

An issue with related events
Gun control
Identify key concepts and select sentences accordingly

Determine MD Summary Type

First, determine which type of summary to generate.
Compute all pairwise similarities.
Very dissimilar articles -> multi-event (marathon)
Mostly similar articles:
Is the most frequent concept a named entity?
Yes -> single event/person (Taylor)
No -> issue with related events (gun control)
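A sketch of this routing decision; the 0.3 dissimilarity cut-off and the `sim` and named-entity-test inputs are placeholders, not parameters from the slides:

```python
from itertools import combinations

def md_summary_type(docs, sim, top_concept_is_named_entity, low_sim=0.3):
    """Choose an MD summary strategy from pairwise document similarities."""
    pairs = list(combinations(docs, 2))   # assumes at least two documents
    avg = sum(sim(a, b) for a, b in pairs) / len(pairs)
    if avg < low_sim:
        return "multi-event"              # marathon
    if top_concept_is_named_entity:
        return "single event/person"      # Taylor
    return "issue with related events"    # gun control
```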


MultiGen Architecture (Columbia)

Generation

Ordering according to date
Intersection: find concepts that occur repeatedly in a time chunk
Sentence generator

Processing

Selection of good summary sentences
Elimination of redundant sentences
Replacement of anaphors/pronouns with the noun phrases they refer to (needs coreference resolution)
Deletion of non-central parts of sentences


Performance (Columbia System)

(1) Precision and recall on “model units” (facts)

(2) Coherence, grammaticality, readability


Newsblaster (Columbia)

Query-Specific Summarization

So far, we’ve looked at generic summaries. A generic summary makes no assumptions about the reader’s interests.
Query-specific summaries are specialized for a single information need, the query.
Summarization is much easier if we have a description of what the user wants.
Recall from last quarter: Google-type excerpts simply show keywords in context.
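A toy keyword-in-context excerpt along those lines (window size, tokenization, and the lowercase `query_terms` set are illustrative assumptions):

```python
def kwic_excerpt(text, query_terms, window=8):
    """Return a short excerpt centered on the first query-term hit."""
    words = text.split()
    hits = [i for i, w in enumerate(words)
            if w.lower().strip(".,;:") in query_terms]
    if not hits:
        return " ".join(words[:2 * window])   # fall back to the document start
    i = hits[0]
    return "... " + " ".join(words[max(0, i - window): i + window + 1]) + " ..."
```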

Genre

Some genres are easy to summarize:
Newswire stories
Inverted pyramid structure: the first n sentences are often the best summary of length n

Some genres are hard to summarize:
Poems
Long documents (novels, the Bible)

Trainable summarizers are genre-specific.

Non-Text Summaries

Summarization is also important for non-text media:
Speech (phone conversations, radio)
Video (surveillance, TV)

Similar techniques are used.
Text is easier to scan than speech/video.

Discussion

Correct parsing of the document format is critical. Need to know headings, sequence, etc.

Limits of current technology: some good summaries require natural language understanding.
Example: President Bush’s nominees for ambassadorships:
Contributors to Bush’s campaign
Veteran diplomats
Others


Coreference Resolution

Coreference

Two noun phrases referring to the same entity are said to corefer.

Example: Transcription from RL95-2 is mediated through an ERE element at the 5'-flanking region of the gene.

Coreference resolution is important for many text mining tasks:
Information extraction
Summarization
First story detection

Types of Coreference

Noun phrases: Transcription from RL95-2 … the gene …
Pronouns: They induced apoptosis.
Possessives: … induces their rapid dissociation …
Demonstratives: This gene is responsible for Alzheimer’s.

Preferences in Pronoun Interpretation

Recency: John has an Integra. Bill has a Legend. Mary likes to drive it.

Grammatical role: John went to the Acura dealership with Bill. He bought an Integra.

Non-ambiguity: John and Bill went to the Acura dealership. He bought an Integra.

Repeated mention: John needed a car to go to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.

Copyright: D. Radev

Preferences in Pronoun Interpretation

Parallelism: Mary went with Sue to the Acura dealership. Sally went with her to the Mazda dealership.
??? Mary went with Sue to the Acura dealership. Sally told her not to buy anything.

Verb semantics: John telephoned Bill. He lost his pamphlet on Acuras. vs. John criticized Bill. He lost his pamphlet on Acuras.

Copyright: D. Radev


Algorithm for Coreference Resolution

Two steps: discourse model update and pronoun resolution.

Salience values are introduced when a noun phrase that evokes a new entity is encountered.

Salience factors: set empirically.

Copyright: D. Radev

Salience Weights (Lappin & Leass)

Sentence recency                                  100
Subject emphasis                                   80
Existential emphasis                               70
Accusative emphasis                                50
Indirect object and oblique complement emphasis    40
Non-adverbial emphasis                             50
Head noun emphasis                                 80

Copyright: D. Radev

Lappin & Leass (cont’d)

Recency: weights are cut in half after each sentence is processed.

Examples:
An Acura Integra is parked in the lot.
There is an Acura Integra parked in the lot.
John parked an Acura Integra in the lot.
John gave Susan an Acura Integra.
In his Acura Integra, John showed Susan his new CD player.

Copyright: D. Radev

Algorithm (Lappin & Leass)

1. Collect the potential referents (up to four sentences back).
2. Remove potential referents that do not agree in number or gender with the pronoun.
3. Remove potential referents that do not pass intrasentential syntactic coreference constraints.
4. Compute the total salience value of the referent by adding any applicable values for role parallelism (+35) or cataphora (-175).
5. Select the referent with the highest salience value. In case of a tie, select the closest referent in terms of string position.

Copyright: D. Radev
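A compressed sketch of these steps (the discourse-model bookkeeping of step 1 and the syntactic filter of step 3 are stubbed out; the data shapes are assumptions for illustration, not the authors' implementation):

```python
WEIGHTS = {"recency": 100, "subject": 80, "existential": 70, "accusative": 50,
           "indirect_object": 40, "non_adverbial": 50, "head_noun": 80}

def resolve_pronoun(pronoun, referents):
    """Steps 2, 4, 5: filter by agreement, adjust salience, pick the winner.

    `pronoun` and each referent are dicts with 'number', 'gender', and
    'pos' (string position); referents also carry a running 'salience'.
    """
    candidates = [r for r in referents
                  if r["number"] == pronoun["number"]
                  and r["gender"] == pronoun["gender"]]      # step 2
    def total(r):
        s = r["salience"]
        s += 35 if r.get("parallel_role") else 0             # role parallelism
        s -= 175 if r["pos"] > pronoun["pos"] else 0         # cataphora
        return (s, r["pos"])  # tie broken by closeness to the pronoun
    return max(candidates, key=total, default=None)          # step 5

def sentence_boundary(referents):
    """Recency: halve every salience value after each sentence."""
    for r in referents:
        r["salience"] /= 2
```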

Example

John saw a beautiful Acura Integra at the dealership last week. He showed it to Bob. He bought it.

             Rec   Subj  Exist  Obj  IndObj  NonAdv  HeadN  Total
John         100    80                         50      80    310
Integra      100                 50            50      80    280
dealership   100                               50      80    230

Copyright: D. Radev
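As a check, each total follows directly from the weights table above; for instance, John's 310 (using the WEIGHTS dict from the sketch earlier):

```python
john = (WEIGHTS["recency"] + WEIGHTS["subject"]
        + WEIGHTS["non_adverbial"] + WEIGHTS["head_noun"])
assert john == 310  # 100 + 80 + 50 + 80
```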

Example (cont’d)

Referent     Phrases                       Value
John         {John}                        165
Integra      {a beautiful Acura Integra}   140
dealership   {the dealership}              115

Copyright: D. Radev

Example (cont’d)

Referent     Phrases                       Value
John         {John, he1}                   475
Integra      {a beautiful Acura Integra}   140
dealership   {the dealership}              115

Copyright: D. Radev

Example (cont’d)

Referent     Phrases                           Value
John         {John, he1}                       475
Integra      {a beautiful Acura Integra, it}   400
dealership   {the dealership}                  115

Copyright: D. Radev

Example (cont’d)

Referent     Phrases                           Value
John         {John, he1}                       475
Integra      {a beautiful Acura Integra, it}   400
Bob          {Bob}                             270
dealership   {the dealership}                  115

Copyright: D. Radev

Example (cont’d)

Referent     Phrases                            Value
John         {John, he1}                        237.5
Integra      {a beautiful Acura Integra, it1}   200
Bob          {Bob}                              135
dealership   {the dealership}                   57.5

Copyright: D. Radev

Observations

Lappin & Leass: tested on computer manuals; 86% accuracy on unseen data.

Centering (Grosz, Joshi, Weinstein): adds the concept of a “center”.

Centering has not been automatically tested on actual data.

Copyright: D. Radev

MUC Information Extraction: State of the Art c. 1997

NE – named entity recognition
CO – coreference resolution
TE – template element construction
TR – template relation construction
ST – scenario template production

Resources

UMass at TDT 2000. Allan, Lavrenko, Frey, Khandelwal (UMass, 2000).
Learning Approaches for Detecting and Tracking News Events. Yang, Carbonell, Brown (CMU, 1999).
A Study on Retrospective and On-line Event Detection. Yang, Pierce, Carbonell.
Newsblaster: http://www.cs.columbia.edu/nlp/newsblaster/
A Trainable Document Summarizer. Kupiec, Pedersen, Chen. Research and Development in Information Retrieval (SIGIR), 1995.
The Columbia Multi-Document Summarizer for DUC 2002. McKeown, Evans, Nenkova, Barzilay, Hatzivassiloglou, Schiffman, Blair-Goldensohn, Klavans, Sigelman. Columbia University.
Coreference, a detailed discussion of the term: http://www.ldc.upenn.edu/Projects/ACE/PHASE2/Annotation/guidelines/EDT/coreference.shtml

