IE by Candidate Classification: Califf & Mooney

William Cohen

1/21/03

Where this paper fits

Candidate Generator

Learned filter

Candidate phrase

Extracted phrase

What happens when the candidate generator becomes very general?

Example task for Rapier

Rapier: the 3-slide version

A bottom-up rule learner:

initialize RULES to be one rule per example;

repeat {

randomly pick N pairs of rules (Ri,Rj);

let {G1,…,GN} be the consistent pairwise generalizations;

let G* = the Gi that optimizes “compression”;

let RULES = RULES + {G*} – {R’: covers(G*,R’)}

}

where compression(G,RULES) = size of RULES – {R’: covers(G,R’)}, and “covers(G,R)” means every example matching R also matches G (G is at least as general as R)

[Califf & Mooney, AAAI ‘99]
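The loop above can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not Rapier itself: a rule is a set of conditions, generalizing a pair means intersecting their conditions, covers(G,R) is read as "G is at least as general as R", and the stopping rule (`max_iters`, or fewer than two rules left) is invented here because the slide's loop has none. The names `learn`, `n_pairs`, and `max_iters` are hypothetical.

```python
import random

def covers(g, r):
    # Toy reading: g is at least as general as r when g's conditions
    # are a subset of r's, so anything matching r also matches g.
    return g <= r

def learn(positives, negatives, n_pairs=5, max_iters=50, seed=0):
    """Bottom-up learner over toy rules (frozensets of conditions)."""
    rng = random.Random(seed)
    rules = set(positives)                      # one rule per example
    for _ in range(max_iters):                  # assumed stopping rule
        if len(rules) < 2:
            break
        pool = sorted(rules, key=sorted)        # fixed order for sampling
        pairs = [rng.sample(pool, 2) for _ in range(n_pairs)]
        gens = {ri & rj for ri, rj in pairs}    # pairwise generalizations
        # Consistent: the generalization matches no negative example.
        consistent = [g for g in gens
                      if not any(g <= neg for neg in negatives)]
        if not consistent:
            continue
        # Best compression = fewest rules left uncovered.
        best = min(consistent,
                   key=lambda g: (sum(not covers(g, r) for r in rules),
                                  sorted(g)))
        rules = {r for r in rules if not covers(best, r)} | {best}
    return rules

positives = [frozenset({"title", "num", "cs"}),
             frozenset({"title", "num", "eng"}),
             frozenset({"title", "num", "math"})]
negatives = [frozenset({"title", "cs"})]
learned = learn(positives, negatives)
```

On this toy data every pair generalizes to the same two-condition rule, which replaces all three initial rules.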

<title>Course Information for CS213</title>

<h1>CS 213 C++ Programming</h1> …

<title>Syllabus and meeting times for Eng 214</title>

<h1>Eng 214 Software Engineering for Non-programmers </h1>…

courseNum(window1) :- token(window1,’CS’), doubleton(‘CS’), prevToken(‘CS’,’CS213’), inTitle(‘CS213’), nextTok(‘CS’,’213’), numeric(‘213’), tripleton(‘213’), nextTok(‘213’,’C++’), tripleton(‘C++’), ….

courseNum(window2) :- token(window2,’Eng’), tripleton(‘Eng’), prevToken(‘Eng’,’214’), inTitle(‘214’), nextTok(‘Eng’,’214’), numeric(‘214’), tripleton(‘214’), nextTok(‘214’,’Software’), …

courseNum(X) :- token(X,A), prevToken(A,B), inTitle(B), nextTok(A,C), numeric(C), tripleton(C), nextTok(C,D), …

Common conditions carried over to generalization

Differences dropped
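One way to picture "common conditions carried over, differences dropped" is a small anti-unification routine. The sketch below is an assumption-laden illustration, not the procedure from the paper: a rule body is a list of (predicate, arguments) pairs, each pair of differing constants is replaced by one shared variable, and literals whose predicate appears in only one rule are dropped. The function name `lgg` and the variable names `V0`, `V1`, … are hypothetical.

```python
from itertools import product

def lgg(body1, body2):
    """Generalize two rule bodies given as lists of (predicate, args)."""
    variables = {}  # (constant in rule 1, constant in rule 2) -> variable

    def term(a, b):
        if a == b:
            return a                    # identical constants are kept
        if (a, b) not in variables:
            variables[(a, b)] = f"V{len(variables)}"
        return variables[(a, b)]        # same pair -> same variable

    result = []
    for (p1, args1), (p2, args2) in product(body1, body2):
        if p1 == p2 and len(args1) == len(args2):
            literal = (p1, tuple(term(a, b) for a, b in zip(args1, args2)))
            if literal not in result:
                result.append(literal)
    return result

rule1 = [("token", ("window1", "CS")), ("doubleton", ("CS",)),
         ("nextTok", ("CS", "213")), ("numeric", ("213",))]
rule2 = [("token", ("window2", "Eng")), ("tripleton", ("Eng",)),
         ("nextTok", ("Eng", "214")), ("numeric", ("214",))]
general = lgg(rule1, rule2)
```

Here `doubleton` and `tripleton` disappear because neither occurs in both rules, while `token`, `nextTok`, and `numeric` survive with variables in place of the differing constants.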

Rapier: an alternative approach

- Combines top-down and bottom-up learning
  - Bottom-up to find common restrictions on content
  - Top-down greedy addition of restrictions on context

- Use of part-of-speech and semantic features (from WORDNET).

- Special “pattern-language” based on sequences of tokens, each of which satisfies one of a set of given constraints
  - < <tok{‘ate’,’hit’},POS{‘vb’}>, <tok{‘the’}>, <POS{‘nn’}> >

Rapier: the N slide version

Rapier: IE with “rules”

• Rule consists of
– Pre-filler pattern
– Filler pattern
– Post-filler pattern

• Pattern composed of elements, and each pattern element matches a sequence of words that obeys constraints on
– value, POS, Wordnet class of words in sequence
– Total length of sequence

• Example: “IBM paid an undisclosed amount” matches
– PreFiller: <pos=nn or nnp>:1 <anyword>:2
– Filler: <word=‘undisclosed’>:1
– PostFiller: <semanticClass=‘price’>

• A rule might match many times in a document
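A minimal matcher for the three-part rule above might look as follows. It is a sketch under assumptions that the slides do not state: the ":n" suffix is read as "between one and n tokens", the POS tags and the `price` class on "amount" are supplied by hand rather than by a tagger or WordNet, and the names `Token`, `Element`, `ends`, and `extract` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Token:
    word: str
    pos: str
    sem: Optional[str] = None

@dataclass(frozen=True)
class Element:
    words: Optional[FrozenSet[str]] = None   # None = unconstrained
    tags: Optional[FrozenSet[str]] = None
    sem: Optional[str] = None
    max_len: int = 1                         # assumed: matches 1..max_len

    def accepts(self, tok):
        return ((self.words is None or tok.word in self.words)
                and (self.tags is None or tok.pos in self.tags)
                and (self.sem is None or tok.sem == self.sem))

def ends(elements, tokens, start):
    """All positions where `elements` can finish when begun at `start`."""
    if not elements:
        return {start}
    first, rest = elements[0], elements[1:]
    found = set()
    for n in range(1, first.max_len + 1):
        stop = start + n
        if stop > len(tokens) or not first.accepts(tokens[stop - 1]):
            break
        found |= ends(rest, tokens, stop)
    return found

def extract(rule, tokens):
    """Return each distinct filler string the rule extracts."""
    pre, filler, post = rule
    fillers = []
    for i in range(len(tokens) + 1):
        for a in sorted(ends(pre, tokens, i)):
            for b in sorted(ends(filler, tokens, a)):
                if ends(post, tokens, b):
                    text = " ".join(t.word for t in tokens[a:b])
                    if text not in fillers:
                        fillers.append(text)
    return fillers

sentence = [Token("IBM", "nnp"), Token("paid", "vbd"), Token("an", "dt"),
            Token("undisclosed", "jj"), Token("amount", "nn", "price")]
rule = ([Element(tags=frozenset({"nn", "nnp"})), Element(max_len=2)],
        [Element(words=frozenset({"undisclosed"}))],
        [Element(sem="price")])
fillers = extract(rule, sentence)
```

Because `extract` tries every start position, the same rule can fire several times in a longer document, which is the point of the last bullet.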

Algorithm: 1

Start with a huge ruleset – one rule per example

Every new rule “compresses” the ruleset

Expect many high-precision, low-recall rules

Covers many pos, few neg examples

i.e., redundant

Filler: <word=‘SOFTWARE’> <word=‘PROGRAMMER’>

PostFiller: <word=‘Position’> <word=‘available’> <word=‘for’> …. <word=‘memphisonline’> <word=‘.’> <word=‘com’>

PreFiller: <word=‘Subject’> <word=‘:’> … <word=‘com’> <word=‘>’>

Plus POS info for each word in each pattern (but not semantic class)
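The initial "one rule per example" step can be sketched as below, assuming a document is a list of (word, POS) pairs and the labeled filler is a token span. The dictionary layout and the name `most_specific_rule` are hypothetical; the only thing taken from the slide is that the pre-filler and post-filler copy every token before and after the filler, with word and POS constraints but no semantic class.

```python
def most_specific_rule(tokens, start, end):
    """Build the maximally specific rule for the filler tokens[start:end]."""
    def as_pattern(span):
        # One pattern element per token, constrained on word and POS only.
        return [{"word": word, "pos": pos} for word, pos in span]

    return {
        "pre": as_pattern(tokens[:start]),      # everything before the filler
        "filler": as_pattern(tokens[start:end]),
        "post": as_pattern(tokens[end:]),       # everything after the filler
    }

doc = [("Subject", "nn"), (":", ":"), ("SOFTWARE", "nnp"),
       ("PROGRAMMER", "nnp"), ("Position", "nn"), ("available", "jj")]
rule = most_specific_rule(doc, 2, 4)
```

Such a rule matches essentially only its own document, which is why the initial ruleset is high-precision and low-recall.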

Algorithm: 2

What is FindNewRule?

An “obvious choice”

• Some sort of search based on this primitive step:
– Pick a PAIR of rules R1, R2
– Form the most specific possible rule that is more general than either R1 or R2
– But in Rapier’s language there are multiple generalizations of R1 and R2…

Algorithm: 3

Specialize by adding conditions from pairwise generalizations, starting with filler and working out

Algorithm 4: pairwise generalization of patterns

Heuristic search for best (in some graph sense) possible generalization of Wordnet semantic classes

Algorithm 4: pairwise generalization of patterns

Explore several ways to generalize differing POS or word values:

Algorithm 4: pairwise generalization of patterns

Patterns of the same length can be generalized element-by-element.

Patterns of different length….

- special cases when one list of length zero or one

- “punt” and generate a single very general pattern if the lists are both long

- if both patterns are moderately short, consider many possible pairings of pattern elements
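For the same-length case, element-by-element generalization can be sketched as follows. The slide's figure showing how differing word or POS values are generalized is not in this transcript, so the two options used here (take the union of the two value sets, or drop the constraint) are an assumption. The element layout and all function names are hypothetical.

```python
from itertools import product

def generalize_constraint(c1, c2):
    """c1, c2: frozenset of allowed values, or None for unconstrained."""
    if c1 is None or c2 is None:
        return [None]                 # one side already unconstrained
    if c1 == c2:
        return [c1]                   # identical constraints carry over
    return [c1 | c2, None]            # assumed options: union, or drop

def generalize_element(e1, e2):
    words = generalize_constraint(e1["word"], e2["word"])
    tags = generalize_constraint(e1["pos"], e2["pos"])
    return [{"word": w, "pos": p} for w, p in product(words, tags)]

def generalize_same_length(p1, p2):
    """Every generalization of two equal-length patterns."""
    if len(p1) != len(p2):
        raise ValueError("patterns must have the same length")
    per_element = [generalize_element(a, b) for a, b in zip(p1, p2)]
    return [list(combo) for combo in product(*per_element)]

p1 = [{"word": frozenset({"ate"}), "pos": frozenset({"vb"})},
      {"word": frozenset({"the"}), "pos": frozenset({"dt"})}]
p2 = [{"word": frozenset({"hit"}), "pos": frozenset({"vb"})},
      {"word": frozenset({"the"}), "pos": frozenset({"dt"})}]
options = generalize_same_length(p1, p2)
```

The number of candidates multiplies with every element that differs, which is one reason a pair of rules has many generalizations rather than one.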

Algorithm 4: pairwise generalization of patterns

If both patterns are moderately short, consider many possible pairings of pattern elements

Two pairings of the elements of ABCBA and ABD, each with the generalization it yields:

ABCBA, ABD: A + B + <[ABCD]{1,4}>

ABCBA, ABD: A + <BC>? + B + [AD]

Algorithm: 5 Evaluation of Rules

Results

Extraction by Sliding Window

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell School of Computer Science Carnegie Mellon University

3:30 pm 7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

CMU UseNet Seminar Announcement

E.g., looking for seminar location

A “Naïve Bayes” Sliding Window Model [Freitag 1997]

… 00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun …

prefix: w[t-m] … w[t-1]
contents: w[t] … w[t+n]
suffix: w[t+n+1] … w[t+n+m]

Other examples of sliding window: [Baluja et al 2000] (decision tree over individual words & their context)

If P(“Wean Hall Rm 5409” = LOCATION) is above some threshold, extract it.


Estimate Pr(LOCATION|window) using Bayes rule

Try all “reasonable” windows (vary length, position)

Assume independence for length, prefix words, suffix words, content words

Estimate from data quantities like: Pr(“Place” in prefix|LOCATION)
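The scoring described above can be sketched as follows. Everything numeric here is made up for illustration: the probability tables, the smoothing floor, the prior, and the two-token context are assumptions, not estimates from the seminar data. The score is the log of Pr(LOCATION) times Pr(window | LOCATION), i.e. the numerator of Bayes rule; turning it into Pr(LOCATION | window) would also need the corresponding term for non-locations. The names `window_score` and `best_window` are hypothetical.

```python
import math

def window_score(tokens, start, length, model, context=2):
    """Log of prior * Pr(length) * Pr(prefix, contents, suffix words)."""
    parts = {
        "prefix": tokens[max(0, start - context):start],
        "contents": tokens[start:start + length],
        "suffix": tokens[start + length:start + length + context],
    }
    score = math.log(model["prior"])
    score += math.log(model["length"].get(length, model["floor"]))
    for part, words in parts.items():
        for word in words:          # independence assumption: just add logs
            score += math.log(model[part].get(word, model["floor"]))
    return score

def best_window(tokens, model, max_len=4):
    """Try every window up to max_len tokens and keep the best one."""
    candidates = [(window_score(tokens, s, n, model), s, n)
                  for s in range(len(tokens))
                  for n in range(1, max_len + 1)
                  if s + n <= len(tokens)]
    score, s, n = max(candidates)
    return tokens[s:s + n], score

# Made-up probabilities for the LOCATION field (illustration only).
model = {
    "prior": 0.1,
    "floor": 0.001,
    "length": {1: 0.1, 2: 0.2, 3: 0.2, 4: 0.5},
    "prefix": {"Place": 0.5, ":": 0.4},
    "contents": {"Wean": 0.3, "Hall": 0.3, "Rm": 0.2, "5409": 0.1},
    "suffix": {"Speaker": 0.5, ":": 0.4},
}
text = "00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun".split()
phrase, score = best_window(text, model)
```

In an extractor, `phrase` would be emitted only if its probability clears the chosen threshold.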

“Naïve Bayes” Sliding Window Results


Domain: CMU UseNet Seminar Announcements

Field          F1
Person Name    30%
Location       61%
Start Time     98%

Results: second domain

Discussion questions

• Is this candidate classification?
• Is complexity good or bad in a learned hypothesis? In a learning system?
• What are the tradeoffs in expressive rule languages vs simple ones?
– Is RAPIER successful in using long-range information?
– What other ways are there to get this information?

Wrapper learning

Cohen et al, WWW2002

Goal: learn from a human teacher how to extract certain database records from a particular web site.

Learner

Why learning from few examples is important

At training time, only four examples are available—but one would like to generalize to future pages as well…

Must generalize across time as well as across a single site

now some details….

Improving A Page Classifier with Anchor Extraction and Link Analysis

William W. Cohen

NIPS 2002

• Previous work in page classification using links:

• Exploit hyperlinks (Slattery&Mitchell 2000; Cohn&Hofmann, 2001; Joachims 2001): Documents pointed to by the same “hub” should have the same class.

• What’s new in this paper:

• Use structure of hub pages (as well as structure of site graph) to find better “hubs”

• Adapt an existing “wrapper learning” system to find structure, on the task of classifying “executive bio pages”.

Intuition: links from this “hub page” are informative…

…especially these links

Idea: use the wrapper-learner to learn to extract links to execBio pages, smoothing the “noisy” data produced by the initial page classifier.

Task: train a page classifier, then use it to classify pages on a new, previously-unseen web site as executiveBio or other

Question: can index pages for executive biographies be used to improve classification?

Background: “co-training” (Blum & Mitchell, ‘98)

• Suppose examples are of the form (x1,x2,y) where x1,x2 are independent (given y), and where each xi is sufficient for classification, and unlabeled examples are cheap.
– (E.g., x1 = bag of words, x2 = bag of links).

• Co-training algorithm:
1. Use x1’s (on labeled data D) to train f1(x)=y
2. Use f1 to label additional unlabeled examples U.
3. Use x2’s (on labeled part of U+D) to train f2(x)=y
4. Repeat . . .
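The four steps can be sketched with a deliberately simple learner. Two things here are assumptions rather than part of the slide: the learner (`train_vote`, which lets each feature vote for the labels it was seen with) stands in for whatever classifier is actually used, and the way step 4 repeats (relabeling the unlabeled pool with f2 before retraining f1) is one guess at what "Repeat" means. All names are hypothetical.

```python
from collections import Counter

def train_vote(examples):
    """Toy learner: examples is a list of (feature_set, label)."""
    counts = {}
    for feats, label in examples:
        for f in feats:
            counts.setdefault(f, Counter())[label] += 1
    default = Counter(label for _, label in examples).most_common(1)[0][0]

    def classify(feats):
        votes = Counter()
        for f in feats:
            votes.update(counts.get(f, Counter()))
        return votes.most_common(1)[0][0] if votes else default

    return classify

def co_train(labeled, unlabeled, rounds=1):
    """labeled: [((x1, x2), y)], unlabeled: [(x1, x2)]; rounds >= 1."""
    data = list(labeled)
    for _ in range(rounds):
        f1 = train_vote([(x1, y) for (x1, _), y in data])           # step 1
        guessed = [((x1, x2), f1(x1)) for x1, x2 in unlabeled]      # step 2
        f2 = train_vote([(x2, y) for (_, x2), y in data + guessed]) # step 3
        # Step 4 (assumed): let f2 relabel the pool for the next round.
        data = list(labeled) + [((x1, x2), f2(x2)) for x1, x2 in unlabeled]
    return f1, f2

labeled = [(({"ceo", "board"}, {"hubA"}), "bio"),
           (({"price", "buy"}, {"hubB"}), "other")]
unlabeled = [({"ceo"}, {"hubC"}), ({"board"}, {"hubC"}), ({"buy"}, {"hubD"})]
f1, f2 = co_train(labeled, unlabeled)
```

Here f2 learns about `hubC` and `hubD`, which never appear in the labeled data, purely from labels that f1 assigned using words.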

Simple 1-step co-training for web pages

f1 is a bag-of-words page classifier, and S is a web site containing unlabeled pages.

• Feature construction. Represent a page x in S as a bag of pages that link to x (“bag of hubs”).

• Learning. Learn f2 from the bag-of-hubs examples, labeled with f1

• Labeling. Use f2(x) to label pages from S.

Idea: use one round of co-training to bootstrap the bag-of-words classifier to one that uses site-specific features x2/f2
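A sketch of the three steps on a toy site, assuming links are given as (source, target) pairs and f1 is any function from page to label. The second-view learner used here, a per-hub vote over f1's labels, is a stand-in chosen for brevity and is not the learner from the talk. The names `bag_of_hubs` and `one_step_cotrain` are hypothetical.

```python
from collections import Counter, defaultdict

def bag_of_hubs(links):
    """Feature construction: map each page to the set of pages linking to it."""
    hubs = defaultdict(set)
    for source, target in links:
        hubs[target].add(source)
    return hubs

def one_step_cotrain(pages, links, f1):
    hubs = bag_of_hubs(links)
    # Learning: count, for each hub, the f1 labels of the pages it links to.
    votes = defaultdict(Counter)
    for page in pages:
        for hub in hubs.get(page, ()):
            votes[hub][f1(page)] += 1

    # Labeling: f2 takes the majority label over a page's hubs.
    def f2(page):
        total = Counter()
        for hub in hubs.get(page, ()):
            total.update(votes[hub])
        return total.most_common(1)[0][0] if total else f1(page)

    return {page: f2(page) for page in pages}

pages = ["p1", "p2", "p3", "p4", "p5"]
links = [("index", "p1"), ("index", "p2"), ("index", "p3"),
         ("index", "p4"), ("home", "p5")]
noisy_f1 = {"p1": "bio", "p2": "bio", "p3": "bio",
            "p4": "other", "p5": "other"}.get   # p4 is f1's mistake
relabeled = one_step_cotrain(pages, links, noisy_f1)
```

Page p4 is flipped to `bio` because its only hub mostly points at pages f1 called `bio`, which is the smoothing effect the idea relies on.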

Improved 1-step co-training for web pages

Feature construction.
- Label an anchor a in S as positive iff it points to a positive page x (according to f1). Let D = {(x’,a): a is a positive anchor on x’}.
- Generate many small training sets Di from D, by sliding small windows over D.
- Let P be the set of all “structures” found by any builder from any subset Di.
- Say that p links to x if p extracts an anchor that points to x. Represent a page x as the bag of structures in P that link to x.

Learning and Labeling. As before.
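The window step of the feature construction can be sketched as below. The window sizes are an assumption (the slide says only "small windows"), D is taken to be an ordered list of (page, anchor) pairs, and `sliding_subsets` is a hypothetical name. Running the builders on each subset is left out because it depends on the wrapper-learning system.

```python
def sliding_subsets(D, sizes=(2, 3)):
    """Yield every run of consecutive items of D of the given sizes."""
    for size in sizes:
        for start in range(len(D) - size + 1):
            yield D[start:start + size]

# Toy D: positive anchors in page order (illustrative values only).
D = [("index.html", "a1"), ("index.html", "a2"),
     ("index.html", "a3"), ("about.html", "a4")]
subsets = list(sliding_subsets(D))
```

Each small subset Di is then a candidate training set from which a builder may or may not propose a structure.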

[Figure: three builders each produce an extractor, yielding List1, List2, and List3; the resulting bag-of-hubs (BOH) examples are passed to the Learner]

BOH representation:

{ List1, List3, … }, PR
{ List1, List2, List3, … }, PR
{ List2, List3, … }, Other
{ List2, List3, … }, PR

Experimental results

[Figure: results plot; x-axis 1 to 9, y-axis 0 to 0.25; series: Winnow, D-Tree, None; annotations: “Co-training hurts”, “No improvement”]

Summary

- “Builders” (from a wrapper learning system) let one discover and use structure of web sites and index pages to smooth page classification results.

- Discovering good “hub structures” makes it possible to use 1-step co-training on small (50-200 example) unlabeled datasets.
– Average error rate was reduced from 8.4% to 3.6%.
– Difference is statistically significant with a 2-tailed paired sign test or t-test.
– EM with probabilistic learners also works; see (Blei et al, UAI 2002)

