Date post: | 01-Jan-2016 |
Upload: | lawrence-mckenzie |
Where this paper fits
Candidate generator → candidate phrase → learned filter → extracted phrase
What happens when the candidate generator becomes very general?
Rapier: the 3-slide version
A bottom-up rule learner:
initialize RULES to be one rule per example;
repeat {
  randomly pick N pairs of rules (Ri,Rj);
  let {G1,…,GN} be the consistent pairwise generalizations;
  let G* = the Gi that optimizes “compression”;
  let RULES = RULES + {G*} – {R’: covers(G*,R’)}
}
where compression(G,RULES) = size of RULES – {R’: covers(G,R’)} and “covers(G,R)” means every example matching R also matches G
[Califf & Mooney, AAAI ‘99]
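The loop above can be sketched in a few lines of Python. This is a minimal illustration, not Rapier itself: it assumes a rule is just a set of conditions, that the generalization of two rules is the intersection of their conditions, and that "consistent" means matching no negative example. The names `learn_rules`, `n_pairs`, and `max_rounds` are hypothetical.

```python
import random

def matches(rule, example):
    # A rule is a set of conditions; it matches an example that satisfies all of them.
    return rule <= example

def covers(g, r):
    # g covers r when g's conditions are a subset of r's,
    # so every example matching r also matches g.
    return g <= r

def learn_rules(positives, negatives, n_pairs=10, max_rounds=50, seed=0):
    rng = random.Random(seed)
    rules = {frozenset(p) for p in positives}      # one rule per example
    negatives = [frozenset(n) for n in negatives]
    for _ in range(max_rounds):
        if len(rules) < 2:
            break
        pool = sorted(rules, key=sorted)           # fixed order so the seed is reproducible
        candidates = []
        for _ in range(n_pairs):
            ri, rj = rng.sample(pool, 2)
            g = ri & rj                            # keep shared conditions, drop differences
            if g and not any(matches(g, n) for n in negatives):
                candidates.append(g)               # a consistent generalization
        if not candidates:
            continue
        # "Compression": prefer the generalization that leaves the smallest ruleset.
        best = min(candidates, key=lambda g: sum(not covers(g, r) for r in rules))
        rules = {r for r in rules if not covers(best, r)} | {best}
    return rules
```

With two positive examples that share only `in_title` and `next_numeric`, and a negative example that lacks `next_numeric`, the learner collapses both starting rules into the single shared-condition rule.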
<title>Course Information for CS213</title>
<h1>CS 213 C++ Programming</h1> …
<title>Syllabus and meeting times for Eng 214</title>
<h1>Eng 214 Software Engineering for Non-programmers </h1>…
courseNum(window1) :- token(window1,’CS’), doubleton(‘CS’), prevToken(‘CS’,’CS213’), inTitle(‘CS213’), nextTok(‘CS’,’213’), numeric(‘213’), tripleton(‘213’), nextTok(‘213’,’C++’), tripleton(‘C++’), ….
courseNum(window2) :- token(window2,’Eng’), tripleton(‘Eng’), prevToken(‘Eng’,’214’), inTitle(‘214’), nextTok(‘Eng’,’214’), numeric(‘214’), tripleton(‘214’), nextTok(‘214’,’Software’), …
courseNum(X) :- token(X,A), prevToken(A, B), inTitle(B), nextTok(A,C), numeric(C), tripleton(C), nextTok(C,D), …
Common conditions carried over to generalization
Differences dropped
Rapier: an alternative approach
- Combines top-down and bottom-up learning
  - Bottom-up to find common restrictions on content
  - Top-down greedy addition of restrictions on context
- Use of part-of-speech and semantic features (from WordNet).
- Special “pattern-language” based on sequences of tokens, each of which satisfies one of a set of given constraints
  - < <tok{‘ate’,’hit’},POS{‘vb’}>, <tok{‘the’}>, <POS{‘nn’}> >
Rapier: IE with “rules”
• Rule consists of
  – Pre-filler pattern
  – Filler pattern
  – Post-filler pattern
• Pattern composed of elements, and each pattern element matches a sequence of words that obeys constraints on
  – value, POS, WordNet class of words in sequence
  – Total length of sequence
• Example: “IBM paid an undisclosed amount” matches
  – PreFiller: <pos=nn or nnp>:1 <anyword>:2
  – Filler: <word=‘undisclosed’>:1
  – PostFiller: <semanticClass=‘price’>
• A rule might match many times in a document
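To make the three-part rule concrete, here is a small matcher run on the “IBM paid an undisclosed amount” example. It is a sketch under stated assumptions, not Rapier's matcher: the `:k` suffix is read as "between 1 and k tokens", the POS tags and the `price` semantic class are assigned by hand rather than by a tagger or WordNet, and `Element`, `match_pattern`, and `extract` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Element:
    # None means "no constraint" on that attribute.
    words: Optional[FrozenSet[str]] = None
    pos: Optional[FrozenSet[str]] = None
    sem: Optional[FrozenSet[str]] = None
    max_len: int = 1   # assumed reading of ":k": match 1..k tokens

    def accepts(self, token):
        word, pos, sem = token
        return ((self.words is None or word in self.words)
                and (self.pos is None or pos in self.pos)
                and (self.sem is None or sem in self.sem))

def match_pattern(pattern, tokens, start):
    """Return the set of end positions where `pattern` can finish if it begins at `start`."""
    ends = {start}
    for element in pattern:
        next_ends = set()
        for s in sorted(ends):
            i = s
            while i < len(tokens) and i - s < element.max_len and element.accepts(tokens[i]):
                i += 1
                next_ends.add(i)
        ends = next_ends
    return ends

def extract(rule, tokens):
    """Return every distinct filler string the rule extracts from `tokens`."""
    pre, filler, post = rule
    found = []
    for start in range(len(tokens) + 1):
        for f0 in sorted(match_pattern(pre, tokens, start)):
            for f1 in sorted(match_pattern(filler, tokens, f0)):
                if match_pattern(post, tokens, f1):
                    text = " ".join(word for word, _, _ in tokens[f0:f1])
                    if text not in found:
                        found.append(text)
    return found

# Tokens are (word, POS, semantic class); the tags are hand-assigned for illustration.
sentence = [("IBM", "nnp", ""), ("paid", "vbd", ""), ("an", "dt", ""),
            ("undisclosed", "jj", ""), ("amount", "nn", "price")]
rule = (
    [Element(pos=frozenset({"nn", "nnp"}), max_len=1), Element(max_len=2)],  # pre-filler
    [Element(words=frozenset({"undisclosed"}), max_len=1)],                  # filler
    [Element(sem=frozenset({"price"}))],                                     # post-filler
)
print(extract(rule, sentence))
```

Because the scan tries every start position, the same rule can fire several times in a longer document, which is the point of the last bullet above.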
Algorithm: 1
Start with a huge ruleset – one rule per example
Every new rule “compresses” the ruleset
Expect many high-precision, low-recall rules
Covers many positive, few negative examples
i.e., redundant
Filler: <word=‘SOFTWARE’> <word=‘PROGRAMMER’>
PostFiller: <word=‘Position’> <word=‘available’> <word=‘for’> …. <word=‘memphisonline’> <word=‘.’> <word=‘com’>
PreFiller: <word=‘Subject’> <word=‘:’> … <word=‘com’> <word=‘>’>
Plus POS info for each word in each pattern (but not semantic class)
An “obvious choice”
• Some sort of search based on this primitive step:
  – Pick a PAIR of rules R1, R2
  – Form the most specific possible rule that is more general than either R1 or R2
  – But in Rapier’s language there are multiple generalizations of R1 and R2…
Algorithm: 3
Specialize by adding conditions from pairwise generalizations, starting with filler and working out
Algorithm 4: pairwise generalization of patterns
Heuristic search for the best (in some graph sense) possible generalization of WordNet semantic classes
Algorithm 4: pairwise generalization of patterns
Explore several ways to generalize differing POS or word values:
Algorithm 4: pairwise generalization of patterns
Patterns of the same length can be generalized element-by-element.
Patterns of different length…
- special cases when one list has length zero or one
- “punt” and generate a single very general pattern if both lists are long
- if both patterns are moderately short, consider many possible pairings of pattern elements
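The same-length case can be sketched as follows. This is one plausible reading of the slides, not Rapier's exact procedure: when two constraints differ, the sketch assumes there are two candidate generalizations, their disjunction or no constraint at all, and it enumerates every combination across positions. Elements are simplified to `(words, pos)` pairs, and the function names are hypothetical.

```python
from itertools import product

def generalize_constraint(a, b):
    """Candidate generalizations of two constraints; None means unconstrained."""
    if a is None or b is None:
        return [None]
    if a == b:
        return [a]               # common conditions carry over
    return [a | b, None]         # differing values: disjunction, or drop the constraint

def generalize_element(e1, e2):
    # An element is a (words, pos) pair of frozensets or None.
    return [(w, p)
            for w in generalize_constraint(e1[0], e2[0])
            for p in generalize_constraint(e1[1], e2[1])]

def generalize_equal_length(p1, p2):
    """All element-by-element generalizations of two patterns of the same length."""
    if len(p1) != len(p2):
        raise ValueError("patterns must have the same length")
    per_position = [generalize_element(a, b) for a, b in zip(p1, p2)]
    return [list(combo) for combo in product(*per_position)]

ate_the = [(frozenset({"ate"}), frozenset({"vb"})), (frozenset({"the"}), frozenset({"dt"}))]
hit_the = [(frozenset({"hit"}), frozenset({"vb"})), (frozenset({"the"}), frozenset({"dt"}))]
for g in generalize_equal_length(ate_the, hit_the):
    print(g)
```

Even this toy version shows why there are multiple generalizations of a rule pair: each differing constraint doubles the number of candidates, so the learner needs a heuristic to choose among them.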
Algorithm 4: pairwise generalization of patterns
If both patterns are moderately short, consider many possible pairings of pattern elements
Example: two pairings of ABCBA with ABD, each yielding a different generalization:
ABCBA vs. ABD → A + B + <[ABCD]{1,4}>
ABCBA vs. ABD → A + <BC>? + B + [AD]
Extraction by Sliding Window
GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell School of Computer Science Carnegie Mellon University
3:30 pm 7500 Wean Hall
Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.
CMU UseNet Seminar Announcement
E.g., looking for seminar location
A “Naïve Bayes” Sliding Window Model [Freitag 1997]
… 00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun …
w_{t-m} … w_{t-1} | w_t … w_{t+n} | w_{t+n+1} … w_{t+n+m}
prefix | contents | suffix
Other examples of sliding window: [Baluja et al 2000] (decision tree over individual words & their context)
If P(“Wean Hall Rm 5409” = LOCATION) is above some threshold, extract it.
Estimate Pr(LOCATION|window) using Bayes rule
Try all “reasonable” windows (vary length, position)
Assume independence for length, prefix words, suffix words, content words
Estimate from data quantities like: Pr(“Place” in prefix|LOCATION)
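The estimation step can be illustrated with a short sketch. It computes only the class-conditional term log Pr(window | LOCATION) under the independence assumption above, using add-one smoothing; the prior and the background model that Bayes rule also needs are omitted. The function names, the two-token context width, the `max_len` smoothing parameter, and the toy training announcements are all assumptions made for illustration. Because every extra content word multiplies in another probability below one, the sketch compares only windows of equal length.

```python
import math
from collections import Counter

def window_parts(tokens, start, end, context):
    # Split a candidate window into prefix, contents, and suffix tokens.
    return {
        "prefix": tokens[max(0, start - context):start],
        "contents": tokens[start:end],
        "suffix": tokens[end:end + context],
    }

def train(examples, context=2):
    """examples: (tokens, start, end) triples marking the field in each document."""
    model = {"prefix": Counter(), "contents": Counter(), "suffix": Counter(),
             "length": Counter(), "vocab": set(), "context": context}
    for tokens, start, end in examples:
        model["vocab"].update(tokens)
        model["length"][end - start] += 1
        for part, words in window_parts(tokens, start, end, context).items():
            model[part].update(words)
    return model

def log_likelihood(model, tokens, start, end, alpha=1.0, max_len=5):
    """log Pr(window | field): length term plus independent word terms, smoothed."""
    n = sum(model["length"].values())
    score = math.log((model["length"][end - start] + alpha) / (n + alpha * max_len))
    v = len(model["vocab"]) + 1                      # +1 slot for unseen words
    for part, words in window_parts(tokens, start, end, model["context"]).items():
        denom = sum(model[part].values()) + alpha * v
        for w in words:
            score += math.log((model[part][w] + alpha) / denom)
    return score

def best_window(model, tokens, length):
    """Highest-scoring window of a fixed length."""
    candidates = [(s, s + length) for s in range(len(tokens) - length + 1)]
    return max(candidates, key=lambda w: log_likelihood(model, tokens, *w))

training = [
    ("Place : Wean Hall Rm 5409 Speaker : Thrun".split(), 2, 6),
    ("Place : Wean Hall Rm 7500 Speaker : Carbonell".split(), 2, 6),
]
model = train(training)
test_doc = "Time : 3:30 Place : Wean Hall Rm 4623 Speaker : Mitchell".split()
start, end = best_window(model, test_doc, 4)
print(test_doc[start:end])
```

On the toy data the model prefers the window preceded by “Place :” and followed by “Speaker :”, even though the room number 4623 never appeared in training.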
“Naïve Bayes” Sliding Window Results
Domain: CMU UseNet Seminar Announcements
Field         F1
Person Name   30%
Location      61%
Start Time    98%
Discussion questions
• Is this candidate classification?
• Is complexity good or bad in a learned hypothesis? In a learning system?
• What are the tradeoffs in expressive rule languages vs simple ones?
  – Is RAPIER successful in using long-range information?
  – What other ways are there to get this information?
Goal: learn from a human teacher how to extract certain database records from a particular web site.
Why learning from few examples is important
At training time, only four examples are available—but one would like to generalize to future pages as well…
Must generalize across time as well as across a single site
• Previous work in page classification using links:
  – Exploit hyperlinks (Slattery & Mitchell 2000; Cohn & Hofmann 2001; Joachims 2001): documents pointed to by the same “hub” should have the same class.
• What’s new in this paper:
  – Use structure of hub pages (as well as structure of site graph) to find better “hubs”
  – Adapt an existing “wrapper learning” system to find structure, on the task of classifying “executive bio pages”.
Idea: use the wrapper-learner to learn to extract links to execBio pages, smoothing the “noisy” data produced by the initial page classifier.
Task: train a page classifier, then use it to classify pages on a new, previously-unseen web site as executiveBio or other
Question: can index pages for executive biographies be used to improve classification?
Background: “co-training” (Blum & Mitchell, ’98)
• Suppose examples are of the form (x1,x2,y) where x1,x2 are independent (given y), and where each xi is sufficient for classification, and unlabeled examples are cheap.
  – (E.g., x1 = bag of words, x2 = bag of links).
• Co-training algorithm:
  1. Use x1’s (on labeled data D) to train f1(x)=y
  2. Use f1 to label additional unlabeled examples U.
  3. Use x2’s (on the labeled part of U+D) to train f2(x)=y
  4. Repeat . . .
Simple 1-step co-training for web pages
f1 is a bag-of-words page classifier, and S is a web site containing unlabeled pages.
• Feature construction. Represent a page x in S as a bag of pages that link to x (“bag of hubs”).
• Learning. Learn f2 from the bag-of-hubs examples, labeled with f1
• Labeling. Use f2(x) to label pages from S.
Idea: use one round of co-training to bootstrap the bag-of-words classifier to one that uses site-specific features x2/f2
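The three steps can be sketched as below. This is an illustration under stated assumptions rather than the system from the slides: `f1` is a stub standing in for the bag-of-words classifier, and the learner for f2 is a deliberately simple stand-in (each hub votes with the fraction of its linked pages that f1 called positive) instead of Winnow or a decision tree. The page names and the functions `bag_of_hubs` and `one_step_cotrain` are hypothetical.

```python
from collections import defaultdict

def bag_of_hubs(links):
    """links maps a hub page to the pages it links to; returns page -> set of hubs."""
    hubs = defaultdict(set)
    for hub, targets in links.items():
        for page in targets:
            hubs[page].add(hub)
    return dict(hubs)

def one_step_cotrain(pages, links, f1):
    # Step 0: noisy labels from the site-independent classifier f1.
    noisy = {x: bool(f1(x)) for x in pages}
    # Feature construction: represent each page by the hubs that link to it.
    hubs = bag_of_hubs(links)
    # Learning (stand-in learner): score each hub by its fraction of positive targets.
    rate = {}
    for hub, targets in links.items():
        labels = [noisy[x] for x in targets if x in noisy]
        if labels:
            rate[hub] = sum(labels) / len(labels)

    def f2(x):
        votes = [rate[h] for h in sorted(hubs.get(x, ())) if h in rate]
        if not votes:
            return noisy[x]              # no hub evidence: keep f1's label
        return sum(votes) / len(votes) > 0.5

    # Labeling: relabel every page on the site with f2.
    return {x: f2(x) for x in pages}

pages = ["a", "b", "c", "d", "p", "q", "r"]
links = {"execs.html": ["a", "b", "c", "d"], "products.html": ["p", "q", "r"]}
f1 = lambda x: x in {"a", "b", "c", "p"}     # stub: misses d, wrongly accepts p
print(one_step_cotrain(pages, links, f1))
```

In this toy site, f1 misses page `d` and wrongly accepts page `p`; the hub-based relabeling fixes both, because each page inherits the majority label of the pages listed alongside it.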
Improved 1-step co-training for web pages
Feature construction.
- Label an anchor a in S as positive iff it points to a positive page x (according to f1). Let D = {(x’,a): a is a positive anchor on x’}.
- Generate many small training sets Di from D, by sliding small windows over D.
- Let P be the set of all “structures” found by any builder from any subset Di
- Say that p links to x if p extracts an anchor that points to x. Represent a page x as the bag of structures in P that link to x.
Learning and Labeling. As before.
BOH representation:
{ List1, List3,…}, PR
{ List1, List2, List3,…}, PR
{ List2, List3,…}, Other
{ List2, List3,…}, PR
…
Learner
Experimental results
[Figure: chart with x-axis 1 to 9 and y-axis 0 to 0.25 comparing Winnow, D-Tree, and None; annotations: “Co-training hurts” and “No improvement”.]
Summary
- “Builders” (from a wrapper learning system) let one discover and use structure of web sites and index pages to smooth page classification results.
- Discovering good “hub structures” makes it possible to use 1-step co-training on small (50-200 example) unlabeled datasets.
  – Average error rate was reduced from 8.4% to 3.6%.
  – Difference is statistically significant with a 2-tailed paired sign test or t-test.
  – EM with probabilistic learners also works; see (Blei et al, UAI 2002)