Outline: Where have we been and where are we going?
• We’re making consistent progress, or
• We’re running around in circles, or
  – 1950s: Empiricism (Information Theory, Behaviorism)
  – 1970s: Rationalism (AI, Cognitive Psychology)
  – 1990s: Empiricism (Data Mining, Statistical NLP, Speech)
  – 2010s: Rationalism (TBD)
• We’re going off a cliff…
  – Don’t worry; be happy
[Figure: percentage of statistical papers at ACL meetings, 1985–2005, climbing from near 0% toward 100%; annotations credit Bob Moore and Fred Jelinek]
No matter what happens, it’s goin’ to be great!
Rising Tide of Data Lifts All Boats
If you have a lot of data, then you don’t need a lot of methodology
• 1985: “There is no data like more data”
  – Fighting words uttered by radical fringe elements (Mercer at Arden House)
• 1995: The Web changes everything
• All you need is data (magic sauce)
  – No linguistics
  – No artificial intelligence (representation)
  – No machine learning
  – No statistics
  – No error analysis
  – No data mining
  – No text mining
“It never pays to think until you’ve run out of data” – Eric Brill
Banko & Brill: Mitigating the Paucity-of-Data Problem (HLT 2001)
Fire everybody and spend the money on data
More data is better data!
No consistently best learner
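The Banko & Brill result is easy to restate as a runnable sketch: several off-the-shelf learners, growing training sets, and a learning curve for each. Everything below (the mini confusion-set corpus, the choice of sklearn learners) is a toy stand-in; the original experiments used up to a billion words of real text.

```python
# Sketch of a Banko & Brill (HLT 2001)-style experiment: confusion-set
# disambiguation ("then" vs. "than") from context, tracing accuracy as the
# amount of training data grows. The tiny corpus here is hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import MultinomialNB

data = [  # (context with the confusable word blanked out, correct word)
    ("better late ___ never", "than"),
    ("first think and ___ act", "then"),
    ("easier said ___ done", "than"),
    ("every now and ___", "then"),
    ("more ___ enough", "than"),
    ("if it rains ___ stay home", "then"),
    ("stronger ___ ever", "than"),
    ("back ___ it was cheap", "then"),
    ("rather ___ fight", "than"),
    ("since ___ nothing has changed", "then"),
]
test = [("no data is better ___ more data", "than"),
        ("we trained it and ___ it worked", "then")]

for n in (4, 6, 10):  # growing training sets, as in the learning curves
    vec = CountVectorizer()
    X_train = vec.fit_transform(ctx for ctx, _ in data[:n])
    y_train = [w for _, w in data[:n]]
    X_test = vec.transform(ctx for ctx, _ in test)
    for clf in (MultinomialNB(), Perceptron()):
        acc = clf.fit(X_train, y_train).score(X_test, [w for _, w in test])
        print(f"n={n:2d}  {type(clf).__name__:13s}  acc={acc:.2f}")
```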
Quoted out of context
Moore’s Law Constant: Data Collection Rates ≈ Improvement Rates
The rising tide of data will lift all boats!
TREC Question Answering & Google:
What is the highest point on Earth?
The rising tide of data will lift all boats!
Acquiring Lexical Resources from Data:
Dictionaries, Ontologies, WordNets, Language Models, etc.
http://labs1.google.com/sets
(Example seed-set expansions: countries, Asia-Pacific countries, animals, Unix commands)
England    Japan        Cat        cat
France     China        Dog        more
Germany    India        Horse      ls
Italy      Indonesia    Fish       rm
Ireland    Malaysia     Bird       mv
Spain      Korea        Rabbit     cd
Scotland   Taiwan       Cattle     cp
Belgium    Thailand     Rat        mkdir
Canada     Singapore    Livestock  man
Austria    Australia    Mouse      tail
Australia  Bangladesh   Human      pwd
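Google never published how Sets worked, so the sketch below is only one plausible distributional approach: score candidate words by how many left/right contexts they share with the seed words. The toy corpus and the context-overlap heuristic are assumptions for illustration.

```python
# Sketch of seed-set expansion in the spirit of Google Sets: rank candidate
# words by how many left/right contexts they share with the seeds. Google's
# actual algorithm was never published; this toy corpus and the overlap
# heuristic are assumptions.
from collections import defaultdict

corpus = ("cd into the directory then ls the files , rm the temp files , "
          "mv the logs , cp the backup , cat the readme , man the command , "
          "paris is in france , tokyo is in japan , berlin is in germany , "
          "rome is in italy , madrid is in spain ,").split()

contexts = defaultdict(set)  # word -> set of (side, neighbor) context slots
for i, w in enumerate(corpus):
    if i > 0:
        contexts[w].add(("L", corpus[i - 1]))
    if i + 1 < len(corpus):
        contexts[w].add(("R", corpus[i + 1]))

def expand(seeds, k=4):
    """Rank non-seed words by how many seed contexts they also occur in."""
    seed_ctx = set().union(*(contexts[s] for s in seeds))
    scores = {w: len(contexts[w] & seed_ctx)
              for w in contexts if w not in seeds}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(expand({"ls", "rm"}))         # surfaces mv, cp, cat, man
print(expand({"france", "japan"}))  # surfaces germany, italy, spain
```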
Applications
• What good is word sense disambiguation (WSD)?
  – Information Retrieval (IR)
    • Salton: Tried hard to find ways to use NLP to help IR, but failed to find much (if anything)
    • Croft: WSD doesn’t help because IR is already using those methods
    • Sanderson (next two slides)
  – Machine Translation (MT)
    • Original motivation for much of the work on WSD
    • But IR arguments may apply just as well to MT
• What good is POS tagging? Parsing? NLP? Speech?
• Commercial Applications of Natural Language Processing, CACM 1995
  – $100M opportunity (worthy of government/industry’s attention)
    1. Search (Lexis-Nexis)
    2. Word Processing (Microsoft)
• Warning: premature commercialization is risky
Don’t worry; be happy
Sanderson (SIGIR-94)
http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf
Not much?
• Could WSD help IR?
• Answer: no
  – Introducing ambiguity by pseudo-words doesn’t hurt (much)
  – Short queries matter most, but hardest for WSD
[Figure: F-measure vs. query length in words, from Sanderson SIGIR-94]
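Sanderson’s pseudo-word trick is simple enough to show in a few lines: fuse two unrelated words into one artificially ambiguous token, index the corrupted collection, and compare retrieval against the original. The word pair and sentence below are illustrative, not from the paper.

```python
# Minimal sketch of the pseudo-word technique from Sanderson (SIGIR-94):
# conflate two unrelated words into a single artificially ambiguous token,
# then index the corrupted text and compare retrieval against the original.
# The word pair and example sentence here are hypothetical.
import re

def pseudoword(text, w1, w2):
    """Replace every occurrence of w1 or w2 with the fused token 'w1/w2'."""
    return re.sub(rf"\b({w1}|{w2})\b", f"{w1}/{w2}", text)

doc = "the banana boat carried a crate of bananas and one rifle"
print(pseudoword(doc, "banana", "rifle"))
# -> the banana/rifle boat carried a crate of bananas and one banana/rifle
```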
Sanderson (SIGIR-94)
http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf
• Resolving ambiguity badly is worse than not resolving at all
  – 75% accurate WSD degrades performance
  – 90% accurate WSD: breakeven point
• Soft WSD?

[Figure: F-measure vs. query length in words, with and without simulated WSD]
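A back-of-envelope model shows why the break-even sits so high. The numbers below rest on a toy assumption of mine, not Sanderson’s setup: 80% of the documents matching a query term already use the intended sense (sense distributions are skewed), so sense-filtering has little precision to gain while every tagging error costs recall.

```python
# Toy break-even model: retrieve documents matching a query term, then filter
# by word sense. Assumes (hypothetically) that 80% of matching documents
# already use the intended sense.
P_REL = 0.80  # fraction of term-matching documents in the intended sense

def f1_with_sense_filter(acc):
    tp = P_REL * acc              # relevant docs the tagger keeps
    fp = (1 - P_REL) * (1 - acc)  # wrong-sense docs mis-tagged and kept
    fn = P_REL * (1 - acc)        # relevant docs lost to tagging errors
    return 2 * tp / (2 * tp + fp + fn)

baseline = 2 * P_REL / (P_REL + 1)  # no filtering: precision .80, recall 1.0
print(f"no WSD    F1 = {baseline:.3f}")
for acc in (0.75, 0.85, 0.90):
    print(f"acc={acc:.2f}  F1 = {f1_with_sense_filter(acc):.3f}")
# 75%-accurate WSD lands below the no-WSD baseline; under these assumptions
# the break-even falls in the low 80s, the same shape as the slide's numbers.
```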
Some Promising Suggestions
(Generate lots of conference papers, but may not support the field)
• Two Languages are Better than One
  – For many classic hard NLP problems:
    • Word Sense Disambiguation (WSD)
    • PP-attachment
    • Conjunction
    • Predicate-argument relationships
    • Japanese and Chinese word breaking
  – Parallel corpora: plenty of annotated (labeled) testing and training data
  – Don’t need unsupervised magic (data >> magic)
• Demonstrate that NLP is good for something
  – Statistical methods (IR & WSD) focus on bags of nouns, ignoring verbs, adjectives, predicates, intensifiers, etc.
  – Hypothesis: ignored because perceptrons can’t model XOR
  – Task: classify “comments” into “good,” “bad” and “neutral”
    • Lots of terms associated with just one category
    • Some associated with two, depending on argument
    • Good & bad, but not neutral: Mickey Mouse, Rinky Dink
      – Bad: Mickey Mouse(us)
      – Good: Mickey Mouse(them)
  – Current IR/WSD methods don’t capture predicate-argument relationships (a toy version of the XOR point follows this slide)
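To make the XOR hypothesis concrete, here is a toy version of the comments task. The contrast term (a hypothetical praise term with the mirror-image pattern) is my invention; the point is only that polarity = term XOR argument defeats a linear model until an explicit conjunction feature is added.

```python
# Toy illustration of the XOR hypothesis above. Polarity flips with the
# argument: Mickey Mouse(us) is bad, Mickey Mouse(them) is good, and a
# (hypothetical) praise term shows the opposite pattern. That is XOR in the
# two features, which no linear model (perceptron) can represent.
from sklearn.linear_model import Perceptron

# features: [term is "Mickey Mouse" (else the praise term), argument is "us"]
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [0, 1, 1, 0]  # 1 = good comment, 0 = bad: pure XOR of the two features

lin = Perceptron(max_iter=1000, tol=None).fit(X, y)
print("bag-of-features :", lin.score(X, y))  # at best 3/4: XOR isn't separable

X_conj = [row + [row[0] * row[1]] for row in X]  # add term-x-argument feature
conj = Perceptron(max_iter=1000, tol=None).fit(X_conj, y)
print("with conjunction:", conj.score(X_conj, y))  # 1.0: now separable
```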
Web Apps: Document Language Model ≠ Query Language Model
• Documents
  – Function Words
  – Adjectives
  – Verbs
  – Predicates
• Queries
  – Typos
  – Brand Names
  – Celebrities
  – Named Entities
  – Slower Vocab Growth
Technical Opportunity: Reduce IR to Translation
Promising Apps: Web Spam, Frame Problem
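The document/query contrast is easy to quantify: estimate a unigram language model from each sample and measure their divergence. The two token samples below are hypothetical stand-ins for a document crawl and a query log.

```python
# Contrast a document language model with a query language model: build
# add-one smoothed unigram models over a shared vocabulary and compute KL
# divergence. The two samples below are hypothetical toys.
from collections import Counter
from math import log

doc_toks = "the flight was delayed because the crew had exceeded the hours".split()
qry_toks = "cheap flights continental airlines houston delays".split()

def unigram_lm(tokens, vocab, alpha=1.0):
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

vocab = set(doc_toks) | set(qry_toks)
p_doc = unigram_lm(doc_toks, vocab)
p_qry = unigram_lm(qry_toks, vocab)

kl = sum(p_qry[w] * log(p_qry[w] / p_doc[w]) for w in vocab)
print(f"KL(query || document) = {kl:.3f} nats")
print("doc model favors  :", Counter(doc_toks).most_common(2))  # function words
print("query model favors:", Counter(qry_toks).most_common(2))  # names, brands
```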
Speech Data Mining & Call Centers: An Intelligence Bonanza
• Some companies are collecting information with technology designed to monitor incoming calls for service quality.
• Last summer, Continental Airlines Inc. installed software from Witness Systems Inc. to monitor the 5,200 agents in its four reservation centers.
• But the Houston airline quickly realized that the system, which records customer phone calls and information on the responding agent’s computer screen, also was an intelligence bonanza, says André Harris, reservations training and quality-assurance director.
Speech Data Mining
• Label calls as success or failure based on some subsequent outcome (sale/no sale)
• Extract features from speech
• Find patterns of features that can be used to predict outcomes
• Hypotheses:
  – Customer: “I’m not interested” → no sale
  – Agent: “I just want to tell you…” → no sale
Inter-ocular effect (hits you between the eyes); don’t need a statistician to know which way the wind is blowing
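The slide’s recipe fits in a dozen lines. The sketch below uses hypothetical transcript fragments and a bag-of-n-grams logistic regression; the inter-ocular test is then just reading off the heaviest-weighted phrases.

```python
# Sketch of the pipeline above: label transcripts by outcome, extract n-gram
# features, train a classifier, and read off the most predictive phrases.
# The transcripts are hypothetical toys; a real system would start from ASR
# output of recorded calls.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

calls = [  # (transcript fragment, outcome: 1 = sale, 0 = no sale)
    ("i just want to tell you about our new fare", 0),
    ("i am not interested thanks", 0),
    ("i just want to tell you about the upgrade", 0),
    ("not interested please remove my number", 0),
    ("yes i would like to book that flight", 1),
    ("great let us book the ticket today", 1),
    ("i would like to book two seats", 1),
    ("yes book it and email the receipt", 1),
]
texts, outcomes = zip(*calls)

vec = CountVectorizer(ngram_range=(1, 3))
X = vec.fit_transform(texts)
clf = LogisticRegression(max_iter=1000).fit(X, outcomes)

# Rank n-grams by weight: most negative -> no sale, most positive -> sale.
feats = vec.get_feature_names_out()
order = clf.coef_[0].argsort()
print("no-sale cues:", [feats[i] for i in order[:3]])
print("sale cues   :", [feats[i] for i in order[-3:]])
```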
Outline
• We’re making consistent progress, or
• We’re running around in circles, or
  – Don’t worry; be happy
• We’re going off a cliff…
According to unnamed sources:
Speech Winter → Language Winter
Dot Boom → Dot Bust
Sample of 20 Survey Questions
(Strong Emphasis on Applications)
• When will…
  – More than 50% of new PCs have dictation on them, either at purchase or shortly after.
  – Most telephone Interactive Voice Response (IVR) systems accept speech input.
  – Automatic airline reservation by voice over the telephone is the norm.
  – TV closed-captioning (subtitling) is automatic and pervasive.
  – Telephones are answered by an intelligent answering machine that converses with the calling party to determine the nature and priority of the call.
  – Public proceedings (e.g., courts, public inquiries, parliament, etc.) are transcribed automatically.
• Two surveys of ASRU attendees: 1997 & 2003
Hockey Stick Business Case

[Figure: revenue ($) vs. time, 2003–2005: flat for last year and this year, hockey-stick growth projected for next year]

2003 Responses ≈ 1997 Responses + 6 Years
(6 years of hard work → no progress)
Wrong Apps?
• New Priorities
  – Increase demand for space >> Data entry
• New Killer Apps
  – Search >> Dictation
    • Speech Google!
  – Data mining
• Old Priorities
  – Dictation app dates back to the days of dictation machines
  – Speech recognition has not displaced typing
    • Speech recognition has improved
    • But typing skills have improved even more
      – My son will learn typing in 1st grade
      – Secretaries rarely take dictation
  – Dictation machines are history
    • My son may never see one
    • Museums have slide rules and steam trains
      – But dictation machines?
Great Challenge: Annotating Data
• Produce annotated data with minimal supervision
• Active learning
  – Identify reliable labels
  – Identify best candidates for annotation
• Co-training
• Bootstrap (project) resources from one application to another
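A sketch of the “best candidates for annotation” idea is pool-based uncertainty sampling: train on the few labels you have, then send the pool items the model is least sure about to the annotator. The synthetic 2-D data and logistic model below are placeholders for any real task and probabilistic classifier.

```python
# Minimal active-learning loop (uncertainty sampling): repeatedly label the
# pool example the current model is least confident about. Synthetic data;
# the true_labels array stands in for a human annotator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 2))
true_labels = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # oracle stand-in

# seed with one example from each class
labeled = [int(np.argmax(true_labels)), int(np.argmin(true_labels))]
for rnd in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled],
                                                true_labels[labeled])
    confidence = np.abs(clf.predict_proba(X_pool)[:, 1] - 0.5)
    confidence[labeled] = np.inf        # never re-ask about labeled items
    query = int(np.argmin(confidence))  # least confident -> annotate next
    labeled.append(query)
    print(f"round {rnd}: {len(labeled) - 1} labels, "
          f"pool accuracy {clf.score(X_pool, true_labels):.3f}")
```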
Borrowed Slide: Jelinek (LREC)
Self-organizing “Magic” ≠ Error Analysis
Great Strategy → Success
Grand Challengesftp://ftp.cordis.lu/pub/ist/docs/istag040319-draftnotesofthemeeting.pdf
Roadmaps: Structure of a Strategy
(not the union of what we are all doing)
• Goals
  – Example: Replace keyboard with microphone
  – Exciting (memorable) sound bite
  – Broad grand challenge that we can work toward but never solve
• Metrics
  – Examples:
    • WER: word error rate (a reference implementation follows this list)
    • Time to perform task
  – Easy to measure
• Milestones
  – Should be no question whether it has been accomplished
  – Example: reduce WER on task x by y% by time t
• Accomplishments vs. Activities
  – Accomplishments are good
  – Activity is not a substitute for accomplishments
  – Milestones look forward whereas accomplishments look backward
• Serendipity is good!
• Small is beautiful
  – Quantity is not a good thing
  – Awareness
  – 1-slide version
    • If successful, you get maybe 3 more slides
• Size of container
  – Goals: 1–3
  – Metrics: 3
  – Milestones: a dozen
    • Mostly for next year: Q1–4
    • Plus some for years 2, 5, 10 & 20
  – Accomplishments: a dozen
• Broad applicability & illustrative
  – Don’t cover everything
  – Highlight stuff that:
    • Applies to multiple groups
    • Is forward-looking / exciting
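Since WER is held up as the model metric, here is its standard definition in runnable form: word-level edit distance (substitutions + insertions + deletions) divided by reference length. The example strings are hypothetical.

```python
# Reference implementation of word error rate (WER): edit distance over
# words, normalized by the length of the reference transcript.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("replace the keyboard with a microphone",
          "replace the key board with microphone"))  # 3 edits / 6 words = 0.5
```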
[Diagram: strategy layers linking Grand Challenges, Apps & Techniques, Resources, and Evaluation, with € funding flowing in]
• Goal: Reduce barriers to entry
• Goals:
  1. The multilingual companion
  2. Life log
• Goal: Produce NLP apps that improve the way people communicate with one another
Summary: What Worked and What Didn’t?
• Data
  – Stay on message: It is the data, stupid!
  – WVLC (Very Large) >> EMNLP (Empirical Methods)
  – If you have a lot of data, then you don’t need a lot of methodology
  – Rising Tide of Data Lifts All Boats
• Methodology
  – Empiricism means different things to different people:
    1. Machine Learning (Self-organizing Methods)
    2. Exploratory Data Analysis (EDA)
    3. Corpus-Based Lexicography
  – Lots of papers on 1
  – EMNLP-2004 theme (error analysis) → 2
  – Senseval grew out of 3
Substance: Recommended if…
Magic: Recommended if…
Promise: Recommended if…
Short term ≠ Long term
Lonely
What’s the right answer?
There’ll be a quiz at the end of the decade…