Computational Linguistics

Date post: 06-Jan-2016
Upload: velvet
Transcript
Page 1: Computational Linguistics

Computational Linguistics

Ling 200, Spring 2006

Page 2: Computational Linguistics

Speech and language processing

•Computational Linguistics: use of computers to facilitate linguistic research

•Natural Language Processing: applications that let people interact with computers in natural language

Page 3: Computational Linguistics

Combines disciplines

•Linguistics, e.g. grammar engineering

•Electrical engineering, e.g. speech recognition

•Computer science, e.g. machine translation

•Psychology, e.g. cognitive modeling

Page 4: Computational Linguistics

2 Minute question (part 1)

List the specific language-related skills HAL exhibits.

In other words, what different abilities must the computer (HAL) have to display human-like language?

Page 5: Computational Linguistics

[Embedded video (QuickTime, H.263); not viewable in this transcript]

Page 6: Computational Linguistics

Today’s goals

•Convey some areas of research, some of the difficulties involved, and some development strategies

•Provide examples of particular technologies as illustrations

Page 7: Computational Linguistics

Computerized natural language

•speech recognition

•language understanding

•language generation

•speech synthesis

Page 8: Computational Linguistics

Other areas of interest

• searching: understanding the search request, finding relevant documents, and ordering them by degree of relevance

• information extraction: retrieving information from documents

• data mining: discovering patterns and relationships in data
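The three steps of searching (understand the request, find relevant documents, order by relevance) can be sketched with TF-IDF scoring. This is one common technique, assumed here for illustration. The tokenizer, the scoring formula and the sample documents are all hypothetical simplifications. Real search engines parse requests far more carefully.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Very naive "understanding" of a request: lowercase and split on spaces.
    return text.lower().split()


def rank(query: str, docs: list[str]) -> list[tuple[float, str]]:
    """Return (score, doc) pairs for relevant docs, most relevant first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scored = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                idf = math.log(n / df[term])  # rarer terms weigh more
                score += (tf[term] / len(toks)) * idf
        if score > 0:  # "finding relevant documents"
            scored.append((score, doc))
    # "Ordering by degree of relevance"
    return sorted(scored, reverse=True)


# Hypothetical sample documents.
docs = [
    "speech recognition with acoustic models",
    "a nice beach in spring",
    "speech synthesis and speech recognition",
]
for score, doc in rank("speech recognition", docs):
    print(f"{score:.3f}  {doc}")
```

The document that mentions "speech" twice ranks first. The beach document scores zero and is dropped as irrelevant.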

Page 9: Computational Linguistics

...and still more topics

•machine translation, e.g. http://babelfish.altavista.com, http://www.google.com/translate

•summarization

•grammar checking

•spell checking
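One common way to approach spell checking, assumed here rather than taken from the slides, is to suggest the word-list entries with the smallest edit (Levenshtein) distance to the misspelling. In this sketch the tiny `LEXICON` and the `max_distance` cutoff are hypothetical. Practical checkers also weigh how frequent each candidate word is.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(word: str, lexicon: set[str], max_distance: int = 2) -> list[str]:
    """Return the closest lexicon words, or [] if nothing is close enough."""
    if word in lexicon:
        return [word]
    distances = {w: edit_distance(word, w) for w in lexicon}
    best = min(distances.values(), default=max_distance + 1)
    if best > max_distance:
        return []
    return sorted(w for w, d in distances.items() if d == best)


LEXICON = {"speech", "beach", "recognize"}  # hypothetical tiny word list
print(suggest("speach", LEXICON))
```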

Page 10: Computational Linguistics

Commonly used tools

•formal rule systems

•computational search algorithms

•formal logic

•probability theory

•machine learning techniques

Page 11: Computational Linguistics

Speech Recognition Demo

Software used: iListen from MacSpeech

Page 12: Computational Linguistics

What is Speech Recognition?

• Definition: Speech recognition turns acoustic input into strings of phonemes and then finds the best-matching words in a database.

• It can be built for open-domain use, theoretically recognizing all possible strings of words, e.g. dictation systems.

• It can also be built for a particular domain, recognizing a small, finite set of utterances, e.g. automated call centers.

Page 13: Computational Linguistics

Speech Recognition: Acoustic Model

• First, the continuous speech signal is broken up into short segments.

• Segments are analyzed into features, which you can think of as quantitative versions of the phonetic features you learned in class.

• By comparing segments against an internally stored phonological model, well-matched phonemes are proposed for each segment.

• The result is a list of the most likely phoneme sequences.
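Here is a toy sketch of the acoustic-model steps. It breaks the signal into segments, turns each segment into numeric features, and proposes the nearest stored phoneme template. Everything concrete here is a hypothetical simplification: the two features (energy, zero-crossing rate), the template values and the frame size. Real recognizers typically use richer spectral features and statistical models. They also keep several ranked hypotheses per segment, whereas this sketch keeps only the single best.

```python
import math


def frames(signal: list[float], size: int, step: int) -> list[list[float]]:
    """Break the continuous signal into short (possibly overlapping) segments."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def features(frame: list[float]) -> tuple[float, float]:
    """Two toy features per segment: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return energy, crossings / (len(frame) - 1)


# Hypothetical stored "phonological model": one feature template per phoneme.
# Fricatives such as [s] are noisy (many zero crossings); vowels are loud.
TEMPLATES = {"sil": (0.0, 0.0), "s": (0.1, 0.9), "a": (0.5, 0.1)}


def best_phoneme(feats: tuple[float, float]) -> str:
    """Propose the phoneme whose template is nearest to this segment."""
    return min(TEMPLATES, key=lambda p: math.dist(feats, TEMPLATES[p]))


def decode(signal: list[float], size: int = 4) -> list[str]:
    return [best_phoneme(features(f)) for f in frames(signal, size, size)]


# Hypothetical signal: silence, then a noisy [s]-like stretch, then a vowel.
signal = [0.0] * 4 + [0.3, -0.3, 0.3, -0.3] + [0.7] * 4
print(decode(signal))
```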

Page 14: Computational Linguistics

Speech Recognition: Language Model

• Sequences of phonemes are verified (in real time) by comparing them with a database of words and their likelihoods, and only actual words and phrases are accepted:

• [rɛkənajspič]: ‘recognize speech’

• [rɛkənajspiš]: ??‘recognize speesh’ (not an actual phrase, so it is rejected)

*Fast speech: [z] -> [s] / _[s]
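The lexicon check above can be sketched as a search for ways to split the phoneme string into known words. The search also lets the fast-speech rule apply: the slide's [z] -> [s] / _[s], followed by the two [s]'s merging. The two-word `LEXICON` and the transcription [rɛkənajz] for ‘recognize’ are assumptions in the slide's simplified style, not the recognizer's real data.

```python
# Hypothetical toy lexicon in the slide's simplified transcription.
LEXICON = {"rɛkənajz": "recognize", "spič": "speech"}


def parses(phones: str, start: int = 0) -> list[list[str]]:
    """Every way to split phones[start:] into lexicon words ([] if none)."""
    if start == len(phones):
        return [[]]
    results = []
    for pron, word in LEXICON.items():
        forms = [pron]
        # Fast speech: word-final [z] -> [s] before [s], then the two [s]'s
        # merge, so the final segment can be missing before an [s].
        if len(pron) > 1 and pron[-1] in "zs":
            forms.append(pron[:-1])
        for form in forms:
            if not phones.startswith(form, start):
                continue
            end = start + len(form)
            if form != pron and not phones.startswith("s", end):
                continue  # the reduced form is only allowed before [s]
            for rest in parses(phones, end):
                results.append([word] + rest)
    return results


print(parses("rɛkənajspič"))  # accepted: one parse
print(parses("rɛkənajspiš"))  # rejected: no parse
```

An empty result is how this sketch "rejects" a candidate sequence such as ??‘recognize speesh’.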

Page 15: Computational Linguistics

Problems: Acoustic Model

• Recognizing different voice qualities as the same basic sounds.

• You can think of this as choosing the correct phoneme. Phonemes sound different (allophones) depending on their environments:
  • word position: /p/ --> [pʰ] / #_
  • assimilation: /z/ --> [s] / _C[-voice]
  • deletion: [s] --> ø / _[s]
  “Three cats sit.”

• The speech signal is continuous and full of non-speech noise.
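The three allophonic rules above can be sketched as ordered regular-expression rewrites. The sketch shows how the surface form of “Three cats sit.” drifts away from its underlying phonemes. Several things are assumptions: the underlying transcription /θri kætz sɪt/ (plural written as /z/), the set of voiceless consonants, and the use of spaces as word boundaries (#).

```python
import re

VOICELESS = "ptkfθsʃ"  # assumed set of voiceless consonants for this toy

# The slide's rules, applied in order. A space marks a word boundary (#).
RULES = [
    (r"(?<!\S)p", "pʰ"),                # /p/ --> [pʰ] / #_
    (rf"z(?=\s*[{VOICELESS}])", "s"),   # /z/ --> [s] / _C[-voice]
    (r"s(?=\s*s)", ""),                 # [s] --> ø / _[s]
]


def surface(underlying: str) -> str:
    """Derive a surface form by applying each rewrite rule in turn."""
    for pattern, replacement in RULES:
        underlying = re.sub(pattern, replacement, underlying)
    return underlying


print(surface("θri kætz sɪt"))  # "Three cats sit."
print(surface("pɪn"), surface("spɪn"))
```

The derived form lacks any plural marker on ‘cats’. A recognizer has to undo exactly this kind of change when it maps sounds back to phonemes.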

Page 16: Computational Linguistics

Problems: Ambiguity

•The same or a very similar sequence of phonemes can correspond to multiple words or phrases (homophones).

•Words: [dir] ‘deer’, ‘dear’

•Phrases (remember there is no pause to separate word boundaries):

[rɛkənajspič] ‘recognize speech’
[rɛkənajspič] ‘wreck a nice beach’

Page 17: Computational Linguistics

Potential Fix: Language Model

• Weight word/phrase interpretations (statistical language modeling).

• Lexical: consider how often a word actually occurs: [dir] ‘deer’ (50), ‘dear’ (215).

• Choose the most frequent, in this case ‘dear’.

• Condition on context: consider how often a word occurs within a particular context.

• Context “I just shot a [dir].”: (shot, a, dear) 1; (shot, a, deer) 10

• In this case, ‘deer’ occurs more frequently in this environment, so we choose ‘deer’ as our interpretation.
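The two weighting strategies above can be written directly in code, using the counts given on this slide. The fallback to word frequency when the context was never seen is a simple assumption added for the sketch. It is not a proper smoothing method.

```python
# Counts taken from the slide.
UNIGRAM = {"deer": 50, "dear": 215}
CONTEXT = {("shot", "a", "dear"): 1, ("shot", "a", "deer"): 10}


def choose(candidates, context=None):
    """Pick the homophone with the highest count.

    If the two preceding words (context) were seen with any candidate, use
    those context counts; otherwise fall back to plain word frequency.
    """
    if context is not None:
        counts = {w: CONTEXT.get((*context, w), 0) for w in candidates}
        if any(counts.values()):
            return max(candidates, key=counts.get)
    return max(candidates, key=lambda w: UNIGRAM.get(w, 0))


print(choose(["deer", "dear"]))                 # lexical only
print(choose(["deer", "dear"], ("shot", "a")))  # "I just shot a [dir]."
```

Comparing raw counts within one fixed context gives the same winner as comparing conditional probabilities P(word | shot, a). Both candidates share the same denominator, the count of “shot a”.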

Page 18: Computational Linguistics

Demo: Training Data Matters

• Word and context frequencies are not just pulled from thin air.

• Frequencies are calculated from some collection of text (a corpus); this step is called training.

• Speech recognizers often train on a user’s emails and documents to better match the user’s lexical choices and phrasing patterns.

• This training data helps decipher homophonous strings (strings that are acoustically ambiguous).
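The training step can be sketched as counting word pairs (bigrams) in a corpus. Those counts are then used to choose between acoustically ambiguous transcriptions. Both corpora below are made-up one-line examples, standing in for a "linguist" and a "beach" user. Summing raw counts is a deliberate simplification: it favors longer phrases, and real systems use smoothed probabilities instead.

```python
from collections import Counter


def train(corpus: str) -> Counter:
    """'Training': count how often each word pair occurs in the corpus."""
    words = corpus.lower().split()
    return Counter(zip(words, words[1:]))


def score(phrase: str, bigrams: Counter) -> int:
    """Add up the training counts of the phrase's word pairs."""
    words = phrase.lower().split()
    return sum(bigrams[pair] for pair in zip(words, words[1:]))


def transcribe(candidates: list[str], bigrams: Counter) -> str:
    return max(candidates, key=lambda c: score(c, bigrams))


# Hypothetical toy corpora for two different users.
LINGUIST = "systems that recognize speech must model speech sounds and recognize speech in noise"
BEACH = "storms can wreck a nice beach and a nice beach needs sand"
CANDIDATES = ["recognize speech", "wreck a nice beach"]

print(transcribe(CANDIDATES, train(LINGUIST)))
print(transcribe(CANDIDATES, train(BEACH)))
```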

Page 19: Computational Linguistics

Demo 2: Training Data Matters

•I will attempt to say the following phrase, and iListen should transcribe my speech.

•It’s hard to… [rɛkənajspič]: ‘recognize speech’ or ‘wreck a nice beach’

Page 20: Computational Linguistics

Demo 3: Linguist

• What if the software is trained for a computational linguist? It is trained on 3 Wikipedia articles about various topics in computational linguistics.

Which interpretation should we expect, based on words and phrases likely to be present in computational linguistics documents?

Results: Is hard to recognize speech New set the state

but is so bad and found a 544 is no sound better, even so it is etc is not really that bad so at and his exist listening to 89, Nancy of

Page 21: Computational Linguistics

Demo 4: Beach Bum

• What if the software is trained for a beach bum? It is trained on 3 Wikipedia articles on beach topics.

Which interpretation should we expect, based on frequent words and phrases likely to be found in beach-related documents?

Results: It’s hard to wreck nice beach and

Page 22: Computational Linguistics

Language understanding

•morphology

•syntax

•semantics

•pragmatics

•discourse

Page 23: Computational Linguistics

“I made her duck.”

•I cooked waterfowl for her

•I cooked waterfowl belonging to her

•I created the (plaster?) duck she owns

•I caused her to quickly lower her head or body

•I waved my magic wand and turned her into undifferentiated waterfowl

Page 24: Computational Linguistics

Language generation

“I'm sorry, Dave, I'm afraid I can't do that”

pragmatics:
•politeness
•indirect speech

morphology:
•contractions

discourse:
•reference (“that”)

Page 25: Computational Linguistics

Who/what is ELIZA?

Page 26: Computational Linguistics

Dialogue systems - issues

•HAL has complete understanding. How close are we to this?

•Eliza had no semantic understanding and only minimal syntactic knowledge

•Dialogue systems: effective in limited domains, such as travel
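The claim that ELIZA worked with "no semantic understanding" can be illustrated with a tiny pattern-matching responder in the same spirit. It matches keywords, swaps pronouns, and echoes fragments back without representing meaning. The rules and pronoun table below are invented for illustration; they are not Weizenbaum's original script.

```python
import re

# Hypothetical ELIZA-style rules: a keyword pattern and a response template.
RULES = [
    (re.compile(r"\bi need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.I), "Tell me more about your {0}."),
]
# Pronoun swaps so the echoed fragment reads from the program's viewpoint.
REFLECT = {"i": "you", "me": "you", "my": "your", "am": "are",
           "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECT.get(w, w) for w in fragment.lower().split())


def respond(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # No parsing, no meaning: just echo the matched fragment.
            return template.format(reflect(match.group(1)))
    return "Please go on."  # default when no keyword matches


print(respond("I am sad about my job"))
print(respond("Hello"))
```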

Page 27: Computational Linguistics

Dialogue systems: demo

[David]

•Chatbot website: http://daden.co.uk/chatbots/

Page 28: Computational Linguistics

2 minute question (part 2)

•Do you think that HAL-quality computer communication is a reasonable expectation?

•Why or why not?

