

Date post: 21-Dec-2015
You’re Not From ‘Round Here, Are You? Naïve Bayes Detection of Non-native Utterance Text Laura Mayfield Tomokiyo Rosie Jones Carnegie Mellon University
Page 1: You’re Not From ‘Round Here, Are You? Naïve Bayes Detection of Non-native Utterance Text Laura Mayfield Tomokiyo Rosie Jones Carnegie Mellon University.

You’re Not From ‘Round Here, Are You? Naïve Bayes Detection of Non-native Utterance Text

Laura Mayfield Tomokiyo

Rosie Jones

Carnegie Mellon University

Page 2

Overview

• Motivation
• Speech data
• Accent detection as document classification
• Classification performance
• Discriminative tokens
• Conclusions

Page 3

Non-native speech recognition

Reference:

The warship U.S.S. Jarrett has pulled into port in San Diego, CA after training voyage

Native recognizer (word accuracy = 26.7%):

Tomorrow CPU a sister at has spilled into port and sandy and afford after a training wage

Non-native recognizer (word accuracy = 73.3%):

The worst eighty U.S.S. chart has pulled into port in San Diego California after training warrior
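
The word accuracy figures above can be related to an edit-distance alignment between the reference and each recognizer output. The sketch below is a minimal illustration, assuming the common definition word accuracy = 100 × (N − S − D − I) / N over lowercased, whitespace-separated words. The slides do not say which scoring tool or tokenization was used, so the function names are hypothetical and the output need not match the reported figures. If the reference is counted as 15 words, 26.7 and 73.3 correspond to 4/15 and 11/15.

```python
def word_edit_distance(ref, hyp):
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the list `ref` into the list `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent; it can go negative when there are many insertions."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    errors = word_edit_distance(ref, hyp)
    return 100.0 * (len(ref) - errors) / len(ref)
```

Because insertions count as errors, a hypothesis that is longer than the reference is penalized even when most reference words are recovered.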

Page 4

Motivation

• Practical
    • Can we detect non-native users with enough accuracy to switch acoustic models?
• Exploratory
    • How well does an algorithm based only on text features work?
    • What tokens are discriminative for non-native speakers?

Page 5

Speech examples

Read speech:

Over the next two months, public officials, Native American leaders, businesses and environmental groups will come up with plans for meeting the law’s requirements.

Spontaneous speech (local specialties):

I like to have anything very special in Boston, very native in Boston.

Page 6

Speech data

                  Read speech                                            Spontaneous speech
Native language   Speaker count  Utterance count  Word count (types)    Speaker count  Utterance count  Word count (types)
Japanese          10             957              15868 (3195)          31             1685             15934 (826)
English           8              756              10237 (2073)          6              320              4117 (418)
Mandarin          ---            ---              ---                   6              374              3490 (391)

Page 7

Transcripts and hypotheses

Original text:

“A safety net for salmon: environmentalists, the government, and ordinary folks team up to save the Northwest’s wondrous wild salmon”

Transcript:

A safety net for the salmons

Environment= environmentalists…

Hypothesis:

A safety net forced simon

Um environmental activists…

Classification based on transcripts:

• Usually gives a good idea of gold standard

• Finds true differences in linguistic usage

Classification based on hypotheses:

• Implicitly models acoustics

• Benefits from amplified difference between native and non-native samples

Page 8

Related work

Acoustic feature based accent discrimination (e.g. Fung and Liu 1999)

Competing HMM based accent discrimination (e.g. Teixeira et al. 1996)

Classification of documents according to style (Argamon-Engelson et al. 1998), author (Mosteller and Wallace 1964)

Page 9

Accent detection as document classification

Training: native speaker utterances and non-native speaker utterances → classifier

Page 10

Accent detection as document classification

Testing: test speaker utterances → classifier → classification decision: native or non-native?
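
Treating each utterance as a short document, the train-then-classify flow can be sketched with a small multinomial naïve Bayes model. This is a minimal illustration only: the class name, the add-one smoothing constant, and the handling of unseen tokens are assumptions, and the classifier actually used for the experiments may differ in all of these details.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Multinomial naive Bayes over token lists (hypothetical helper class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # smoothing constant (assumed)
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()               # label -> number of utterances
        self.vocab = set()

    def train(self, tokens, label):
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_scores(self, tokens):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                if tok in self.vocab:              # skip tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Training calls `train(tokens, "native")` or `train(tokens, "non-native")` once per utterance; `classify(tokens)` then returns the label with the highest log score for a test utterance.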

Page 11

Experimental methodology

• Rainbow naïve Bayes classifier
• Both word and part-of-speech tokens were examined
• Classification based on token unigrams and bigrams
• No feature selection initially
• Stopwords were not excluded from feature set
• Data randomly split into 30% testing, 70% training data for evaluation; evaluation repeated 20 times and classification results averaged
• Utterances from the same speaker never appeared in both training and test sets
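
The evaluation protocol above can be sketched as follows. This is an illustration under stated assumptions, not the original tooling: the helper names are hypothetical, the split is drawn over speakers (so the share of test utterances is only approximately 30%), and nothing here guarantees that every class is represented in the training portion. The `;` joiner for bigrams simply mirrors the notation used in the token tables of these slides.

```python
import random


def unigrams_and_bigrams(tokens):
    """Return the tokens followed by adjacent-pair bigrams joined with ';'."""
    bigrams = [f"{a};{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


def speaker_disjoint_split(utterances, test_fraction=0.3, rng=random):
    """Split (speaker_id, label, tokens) triples so no speaker is in both sets."""
    speakers = sorted({spk for spk, _, _ in utterances})
    rng.shuffle(speakers)
    n_test = max(1, round(test_fraction * len(speakers)))
    test_speakers = set(speakers[:n_test])
    train = [u for u in utterances if u[0] not in test_speakers]
    test = [u for u in utterances if u[0] in test_speakers]
    return train, test


def averaged_accuracy(utterances, evaluate, trials=20, seed=0):
    """Average evaluate(train, test) over repeated random splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(trials):
        train, test = speaker_disjoint_split(utterances, rng=rng)
        accuracies.append(evaluate(train, test))
    return sum(accuracies) / len(accuracies)
```

Here `evaluate` is any caller-supplied function that trains on the first list, classifies the second, and returns an accuracy.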

Page 12

Classification of spontaneous speech (transcripts only)

[Bar chart: classification accuracy (0 to 100) for the Baseline, Word, POS, and POSNoun conditions on five tasks: Native/Japanese, Native/Chinese, Japanese/Chinese, Native/Non-native, and Native/Japanese/Chinese.]

Page 13

Classification of read speech

[Bar chart: classification accuracy (0 to 100) for condition A, comparing Word-trans, POS-trans, Word-hypo, and POS-hypo against the baseline.]

A train: same texts; test: same texts

Page 14

Classification of read speech

[Bar chart: classification accuracy (0 to 100) for conditions A, B, C, and D, comparing trans-word, trans-POS, hypo-word, and hypo-POS against the baseline.]

A train: same texts; test: same texts
B train: disjoint texts; test: disjoint texts
C train: disjoint texts; test: same texts
D train: same texts; test: disjoint texts

Page 15

Classification of read speech

[Bar chart: classification accuracy (0 to 100) for condition B, comparing trans-word, trans-POS, hypo-word, and hypo-POS against the baseline.]

A train: same texts; test: same texts
B train: disjoint texts; test: disjoint texts
C train: disjoint texts; test: same texts
D train: same texts; test: disjoint texts

Page 16

Feature Selection

Method               Number of features   Accuracy
None                 4087                 47
IG-524               524                  69
SMART-524            524                  88
IG-200               200                  74
SMART-524, IG-200    200                  88
IG-70                70                   70
M&W-70               70                   87
IG-48                48                   74
SMART-48             48                   84
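
The IG rows presumably refer to ranking features by information gain and keeping the top N. The sketch below shows one standard way to compute information gain for a binary "token present in utterance" feature. It is an assumption-laden illustration: the function names are hypothetical, and the slides do not state how the scores were actually computed.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(docs, token):
    """docs is a list of (label, set_of_tokens) pairs."""
    labels = Counter(label for label, _ in docs)
    with_tok = Counter(label for label, toks in docs if token in toks)
    without_tok = labels - with_tok  # label counts for docs lacking the token
    n = len(docs)
    n_with = sum(with_tok.values())
    conditional = ((n_with / n) * entropy(with_tok.values())
                   + ((n - n_with) / n) * entropy(without_tok.values()))
    return entropy(labels.values()) - conditional


def top_k_by_information_gain(docs, k):
    """Return the k tokens with the highest information gain (ties by name)."""
    vocab = set().union(*(toks for _, toks in docs))
    return sorted(vocab, key=lambda t: (-information_gain(docs, t), t))[:k]
```

A token that appears in every utterance of one class and in none of the other gets the maximum score, which for two equally sized classes is 1 bit.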

Page 17

Discriminative sequences

Speech type   Token type   Native            Non-native
Read          Word         NMFS              the + the
                           the               that
Read          POS          noun(pl)          noun(sing)
                           noun(pl)          verb(past)
Spontaneous   Word         Wonderland        the
Spontaneous   POS          TO + verb(base)   noun(sing)
Spontaneous   POSNoun      am                noun(sing)

Legend: transcriptions / hypotheses
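
The slides do not say how the discriminative tokens were ranked. One simple possibility, shown here purely as an assumption, is to compare smoothed token probabilities in the two classes and sort by the log ratio. The function name and the add-one smoothing are hypothetical choices.

```python
import math
from collections import Counter


def log_ratio_ranking(native_tokens, nonnative_tokens, alpha=1.0):
    """Rank tokens from most native-leaning to most non-native-leaning."""
    nat, non = Counter(native_tokens), Counter(nonnative_tokens)
    vocab = set(nat) | set(non)
    nat_total = sum(nat.values()) + alpha * len(vocab)
    non_total = sum(non.values()) + alpha * len(vocab)
    scores = {
        tok: math.log((nat[tok] + alpha) / nat_total)
        - math.log((non[tok] + alpha) / non_total)
        for tok in vocab
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Tokens at the top of the returned list are relatively more frequent in the native sample, and tokens at the bottom are relatively more frequent in the non-native sample.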

Page 18

Conclusions

Transcriptions of spontaneous speech can be classified with high accuracy for both 2-way and 3-way distinctions

Read speech samples, which are simple transformations of native-produced text, can be classified with high accuracy

Recognizer output is classified more accurately than transcripts

Page 19

Future directions

Incorporating the classification decision in acoustic model selection

Minimizing the number of samples from the test speaker needed for classification

Applying classification to parsing grammar selection, language model construction, writer identification

Page 20

Discriminative POS sequences

Native                  Non-native
Noun(pl)                Noun(sing)
Determiner              Preposition
Noun(pl);preposition    Preposition;preposition
Adjective;noun(pl)      Noun(sing);noun(sing)
Gerund;particle         Particle;preposition
Noun(s);verb(3s)        Cardinal#;cardinal#
Noun(pl);modal          Verb(past)

Page 21

Discriminative word sequences

Native               Non-native
NMFS                 the;the
the;NMFS             in;in
nineteen;hundreds    the
hundreds;now         in
hundreds             that
habitats;and         habitat;and

Page 22

Phone-based classification

[Bar chart, condition B: classification accuracy (0 to 100) for words and phones, using token identity versus POS/phone class.]

Discriminative tokens

                 Native   Non-native
Phone identity   //       /I/
Phone class      CCC      V

