Page 1: Machine Learning of Level and Progression in Spoken EALmi.eng.cam.ac.uk/~kmk/presentations/CEP_Feb2016_Knill.pdf · 2016-05-12 · Pitch PLP Pitch Bottleneck HMM ... • useful to

Machine Learning of Level and Progression in Spoken EAL

Kate Knill and Mark Gales Speech Research Group, Machine Intelligence Lab, University of Cambridge

5 February 2016


Spoken Communication

Pronunciation Prosody

Message Construction Message Realisation Message Reception

Speaker Characteristics Environment/Channel

Spoken communication is a very rich communication medium


Spoken Communication Requirements

•  Message Construction should consider:
    •  Has the speaker generated a coherent message to convey?
    •  Is the message appropriate in the context?
    •  Is the word sequence appropriate for the message?

•  Message Realisation should consider:
    •  Is the pronunciation of the words correct/appropriate?
    •  Is the prosody appropriate for the message?
    •  Is the prosody appropriate for the environment?


Spoken Language Versus Written

ASR Output
okay carl uh do you exercise yeah actually um i belong to a gym down here gold’s gym and uh i try to exercise five days a week um and now and then i’ll i’ll get it interrupted by work or just full of crazy hours you know

Meta-Data Extraction (MDE) Markup
Speaker1: / okay carl {F uh} do you exercise /
Speaker2: / {DM yeah actually} {F um} i belong to a gym down here / / gold’s gym / / and {F uh} i try to exercise five days a week {F um} / / and now and then [REP i’ll + i’ll] get it interrupted by work or just full of crazy hours {DM you know } /

Written Text
Speaker1: Okay Carl do you exercise?
Speaker2: I belong to a gym down here, Gold’s Gym, and I try to exercise five days a week and now and then I’ll get it interrupted by work or just full of crazy hours.
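
The step from MDE markup towards written text can be illustrated with a minimal Python sketch. It assumes the tag meanings suggested by the example on the slide: {F ...} marks a filler, {DM ...} a discourse marker, [REP a + b] a repetition whose second part is kept, and "/" a boundary between sentence-like units. The function name strip_mde_markup is hypothetical, and the sketch does not restore capitalisation or punctuation.

```python
import re


def strip_mde_markup(marked_up: str) -> str:
    """Remove fillers, discourse markers and repetitions from MDE-style markup.

    Tag meanings are assumptions read off the slide example:
    {F ...} filler, {DM ...} discourse marker, [REP a + b] repetition.
    """
    # Drop fillers and discourse markers entirely, e.g. {F uh}, {DM you know }.
    text = re.sub(r"\{(?:F|DM)\s+[^}]*\}", " ", marked_up)
    # For a repetition [REP a + b], keep only the final attempt b.
    text = re.sub(r"\[REP\s+[^+\]]*\+\s*([^\]]*)\]", r"\1", text)
    # Slashes delimit sentence-like units; here they are simply removed.
    text = text.replace("/", " ")
    # Collapse the whitespace left behind by the deletions.
    return " ".join(text.split())


if __name__ == "__main__":
    line = "/ and now and then [REP i'll + i'll] get it interrupted by work {DM you know } /"
    print(strip_mde_markup(line))
```

A fuller system would also use the unit boundaries to insert sentence punctuation, which this sketch leaves out.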


Business Language Testing Service (BULATS) Spoken Tests

•  Example of a test of communication skills
    A.  Introductory Questions: where you are from
    B.  Read Aloud: read specific sentences
    C.  Topic Discussion: discuss a company that you admire
    D.  Interpret and Discuss Chart/Slide: example above
    E.  Answer Topic Questions: 5 questions about organising a meeting


Automated Assessment of One Speaker

[Diagram: Audio → Speech recogniser → Text; Audio and Text → Feature extraction → Features → Grader → Grade]


Speech Recognition Challenges

•  Non-native ASR highly challenging
    •  Heavily accented
    •  Pronunciation dependent on L1
•  Commercial systems poor!
•  State-of-the-art CUED systems

Training Data | Word error rate
Native & C-level non-native English | 54%
BULATS speakers | 30%
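
Word error rate, the metric in the table, is the word-level edit distance between the reference transcript and the recogniser output, divided by the number of reference words. The following Python sketch shows that standard definition; the function name word_error_rate and the example sentences are illustrative and are not taken from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.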


Automatic Speech Recognition Components

[Diagram: Acoustic Model training data → Acoustic Model; Language Model training data → Language Model; the Acoustic Model, Pronunciation Lexicon and Language Model feed the Recognition Engine, which outputs text such as “The cat sat on …”]


Forms of Acoustic and Language Models

•  L2 audio data → L2 Acoustic Model; L2 text data + L1 text data → L2 Language Model
    •  Used to recognise L2 speech

•  Native (L1) audio data → Native Acoustic Model; Native (L1) text data → Native Language Model
    •  Useful to extract features


Deep Learning for Speech Recognition

[Diagram: joint Tandem (HMM-GMM) and Stacked Hybrid system. Labels: PLP, Pitch, FBank and Bottleneck features; Bottleneck Layer; Speaker Dependent; Log-Likelihoods and Log-Posteriors combined by Score Fusion; AMI Corpus Data; BULATS Data]

•  Fusion of HMM deep neural network and Gaussian mixture models
    •  trained on BULATS data


Recognition Error Rate Versus Learner Progression


Baseline Features

•  Mainly fluency based:
    •  Audio Features: statistics about
        •  fundamental frequency (f0)
        •  speech energy and duration
    •  Aligned Text Features: statistics about
        •  silence durations
        •  number of disfluencies (um, uh, etc.)
        •  speaking rate
    •  Text Identity Features:
        •  number of repeated words (per word)
        •  number of unique word identities (per word)
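
A few of the aligned-text and text-identity features can be sketched in Python from a time-aligned word sequence. This is an illustration only: the AlignedWord type, the filler list and the exact feature definitions (for example, counting a repeat only when a word immediately follows itself) are assumptions, not the definitions used in the system described here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed filler inventory; the slide only names "um, uh, etc."
FILLERS = {"um", "uh", "er", "erm"}


@dataclass
class AlignedWord:
    word: str
    start: float  # seconds
    end: float    # seconds


def fluency_features(words: list[AlignedWord]) -> dict[str, float]:
    """Compute simple fluency statistics from recogniser word alignments."""
    if not words:
        raise ValueError("need at least one aligned word")
    n = len(words)
    tokens = [w.word.lower() for w in words]

    # Silence = gap between the end of one word and the start of the next.
    silences = [max(0.0, b.start - a.end) for a, b in zip(words, words[1:])]
    total_time = words[-1].end - words[0].start
    # A repeat is counted when a word is identical to its predecessor.
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)

    return {
        "mean_silence": sum(silences) / len(silences) if silences else 0.0,
        "disfluencies": float(sum(t in FILLERS for t in tokens)),
        "speaking_rate": n / total_time if total_time > 0 else 0.0,
        "repeats_per_word": repeats / n,
        "unique_per_word": len(set(tokens)) / n,
    }


if __name__ == "__main__":
    example = [
        AlignedWord("um", 0.0, 0.5),
        AlignedWord("i", 1.0, 1.5),
        AlignedWord("i", 1.5, 2.0),
        AlignedWord("run", 2.0, 4.0),
    ]
    print(fluency_features(example))
```

The audio features (f0 and energy statistics) need a signal-processing front end and are left out of this sketch.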


Speaking Time Versus Learner Progression

[Bar chart: Average Speaking Time (secs), axis 0 to 700, by CEFR Grade (A1, A2, B1, B2, C); series: spontaneous speech, read speech]


Pronunciation Features

•  Hypothesis: poor speakers are weaker at making phonetic distinctions
•  Statistical approach – learn phonetic distances from graded data
•  The pattern of distances differs between candidates of different levels

[Figure panels: Candidate Grade A1; Candidate Grade C1]
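
One way to turn the hypothesis into features is to fit a model per phone for each candidate and measure how far apart each pair of phone models is. The slides do not say which distance is used, so the Python sketch below makes a loud assumption: each phone is a one-dimensional Gaussian (mean, variance) and the distance is the symmetric Kullback-Leibler divergence. The function names and the toy phone models are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations

Gaussian = tuple[float, float]  # (mean, variance); an assumed phone model


def kl_gauss(p: Gaussian, q: Gaussian) -> float:
    """KL divergence KL(p || q) between two univariate Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrised divergence, so the distance does not depend on order."""
    return kl_gauss(p, q) + kl_gauss(q, p)


def phone_distance_features(models: dict[str, Gaussian]) -> dict[tuple[str, str], float]:
    """One distance per unordered phone pair for a single candidate."""
    return {
        (a, b): symmetric_kl(models[a], models[b])
        for a, b in combinations(sorted(models), 2)
    }


if __name__ == "__main__":
    # Toy models: a candidate who separates /iy/ and /ih/ only slightly.
    candidate = {"iy": (0.0, 1.0), "ih": (1.0, 1.0), "aa": (4.0, 1.0)}
    for pair, dist in phone_distance_features(candidate).items():
        print(pair, round(dist, 3))
```

Under the hypothesis, a weaker speaker would show smaller distances for pairs that a stronger speaker keeps apart, and the resulting vector of pairwise distances can be passed to a grader as features.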


Uses of Automatic Assessment

•  Human graders
    ✔ very powerful ability to assess spoken language
    ✖ vary in quality and not always available

•  Automatic graders
    ✔ more consistent and potentially always available
    ✖ validity of the grade varies and limited information about context

•  Use automatic grader
    •  for grading practice tests/learning process
    •  in combination with human graders
        •  combination: use both grades
        •  back-off process: detect challenging candidates


Gaussian Process Grader

•  Currently have 1000s of candidates to train grader
    •  limited data compared to ASR frames (100,000s of frames)
    •  useful to have confidence in prediction

Gaussian Process is a natural choice for this configuration
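
To show why a Gaussian process suits this setting, here is a minimal pure-Python sketch of Gaussian process regression that returns both a predicted grade and a predictive variance. Everything specific in it is an assumption for illustration: the RBF kernel, its hyperparameters, the noise variance, the mean-centring of grades and the toy one-dimensional features. The slides do not state the kernel or settings of the real grader.

```python
from __future__ import annotations

import math


def rbf(x: list[float], y: list[float], length_scale: float = 1.0) -> float:
    """Squared-exponential kernel with unit signal variance (assumed)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq / length_scale ** 2)


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low: list[list[float]], b: list[float]) -> list[float]:
    """Forward substitution for L z = b."""
    z: list[float] = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def solve_upper_t(low: list[list[float]], z: list[float]) -> list[float]:
    """Back substitution for L^T x = z."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(train_x, train_y, test_x, noise_var: float = 0.1):
    """Return (mean, variance) of the GP posterior at test_x."""
    n = len(train_x)
    mean_y = sum(train_y) / n  # centre grades so a zero-mean prior is sensible
    centred = [y - mean_y for y in train_y]
    gram = [
        [rbf(train_x[i], train_x[j]) + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    low = cholesky(gram)
    alpha = solve_upper_t(low, solve_lower(low, centred))
    k_star = [rbf(x, test_x) for x in train_x]
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    variance = rbf(test_x, test_x) - sum(vi * vi for vi in v)
    return mean, variance


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0]]   # toy feature vectors
    ys = [1.0, 2.0, 3.0]         # toy grades
    print(gp_predict(xs, ys, [1.0]))   # close to training data: low variance
    print(gp_predict(xs, ys, [10.0]))  # far from training data: high variance
```

The variance grows for inputs unlike anything in the training set, which is the property that makes the prediction's confidence usable downstream.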


Form of Output

Graders | Pearson Correlation
Human experts | 0.85
Automatic GP | 0.83 – 0.86


Combining Human and Automatic Graders

•  Interpolate between human and automated grades
    •  Higher correlation i.e. more reliable grade produced
•  Content checking can be done by the human grader

[Plot: Correlation (axis 0.85 to 1) against Interpolation weight, from Original (0) through 0.2, 0.4, 0.6, 0.8 to Gaussian process (1)]
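
The interpolation and its evaluation can be sketched in a few lines of Python. A weight of 0 corresponds to the original human grade alone and a weight of 1 to the Gaussian process grade alone, matching the axis of the plot. The function names and the grades in the example are invented for illustration and are not data from the talk.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)  # undefined if either sequence is constant


def interpolate_grades(human: list[float], automatic: list[float], weight: float) -> list[float]:
    """Linear interpolation: weight 0 is human only, weight 1 is automatic only."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [(1.0 - weight) * h + weight * a for h, a in zip(human, automatic)]


if __name__ == "__main__":
    # Invented grades for five candidates, for illustration only.
    expert = [2.0, 3.0, 4.0, 5.0, 6.0]
    human = [2.5, 2.5, 4.5, 4.5, 6.0]
    automatic = [1.5, 3.5, 3.5, 5.5, 5.5]
    for w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        combined = interpolate_grades(human, automatic, w)
        print(w, round(pearson(combined, expert), 3))
```

Sweeping the weight and correlating each combined grade against expert grades produces a curve of the kind shown on the slide.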


Detecting Outlier Grades

•  Standard (BULATS) graders handle standard speakers very well
    •  non-standard (outlier) speakers less well handled
    •  use Gaussian Process variance to automatically detect outliers
•  Back-off to human experts
    •  Reject 10%: performance 0.83 → 0.88

[Plot: Correlation (axis 0.85 to 1) against Rejection rate (i.e., cost), 0 to 1; curves: Gaussian process, Random rejection, Ideal rejection]
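
The back-off rule can be sketched as follows in Python: rank candidates by the predictive variance of the automatic grader and send the most uncertain fraction to human experts. The function name select_for_human and the example variances are hypothetical, and rounding the rejection count to the nearest integer is an assumption.

```python
from __future__ import annotations


def select_for_human(variances: list[float], rejection_rate: float) -> list[int]:
    """Return indices of candidates whose automatic grade should be rejected.

    The candidates with the highest predictive variance are rejected first.
    """
    if not 0.0 <= rejection_rate <= 1.0:
        raise ValueError("rejection_rate must lie in [0, 1]")
    n_reject = round(rejection_rate * len(variances))
    by_uncertainty = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
    return sorted(by_uncertainty[:n_reject])


if __name__ == "__main__":
    # Invented predictive variances for four candidates.
    variances = [0.1, 0.9, 0.2, 0.3]
    print(select_for_human(variances, 0.25))  # the single most uncertain candidate
    print(select_for_human(variances, 0.5))
```

The rejection rate is the cost knob on the plot's horizontal axis: rejecting more candidates costs more human effort but leaves the automatic grader with the cases it handles best.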


Assessing Content

•  Grader correlates well with expert grades
    •  features do not assess content – primarily fluency features
•  Train a Recurrent Neural Network Language Model for each question
    •  assess whether the response is consistent with example answers
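
The idea of a per-question language model can be illustrated without a neural network. The Python sketch below deliberately swaps the Recurrent Neural Network Language Model for a much simpler add-one-smoothed bigram model, purely to show the scoring principle: a response that resembles the example answers for a question gets a lower cross-entropy than an off-topic one. The class name BigramLM and the example answers are invented.

```python
from __future__ import annotations

import math
from collections import Counter


class BigramLM:
    """Add-one-smoothed bigram model; a stand-in for the RNN LM, not the real system."""

    def __init__(self, example_answers: list[str]) -> None:
        self.context_counts: Counter[str] = Counter()
        self.bigram_counts: Counter[tuple[str, str]] = Counter()
        self.vocab = {"</s>"}
        for answer in example_answers:
            tokens = ["<s>"] + answer.lower().split() + ["</s>"]
            self.vocab.update(tokens[1:])
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def cross_entropy(self, response: str) -> float:
        """Average negative log2 probability per token; lower means more consistent."""
        tokens = ["<s>"] + response.lower().split() + ["</s>"]
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            prob = (self.bigram_counts[(prev, cur)] + 1) / (self.context_counts[prev] + vocab_size)
            total -= math.log2(prob)
        return total / (len(tokens) - 1)


if __name__ == "__main__":
    # Invented example answers for a question about organising a meeting.
    lm = BigramLM([
        "we should book a room for the meeting",
        "send the agenda before the meeting",
    ])
    print(lm.cross_entropy("book a room for the meeting"))
    print(lm.cross_entropy("my favourite football team won yesterday"))
```

With one model per question, the score for a response could be added to the grader's features or used as a separate content check.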


Spoken Language Assessment

[Diagram: Audio → Speech recogniser → Text; Audio and Text → Feature extraction → Features → Grader → Grade]

•  Automatically assess:
    •  Message realisation: achieved (with room for improvement)
        •  Fluency, pronunciation
    •  Message construction: unsolved – active research areas
        •  Construction & coherence of response
        •  Relationship to topic


Spoken Language Assessment and Feedback

[Diagram: the assessment pipeline (Audio, Speech recogniser, Text, Feature extraction, Features, Grader, Grade) extended with Error Detection & Correction and Feedback components]

•  Automatically assess:
    •  Message realisation
        •  Fluency, pronunciation
    •  Message construction
        •  Construction & coherence of response
        •  Relationship to topic

•  Provide feedback:
    •  Feedback to user: realisation, construction
    •  Feedback to system: adjust to level


Time Alignment and Pronunciation Feedback

•  Lightly supervised:
    •  No pronunciation labelling required – trained just on grades


Conclusions

•  Automated machine-learning for spoken language assessment
    •  important to keep costs down
    •  able to be integrated into the learning process

•  Current level – assessment of fluency
    •  ongoing research into assessing communication skills:
        •  appropriateness and acceptability

•  Error detection and feedback is challenging
    •  high precision required in detecting where errors have occurred
    •  supplying feedback in appropriate form for learner


Thank You

•  Acknowledgement: members of CUED MIL ALTA team:
    •  Rogier van Dalen, Kostas Kyriakopoulos, Andrey Malinin, Yu Wang

