Date post: 21-Dec-2015
By the Novel Approaches team: Nelson Morgan, ICSI Hynek Hermansky, OGI Dan Ellis, Columbia Kemal Sonmez, SRI Mari Ostendorf, UW Hervé Bourlard, IDIAP/EPFL George Doddington, NA-sayer EARS Kickoff Meeting: “Pushing the Envelope”
Transcript
Page 1: EARS Kickoff Meeting: “Pushing the Envelope”

By the Novel Approaches team:

Nelson Morgan, ICSI
Hynek Hermansky, OGI
Dan Ellis, Columbia
Kemal Sonmez, SRI
Mari Ostendorf, UW
Hervé Bourlard, IDIAP/EPFL
George Doddington, NA-sayer

Page 2: Modern ASR Systems

• From 50,000 ft, all ASR systems the same:

- compute local spectral envelope

- determine likelihoods of speech sounds

- search for most likely HMMs

• Spectral envelope distorted by many things

- Alternatives often are bad fits to the statistical models
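The third step above, searching for the most likely HMM state sequence, is commonly done with the Viterbi algorithm. The following is a minimal sketch on a toy two-state model; the function name and all probabilities are illustrative assumptions, not values from any system described in these slides.

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Return (best state path, log score of that path).

    log_init[s]     : log P(state s at frame 0)
    log_trans[r][s] : log P(next state s | current state r)
    log_obs[t][s]   : log likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

def to_log(rows):
    """Convert a nested list of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]

# Toy model (made-up numbers): 2 states, 4 frames.
toy_init = [math.log(0.9), math.log(0.1)]
toy_trans = to_log([[0.7, 0.3], [0.1, 0.9]])
toy_obs = to_log([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]])
toy_path, toy_score = viterbi(toy_init, toy_trans, toy_obs)
```

Working in the log domain avoids numerical underflow when many frame likelihoods are multiplied together.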

Page 3: ASR is half-deaf

• Phonetic classification very poor

• Success due to constraints (domain, speaker, noise-canceling mic, etc.)

• These constraints can mask the underlying weakness of the technology

Page 4: Who gets to try to fix it?

“Y'see, they just find out who complains the loudest about the cooking, and he gets to be the cook.”

- Utah Phillips

Page 5: Rethinking Acoustic Processing for ASR

• Escape dependence on spectral envelope

• Use multiple front ends across time/freq

• Modify statistical models to accommodate new front ends

• Design optimal combination schemes for multiple models

Page 6: The Two EARS-NA Tasks

• Signal processing: Replacing the spectral envelope by long-time and short-time (multirate) probabilistic functions of the spectro-temporal plane.

• Statistical Modeling: Modifying the statistical models, both to incorporate these new multirate front ends and to explicitly handle areas of missing information.

Page 7: Task 1: Pushing the Envelope (aside)

• Problem: Spectral envelope is a fragile information carrier

• Solution: Probabilities from multiple time-frequency patches

[Figure: OLD: a single 10 ms frame yields the estimate of sound identity. PROPOSED: i-th, k-th, and n-th estimates from time-frequency patches spanning up to 1 s pass through information fusion to yield the estimate of sound identity. Horizontal axis: time.]
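The slide does not say how the per-patch estimates are fused. As one possible illustration, the sketch below assumes each patch produces a posterior distribution over the same set of sound classes and combines them with a weighted log-linear (product) rule. The function name, the weights, and the example numbers are all hypothetical.

```python
import math

def fuse_posteriors(estimates, weights=None):
    """Fuse per-patch class posteriors with a weighted product rule.

    estimates : list of posterior vectors, one per time-frequency patch
    weights   : optional per-patch exponents (assumed; default 1.0 each)
    """
    n_classes = len(estimates[0])
    if weights is None:
        weights = [1.0] * len(estimates)
    floor = 1e-12  # guards against log(0)
    log_scores = [
        sum(w * math.log(max(p[c], floor)) for w, p in zip(weights, estimates))
        for c in range(n_classes)
    ]
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(log_scores)
    unnorm = [math.exp(s - peak) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Three hypothetical patch estimates over three sound classes.
patch_estimates = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
]
fused = fuse_posteriors(patch_estimates)
```

A product rule treats the patches as independent evidence, which is itself an assumption; other combination rules (sums, learned merger networks) are equally plausible readings of "information fusion".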

Page 8: Multiple time-frequency tradeoffs

• Temporal trajectories of narrow subbands

• Optimal search for more general patches

• Data-driven broad class probabilities

[Figure: i-th, k-th, and n-th estimates drawn from time-frequency patches spanning up to 1 s along the time axis]

Page 9: Pitch-related features

• Current recognizers have no use for pitch

• Listeners benefit from pitch

• Correlogram estimates spectrum of pitch

Page 10: Principled multistream

• Not just different, but useful in combination

- minimizing relative entropy between error signals

- minimizing conditional information of posterior signals

• Choosing categories for per-stream probabilistic functions (e.g., broad classes)

Page 11: Task 2: Beyond Frames…

• Problem: Features & models interact, new features may require different models

• Solution: Advanced features require advanced models, not limited by fixed-frame-rate paradigm

[Figure: OLD: short-term features feed a conventional HMM. PROPOSED: advanced features feed a multi-rate / dynamic scale classifier.]

Page 12: Multirate Models

• Goal: Model features that span different time scales and dependence across scales/streams

[Figure: advanced features feeding a multirate classifier]

Page 13: Multirate Models (ctd)

• Why multirate vs. redundant features?

- Redundant features violate independence assumptions, lead to poor confidence (posterior) estimates

- Redundancy adds unnecessary computation

• Important research issues:

- Acoustically driven rate mixing and/or variable alignment

- Discriminative learning of dependence across streams

Page 14: Partial information techniques

• Can integrate across unknown dimensions

• particularly simple for diagonal Gaussians

• e.g. Spectral masks: Skip missing dimensions

• Hard part is identifying the bad data
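Why the diagonal Gaussian case is simple: the density factorizes over dimensions, so integrating out an unknown dimension contributes a factor of 1 and its term just drops from the log-likelihood. The sketch below shows this under the assumption that a boolean mask marking reliable dimensions is already available (which, as the slide notes, is the hard part). The function and parameter names are hypothetical.

```python
import math

def diag_gaussian_loglik(x, mean, var, present=None):
    """Log-likelihood of x under a diagonal Gaussian.

    present[i] is False for dimensions treated as missing; those
    dimensions are marginalized out, i.e. simply skipped.
    """
    if present is None:
        present = [True] * len(x)
    total = 0.0
    for xi, mi, vi, ok in zip(x, mean, var, present):
        if not ok:
            continue  # integral of a 1-D Gaussian over this dimension is 1
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

# Hypothetical 3-dimensional spectral frame; dimension 1 is flagged as corrupted.
frame = [1.0, 50.0, -0.5]
model_mean = [0.8, 0.0, -0.4]
model_var = [1.0, 2.0, 0.5]
mask = [True, False, True]
masked_score = diag_gaussian_loglik(frame, model_mean, model_var, mask)
```

With the mask applied, the outlier value 50.0 has no effect on the score; without it, that one dimension would dominate the likelihood.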

Page 15: Multistream statistics

• All possible combinations of individual streams
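To give a sense of scale, the sketch below enumerates every non-empty combination of a set of streams. The stream names are hypothetical placeholders, and what is done with each combination (for example, training one classifier per subset) is not specified on the slide.

```python
from itertools import combinations

def stream_combinations(streams):
    """Yield every non-empty subset of the given streams as a tuple."""
    for size in range(1, len(streams) + 1):
        for subset in combinations(streams, size):
            yield subset

# Hypothetical stream names, for illustration only.
example_streams = ["subband_1", "subband_2", "pitch"]
all_subsets = list(stream_combinations(example_streams))
```

For N streams there are 2^N - 1 such combinations, so the count grows quickly as streams are added.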

Page 16: Multistream statistics (ctd)

• Statistical modeling in both frequency and time: HMM2

Page 17: Evaluation

• For greatest and most reliable progress, need frequent internal evaluations

• Most importantly, need to define helpful evaluation tasks – to guide the research

• Other considerations beyond the task:

- definition of performance measures

- choice of corpora

- establishment of an evaluation process

Page 18: Task and corpus, initial plan

• Evaluation tasks – Recognition of words and syllables

• Cross-corpus testing

- training on Hub 5, Macrophone

- testing on OGI numbers for quick turnaround, debugging

• Testing on Hub 5 in due course

• Rescoring SRI decoder output (N-best or lattice)
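N-best rescoring can be sketched generically: each first-pass hypothesis keeps its original score, a new model assigns a second score, and the list is re-ranked on an interpolation of the two. The code below is an assumption-laden illustration; it does not reflect the SRI decoder's actual output format, and `new_scorer`, `weight`, and the example scores are hypothetical.

```python
def rescore_nbest(hypotheses, new_scorer, weight=0.5):
    """Re-rank N-best hypotheses, best first.

    hypotheses : list of (words, first_pass_log_score) pairs
    new_scorer : callable mapping a word list to a log score (hypothetical)
    weight     : interpolation weight given to the new score (assumed)
    """
    rescored = [
        (words, (1.0 - weight) * first + weight * new_scorer(words))
        for words, first in hypotheses
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)

# Toy 2-best list with made-up scores.
nbest = [
    (["one", "too"], -9.5),
    (["one", "two"], -10.0),
]

def toy_scorer(words):
    """Stand-in for a new acoustic model's score; purely illustrative."""
    return -1.0 if words == ["one", "two"] else -5.0

reranked = rescore_nbest(nbest, toy_scorer)
```

Rescoring lets new features and models be evaluated on a large-vocabulary task without building a full decoder around them.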

Page 19: Metrics and diagnostics

• Word and syllable error statistics

• Detection statistics and error distribution across speakers (and other conditions that are deemed to be important)

• Comparison to human performance

• Running scores on dev sets within group, held-out evals at least annually (NA-sayer wants weekly)
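Word error rate, the usual word error statistic, is the minimum number of substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the reference length. A minimal sketch follows; the same function works on syllable sequences. Real scoring tools add text normalization and per-speaker breakdowns that are omitted here, and the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate from a Levenshtein alignment of two token lists."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)

# One substitution and one deletion against a 4-word reference.
example_wer = word_error_rate("one two three four".split(),
                              "one too three".split())
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.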

Page 20: Connection to RT evals

• Rescore output of SRI system

• In later years work more closely with RT team to transfer most successful ideas

• Feedback from RT experience (error diagnostics) is also important

Page 21: Summary

• An alternative view of acoustic processing for ASR for features+models

• Pushing the envelope … aside

• Matching new front end characteristics with appropriate statistical models

• Diagnostic evaluations a key feature

Page 22: Closing Thought

“When you come to a fork in the road, take it.”

- Yogi Berra

