Date posted: 07-Feb-2017
Uploaded by: xavier-giro
Day 3 Lecture 3
Speaker ID I
Javier Hernando
Acknowledgments
Miquel India, Omid Ghahabi, Pooyan Safari, Ph.D. candidates
Speech Recognition
● Emotion: Happy
● Gender: Woman
● Age: Teenager
● …
Speaker ID as Biometrics
Security
Applications
Modalities
Tasks
Features
Humans use different features to recognise the speaker:
● Pronunciation, diction, …
● Prosody, rhythm, speed, volume, …
● Acoustic aspects of the voice
Desirable properties for those features:
● Practical
  ○ Appear frequently and naturally in speech
  ○ Easy for the system to measure
● Robust
  ○ Insensitive to changes over time or in the speaker's health
  ○ Insensitive to different transmission characteristics or background noise
● Secure
  ○ Hard to falsify
No feature has all of these properties. Spectrum-derived features are currently the most widely used because of their effectiveness.
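As a rough illustration of what "spectrum-derived" means, the sketch below computes plain cepstral coefficients: frame the signal, apply a window, take the log power spectrum and decorrelate it with a DCT. It is a simplified stand-in, assuming 16 kHz audio with 25 ms frames and a 10 ms hop; it deliberately omits the mel filterbank used by MFCCs, and all function names and defaults are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (drops the incomplete tail)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]                                   # (n_frames, frame_len)

def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def cepstral_features(x, frame_len=400, hop=160, n_ceps=13, eps=1e-10):
    """Hypothetical helper: log power spectrum -> DCT -> first n_ceps coefficients.
    Defaults assume 16 kHz audio (25 ms frames, 10 ms hop); no mel filterbank."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (T, frame_len // 2 + 1)
    log_power = np.log(power + eps)                     # eps avoids log(0)
    dct = dct_ii_matrix(n_ceps, log_power.shape[1])
    return log_power @ dct.T                            # (T, n_ceps)

# Usage on one second of synthetic noise at the assumed 16 kHz rate.
feats = cepstral_features(np.random.default_rng(0).normal(size=16000))
print(feats.shape)
```

The DCT step is what turns the strongly correlated log-spectral bins into a compact, roughly decorrelated vector, which suits the diagonal-covariance GMMs used later.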
HMMs and GMMs
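A minimal sketch of how a GMM scores a sequence of feature frames, assuming diagonal covariances (a common choice for cepstral features, not stated on the slide). `gmm_ubm_score` shows the usual GMM-UBM verification score, the average per-frame log-likelihood ratio between a speaker model and the UBM; the function names are hypothetical.

```python
import numpy as np

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log p(x_t) under a diagonal-covariance GMM.
    X: (T, F); weights: (C,); means, variances: (C, F)."""
    F = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                     # (T, C, F)
    log_gauss = -0.5 * (F * np.log(2 * np.pi)
                        + np.sum(np.log(variances), axis=1)[None, :]
                        + np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_weighted = log_gauss + np.log(weights)[None, :]          # (T, C)
    # log-sum-exp over components for numerical stability
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1, keepdims=True))).ravel()

def gmm_ubm_score(X, speaker, ubm):
    """Average log-likelihood ratio; speaker and ubm are (weights, means, variances)."""
    return float(np.mean(gmm_frame_loglik(X, *speaker) - gmm_frame_loglik(X, *ubm)))
```

A positive score means the test frames are better explained by the claimed speaker's model than by the background model; the accept/reject threshold would be tuned on development data.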
GMM-UBM (Universal Background Model)
MAP Adaptation
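One common form of MAP adaptation in GMM-UBM systems updates only the UBM means, pulling each one toward the speaker's data in proportion to how many frames that component "saw". The sketch below assumes diagonal covariances and a relevance factor of 16 (a typical value, not taken from the slides); `map_adapt_means` is a hypothetical name.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only relevance-MAP adaptation of a diagonal-covariance UBM.
    X: (T, F) speaker frames; weights: (C,); means, variances: (C, F)."""
    F = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    log_weighted = np.log(weights)[None, :] - 0.5 * (
        F * np.log(2 * np.pi)
        + np.sum(np.log(variances), axis=1)[None, :]
        + np.sum(diff ** 2 / variances[None, :, :], axis=2))
    # frame posteriors gamma_t(c), normalized stably
    log_weighted -= log_weighted.max(axis=1, keepdims=True)
    gamma = np.exp(log_weighted)
    gamma /= gamma.sum(axis=1, keepdims=True)            # (T, C)
    n = gamma.sum(axis=0)                                 # zeroth-order stats (C,)
    f = gamma.T @ X                                       # first-order stats (C, F)
    ex = f / np.maximum(n, 1e-10)[:, None]                # E_c[x]
    alpha = (n / (n + relevance))[:, None]                # data-dependent weight
    return alpha * ex + (1.0 - alpha) * means             # adapted means (C, F)
```

Components with little speaker data (small `n`) stay close to the UBM, which is what makes MAP adaptation robust with short enrolment recordings.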
Supervectors
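A supervector is simply the concatenation of the (adapted) GMM means into one long vector of size C·F. The sketch below also shows one commonly used optional scaling, sqrt(w_c)·σ_c⁻¹ per component, which makes a plain dot product between supervectors behave like a KL-divergence-based kernel; treat that scaling and the function name as assumptions rather than the lecture's exact recipe.

```python
import numpy as np

def gmm_supervector(means, weights=None, variances=None):
    """Stack C component means (C, F) into a single C*F vector (component-major).
    If weights and variances are given, scale each mean by sqrt(w_c) / sigma_c."""
    means = np.asarray(means, dtype=float)
    if weights is not None and variances is not None:
        means = np.sqrt(np.asarray(weights))[:, None] * means / np.sqrt(variances)
    return means.reshape(-1)

# Usage: 3 components with 2-dimensional features give a 6-dimensional supervector.
print(gmm_supervector(np.arange(6).reshape(3, 2)))
```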
i-vectors
Joint Factor Analysis (JFA) model
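For reference, this is how the two models are usually written in the literature (notation follows common convention and is not taken from the slides). JFA splits the utterance supervector **M** around the UBM supervector **m** into speaker (V, D) and channel (U) subspaces, while the i-vector approach uses a single total-variability matrix T, the same T that is trained on the i-Vector Training slide:

```latex
\begin{aligned}
\text{JFA:}\quad & \mathbf{M} = \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{U}\mathbf{x} + \mathbf{D}\mathbf{z} \\
\text{i-vector:}\quad & \mathbf{M} = \mathbf{m} + \mathbf{T}\mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
\end{aligned}
```

The i-vector of an utterance is the posterior mean of the low-dimensional latent vector **w**.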
i-Vector dimension
i-Vector Training
T is trained iteratively using the GMM (UBM) posteriors.
Given T and the UBM, an i-vector is extracted for each speaker utterance.
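The extraction step can be sketched as follows, assuming a diagonal-covariance UBM and a T matrix of shape (C·F, R) whose rows are ordered component by component. It collects Baum-Welch statistics from the UBM posteriors and then computes the standard posterior mean w = (I + Σ_c N_c T_cᵀ Σ_c⁻¹ T_c)⁻¹ Σ_c T_cᵀ Σ_c⁻¹ F̃_c. The iterative EM training of T itself is not shown; function names are hypothetical.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order N (C,) and UBM-mean-centered first-order F (C, F) statistics."""
    F = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    log_w = np.log(weights)[None, :] - 0.5 * (
        F * np.log(2 * np.pi)
        + np.sum(np.log(variances), axis=1)[None, :]
        + np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_w -= log_w.max(axis=1, keepdims=True)
    gamma = np.exp(log_w)
    gamma /= gamma.sum(axis=1, keepdims=True)             # UBM posteriors (T, C)
    N = gamma.sum(axis=0)                                  # (C,)
    F_centered = gamma.T @ X - N[:, None] * means          # (C, F)
    return N, F_centered

def extract_ivector(X, weights, means, variances, T):
    """Posterior mean of w in M = m + T w; T has shape (C*F, R)."""
    C, F = means.shape
    R = T.shape[1]
    N, F_c = baum_welch_stats(X, weights, means, variances)
    T_c = T.reshape(C, F, R)                               # per-component blocks
    inv_var = 1.0 / variances                              # diagonal Sigma_c^-1
    precision = np.eye(R)
    linear = np.zeros(R)
    for c in range(C):
        scaled = T_c[c] * inv_var[c][:, None]              # Sigma_c^-1 T_c, (F, R)
        precision += N[c] * (T_c[c].T @ scaled)            # N_c T_c' Sigma_c^-1 T_c
        linear += scaled.T @ F_c[c]                        # T_c' Sigma_c^-1 F_c
    return np.linalg.solve(precision, linear)              # i-vector (R,)
```

Using `np.linalg.solve` instead of explicitly inverting the precision matrix is the usual numerically safer choice.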
i-vector Scoring
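One simple and widely used way to score a trial is the cosine similarity between the enrolment and test i-vectors. The sketch below shows that choice only; the threshold is a hypothetical value that would be tuned on development data.

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine similarity between two i-vectors (higher = more likely same speaker)."""
    w_enroll = np.asarray(w_enroll, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    return float(w_enroll @ w_test / (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))

def verify(w_enroll, w_test, threshold):
    """Accept the claimed identity when the cosine score reaches the threshold."""
    return cosine_score(w_enroll, w_test) >= threshold
```

Because the score ignores vector length, it is insensitive to the overall magnitude of an i-vector and depends only on its direction.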
SoA Speaker Recognition
DL in Speaker Recognition
DL Features
BN Features
After M. McLaren et al., “Advances in deep neural network approaches to speaker recognition,” ICASSP 2015.
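Bottleneck (BN) features are the activations of a deliberately narrow hidden layer of a DNN trained for some auxiliary classification task; at extraction time the layers after the bottleneck are discarded. The sketch below shows only the forward pass, assuming ReLU hidden layers and a linear bottleneck output (both assumptions); the random weights stand in for a trained network and all names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def bottleneck_features(X, layers, bottleneck_index):
    """Forward frames X (T, F) through a DNN given as a list of (W, b) pairs and
    return the pre-activation output of layer `bottleneck_index`."""
    h = X
    for i, (W, b) in enumerate(layers):
        z = h @ W + b
        if i == bottleneck_index:
            return z                      # linear bottleneck output (assumption)
        h = relu(z)
    raise ValueError("bottleneck_index out of range")

# Usage with random stand-in weights: 20-dim input, 64 hidden, 8-dim bottleneck,
# 100 output classes (the output layer is only needed during training).
rng = np.random.default_rng(0)
shapes = [(20, 64), (64, 8), (8, 100)]
layers = [(rng.normal(0.0, 0.1, s), np.zeros(s[1])) for s in shapes]
bn = bottleneck_features(rng.normal(size=(50, 20)), layers, bottleneck_index=1)
print(bn.shape)
```

The resulting frame-level vectors can replace, or be appended to, spectral features as input to the GMM-UBM or i-vector pipeline.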
Autoencoder
Denoising Autoencoder
Denoising autoencoder for cepstral-domain dereverberation:
● Transform noisy features of reverberant speech into clean speech features.
● Pre-training with Deep Belief Networks (DBN)
Zhang et al., “Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification,” EURASIP Journal on Audio, Speech, and Music Processing, 2015:12 (2015).
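A minimal sketch of the denoising idea, assuming parallel pairs of reverberant (noisy) and clean feature frames are available. It uses one tanh hidden layer, a linear output and plain full-batch gradient descent on mean squared error; unlike the cited system it skips DBN pre-training and starts from small random weights. `train_dae` and its hyperparameters are hypothetical.

```python
import numpy as np

def train_dae(noisy, clean, hidden=32, lr=0.01, epochs=200, seed=0):
    """Train noisy -> clean mapping; returns a function that denoises frames."""
    rng = np.random.default_rng(seed)
    n, d = noisy.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        h = np.tanh(noisy @ W1 + b1)             # encoder
        out = h @ W2 + b2                         # linear decoder
        # gradient of 0.5 * mean over frames of ||out - clean||^2
        g_out = (out - clean) / n
        gW2 = h.T @ g_out; gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)     # backprop through tanh
        gW1 = noisy.T @ g_h; gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda x: np.tanh(x @ W1 + b1) @ W2 + b2

# Usage with synthetic "clean" frames and additive noise as a stand-in for reverberation.
rng = np.random.default_rng(1)
clean = rng.normal(size=(200, 13))
noisy = clean + 0.3 * rng.normal(size=(200, 13))
denoise = train_dae(noisy, clean)
print(np.mean((denoise(noisy) - clean) ** 2))
```

At test time only the trained mapping is applied: reverberant features go in, enhanced features come out and are passed to the speaker model.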