Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)

Date post: 07-Feb-2017
Upload: xavier-giro
[course site] Day 3 Lecture 3 Speaker ID I Javier Hernando
Transcript

Acknowledgments

Miquel India, Omid Ghahabi, Pooyan Safari
Ph.D. candidates

Speech Recognition

Emotion - Happy
Gender - Woman
Age - Teenager
...

Speaker ID as Biometrics

Security

Applications

Modalities

Tasks

Features

Humans use different features to recognise the speaker:

● Pronunciation, diction, …
● Prosody, rhythm, speed, volume, …
● Acoustic aspects of the voice

Desirable aspects for those features:

● Practical
  ○ Appear frequently and naturally during speech
  ○ Easily measurable by the system
● Robust
  ○ To any change over time or in the speaker’s health
  ○ To any change in transmission characteristics or background noise
● Secure
  ○ Hard to falsify

No feature has all those properties.
Spectrum-derived features are currently the most widely used because of their effectiveness.

HMMs and GMMs

GMM-UBM Universal Background Model

MAP Adaptation

Supervectors

i-vectors

Joint Factor Analysis (JFA) model

i-Vector dimension

i-Vector Training

The total variability matrix T is trained iteratively according to the GMM posteriors

Given T and the UBM, i-vectors are extracted for each speaker utterance
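
The extraction step above can be sketched in a few lines. This is only an illustration, assuming the standard posterior-mean formula w = (I + T' S^-1 N T)^-1 T' S^-1 F with diagonal UBM covariances S, zero-order statistics N and centred first-order statistics F. The slides do not give the formula, and the names `solve` and `extract_ivector`, as well as the toy numbers, are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def extract_ivector(T, inv_var, N, F):
    """Posterior mean of the i-vector w for one utterance.

    T[c][d]      : row of the total variability matrix (length R) for
                   UBM component c and feature dimension d
    inv_var[c][d]: inverse of the diagonal UBM variances
    N[c]         : zero-order statistics (sum of GMM posteriors)
    F[c][d]      : first-order statistics, centred on the UBM means
    """
    R = len(T[0][0])
    # Precision matrix starts as the identity (standard normal prior on w).
    L = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]
    rhs = [0.0] * R
    for c in range(len(T)):
        for d in range(len(T[c])):
            row = T[c][d]
            for i in range(R):
                rhs[i] += row[i] * inv_var[c][d] * F[c][d]
                for j in range(R):
                    L[i][j] += N[c] * row[i] * inv_var[c][d] * row[j]
    return solve(L, rhs)


# Toy example: one UBM component, 2-dim features, 2-dim i-vector.
T = [[[1.0, 0.0], [0.0, 1.0]]]
inv_var = [[1.0, 1.0]]
N = [3.0]
F = [[3.0, 6.0]]
w = extract_ivector(T, inv_var, N, F)
print(w)  # [0.75, 1.5]
```

In a real system T has hundreds of columns and the statistics come from aligning each utterance's frames against the UBM; the toy sizes here only keep the arithmetic easy to check by hand.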

i-vector Scoring
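
The transcript keeps only the slide title here. As an assumption, one widely used way to score a pair of i-vectors is cosine similarity followed by a threshold; the sketch below illustrates that choice and is not necessarily the method shown on the slide. The function names and the threshold value are hypothetical.

```python
import math


def cosine_score(w_target, w_test):
    """Cosine similarity between an enrolment i-vector and a test i-vector."""
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm = math.sqrt(sum(a * a for a in w_target)) * math.sqrt(sum(b * b for b in w_test))
    return dot / norm


def accept(w_target, w_test, threshold=0.5):
    """Accept the claimed identity when the score reaches the threshold.

    The threshold is an illustrative value; in practice it is tuned on
    development data.
    """
    return cosine_score(w_target, w_test) >= threshold


same_direction = cosine_score([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])
```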

SoA Speaker Recognition

DL in Speaker Recognition

DL Features

BN Features

After M. McLaren et al., “Advances in deep neural network approaches to speaker recognition,” ICASSP 2015.

Autoencoder

Denoising Autoencoder

Denoising autoencoder for cepstral domain dereverberation.

● Transform noisy features of reverberant speech to clean speech features.

● Pre-training with Deep Belief Networks (DBN)

Zhang et al., Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification, EURASIP Journal on Audio, Speech, and Music Processing (2015) 2015:12
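
The idea of mapping degraded features to clean ones can be sketched with a toy denoising autoencoder. This is a minimal illustration and not the model from the cited paper: it assumes a single tanh hidden layer, plain SGD, random toy vectors in place of cepstral features, and additive Gaussian noise in place of reverberation. DBN pre-training is omitted. The name `train_dae` and all hyperparameters are hypothetical.

```python
import math
import random


def train_dae(clean, noisy, hidden=4, epochs=200, lr=0.05, seed=0):
    """Train a one-hidden-layer denoising autoencoder with plain SGD.

    clean, noisy: lists of equal-length feature vectors (target, input).
    Returns (predict, losses); losses[e] is the mean squared error
    accumulated over the training set during epoch e.
    """
    rng = random.Random(seed)
    dim = len(clean[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]
    b2 = [0.0] * dim

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        y = [sum(w * v for w, v in zip(W2[i], h)) + b2[i] for i in range(dim)]
        return h, y

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in zip(noisy, clean):
            h, y = forward(x)
            # Error against the CLEAN target; constant factors of the
            # squared-error gradient are absorbed into the learning rate.
            err = [yi - ti for yi, ti in zip(y, target)]
            total += sum(e * e for e in err) / dim
            # Gradient at the hidden layer, computed before W2 is updated.
            grad_h = [sum(err[i] * W2[i][j] for i in range(dim)) * (1 - h[j] ** 2)
                      for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    W2[i][j] -= lr * err[i] * h[j]
                b2[i] -= lr * err[i]
            for j in range(hidden):
                for k in range(dim):
                    W1[j][k] -= lr * grad_h[j] * x[k]
                b1[j] -= lr * grad_h[j]
        losses.append(total / len(clean))
    return (lambda x: forward(x)[1]), losses


# Toy data: random "clean" vectors and noisy copies of them.
data_rng = random.Random(1)
clean = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
noisy = [[v + data_rng.gauss(0, 0.1) for v in vec] for vec in clean]
predict, losses = train_dae(clean, noisy)
print(f"MSE first epoch: {losses[0]:.4f}, last epoch: {losses[-1]:.4f}")
```

The key point is that the network input is the degraded vector while the training target is the clean one, so the learned mapping acts as a feature-domain enhancement step ahead of the speaker model.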
