Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification

Bing Jiang¹, Yan Song¹, Si Wei², Meng-Ge Wang¹, Ian McLoughlin¹, Li-Rong Dai¹

¹ National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China
² iFlytek Research, Anhui USTC iFlytek Co. Ltd.


Outline

• Background

• Our Method

• Experiments

• Conclusions

Friday, September 19, 2014 2


Background

• Language identification (LID) is a typical machine learning problem.


[Figure: Feature Domain (SDC) → Model Domain (GMM)]


Background

• The original acoustic features contain many language-independent nuisance variations:
  – Speaker variations
  – Channel variations
  – Special content variations
  – Noise variations
• Feature improvement
  – MFCC → SDC
    • Temporal extension
  – Compensation in the feature domain
    • Factor analysis


→ Compensating for all of these variations is very difficult.
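The SDC temporal extension mentioned above stacks delta cepstra computed at shifted frame positions. A minimal sketch follows, assuming the common N-d-P-k parameterisation (N cepstral coefficients, delta spread d, block shift P, k blocks) with the 7-1-3-7 configuration quoted in Exper2; clamping out-of-range frame indices at the utterance edges is an assumption of this sketch, not something the slides specify.

```python
import numpy as np


def sdc(cepstra: np.ndarray, N: int = 7, d: int = 1, P: int = 3, k: int = 7) -> np.ndarray:
    """Shifted delta cepstra for a (T, >=N) array of cepstral frames.

    Block i (i = 0..k-1) at frame t is c(t + i*P + d) - c(t + i*P - d).
    Returns a (T, N * k) array. Frame indices outside [0, T-1] are clamped
    to the nearest valid frame (assumed edge policy).
    """
    c = cepstra[:, :N]          # keep the first N coefficients
    T = c.shape[0]
    t = np.arange(T)
    blocks = []
    for i in range(k):
        plus = np.clip(t + i * P + d, 0, T - 1)
        minus = np.clip(t + i * P - d, 0, T - 1)
        blocks.append(c[plus] - c[minus])
    return np.concatenate(blocks, axis=1)


if __name__ == "__main__":
    mfcc = np.random.randn(100, 13)   # hypothetical 13-dim MFCC stream
    print(sdc(mfcc).shape)            # (100, 49)
```

With 7-1-3-7, each SDC vector has 7 × 7 = 49 dimensions.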


Background

• Model
  – Generative model: GMM-UBM
  – Discriminative models: SVM, MMI
  – The i-vector is the state of the art
    • Factor analysis
    • With compensation methods:
      – LDA
      – WCCN
      – PLDA
• More suitable features are needed.
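To illustrate the generative GMM back-end listed above, here is a minimal sketch that assumes one already-trained diagonal-covariance GMM per language (UBM training and MAP adaptation are not shown). An utterance is assigned to the language with the highest average per-frame log-likelihood. The function names and toy parameters are hypothetical.

```python
import numpy as np


def diag_gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of X (T, D) under a diagonal-covariance GMM
    with weights (K,), means (K, D) and variances (K, D)."""
    diff = X[:, None, :] - means[None, :, :]                        # (T, K, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)     # (K,)
    log_exp = -0.5 * (diff ** 2 / variances[None]).sum(axis=2)      # (T, K)
    log_comp = np.log(weights)[None] + log_norm[None] + log_exp     # (T, K)
    # Log-sum-exp over mixture components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]


def identify_language(X, models):
    """Return the best language and all scores (average frame log-likelihood)."""
    scores = {lang: diag_gmm_loglik(X, *params).mean() for lang, params in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    toy_models = {   # hypothetical single-component models
        "lang_a": (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))),
        "lang_b": (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2))),
    }
    best, _ = identify_language(np.full((10, 2), 5.0), toy_models)
    print(best)  # lang_b
```

Because the covariances are diagonal, each component's likelihood factorises over dimensions. This is why the later PCA experiment decorrelates the DBF dimensions.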


Background

• Recently, DNNs have attracted a lot of attention:
  – Non-linear modeling capability
    • Deep layered structure
    • Non-linear activation functions
  – Feature learning capability
    • Extracting information about the target layer by layer
• Can neural networks be used to extract discriminative features for the LID task?
  – PLLR
  – MLP
  – Deep bottleneck features


Our Method

• What are deep bottleneck features?


[Figure: Input Feature (current frame) → DNN → Bottleneck layer → Target Class; the bottleneck-layer output for the current frame is the Output Feature Y = [y1, y2, ..., yf]]


Our Method

• What are deep bottleneck features?


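$$
y_m(\mathbf{x}; \mathbf{W}^{1}, \mathbf{W}^{2}, \ldots, \mathbf{W}^{l}) = \sigma\left( \sum_{j=1}^{M_{l-1}} W^{l}_{mj}\, \sigma\left( \cdots \sigma\left( \sum_{d=1}^{M_{0}} W^{1}_{id}\, x_{d} + b^{1}_{i} \right) \cdots \right) + b^{l}_{m} \right)
$$

where x = [x_1, ..., x_{M_0}] is the DNN input vector, M_k is the number of units in layer k (M_0 is the input dimension), W^k and b^k are the weight matrix and bias vector of layer k, σ(·) is the non-linear activation function, and y_m (m = 1, ..., M_l) is the m-th output of the bottleneck layer l.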
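To make the nested expression concrete, here is a minimal NumPy sketch that evaluates it layer by layer and stops at the bottleneck layer l. Sigmoid activations and randomly initialised weights are assumptions chosen purely for illustration. The layer sizes follow the 43x21-2048-2048-43-... topology used in the experiments, with the bottleneck as layer 3.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bottleneck_features(x, weights, biases):
    """Evaluate y = sigma(W^l sigma(... sigma(W^1 x + b^1) ...) + b^l).

    x:       (M_0,) input vector, or (T, M_0) batch of frames
    weights: [W^1, ..., W^l], W^k of shape (M_k, M_{k-1})
    biases:  [b^1, ..., b^l], b^k of shape (M_k,)
    Only layers 1..l (up to the bottleneck) are passed in; in this sketch the
    layers above the bottleneck are used for training only.
    """
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W.T + b)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [43 * 21, 2048, 2048, 43]   # M_0 .. M_3, bottleneck = layer 3
    Ws = [rng.normal(0, 0.01, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(m) for m in sizes[1:]]
    frames = rng.normal(size=(5, sizes[0]))
    print(bottleneck_features(frames, Ws, bs).shape)  # (5, 43)
```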


Our Method

• Why do we use deep bottleneck features?
  – The target classes
    • Phonemes or phoneme states are suitable targets for the language identification task
  – Statistical method
    • A low-dimensional, compact representation of the original inputs
  – Non-linear transformation
  – Discriminative features


Our Method

• Why do we use deep bottleneck features?


[Figure: comparison of SDC, PK and DBF features]


Our Method

• How to train the DBF extractor?


Experiments

• DNN training database
  – 500 hours of Mandarin telephone speech
• Evaluation database
  – NIST LRE 2009


Experiments

• Exper1: Comparison with SDC
  – DBF: 43x11-2048-2048-43-2048-2048-6004
  – GMM-UBM with 2048 mixtures
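The topology strings in these experiments appear to read as the input (feature dimension x number of frames), followed by the sizes of successive layers, ending with the output layer of 6004 target classes. Reading "43x11" as 43-dimensional frames spliced over 11 frames is an assumption consistent with Exper2. The hypothetical helper below parses such a string, takes the narrowest hidden layer as the bottleneck (counted from 1 among hidden layers, matching the "Layer3/4/5" naming in Exper4), and counts weights plus biases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Topology:
    layer_sizes: List[int]   # [input, hidden..., output]
    bottleneck_layer: int    # 1-based index of the bottleneck among hidden layers

    @property
    def num_parameters(self) -> int:
        # Each fully connected layer: fan_in * fan_out weights + fan_out biases
        return sum(n_in * n_out + n_out
                   for n_in, n_out in zip(self.layer_sizes[:-1], self.layer_sizes[1:]))


def parse_topology(spec: str) -> Topology:
    """Parse strings such as '43x11-2048-2048-43-2048-2048-6004'."""
    first, *rest = spec.split("-")
    feat_dim, n_frames = (int(v) for v in first.split("x"))
    sizes = [feat_dim * n_frames] + [int(v) for v in rest]
    hidden = sizes[1:-1]
    # Assumption: the bottleneck is the narrowest hidden layer
    bottleneck = hidden.index(min(hidden)) + 1
    return Topology(sizes, bottleneck)


if __name__ == "__main__":
    topo = parse_topology("43x11-2048-2048-43-2048-2048-6004")
    print(topo.layer_sizes)       # [473, 2048, 2048, 43, 2048, 2048, 6004]
    print(topo.bottleneck_layer)  # 3
    print(topo.num_parameters)    # 21843871
```

Most of the parameters sit in the 2048 x 6004 output layer, which is discarded once the bottleneck features are extracted.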


Experiments

• Exper2: Context window size of the DNN input
  – Motivation
    • LID performance is sensitive to the context window size.
    • The SDC configuration (7-1-3-7) covers 21 frames.
    • For LID, the input window should be longer than for speech recognition.
      – Speech recognition: 5-1-5 (11 frames)
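The 21-frame figure can be checked directly, assuming the usual N-d-P-k reading of the SDC configuration (N cepstral coefficients, delta spread d, shift P between blocks, k blocks). The slide gives only the numbers, so this reading is an assumption. For comparison, the count for a 5-1-5 splicing window is also shown:

$$
(k-1)P + 2d + 1 = (7-1)\cdot 3 + 2\cdot 1 + 1 = 21 \text{ frames}, \qquad 5 + 1 + 5 = 11 \text{ frames}
$$

The SDC vector covers roughly twice the temporal context of the speech-recognition window, which motivates trying wider DNN inputs.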


Experiments

• Exper2: Context window size of the DNN input
  – DBF: 43xn-2048-2048-43-2048-2048-6004 (n = number of input frames)
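Here is a minimal sketch of how a 43xn DNN input can be assembled: each frame is concatenated with its (n-1)/2 left and right neighbours. Under this scheme n = 11 corresponds to the 5-1-5 window and n = 21 to the 43x21 input used in later experiments. The restriction to symmetric, odd n and the padding of the edges by repeating the first and last frames are assumptions of this sketch.

```python
import numpy as np


def splice_frames(feats: np.ndarray, n: int) -> np.ndarray:
    """Stack each frame of feats (T, D) with its neighbours into (T, D * n).

    n must be odd: (n - 1) // 2 frames of left context, the current frame,
    and (n - 1) // 2 frames of right context. Edges repeat the boundary frame.
    """
    if n % 2 != 1:
        raise ValueError("n must be odd")
    half = (n - 1) // 2
    T = feats.shape[0]
    padded = np.pad(feats, ((half, half), (0, 0)), mode="edge")
    # Column block j holds frame t - half + j for every frame t
    return np.concatenate([padded[j:j + T] for j in range(n)], axis=1)


if __name__ == "__main__":
    feats = np.random.randn(200, 43)        # hypothetical 43-dim frames
    print(splice_frames(feats, 21).shape)   # (200, 903)
```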


Experiments

• Exper3: Dimension of DBF
  – Motivation
    • DNN training forces the activations in the bottleneck layer to form a low-dimensional, compact representation of the original inputs.
    • Find the relationship between the feature dimension and LID performance.


Experiments

• Exper3: Dimension of DBF
  – DBF: 43x21-2048-2048-d-2048-2048-6004 (d = bottleneck dimension)


Experiments

• Exper4: DBFs generated in different layers
  – Motivation
    • Is a feature that is more discriminative for the target classes also more suitable for LID?
    • Does performance improve as the bottleneck layer moves closer to the output layer?


Experiments

• Exper4: DBFs generated in different layers
  – Layer 3: 43x21-2048-2048-43-2048-2048-6004
  – Layer 4: 43x21-2048-2048-2048-43-2048-6004
  – Layer 5: 43x21-2048-2048-2048-2048-43-6004


Experiments

• Exper5: DBF with PCA
  – Motivation
    • Since diagonal covariance matrices are used to approximate the GMM, the dimensions of the input feature need to be decorrelated.
    • For SDC, the discrete cosine transform (DCT) is used.
    • For DBF, we try classical PCA.
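Here is a minimal NumPy sketch of the PCA step. It estimates the mean and covariance of the DBF vectors and projects them onto the covariance eigenvectors. After this rotation the dimensions are uncorrelated on the fitting data, which suits diagonal-covariance GMMs. Keeping all dimensions and fitting on the same features that are transformed are assumptions; the slides do not say whether any dimensions were dropped.

```python
import numpy as np


def fit_pca(X: np.ndarray):
    """Return (mean, eigenvectors) of X (T, D), columns sorted by decreasing variance."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def apply_pca(X, mean, components):
    """Rotate X into the PCA basis; the output dimensions are decorrelated."""
    return (X - mean) @ components


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dbf = rng.normal(size=(5000, 43)) @ rng.normal(size=(43, 43))  # hypothetical correlated DBFs
    mean, comps = fit_pca(dbf)
    C = np.cov(apply_pca(dbf, mean, comps), rowvar=False)
    print(np.abs(C - np.diag(np.diag(C))).max())  # should be close to 0
```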


Experiments

• Exper5: DBF with PCA
  – DBF: 43x21-2048-2048-43-2048-2048-6004


Conclusions

• In this paper, we investigated the use of bottleneck features for the LID task.
  – DBF can significantly improve LID performance, especially for short-duration utterances.
  – DBF is a new milestone for LID research.
• We believe that using DNNs to extract more suitable features for LID will bring great progress to the LID community.


Related Papers

• For more information about DBF for LID, see the following papers:
  – Song Yan, Jiang Bing, Bao Ye-Bo, Wei Si, Dai Li-Rong, “I-vector representation based on bottleneck features for language identification,” Electronics Letters, vol. 49, no. 24, pp. 1569-1570, 2013.
  – Jiang Bing, Song Yan, Wei Si, Liu Jun-Hua, McLoughlin Ian, Dai Li-Rong, “Deep bottleneck features for spoken language identification,” PLoS ONE, vol. 9, no. 7, e100795, 2014, doi:10.1371/journal.pone.0100795.
  – Jiang Bing, Song Yan, Wei Si, McLoughlin Ian, Dai Li-Rong, “Task-aware deep bottleneck features for spoken language identification,” in Proc. INTERSPEECH 2014, Singapore, 2014.
