2.5.4.1 Basics of Neural Networks. 2.5.4.2 Neural Network Topologies.

Post on 03-Jan-2016


2.5.4.1 Basics of Neural Networks

[Figure: a single neuron with inputs X_0, X_1, X_2, ..., X_{N-1} and output Y]

$$ y = f\left( \sum_{i=0}^{N-1} W_i x_i \right) $$
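The single-neuron computation on this slide (a nonlinearity f applied to a weighted sum of the N inputs) can be sketched as follows. The function names, the hard-threshold activation, and the example weights are illustrative assumptions, not taken from the slides.

```python
import math

def neuron_output(weights, inputs, activation=math.tanh):
    """Return f(sum_i W_i * x_i) for one neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

def hard_threshold(value):
    """A simple step nonlinearity (hypothetical choice of f)."""
    return 1.0 if value >= 0.0 else 0.0

# Example: the weighted sum is 0.5 - 2.0 + 1.0 = -0.5, which falls below the threshold.
y = neuron_output([0.5, -1.0, 2.0], [1.0, 2.0, 0.5], activation=hard_threshold)
```

Any other activation (sigmoid, tanh) can be passed in place of the step function.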

2.5.4.2 Neural Network Topologies

TDNN (time-delay neural network)

2.5.4.6 Neural Network Structures for Speech Recognition

3.1.1 Spectral Analysis Models

3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR

3.2.1 Types of Filter Bank Used for Speech Recognition

Uniform filter bank:

$$ f_i = \frac{F_s}{N}\, i, \quad 1 \le i \le Q $$

$$ Q \le N/2 $$

$$ b_i \ge \frac{F_s}{N} $$
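As a rough illustration of the uniform filter bank, the sketch below computes center frequencies f_i = (F_s / N) i for i = 1..Q and the minimum bandwidth F_s / N. It assumes F_s is the sampling rate and N the number of uniformly spaced filters spanning the full frequency range; the function name and the numbers in the example are made up for illustration.

```python
def uniform_filter_bank(sample_rate_hz, n_points, n_filters):
    """Center frequencies and minimum bandwidth of a uniform filter bank."""
    if n_filters > n_points / 2:
        raise ValueError("Q must not exceed N/2")
    spacing_hz = sample_rate_hz / n_points
    centers_hz = [spacing_hz * i for i in range(1, n_filters + 1)]
    # Each bandwidth must be at least the spacing so that no gaps are left.
    return centers_hz, spacing_hz

# Hypothetical example: 8 kHz sampling, N = 32, Q = 15 filters.
centers_hz, min_bandwidth_hz = uniform_filter_bank(8000.0, 32, 15)
```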

Nonuniform Filter Banks

$$ b_1 = c $$

$$ b_i = \alpha\, b_{i-1}, \quad 2 \le i \le Q $$

$$ f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2} $$

Filter 1: f_1 = 300 Hz, b_1 = 200 Hz
Filter 2: f_2 = 600 Hz, b_2 = 400 Hz
Filter 3: f_3 = 1200 Hz, b_3 = 800 Hz
Filter 4: f_4 = 2400 Hz, b_4 = 1600 Hz
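The four-filter example is consistent with the recursion b_1 = c, b_i = alpha * b_(i-1) and the center-frequency rule f_i = f_1 + sum of b_j for j below i, plus (b_i - b_1) / 2. The sketch below assumes c = 200 Hz, alpha = 2 (octave spacing) and f_1 = 300 Hz; the function name is hypothetical.

```python
def nonuniform_filter_bank(f1_hz, c_hz, alpha, n_filters):
    """Center frequencies and bandwidths with geometrically growing bandwidths."""
    bandwidths_hz = [c_hz * alpha ** i for i in range(n_filters)]
    centers_hz = []
    for i, b_i in enumerate(bandwidths_hz):
        lower_bands = sum(bandwidths_hz[:i])            # b_1 + ... + b_(i-1)
        centers_hz.append(f1_hz + lower_bands + (b_i - bandwidths_hz[0]) / 2.0)
    return centers_hz, bandwidths_hz

centers_hz, bandwidths_hz = nonuniform_filter_bank(300.0, 200.0, 2.0, 4)
# centers_hz    -> [300.0, 600.0, 1200.0, 2400.0]
# bandwidths_hz -> [200.0, 400.0, 800.0, 1600.0]
```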


3.2.2 Implementations of Filter Banks

Instead of direct convolution, which is computationally expensive, we assume each bandpass filter impulse response to be represented by:

$$ h_i(n) = w(n)\, e^{j \omega_i n} $$

where w(n) is a fixed lowpass filter.
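A minimal sketch of this construction: a fixed lowpass prototype w(n) is modulated by a complex exponential to give a bandpass response h_i(n) = w(n) exp(j omega_i n) centered on omega_i. The Hamming prototype, its length, the 8 kHz sampling rate and the 1 kHz center frequency are assumptions chosen for illustration.

```python
import numpy as np

def modulated_bandpass(lowpass, center_hz, sample_rate_hz):
    """Shift a lowpass impulse response w(n) up to center_hz."""
    n = np.arange(len(lowpass))
    omega_i = 2.0 * np.pi * center_hz / sample_rate_hz
    return lowpass * np.exp(1j * omega_i * n)

sample_rate_hz = 8000.0
w = np.hamming(101)                        # assumed lowpass prototype
h = modulated_bandpass(w, 1000.0, sample_rate_hz)

# The magnitude response of h peaks at the chosen center frequency.
n_fft = 1024
response = np.abs(np.fft.fft(h, n_fft))
peak_hz = np.argmax(response) * sample_rate_hz / n_fft
```

All filters in the bank share the same w(n), so only the modulation frequency changes from one channel to the next.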

3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform

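Linear Filter Interpretation of the STFT

[Figure: block diagram. The signal s(n) is multiplied by a complex exponential at frequency \omega_i to give \tilde{s}(n), which is passed through the lowpass filter w(n) to produce S_n(e^{j\omega_i}).]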
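The linear-filter view can be checked numerically. The sketch below assumes the common definition S_n(e^{j omega}) = sum over m of s(m) w(n - m) exp(-j omega m), which is not spelled out on the slide, and compares demodulation followed by lowpass filtering against the direct sum. Signal, window and frequency are arbitrary test values.

```python
import numpy as np

def stft_by_filtering(signal, window, omega):
    """Demodulate s(n) by exp(-j*omega*n), then lowpass filter with w(n)."""
    n = np.arange(len(signal))
    demodulated = signal * np.exp(-1j * omega * n)
    return np.convolve(demodulated, window)[: len(signal)]

def stft_direct(signal, window, omega, n):
    """Evaluate the defining sum at a single time index n."""
    total = 0j
    for m in range(len(signal)):
        k = n - m
        if 0 <= k < len(window):
            total += signal[m] * window[k] * np.exp(-1j * omega * m)
    return total

rng = np.random.default_rng(0)
s = rng.standard_normal(200)
w = np.hamming(32)
omega = 2.0 * np.pi * 0.1
filtered = stft_by_filtering(s, w, omega)
direct = stft_direct(s, w, omega, 100)
```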

3.2.2.4 FFT Implementation of a Uniform Filter Bank

Direct implementation of an arbitrary filter bank

[Figure: the input s(n) is fed in parallel to Q bandpass filters h_1(n), h_2(n), ..., h_Q(n), producing the outputs X_1(n), X_2(n), ..., X_Q(n).]

3.2.2.5 Nonuniform FIR Filter Bank Implementations

3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks

3.2.4 Practical Examples of Speech-Recognition Filter Banks

3.2.5 Generalizations of Filter-Bank Analyzer

Mel-Cepstrum Method

[Figure: block diagram of mel-cepstrum feature extraction: time-domain signal -> framing -> |FFT|^2 -> mel-scaling -> logarithm -> IDCT -> low-order coefficients (cepstra) -> differentiator -> delta and delta-delta cepstra]
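The mel-cepstrum chain for a single frame (power spectrum, mel-scaled filter bank, logarithm, cosine transform, low-order coefficients) might look roughly like the sketch below. Several details are assumptions not given on the slide: the mel formula 2595 log10(1 + f/700), triangular filters, a Hamming window, 20 filters, 12 coefficients, and a DCT-II basis for the block the slide labels IDCT.

```python
import numpy as np

def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate_hz):
    """Triangular filters equally spaced on the mel scale."""
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate_hz / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_hz = np.arange(n_fft // 2 + 1) * sample_rate_hz / n_fft
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        low, mid, high = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_hz - low) / (mid - low)
        falling = (high - bin_hz) / (high - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_cepstrum(frame, sample_rate_hz, n_filters=20, n_ceps=12):
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2      # |FFT|^2
    mel_energies = mel_filter_bank(n_filters, n_fft, sample_rate_hz) @ power
    log_energies = np.log(mel_energies + 1e-10)                      # floor avoids log(0)
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    cosine_basis = np.cos(np.pi * k * (m + 0.5) / n_filters)         # DCT-II basis
    return cosine_basis @ log_energies                               # low-order cepstra

frame = np.sin(2.0 * np.pi * 440.0 * np.arange(256) / 8000.0)
ceps = mel_cepstrum(frame, 8000.0)
```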

Time-Frequency analysis

Short-term Fourier Transform: the standard way of frequency analysis, which decomposes the incoming signal into its constituent frequency components.

W(n): windowing function; N: frame length; p: step size
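A short sketch of short-term Fourier analysis using the three quantities named on the slide: a window W(n), a frame length N, and a step size p. The Hamming window and the test signal are assumptions; the signal is assumed to be at least one frame long.

```python
import numpy as np

def short_term_fourier_transform(signal, frame_length, step_size):
    """Return one spectrum per frame (rows) for frames taken every step_size samples."""
    window = np.hamming(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // step_size
    frames = np.stack([
        signal[t * step_size : t * step_size + frame_length] * window
        for t in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

sample_rate_hz = 8000.0
time_s = np.arange(2000) / sample_rate_hz
x = np.sin(2.0 * np.pi * 1000.0 * time_s)          # 1 kHz test tone
spectrum = short_term_fourier_transform(x, frame_length=256, step_size=128)
peak_bin = int(np.argmax(np.abs(spectrum[0])))     # bin 32 corresponds to 1 kHz here
```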

Critical band integration

Related to the masking phenomenon: the detection threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.

Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole.

Bark scale
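For illustration, one widely used Hz-to-Bark approximation (due to Zwicker and Terhardt) is sketched below, together with a crude critical band integration that sums power within one-Bark-wide bands. The slides do not say which Bark formula or band shape they use, so both are assumptions; real systems usually use overlapping, shaped bands.

```python
import math

def hz_to_bark(frequency_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))

def critical_band_energies(power_spectrum, sample_rate_hz):
    """Sum power-spectrum bins (0 .. Nyquist) into rectangular one-Bark bands."""
    n_bins = len(power_spectrum)
    bands = {}
    for k, power in enumerate(power_spectrum):
        frequency_hz = k * (sample_rate_hz / 2.0) / (n_bins - 1)
        band = int(hz_to_bark(frequency_hz))
        bands[band] = bands.get(band, 0.0) + power
    return [bands[b] for b in sorted(bands)]

flat_spectrum = [1.0] * 129
energies = critical_band_energies(flat_spectrum, 8000.0)
```

With a flat spectrum the higher bands collect more energy, because each Bark band covers more Hz as frequency increases.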

Feature orthogonalization

Spectral values in adjacent frequency channels are highly correlated.

Because of this correlation, a Gaussian model needs many parameters: all the elements of the covariance matrix have to be estimated.

Decorrelation is useful for improving the parameter estimation. With decorrelated features, a diagonal covariance matrix is sufficient.

Cepstrum

Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal.

The log magnitude is real and symmetric, so the transform is equivalent to the Discrete Cosine Transform.

Approximately decorrelated.
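The equivalence stated on this slide can be checked with a small sketch: because the log magnitude spectrum of a real signal is real and symmetric, its inverse Fourier transform reduces to a sum of cosines. The function names, the random test frame and the small floor added before the logarithm are assumptions for illustration.

```python
import numpy as np

def log_magnitude_spectrum(frame):
    return np.log(np.abs(np.fft.fft(frame)) + 1e-12)   # floor avoids log(0)

def cepstrum_via_ifft(log_magnitude):
    """Inverse Fourier transform of the log magnitude spectrum."""
    return np.fft.ifft(log_magnitude)

def cepstrum_via_cosines(log_magnitude):
    """The same quantity written with cosine terms only."""
    n = len(log_magnitude)
    index = np.arange(n)
    cosines = np.cos(2.0 * np.pi * np.outer(index, index) / n)
    return cosines @ log_magnitude / n

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
log_mag = log_magnitude_spectrum(frame)
c_ifft = cepstrum_via_ifft(log_mag)
c_cos = cepstrum_via_cosines(log_mag)
```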

Principal Component Analysis (PCA)

A mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PC).

Find an orthogonal basis such that the reconstruction error over the training set is minimized.

This turns out to be equivalent to diagonalizing the sample autocovariance matrix.

Complete decorrelation.

Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.

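PCA (Cont.)

Algorithm

Input: x (N-dim vectors), an N × M matrix

Covariance matrix:

$$ \mathrm{Cov} = \frac{1}{M} \sum_{i=1}^{M} (x_i - \bar{x})(x_i - \bar{x})^T $$

Eigenvalues and eigenvectors: EigVal_i, EigVec_i, i = 1 ... N, ordered so that EigVal_1 ≥ EigVal_2 ≥ ...

Transform matrix (the rows are the eigenvectors; retaining the first R rows gives the R-dimensional output):

$$ F = \begin{bmatrix} \mathrm{EigVec}_1 \\ \mathrm{EigVec}_2 \\ \vdots \\ \mathrm{EigVec}_N \end{bmatrix} $$

Apply transform: y = F x

Output: y (R-dim vectors), an R × M matrix

PCA in speech recognition systems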
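The PCA steps (covariance matrix, eigen-decomposition, eigenvalues sorted in decreasing order, transform y = F x) can be sketched as below. The data layout follows the slide (N-dimensional vectors stored as the M columns of x); the function name and the synthetic data are assumptions.

```python
import numpy as np

def pca_transform(x, n_components):
    """x: N x M matrix of column vectors. Returns (y, F, sorted eigenvalues)."""
    n_vectors = x.shape[1]
    centered = x - x.mean(axis=1, keepdims=True)
    covariance = centered @ centered.T / n_vectors
    eig_vals, eig_vecs = np.linalg.eigh(covariance)      # ascending order
    order = np.argsort(eig_vals)[::-1]                   # largest eigenvalue first
    transform = eig_vecs[:, order[:n_components]].T      # R x N, rows are eigenvectors
    return transform @ x, transform, eig_vals[order]

# Synthetic correlated data: 3-dim vectors, 500 samples.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.1]])
x = mixing @ rng.standard_normal((3, 500))
y, transform, eig_vals = pca_transform(x, n_components=2)
output_covariance = np.cov(y, bias=True)                 # should be diagonal
```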

Linear Discriminant Analysis

Find a basis (a linear transform) such that the ratio of the between-class variance to the within-class variance is maximized.

This also turns out to be a generalized eigenvalue-eigenvector problem.

Complete decorrelation.

Provides the optimal linear separability under quite restrictive assumptions.

PCA vs. LDA
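To make the PCA vs. LDA contrast concrete, the sketch below builds two classes that differ along a low-variance axis while most of the overall variability lies along another axis. LDA is computed from within-class and between-class scatter matrices via the eigenvectors of S_w^{-1} S_b, which is one standard formulation and not necessarily the one on the slides. The data and function names are invented for illustration.

```python
import numpy as np

def lda_directions(x, labels, n_components):
    """x: M x N matrix with one sample per row. Returns N x n_components directions."""
    overall_mean = x.mean(axis=0)
    n_dims = x.shape[1]
    s_within = np.zeros((n_dims, n_dims))
    s_between = np.zeros((n_dims, n_dims))
    for c in np.unique(labels):
        class_samples = x[labels == c]
        class_mean = class_samples.mean(axis=0)
        centered = class_samples - class_mean
        s_within += centered.T @ centered
        diff = (class_mean - overall_mean)[:, None]
        s_between += len(class_samples) * (diff @ diff.T)
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eig_vals.real)[::-1]
    return eig_vecs.real[:, order[:n_components]]

rng = np.random.default_rng(0)
scale = np.array([0.5, 5.0])                       # small spread on axis 0, large on axis 1
class0 = rng.standard_normal((300, 2)) * scale
class1 = rng.standard_normal((300, 2)) * scale + np.array([3.0, 0.0])
x = np.vstack([class0, class1])
labels = np.array([0] * 300 + [1] * 300)

lda_axis = lda_directions(x, labels, 1)[:, 0]
pca_vals, pca_vecs = np.linalg.eigh(np.cov(x.T, bias=True))
pca_axis = pca_vecs[:, -1]                         # direction of largest variance
```

Here the leading PCA axis follows the high-variance dimension, while the LDA axis follows the dimension that separates the classes.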

Spectral smoothing

Formant information is crucial for recognition.

Enhance and preserve the formant information by:

Truncating the number of cepstral coefficients

Linear prediction: peak-hugging property
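Truncating the cepstrum can be illustrated with a small sketch: keep only the low-order cepstral coefficients and transform back, which gives a smoothed log spectrum. The number of retained coefficients, the random test frame and the roughness measure are assumptions for illustration. Linear prediction is not covered here.

```python
import numpy as np

def cepstrally_smoothed_log_spectrum(frame, n_keep):
    """Return (log spectrum, smoothed log spectrum) keeping n_keep low-order cepstra."""
    n = len(frame)
    log_magnitude = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    cepstrum = np.fft.ifft(log_magnitude).real
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0                 # coefficients 0 .. n_keep-1
    lifter[n - n_keep + 1:] = 1.0         # their mirror images, so the result stays real
    smoothed = np.fft.fft(cepstrum * lifter).real
    return log_magnitude, smoothed

def roughness(values):
    """Sum of squared differences between neighboring bins (circular)."""
    return float(np.sum((values - np.roll(values, 1)) ** 2))

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
log_spectrum, smoothed_spectrum = cepstrally_smoothed_log_spectrum(frame, n_keep=13)
```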

Temporal processing

To capture the temporal features of the spectral envelope and to provide robustness:

Delta features: first- and second-order differences; regression

Cepstral Mean Subtraction: for normalizing for channel effects and adjusting for spectral slope
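Both operations are simple to sketch. The delta computation below uses the common regression form d_t = sum over k of k (c_(t+k) - c_(t-k)) divided by 2 sum of k^2, with edge frames repeated for padding. The window size and padding rule are assumptions, since the slide does not specify them. Cepstral mean subtraction removes the per-coefficient average over the utterance.

```python
import numpy as np

def delta_features(features, half_window=2):
    """features: T x D array (one frame per row). Returns regression-based deltas."""
    n_frames = len(features)
    padded = np.pad(features, ((half_window, half_window), (0, 0)), mode="edge")
    denominator = 2.0 * sum(k * k for k in range(1, half_window + 1))
    deltas = np.zeros_like(features, dtype=float)
    for k in range(1, half_window + 1):
        ahead = padded[half_window + k : half_window + k + n_frames]
        behind = padded[half_window - k : half_window - k + n_frames]
        deltas += k * (ahead - behind)
    return deltas / denominator

def cepstral_mean_subtraction(features):
    """Subtract the mean of each coefficient over time."""
    return features - features.mean(axis=0, keepdims=True)

# A linear ramp has constant slope, so interior deltas equal that slope.
ramp = np.arange(20.0)[:, None] * np.array([1.0, -2.0])
deltas = delta_features(ramp)
delta_deltas = delta_features(deltas)
normalized = cepstral_mean_subtraction(ramp)
```

Applying `delta_features` twice gives the second-order (delta-delta) features.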

RASTA (RelAtive SpecTral Analysis)

Filtering of the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features.

This is usually a bandpass filter that maintains the linguistically important spectral envelope modulations (1-16 Hz).

RASTA-PLP
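As a rough sketch of the filtering idea, the code below applies a commonly cited RASTA-style bandpass filter, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), to one temporal trajectory. The coefficients, the pole value, the causal form, and the 100 frames-per-second rate are assumptions; the slides give no filter. Constant offsets are removed while a slow (4 Hz) modulation passes through.

```python
import numpy as np

def rasta_filter(trajectory, pole=0.98):
    """Bandpass filter one temporal trajectory (e.g. a log band energy over frames)."""
    numerator = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    fir_output = np.convolve(trajectory, numerator)[: len(trajectory)]
    output = np.zeros(len(trajectory))
    previous = 0.0
    for t, value in enumerate(fir_output):
        previous = value + pole * previous     # single-pole recursive part
        output[t] = previous
    return output

frame_rate_hz = 100.0                          # assumed frame rate
frames = np.arange(2000)
constant_out = rasta_filter(np.full(2000, 3.0))
modulated_out = rasta_filter(np.sin(2.0 * np.pi * 4.0 * frames / frame_rate_hz))
```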