Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis

Date post: 03-Feb-2022
Speech Technology: A Practical Introduction
Kishore Prahallad
Email: [email protected]
Carnegie Mellon University & International Institute of Information Technology Hyderabad

Topics

• Spectrogram
• Cepstrum
• Mel-Frequency Analysis
• Mel-Frequency Cepstral Coefficients

Spectrogram

Speech signal represented as a sequence of spectral vectors

[Figure: the speech waveform is divided into successive short segments; an FFT of each segment gives a spectrum, plotted as amplitude against frequency (Hz)]

Rotate it by 90 degrees

• Map the spectral amplitude to a grey-level value (0-255), where 0 represents black and 255 represents white.
• The higher the amplitude, the darker the corresponding region.

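Placing the rotated, grey-level spectra side by side in time order gives a plot of frequency (Hz) against time. This time vs. frequency representation of a speech signal is referred to as a spectrogram.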
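
The construction above (FFT per segment, then amplitude mapped to a grey level) can be sketched in NumPy. This is only an illustration under assumptions that are not in the slides: the frame length, hop size, FFT size, Hamming window, the 80 dB display floor, and the synthetic 1 kHz test tone are all hypothetical choices.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Return an (n_frames, n_fft // 2 + 1) array of magnitude spectra."""
    window = np.hamming(frame_len)              # assumed window type
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def to_grey(spec, floor_db=-80.0):
    """Map amplitude to grey levels 0-255; higher amplitude gives a darker pixel."""
    db = 20.0 * np.log10(spec / spec.max() + 1e-10)
    db = np.clip(db, floor_db, 0.0)
    # 0 dB (the peak) maps to 0 (black); the floor maps to 255 (white)
    return np.round(255.0 * (db / floor_db)).astype(np.uint8)

# Synthetic input: 1 s of a 1 kHz sine sampled at 16 kHz (assumed values)
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
image = to_grey(spec).T    # rotate: rows are frequency bins, columns are time frames
print(spec.shape, image.shape)
```

For this test tone the darkest row of `image` is the frequency bin nearest 1 kHz, which is the kind of dark band the slides describe for spectral peaks.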

Some Real Spectrograms

Dark regions indicate peaks (formants) in the spectrum

Why we care about spectrograms

Phones and their properties are better observed in a spectrogram.

Sounds can be identified much better by the formants and by their transitions.

Hidden Markov Models implicitly model these spectrograms to perform speech recognition

Usefulness of Spectrogram

• Time-frequency representation of the speech signal
• The spectrogram is a tool to study speech sounds (phones)
• Phones and their properties are visually studied by phoneticians
• Hidden Markov Models implicitly model spectrograms for speech-to-text systems
• Useful for evaluation of text-to-speech systems
  – A high-quality text-to-speech system should produce synthesized speech whose spectrograms nearly match those of the natural sentences.

Cepstral Analysis

A Sample Speech Spectrum

[Figure: a speech spectrum; axes: Frequency (Hz) and dB]

• Peaks denote dominant frequency components in the speech signal
• Peaks are referred to as formants
• Formants carry the identity of the sound

What do we want to extract? The spectral envelope

• Formants and a smooth curve connecting them
• This smooth curve is referred to as the spectral envelope

[Figure: a speech spectrum with its spectral envelope; axes: Frequency (Hz) and dB]

Spectral Envelope

[Figure: the spectrum, log X[k], shown as the sum of the spectral envelope, log H[k], and the spectral details, log E[k]]

log X[k] = log H[k] + log E[k]

1. Our goal: we want to separate the spectral envelope and the spectral details from the spectrum.

2. That is, given log X[k], obtain log H[k] and log E[k] such that log X[k] = log H[k] + log E[k].

How do we achieve this separation?

Play a Mathematical Trick

• Trick: take the FFT of the spectrum.
• An FFT applied to a spectrum is referred to as an inverse FFT (IFFT).
• Note: we are dealing with the spectrum in the log domain (part of the trick).
• The IFFT of the log spectrum represents the signal on a pseudo-frequency axis.

[Figure: the spectral envelope, the spectral details, and the spectrum, each mapped by an IFFT onto a pseudo-frequency axis that has a low-frequency region and a high-frequency region]

• Spectral envelope: treat this as a sine wave with 4 cycles per sec. Its IFFT gives a peak at 4 Hz on the pseudo-frequency axis, in the low-frequency region.
• Spectral details: treat this as a sine wave with 100 cycles per sec. Its IFFT gives a peak at 100 Hz on the pseudo-frequency axis, in the high-frequency region.
• Spectrum: the sum of the two, so its IFFT has a peak in each region.
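
The sine-wave picture can be checked numerically. The sketch below is a toy illustration and not part of the original slides: two pure sine waves stand in for the slowly varying envelope and the rapidly varying details, and the 1024-sample length and the helper `top_bins` are hypothetical.

```python
import numpy as np

n = 1024                                  # samples spanning one unit of the axis (assumed)
t = np.arange(n) / n
slow = np.sin(2 * np.pi * 4 * t)          # stands in for the smooth spectral envelope
fast = np.sin(2 * np.pi * 100 * t)        # stands in for the rapidly varying details

def top_bins(x, count):
    """Sorted indices of the `count` largest IFFT magnitudes in the first half."""
    mag = np.abs(np.fft.ifft(x))[: len(x) // 2]
    return sorted(np.argsort(mag)[-count:].tolist())

print(top_bins(slow, 1), top_bins(fast, 1), top_bins(slow + fast, 2))
# -> [4] [100] [4, 100]
```

Because the transform is linear, the sum of the two curves keeps both peaks, and they land in different regions of the axis. That is what makes the separation on the following slides possible.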

log X[k] = log H[k] + log E[k]

Taking the IFFT of both sides:

x[k] = h[k] + e[k]

• In practice, all you have access to is log X[k], and hence you can obtain x[k].
• If you know x[k], filter the low-frequency region to get h[k].
• x[k] is referred to as the cepstrum.
• h[k] is obtained by considering the low-frequency region of x[k].
• h[k] represents the spectral envelope and is widely used as a feature for speech recognition.

Cepstral Analysis

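\begin{align*}
X[k] &= H[k]\,E[k] \\
\|X[k]\| &= \|H[k]\|\,\|E[k]\| && \text{($\|\cdot\|$ denotes magnitude)} \\
\log\|X[k]\| &= \log\|H[k]\| + \log\|E[k]\| && \text{(take log on both sides)} \\
x[k] &= h[k] + e[k] && \text{(taking the inverse FFT on both sides)}
\end{align*}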
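
The derivation above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the lecture's own implementation: the function name `cepstral_envelope`, the FFT size, the cut-off `n_keep` between the low and high regions, and the synthetic harmonic test frame are all hypothetical.

```python
import numpy as np

def cepstral_envelope(frame, n_fft=512, n_keep=30):
    """Split the log-magnitude spectrum of one frame into envelope and details.

    n_keep (assumed, must be >= 2) is the number of low-region cepstral
    samples treated as the envelope part h[k].
    """
    log_X = np.log(np.abs(np.fft.fft(frame, n=n_fft)) + 1e-10)   # log ||X[k]||
    x = np.fft.ifft(log_X).real                                   # cepstrum x[k]
    # Keep the low region and its mirror image (x[k] is symmetric for a real frame)
    keep = np.zeros(n_fft)
    keep[:n_keep] = 1.0
    keep[-(n_keep - 1):] = 1.0
    h = x * keep                  # low region: envelope part h[k]
    e = x - h                     # high region: details part e[k]
    log_H = np.fft.fft(h).real    # back to the log-spectral domain
    log_E = np.fft.fft(e).real
    return log_X, log_H, log_E, x

# Synthetic frame: harmonics of 100 Hz at a 16 kHz sampling rate (assumed values)
fs = 16000
t = np.arange(400) / fs
frame = sum(np.sin(2 * np.pi * 100 * m * t) / m for m in range(1, 40)) * np.hamming(400)

log_X, log_H, log_E, x = cepstral_envelope(frame)
print(np.allclose(log_H + log_E, log_X))   # -> True
```

The final check confirms that the two recovered parts add back up to the log spectrum, as the identity log X[k] = log H[k] + log E[k] requires.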

Mel-Frequency Analysis

Review: What we did

• We captured the spectral envelope (the curve connecting all formants)
• BUT: perceptual experiments say the human ear concentrates on certain regions rather than using the whole of the spectral envelope.

[Figure: a spectral envelope; axes: Frequency (Hz) and dB]

• Mel-frequency analysis of speech is based on human perception experiments
• It is observed that the human ear acts as a filter
  – It concentrates on only certain frequency components
• These filters are non-uniformly spaced on the frequency axis
  – More filters in the low-frequency regions
  – Fewer filters in the high-frequency regions

Mel-Frequency Filters

[Figure: Mel-frequency filters, with more filters in the low-frequency region and fewer filters in the high-frequency region]
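
The non-uniform spacing can be illustrated by placing filter centres at equal steps on a mel scale. The slides do not give a formula, so the sketch below assumes the commonly used mapping mel = 2595 log10(1 + f / 700). The number of filters, the 0 to 8000 Hz range, and the function names are likewise hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """Assumed mel mapping (a common convention, not taken from the slides)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_centers(n_filters=20, f_low=0.0, f_high=8000.0):
    """Centre frequencies in Hz, equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    return mel_to_hz(mels)[1:-1]

centers = filter_centers()
below_1k = int(np.sum(centers < 1000.0))   # filters centred below 1 kHz
above_7k = int(np.sum(centers > 7000.0))   # filters centred above 7 kHz
print(np.round(centers).astype(int))
print(below_1k, above_7k)
```

Equal steps in mel become ever wider steps in Hz, so many more centres fall in the lowest 1 kHz than in the highest 1 kHz of the range.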

Mel-Frequency Cepstral Coefficients (MFCC)

• Spectrum → Mel-Filters → Mel-Spectrum
• Say log X[k] = log (Mel-Spectrum)
• NOW perform cepstral analysis on log X[k]
  – log X[k] = log H[k] + log E[k]
  – Taking IFFT
  – x[k] = h[k] + e[k]
• The cepstral coefficients h[k] obtained for the Mel-spectrum are referred to as Mel-Frequency Cepstral Coefficients, often denoted by MFCC
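
A hedged end-to-end sketch of the steps above for a single frame follows. Several details are assumptions rather than content from the slides: the triangular filter shape, the mel formula, the filter and coefficient counts, and the Hamming window. One technique is swapped in deliberately: the final step uses a DCT-II instead of a literal IFFT, which is the usual practical choice for a real log spectrum, and keeping only the first `n_coeffs` outputs plays the role of keeping the low region h[k].

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters (assumed shape) with centres equally spaced in mel."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)   # assumed mapping
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(frame, fs=16000, n_fft=512, n_filters=26, n_coeffs=13):
    """Spectrum -> Mel-filters -> log -> DCT-II, for one analysis frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft))
    mel_spectrum = mel_filterbank(n_filters, n_fft, fs) @ spectrum
    log_mel = np.log(mel_spectrum + 1e-10)
    # DCT-II basis, written out explicitly to avoid extra dependencies
    k = np.arange(n_coeffs)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(k, n + 0.5) / n_filters)
    return basis @ log_mel

# Synthetic frame: two sine components at 16 kHz sampling (assumed values)
fs = 16000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2200 * t)
coeffs = mfcc(frame, fs=fs)
print(coeffs.shape)   # (13,)
```

Applying `mfcc` to every analysis frame of an utterance would give the sequence of cepstral vectors described on the next slides.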

Speech signal represented as a sequence of cepstral vectors

[Figure: the FFT spectrum of each segment is passed through Mel-filters and then cepstral analysis, giving one cepstral vector per segment]

Why we are going to use MFCC

• Speech synthesis
  – Used for joining two speech segments S1 and S2
  – Represent S1 as a sequence of MFCC
  – Represent S2 as a sequence of MFCC
  – Join at the point where MFCCs of S1 and S2 have minimal Euclidean distance
• Speech recognition
  – MFCC are the most widely used features in state-of-the-art speech recognition systems
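
The join-point idea for synthesis can be sketched as a search over pairs of frames. This is one possible reading of the bullet, not a method given in the slides: the function name `best_join` is hypothetical, every frame of S1 is compared with every frame of S2, and the tiny two-dimensional "MFCC" arrays are made-up toy data.

```python
import numpy as np

def best_join(mfcc_s1, mfcc_s2):
    """Return (i, j, distance) for the closest pair of frames between S1 and S2."""
    diff = mfcc_s1[:, None, :] - mfcc_s2[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))          # pairwise Euclidean distances
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j), float(dist[i, j])

# Toy data: 3 frames each, 2 coefficients per frame (real MFCC vectors are longer)
s1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
s2 = np.array([[5.0, 5.0], [2.0, 2.5], [9.0, 9.0]])
print(best_join(s1, s2))   # -> (2, 1, 0.5)
```

A practical system would probably restrict the search to frames near the end of S1 and the start of S2. That restriction is omitted here to keep the sketch short.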

Summary: Process of Feature Extraction

• Speech is analyzed over short analysis windows
• For each short analysis window, a spectrum is obtained using FFT
• The spectrum is passed through Mel-filters to obtain the Mel-spectrum
• Cepstral analysis is performed on the Mel-spectrum to obtain Mel-Frequency Cepstral Coefficients
• Thus speech is represented as a sequence of cepstral vectors
• It is these cepstral vectors that are given to pattern classifiers for speech recognition

Additional Reading

• Chapter 6
  – Pg: 273 – 281
  – Pg: 304 – 311
  – Pg: 314 – 316

