Page 1: Lec04, Speech II, v1.08.ppt - ce.sharif.educe.sharif.edu/courses/92-93/2/ce342-1/resources... · 0 0.2 0.4 0.6 0.8 1-1-0.5 0 0.5 1 x4 = x1+x2+x3 1 x5 = x1 then x2 then x3 Speech Compression

Multimedia Systems

Speech II

Course Presentation

Mahdi Amiri

February 2014

Sharif University of Technology


Road Map

Speech Compression

Based on Time Domain analysis

Differential Pulse-Code Modulation (DPCM)

Adaptive DPCM (ADPCM)

Based on Frequency Domain analysis

Linear Predictive Coding (LPC)

Code Excited Linear Prediction (CELP)

Page 1 Multimedia Systems, Speech II

Differential PCM (DPCM): Idea

Take advantage of data redundancy: neighboring samples are close in value, so the differences between them are small.

PCM samples: [… 110 112 111 112 112 114 115 115 114 114 …]

Differences: [… +2 -1 +1 0 +2 +1 0 -1 0 …]

[Figure: histogram of PCM samples in a chunk of digitized audio. Typical chunk length: 50 ms]
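The sample run above can be reproduced with a short Python sketch. The function names are made up for illustration, and sending the first sample unchanged is an assumption the slide does not state.

```python
def to_differences(samples):
    """Keep the first sample, then store each sample minus its predecessor."""
    diffs = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        diffs.append(cur - prev)
    return diffs


def from_differences(diffs):
    """Invert to_differences with a running sum."""
    samples = [diffs[0]]
    for d in diffs[1:]:
        samples.append(samples[-1] + d)
    return samples


pcm = [110, 112, 111, 112, 112, 114, 115, 115, 114, 114]
diffs = to_differences(pcm)
print(diffs[1:])                        # small values, cheaper to code than the samples
print(from_differences(diffs) == pcm)   # lossless as long as nothing is quantized
```

The differences span a much narrower range than the samples themselves, which is what allows fewer bits per value.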

Differential PCM (DPCM): Prediction

Simplest prediction: the current sample is assumed to be equal to the previous sample, and the difference between the current sample value and the previous sample is quantized.

A better prediction: quantize the difference between the current sample value and the average of, e.g., the 2 previous samples.

A much better prediction: quantize the difference between the current sample value and the weighted average of, e.g., the 10 previous samples.
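To see why well-chosen weights matter, the sketch below measures the prediction error energy of two predictors on a test tone. Everything specific here is an assumption for illustration: the 400 Hz tone, the 8 kHz sampling rate, and the use of only 2 weighted taps (instead of 10) with weights picked by hand for this tone. How much a predictor helps depends on the signal.

```python
import math


def residual_energy(samples, predict, order):
    """Sum of squared prediction errors, skipping the first `order` samples."""
    total = 0.0
    for n in range(order, len(samples)):
        history = samples[n - order:n]          # oldest sample first
        total += (samples[n] - predict(history)) ** 2
    return total


# Assumed test signal: 400 Hz tone sampled at 8 kHz.
OMEGA = 2 * math.pi * 400 / 8000
tone = [math.sin(OMEGA * n) for n in range(200)]


def previous_sample(history):
    """Simplest predictor: repeat the last sample."""
    return history[-1]


def weighted_two(history):
    """Weighted combination of the 2 previous samples (weights fit to this tone)."""
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


e_prev = residual_energy(tone, previous_sample, 2)
e_weighted = residual_energy(tone, weighted_two, 2)
print(e_prev, e_weighted)
```

For a pure sinusoid the two-tap weighted predictor is exact up to rounding, so its residual is essentially zero while the previous-sample predictor leaves a clearly visible error.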

Differential PCM (DPCM): Basic Scheme

General Predictive Coding

Delta Modulation (DM): $\sum_i a_i x_{n-i} \Rightarrow z^{-1}$ (the predictor reduces to a one-sample delay)

Problem?

Differential PCM (DPCM): Error Propagation

General Predictive Coding

The output of the dequantizer in the decoder is not equal to the input of the quantizer in the encoder.

⇒ The input of the predictor in the decoder is not the same as the input of the predictor in the encoder.

⇒ This is the source of error propagation.

x_n :  0    0.6  1.2  1.8  2.4  3.0  3.6  4.2
p_n :  0    0    0.6  1.2  1.8  2.4  3.0  3.6
d_n :  0    0.6  0.6  0.6  0.6  0.6  0.6  0.6
Q   :  0    1    1    1    1    1    1    1
^Q  :  0    1    1    1    1    1    1    1
^d_n:  0    1    1    1    1    1    1    1
^p_n:  0    0    1    2    3    4    5    6
^x_n:  0    1    2    3    4    5    6    7
Err :  0    0.4  0.8  1.2  1.6  2.0  2.4  2.8

Error propagation example in the above structure when the input signal increases steadily by +0.6 per sample.

The quantization step size is 1; therefore, e.g., values from -0.5 to 0.5 are quantized as 0 and values from 0.5 to 1.5 are quantized as 1.
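The table can be reproduced with a few lines of Python. This is a sketch of the open-loop structure as the table describes it: the encoder predicts from the original previous sample, while the decoder can only predict from its own reconstruction. Function names are illustrative.

```python
import math


def quantize(value, step=1.0):
    """Uniform quantizer index: nearest multiple of `step` (0.6 -> 1 when step is 1)."""
    return int(math.floor(value / step + 0.5))


def open_loop_encode(samples, step=1.0):
    """Encoder predicts each sample from the ORIGINAL previous sample."""
    codes, prev = [], 0.0
    for x in samples:
        codes.append(quantize(x - prev, step))
        prev = x
    return codes


def decode(codes, step=1.0):
    """Decoder predicts from its own previous output and adds the dequantized difference."""
    out, prev = [], 0.0
    for q in codes:
        prev = prev + q * step
        out.append(prev)
    return out


x = [0.6 * n for n in range(8)]
codes = open_loop_encode(x)
x_hat = decode(codes)
err = [round(r - s, 1) for r, s in zip(x_hat, x)]
print(codes)
print(x_hat)
print(err)
```

Each quantized difference is off by 0.4, and because the decoder accumulates the differences, the error grows by 0.4 at every sample.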

Differential PCM (DPCM): Better Structure
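The diagram for this slide did not survive extraction. A common remedy for error propagation, assumed here to be what the better structure refers to, is to place the decoder inside the encoder loop so that both sides predict from the same reconstructed samples. The sketch below applies that idea to the ramp input from the error propagation example; names are illustrative.

```python
import math


def quantize(value, step=1.0):
    """Uniform quantizer index: nearest multiple of `step`."""
    return int(math.floor(value / step + 0.5))


def closed_loop_encode(samples, step=1.0):
    """Encoder predicts from the RECONSTRUCTED previous sample, as the decoder does."""
    codes, recon = [], 0.0
    for x in samples:
        q = quantize(x - recon, step)
        recon += q * step          # identical to the decoder's update
        codes.append(q)
    return codes


def decode(codes, step=1.0):
    out, prev = [], 0.0
    for q in codes:
        prev = prev + q * step
        out.append(prev)
    return out


x = [0.6 * n for n in range(8)]
codes = closed_loop_encode(x)
x_hat = decode(codes)
max_err = max(abs(r - s) for r, s in zip(x_hat, x))
print(codes)
print(max_err)
```

With this arrangement the quantization error of one sample is corrected by the next difference, so the reconstruction error stays within half a quantization step instead of accumulating.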

Adaptive DPCM (ADPCM): Idea

Problem?

Adaptive DPCM (ADPCM): Size of Quantization Step

Delta Modulation (DM)

1-bit quantizer: 0 means +∆ and 1 means −∆

Adaptive Delta Modulation (ADM)

$$\Delta[n] = M \, \Delta[n-1]$$

$$M = \begin{cases} P > 1 & \text{if } c[n] = c[n-1] \\ Q < 1 & \text{if } c[n] \neq c[n-1] \end{cases} \qquad P = 2,\; Q = \tfrac{1}{2}$$
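A minimal Python sketch of the ADM step-size rule follows. It assumes that c[n] is the 1-bit output code, that the estimate starts at 0, and that the step is clamped to a fixed range; the clamp limits, the initial step, and the test signal are all hypothetical choices.

```python
import math

P, Q = 2.0, 0.5                       # step multipliers from the slide
DELTA_MIN, DELTA_MAX = 0.01, 1.0      # clamp limits: an assumption


def next_delta(delta, bit, prev_bit):
    """Grow the step when the code repeats, shrink it when the code flips."""
    m = P if bit == prev_bit else Q
    return min(max(delta * m, DELTA_MIN), DELTA_MAX)


def adm_encode(samples, delta0=0.1):
    bits, estimate, delta, prev_bit = [], 0.0, delta0, None
    for x in samples:
        bit = 0 if x >= estimate else 1           # 0 means +delta, 1 means -delta
        if prev_bit is not None:
            delta = next_delta(delta, bit, prev_bit)
        estimate += delta if bit == 0 else -delta
        bits.append(bit)
        prev_bit = bit
    return bits


def adm_decode(bits, delta0=0.1):
    out, estimate, delta, prev_bit = [], 0.0, delta0, None
    for bit in bits:
        if prev_bit is not None:
            delta = next_delta(delta, bit, prev_bit)
        estimate += delta if bit == 0 else -delta
        out.append(estimate)
        prev_bit = bit
    return out


signal = [math.sin(2 * math.pi * n / 200) for n in range(400)]
bits = adm_encode(signal)
recon = adm_decode(bits)
mean_err = sum(abs(r - s) for r, s in zip(recon, signal)) / len(signal)
print(mean_err)
```

The decoder needs only the bit stream: it repeats the same step-size updates as the encoder, so no step sizes have to be transmitted.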

Speech Compression Concepts: FFT, No Time Localization

Speech Signal

FFT

(is only localized in frequency)

Joseph Fourier, 1768-1830

Speech Compression Concepts: FFT, No Time Localization, Ex. 1

See Power Spectral Density (PSD) examples in MATLAB

Speech Compression Concepts: FFT, No Time Localization, Ex. 2

See Power Spectral Density (PSD) examples in MATLAB

[Figure: time-domain plots (amplitude vs. time in seconds) and PSD plots (vs. Frequency (Hz), 0 to 8000 Hz) of the signals below, with zoomed parts of x4, x5, and x6]

x1, freq.: 400 Hz

x2, freq.: 900 Hz

x3, freq.: 1600 Hz

x4 = x1+x2+x3 (also shown scaled as (x1 + x2 + x3) / 3)

x5 = x1 then x2 then x3

x6 = x3 then x1 then x2
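The point of this example can be checked numerically without MATLAB. The Python sketch below builds x5 and x6 from the three tones and evaluates the DFT power at 400, 900, and 1600 Hz. The sampling rate and the segment length are assumptions, chosen so that each tone falls exactly on a DFT bin.

```python
import cmath
import math

FS = 8000     # sampling rate in Hz: an assumption, not stated on the slide
SEG = 800     # samples per tone segment (0.1 s): also an assumption


def tone(freq_hz, n=SEG):
    return [math.sin(2 * math.pi * freq_hz * k / FS) for k in range(n)]


def dft_power(signal, freq_hz):
    """|X(f)|^2 / N for one frequency, computed directly from the DFT sum."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * k / FS)
              for k, s in enumerate(signal))
    return abs(acc) ** 2 / len(signal)


x1, x2, x3 = tone(400), tone(900), tone(1600)
x5 = x1 + x2 + x3     # list concatenation: x1 then x2 then x3
x6 = x3 + x1 + x2     # same tones in a different order

for f in (400, 900, 1600):
    print(f, round(dft_power(x5, f), 3), round(dft_power(x6, f), 3))
```

Both signals give the same power at each tone frequency, so the spectrum alone cannot tell in which order the tones were played.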

Speech Compression Concepts: STFT

Speech Signal

STFT

(fixed time and frequency localization)

Dennis Gabor, 1900-1979
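A rough Python sketch of the short-time idea: cut the signal into fixed-length frames and analyze each frame separately. The sampling rate, the 50 ms frame length, and the simplification to non-overlapping rectangular windows with only three candidate frequencies are all assumptions made to keep the example small.

```python
import cmath
import math

FS = 8000      # sampling rate in Hz (assumed)
FRAME = 400    # analysis window of 50 ms at FS (assumed)


def tone(freq_hz, n):
    return [math.sin(2 * math.pi * freq_hz * k / FS) for k in range(n)]


def bin_power(frame, freq_hz):
    """DFT power of one frame at one frequency."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * k / FS)
              for k, s in enumerate(frame))
    return abs(acc) ** 2 / len(frame)


def dominant_per_frame(signal, candidates):
    """For each non-overlapping frame, report the strongest candidate frequency."""
    picks = []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        picks.append(max(candidates, key=lambda f: bin_power(frame, f)))
    return picks


x5 = tone(400, 800) + tone(900, 800) + tone(1600, 800)   # x1 then x2 then x3
x6 = tone(1600, 800) + tone(400, 800) + tone(900, 800)   # x3 then x1 then x2
print(dominant_per_frame(x5, (400, 900, 1600)))
print(dominant_per_frame(x6, (400, 900, 1600)))
```

Unlike a single FFT over the whole signal, the per-frame result follows the order of the tones, at the cost of a fixed resolution set by the frame length.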

Speech Compression Concepts: Spectrogram

3D surface spectrogram of a part of a music piece.

Speech Compression Concepts: Spectrogram

Spectrogram of a male voice saying ‘nineteenth century’.

Speech Compression Concepts: Spectrogram Display in Audacity

Waveform

Spectrogram

Speech Compression Concepts: Spectrogram Display in Audacity

Audacity | Edit | Preferences | Spectrograms | FFT Window | Window size

FFT Window size: 128

FFT Window size: 1024
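The two window sizes trade time resolution against frequency resolution. A small Python sketch of the arithmetic, assuming a 44.1 kHz sampling rate (the slide does not state the rate of the recording):

```python
def stft_resolution(sample_rate_hz, window_size):
    """Return (frequency bin spacing in Hz, window duration in seconds)."""
    return sample_rate_hz / window_size, window_size / sample_rate_hz


SAMPLE_RATE = 44100   # assumed sampling rate

for size in (128, 1024):
    df, dt = stft_resolution(SAMPLE_RATE, size)
    print(f"window {size}: {df:.1f} Hz per bin, {dt * 1000:.1f} ms per frame")
```

The short window resolves events in time more finely but smears frequency; the long window does the opposite.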

Speech Compression Concepts: Spectrogram, Demonstration

Bat Echolocation Call

Flute by Jean Pierre Rampal

Singing Voice

Face!

Speech Compression Concepts: Formant

The time and frequency domain representation of vowels /a/, /i/, and /u/

Speech Compression Concepts: Sample Application

A computing system to answer questions posed in natural language.

Design a reliable speech recognition unit for the IBM Watson project.

Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson

www-943.ibm.com/innovation/us/watson/

Dr. David Ferrucci, Watson Principal Investigator

Linear Predictive Coding (LPC): Modeling

Simplified

Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)

Buzzer → Filter

Speech = Formants + Residue

Chunks: 30 to 50 frames/sec.

Predictor for each frame:

$$\tilde{x}[n] = \sum_{i=1}^{P} a_i \, x[n-i]$$
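One standard way to obtain the predictor coefficients for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method the lecture uses, so the Python sketch below is an assumption; the test frame (a 400 Hz tone at 8 kHz, 180 samples) and the order P = 2 are illustrative choices.

```python
import math


def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n - lag]."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_P; returns (coefficients, error energy)."""
    a = []                       # a[i - 1] holds a_i
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / error          # reflection coefficient of stage m
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        error *= (1 - k * k)
    return a, error


def predict(frame, a, n):
    """Predicted value of frame[n] from the previous len(a) samples."""
    return sum(a[i] * frame[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)


# Assumed test frame: 180 samples of a 400 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 400 * n / 8000) for n in range(180)]
coeffs, _ = levinson_durbin(autocorrelation(frame, 2), 2)
residual = sum((frame[n] - predict(frame, coeffs, n)) ** 2 for n in range(2, 180))
energy = sum(s * s for s in frame)
print(coeffs, residual / energy)
```

The residual energy is a small fraction of the frame energy, which is the sense in which only the coefficients plus a compact description of the residue need to be sent.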

Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)

The human vocal tract as an infinite impulse response (IIR) system

Vowel /a/

LPC Block Diagram

Linear Predictive Coding (LPC): Original Paper, Atal-Hanauer 1971

Comparison of wide-band sound spectrograms for synthetic and original speech signals for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker

Original

Synthetic

Linear Predictive Coding (LPC): Voiced Frame Example

[Figure: original and synthetic voiced frame, each shown in the time domain and in the frequency domain]

180 samples, Pitch period: 75

Linear Predictive Coding (LPC): Unvoiced Frame Example

[Figure: original and synthetic unvoiced frame, each shown in the time domain and in the frequency domain]

Synthetic: White noise with uniform distribution

180 samples
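The two frame examples correspond to the two excitation choices of the LPC decoder. The Python sketch below generates a pulse train for a voiced frame (buzz) and uniform white noise for an unvoiced frame (hiss), then passes either one through an all-pole filter. The frame length and pitch period come from the slides; the filter coefficients, gain, and random seed are hypothetical.

```python
import random

FRAME_LEN = 180       # samples per frame, as on the slides
PITCH_PERIOD = 75     # samples between pulses, as in the voiced example


def excitation(voiced, n=FRAME_LEN, pitch=PITCH_PERIOD, seed=0):
    """Pulse train for voiced frames, uniform white noise for unvoiced frames."""
    if voiced:
        return [1.0 if k % pitch == 0 else 0.0 for k in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def synthesize(excite, coeffs, gain=1.0):
    """All-pole (IIR) filter: y[n] = gain * e[n] + sum_i a_i * y[n - i]."""
    out = []
    for n, e in enumerate(excite):
        y = gain * e
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                y += a * out[n - i]
        out.append(y)
    return out


COEFFS = [1.3, -0.8]   # hypothetical stable 2nd-order vocal tract filter

buzz = synthesize(excitation(voiced=True), COEFFS)
hiss = synthesize(excitation(voiced=False), COEFFS)
print(buzz[:4])
print(hiss[:4])
```

Per frame, the decoder only needs the voiced/unvoiced flag, the pitch period, the gain, and the filter coefficients, which is why LPC reaches very low bit rates.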

Code Excited Linear Prediction (CELP)

Problem of LPC: where there is both Hiss and Buzz

Solution: encode the residue

Method: Vector Quantization (Codebook)

[Figure: CELP encoder and decoder block diagrams]

Vector Quantization: Block Diagram

Vector Quantization: Example

Sample scalar quantizer

We have 3 possible colors for each square, so we can quantize each square with 2 bits ⇒ 28 * 2 = 56 bits for all 28 (7*4) squares.

Sample vector quantizer

We have 8 forms in the codebook, each covering 4 squares, so we can quantize each form with 3 bits ⇒ 7 * 3 = 21 bits for all 28 (7*4) squares.

[Figure: codebook]
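The bit counts and the codebook lookup can be sketched in Python. The codebook contents below are hypothetical, since the figure did not survive extraction: each of the 8 forms is taken to be a group of 4 squares with colors numbered 0 to 2.

```python
import math


def bits_per_symbol(alphabet_size):
    """Fixed-length code size for an alphabet with the given number of entries."""
    return math.ceil(math.log2(alphabet_size))


def nearest_codeword(vector, codebook):
    """Index of the codebook entry with the smallest squared distance to `vector`."""
    def distance(entry):
        return sum((v - w) ** 2 for v, w in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


# Hypothetical codebook: 8 forms, each a group of 4 squares with colors 0..2.
CODEBOOK = [
    (0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2), (0, 0, 1, 1),
    (1, 1, 2, 2), (0, 1, 0, 1), (1, 2, 1, 2), (0, 1, 1, 2),
]

scalar_bits = 28 * bits_per_symbol(3)             # one code per square
vector_bits = 7 * bits_per_symbol(len(CODEBOOK))  # one code per group of 4 squares
print(scalar_bits, vector_bits)
print(nearest_codeword((2, 2, 2, 1), CODEBOOK))
```

A group that is not in the codebook is mapped to its closest entry, so the saving in bits is paid for with some distortion.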

Vector Quantization: Codebook Design

Comparison of Speech Coders: Sample Speech

A lathe is a big tool. Grab every dish of sugar.

Comparison of Speech Coders: Demonstration

Original

ADPCM

LPC

CELP

Speech Coding: ITU-T Standards

G.711

PCM

u-law, a-law

64 kbps (80 and 96 kbps in the G.711.1 wideband extension)

G.722

ADPCM

48, 56 and 64 kbps

G.728

A form of CELP

16 kbps

Vocoders

Check out a complete list at http://en.wikipedia.org/wiki/List_of_codecs#Audio_codecs

A comparison of Internet audio compression formats

http://www.sericyb.com.au/audio.html
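As an illustration of the u-law companding named under G.711, the Python sketch below uses the continuous μ-law formula with μ = 255. This is not a bit-exact G.711 codec: the standard specifies a segmented 8-bit format, which is not reproduced here. The 64 kbps figure follows from 8000 samples per second at 8 bits each.

```python
import math

MU = 255.0              # companding constant used by u-law
SAMPLE_RATE_HZ = 8000   # telephone-band sampling rate
BITS_PER_SAMPLE = 8


def mu_law_compress(x):
    """Map x in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(bit_rate)                         # bits per second
print(mu_law_compress(0.01))            # quiet samples get a large share of the range
print(mu_law_expand(mu_law_compress(0.3)))
```

Companding spends more quantizer levels on small amplitudes, where speech samples are most frequent.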

Speech Coding: Free and Open Source Code

HawkVoice

http://hawksoft.com/hawkvoice/

Check out voice samples of HawkVoice™ codecs at

http://hawksoft.com/hawkvoice/codecs.shtml

Thank You

FIND OUT MORE AT...

1. http://ce.sharif.edu/~m_amiri/

2. http://www.dml.ir/

Next Session: Entropy Coding