Speech Processing Lecture Workshop

Organization: Speech Processing

Prerequisites

Introduction

Speech Production

Representation of Speech Signals

Speech Processing

Govind

Center for Computational Engineering & Networking

Amrita Vishwa Vidyapeetham



Outline

Introduction

Human Speech Production and Perception Systems

Representation of Speech in the Time and Frequency Domains

Speech Sounds and Features

Signal Processing Methods for Estimating Speech Features

Speech Processing Applications

Speech Recognition

Speech Synthesis





Prerequisites: S&S, DSP & ADSP

Prior Knowledge Required:

Signals and Systems

Digital Signal Processing

Advanced DSP






Signals and Systems

Classification of Signals

LTI systems

Correlation/Convolution Operations

Fourier Representation: FS, DTFS, DTFT, DFT, FFT

Z-transform

Concepts of Impulse Response, Frequency Response etc.






Digital Signal Processing

Sampling: Nyquist, Aliasing

FFT implementation of DFT

Design of FIR and IIR filters

Structures for realization of Filters

Multirate signal processing: Filter banks






Advanced DSP

Time-Frequency Analysis

TFA by STFT

TFA by Wigner Distributions

TFA by Wavelets






References

L. Rabiner, Biing-Hwang Juang and B. Yegnanarayana, "Fundamentals of Speech Recognition", Pearson Education Inc., 2009

Douglas O'Shaughnessy, "Speech Communication", University Press, 2001

Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Pearson Education Inc., 2004





Introduction

Information in Speech

Message

Language

Accent

Speaker

Emotions/Stress

Applications

Recognition

Speech Recognition

Speaker Recognition/Verification

Emotion Recognition, etc.

Synthesis

Text to Speech Synthesis

Speech Enhancement

Voice Conversion





Applications: Recognition

Input: Speech

Objective | Information Extracted

Message | "Author of the danger..."

Speaker | "It's Govind speaking"

Speaker claim has to be verified | "Hi Govind, your claim is accepted"





Applications: Synthesis

Text To Speech Synthesis

Input: Text ("Epochs Occur...")

Objective: Synthesize speech from the text

Speech Enhancement

Objective: Remove noise; remove reverberation; enhance the desired speaker's speech

Voice Conversion

Objective: Convert the source speaker's speech into the target speaker's speech





What Makes Automatic Processing of Speech Complicated?

It is an interdisciplinary area:

1. Signal Processing: The process of extracting relevant information from the speech signal

2. Physics: The science of understanding the relationship between the physical speech signal and the physiological mechanisms that produced it

3. Pattern Recognition: Grouping or classifying patterns of various events in speech

4. Communication and Information Theory: Deals with efficient ways of encoding and decoding the parameters of speech, and with efficient search for patterns of interest in speech (dynamic programming, Viterbi search, stack algorithms, etc.)

5. Linguistics: The relationship of sounds (phonology) to the syntax and semantics of a language, and the sense derived from the meaning (pragmatics)

6. Computer Science: The study of different algorithms for implementation in software or hardware

7. Psychology: Understanding the psychological state of the speaker or listener is helpful for tasks such as emotion analysis





Speaker-Listener Schematic Diagram in Speech Communication

Figure: Schematic diagram of speech communication. Figure courtesy: Rabiner et al.





Production-Perception Block Diagram

Figure: Speech production block diagram. Figure courtesy: Rabiner et al.





Speech Production

Figure: Speech production mechanism. Figure courtesy: Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Chapter 3, p. 58, Pearson Edu., Delhi





Mechanical Equivalent of Speech Production System

Figure: Speech production mechanism. Figure courtesy: Rabiner et al.





Spectro-Temporal Representation

Classification of Phonemes

Representation of Speech Signal

Figure: Speech signal in the time domain







Glottal Air Flow During Speech Production

Figure: Glottal air flow. Courtesy: Rabiner et al.







Glottal Air Flow: Graphical Illustration

Figure: Two panels plotted as amplitude against time (samples): "Speech Waveform" and "Glottal Flow: EGG". Diagram labels: Speech, EGG, Glottis Vibration.







Classification of Speech Sounds

Silence (S): No Speech is produced

Unvoiced (U): Vocal folds are not vibrating

Voiced (V): Periodic vibration of vocal cords

Figure: Speech signal in the time domain, with segments marked U (unvoiced), S (silence) and V (voiced)






Classification of Speech Sounds

Separation of voiced sounds from unvoiced sounds and silence is known as voiced-non-voiced detection

Issues in voiced-non-voiced detection:

Difficult to distinguish a weak unvoiced sound from silence

Difficult to distinguish weakly periodic voiced sounds from unvoiced sounds
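
One common way to approach silence/unvoiced/voiced labelling is to combine short-time energy with the zero-crossing rate (ZCR). The sketch below is illustrative only and is not taken from the slides: the function names, the 25 ms frame, the 10 ms hop and both thresholds are assumptions, and the test signal is synthetic (low-level noise for silence, stronger noise for an unvoiced sound, a 120 Hz sine for a voiced sound).

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def classify_frames(x, fs, frame_ms=25, hop_ms=10,
                    energy_floor_db=-40.0, zcr_threshold=0.25):
    """Label each frame 'S', 'U' or 'V'.

    Both thresholds are assumed values chosen for this toy signal.
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)

    # Short-time energy in dB relative to the loudest frame.
    energy = np.mean(frames ** 2, axis=1)
    energy_db = 10 * np.log10(energy / (np.max(energy) + 1e-12) + 1e-12)

    # Zero-crossing rate: fraction of adjacent sample pairs with a sign change.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Very low energy -> silence; otherwise high ZCR -> unvoiced, low ZCR -> voiced.
    return np.where(energy_db < energy_floor_db, "S",
                    np.where(zcr > zcr_threshold, "U", "V"))

# Synthetic stand-in for speech: 0.2 s each of silence, unvoiced, voiced.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(int(0.2 * fs)) / fs
silence = 1e-4 * rng.standard_normal(t.size)
unvoiced = 0.1 * rng.standard_normal(t.size)
voiced = 0.8 * np.sin(2 * np.pi * 120 * t)
labels = classify_frames(np.concatenate([silence, unvoiced, voiced]), fs)
print("".join(labels))
```

On real speech the two difficulties listed above show up directly: a weak fricative can fall under the energy floor, and a breathy or weakly periodic voiced segment can have a ZCR close to that of noise, so fixed thresholds like these are only a starting point.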







Spectrograms: Narrow-band & Wide-band
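
The narrow-band/wide-band distinction comes down to the analysis window length of the STFT: a long window gives fine frequency resolution (narrow-band, pitch harmonics resolved), while a short window gives fine time resolution (wide-band, formant structure and individual pitch periods visible). The sketch below uses `scipy.signal.spectrogram` on a synthetic 100 Hz pulse train as a stand-in for voiced speech. The 40 ms and 5 ms window lengths are typical textbook choices assumed here, not values from the slides, and `spectrogram_with_window` is a hypothetical helper name.

```python
import numpy as np
from scipy import signal

fs = 8000
# Harmonic-rich stand-in for voiced speech: one pulse every 10 ms (100 Hz).
x = np.zeros(fs)
x[::fs // 100] = 1.0

def spectrogram_with_window(x, fs, window_ms):
    """Spectrogram with a Hamming window of the given length and 50% overlap."""
    nperseg = int(fs * window_ms / 1000)
    return signal.spectrogram(x, fs=fs, window="hamming",
                              nperseg=nperseg, noverlap=nperseg // 2)

f_nb, t_nb, S_nb = spectrogram_with_window(x, fs, window_ms=40)  # narrow-band
f_wb, t_wb, S_wb = spectrogram_with_window(x, fs, window_ms=5)   # wide-band

df_nb, df_wb = f_nb[1] - f_nb[0], f_wb[1] - f_wb[0]   # frequency bin spacing
dt_nb, dt_wb = t_nb[1] - t_nb[0], t_wb[1] - t_wb[0]   # frame spacing in time
print(f"narrow-band: {df_nb:.0f} Hz bins, {1000 * dt_nb:.1f} ms frame step")
print(f"wide-band:   {df_wb:.0f} Hz bins, {1000 * dt_wb:.1f} ms frame step")
```

With these settings the narrow-band bins are finer than the 100 Hz harmonic spacing, so harmonics appear as horizontal lines; the wide-band bins are coarser than that spacing, so harmonics merge and the pulses show up as vertical striations instead.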







Spectral Envelope from a Long Segment of Speech

Figure: Magnitude plotted against frame index and frequency (Hz)







Classification of sound units




Representation of sound units in speech

Sounds are classified into vowels and consonants

Vowels: Produced by exciting a fixed vocal tract shape with quasi-periodic glottal pulses

Vowels are classified into front, mid and back based on the tongue-hump position

Front vowels: /i/ ("eve"), /I/ ("it"), /æ/ ("at"), /e/ ("hate")

Mid vowels: /a/ ("father"), /Λ/ ("up")

Back vowels: /U/ ("foot"), /u/ ("boot"), /o/ ("obey")

Another classification is based on the length of vowels: long and short
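
The description of vowel production (a fixed vocal tract shape excited by quasi-periodic glottal pulses) maps onto a simple source-filter simulation. The following is a minimal sketch under stated assumptions, not the author's method: the glottal source is idealised as an impulse train, the vocal tract as a cascade of second-order resonators, the formant frequencies 730, 1090 and 2440 Hz are commonly quoted average values for /a/, and the bandwidths and the helper names `resonator`, `synthesize_vowel` and `level_at` are assumptions made for illustration.

```python
import numpy as np
from scipy import signal

def resonator(freq_hz, bandwidth_hz, fs):
    """Second-order all-pole section with a single resonance (one formant)."""
    r = np.exp(-np.pi * bandwidth_hz / fs)      # pole radius from bandwidth
    theta = 2 * np.pi * freq_hz / fs            # pole angle from centre frequency
    a = np.array([1.0, -2 * r * np.cos(theta), r ** 2])
    b = np.array([np.sum(a)])                   # normalise to unity gain at DC
    return b, a

def synthesize_vowel(formants_hz, bandwidths_hz, f0=100, fs=8000, dur=0.5):
    """Impulse-train excitation passed through a cascade of formant resonators."""
    n = int(dur * fs)
    excitation = np.zeros(n)
    excitation[::fs // f0] = 1.0                # idealised glottal pulses
    y = excitation
    for f, bw in zip(formants_hz, bandwidths_hz):
        b, a = resonator(f, bw, fs)
        y = signal.lfilter(b, a, y)             # fixed vocal tract shape
    return y / np.max(np.abs(y))

fs = 8000
vowel = synthesize_vowel([730, 1090, 2440], [60, 90, 120], f0=100, fs=fs)

# Magnitude spectrum of the whole segment, to inspect the formant structure.
spectrum = np.abs(np.fft.rfft(vowel * np.hanning(vowel.size)))
freqs = np.fft.rfftfreq(vowel.size, d=1 / fs)

def level_at(freq_hz):
    """Spectral magnitude at the bin closest to freq_hz."""
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]

print(level_at(700) / level_at(1800))  # harmonic near F1 vs. a valley between F2 and F3
```

Changing only the formant list while keeping the excitation fixed changes the vowel quality, which is the sense in which each vowel corresponds to a fixed vocal tract shape.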

Diphthongs: Combination of two vowels

/ay/ as in "buy", /aw/ as in "down", /ey/ as in "bait", /o/ as in "boat", /ɔy/ as in "boy", etc.







Front Vowel

Figure: Speech signal and spectrogram for the front vowels /I/ ("it"), /e/ ("hate") and /i/ ("eve")







Vowel Analysis

Front vowels show a pronounced high-frequency resonance

Front vowels are distinguished from one another by the tongue height during vowel production

Mid vowels show a well-separated and balanced distribution of resonant frequencies

Back vowels show almost no energy beyond the low-frequency region







Diphthongs







Semivowels

Group of sounds consisting of /w/, /r/, /l/, /y/

Difficult to characterize because they are vowel-like in nature

Characterized by a gliding transition in the vocal tract area function between adjacent phonemes

Best described as transitional, vowel-like sounds







Nasal Consonants

Group of sounds consisting of /m/, /n/, /ŋ/

Produced with glottal excitation and the vocal tract totally constricted at some point along the oral passageway

The velum is lowered so that air flows through the nasal cavity instead of the blocked oral cavity

Due to the acoustic coupling of the closed oral cavity to the pharynx, anti-resonances are created

/m/, /n/ and /ŋ/ are produced by constriction at the lips, behind the teeth and at the velum, respectively.
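
In transfer-function terms, an anti-resonance is a pair of zeros, in contrast to the poles that produce resonances. The sketch below compares an all-pole response with the same response after a zero pair is added, using `scipy.signal.freqz`. All numbers are hypothetical: the two pole frequencies, the 1000 Hz zero and the bandwidths are picked only to make the dip visible and are not measurements of any particular nasal; `conj_pair` is an assumed helper name.

```python
import numpy as np
from scipy import signal

fs = 8000

def conj_pair(freq_hz, bandwidth_hz, fs):
    """Coefficients [1, c1, c2] of a polynomial with a conjugate root pair."""
    r = np.exp(-np.pi * bandwidth_hz / fs)
    theta = 2 * np.pi * freq_hz / fs
    return np.array([1.0, -2 * r * np.cos(theta), r ** 2])

# Denominator: two resonances (hypothetical values).
a = np.polymul(conj_pair(300, 80, fs), conj_pair(2200, 150, fs))
# Numerator with a zero pair: the anti-resonance (hypothetical 1000 Hz).
b_nasal = conj_pair(1000, 100, fs)
b_plain = np.array([1.0])

freqs = np.linspace(0, fs / 2, 512, endpoint=False)
_, h_plain = signal.freqz(b_plain, a, worN=freqs, fs=fs)
_, h_nasal = signal.freqz(b_nasal, a, worN=freqs, fs=fs)

mag_plain_db = 20 * np.log10(np.abs(h_plain))
mag_nasal_db = 20 * np.log10(np.abs(h_nasal))

# The zero pair carves a dip into the spectrum near its centre frequency.
dip_hz = freqs[np.argmin(mag_nasal_db - mag_plain_db)]
print(f"deepest attenuation near {dip_hz:.0f} Hz")
```

Plotting `mag_plain_db` and `mag_nasal_db` against `freqs` shows the same resonance peaks in both curves, with an additional valley in the second curve where the zero pair sits.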







Nasalized Vowels







Unvoiced Fricatives

Produced by exciting the vocal tract with a turbulent airflow through a narrow constriction

/f/ ("four"), /θ/ ("thing"), /s/ ("sat") and /sh/ ("shut") are the class of unvoiced fricative sounds

/f/: Constriction at the teeth (lower lip against the upper teeth)

/s/: Constriction near the middle of the oral cavity

/sh/: Constriction towards the back of the oral tract







Voiced Fricatives

/v/ ("vat"), /ð/ ("then"), /z/ ("zoo") and /zh/ ("azure") are the class of voiced fricative sounds

/v/: Constriction at the teeth (lower lip against the upper teeth)

/z/: Constriction near the middle of the oral cavity

/zh/: Constriction towards the back of the oral tract

Apart from the added glottal vibration, the place of articulation remains the same as for the corresponding unvoiced fricatives
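
The difference between an unvoiced fricative and its voiced counterpart can be illustrated with a toy excitation model: the same noise-shaping filter is used in both cases, and the voiced version adds a periodic component. This is a crude sketch under stated assumptions. The 2000 Hz high-pass filter standing in for the constriction, the 100 Hz sine standing in for glottal vibration, and the helper name `periodicity` are all assumptions; real voiced fricatives also have noise that is modulated by the glottal cycle, which is not modelled here.

```python
import numpy as np
from scipy import signal

fs = 8000
f0 = 100                     # assumed pitch of the voicing component
n = fs // 2                  # 0.5 s of signal
rng = np.random.default_rng(0)

# Hypothetical frication filter: high-pass shaping of a white-noise source.
b, a = signal.butter(4, 2000, btype="highpass", fs=fs)
frication = signal.lfilter(b, a, rng.standard_normal(n))

t = np.arange(n) / fs
voicing = np.sin(2 * np.pi * f0 * t)       # stand-in for glottal vibration

unvoiced = frication                       # noise only
voiced = frication + voicing               # same frication plus voicing

def periodicity(x, lag):
    """Normalised autocorrelation of x at the given lag (close to 0 for noise)."""
    x = x - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

pitch_lag = fs // f0                       # one pitch period, in samples
print("unvoiced:", round(periodicity(unvoiced, pitch_lag), 3))
print("voiced:  ", round(periodicity(voiced, pitch_lag), 3))
```

The autocorrelation at the pitch lag stays near zero for the noise-only signal and rises clearly once the periodic component is present, which is one simple cue for telling the two classes apart.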


Date post:	20-Jan-2015
Category:	Engineering
Upload:	dgovind