
Speech processinglecworkshop

Date post: 20-Jan-2015
Upload: dgovind
Description:
The objective of the presentation is to provide an overview of speech processing.
Transcript
Page 1: Speech processinglecworkshop

Organization: Speech Processing

Prerequisites

Introduction

Speech Production

Representation of Speech Signals

Speech Processing

Govind

Center for Computational Engineering & Networking

Amrita Vishwa Vidyapeetham

Govind CEN, Amrita Vishwa Vidyapeetham

Page 2: Speech processinglecworkshop


Outline

Introduction

Human Speech Production and Perception Systems

Representation of Speech in the Time and Frequency Domains

Speech Sounds and Features

Signal Processing Methods for Estimating Speech Features

Speech Processing Applications

Speech Recognition

Speech Synthesis

Page 3: Speech processinglecworkshop

Prerequisites: S&S, DSP & ADSP

Prior Knowledge Required:

Signals and Systems

Digital Signal Processing

Advanced DSP

Page 4: Speech processinglecworkshop

Prerequisites: S&S, DSP & ADSP

Signals and Systems

Classification of Signals

LTI systems

Correlation/Convolution Operations

Fourier Representation: FS, DTFS, DTFT, DFT, FFT

Z-transform

Concepts of Impulse Response, Frequency Response etc.

Page 5: Speech processinglecworkshop

Prerequisites: S&S, DSP & ADSP

Digital Signal Processing

Sampling: Nyquist, Aliasing

FFT implementation of DFT

Design of FIR and IIR filters

Structures for realization of Filters

Multirate signal processing: Filter banks

Page 6: Speech processinglecworkshop

Prerequisites: S&S, DSP & ADSP

Advanced DSP

Time-Frequency Analysis

TFA by STFT

TFA by Wigner Distributions

TFA by Wavelets

Page 7: Speech processinglecworkshop

Prerequisites: S&S, DSP & ADSP

References

L. Rabiner, Biing-Hwang Juang and B. Yegnanarayana, "Fundamentals of Speech Recognition", Pearson Education Inc., 2009

Douglas O’Shaughnessy, "Speech Communication", University Press, 2001

Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Pearson Education Inc., 2004

Page 8: Speech processinglecworkshop

Introduction

Information in Speech

Message

Language

Accent

Speaker

Emotions/Stress

Applications

Recognition

Speech Recognition

Speaker Recognition/Verification

Emotion Recognition, etc.

Synthesis

Text to Speech Synthesis

Speech Enhancement

Voice Conversion

Page 9: Speech processinglecworkshop

Applications: Recognition

Input: speech

Objective | Information Extracted

Message | "Author of the danger..."

Speaker | "It's Govind speaking"

Speaker claim has to be verified | "Hi Govind, your claim is accepted"

Page 10: Speech processinglecworkshop

Applications: Synthesis

Input | Objective | Output

Text-to-Speech Synthesis

Text ("Epochs Occur...") | Synthesize text

Speech Enhancement

Remove noise

Remove reverberation

Enhance the desired speaker's speech

Voice Conversion

Convert the source speaker's speech to the target speaker's speech

Page 11: Speech processinglecworkshop

What Makes Automatic Processing of Speech Complicated?

It is an interdisciplinary area:

1 Signal Processing: The process of extracting relevant information from the speech signal.

2 Physics: The science of understanding the relationship between the physical speech signal and the physiological mechanisms that produced it.

3 Pattern Recognition: Grouping or classifying patterns of various events in speech.

4 Communication and Information Theory: Efficient ways of encoding and decoding the parameters of speech, and efficient search for patterns of interest in speech (dynamic programming, Viterbi search, stack algorithms, etc.).

5 Linguistics: The relationship between sounds (phonology) and the syntax and semantics of a language, and the sense derived from the meaning (pragmatics).

6 Computer Science: The study of different algorithms for implementation in software or hardware.

7 Psychology: Understanding the psychological state of the speaker or listener helps in tasks such as emotion analysis.

Page 12: Speech processinglecworkshop

Speaker-Listener Schematic Diagram in Speech Communication

Figure: Schematic diagram of speech communication. Figure courtesy: Rabiner et al.

Page 13: Speech processinglecworkshop

Production-Perception Block Diagram

Figure: Speech production block diagram. Figure courtesy: Rabiner et al.

Page 14: Speech processinglecworkshop

Speech Production

Figure: Speech production mechanism. Figure courtesy: Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Chapter 3, p. 58, Pearson Edu., Delhi

Page 15: Speech processinglecworkshop

Mechanical Equivalent of Speech Production System

Figure: Speech production mechanism. Figure courtesy: Rabiner et al.

Page 16: Speech processinglecworkshop

Spectro-Temporal Representation

classification of Phonemes

Representation of Speech Signal

Figure: Speech signal in the time domain

Page 17: Speech processinglecworkshop

Glottal Air Flow During Speech Production

Figure: Glottal air flow. Figure courtesy: Rabiner et al.

Page 18: Speech processinglecworkshop

Glottal Air Flow: Graphical Illustration

Figure: Two panels, "Speech Waveform" and "Glottal Flow: EGG", each showing Amplitude against Time (Samples, x 10^4), with the labels Speech, EGG and Glottis Vibration.

Page 19: Speech processinglecworkshop

Classification of Speech Sounds

Silence (S): No Speech is produced

Unvoiced (U): Vocal folds are not vibrating

Voiced (V): Periodic vibration of vocal cords

Figure: Speech signal in the time domain, with segments marked S, U and V

Page 20: Speech processinglecworkshop


Classification of Speech Sounds

Separation of voiced sounds from unvoiced sounds and silence is known as voiced-non-voiced detection.

Issues in voiced-non-voiced detection:

It is difficult to distinguish weak unvoiced sounds from silence.

It is difficult to distinguish weakly periodic voiced sounds from unvoiced sounds.
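One common starting point for this detection task, not taken from the slides, is to compute short-time energy and zero-crossing rate for each frame. The sketch below is a minimal illustration on synthetic data. The function names, the 25 ms frame size and both thresholds (`energy_floor`, `zcr_split`) are assumptions chosen for this toy signal; real speech needs tuning precisely because of the weak unvoiced and weakly periodic cases listed above.

```python
import math
import random


def frame_features(signal, frame_len, hop):
    """Return (short-time energy, zero-crossing rate) for each frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def classify(energy, zcr, energy_floor=1e-4, zcr_split=0.25):
    """Label a frame S, U or V. Both thresholds are illustrative assumptions."""
    if energy < energy_floor:
        return "S"                      # too little energy: silence
    return "U" if zcr > zcr_split else "V"  # noisy frames cross zero often


fs = 8000                               # assumed sampling rate in Hz
rng = random.Random(0)
silence = [0.0] * 800
voiced = [0.5 * math.sin(2 * math.pi * 120 * n / fs) for n in range(800)]
unvoiced = [0.1 * rng.uniform(-1, 1) for _ in range(800)]  # noise stand-in
signal = silence + voiced + unvoiced

labels = [classify(e, z) for e, z in frame_features(signal, 200, 200)]
print("".join(labels))
```

On this clean synthetic input the three segments separate easily. Lowering the noise amplitude toward `energy_floor` reproduces the first issue above: the unvoiced frames start to be labelled as silence.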

Page 21: Speech processinglecworkshop

Spectrograms: Narrow-band & Wide-band
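The two spectrogram types differ mainly in the length of the analysis window: a long window (narrow-band) resolves individual pitch harmonics, while a short window (wide-band) smears them together but follows changes in time more closely. The sketch below illustrates this on one frame of a synthetic harmonic signal. The 200 Hz fundamental, the 30 ms and 4 ms window lengths and all function names are assumptions for illustration, not values from the slides.

```python
import cmath
import math


def hamming(n_samples):
    """Symmetric Hamming window."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]


def magnitude_at(frame, freq_hz, fs):
    """Magnitude of the Hamming-windowed spectrum of one frame at freq_hz."""
    window = hamming(len(frame))
    acc = sum(
        w * x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
        for n, (w, x) in enumerate(zip(window, frame))
    )
    return abs(acc)


def valley_to_peak(signal, fs, window_ms, f0):
    """Spectral magnitude midway between harmonics 1 and 2, relative to harmonic 1."""
    n_samples = int(round(fs * window_ms / 1000))
    frame = signal[:n_samples]
    return magnitude_at(frame, 1.5 * f0, fs) / magnitude_at(frame, f0, fs)


fs = 8000   # assumed sampling rate in Hz
f0 = 200    # assumed fundamental frequency in Hz
# Five equal-amplitude harmonics stand in for a voiced sound.
signal = [
    sum(math.cos(2 * math.pi * k * f0 * n / fs) for k in range(1, 6))
    for n in range(fs // 10)
]

narrow = valley_to_peak(signal, fs, 30, f0)  # long window: narrow-band
wide = valley_to_peak(signal, fs, 4, f0)     # short window: wide-band
print(f"narrow-band valley/peak: {narrow:.4f}")
print(f"wide-band valley/peak:   {wide:.4f}")
```

A small ratio means a deep valley between harmonics, so the harmonics appear as separate horizontal lines. The long window should give a far smaller ratio than the short one, which is the trade-off behind the two spectrogram styles.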

Page 22: Speech processinglecworkshop

Spectral Envelope from a Long Segment of Speech

Figure: Magnitude plotted against Frequency (Hz) and Frame Index

Page 23: Speech processinglecworkshop

Classification of sound units


Page 24: Speech processinglecworkshop

Representation of sound units in speech

Sounds are classified into vowels and consonants.

Vowels: Produced by exciting a fixed vocal tract shape with quasi-periodic glottal pulses

Vowels are classified into front, mid and back based on the tongue hump position.

Front vowels: /i/ ("eve"), /I/ ("it"), /æ/ ("at"), /e/ ("hate")

Mid vowels: /a/ ("father"), /Λ/ ("up")

Back vowels: /U/ ("foot"), /u/ ("boot"), /o/ ("obey")

Another classification is based on the length of vowels: long and short.

Diphthongs: Combination of two vowels

/ay/ as in "buy", /aw/ as in "down", /ey/ as in "bait", /o/ as in "boat", /ɔy/ as in "boy", etc.
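The description of vowel production above (a fixed vocal tract shape excited by quasi-periodic glottal pulses) can be sketched as an impulse train passed through a cascade of resonators. This is a rough illustration under stated assumptions, not the method of the slides: the function names are hypothetical, the formant frequencies and bandwidths in `A_FORMANTS` are illustrative values in the range often quoted for an /a/-like vowel, and the glottal pulses are idealized as unit impulses.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)      # pole radius from bandwidth
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / fs)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, f0=100, fs=8000, duration_s=0.5):
    """Excite a fixed cascade of formant resonators with a periodic impulse train."""
    period = fs // f0                               # pitch period in samples
    n_samples = int(fs * duration_s)
    excitation = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    speech = excitation
    for freq_hz, bandwidth_hz in formants:          # fixed vocal tract shape
        speech = resonator(speech, freq_hz, bandwidth_hz, fs)
    peak = max(abs(v) for v in speech)
    return [v / peak for v in speech]               # normalize to unit peak


# Assumed (formant frequency, bandwidth) pairs in Hz, for illustration only.
A_FORMANTS = [(730, 60), (1090, 90), (2440, 120)]
vowel = synthesize_vowel(A_FORMANTS)
print(len(vowel), "samples, pitch period", 8000 // 100, "samples")
```

Changing only the formant list while keeping the excitation fixed changes the vowel quality, and changing only `f0` changes the pitch, which is why the excitation and the vocal tract shape are usually described separately.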

Page 25: Speech processinglecworkshop

Front Vowel

Figure: Speech signal and spectrogram for the front vowels I ("It"), e ("Hate") and i ("eve")

Page 26: Speech processinglecworkshop

Vowel Analysis

Front vowels are found to show high-frequency resonances.

Front vowels are distinguished from one another by the tongue height during vowel production.

Mid vowels are found to show a well-separated and balanced distribution of resonant frequencies.

Back vowels show almost no energy beyond the low-frequency regions.
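Observations like these rest on locating the resonant frequencies of a speech segment. One widely used way to estimate a resonance, which the slides do not name and which is swapped in here purely for illustration, is linear prediction: fit an all-pole model and read the resonance from the pole angle. The sketch below uses a synthetic single-resonance signal and a second-order model; the function names, the 500 Hz and 80 Hz test values and the model order are assumptions, and real vowels need a higher order with root finding.

```python
import math


def resonator_impulse_response(freq_hz, bandwidth_hz, fs, n_samples):
    """Impulse response of a two-pole resonator (synthetic test signal)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    c1 = 2 * r * math.cos(2 * math.pi * freq_hz / fs)
    c2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for n in range(n_samples):
        y = (1.0 if n == 0 else 0.0) + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag]."""
    return [
        sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a[0..order] with a[0] = 1."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a


def estimate_resonance(signal, fs):
    """Resonance frequency and bandwidth (Hz) from a second-order LPC fit."""
    a = levinson_durbin(autocorrelation(signal, 2), 2)
    radius = math.sqrt(a[2])                   # pole radius
    theta = math.acos(-a[1] / (2 * radius))    # pole angle in radians
    return theta * fs / (2 * math.pi), -math.log(radius) * fs / math.pi


segment = resonator_impulse_response(500.0, 80.0, 8000, 2000)
freq_hz, bandwidth_hz = estimate_resonance(segment, 8000)
print(f"estimated resonance: {freq_hz:.1f} Hz, bandwidth: {bandwidth_hz:.1f} Hz")
```

Applied frame by frame to recorded vowels with a suitable model order, the same idea gives the resonance positions that distinguish front, mid and back vowels.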

Page 27: Speech processinglecworkshop

Diphthongs

Page 28: Speech processinglecworkshop

Semivowels

Group of sounds consisting of /w/, /r/, /l/, /y/

Difficult to characterize because they are vowel-like in nature

Characterized by a gliding transition in the vocal tract area function between adjacent phonemes

Best described as transitional, vowel-like sounds

Page 29: Speech processinglecworkshop

Nasal Consonants

Group of sounds consisting of /m/, /n/, /η/

Produced with glottal excitation and the vocal tract totally constricted at some point along the oral passageway

The velum is lowered so that air flows through the nasal cavity while the oral passage is blocked

Due to the acoustic coupling of the oral cavity to the pharynx, anti-resonances are created

/m/, /n/ and /η/ are produced by constriction at the lips, behind the teeth and at the velum, respectively.

Page 30: Speech processinglecworkshop

Nasalized Vowels

Page 31: Speech processinglecworkshop

Unvoiced Fricatives

Produced by exciting the vocal tract with a turbulent airflow through a narrow constriction

/f/ ("four"), /θ/ ("thing"), /s/ ("sat") and /sh/ ("shut") form the class of unvoiced fricative sounds

/f/: Constriction at the teeth

/s/: Constriction near the middle of the oral cavity

/sh/: Constriction at the end of the oral tract

Page 32: Speech processinglecworkshop

Voiced Fricatives

/v/ ("vat"), /δ/ ("then"), /z/ ("zoo") and /zh/ ("azure") form the class of voiced fricative sounds

/v/: Constriction at the teeth

/z/: Constriction near the middle of the oral cavity

/zh/: Constriction at the end of the oral tract

Apart from the added glottal vibration, the place of articulation is the same as for the corresponding unvoiced fricatives
