Date post: 20-Jun-2020
Hidden Markov Model (HMM) based Speech Synthesis for Urdu Language Presenter: Dr. Tania Habib
Transcript
Page 1: Hidden Markov Model (HMM) based Speech Synthesis for Urdu ...cle.org.pk/research/presentations/Hidden Markov... · Hidden Markov Model (HMM) based Speech Synthesis for Urdu Language

Hidden Markov Model (HMM) based Speech Synthesis for Urdu Language

Presenter:

Dr. Tania Habib

Outline:

• Overview

• Unit selection vs HMM based Speech Synthesis (HTS) [1]

• Development

• Requirements for Voice building

• Data Set

• Challenges

• Subjective Evaluation

• Erroneous Words

• Summary

Speech Synthesis Overview:

Text to be Synthesized → Natural Language Processing (NLP) → Speech Synthesis Engine → Synthesized Speech

Types of Speech Synthesis:

• Rule-based (formant synthesis): each phonetic unit is hand-crafted by rules.

• Corpus-based:

  • Concatenative synthesis (unit selection): high-quality speech can be synthesized using waveform concatenation algorithms, but a large amount of speech data is necessary to obtain various voices.

  • Statistical parametric synthesis (HMM-based): speech parameters are generated from statistical models, and voice quality can easily be changed by transforming the HMM parameters.

Unit Selection vs. HTS

Unit Selection:

• Advantages: high quality at the waveform level (specific domain)

• Disadvantages: large footprint, discontinuous, unstable quality

HTS:

• Advantages: small footprint, smooth, stable quality

• Disadvantages: vocoder sound (domain-independent)

HTS Overview:

[Figure: block diagram of the HTS system, split into a training part and a synthesis part.]

• Training part: speech database → excitation parameter extraction and spectral parameter extraction → training HMMs (from excitation parameters, spectral parameters and labels) → context-dependent HMMs and state duration models

• Synthesis part: text → text analysis → labels → parameter generation from HMMs → excitation parameters and spectral parameters → excitation generation and synthesis filter → synthesized speech

[HTS Slides released by HTS Working Group, slide no. 21]

Preliminary requirements for the HTS toolkit:

1. Annotated training data.

2. Speech features (MFCC, F0 and duration) defined for model training.

3. Unique context-dependent and context-independent phonemes sorted out of the training data for model training.

4. A unified question file covering spectrum, F0 and duration for context clustering.
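Requirement 3 can be sketched as follows. This is a minimal illustration, not the toolkit's own script: it assumes full-context labels in which the current phoneme sits between `-` and `+`, and the function name `collect_model_lists` and the shortened sample labels are hypothetical.

```python
import re

# Assumption: the current phoneme sits between '-' and '+' in the leading
# quinphone field of each label (e.g. SIL^A-L+I_I=A@...).
CURRENT_PHONE = re.compile(r"-([^+]+)\+")


def collect_model_lists(labels):
    """Return (unique context-independent phonemes, unique full-context labels)."""
    full_context = sorted(set(labels))
    monophones = set()
    for label in full_context:
        match = CURRENT_PHONE.search(label)
        if match:
            monophones.add(match.group(1))
    return sorted(monophones), full_context


# Shortened, illustrative labels (real labels carry many more fields).
labels = [
    "x^x-SIL+A=L@1_0",
    "x^SIL-A+L=I_I@1_1",
    "SIL^A-L+I_I=A@1_2",
    "x^SIL-A+L=I_I@1_1",  # repeated context, listed only once in the output
]
mono, full = collect_model_lists(labels)
print(mono)
print(len(full))
```

The two lists correspond to the context-independent and context-dependent model lists that training needs.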

Data Set Used:

• Source: paragraphs taken from the Urdu Qaida of Grades 2 and 4

• Duration: 30 minutes

• Total number of utterances: 347

• Recording parameters:

  Sample rate: 8 kHz (up-sampled to 48 kHz)

  Channel: Mono

  Recording format: .WAV

  Speaker: Native Urdu female speaker

Challenges:

• Generation of the full-context style labels

• Addition of prosodic layers

  • Segment

  • Stress

  • Syllable

  • Word

• Unbalanced training data

• Defining the question set (context clustering)

Full-Context Format (1/2):

SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0/H:9=5^1=2|NONE/I:8=6/J:17+11-2

• Segmental context (the leading SIL^A-L+I_I=A field): current phoneme, previous two phonemes, next two phonemes

• Supra-segmental context (the remaining fields): syllable, stress, word, phrase, POS
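The split between segmental and supra-segmental context can be made concrete with a small parser. This is a sketch under the assumption that every label starts with a field of the form LL^L-C+R=RR@, as in the example label above; the function name and the group names are hypothetical.

```python
import re

# Assumed layout of the leading field: LL^L-C+R=RR@ (two previous phonemes,
# current phoneme, two next phonemes), as in the example label.
QUINPHONE = re.compile(
    r"^(?P<ll>[^\^]+)\^(?P<l>[^-]+)-(?P<c>[^+]+)\+(?P<r>[^=]+)=(?P<rr>[^@]+)@"
)


def split_label(label):
    """Return (segmental context dict, supra-segmental remainder string)."""
    match = QUINPHONE.match(label)
    if match is None:
        raise ValueError(f"not a full-context label: {label!r}")
    return match.groupdict(), label[match.end():]


label = (
    "SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I"
    "/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0"
    "/H:9=5^1=2|NONE/I:8=6/J:17+11-2"
)
segmental, supra = split_label(label)
print(segmental)
print(supra)
```

Here `segmental["c"]` is the current phoneme, and `supra` holds everything from the position-in-syllable field onward.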

Full-Context Format (2/2):

x^x-SIL+A=L@1_0/A:0_0_0/B:0-0-0@1-0&1-1#1-1$1-1!0-0;0- …

x^SIL-A+L=I_I@1_1/A:0_0_0/B:0-0-1@1-2&1-9#1-3$1-1!0-2;0- …

SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0 …

A^L-I_I+A=P@2_1/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0- …
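The segmental part of these labels follows a sliding five-phoneme window. A minimal sketch, assuming `x` is the padding symbol for missing neighbours (as the first two labels suggest) and using a hypothetical helper name:

```python
def quinphone_fields(phonemes, pad="x"):
    """Build the LL^L-C+R=RR field for every phoneme in the sequence."""
    padded = [pad, pad] + list(phonemes) + [pad, pad]
    fields = []
    for i in range(2, len(padded) - 2):
        ll, l, c, r, rr = padded[i - 2 : i + 3]
        fields.append(f"{ll}^{l}-{c}+{r}={rr}")
    return fields


# Phoneme sequence read off the labels above; the real utterance continues
# past P, so only the first four windows match the slide.
fields = quinphone_fields(["SIL", "A", "L", "I_I", "A", "P"])
for field in fields[:4]:
    print(field)
```

The supra-segmental fields (syllable, stress, word, phrase) would be appended to each of these by the label generator and are not modelled here.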


Questions on Segmental/Prosodic Layers:

Phoneme

• {preceding, succeeding} two phonemes

• current phoneme

Syllable

• # of phonemes in {preceding, current, succeeding} syllable

• {accent, stress} of {preceding, current, succeeding} syllable

• Position of current syllable in current word

• # of {preceding, succeeding} {accented, stressed} syllables in current phrase

• # of syllables {from previous, to next} {accented, stressed} syllable

• Vowel within current syllable

Word

• Part of speech of {preceding, current, succeeding} word

• # of syllables in {preceding, current, succeeding} word

• Position of current word in current phrase

• # of {preceding, succeeding} content words in current phrase

• # of words {from previous, to next} content word

Phrase

• # of syllables in {preceding, current, succeeding} phrase

…..
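Phoneme-level questions of this kind are usually written as wildcard patterns over the label string. The sketch below generates such lines in the `QS "name" {patterns}` style used by the question files shipped with the HTS demos; the vowel grouping and the helper name are assumptions for illustration, not the question set used in this work.

```python
# Wildcard templates for the five segmental positions of LL^L-C+R=RR@...
POSITION_PATTERNS = {
    "LL": "{p}^*",
    "L": "*^{p}-*",
    "C": "*-{p}+*",
    "R": "*+{p}=*",
    "RR": "*={p}@*",
}


def make_questions(class_name, phonemes):
    """Return one QS line per segmental position for a phoneme class."""
    lines = []
    for position, template in POSITION_PATTERNS.items():
        patterns = ",".join(template.format(p=p) for p in phonemes)
        lines.append(f'QS "{position}-{class_name}" {{{patterns}}}')
    return lines


# Hypothetical grouping: a few CISAMPA vowel symbols.
questions = make_questions("Vowel", ["A", "A_A", "I_I"])
for line in questions:
    print(line)
```

Supra-segmental questions (syllable, word, phrase) would need analogous patterns over the later label fields.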

Addition of Stress/Syllable Layer:

• Added layers: Stress, Syllable

Unbalanced Training data:

[Figure: bar chart of occurrence counts per phoneme; count axis from 0 to 2000.]

Figure. Phoneme Coverage for the 30-min speech data

• High occurrence for vowels

• Some phonemes did not occur in the data at all: {J_H, L_H, M_H, N_G_H, R_H, Y, Z_Z} [2]
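A coverage check of this kind can be sketched in a few lines. The utterances and the inventory below are toy data chosen for illustration, not the 30-minute corpus, and the function name is hypothetical.

```python
from collections import Counter


def phoneme_coverage(utterances, inventory):
    """Count phoneme occurrences and list inventory phonemes never seen."""
    counts = Counter(p for utterance in utterances for p in utterance)
    missing = sorted(set(inventory) - set(counts))
    return counts, missing


# Toy phoneme transcriptions in CISAMPA symbols (illustrative only).
utterances = [
    ["SIL", "A", "L", "I_I", "A", "P"],
    ["K", "A", "R"],
]
inventory = ["A", "I_I", "K", "L", "P", "R", "J_H", "Z_Z"]

counts, missing = phoneme_coverage(utterances, inventory)
print(counts.most_common(3))
print(missing)
```

Running the same check on a candidate recording script before recording shows which phonemes still need carrier sentences.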

Context Clustering (Question Set):

• With 53 different questions, the number of possible context combinations is enormous.

• Possible contexts = $C^n$

where C = total count of basic phonetic units,

n = total number of questions

With C = 66 and only the segmental context (n = 5), the number of possible models is:

$66^5 \approx 1252$ million

• If we consider all the contexts, the number is practically infinite.

Solution:

• Record data with maximum phoneme coverage at the tri-phone or di-phone level.

• Apply a context clustering technique to classify and share acoustically similar models.
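The segmental figure can be checked directly. This is plain arithmetic on the numbers given on the slide; the function name is hypothetical.

```python
def possible_contexts(num_units, num_positions):
    """Number of distinct contexts: C ** n."""
    return num_units ** num_positions


segmental = possible_contexts(66, 5)
print(f"{segmental:,}")  # 1,252,332,576

# The corpus has 347 utterances, so only a tiny fraction of these
# contexts can ever be observed in training.
utterances = 347
print(segmental // utterances)
```

This gap between possible and observed contexts is why acoustically similar models have to share parameters through clustering.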

Subjective Evaluation:

• Testing methodology: Mean Opinion Score (MOS) [3] for naturalness and intelligibility

• Naturalness: How close does the speech sound to speech produced by a human?

• Intelligibility: How easily was the word recognized?

Subjective Testing (Results):

Listener Type    MOS Naturalness    MOS Intelligibility
Technical 1      3.23               3.65
Linguistic 1     2.82               3.66
Linguistic 2     2.86               3.58
Linguistic 3     3.48               3.52

Table 1. Mean Opinion Score (MOS) results of four listeners
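For a single summary figure, the per-listener scores in Table 1 can be averaged. The averages below are derived here from the table and are not reported on the slides; they weight each listener equally.

```python
from statistics import mean

# (naturalness, intelligibility) per listener, copied from Table 1.
scores = {
    "Technical 1": (3.23, 3.65),
    "Linguistic 1": (2.82, 3.66),
    "Linguistic 2": (2.86, 3.58),
    "Linguistic 3": (3.48, 3.52),
}

naturalness = mean(n for n, _ in scores.values())
intelligibility = mean(i for _, i in scores.values())

print(f"naturalness:     {naturalness:.4f}")
print(f"intelligibility: {intelligibility:.4f}")
```

Both averages land between 3 and 4, with intelligibility about half a point above naturalness.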

Erroneous words:

Nastalique Style    CISAMPA (Correct)    Listened (Incorrect)    Coverage (%)
طرف                 T_DARAF              T_DALAF                 5.92
گا                  GA_A                 D_DA_A                  1.35
معلوم               MAYLU_UM             MAT_DLU_UM              0.00
تھے                 T_D_HA_Y             T_SA_Y                  0.66
رزی                 RAZI_I               RAD_DI_I                0.88
کیونکہ              KIU_U_NKA_Y          T_SU_NKA_Y              0.15
حق                  HAQ                  HABS                    0.46
بعد                 BAYD_D               BAD_D                   0.00
خیال                XAJA_AL              FIJA_AL                 0.50

Table 2. Synthesized words with errors
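To see which phoneme was misheard in a row of Table 2, the two transcriptions can be split into phoneme symbols and compared. This is a sketch: the greedy longest-match tokenizer and the small symbol inventory are assumptions, and a full CISAMPA inventory [2] would be needed for the remaining rows.

```python
from difflib import SequenceMatcher


def tokenize(word, inventory):
    """Greedy longest-match split of a CISAMPA string into phoneme symbols."""
    symbols = sorted(inventory, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(word):
        for symbol in symbols:
            if word.startswith(symbol, i):
                tokens.append(symbol)
                i += len(symbol)
                break
        else:
            raise ValueError(f"unknown symbol at position {i} in {word!r}")
    return tokens


def substitutions(correct, heard, inventory):
    """Return (expected, heard) phoneme runs wherever the two differ."""
    a, b = tokenize(correct, inventory), tokenize(heard, inventory)
    opcodes = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return [(a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]


# Partial inventory covering only the first two rows of Table 2.
inventory = {"T_D", "D_D", "A_A", "A", "R", "L", "F", "G"}
print(substitutions("T_DARAF", "T_DALAF", inventory))
print(substitutions("GA_A", "D_DA_A", inventory))
```

For the first row this isolates R heard as L, and for the second G heard as D_D.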

Some Synthesized Examples:

Seen Context:

Un-seen Context:

Different Carrier Word:

Synthesized: Training Set:

Summary:

• Text-to-Speech Synthesis (TTS): concatenative and parametric (HMM-based)

• Requirements for voice building: annotated speech corpus, speech features, question file

• Challenges: full-context style labels, addition of prosodic layers, question file for context clustering

• Subjective Evaluation

• Erroneous words

References:

[1] H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in Proc. of Sixth ISCA Workshop on Speech Synthesis, Bonn, Germany, August 2007.

[2] "IPA to CISAMPA Conversion Chart," Center for Language Engineering, UET, Lahore. [Online]. Available: http://www.cle.org.pk/resources/CISAMPA.pdf. [Accessed 3 March 2014].

[3] M. Viswanathan and M. Viswanathan, "Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale," Computer Speech & Language, vol. 19, no. 1, pp. 55-83, 2005.

