A Text-to-Speech Synthesis System Presented By: Michael Beddaoui Abdel-Aziz El-Solh.

A Text-to-Speech Synthesis System

Presented By:

Michael Beddaoui

Abdel-Aziz El-Solh

Presentation Outline

Introduction
Background
3 Components of TTS System
– Text Pre-processing (Aziz)
– Prosody (Mike)
– Concatenation (Mike)
Summary
What has been done / Future Work
Conclusion
Questions

What is a TTS System?

Definition:
– A system which takes as input a sequence of words and converts them to speech

Applications:
– Services for the hearing impaired
– Reading email aloud

Commercial TTS Systems:
– Festival
– Bell Labs TTS

Different TTS Systems

Phoneme-Based TTS System

Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (39 phonemes in English)

Disadvantage:
– Phonemes ignore transitional sounds!

Different TTS Systems (cont’d)

Diphone-Based TTS System

Diphones are:
– Made up of 2 phonemes
– Incorporate transitional sound
– Make for better sounding speech

Disadvantage:
– Over 1500 diphones in the English language!

Fundamental Components

[Block diagram: TTS System: words → Text Pre-processing → Prosody → Concatenation]

Text Pre-Processing

Input
– String of characters (sentence)

Output
– String of diphone symbols

Objective
– Perform sentence level analysis
  • Punctuation marks
  • Pauses between words
– Convert all input to corresponding diphones

Text Pre-Processing (Block Diagram)

[Block diagram: Word Segmenter → Acronym Converter → Number Converter → Word to Diphone Translator (Phonetization), backed by the Diphone Dictionary → MLDS]

Number Converter

Replace numerals with their textual versions
– 100 → one hundred

Handle fractional and decimal numbers
– 0.25 → point two five
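The two conversions above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names are hypothetical, the supported range (0 to 999,999) is an assumption, and fractions written with a slash are not handled. A leading zero before the decimal point is dropped to match the slide's "point two five" example.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n):
    """Spell out an integer in the assumed range 0..999999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + int_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + int_to_words(rest)
    return int_to_words(thousands) + " thousand" + tail


def number_to_words(token):
    """Convert a numeral such as '100' or '0.25' to its textual version."""
    whole, _, frac = token.partition(".")
    words = []
    # Skip a zero integer part when a decimal part follows ("0.25").
    if whole and (int(whole) != 0 or not frac):
        words.append(int_to_words(int(whole)))
    if frac:
        words.append("point")
        words.extend(ONES[int(digit)] for digit in frac)  # digit by digit
    return " ".join(words)
```

For example, `number_to_words("100")` gives "one hundred" and `number_to_words("0.25")` gives "point two five".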



Acronym Converter

Replace acronyms with single letter components
– A.B.C. → A B C

Change abbreviations to full textual format
– Mr. → Mister
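A minimal sketch of both rules, assuming acronyms are written with a period after every letter and abbreviations come from a small lookup table. The table contents and the name `expand_token` are hypothetical; only the "Mr." entry comes from the slide.

```python
import re

# Hypothetical abbreviation table; a real system would need a larger one.
ABBREVIATIONS = {"Mr.": "Mister", "Dr.": "Doctor", "St.": "Street"}

# Two or more letter-period pairs, e.g. "A.B.C."
ACRONYM = re.compile(r"^(?:[A-Za-z]\.){2,}$")


def expand_token(token):
    """Expand one segment if it is an abbreviation or an acronym."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if ACRONYM.match(token):
        # Replace the periods with spaces and collapse to single letters.
        return " ".join(token.replace(".", " ").split())
    return token
```

Ordinary words pass through unchanged, so the function can be applied to every segment in turn.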



Word Segmenter

Divide sentence into word segments
– Special delimiter to separate segments (e.g. ‘||’)

Segments can be:
– A single word
– An acronym
– A numeral

Identify punctuation marks
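The segmentation step might look like the sketch below. The regular expression and the names `SEGMENT`, `PUNCTUATION` and `segment` are assumptions made for illustration; the slides do not say how segments are recognized. Abbreviations such as "Mr." are not treated specially here, so their period would come out as a separate punctuation segment.

```python
import re

PUNCTUATION = set(",.;:!?")

# Alternatives are tried in order: acronym, numeral, word, punctuation.
SEGMENT = re.compile(r"""
    (?:[A-Za-z]\.){2,}          # acronym such as A.B.C.
  | \d+(?:\.\d+)?               # numeral, optionally with a decimal part
  | [A-Za-z]+(?:'[A-Za-z]+)?    # single word, optionally with an apostrophe
  | [,.;:!?]                    # punctuation mark
""", re.VERBOSE)


def segment(sentence):
    """Return the sentence as segments joined by the '||' delimiter."""
    return "||".join(SEGMENT.findall(sentence))


def punctuation_marks(segmented):
    """List the punctuation segments found in a '||'-delimited string."""
    return [s for s in segmented.split("||") if s in PUNCTUATION]
```

For the input "The A.B.C. has 100 cats." this yields `The||A.B.C.||has||100||cats||.`, with the final period identified as a punctuation mark.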




Word To Diphone Converter (Phonetization)

Purpose
– Translate words to their diphone representations

Resource
– Dictionary of words and their diphones (derived from CMU phoneme database)
– Over 175,000 words supported

W-to-D Converter Cont’d

Implementation
– Binary Search Algorithm in C
– Start with whole dictionary as search range
  • start index, end index, middle index
– If target word is alphabetically less than middle word, then ignore second half (i.e. end index = middle index); else ignore first half (i.e. start index = middle index)
– Repeat until word found or range contains zero words
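The search described above can be sketched as follows. The presenters' version is written in C and works on the dictionary itself; this Python sketch assumes the dictionary is already available as a sorted list of (word, diphones) pairs, and the sample diphone strings in the usage note are invented for illustration. It also moves the start index to middle + 1 rather than middle, a small deviation from the slide that guarantees the range shrinks on every iteration.

```python
def lookup(entries, target):
    """Binary search for `target` in `entries`, a list of (word, diphones)
    pairs sorted alphabetically by word. Returns the diphones or None."""
    start, end = 0, len(entries)      # search range is entries[start:end]
    while start < end:                # stop when the range holds zero words
        middle = (start + end) // 2
        word, diphones = entries[middle]
        if target == word:
            return diphones
        if target < word:
            end = middle              # ignore the second half
        else:
            start = middle + 1        # ignore the first half and middle word
    return None
```

With `entries = [("bat", "B_AE AE_T"), ("cat", "K_AE AE_T"), ("dog", "D_AO AO_G")]`, `lookup(entries, "cat")` returns the stored string for "cat" and `lookup(entries, "zebra")` returns `None`.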

W-to-D Converter Cont’d

Advantages
– Fast search times
  • Search range halves with each iteration, so about 18 comparisons cover 175,000 words (max search time of 1 sec currently)
– Less complicated to implement
  • Compared to indexing the dictionary, or
  • Importing the dictionary to an internal structure



The Multi-Level Data Structure

Contains all necessary data for the next sub-system:
– Word
– Diphone representation
– Prosodic parameters for each diphone
  • This reflects both word-level and sentence-level prosody

Allows for modularization
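One possible shape for such a structure is sketched below. All class and field names are hypothetical; the slides only state that the MLDS holds each word, its diphone representation, and prosodic parameters per diphone. The three scale factors are an assumption based on the pitch, duration and amplitude alterations discussed later in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    """One diphone with its (assumed) prosodic parameters."""
    symbol: str
    pitch_scale: float = 1.0
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    """A word and its diphone representation."""
    word: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


@dataclass
class MLDS:
    """Sentence level: the ordered list of words passed to the next stage."""
    words: List[WordEntry] = field(default_factory=list)

    def diphone_symbols(self):
        """Flatten the structure to the sequence of diphone symbols."""
        return [d.symbol for w in self.words for d in w.diphones]
```

Because every stage reads and writes only this structure, each stage can be developed and tested on its own, which is the modularization benefit the slide mentions.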

Prosody

[Flow diagram with blocks: MLDS, Diphone Database, Diphone Retrieval, Acoustic Manipulation, Concatenation, and a "done" yes/no decision]

Diphone Retrieval

Database of recorded diphones

Every diphone matched with txt file
– Distinguished by type (CC, CV, VC, VV)
– References to specific components within waveform

Store diphone waveform and prosodic parameters in variables
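The four diphone types follow from whether each half is a consonant (C) or a vowel (V). A minimal sketch, assuming diphone symbols are two ARPAbet phonemes joined by an underscore (for example "K_AE"); that naming scheme and the function names are assumptions, not the presenters' file format.

```python
# ARPAbet vowel symbols as used by the CMU Pronouncing Dictionary.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def phoneme_class(phoneme):
    """Return 'V' for a vowel, 'C' otherwise, ignoring stress digits."""
    return "V" if phoneme.rstrip("012") in VOWELS else "C"


def diphone_type(symbol):
    """Classify a diphone symbol such as 'K_AE' as CC, CV, VC or VV."""
    first, second = symbol.split("_")
    return phoneme_class(first) + phoneme_class(second)
```

Under these assumptions the word "cat" is built from a CV diphone ("K_AE") followed by a VC diphone ("AE_T").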

Properties of Speech Signals

[Figure: waveform of the word "cat" (e.g. cat.wav), segmented into c (non-periodic), a (periodic), t (non-periodic)]

Acoustic Manipulation - MATLAB

Recognizes wave files (.WAV)
– load, play, write

Vast array of signal processing tools
Built-in functions
Ease of debugging
GUI-capable

Pitch/Duration/Amplitude Alteration

Pitch – vowels only

As pitch increases, pitch period shrinks
As pitch decreases, pitch period expands
Need to alter length between pitch marks in order to alter pitch of speech signal
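The inverse relation between pitch and pitch period can be made concrete with a little arithmetic. In this sketch the function names, the 16 kHz sample rate and the 200 Hz pitch in the usage note are illustrative assumptions, not values from the project.

```python
def pitch_period_samples(f0_hz, sample_rate_hz):
    """Length of one pitch period in samples: sample rate divided by pitch."""
    return round(sample_rate_hz / f0_hz)


def scaled_mark_spacing(period_samples, pitch_factor):
    """New spacing between pitch marks for a desired pitch change.

    pitch_factor > 1 raises the pitch (marks move closer together);
    pitch_factor < 1 lowers it (marks move further apart).
    """
    return round(period_samples / pitch_factor)
```

At 16 kHz a 200 Hz vowel has a pitch period of 80 samples. Raising the pitch by a factor of 1.25 means spacing the pitch marks 64 samples apart, while lowering it by a factor of 0.8 means spacing them 100 samples apart.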

Altering Pitch

PSOLA – Pitch Synchronous Overlap and Add

[Figure: original diphone ‘C_A’ → extracted pitch period × Hanning window = Hanned pitch period]

Altering Pitch Cont’d

50% Overlap + Add
– Pitch Up: overlap > 50%
– Pitch Down: overlap < 50%
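The windowing and overlap-add steps can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the presenters' MATLAB code: the segment is taken as given (pitch-mark detection is not shown), the same windowed segment is repeated, and the names `hann` and `overlap_add` are hypothetical.

```python
import math


def hann(n):
    """Hann (Hanning) window of length n, periodic form."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_add(segment, repeats, overlap):
    """Overlap-add `repeats` copies of an already windowed segment.

    overlap is the fraction of the segment shared with the next copy:
    0.5 keeps the original spacing, more than 0.5 moves the copies
    closer together (pitch up), less than 0.5 spreads them (pitch down).
    """
    n = len(segment)
    hop = max(1, round(n * (1.0 - overlap)))   # spacing between copies
    out = [0.0] * (hop * (repeats - 1) + n)
    for r in range(repeats):
        for i, sample in enumerate(segment):
            out[r * hop + i] += sample
    return out


def window_segment(samples):
    """Multiply an extracted segment by a Hann window of the same length."""
    return [s * w for s, w in zip(samples, hann(len(samples)))]
```

With 50% overlap the Hann windows of neighbouring copies sum to one, so a constant input is reproduced unchanged away from the edges. A larger overlap gives a shorter hop and therefore a higher pitch, as on the slide.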

Altering Pitch Cont’d

Naturally spoken vowels contain 12-18 pitch marks

[Figure: pitch period repeated × 12, multiplied by a Kaiser window]

Altering Duration

Increase number of PSOLA iterations (overlaps) to increase duration

Decrease number of PSOLA iterations (overlaps) to decrease duration
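The link between iteration count and duration is simple arithmetic: each extra overlap-add iteration extends the output by one hop. A small sketch, with hypothetical function names and illustrative numbers (a 160-sample segment placed every 80 samples):

```python
import math


def output_length(segment_len, hop, iterations):
    """Samples produced by overlap-adding `iterations` segments `hop` apart."""
    return hop * (iterations - 1) + segment_len


def iterations_for(target_len, segment_len, hop):
    """Smallest iteration count whose output is at least target_len samples."""
    if target_len <= segment_len:
        return 1
    return 1 + math.ceil((target_len - segment_len) / hop)
```

With these numbers, 12 iterations give 1040 samples, and reaching 1600 samples takes 19 iterations.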

Altering Amplitude

Multiplying the signal by a constant
– If constant > 1, amplitude increases
– If constant < 1, amplitude decreases
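As a minimal sketch, amplitude scaling is one multiplication per sample. The clipping to the range -1.0 to 1.0 is an added assumption (samples normalized to that range), included so that a large constant cannot push the signal out of range; the slide itself only describes the multiplication.

```python
def scale_amplitude(samples, gain):
    """Multiply every sample by `gain`, clipping to the range [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

A gain of 2.0 doubles every sample (subject to clipping) and a gain of 0.5 halves it.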

Concatenation

Diphones → Words
– Using PSOLA at the joining ends
– Ensures smooth transition

Words → Sentence
– Straight joining at the end points due to presence of pauses
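The two joining strategies can be contrasted in a short sketch. Note the substitution: the slides use PSOLA at diphone joins, whereas this example uses a plain linear crossfade as a simpler stand-in for a smoothed join. The function names, the overlap length and the pause length are all hypothetical.

```python
def crossfade_join(a, b, overlap):
    """Join a and b, blending the last `overlap` samples of a into the
    first `overlap` samples of b with linearly changing weights."""
    overlap = min(overlap, len(a), len(b))
    out = list(a[:len(a) - overlap])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)            # weight of b rises toward 1
        out.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    out.extend(b[overlap:])
    return out


def join_words(words, pause_len):
    """Join word waveforms end to end, inserting silence between them."""
    out = []
    for k, word in enumerate(words):
        if k:
            out.extend([0.0] * pause_len)      # pause between words
        out.extend(word)
    return out
```

The smoothed join matters inside a word, where two diphones meet mid-sound. Between words the pause hides any discontinuity, which is why straight joining is enough there.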

Summary

[Block diagram: TTS System: words → Text Pre-processing → Prosody → Concatenation]

System modularized

Progress

Work Completed / Current Status
– Text pre-processing and prosodic manipulation for a multi-syllable word
– Diphone concatenation
– 200+ diphones in database
– Fully functional GUI implemented

Work To Be Done
– Sentence level synthesis
– Expand diphone database
– Fine-tuning and enhancing
– Prepare for Poster Fair
– Write final report

Questions?

Contact Information

Michael Beddaoui

Abdel-Aziz El-Solh

[email protected]

[email protected]
