A Text-to-Speech Synthesis System Presented By: Michael Beddaoui Abdel-Aziz El-Solh.

Date post: 23-Dec-2015
Upload: cecil-ross
A Text-to-Speech Synthesis System

Presented By:

Michael Beddaoui

Abdel-Aziz El-Solh

Presentation Outline

Introduction
Background
3 Components of TTS System
– Text Pre-processing (Aziz)
– Prosody (Mike)
– Concatenation (Mike)
Summary
What has been done / Future Work
Conclusion
Questions

What is a TTS System?

Definition: a system that takes a sequence of words as input and converts it to speech

Applications:
– Services for the hearing impaired
– Reading email aloud

Existing TTS systems:
– Festival
– Bell Labs TTS

Different TTS Systems

Phoneme-Based TTS System

Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (39 phonemes in English, as counted in the CMU phoneme set)

Disadvantage:
– Phonemes ignore the transitional sounds between neighbouring phonemes

Different TTS Systems (cont’d)

Diphone-Based TTS System

Diphones are:
– Made up of 2 adjacent phonemes
– Incorporate the transitional sound between them
– Make for better-sounding speech

Disadvantage:
– Over 1500 diphones in the English language (39 phonemes give up to 39 × 39 = 1521 possible pairs)

Fundamental Components

Block diagram (TTS System): words → Text Pre-processing → Prosody → Concatenation

Text Pre-Processing

Input
– String of characters (sentence)

Output
– String of diphone symbols

Objective
– Perform sentence-level analysis
  • Punctuation marks
  • Pauses between words
– Convert all input to the corresponding diphones

Text Pre-Processing (Block Diagram)

Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word-to-Diphone Translator (Phonetization) with its Diphone Dictionary, and the MLDS; highlighted block: Number Converter

Number Converter

Replace numerals with their textual versions
– 100 → one hundred

Handle fractional and decimal numbers
– 0.25 → point two five

Text Pre-Processing (Block Diagram)

Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word-to-Diphone Translator (Phonetization) with its Diphone Dictionary, and the MLDS; highlighted block: Acronym Converter

Acronym Converter

Replace acronyms with their single-letter components
– A.B.C. → A B C

Change abbreviations to their full textual form
– Mr. → Mister

Text Pre-Processing (Block Diagram)

Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word-to-Diphone Translator (Phonetization) with its Diphone Dictionary, and the MLDS; highlighted block: Word Segmenter

Word Segmenter

Divide the sentence into word segments
– A special delimiter separates the segments (e.g. ‘||’)

Segments can be:
– A single word
– An acronym
– A numeral

Identify punctuation marks

Text Pre-Processing (Block Diagram)

Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word-to-Diphone Translator (Phonetization) with its Diphone Dictionary, and the MLDS; highlighted block: Word-to-Diphone Translator (Phonetization)

Word-to-Diphone Converter (Phonetization)

Purpose
– Translate words to their diphone representations

Resource
– Dictionary of words and their diphones (derived from the CMU phoneme database)
– Over 175,000 words supported

W-to-D Converter (cont’d)

Implementation
– Binary search algorithm in C
– Start with the whole dictionary as the search range (start index, end index, middle index)
– If the target word equals the middle word, the search is done
– If the target word is alphabetically less than the middle word, ignore the middle word and the second half (i.e. end index = middle index - 1)
– Otherwise ignore the middle word and the first half (i.e. start index = middle index + 1)
– Repeat until the word is found or the range contains zero words

W-to-D Converter (cont’d)

Advantages
– Fast search times
  • The search range halves with each iteration, so 175,000 words need at most 18 comparisons (2^17 = 131,072 < 175,000 < 2^18 = 262,144); lookups currently take at most 1 sec
– Less complicated to implement
  • Compared to indexing the dictionary, or
  • Importing the dictionary into an internal structure

Text Pre-Processing (Block Diagram)

Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word-to-Diphone Translator (Phonetization) with its Diphone Dictionary, and the MLDS; highlighted block: MLDS

The Multi-Level Data Structure (MLDS)

Contains all the data needed by the next sub-system:
– Word
– Diphone representation
– Prosodic parameters for each diphone
  • These reflect both word-level and sentence-level prosody

Allows for modularization

Prosody

Flowchart: MLDS → Diphone Retrieval (from the Diphone Database) → Acoustic Manipulation → Concatenation, with a "done?" decision (yes / no)

Diphone Retrieval

Database of recorded diphones

Every diphone is matched with a .txt file
– Distinguished by type (CC, CV, VC, VV; C = consonant, V = vowel)
– References to specific components within the waveform

Store the diphone waveform and prosodic parameters in variables

Properties of Speech Signals

Figure: waveform of the word "cat" (e.g. cat.wav); the consonants c and t are non-periodic, the vowel a is periodic

Acoustic Manipulation in MATLAB

Recognizes wave files (.WAV)
– Load, play, write

Vast array of signal-processing tools
Built-in functions
Ease of debugging
GUI-capable

Pitch/Duration/Amplitude Alteration

Pitch (vowels only)
– As pitch increases, the pitch period shrinks
– As pitch decreases, the pitch period expands
– To alter the pitch of the speech signal, alter the spacing between pitch marks
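The relation between pitch and pitch-mark spacing can be written out as below. The symbols are introduced here for illustration: $f_s$ is the sampling rate, $f_0$ the fundamental frequency, $P$ the pitch period in samples and $\alpha$ a pitch-scaling factor. The numbers are an illustrative example, not measurements from this system.

```latex
P = \frac{f_s}{f_0}, \qquad
P' = \frac{f_s}{\alpha f_0} = \frac{P}{\alpha}

% Example: f_s = 16000 Hz, f_0 = 100 Hz       =>  P  = 160 samples
% Raising pitch by 25% (\alpha = 1.25)         =>  P' = 160 / 1.25 = 128 samples
```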

Altering Pitch

Figure: a pitch period extracted from the original diphone ‘C_A’ is multiplied by a Hanning window, giving a Hanned pitch period

Altering Pitch (cont’d): PSOLA (Pitch Synchronous Overlap and Add)

Figure: Hanned pitch periods overlapped and added
– 50% overlap + add
– Pitch up: overlap > 50%
– Pitch down: overlap < 50%

Altering Pitch (cont’d)

Figure: the overlap-added pitch periods (× 12) multiplied by a Kaiser window
– Naturally spoken vowels contain 12 to 18 pitch marks

Altering Duration

Increase the number of PSOLA iterations (overlaps) to increase duration

Decrease the number of PSOLA iterations (overlaps) to decrease duration

Altering Amplitude

Multiply the signal by a constant
– If constant > 1, amplitude increases
– If constant < 1, amplitude decreases

Concatenation

Diphones → Words
– PSOLA at the joining ends ensures a smooth transition

Words → Sentence
– Straight joining at the end points, due to the presence of pauses

Summary

Block diagram (TTS System): words → Text Pre-processing → Prosody → Concatenation

System is modularized

Progress

Work Completed / Current Status
– Text pre-processing and prosodic manipulation for a multi-syllable word
– Diphone concatenation
– 200+ diphones in database
– Fully functional GUI implemented

Work To Be Done
– Sentence-level synthesis
– Expand diphone database
– Fine-tuning and enhancing
– Prepare for Poster Fair
– Write final report

Questions?

Contact Information

Michael Beddaoui

Abdel-Aziz El-Solh

[email protected]

[email protected]
