Date post: | 23-Dec-2015 |
Upload: | cecil-ross |
A Text-to-Speech Synthesis System
Presented By:
Michael Beddaoui
Abdel-Aziz El-Solh
Presentation Outline
Introduction
Background
3 Components of TTS System
– Text Pre-processing (Aziz)
– Prosody (Mike)
– Concatenation (Mike)
Summary
What has been done / Future Work
Conclusion
Questions
What is a TTS System?
Definition: A system which takes as input a sequence of words and
converts them to speech
Applications:
– Services for the visually impaired
– Reading email aloud
Commercial TTS Systems:
– Festival
– Bell Labs TTS
Different TTS Systems
Phoneme-Based TTS System
Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (39 phonemes in English)
Disadvantage:
– Phonemes ignore transitional sounds!
Different TTS Systems (cont’d)
Diphone-Based TTS System
Diphones are:
– Made up of 2 phonemes
– Incorporate transitional sound
– Make for better sounding speech
Disadvantage:
– Over 1500 diphones in the English language (39 phonemes give up to 39 × 39 = 1521 pairs)!
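To make the "2 phonemes per diphone" idea concrete, here is a minimal sketch of how a phoneme sequence could be turned into a diphone sequence. The `SIL` padding symbol, the `_` joiner, and the function name are assumptions for illustration; the slides do not say how the original system labels its diphones beyond the ‘C_A’ example.

```python
def phonemes_to_diphones(phonemes, silence="SIL", joiner="_"):
    """Pair every phoneme with its successor, padding both ends with silence.

    The silence symbol and joiner are illustrative choices, not the
    naming scheme of the original system.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [left + joiner + right for left, right in zip(padded, padded[1:])]


# "cat" as three phoneme symbols in the style of the CMU dictionary
print(phonemes_to_diphones(["K", "AE", "T"]))
# ['SIL_K', 'K_AE', 'AE_T', 'T_SIL']
```

Each diphone spans the transition from one phoneme into the next, which is the part a phoneme-based system leaves out.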
Fundamental Components
TTS System (block diagram): words → Text Pre-processing → Prosody → Concatenation
Text Pre-Processing
Input
– String of characters (sentence)
Output
– String of diphone symbols
Objective
– Perform sentence level analysis
• Punctuation marks
• Pauses between words
– Convert all input to corresponding diphones
Text Pre-Processing (Block Diagram)
Blocks: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS
Number Converter
Replace numerals with their textual versions
– 100 → one hundred
Handle fractional and decimal numbers
– 0.25 → point two five
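The two conversions above can be sketched as follows. This is only an illustration of the idea, not the original converter: the function names, the one-million limit, and the exact wording of larger numbers are assumptions. Dropping a leading zero before the decimal point follows the slide's "0.25 → point two five" example.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def integer_to_words(n):
    """Spell out an integer in the range 0..999999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is handled in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + integer_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + integer_to_words(rest)
    return integer_to_words(thousands) + " thousand" + tail


def number_to_words(token):
    """Convert a numeral string such as '100' or '0.25' to words."""
    whole, _, fraction = token.partition(".")
    words = []
    # As on the slide, a zero integer part before a decimal point is not spoken.
    if whole and not (fraction and int(whole) == 0):
        words.append(integer_to_words(int(whole)))
    if fraction:
        words.append("point")
        words.extend(ONES[int(digit)] for digit in fraction)  # digit by digit
    return " ".join(words)


print(number_to_words("100"))   # one hundred
print(number_to_words("0.25"))  # point two five
```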
Acronym Converter
Replace acronyms with single letter components
– A.B.C. → A B C
Change abbreviations to full textual format
– Mr. → Mister
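A minimal sketch of both rules, assuming acronyms are written as letters each followed by a period and abbreviations come from a lookup table. Only the "Mr." entry is taken from the slide; the "Dr." entry, the regular expression, and the function name are hypothetical.

```python
import re

# Only "Mr." appears on the slide; "Dr." is a hypothetical extra entry.
ABBREVIATIONS = {"Mr.": "Mister", "Dr.": "Doctor"}

# Two or more letters, each followed by a period, e.g. "A.B.C."
ACRONYM = re.compile(r"(?:[A-Za-z]\.){2,}")


def expand_token(token):
    """Expand one abbreviation or acronym; return other tokens unchanged."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if ACRONYM.fullmatch(token):
        # Replace the periods with spaces, then normalise the spacing.
        return " ".join(token.replace(".", " ").split())
    return token


print(expand_token("A.B.C."))  # A B C
print(expand_token("Mr."))     # Mister
```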
Word Segmenter
Divide sentence into word segments
– Special delimiter to separate segments (i.e. ‘||’)
Segments can be:
– A single word
– An acronym
– A numeral
Identify punctuation marks
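The segmentation step might look like the sketch below. It assumes whitespace-separated input, treats a trailing period as sentence punctuation unless the token is a known abbreviation or a dotted acronym, and reports each punctuation mark together with the index of the segment it follows. All names and the abbreviation set are illustrative assumptions.

```python
import re

DELIMITER = "||"                  # the segment delimiter named on the slide
PUNCTUATION = ".,;:!?"
KNOWN_ABBREVIATIONS = {"Mr."}     # hypothetical list; "Mr." is from the slides
ACRONYM = re.compile(r"(?:[A-Za-z]\.){2,}")


def split_punctuation(token):
    """Return (segment, mark); mark is '' if the token has no trailing mark."""
    if token in KNOWN_ABBREVIATIONS or ACRONYM.fullmatch(token):
        return token, ""          # these periods belong to the segment itself
    if token and token[-1] in PUNCTUATION:
        return token[:-1], token[-1]
    return token, ""


def segment(sentence):
    """Return the delimited segment string and a list of (index, mark) pairs."""
    segments, marks = [], []
    for token in sentence.split():
        word, mark = split_punctuation(token)
        if word:
            segments.append(word)
        if mark:
            marks.append((len(segments) - 1, mark))
    return DELIMITER.join(segments), marks


text, marks = segment("Mr. Smith paid 100 dollars, today.")
print(text)   # Mr.||Smith||paid||100||dollars||today
print(marks)  # [(4, ','), (5, '.')]
```

Keeping the marks separate from the segments lets a later stage turn them into pauses without the converters having to skip over them.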
Word To Diphone Converter (Phonetization)
Purpose
– Translate words to their diphone representations
Resource
– Dictionary of words and their diphones (derived from CMU phoneme database)
– Over 175,000 words supported
W-to-D Converter Cont’d
Implementation
– Binary Search Algorithm in C
– Start with whole dictionary as search range (start index, end index, middle index)
– If target word matches middle word, the word is found
– If target word is alphabetically less than middle word, then ignore second half (i.e. end index = middle index − 1); else ignore first half (i.e. start index = middle index + 1)
– Repeat until word found or range contains zero words
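The search described above can be sketched as follows, in Python rather than the C of the original implementation. The sketch excludes the middle entry from the next range so that the range is guaranteed to shrink on every pass. The in-memory list of `(word, diphones)` pairs and the diphone strings in it are hypothetical; the slides do not show the dictionary format.

```python
def lookup(entries, target):
    """Binary search over (word, diphones) pairs sorted alphabetically by word.

    Returns the diphone string for `target`, or None if the word is absent.
    """
    start, end = 0, len(entries) - 1
    while start <= end:                 # the range still holds at least one word
        middle = (start + end) // 2
        word, diphones = entries[middle]
        if target == word:
            return diphones
        if target < word:
            end = middle - 1            # ignore the second half
        else:
            start = middle + 1          # ignore the first half
    return None


# Hypothetical dictionary entries, already sorted by word
DICTIONARY = [
    ("cat", "K_AE AE_T"),
    ("dog", "D_AO AO_G"),
    ("sun", "S_AH AH_N"),
]

print(lookup(DICTIONARY, "dog"))   # D_AO AO_G
print(lookup(DICTIONARY, "fish"))  # None
```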
W-to-D Converter Cont’d
Advantages
– Fast search times
• Search range halves with each iteration, so 175,000 words need at most about 18 comparisons (max of 1 sec currently)
– Less complicated to implement
• Compared to indexing dictionary or
• Importing the dictionary to an internal structure
The Multi-Level Data Structure
Contains all necessary data for the next sub-system:
– Word
– Diphone representation
– Prosodic parameters for each diphone
• This reflects both word-level and sentence-level prosody
Allows for modularization
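One way to picture such a structure is a sentence that holds words, each of which holds diphones with their own prosodic parameters. The sketch below is an assumption about the shape only: the class names, field names, and the use of multiplicative scale factors are hypothetical and not taken from the original MLDS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    symbol: str                    # e.g. "K_AE" (naming scheme is illustrative)
    pitch_scale: float = 1.0       # hypothetical prosodic parameters,
    duration_scale: float = 1.0    # expressed here as scale factors
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    text: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


@dataclass
class SentenceEntry:
    words: List[WordEntry] = field(default_factory=list)

    def all_diphones(self):
        """Flatten the structure into the diphone order needed for synthesis."""
        return [d for word in self.words for d in word.diphones]


cat = WordEntry("cat", [DiphoneEntry("K_AE"), DiphoneEntry("AE_T", pitch_scale=0.9)])
sentence = SentenceEntry([cat])
print([d.symbol for d in sentence.all_diphones()])  # ['K_AE', 'AE_T']
```

Because the later stages only read this structure, the text pre-processing code can change without affecting them, which is the modularization the slide refers to.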
Prosody
Block diagram: MLDS, Diphone Database, Diphone Retrieval, Acoustic Manipulation, Concatenation, with a "done?" decision (yes / no)
Diphone Retrieval
Database of recorded diphones
Every diphone matched with a text (.txt) file
– Distinguished by type (CC, CV, VC, VV), where C = consonant and V = vowel
– References to specific components within waveform
Store diphone waveform and prosodic parameters in variables
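The four diphone types can be derived from the two phonemes involved. The sketch below assumes phoneme symbols in the style of the CMU dictionary, whose 39-phoneme set has 15 vowels; whether the original database labels its types this way is not stated in the slides, and the function name is hypothetical.

```python
# The 15 vowel symbols of the CMU dictionary phoneme set
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def diphone_type(left, right):
    """Classify a phoneme pair as CC, CV, VC or VV."""
    def kind(phoneme):
        return "V" if phoneme in VOWELS else "C"
    return kind(left) + kind(right)


print(diphone_type("K", "AE"))  # CV
print(diphone_type("AE", "T"))  # VC
```

The type matters later because pitch changes are applied to vowels only, so the system needs to know which half of a diphone is voiced.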
Properties of Speech Signals
[Figure: waveform of the word "cat" (e.g. cat.wav); the consonants "c" and "t" are labeled non-periodic, the vowel "a" is labeled periodic]
Acoustic Manipulation - MATLAB
Recognizes wave files (.WAV)
– load, play, write
Vast array of signal processing tools
Built-in functions
Ease of debugging
GUI-capable
Pitch/Duration/Amplitude Alteration
Pitch – vowels only
As pitch increases, pitch period shrinks
As pitch decreases, pitch period expands
Need to alter length between pitch marks in order to alter pitch of speech signal
Altering Pitch
PSOLA – Pitch Synchronous Overlap and Add
[Figure: pitch period extracted from the original diphone ‘C_A’ × Hanning window = Hanned pitch period]
Altering Pitch Cont’d
50% Overlap + Add
– Pitch Up: overlap > 50%
– Pitch Down: overlap < 50%
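A simplified sketch of the overlap-and-add step, written in plain Python rather than the MATLAB used in the project. It windows one extracted pitch period with a Hanning window and adds shifted copies of it; the spacing between copies (`hop`) plays the role of the distance between pitch marks. The function names and the 16 kHz sample rate in the usage lines are assumptions, and a production PSOLA implementation usually works with windows spanning two pitch periods.

```python
import math


def hann(n):
    """Symmetric Hanning window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def overlap_add(period, repeats, overlap=0.5):
    """Overlap-add `repeats` windowed copies of one pitch period.

    `overlap` is the fraction of the period shared by neighbouring copies.
    Returns the output samples and the spacing (hop) between copies.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    n = len(period)
    windowed = [s * w for s, w in zip(period, hann(n))]
    hop = max(1, round(n * (1.0 - overlap)))
    out = [0.0] * (n + (repeats - 1) * hop)
    for k in range(repeats):
        for i, sample in enumerate(windowed):
            out[k * hop + i] += sample
    return out, hop


# A synthetic 100-sample "pitch period" (one sine cycle) for illustration
period = [math.sin(2.0 * math.pi * i / 100) for i in range(100)]
SAMPLE_RATE = 16000  # assumed sample rate in Hz

for overlap in (0.4, 0.5, 0.6):
    _, hop = overlap_add(period, repeats=12, overlap=overlap)
    print(overlap, hop, SAMPLE_RATE / hop)  # larger overlap -> smaller hop -> higher pitch
```

More overlap moves the copies closer together, which shortens the effective pitch period and raises the pitch; less overlap does the opposite.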
Altering Pitch Cont’d
[Figure: waveform × Kaiser window = windowed waveform, labeled "× 12"]
– Naturally spoken vowels contain 12-18 pitch marks
Altering Duration
Increase number of PSOLA iterations (overlaps) to increase duration
Decrease number of PSOLA iterations (overlaps) to decrease duration
Altering Amplitude
Multiplying the signal by a constant
– If constant > 1, amplitude increases
– If constant < 1, amplitude decreases
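Both adjustments reduce to simple arithmetic. The sketch below assumes that the first overlap-add iteration contributes one full pitch period and every further iteration extends the output by `hop` samples, so the output length is `period_len + (iterations - 1) * hop`. The function names and the numbers in the usage lines are illustrative.

```python
import math


def iterations_for_duration(target_samples, period_len, hop):
    """Smallest number of overlap-add iterations whose output is at least
    `target_samples` long, given length = period_len + (iterations - 1) * hop."""
    if target_samples <= period_len:
        return 1
    return 1 + math.ceil((target_samples - period_len) / hop)


def scale_amplitude(signal, constant):
    """Multiply every sample by a constant (> 1 louder, < 1 quieter)."""
    return [constant * sample for sample in signal]


# With a 100-sample period and a 50-sample hop, 650 samples need 12 iterations
print(iterations_for_duration(650, period_len=100, hop=50))  # 12
print(scale_amplitude([0.5, -0.25, 0.0], 2.0))               # [1.0, -0.5, 0.0]
```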
Concatenation
Diphones → Words
– Using PSOLA at the joining ends
– Ensures smooth transition
Words → Sentence
– Straight joining at the end points due to presence of pauses
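The two kinds of joining can be contrasted with a small sketch. Note that a plain linear cross-fade is swapped in here as a simpler stand-in for the PSOLA-based join used in the project; it only illustrates the idea of blending the joining ends. The function names and the pause length are assumptions.

```python
def crossfade_join(left, right, overlap):
    """Join two waveforms, blending the last `overlap` samples of `left`
    with the first `overlap` samples of `right` (linear cross-fade)."""
    overlap = min(overlap, len(left), len(right))
    if overlap <= 0:
        return list(left) + list(right)
    blended = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)   # rises from near 0 towards 1
        blended.append((1.0 - weight) * left[len(left) - overlap + i]
                       + weight * right[i])
    return list(left[:-overlap]) + blended + list(right[overlap:])


def join_words(words, pause_samples):
    """Straight joining of word waveforms with silence in between."""
    out = []
    for index, word in enumerate(words):
        if index > 0:
            out.extend([0.0] * pause_samples)  # the pause hides the seam
        out.extend(word)
    return out


word = crossfade_join([1.0] * 10, [1.0] * 10, overlap=4)
print(len(word))                                  # 16
print(join_words([[1.0, 1.0], [2.0]], pause_samples=2))
# [1.0, 1.0, 0.0, 0.0, 2.0]
```

Inside a word the signal is continuous, so an abrupt seam would be audible; between words the pause already separates the signals, which is why straight joining is enough there.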
Summary
TTS System: words → Text Pre-processing → Prosody → Concatenation
System modularized
Progress
Work Completed / Current Status
– Text pre-processing and prosodic manipulation for a multi-syllable word
– Diphone concatenation
– 200+ diphones in database
– Fully functional GUI implemented
Work To Be Done
– Sentence level synthesis
– Expand diphone database
– Fine-tuning and enhancing
– Prepare for Poster Fair
– Write final report
Questions?
Contact Information
Michael Beddaoui
Abdel-Aziz El-Solh