MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling &...

Date post: 13-Dec-2014
Upload: vladimir-kulyukin
MobAppDev Text-To-Speech Synthesis Vladimir Kulyukin
Transcript
Page 1: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

MobAppDev

Text-To-Speech Synthesis

Vladimir Kulyukin

Page 2: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Outline

● Text-to-Speech Synthesis (TTS)

● TTS on Android

● TTS Customization

● Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Page 3: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Review

Page 4: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

TTS: Text To Speech

● The General Problem: Take a sequence of characters and generate a waveform

● Words are pronounced as a sequence of individual units called phones

● Phonetic alphabets describe how phones are pronounced

● Phonological rules specify how phones combine into speech

Page 5: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

TTS Engine Anatomy

● A typical TTS engine consists of three components: text analyzer, linguistic analyzer, waveform generator

● Text Analysis – parse text (after transliterating it if necessary) and identify words and utterances

● Linguistic Analysis – identify phrases and assign prosody (accents, emphasis, duration, pauses, etc.)

● Waveform Generation - generate a waveform from a fully specified linguistic description

Page 6: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

TTS Approaches

● Full Automation – machine does everything

● Mixed Initiative – human records a set of known texts; machine learning is used to extract the rules

● Human-Based Recording – human records words/sentences/texts; machine plays them as needed

Page 7: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

TTS on Android

Page 8: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Android TTS

● Android TTS is a multilingual speech synthesis engine

● Android TTS can be used as a black box: text in, speech out

● Android TTS can be parameterized

Page 9: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Starting TTS

● It is best practice to check if TTS is available on the device

● This is done via an Intent that checks for TTS data

● If the check is successful, an instance of TTS can be created

● The Activity (or some other component) that uses TTS implements the OnInitListener interface

Page 10: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Overriding onPause() and onDestroy()

● When your Activity is paused (e.g., it loses focus), have TTS stop synthesizing

● When your Activity is destroyed, shut TTS down to notify Android that the resources can be released and given to other activities or applications
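
A framework-free sketch of the two rules above. SpeechEngine is a hypothetical stand-in for android.speech.tts.TextToSpeech, whose stop() and shutdown() methods it mirrors; SpeakingScreen and RecordingEngine are likewise illustrative names. In a real Activity the same two calls would sit in the onPause() and onDestroy() overrides, next to the calls to super.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for android.speech.tts.TextToSpeech (assumption:
// only the two lifecycle-related methods are modeled here).
interface SpeechEngine {
    void stop();      // interrupt the current utterance, keep the engine alive
    void shutdown();  // release the engine's resources for good
}

// Test double that records calls so the lifecycle order can be inspected.
class RecordingEngine implements SpeechEngine {
    final List<String> calls = new ArrayList<>();
    @Override public void stop() { calls.add("stop"); }
    @Override public void shutdown() { calls.add("shutdown"); }
}

// Models the part of an Activity that owns the engine.
class SpeakingScreen {
    private SpeechEngine engine;

    SpeakingScreen(SpeechEngine engine) { this.engine = engine; }

    // Mirrors Activity.onPause(): silence the engine but keep it.
    void onPause() {
        if (engine != null) engine.stop();
    }

    // Mirrors Activity.onDestroy(): release the engine at most once.
    void onDestroy() {
        if (engine != null) {
            engine.shutdown();
            engine = null;
        }
    }
}
```

The null checks matter because the engine reference stays null when the TTS data check fails, and the Activity can still be paused or destroyed in that state.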

Page 11: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

TTS Customization

Page 12: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Overcoming TTS Limitations

● Every TTS engine mispronounces some words (one can think of it as a fundamental theorem of TTS)

● There are two ways of overcoming this limitation:

  ○ Phonetic spelling: spell mispronounced words the way they sound, generate waveforms, associate words with waveforms, & save them

  ○ Human recording: have a human record mispronounced words, save them in audio files, and use those files

Page 13: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Audio Dictionary Application

● Develop an application that allows the user to create an audio dictionary of phonetically spelled words if their accurate spellings are mispronounced by the TTS engine

● The application allows the user to spell words as they are pronounced

● The phonetic words are converted into wav files by the TTS engine and saved on the device's sdcard

● The saved wav files are associated with the correct spelling
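
The association in the last bullet can be pictured as a lookup table from a correct spelling to a saved wav file. The sketch below is plain Java, not the application's code: AudioDictionary, associate, and resolve are hypothetical names. On Android the table is kept by the TTS engine itself and filled through addSpeech; the sketch assumes the lookup is an exact match on the whole string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of an audio dictionary: correct spelling -> wav file path.
class AudioDictionary {
    private final Map<String, String> wavBySpelling = new HashMap<>();

    // Associate a correctly spelled word with a previously saved wav file.
    void associate(String correctSpelling, String wavPath) {
        wavBySpelling.put(correctSpelling, wavPath);
    }

    // Return the wav file to play for this text, if one was associated.
    // An empty result means the text should go through normal synthesis.
    Optional<String> resolve(String text) {
        return Optional.ofNullable(wavBySpelling.get(text));
    }
}
```

Text without an entry falls back to ordinary synthesis, so only the words the engine gets wrong need entries.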

Page 14: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Audio Dictionary Application Screenshot

Page 15: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Implementation

Source code is here: https://github.com/VKEDCO/TTSOnAndroid/blob/master/AudioDictionaryViaSpeechSynthesis.zip?raw=true

Page 16: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Storing Files on SDCard

● Create a directory on the device's sdcard (manually or programmatically)

● If you are using Eclipse:

  ○ open the DDMS perspective

  ○ click on the device's name in the Devices panel on the left

  ○ click on the File Explorer tab on the right

  ○ go to /storage/sdcard and create a folder (e.g., my_audio_files)

● You can do the same steps on your Android device by connecting it to your computer as a storage device with a USB cable

Page 17: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />

Setting Reading & Writing Permissions in AndroidManifest.xml

Page 18: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

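// Set the storage folder in onCreate() of the main activity
String mSDCardFolder = null;

@Override
public void onCreate(Bundle savedInstance) {
    super.onCreate(savedInstance);
    // Do the GUI stuff here & TTS initialization
    // Folder on external storage where the synthesized wav files are saved
    mSDCardFolder = Environment.getExternalStorageDirectory() + "/phonetic_spelling/";
}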

Setting the External Storage Directory
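
The slide builds the folder path by string concatenation and assumes the folder already exists. A hedged alternative is to build the path with java.io.File and create the folder when it is missing. The helper below is plain Java; StorageFolders.ensureFolder is a hypothetical name, and the root directory is passed in as a parameter (on Android it would be the value of Environment.getExternalStorageDirectory()).

```java
import java.io.File;

class StorageFolders {
    // Returns root/name, creating the folder when it does not exist yet.
    static File ensureFolder(File root, String name) {
        File folder = new File(root, name);
        // mkdirs() returns false when the folder already exists,
        // so check isDirectory() first.
        if (!folder.isDirectory() && !folder.mkdirs()) {
            throw new IllegalStateException("Cannot create folder: " + folder);
        }
        return folder;
    }
}
```

Calling it twice is harmless: the second call finds the folder and returns it unchanged.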

Page 19: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

public class AudioDictionaryAct extends Activity

implements OnInitListener {

// If TTS is initialized successfully, enable the Speak and

// Record buttons

public void onInit(int status) {

if ( status == TextToSpeech.SUCCESS ) {

btnSpeak.setEnabled(true);

btnRecord.setEnabled(true);

}

}}

Implement OnInitListener in the Main Activity

Page 20: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

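// Check for TTS data in onCreate() of the main activity
@Override
public void onCreate(Bundle savedInstance) {
    super.onCreate(savedInstance);
    // Do the GUI stuff here
    // Ask the system whether TTS voice data is installed; the answer
    // arrives in onActivityResult() under REQ_TTS_STATUS_CHECK
    Intent checkIntent = new Intent();
    checkIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    startActivityForResult(checkIntent, REQ_TTS_STATUS_CHECK);
}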

TTS Initialization

Page 21: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

TextToSpeech mTTS = null;

protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if ( requestCode == REQ_TTS_STATUS_CHECK ) {
        switch ( resultCode ) {
        case TextToSpeech.Engine.CHECK_VOICE_DATA_PASS:
            // Voice data is present: create the engine; onInit() fires when it is ready
            mTTS = new TextToSpeech(this, this);
            Log.v(TAG, TTS_INSTALLED_MSG);
            break;
        case TextToSpeech.Engine.CHECK_VOICE_DATA_FAIL:
            // Voice data is missing: ask the system to install it
            Log.v(TAG, INSTALL_TTS_DATA_MSG + resultCode);
            Intent installTTSDataIntent = new Intent();
            installTTSDataIntent.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
            startActivity(installTTSDataIntent);
            break; // without this, control falls through to the default branch
        default:
            Log.e(TAG, TTS_UNAVAILABLE_MSG);
        }
    }
}

TTS Initialization

Page 22: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Button Logic

Page 23: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

btnSpeak = (Button)findViewById(R.id.btnSpeak);

btnSpeak.setOnClickListener(new OnClickListener() {

public void onClick(View view) {

mTTS.speak(edTxtPhoneticSpelling.getText().toString(), TextToSpeech.QUEUE_ADD, null);

}

});

Speak

Page 24: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

btnRecord = (Button)findViewById(R.id.btnRecord);

btnRecord.setOnClickListener(new OnClickListener() {

public void onClick(View view) {

soundFilename = mSDCardFolder + edTxtUserFileName.getText().toString();

soundFile = new File(soundFilename);

if (soundFile.exists()) { soundFile.delete(); }

if (mTTS.synthesizeToFile(edTxtPhoneticSpelling.getText().toString(), null,

soundFilename) == TextToSpeech.SUCCESS ) {

btnPlay.setEnabled(true);

btnAssociate.setEnabled(true);

}}});

Record
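
The Record handler concatenates the folder and whatever the user typed into the file name field. A small plain-Java guard is sketched below under stated assumptions: the .wav default and the rule of dropping any directory part are choices made here, not taken from the slides, and SoundFiles.targetFile is a hypothetical helper.

```java
import java.io.File;
import java.util.Locale;

class SoundFiles {
    // Builds the output file for a user-supplied name inside the given folder.
    static File targetFile(File folder, String userName) {
        String name = userName.trim();
        if (name.isEmpty()) {
            throw new IllegalArgumentException("File name is empty");
        }
        // Keep only the last path component so the file stays inside the folder.
        name = new File(name).getName();
        // Assumption: recordings are wav files, so add the extension when missing.
        if (!name.toLowerCase(Locale.ROOT).endsWith(".wav")) {
            name = name + ".wav";
        }
        return new File(folder, name);
    }
}
```

The resulting File can replace the concatenated soundFilename string; its getAbsolutePath() value is what synthesizeToFile and MediaPlayer.setDataSource would receive.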

Page 25: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

btnPlay = (Button)findViewById(R.id.btnPlay);

btnPlay.setOnClickListener( new OnClickListener() {

public void onClick(View view) {

try {

Log.v("AUDIODICTIONARY", soundFilename);

mPlayer = new MediaPlayer();

mPlayer.setDataSource(soundFilename);

mPlayer.prepare();

mPlayer.start();

}

catch (Exception e) { Log.e("AUDIODICTIONARY", "Playback failed", e); }}

});

Play

Page 26: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

btnAssociate = (Button)findViewById(R.id.btnAssociate);

btnAssociate.setOnClickListener(new OnClickListener() {

public void onClick(View view) {

mTTS.addSpeech(edTxtRealSpelling.getText().toString(), soundFilename);

}

});

Associate Audio with Spelling

Page 27: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Overcoming TTS Limitations through Human Recording

Page 28: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

What Is This?

Page 29: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Bhagavatgita, Verse 1

Page 30: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

dharmakshetre kurukshetre samaveta yuyutsavah

mamakah pandavashcaiva kim akurvata sanjaya

Bhagavatgita, V. 1 Transliterated

Page 31: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Что свершали, - скажи Санджая, -

сыновья мои и Пандавы,

ради битвы сойдясь на поле

Kурукшетры, на поле дхармы?

Перевод В.С. Семенцова

What Is This?

Page 32: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Что свершали, - скажи Санджая, -

сыновья мои и Пандавы,

ради битвы сойдясь на поле

Kурукшетры, на поле дхармы?

Перевод В.С. Семенцова

The Russian Translation of Bhagavatgita V. 1

Page 33: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Chto svershali, - skazhi Sandzhaya, -

synovya moi i Pandavy,

radi bitvy soydyas' na pole

Kurukshetry, na pole dharmy?

Translated by V.S. Sementsov

Transliteration of Russian Translation

Page 34: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Oh, Sanjaya, tell me what happened at Kurukshetra, the field of dharma, where my family and the Pandavas gathered to fight?

Translated by Eknath Easwaran

English Translation of Bhagavatgita, V. 1

Page 35: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

TTS Bhagavatgita Project

Source code is here: https://github.com/VKEDCO/TTSOnAndroid/blob/master/BhagavatGitaTTS_v43.zip

Page 36: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

The Problem

Have your Android device read the first verse of Bhagavatgita in Sanskrit, Russian, & English.

Page 37: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Sample Screenshot

Page 38: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Logical Steps of a Solution

● Write a Devanagari transliterator that takes Sanskrit texts and produces their Latin transliterations

● Write a Cyrillic transliterator that takes Russian texts and produces their Latin transliterations

● Have human readers record Sanskrit and Russian words

● Associate strings with specific recordings
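
At the core of the Cyrillic transliterator in the second bullet is a character table. The toy sketch below covers only the nine Cyrillic letters needed for a few words of the verse. The class name and the table are illustrative assumptions; a real transliterator would need the full alphabet, capitalization, and context rules. The Cyrillic letters are written as Unicode escapes so that the file compiles regardless of source encoding.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ToyCyrillicTransliterator {
    // Partial table: just enough letters for a handful of words.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0430', "a");   // Cyrillic a
        TABLE.put('\u0435', "e");   // Cyrillic ie
        TABLE.put('\u0438', "i");   // Cyrillic i
        TABLE.put('\u043b', "l");   // Cyrillic el
        TABLE.put('\u043d', "n");   // Cyrillic en
        TABLE.put('\u043e', "o");   // Cyrillic o
        TABLE.put('\u043f', "p");   // Cyrillic pe
        TABLE.put('\u0442', "t");   // Cyrillic te
        TABLE.put('\u0447', "ch");  // Cyrillic che
    }

    // Lowercases the input and maps each known letter to its Latin form;
    // characters outside the table are copied through unchanged.
    static String transliterate(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            String latin = TABLE.get(c);
            out.append(latin != null ? latin : String.valueOf(c));
        }
        return out.toString();
    }
}
```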

Page 39: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Real Steps

● We will skip transliterator implementation (quite likely an M.S./Ph.D. type of project)

● Record .wav files & save them on the SD card

● Associate .wav files with specific strings

● Have the TTS engine load the associated recordings from the SD card at run time

Page 40: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

mTTS.addSpeech("sn_akurvata", snPath + "sn_akurvata.wav");

mTTS.addSpeech("sn_dharmakshetre", snPath + "sn_dharmakshetre.wav");

mTTS.addSpeech("sn_kim", snPath + "sn_kim.wav");

mTTS.addSpeech("sn_kurukshetre", snPath + "sn_kurukshetre.wav");

mTTS.addSpeech("sn_mamakah", snPath + "sn_mamakah.wav");

mTTS.addSpeech("sn_pandavashcaiva", snPath + "sn_pandavashcaiva.wav");

mTTS.addSpeech("sn_samaveta", snPath + "sn_samaveta.wav");

mTTS.addSpeech("sn_samjaya", snPath + "sn_samjaya.wav");

mTTS.addSpeech("sn_yuyutsavah", snPath + "sn_yuyutsavah.wav");

Adding Sanskrit to TTS Engine
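
The nine calls above follow one pattern: the key sn_<word> maps to the file <snPath>sn_<word>.wav. A plain-Java sketch that derives those pairs from a word list is given below; SpeechRegistrations.build is a hypothetical helper, not part of the project's code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpeechRegistrations {
    // Maps prefix+word to folderPath+prefix+word+".wav", preserving list order.
    static Map<String, String> build(String prefix, String folderPath, List<String> words) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String w : words) {
            String key = prefix + w;
            entries.put(key, folderPath + key + ".wav");
        }
        return entries;
    }
}
```

On Android, each entry of the returned map would be passed to mTTS.addSpeech(entry.getKey(), entry.getValue()), which keeps the key and the file name from drifting apart as words are added.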

Page 41: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

mTTS.addSpeech("ru_bitvy", ruPath + "ru_bitvy.wav");

mTTS.addSpeech("ru_chto", ruPath + "ru_chto.wav");

mTTS.addSpeech("ru_dharmy", ruPath + "ru_dharmy.wav");

mTTS.addSpeech("ru_i", ruPath + "ru_i.wav");

mTTS.addSpeech("ru_kurukshetry", ruPath + "ru_kurukshetry.wav");

mTTS.addSpeech("ru_moi", ruPath + "ru_moi.wav");

mTTS.addSpeech("ru_na", ruPath + "ru_na.wav");

mTTS.addSpeech("ru_pandavy", ruPath + "ru_pandavy.wav");

mTTS.addSpeech("ru_pole", ruPath + "ru_pole.wav");

Adding Russian to TTS Engine

Page 42: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

final static String SN_PREFIX = "sn_";

public void saySanskritWords() {

for(String w: mSNWords) speakWord(SN_PREFIX + w);

}

final static String RU_PREFIX = "ru_";

public void sayRussianWords() {

for(String w: mRUWords) speakWord(RU_PREFIX + w);

}

Speaking Sanskrit & Russian

Page 43: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

public void speakWord(String word) {

mTTS.speak(word, TextToSpeech.QUEUE_ADD, null);

}

Speaking Sanskrit & Russian
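
The slides do not show how mSNWords and mRUWords, used by saySanskritWords() and sayRussianWords(), are filled. One possibility, assuming each holds the words of the verse in reading order, is to split a transliterated line on whitespace and strip punctuation. VerseKeys.keysFor is a hypothetical helper that also applies the prefix, so its output can be handed to speakWord() directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class VerseKeys {
    // Turns one transliterated line into prefixed keys, in reading order.
    static List<String> keysFor(String prefix, String line) {
        List<String> keys = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // Drop everything except letters, then lowercase ("pole," -> "pole").
            String word = token.replaceAll("[^\\p{L}]", "").toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                keys.add(prefix + word);
            }
        }
        return keys;
    }
}
```

Tokens that consist only of punctuation, such as a standalone dash, produce no key, so they are never sent to the engine.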

Page 44: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Storing Audio Files on SDCard

● Create a folder called /bhagavatgita on the sdcard, inside the folder returned by Environment.getExternalStorageDirectory().getPath()

● Create two subfolders /bhagavatgita/sn/ and /bhagavatgita/ru/

● Place the audio files from this zip archive into the appropriate folders

● Here is the full link to the above zip archive: https://github.com/VKEDCO/TTSOnAndroid/blob/master/bhagavatgita.zip
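
Before the recordings are registered with addSpeech, it can help to confirm that the files are where the code expects them. The following is a plain-Java sketch under assumptions: AudioLayout.missingFiles is a hypothetical helper, the root folder is passed in (on Android, the external storage directory), and file names follow the <lang>_<word>.wav pattern seen in the addSpeech calls.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class AudioLayout {
    // Lists the expected wav files that are absent from root/bhagavatgita/<lang>/.
    static List<File> missingFiles(File root, String lang, List<String> words) {
        File folder = new File(new File(root, "bhagavatgita"), lang);
        List<File> missing = new ArrayList<>();
        for (String w : words) {
            File wav = new File(folder, lang + "_" + w + ".wav");
            if (!wav.isFile()) {
                missing.add(wav);
            }
        }
        return missing;
    }
}
```

An empty result means every expected recording is in place; otherwise the returned list names the files still to be copied.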

Page 45: MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling & Human Recording

References & Reading Suggestions

● http://developer.android.com/reference/android/speech/tts/TextToSpeech.html

