MobAppDev (Fall 2014): Text-to-Speech Synthesis, Overcoming TTS Limitations with Phonetic Spelling &...

MobAppDev

Text-To-Speech Synthesis

Vladimir Kulyukin

Outline

● Text-to-Speech Synthesis (TTS)

● TTS on Android

● TTS Customization

● Overcoming TTS Limitations with Phonetic Spelling & Human Recording

Review

TTS: Text To Speech

● The General Problem: Take a sequence of characters and generate a waveform

● Words are pronounced as a sequence of individual units called phones

● Phonetic alphabets describe how phones are pronounced

● Phonological rules specify how phones combine into speech

TTS Engine Anatomy

● A typical TTS engine consists of three components: text analyzer, linguistic analyzer, and waveform generator

● Text Analysis – parse text (after transliterating it if necessary) and identify words and utterances

● Linguistic Analysis – identify phrases and assign prosody (accents, emphasis, duration, pauses, etc.)

● Waveform Generation – generate a waveform from a fully specified linguistic description
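The three stages can be pictured as a chain of functions. The sketch below is a toy model in plain Java, not how any real engine works: the class name, the method names, the pause values, and the silent "waveform" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the three stages; every name and number here is hypothetical.
final class ToyTtsPipeline {

    // Stage 1, text analysis: split raw text into lowercase word tokens.
    static List<String> analyzeText(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^\\p{L}']+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Stage 2, linguistic analysis: attach a crude prosody hint to each word.
    // The only hint modeled here is the pause (in ms) that follows the word.
    static List<String> assignProsody(List<String> words) {
        List<String> annotated = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            int pauseMs = (i == words.size() - 1) ? 400 : 50;
            annotated.add(words.get(i) + "|pause=" + pauseMs);
        }
        return annotated;
    }

    // Stage 3, waveform generation: a stand-in that returns silent samples,
    // samplesPerUnit for each annotated word.
    static short[] generateWaveform(List<String> annotated, int samplesPerUnit) {
        return new short[annotated.size() * samplesPerUnit];
    }
}
```

Calling `analyzeText`, then `assignProsody`, then `generateWaveform` mirrors the order in which the three components hand data to one another.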

TTS Approaches

● Full Automation – machine does everything

● Mixed Initiative – human records a set of known texts; machine learning is used to extract the rules

● Human-Based Recording – human records words/sentences/texts; machine plays them as needed

TTS on Android

Android TTS

● Android TTS is a multilingual speech synthesis engine

● Android TTS can be used as a black box: text in, speech out

● Android TTS can be parameterized

Starting TTS

● It is best practice to check whether TTS is available on the device

● This is done by sending an Intent that checks for TTS data

● If the check is successful, an instance of TTS can be created

● The Activity (or some other component) that uses TTS implements the OnInitListener interface

Overriding onPause() and onDestroy()

● When your Activity is paused (e.g., it loses focus), have TTS stop synthesizing

● When your Activity is destroyed, shut TTS down to notify Android that the resources can be released and given to other activities or applications
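The stop-on-pause, shutdown-on-destroy pattern can be modeled outside Android. In the sketch below, `SpeechEngine` is a hypothetical stand-in for the `stop()` and `shutdown()` methods of `android.speech.tts.TextToSpeech`, and `TtsLifecycle` is a hypothetical holder whose methods play the role of the Activity callbacks; in a real Activity the same calls would go into the overridden `onPause()` and `onDestroy()`.

```java
// Stand-in for the two TextToSpeech calls used here (assumption, not Android code).
interface SpeechEngine {
    void stop();      // interrupt the current utterance and drop queued ones
    void shutdown();  // release the resources held by the engine
}

// Hypothetical holder that mirrors an Activity's onPause()/onDestroy().
final class TtsLifecycle {
    private SpeechEngine engine;

    TtsLifecycle(SpeechEngine engine) {
        this.engine = engine;
    }

    // Paused (e.g., lost focus): stop synthesizing but keep the engine.
    void onPause() {
        if (engine != null) {
            engine.stop();
        }
    }

    // Destroyed: stop, release the engine, and drop the reference so that
    // later callbacks become no-ops.
    void onDestroy() {
        if (engine != null) {
            engine.stop();
            engine.shutdown();
            engine = null;
        }
    }
}
```

The null checks matter because the engine is only created after the TTS data check succeeds, so either callback may run before an engine exists.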

TTS Customization

Overcoming TTS Limitations

● Every TTS engine mispronounces some words (one can think of it as a fundamental theorem of TTS)

● There are two ways of overcoming this limitation:

Phonetic spelling: spell mispronounced words the way they sound, generate waveforms, associate words with waveforms, & save them

Human recording: have a human record mispronounced words, save them in audio files, and use those files

Audio Dictionary Application

● Develop an application that allows the user to create an audio dictionary of phonetically spelled words for words whose correct spellings the TTS engine mispronounces

● The application allows the user to spell words as they are pronounced

● The phonetic words are converted into wav files by the TTS engine and saved on the device's sdcard

● The saved wav files are associated with the correct spelling
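The core of such a dictionary is a map from the correct spelling to the saved wav file. The plain-Java sketch below shows that association and the fallback when no recording exists. The class name, the normalization of keys, and the example word and path are assumptions for illustration; on the device the association is made with `addSpeech()`, as the implementation slides show.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of the audio dictionary.
final class AudioDictionary {
    private final Map<String, String> wavByWord = new HashMap<>();

    // Associate the correct spelling of a word with a saved wav file.
    void associate(String correctSpelling, String wavPath) {
        wavByWord.put(normalize(correctSpelling), wavPath);
    }

    // Returns the wav path for the word, or null when there is no recording
    // and the engine should synthesize the word as usual.
    String lookup(String word) {
        return wavByWord.get(normalize(word));
    }

    // Trimming and lowercasing are choices of this sketch, not TTS behavior.
    private static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }
}
```

For example, after `associate("Quinoa", "/sdcard/phonetic_spelling/quinoa.wav")`, a lookup of `"quinoa"` yields the wav path, while a lookup of a word with no entry yields `null`.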

Audio Dictionary Application Screenshot

Implementation

Source code is here: https://github.com/VKEDCO/TTSOnAndroid/blob/master/AudioDictionaryViaSpeechSynthesis.zip?raw=true

Storing Files on SDCard

● Create a directory on the device's sdcard (manually or programmatically)

● If you are using Eclipse:

open the DDMS perspective
click on the device's name in the Devices panel on the left
click on the File Explorer tab on the right
go to /storage/sdcard and create a folder (e.g., my_audio_files)

● You can do the same steps on your Android device by connecting it to your computer as a storage device with a USB cable
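Creating the directory programmatically needs only `java.io.File`. The sketch below is a minimal version; the class and method names are hypothetical, and the root folder is a parameter so the code runs anywhere. On a device the root would be the value of `Environment.getExternalStorageDirectory()`, and the write permission must be declared in the manifest.

```java
import java.io.File;

// Hypothetical helper for preparing the audio folder.
final class AudioFolder {

    // Creates <root>/<name> (and any missing parents) if it does not exist.
    // Returns the directory, or throws if it cannot be created.
    static File ensure(File root, String name) {
        File dir = new File(root, name);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IllegalStateException("Cannot create " + dir);
        }
        return dir;
    }
}
```

Calling `AudioFolder.ensure(root, "my_audio_files")` a second time is harmless, because the existence check comes before `mkdirs()`.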

<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />

Setting Reading & Writing Permissions in AndroidManifest.xml

// Initialize TTS in onCreate() of the main activity
String mSDCardFolder = null;

@Override
public void onCreate(Bundle savedInstance) {
    super.onCreate(savedInstance);
    // Do the GUI stuff here & TTS initialization
    mSDCardFolder = Environment.getExternalStorageDirectory() + "/phonetic_spelling/";
}

Setting the External Storage Directory

public class AudioDictionaryAct extends Activity
        implements OnInitListener {

    // If TTS is initialized successfully, enable the Speak and
    // Record buttons
    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            btnSpeak.setEnabled(true);
            btnRecord.setEnabled(true);
        }
    }
    // ... rest of the activity
}

Implement OnInitListener in the Main Activity

// Initialize TTS in onCreate() of the main activity
@Override
public void onCreate(Bundle savedInstance) {
    super.onCreate(savedInstance);
    // Do the GUI stuff here
    Intent checkIntent = new Intent();
    checkIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    startActivityForResult(checkIntent, REQ_TTS_STATUS_CHECK);
}

TTS Initialization

TextToSpeech mTTS = null;

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == REQ_TTS_STATUS_CHECK) {
        switch (resultCode) {
        case TextToSpeech.Engine.CHECK_VOICE_DATA_PASS:
            // TTS data is present: create the engine; onInit() is called when it is ready
            mTTS = new TextToSpeech(this, this);
            Log.v(TAG, TTS_INSTALLED_MSG);
            break;
        case TextToSpeech.Engine.CHECK_VOICE_DATA_FAIL:
            // TTS data is missing: ask the user to install it
            Log.v(TAG, INSTALL_TTS_DATA_MSG + resultCode);
            Intent installTTSDataIntent = new Intent();
            installTTSDataIntent.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
            startActivity(installTTSDataIntent);
            break; // do not fall through to the default case
        default:
            Log.e(TAG, TTS_UNAVAILABLE_MSG);
        }
    }
}

TTS Initialization

Button Logic

btnSpeak = (Button) findViewById(R.id.btnSpeak);
btnSpeak.setOnClickListener(new OnClickListener() {
    public void onClick(View view) {
        // Speak the phonetic spelling typed by the user
        mTTS.speak(edTxtPhoneticSpelling.getText().toString(), TextToSpeech.QUEUE_ADD, null);
    }
});

btnRecord = (Button) findViewById(R.id.btnRecord);
btnRecord.setOnClickListener(new OnClickListener() {
    public void onClick(View view) {
        soundFilename = mSDCardFolder + edTxtUserFileName.getText().toString();
        soundFile = new File(soundFilename);
        // Overwrite any earlier file with the same name
        if (soundFile.exists()) { soundFile.delete(); }
        // Synthesize the phonetic spelling into a wav file
        if (mTTS.synthesizeToFile(edTxtPhoneticSpelling.getText().toString(), null,
                soundFilename) == TextToSpeech.SUCCESS) {
            btnPlay.setEnabled(true);
            btnAssociate.setEnabled(true);
        }
    }
});

Record

btnPlay = (Button) findViewById(R.id.btnPlay);
btnPlay.setOnClickListener(new OnClickListener() {
    public void onClick(View view) {
        Log.v("AUDIODICTIONARY", soundFilename);
        try {
            // Play back the synthesized wav file
            mPlayer = new MediaPlayer();
            mPlayer.setDataSource(soundFilename);
            mPlayer.prepare();
            mPlayer.start();
        } catch (Exception e) {
            // handle exception
        }
    }
});

btnAssociate = (Button) findViewById(R.id.btnAssociate);
btnAssociate.setOnClickListener(new OnClickListener() {
    public void onClick(View view) {
        // Map the correct spelling to the synthesized wav file
        mTTS.addSpeech(edTxtRealSpelling.getText().toString(), soundFilename);
    }
});

Associate Audio with Spelling

Overcoming TTS Limitations through Human Recording

What Is This?

Bhagavatgita, Verse 1

dharmakshetre kurukshetre samaveta yuyutsavah

mamakah pandavashcaiva kim akurvata sanjaya

Bhagavatgita, V. 1 Transliterated

Что свершали, - скажи Санджая, -

сыновья мои и Пандавы,

ради битвы сойдясь на поле

Курукшетры, на поле дхармы?

Перевод В.С. Семенцова

What Is This?

The Russian Translation of Bhagavatgita V. 1

Chto svershali, - skazhi Sandzhaya, -

synovya moi i Pandavy,

radi bitvy soydyas' na pole

Kurukshetry, na pole dharmy?

Translated by V.S. Sementsov

Transliteration of Russian Translation

Oh, Sanjaya, tell me what happened at Kurukshetra, the field of dharma, where my family and the Pandavas gathered to fight?

Translated by Eknath Easwaran

English Translation of Bhagavatgita, V. 1

TTS Bhagavatgita Project

Source code is here: https://github.com/VKEDCO/TTSOnAndroid/blob/master/BhagavatGitaTTS_v43.zip

The Problem

Have your Android device read the first verse of Bhagavatgita in Sanskrit, Russian, & English.

Sample Screenshot

Logical Steps of a Solution

● Write a Devanagari transliterator that takes Sanskrit texts and produces their Latin transliterations

● Write a Cyrillic transliterator that takes Russian texts and produces their Latin transliterations

● Have human readers record Sanskrit and Russian words

● Associate strings with specific recordings
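To give a feel for the Cyrillic step, here is a deliberately naive character-by-character sketch in plain Java. It is an assumption-laden illustration, not a real transliterator: the class name and the Latin spellings in the table are choices made here, no context rules are applied, and the output will not match a hand-made transliteration everywhere (for instance, the soft sign becomes an apostrophe and х becomes "kh"). Cyrillic letters are written as Unicode escapes so that the source compiles under any file encoding.

```java
// Naive Cyrillic-to-Latin mapping; the table entries are assumptions of this sketch.
final class CyrillicToLatin {

    // Latin spellings for the 32 contiguous letters U+0430..U+044F
    // (Cyrillic small a through small ya), in code point order.
    private static final String[] LATIN = {
        "a", "b", "v", "g", "d", "e", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p",
        "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "", "y", "'", "e", "yu", "ya"
    };

    static String transliterate(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            char lower = Character.toLowerCase(c);
            String latin;
            if (lower >= '\u0430' && lower <= '\u044f') {
                latin = LATIN[lower - '\u0430'];
            } else if (lower == '\u0451') {   // small io lies outside the main range
                latin = "yo";
            } else {
                out.append(c);                // not Cyrillic: copy unchanged
                continue;
            }
            // Preserve capitalization by upper-casing the first Latin letter.
            if (Character.isUpperCase(c) && !latin.isEmpty()) {
                out.append(Character.toUpperCase(latin.charAt(0))).append(latin.substring(1));
            } else {
                out.append(latin);
            }
        }
        return out.toString();
    }
}
```

A table like this handles simple words, but a usable transliterator also needs context-dependent rules and a decision about which romanization standard to follow.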

Real Steps

● We will skip transliterator implementation (quite likely an M.S./Ph.D. type of project)

● Record .wav files & save them on SD card

● Associate .wav files with specific strings

● Have the TTS engine load those files from SD card at run time

mTTS.addSpeech("sn_akurvata", snPath + "sn_akurvata.wav");

mTTS.addSpeech("sn_dharmakshetre", snPath + "sn_dharmakshetre.wav");

mTTS.addSpeech("sn_kim", snPath + "sn_kim.wav");

mTTS.addSpeech("sn_kurukshetre", snPath + "sn_kurukshetre.wav");

mTTS.addSpeech("sn_mamakah", snPath + "sn_mamakah.wav");

mTTS.addSpeech("sn_pandavashcaiva", snPath + "sn_pandavashcaiva.wav");

mTTS.addSpeech("sn_samaveta", snPath + "sn_samaveta.wav");

mTTS.addSpeech("sn_samjaya", snPath + "sn_samjaya.wav");

mTTS.addSpeech("sn_yuyutsavah", snPath + "sn_yuyutsavah.wav");

Adding Sanskrit to TTS Engine

mTTS.addSpeech("ru_bitvy", ruPath + "ru_bitvy.wav");

mTTS.addSpeech("ru_chto", ruPath + "ru_chto.wav");

mTTS.addSpeech("ru_dharmy", ruPath + "ru_dharmy.wav");

mTTS.addSpeech("ru_i", ruPath + "ru_i.wav");

mTTS.addSpeech("ru_kurukshetry", ruPath + "ru_kurukshetry.wav");

mTTS.addSpeech("ru_moi", ruPath + "ru_moi.wav");

mTTS.addSpeech("ru_na", ruPath + "ru_na.wav");

mTTS.addSpeech("ru_pandavy", ruPath + "ru_pandavy.wav");

mTTS.addSpeech("ru_pole", ruPath + "ru_pole.wav");

Adding Russian to TTS Engine
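Because every registration follows the same pattern (prefix + word as the key, folder + key + ".wav" as the file), the key-to-file pairs can be generated from a word list instead of being written out one by one. The plain-Java sketch below builds that mapping; the class and method names are hypothetical, and the folder string is whatever `snPath` or `ruPath` holds on the device.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that generates the addSpeech() arguments.
final class SpeechRegistrations {

    // Maps each prefixed key, e.g. "ru_chto", to "<folder>ru_chto.wav".
    // LinkedHashMap keeps the words in the order they were given.
    static Map<String, String> build(String prefix, String folder, String... words) {
        Map<String, String> keyToPath = new LinkedHashMap<>();
        for (String word : words) {
            String key = prefix + word;
            keyToPath.put(key, folder + key + ".wav");
        }
        return keyToPath;
    }
}
```

On the device, each entry of the returned map would then be passed to `mTTS.addSpeech(key, path)`.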

final static String SN_PREFIX = "sn_";

// Speak each Sanskrit word through its prefixed key, e.g. "sn_kim"
public void saySanskritWords() {
    for (String w : mSNWords) speakWord(SN_PREFIX + w);
}

final static String RU_PREFIX = "ru_";

public void sayRussianWords() {
    for (String w : mRUWords) speakWord(RU_PREFIX + w);
}

Speaking Sanskrit & Russian

// If the word was registered with addSpeech(), the engine plays its wav file
public void speakWord(String word) {
    mTTS.speak(word, TextToSpeech.QUEUE_ADD, null);
}

Speaking Sanskrit & Russian
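The slides do not show how the word lists are filled. One possible way, sketched below in plain Java under that assumption, is to split a transliterated line into words, strip punctuation, and prepend the language prefix so that each result matches a key registered with `addSpeech()`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that turns a transliterated line into speech keys.
final class VerseKeys {

    // Lowercases the line, splits on anything that is not a Latin letter or
    // an apostrophe, and prefixes every word, e.g. "na" -> "ru_na".
    static List<String> toKeys(String prefix, String line) {
        List<String> keys = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!token.isEmpty()) {
                keys.add(prefix + token);
            }
        }
        return keys;
    }
}
```

Each key would then be handed to `speakWord()`; a key with no registered recording is simply synthesized as ordinary text by the engine.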

Storing Audio Files on SDCard

● Create a folder called /bhagavatgita on the sdcard, inside the folder returned by Environment.getExternalStorageDirectory().getPath()

● Create two subfolders /bhagavatgita/sn/ and /bhagavatgita/ru/

● Place the audio files from this zip archive into the appropriate folders

● Here is the full link to the above zip archive: https://github.com/VKEDCO/TTSOnAndroid/blob/master/bhagavatgita.zip

References & Reading Suggestions

● http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
