
Speech System for Dumb People
(Using LabVIEW)

plus many more applications…

Project by:
Jaspreet Singh Walia, 10105030
Lovish Choudhary, 10105035
Gunjan Sharma, 10105074
Arvinder Singh, 09105070

Project submitted to:
Mr. Sukhwinder Singh


Acknowledgement

This project report has been written in simple, lucid and direct language. It contains a detailed description of our project, i.e. the Speech System for Dumb People, with ample pictures and diagrams to support the text.

Our thanks are due to Sukhwinder Singh sir for extending all cooperation during the preparation of the project and correcting all our mistakes and shortcomings.


What is Text-to-Speech?

Text to speech is the automated synthesis of speech from text. The heart of the system is the text to speech engine – a sophisticated piece of software that:

parses the text input,

analyzes its grammar, sentence structure, punctuation and capitalization, and

activates voice simulations to produce a vocal rendering of the text.

Advances in Synthetic Voice

• Advances in text to speech technology have replaced the old robotic computer voices with new, amazingly natural and realistic ones.

• Synthesized from real voice talents, these remarkable text to speech voices can read books aloud beautifully without a mistake, guided only by grammar, sentence structure and punctuation.


Uses Real Voices

• Recently, voices that use the concatenation method have become commercially available: the voice of a real human speaker is divided into phonemes, which are stored in the voice file.

• In a particular application, the text to speech engine assembles the phonemes according to the input text to reconstruct the original human voice to speak the text. Because a real human voice is used, it is sometimes hard to tell the difference between it and the real thing.
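As a purely illustrative sketch of this concatenation idea (not the actual engine used in this project; the phoneme labels, sample data and function names below are invented for the example), assembling an utterance reduces to appending pre-recorded unit buffers in the order given by the input:

```cpp
#include <map>
#include <string>
#include <vector>

// Toy illustration of concatenative synthesis: each phoneme maps to a
// pre-recorded buffer of PCM samples, and the output is built by appending
// those buffers in the order dictated by the input phoneme sequence.
std::vector<short> synthesize(const std::vector<std::string>& phonemes,
                              const std::map<std::string, std::vector<short>>& voiceBank)
{
    std::vector<short> output;
    for (const auto& p : phonemes)
    {
        auto it = voiceBank.find(p);
        if (it == voiceBank.end())
            continue;  // unknown phoneme: skip (a real engine would back off)
        output.insert(output.end(), it->second.begin(), it->second.end());
    }
    return output;     // raw PCM, ready to be written to an audio device
}

int main()
{
    // A tiny, made-up "voice bank" with two placeholder phoneme buffers.
    std::map<std::string, std::vector<short>> bank = {
        {"HH", std::vector<short>(800, 0)},
        {"AY", std::vector<short>(1600, 0)}
    };
    std::vector<short> pcm = synthesize({"HH", "AY"}, bank);  // "hi"
    return pcm.empty() ? 1 : 0;
}
```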

Motivation

With computing devices becoming smaller, cheaper and embedded, interaction with them will increasingly shift to the speech modality.

The major advantage of using speech-based interfaces is that a person unable to read can access and interact with such devices and can learn many things that would otherwise be beyond their reach.


This includes a large number of the handicapped, the very young and the not-so-literate. Hindi is the national language, spoken in most parts of the country; most people live in villages and many are not fully literate.

A text-to-speech (TTS) device is useful for speech-based interfaces, voice-response applications (e.g. IVR applications) and as a reader for the vocally or visually handicapped.

Application:

TTS systems have numerous potential applications. A few are listed below.

i. In telecommunication services: since most calls require very little interactivity, TTS systems have a strong presence in telecommunication services, making it possible to access textual information over the phone.

ii. In e-governance service: TTS can be very helpful by providing government policy information over the phone, polling centre information, land records information, mandi prices, application tracking and monitoring etc.

iii. Aid to disabilities: TTS can give invaluable support to voice-handicapped individuals with the help of specially designed keyboards and fast sentence-assembling programs; it is also helpful for the visually handicapped.

iv. Voice browsing: TTS is the backbone of voice browsers, which can be controlled by voice instead of by mouse and keyboard, thus allowing hands-free and eyes-free browsing.

v. Vocal monitoring: At times, oral information is more efficient than its written counterpart; hence the idea of incorporating speech synthesizers in measurement and control systems, such as cockpits, to prevent pilots from being overwhelmed with visual information.

vi. Complex interactive voice response systems: With the support of good-quality speech recognisers, speech synthesis systems are able to make complex interactive voice response systems a reality.

vii. Multimedia, man-machine communication: In the long run, the development of high-quality TTS systems is a necessary step towards more complete means of communication between people and computers. Multimedia is a first but promising move in this direction; it includes talking books and toys, and mail and document readers.


Stephen Hawking is one of the most famous people using speech synthesis to communicate. He uses a speech synthesiser created by NeoSpeech.


How the system works


Microsoft Speech SDK

SAPI 5.1

The Microsoft® Speech SDK 5.1 is the developer kit for the Microsoft® Windows environment.

Tools, information, and sample engines and applications are provided to integrate and optimize speech synthesis engines.

In general all versions of the API have been designed such that a software developer can write an application to perform speech recognition and synthesis by using a standard set of interfaces, accessible from a variety of programming languages. In addition, it is possible for a 3rd-party company to produce their own Text-To-Speech engines or adapt existing engines to work with SAPI. In principle, as long as these engines conform to the defined interfaces they can be used instead of the Microsoft-supplied engines.

What can we do with the SDK?


You can use the SDK components and redistributable SAPI/engine run-time to build applications that incorporate speech synthesis.

Automation Support

SAPI 5.1 supports automation. That means languages other than C/C++ may now use SAPI for application development. The languages themselves need to support automation. Common languages which may be used include Visual Basic, C#, and JScript.

Speech Components and Services

Included in the Speech API architecture is a collection of speech components for directly managing the audio, training wizard, events, grammar compiler, resources, speech recognition manager, and TTS manager for low-level control and greater flexibility. The Speech API also enables support for running multiple speech-enabled applications.

Major applications using SAPI


• Microsoft Windows XP Tablet PC Edition includes SAPI 5.1 and speech recognition engines 6.1 for English, Japanese, and Chinese (simplified and traditional)

• Windows Speech Recognition in Windows Vista

• Microsoft Narrator in Windows 2000 and later Windows operating systems

• Microsoft Office XP and Office 2003

• Microsoft Excel 2002, Microsoft Excel 2003, and Microsoft Excel 2007 for speaking spreadsheet data

• Microsoft Voice Command for Windows Pocket PC and Windows Mobile

• Microsoft Plus! Voice Command for Windows Media Player

• Adobe Reader uses voice output to read document content

• CoolSpeech, text-to-speech application that reads text aloud from a variety of sources

• Window-Eyes screen reader

• JAWS screen reader

ISpTTSEngine


The SAPI speech synthesis (text-to-speech, or TTS) engine implements the ISpTTSEngine interface.

ISpTTSEngine::Speak is the primary method called by SAPI to perform speech rendering. SAPI, rather than the engine, performs parsing of the input text stream. The Speak method receives a linked list of text fragments. It also receives a pointer to the ISpTTSEngineSite interface, which the TTS engine uses to queue events and to write the output audio data.
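As a rough C++ sketch of the fragment walk at the heart of an engine-side Speak implementation (simplified; the structure and interface names follow the SAPI 5 DDK headers as we understand them, and the actual audio generation is omitted):

```cpp
#include <windows.h>
#include <sapi.h>
#include <sapiddk.h>   // SPVTEXTFRAG, ISpTTSEngineSite (SAPI 5 DDK headers)

// Sketch of the core loop inside an engine's Speak call. SAPI has already
// parsed the input into a linked list of SPVTEXTFRAG structures; the engine
// iterates the list, synthesizes audio for each fragment and writes it back
// through the ISpTTSEngineSite interface it was given.
HRESULT SpeakFragments(const SPVTEXTFRAG *pTextFragList, ISpTTSEngineSite *pSite)
{
    for (const SPVTEXTFRAG *pFrag = pTextFragList; pFrag != NULL; pFrag = pFrag->pNext)
    {
        // pFrag->pTextStart / pFrag->ulTextLen identify the text of this fragment;
        // pFrag->State carries the rate, volume and pitch requested for it.

        // ... a real engine generates audio samples for the fragment here ...

        // The samples are then pushed back to SAPI, e.g.:
        //   ULONG cbWritten = 0;
        //   pSite->Write(pSamples, cbSamples, &cbWritten);
        // and word/viseme events are queued with pSite->AddEvents().
    }
    return S_OK;
}
```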

ISpVoice

The central SAPI API for reading text and converting it into speech (TTS) is ISpVoice. Using this interface, applications can add TTS support such as speaking text, modifying speech characteristics, changing voices, as well as responding to real-time events while speaking. In fact, most applications should need only this single interface to accomplish everything that is needed for basic TTS support.

Applications obtain access to ISpVoice interface methods by creating a COM object. As the name implies, an ISpVoice object is simply a single instance of a specific TTS voice. Every ISpVoice object is an individual voice. Even if two different ISpVoice objects select the same base voice (for example "Mike"), each of the two voices can be changed and modified independently of the other.

Speaking

When an application first creates an ISpVoice object, the object initializes to the default voice (set in the Speech properties of Control Panel). This means that the new object is immediately ready to speak text; no special initialization is needed. At this point, applications can use Speak or SpeakStream to speak any Unicode text data.
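A minimal C++ sketch of this flow, along the lines of the SDK's TTS samples (error handling trimmed; link with sapi.lib and ole32.lib):

```cpp
#include <windows.h>
#include <sapi.h>

int main()
{
    if (FAILED(::CoInitialize(NULL)))
        return 1;

    ISpVoice *pVoice = NULL;
    // Create a voice object; it comes up initialized to the default voice
    // selected in the Speech properties of Control Panel.
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                    IID_ISpVoice, (void **)&pVoice);
    if (SUCCEEDED(hr))
    {
        // Speak synchronously: the call returns once the text has been spoken.
        pVoice->Speak(L"Hello! This text is being read aloud by SAPI.",
                      SPF_DEFAULT, NULL);
        pVoice->Release();
    }
    ::CoUninitialize();
    return 0;
}
```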

Synchronous vs. Asynchronous Speaking

The two speaking functions can generate speech either synchronously (the function does not return until the text has been completely spoken) or asynchronously (the function returns immediately while speech continues as a background process). Asynchronous operation is chosen if the application needs to do something else (highlight text, paint animation, monitor controls, etc.) while speaking. Otherwise, the simplest case is to speak synchronously.
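Continuing the sketch above, switching between the two modes is just a flag on Speak; for example:

```cpp
// Start speaking in the background and return immediately.
pVoice->Speak(L"This sentence is spoken asynchronously.", SPF_ASYNC, NULL);

// ... the application can update the UI, highlight text, etc. here ...

// Block (with an optional timeout) until the asynchronous speech has finished.
pVoice->WaitUntilDone(INFINITE);
```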


TTS Engine Characteristics

Engines use the three characteristics of Volume, Pitch, and Rate to partially define speech traits. At the application level, setting these values is simple; you need only set them to a given number. However, implementation of these traits is more complex for the engine.

Volume

At the application level, volume is a number from zero to 100, where 100 is the maximum value for a voice. The scale is a linear progression: a value of 50 represents half of the loudest volume permitted, and each increment is one-hundredth of the range.

Pitch adjustment

The value can range from -10 to +10. A value of zero sets a voice to speak at its default pitch. A value of -10 sets a voice to speak at three-fourths of its default pitch. A value of +10 sets a voice to speak at four-thirds of its default pitch.

Rate adjustment


The value can range from -10 to +10. A value of zero sets a voice to speak at its default rate. A value of -10 sets a voice to speak at one-third of its default rate. A value of +10 sets a voice to speak at three times its default rate.
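At the application level, the rate and volume traits reduce to single calls on the voice object; pitch has no dedicated ISpVoice method and is normally requested through SAPI XML markup embedded in the input text. A sketch, reusing the pVoice pointer from the earlier example (the exact pitch tag syntax should be checked against the SAPI XML documentation):

```cpp
pVoice->SetVolume(50);   // 0..100, linear: 50 is half of the loudest permitted
pVoice->SetRate(5);      // -10..+10: 0 is the default rate, +10 is about three times it

// Pitch is adjusted through XML markup in the spoken text.
pVoice->Speak(L"<pitch middle=\"5\">This is spoken at a higher pitch.</pitch>",
              SPF_IS_XML, NULL);
```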

ISpVoice

• The ISpVoice interface enables an application to perform text synthesis operations.

• Applications can speak text strings and text files, or play audio files through this interface. All of these can be done synchronously or asynchronously.

• Applications can choose a specific TTS voice using ISpVoice::SetVoice.

• The state of the voice (for example, rate, pitch, and volume), can be modified. Some attributes, like rate and volume, can be changed in real time using ISpVoice::SetRate and ISpVoice::SetVolume. Voices can be set to different priorities using ISpVoice::SetPriority.

• ISpVoice inherits from the ISpEventSource interface. An ISpVoice object forwards events back to the application when the corresponding audio data has been rendered to the output device.


ActiveX Functions

Use the ActiveX functions to pass properties and methods to and from other ActiveX-enabled applications, such as Microsoft Excel, Notepad and Microsoft Word.

ActiveX is a framework for defining reusable software components in a programming language-independent way. Software applications can then be composed from one or more of these components in order to provide their functionality.


Many Microsoft Windows applications —including many of those from Microsoft itself, such as Internet Explorer, Microsoft Office, Microsoft Visual Studio, and Windows Media Player — use ActiveX controls to build their feature-set and also encapsulate their own functionality as ActiveX controls which can then be embedded into other applications.


Some applications provide ActiveX data in the form of a self-describing data type called a variant.


Stepwise explanation…

1) The first step is to run the VI.

Front Panel of our VI


2) Then we need to specify the synthetic voice that we want to use. By default, Microsoft Anna – English (United States) comes with the SDK, and that is the voice we are going to use, because other voices need to be registered before they can be used.

3) After selecting the voice, we have to select the audio output device.

Step 3…


Step 4…

4) We have provided two input methods:

• directly speak the entered text

• speak text by reading it from a text file saved on the computer

Step 5…

Method 1:

Enter the text to be spoken


Method 2:

Browse for the text file to be used and click OK.


Controls for speech rate, volume, and pausing/resuming execution

Step 6…

Finally, after selecting the voice and the audio output device, typing the text or selecting the text file, and setting the speech rate and volume, press GO.


Additional Steps…

Stopping the execution

See if any error occurred


Block Diagram Explanation

The Get Voices VI calls the Microsoft Speech SDK 5.1 software and retrieves the voices available for use on your computer. It loads their names into a list box, so you can select the one you want to use.

The Get Audio Devices VI retrieves information on the audio devices available on your computer. It then loads this information into a list box, so you can select the device you want to use.
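Under the hood, listing the installed voices amounts to enumerating SAPI's voice tokens. A hedged C++ sketch of roughly what the Get Voices VI wraps (using the sphelper.h helpers, which require ATL; error handling trimmed):

```cpp
#include <windows.h>
#include <sapi.h>
#include <sphelper.h>   // SpEnumTokens, SpGetDescription helpers
#include <stdio.h>

int main()
{
    ::CoInitialize(NULL);

    IEnumSpObjectTokens *pEnum = NULL;
    // Enumerate every voice token registered under the SAPI voices category.
    if (SUCCEEDED(SpEnumTokens(SPCAT_VOICES, NULL, NULL, &pEnum)))
    {
        ISpObjectToken *pToken = NULL;
        while (pEnum->Next(1, &pToken, NULL) == S_OK)
        {
            WCHAR *pszName = NULL;
            // Friendly name such as "Microsoft Anna"; this is what a list box would show.
            if (SUCCEEDED(SpGetDescription(pToken, &pszName)))
            {
                wprintf(L"%s\n", pszName);
                ::CoTaskMemFree(pszName);
            }
            pToken->Release();
        }
        pEnum->Release();
    }
    ::CoUninitialize();
    return 0;
}
```

Passing a chosen token to ISpVoice::SetVoice then selects that voice for speech.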


The While Loop and Event structure allow you to perform the following:

a) Select the voice you want to use.

b) Select the audio device you want to use.

c) Select the location of the text to be read (string control or path to a .txt file).


d) Adjust the speech rate and volume.

e) Pause and resume the reading of the text.

f) Stop the VI.

This case includes the Speak.vi and the GO button.


This case is used for Pause/Resume.


This case stops the VI if you click the Stop button.


Applications of this project:

1) Speech system for dumb people or people with any other speech inability.

2) Hiding the identity of people in secret videos and sting operations.

3) Learning the pronunciation of new words.

4) Narrating any text for teaching purposes.

5) Text-to-speech systems are now commonly used by people with dyslexia and other reading difficulties, as well as by pre-literate children.

6) They are also frequently employed to aid those with severe speech impairment, usually through a dedicated voice output communication aid.

7) Speech synthesis techniques are also used in entertainment productions such as games and animations.


References:

• wikipedia.com
• answers.yahoo.com
• microsoft.com
• ni.com
• Mattingly, Ignatius G. (1974). "Speech synthesis for phonetic and phonological models". In Sebeok, Thomas A. (ed.).

