VOICE CONTROLLED ROBOT

(Submitted in partial fulfillment for the award of Bachelor of Electronics

Engineering Degree by the University of Mumbai)

By

Pratik Chopra

Harshad Dange

Under the guidance of

Mr. Shirish S. Halbe

(Asst. Professor & Hobby Centre Co-ordinator )

Department of Electronics Engineering,

K. J. Somaiya College of Engineering,

Vidyavihar, Mumbai - 400077.

2006 - 2007.

K. J. SOMAIYA COLLEGE OF ENGINEERING,

VIDYAVIHAR, MUMBAI -400077.

DEPARTMENT OF ELECTRONICS

Certificate

This is to certify that Mr. Pratik Chopra of Electronics Department, bearing the

University seat number 8139 has completed the B.E. project on Voice

Controlled Robot and is accepted and examined for the partial fulfillment of the

Bachelor of Electronics Engineering Degree by the University of Mumbai.

Prof. Shirish S. Halbe     Prof. Milind Marathe     Dr. P.P Parikh
GUIDE                      H. O. D.                 Director / Principal

_______________            ________________
Examiner                   Date of Examination

ACKNOWLEDGEMENT

We take this opportunity to express our deepest gratitude to Mr. S.S. Halbe, our project guide, who has been the driving force behind this project and whose guidance and co-operation have been a source of inspiration for us.

We would also like to thank Prof. Samir Mhatre for his valuable support whenever needed.

We are very thankful to our professors, our colleagues and the authors of the various publications to which we have referred. We express our sincere appreciation and thanks to all those who have guided us directly or indirectly in our project. Much needed moral support and encouragement was also provided on numerous occasions by our whole division.

Finally, we thank our parents for their immense support.

Contents

1. Introduction
2. The Task
3. Speech Recognition Types/Styles
4. Approaches to Statistical Speech Recognition
5. Nature of Problem
6. Solution to Problems
7. Design Approach
   a. Speech Recognition Module
   b. Microcontroller and Decoder circuit
   c. RF module
   d. Driver Circuit
   e. Buffer
   f. Batteries
8. Training and Recognition
9. Applications
10. Components Used
11. Datasheet-HM2007
12. Project Progress Report Summary
13. Bibliography

Chapter1. INTRODUCTION

When we say voice control, the first term to be considered is speech recognition, i.e. making the system understand the human voice. Speech recognition is a technology in which the system recognizes the words (not their meaning) given through speech.

Speech is an ideal method for robotic control and communication. The speech-recognition circuit we will outline functions independently of the robot’s main intelligence [central processing unit (CPU)]. This is a good thing because word recognition does not take up any of the robot’s main CPU processing power. The CPU must merely poll the speech circuit’s recognition lines occasionally to check whether a command has been issued to the robot. We can even improve upon this by connecting the recognition line to one of the robot’s CPU interrupt lines. A recognized word would then cause an interrupt, letting the CPU know that a recognized word had been spoken. The advantage of using an interrupt is that polling the circuit’s recognition line would no longer be necessary, further reducing CPU overhead.
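The difference between polling and an interrupt can be illustrated in software. The following is only a minimal sketch in Python, not the circuit's actual interface: the class `SpeechCircuit`, its method names and the integer word codes are hypothetical stand-ins for the recognition lines described above.

```python
from typing import Callable, Optional


class SpeechCircuit:
    """Hypothetical stand-in for the stand-alone speech-recognition circuit."""

    def __init__(self) -> None:
        self._latched: Optional[int] = None            # last recognized word code
        self._handler: Optional[Callable[[int], None]] = None

    def attach_interrupt(self, handler: Callable[[int], None]) -> None:
        """Register a handler that is called as soon as a word is recognized."""
        self._handler = handler

    def recognize(self, word_code: int) -> None:
        """Simulate the circuit recognizing a spoken word."""
        self._latched = word_code
        if self._handler is not None:
            self._handler(word_code)                   # interrupt-style notification

    def poll(self) -> Optional[int]:
        """Polling-style read: return the latched code once, then clear it."""
        code, self._latched = self._latched, None
        return code
```

With polling, the main program has to call `poll()` from time to time and usually gets `None` back; with `attach_interrupt()`, the handler runs only when a word has actually been recognized.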

Another advantage to this stand-alone speech-recognition circuit (SRC) is its

programmability. You can program and train the SRC to recognize the unique words you

want recognized. The SRC can be easily interfaced to the robot’s CPU.

Controlling an appliance (computer, VCR, TV, security system, etc.) by speaking to it makes the device easier to use, while increasing the efficiency and effectiveness of working with that device. At its most basic level, speech recognition allows the user to perform parallel tasks (i.e. while hands and eyes are busy elsewhere) while continuing to work with the computer or appliance.

Robotics is an evolving technology. There are many approaches to building robots, and

no one can be sure which method or technology will be used 100 years from now. Like

biological systems, robotics is evolving following the Darwinian model of survival of

the fittest.

Suppose you want to control a menu driven system. What is the most striking property that you can think of?

Well, the first thought that came to our minds is that the range of inputs in a menu driven system is limited. In fact, by using a menu, all we are doing is limiting the input domain space. This characteristic can be very useful when implementing the menu in stand-alone systems. For example, think of the pine menu or a washing machine menu. How many distinct commands do they require?

Why build robots?

Robots are indispensable in many manufacturing industries. The reason is that the cost

per hour to operate a robot is a fraction of the cost of the human labor needed to perform

the same function. More than this, once programmed, robots repeatedly perform

functions with a high accuracy that surpasses that of the most experienced human

operator. Human operators are, however, far more versatile. Humans can switch job tasks

easily. Robots are built and programmed to be job specific. You wouldn’t be able to

program a welding robot to start counting parts in a bin. Today’s most advanced

industrial robots will soon become “dinosaurs.” Robots are in the infancy stage of their

evolution. As robots evolve, they will become more versatile, emulating the human

capacity and ability to switch job tasks easily. While the personal computer has made an

indelible mark on society, the personal robot hasn’t made an appearance. Obviously

there’s more to a personal robot than a personal computer. Robots require a combination

of elements to be effective: sophistication of intelligence, movement, mobility,

navigation, and purpose.

Without risking human life or limb, robots can replace humans in some hazardous duty

service. Robots can work in all types of polluted environments, chemical as well as

nuclear. They can work in environments so hazardous that an unprotected human would

quickly die.

Chapter2. THE TASK

The purpose of this project is to build a robotic car which could be controlled

using voice commands. Generally these kinds of systems are known as Speech

Controlled Automation Systems (SCAS). Our system will be a prototype of the same.

We are not aiming to build a robot which can recognize a lot of words. Our basic

idea is to develop some sort of menu driven control for our robot, where the menu is

going to be voice driven.

We aim to control the robot using the following voice commands, so that it can perform these basic tasks:-

1. move forward

2. move back

3. turn right

4. turn left

5. load

6. release

7. stop ( stops doing the current job )

SAMPLE INPUT OUTPUT

INPUT (Speaker speaks)     OUTPUT (Robot does)

Forward                    Moves forward
Back                       Moves back
Right                      Turns right
Left                       Turns left
Load                       Lifts the load
Release                    Releases the load
Stop                       Stops doing current task

(The words are chosen in such a way that they sound as dissimilar from one another as possible)

Chapter3. SPEECH RECOGNITION TYPES AND STYLES

Voice enabled devices basically use the principle of speech recognition. It is the process of electronically converting a speech waveform (as the realization of a linguistic expression) into words (as a best-decoded sequence of linguistic units).

Converting a speech waveform into a sequence of words involves several essential steps:

1. A microphone picks up the signal of the speech to be recognized and converts it

into an electrical signal. A modern speech recognition system also requires that

the electrical signal be represented digitally by means of an analog-to-digital

(A/D) conversion process, so that it can be processed with a digital computer or a

microprocessor.

2. This speech signal is then analyzed (in the analysis block) to produce a

representation consisting of salient features of the speech. The most prevalent

feature of speech is derived from its short-time spectrum, measured successively

over short-time windows of length 20–30 milliseconds overlapping at intervals of

10–20 ms. Each short-time spectrum is transformed into a feature vector, and the

temporal sequence of such feature vectors thus forms a speech pattern.

3. The speech pattern is then compared to a store of phoneme patterns or models

through a dynamic programming process in order to generate a hypothesis (or a

number of hypotheses) of the phonemic unit sequence. (A phoneme is a basic unit

of speech and a phoneme model is a succinct representation of the signal that

corresponds to a phoneme, usually embedded in an utterance.) A speech signal

inherently has substantial variations along many dimensions.
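The analysis in step 2 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the method used by any particular recognizer: the 25 ms frame length and 10 ms hop are simply values picked from the ranges quoted above, the Hamming window and the naive DFT are our own choices, and the magnitude spectrum itself is used as the feature vector.

```python
import cmath
import math


def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Cut the signal into overlapping short-time windows."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Short-time magnitude spectrum of one frame (naive DFT, Hamming window)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def speech_pattern(samples, sample_rate):
    """Temporal sequence of feature vectors, one per short-time window."""
    return [magnitude_spectrum(frame)
            for frame in frame_signal(samples, sample_rate)]
```

For a signal sampled at 8000 Hz, each frame holds 200 samples and consecutive frames start 80 samples apart, so neighbouring frames overlap by 15 ms.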

Before we understand the design of the project let us first understand speech recognition

types and styles. Speech recognition is classified into two categories, speaker dependent

and speaker independent.

Speaker dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback to this approach is that the system responds accurately only to the individual who trained it. This is the most common approach employed in software for personal computers.

A speaker independent system is trained to respond to a word regardless of who speaks. Therefore the system must respond to a large variety of speech patterns, inflections and enunciations of the target word. The command word count is usually lower than for speaker dependent systems; however, high accuracy can still be maintained within processing limits. Industrial requirements more often call for speaker independent voice systems, such as the AT&T system used in the telephone systems.

A more general form of voice recognition is available through feature analysis and this technique usually leads to "speaker-independent" voice recognition. Instead of trying to find an exact or near-exact match between the actual voice input and a previously stored

voice template, this method first processes the voice input using "Fourier transforms" or

"linear predictive coding (LPC)", then attempts to find characteristic similarities between

the expected inputs and the actual digitized voice input. These similarities will be present

for a wide range of speakers, and so the system need not be trained by each new user. The

types of speech differences that the speaker-independent method can deal with, but which

pattern matching would fail to handle, include accents, and varying speed of delivery,

pitch, volume, and inflection. Speaker-independent speech recognition has proven to be

very difficult, with some of the greatest hurdles being the variety of accents and

inflections used by speakers of different nationalities. Recognition accuracy for speaker-

independent systems is somewhat less than for speaker-dependent systems, usually

between 90 and 95 percent. An advantage of speaker independent systems is that they do not require the user to train the system, but they perform with lower accuracy. These systems find applications in telephony, such as dictating a number or a word, where many different people use the same system. However, speaker independent systems need a well-prepared training database.

Recognition Style

Speech recognition systems have another constraint concerning the style of speech they can recognize. There are three styles of speech: isolated, connected and continuous.

Isolated speech recognition systems can only handle words that are spoken separately. These are the most common speech recognition systems available today. The user must pause between each word or command spoken. The speech recognition circuit is set up to identify isolated words of 0.96 second length.

Connected speech recognition is a halfway point between isolated word and continuous speech recognition. It allows users to speak multiple words. The HM2007 can be set up to identify words or phrases 1.92 seconds in length. This reduces the word recognition vocabulary to 20.

Continuous speech is the natural conversational speech we are used to in everyday life. It is extremely difficult for a recognizer to sift through the speech as the words tend to merge together. For instance, "Hi, how are you doing?" sounds like "Hi, howyadoin". Continuous speech recognition systems are on the market and are under continual development.

Chapter4. APPROACHES TO STATISTICAL SPEECH RECOGNITION

a. Hidden Markov model (HMM)-based speech recognition

Modern general-purpose speech recognition systems are generally based on hidden

Markov models (HMMs). This is a statistical model which outputs a sequence of symbols

or quantities.

One possible reason why HMMs are used in speech recognition is that a speech signal can be viewed as a piece-wise stationary signal or a short-time stationary signal. That is, over a short time in the range of 10 milliseconds, speech can be approximated as a stationary process. Speech can thus be thought of as a Markov model for many stochastic processes (known as states).

Another reason why HMMs are popular is because they can be trained automatically and

are simple and computationally feasible to use. In speech recognition, to give the very

simplest setup possible, the hidden Markov model would output a sequence of n-

dimensional real-valued vectors with n around, say, 13, outputting one of these every 10

milliseconds. The vectors, again in the very simplest case, would consist of cepstral

coefficients, which are obtained by taking a Fourier transform of a short-time window of

speech and de-correlating the spectrum using a cosine transform, then taking the first

(most significant) coefficients. The hidden Markov model will tend to have, in each state,

a statistical distribution called a mixture of diagonal covariance Gaussians which will

give likelihood for each observed vector. Each word, or (for more general speech

recognition systems), each phoneme, will have a different output distribution; a hidden

Markov model for a sequence of words or phonemes is made by concatenating the

individual trained hidden Markov models for the separate words and phonemes.
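The cepstral coefficients mentioned above can be sketched as follows. This is a simplified illustration, not a production front end: it takes the logarithm of the magnitude spectrum before the cosine transform (the usual step in cepstral analysis, although the description above does not spell it out), uses a naive DFT and an unnormalized DCT-II, and the function name and the small floor value are our own assumptions.

```python
import cmath
import math


def cepstral_coefficients(frame, num_coeffs=13):
    """First `num_coeffs` cepstral coefficients of one short-time window."""
    n = len(frame)
    half = n // 2 + 1
    # Fourier transform of the window (naive DFT, magnitudes only).
    magnitudes = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                          for i, x in enumerate(frame)))
                  for k in range(half)]
    # Log spectrum; the floor avoids log(0) on silent bins (assumed value).
    log_spectrum = [math.log(max(m, 1e-12)) for m in magnitudes]
    # De-correlate with a cosine transform (DCT-II) and keep the first terms.
    return [sum(v * math.cos(math.pi * q * (j + 0.5) / half)
                for j, v in enumerate(log_spectrum))
            for q in range(num_coeffs)]
```

Calling this once per 10 ms step yields the sequence of 13-dimensional vectors that the hidden Markov model is described as modelling.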

The above is a very brief introduction to some of the more central aspects of speech recognition. Modern speech recognition systems use a host of standard techniques that would take too long to explain properly, but to give a flavor, a typical large-vocabulary continuous system would probably have the following parts. It would

need context dependency for the phones (so phones with different left and right context

have different realizations); to handle unseen contexts it would need tree clustering of the

contexts; it would of course use cepstral normalization to normalize for different

recording conditions and depending on the length of time that the system had to adapt on

different speakers and conditions it might use cepstral mean and variance normalization

for channel differences, vocal tract length normalization (VTLN) for male-female

normalization and maximum likelihood linear regression (MLLR) for more general

speaker adaptation. The features would have delta and delta-delta coefficients to capture

speech dynamics and in addition might use heteroscedastic linear discriminant analysis

(HLDA); or might skip the delta and delta-delta coefficients and use LDA followed

perhaps by heteroscedastic linear discriminant analysis or a global semi tied covariance

transform (also known as maximum likelihood linear transform (MLLT)). A serious

company with a large amount of training data would probably want to consider

discriminative training techniques like maximum mutual information (MMI), MPE, or

(for short utterances) MCE, and if a large amount of speaker-specific enrollment data was

available a more wholesale speaker adaptation could be done using MAP or, at least, tree-based maximum likelihood linear regression. Decoding of the speech (the term for what

happens when the system is presented with a new utterance and must compute the most

likely source sentence) would probably use the Viterbi algorithm to find the best path, but there is a choice between dynamically creating a combination hidden Markov model which includes both the acoustic and language model information, or combining it statically beforehand (the AT&T approach, for which their FSM toolkit might be useful). Those who value their sanity might consider the AT&T approach, but be warned that it is memory hungry.
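To make the decoding step concrete, here is a minimal sketch of the Viterbi algorithm on a discrete toy model. The two states `A` and `B`, the observation symbols `x`, `y`, `z` and all probabilities are invented purely for illustration; a real recognizer would work with Gaussian mixture likelihoods over feature vectors and usually with log probabilities.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        previous = best[-1]
        column = {}
        for s in states:
            prob, origin = max((previous[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], origin)
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][state][0]
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path, probability


# Toy model (all values hypothetical).
STATES = ["A", "B"]
START_P = {"A": 0.6, "B": 0.4}
TRANS_P = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT_P = {"A": {"x": 0.5, "y": 0.4, "z": 0.1},
          "B": {"x": 0.1, "y": 0.3, "z": 0.6}}

best_path, best_prob = viterbi(["x", "y", "z"], STATES, START_P, TRANS_P, EMIT_P)
# best_path is ['A', 'A', 'B']
```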

b. Neural network-based speech recognition

Another approach in acoustic modeling is the use of neural networks. They are capable of

solving much more complicated recognition tasks, but do not scale as well as HMMs

when it comes to large vocabularies. Rather than being used in general-purpose speech

recognition applications they can handle low quality, noisy data and speaker

independence. Such systems can achieve greater accuracy than HMM based systems, as

long as there is training data and the vocabulary is limited. A more general approach

using neural networks is phoneme recognition. This is an active field of research, but

generally the results are better than for HMMs. There are also NN-HMM hybrid systems

that use the neural network part for phoneme recognition and the hidden Markov model

part for language modeling.

c. Dynamic time warping (DTW)-based speech recognition

Dynamic time warping is an algorithm for measuring similarity between two sequences

which may vary in time or speed. For instance, similarities in walking patterns would be

detected, even if in one video the person was walking slowly and if in another they were

walking more quickly, or even if there were accelerations and decelerations during the

course of one observation. DTW has been applied to video, audio, and graphics -- indeed,

any data which can be turned into a linear representation can be analyzed with DTW.

A well known application has been automatic speech recognition, to cope with different

speaking speeds. In general, it is a method that allows a computer to find an optimal

match between two given sequences (e.g. time series) with certain restrictions, i.e. the

sequences are "warped" non-linearly to match each other. This sequence alignment

method is often used in the context of hidden Markov models.
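A minimal sketch of dynamic time warping is given below. It assumes one-dimensional sequences and an absolute-difference local cost, which are simplifications chosen for illustration; a recognizer would compare sequences of feature vectors with a vector distance.

```python
def dtw_distance(a, b):
    """Cost of the best non-linear alignment between sequences a and b."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[rows][cols]
```

A sequence and a slowed-down copy of it, for example `[1, 2, 3]` and `[1, 1, 2, 2, 3, 3]`, align with zero cost, which is exactly the tolerance to speaking speed described above.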

Chapter5. NATURE OF PROBLEM

Speech recognition is the process of finding an interpretation of a spoken utterance; typically, this means finding the sequence of words that were spoken.

This involves preprocessing the acoustic signal to parameterize it in a more usable and useful form. The input signal must be matched against a stored pattern, and the system then decides whether to accept or reject the match. No two utterances of the same word or sentence are likely to give rise to the same digital signal. This obvious point not only underlies the difficulty in speech recognition but also means that we may be able to extract more than just a sequence of words from the signal.

The different types of problems we are going to face in our project have been

enumerated below: -

DIFFERENCES IN THE VOICES OF DIFFERENT PEOPLE:-

The voice of a man differs from the voice of a woman, which in turn differs from the voice of a baby. Different speakers have different vocal tracts and source physiology. Electrically speaking, the difference is in frequency. Women and babies tend to speak at higher frequencies than men.

DIFFERENCES IN THE LOUDNESS OF SPOKEN WORDS:-

No two persons speak with the same loudness. One person may habitually speak loudly while another speaks softly. Even when the same person speaks the same word at two different instants, there is no guarantee that he will speak it with the same loudness. The loudness picked up also depends on the distance at which the microphone is held from the user's mouth.

Electrically speaking, this difference is reflected in the amplitude of the generated digital signal.

DIFFERENCE IN THE TIME:-

Even if the same person speaks the same word at two different instants of time, there is no guarantee that he will speak it in exactly the same way on both occasions. Electrically speaking, this is a difference in timing, which indirectly shows up as a difference in frequency.

OSCILLOGRAM (WAVEFORM)

Physically the speech signal (actually all sound) is a series of pressure changes in the

medium between the sound source and the listener. The most common representation of

the speech signal is the oscillogram, often called the waveform. In this the time axis is the

horizontal axis from left to right and the curve shows how the pressure increases and

decreases in the signal. The utterance we have used for demonstration is "phonetician". The signal has also been segmented, such that each phoneme in the transcription has been aligned with its corresponding sound event.

Figure Oscillogram of the utterance "phonetician".

SPECTROGRAM

In the spectrogram the time axis is the horizontal axis, and frequency is the vertical axis. The third dimension, amplitude, is represented by shades of darkness. Consider the spectrogram to be a number of spectra in a row, looked upon "from above", where the peaks in the spectra are represented by dark spots in the spectrogram.

From the picture it is obvious how different the speech sounds are from a spectral point of view.

Now, let's look at the spectrograms of the vowel /i:/ in "three" and "tea".

Figure Example of vowel /i:/ in different phonetic contexts.

PROBLEMS DUE TO NOISE:-

A machine faces many problems when trying to imitate this human ability. The audio range of frequencies extends from 20 Hz to 20 kHz. Some external noises have frequencies within this audio range. These noises pose a problem since they cannot be filtered out without also affecting the speech signal.

DIFFERENCES IN THE PROPERTIES OF MICROPHONES: -

There may be problems due to differences in the electrical properties of different microphones and transmission channels.

DIFFERENCES IN THE PITCH:-

Pitch and other source features such as breathiness and amplitude can be varied

independently.

OTHER PROBLEMS:-

• We have to make sure that the robot does not go out of reach of our voice.

• Output of microphone is very small.

• Output of Voice recognition chip is not compatible with input required at motors.

Chapter6. SOLUTION TO PROBLEMS

After analyzing the problems, we came up with the solutions listed below.

1. Amplitude Variation:-

Amplitude variation of the electrical signal output of the microphone may occur mainly due to:

a) Variation of the distance between the sound source and the transducer.

b) Variation of the strength of the sound generated by the source.

To recognize a spoken word, it does not matter whether it has been spoken loudly or softly. This is because the characteristic features of a spoken word lie in its frequency content and not in its loudness (amplitude). Thus, at a certain stage this amplitude information is suitably normalized.
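One simple way to normalize amplitude is to scale every utterance so that its largest sample has magnitude 1. The sketch below assumes this peak normalization purely for illustration; it is not a description of what happens inside the recognition chip, and the function name is our own.

```python
def normalize_amplitude(samples):
    """Scale a digitized utterance so that its peak magnitude becomes 1.0."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)        # silence: nothing to scale
    return [x / peak for x in samples]
```

After this step, a word spoken softly and the same word spoken loudly (or closer to the microphone) produce sample sequences of the same scale, so only their shape is left to compare.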

2. Recognition of a word: -

If the same word is spoken twice at different instants, the two utterances sound similar to us; the question arises, what is the similarity between them? It is important to note that it does not matter whether one utterance was louder than the other. The similarity lies in frequency content. Hence, any large frequency variation would cause the system not to recognize the word. In a speaker independent type of system, some logic can be implemented to take care of frequency variation. A small frequency variation, i.e. a variation of features within tolerable limits, is considered to be acceptable.

3. Noise:-

Along with the speech itself, other stray sounds are also picked up by the microphone, degrading the information contained in the signal.

4. Microphone response: -

Two different microphones may not have the same response. Hence, if the microphone is changed or the system is installed on a new PC, the success rate of recognition may drop because of the different response.

5. So that the robot recognizes our voice at a distance, we will use a wireless mic. In case the robot does not recognize any word, we will make an arrangement such that the robot automatically stops after some time.

6. We will use a microphone pre-amplifier circuit. It is built into the HM2007.

7. We use decoding logic and motor driving circuits so that the chip and the motors are made compatible, thereby solving the compatibility problem.

8. One of the important problems that needed to be solved was providing sufficient current and voltage to the entire assembly when interfaced together. Since the current drawn from the supply was so high that a 9V battery could not last for long, we used a current buffer IC. In our application we have used the 74LS245.

Chapter7. DESIGN APPROACH

The most challenging part of the entire system is designing the various stages and interfacing them together. Our approach was to digitize the analog voice signal and store the frequency and pitch characteristics of the trained words in a memory. These stored words are then matched against the words spoken. When a match is found, the system outputs the address of the stored word. We therefore have to decode the address, and the car performs the task that corresponds to the address sensed. Since we wanted the car to be wireless, we used an RF module. The address was decoded using a microcontroller and then applied to the RF module. This, together with the driver circuit at the receiver's end, made up the complete intelligent system.

It must be noted that we did not use a wireless mic; instead we used an analog RF module which transmitted 5 different frequencies, one each for right, left, forward, backward and crane movement.
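The address decoding described above amounts to a table lookup. The sketch below is written in Python for readability and is not the microcontroller firmware: the assignment of addresses 1 to 7 to the seven commands, the channel names, and the decision to route both "load" and "release" to the single crane channel are all assumptions made for illustration.

```python
# Hypothetical mapping from the address of the matched word to a command.
COMMANDS = {
    1: "forward",
    2: "back",
    3: "right",
    4: "left",
    5: "load",
    6: "release",
    7: "stop",
}

# Hypothetical mapping from command to one of the five RF channels.
# "stop" is assumed to need no transmission (None).
RF_CHANNEL = {
    "forward": "forward",
    "back": "backward",
    "right": "right",
    "left": "left",
    "load": "crane",
    "release": "crane",
    "stop": None,
}


def decode(address):
    """Return (command, rf_channel) for a recognized address, or None if unknown."""
    command = COMMANDS.get(address)
    if command is None:
        return None             # address does not belong to a trained command
    return command, RF_CHANNEL[command]
```

For example, `decode(3)` selects the "right" command and its RF channel, while an address outside the table is ignored.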

SYSTEM DESIGN

a. Voice Recognition Module

b. Microcontroller and Decoder

c. RF module

d. Motor Driver Circuit

e. Buffer

Block Diagram:

Voice Recognition Module

The speech recognition module basically consists of:

Voice Recognition Chip: It is the heart of the entire system. The HM2007 is a voice recognition chip with an on-chip analog front end, voice analysis, recognition process and system control functions. The input voice command is analyzed, processed and recognized, and the result is obtained at one of the chip's output ports; it is then decoded, amplified and given to the motors of the robot car.

We initially used an Indian-manufactured voice recognition chip, the AP7003. It is a monolithic speaker-dependent speech recognition IC designed for toy applications. The AP7003 consists of a microphone amplifier, an A/D converter, a speech processor and an I/O controller. After pre-recording, the AP7003 can recognize up to 12 different sentences, each up to 1.5 sec long, with highly programmable I/O. However, it was not very accurate or reliable, so we started looking for an alternative. We found the HM2007 to be the right choice.

The HM2007 lets the user choose between a 0.96 second word length (40 word vocabulary) and a 1.92 second word length (20 word vocabulary). For memory the circuit uses an 8K X 8 static RAM.
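The two vocabulary options are two ways of dividing the same total amount of stored speech, as the short calculation below shows. The bytes-per-word figure assumes, purely for illustration, that the whole 8K X 8 RAM is shared equally among the word templates; the text does not state how the chip actually lays out its memory.

```python
# (number of words, word length in hundredths of a second) for each option
MODES = {
    "short": (40, 96),     # forty 0.96 second words
    "long": (20, 192),     # twenty 1.92 second words
}

RAM_BYTES = 8 * 1024       # 8K X 8 static RAM


def total_speech_centiseconds(mode):
    """Total trained speech a mode can hold, in hundredths of a second."""
    words, length = MODES[mode]
    return words * length


def bytes_per_word(mode):
    """RAM available per word, assuming an equal split (an assumption)."""
    words, _ = MODES[mode]
    return RAM_BYTES / words
```

Both options come to 3840 hundredths of a second, i.e. 38.4 seconds of trained speech; doubling the word length simply halves the number of words.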

The chip has two operational modes; manual mode and CPU mode. The CPU

mode is designed to allow the chip to work under a host computer. This is an attractive

approach to speech recognition for computers because the speech recognition chip

operates as a co-processor to the main CPU. The jobs of listening and recognition don’t occupy any of the computer's CPU time. When the HM2007 recognizes a command it

can signal an interrupt to the host CPU and then relay the command code. The HM2007

chip can be cascaded to provide a larger word recognition library.

The circuit we are building operates in the manual mode. The manual mode allows one

to build a stand alone speech recognition board that doesn't require a host computer and

may be integrated into other devices to utilize speech control.

The major components of this design are: a speech recognition chip, memory,

keypad, and LED 7-segment display. The chip is designed for speaker dependent (one-

user) applications, but can be manipulated to perform speaker independent (multiple-

users) applications. The keypad and LED 7-segment display will be used to program and

test the voice recognition circuit.

________________________________________________________________________

_____________________________________________________________________ 21

More about the HM2007 chip

The HM2007 is a single-chip complementary metal-oxide semiconductor (CMOS) voice-recognition large-scale integration (LSI) circuit. The chip contains an analog front end, voice analysis, recognition, and system control functions. The chip may be used stand-alone or connected to a CPU.

Features

• Single-chip voice-recognition CMOS LSI

• Speaker-dependent

• External RAM support

• Maximum of 40-word recognition

• Maximum word length of 1.92 s

• Microphone support

• Manual and CPU modes available

• Response time less than 300 milliseconds (ms)

• 5 volt (5V) power supply

The system we are building is trained as speaker dependent (single user). Thus the user will be its real master.

Microphone: It picks up the spoken commands and sends them to the voice recognition chip (HM2007) in the form of an electrical signal.

The human ear has an auditory range of roughly 20 Hz to 20,000 Hz. Sound can be picked up easily using a microphone and amplifier. Microphones typically have a frequency range that surpasses that of human hearing.

Microphones are transducers which detect sound signals and produce an electrical image

of the sound, i.e., they produce a voltage or a current which is proportional to the sound

signal. The most common microphones for musical use are dynamic, ribbon, or

condenser microphones. Besides the variety of basic mechanisms, microphones can be

designed with different directional patterns and different impedances.

________________________________________________________________________

_____________________________________________________________________ 22

Dynamic Microphones

Principle: sound moves the cone, and the attached coil of wire moves in the field of a magnet. The generator effect produces a voltage which "images" the sound pressure variation, so this is characterized as a pressure microphone.

Advantages:

• Relatively cheap and rugged.

• Can be easily miniaturized.

Disadvantages:

• The uniformity of response to different frequencies does not match that of the ribbon or condenser microphones.

Ribbon Microphones

Principle: the air movement associated with the sound moves the metallic ribbon in the magnetic field, generating an imaging voltage between the ends of the ribbon which is proportional to the velocity of the ribbon, so this is characterized as a "velocity" microphone.

Advantages:

• Adds "warmth" to the tone by accenting lows when close-miked.

• Can be used to discriminate against distant low frequency noise in its most common gradient form.

Disadvantages:

• Accenting lows sometimes produces "boomy" bass.

• Very susceptible to wind noise. Not suitable for outside use unless very well shielded.

________________________________________________________________________

_____________________________________________________________________ 23

Condenser Microphones

Principle: sound pressure changes the spacing between a thin metallic membrane and the stationary back plate. The plates are charged to a total charge Q = CV, where C is the capacitance and V the voltage of the biasing battery. A change in plate spacing will cause a change in charge Q and force a current through resistance R. This current "images" the sound pressure, making this a "pressure" microphone.

Advantages:

• Best overall frequency response makes this the microphone of choice for many recording applications.

Disadvantages:

• Expensive.

• May pop and crack when close-miked.

• Requires a battery or external power supply to bias the plates.
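The relation Q = CV can be made concrete with the parallel-plate formula C = ε0·A/d. The sketch below is only an illustration: the plate area, gap and bias voltage are hypothetical values chosen for the example, not measurements of any real capsule, and an air gap is assumed.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, F/m


def capacitance(area_m2, spacing_m):
    """Parallel-plate capacitance C = epsilon_0 * A / d (air gap assumed)."""
    return EPSILON_0 * area_m2 / spacing_m


def charge(area_m2, spacing_m, bias_v):
    """Charge on the plates, Q = C * V."""
    return capacitance(area_m2, spacing_m) * bias_v


# Hypothetical capsule: 1 cm^2 plates, 25 micrometre gap, 48 V bias.
AREA, GAP, BIAS = 1e-4, 25e-6, 48.0
rest = charge(AREA, GAP, BIAS)
pressed = charge(AREA, GAP * 0.99, BIAS)  # membrane pushed 1 % closer
print(f"Q at rest: {rest:.3e} C, change for a 1 % smaller gap: {pressed - rest:.3e} C")
```

A smaller gap gives a larger capacitance and therefore a larger charge at the same bias voltage; the charge that flows to make up the difference is the current through R described above.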

Pop filters in front of mics.

Some microphones are very sensitive to minor gusts of wind, so sensitive in fact that they will produce a loud pop if you breathe on them. To protect these mics (some of which can actually be damaged by blowing into them), engineers will often mount a nylon screen between the mic and the artist. This is not the most common reason for using pop filters, though:

Vocalists like to move around when they sing; in particular, they will lean into

microphones. If the singer is very close to the mic, any motion will produce drastic

changes in level and sound quality. (You have seen this with inexpert entertainers using

hand held mics.) Many engineers use pop filters to keep the artist at the proper distance.

The performer may move slightly in relation to the screen, but that is a small proportion

of the distance to the microphone.

________________________________________________________________________

_____________________________________________________________________ 24

Keypad: It is used for training/programming the chip. It also assigns definite memory locations to voice commands. The keypad is made up of 12 switches.

Figure 2

When the circuit is turned on, the HM2007 checks the static RAM. If everything checks out, the board displays "00" on the digital display and lights the red LED (READY). It is then in the "Ready" state, waiting for a command.

________________________________________________________________________

_____________________________________________________________________ 25

7-segment Display: It is used to test the voice recognition circuit.

The 7-segment display is used as a numerical indicator on many types of test equipment. It is an assembly of light emitting diodes which can be powered individually. They most commonly emit red light.

Powering all the segments will display the number 8. Powering segments a, b, c, d and g will display the number 3. Numbers 0 to 9 can be displayed. The d.p. represents a decimal point.
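The segment combinations can be written down as a small lookup table. The Python sketch below is an illustration only: it uses the conventional segment patterns (individual decoder ICs may draw some digits, such as 6 and 9, slightly differently), and the bit order and the names DIGIT_SEGMENTS, SEGMENT_ORDER and segment_byte are assumptions made for this example.

```python
# Conventional segment patterns (an assumption: individual decoder ICs may
# draw some digits, such as 6 and 9, slightly differently).
DIGIT_SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}
SEGMENT_ORDER = "abcdefg"  # assumed bit order: bit 0 = a ... bit 6 = g


def segment_byte(digit, common_anode=False):
    """Return a 7-bit drive pattern; a set bit means the pin is driven high."""
    pattern = 0
    for bit, name in enumerate(SEGMENT_ORDER):
        if name in DIGIT_SEGMENTS[digit]:
            pattern |= 1 << bit
    # Common anode: a segment lights when its cathode pin is pulled low,
    # so the pattern is inverted.
    return pattern ^ 0x7F if common_anode else pattern


print(hex(segment_byte(3)), hex(segment_byte(3, common_anode=True)))
```

For the digit 3 the set bits correspond to segments a, b, c, d and g, matching the description above.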

The one shown is a common anode display since all anodes are joined together and go to

the positive supply.

The cathodes are connected individually to zero volts.

Resistors must be placed in series with each diode to limit the current through each diode

to a safe value.

Common cathode displays where all the cathodes are joined are also available.
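The value of the series resistor follows from Ohm's law, R = (Vsupply - Vf) / I. The sketch below is a simplified estimate: the 5 V supply and the 220-ohm resistor packs are taken from this report, while the 2.0 V forward drop is an assumed typical value for a red LED and the voltage drop inside the driver IC is ignored. The function names are hypothetical.

```python
def led_current_ma(supply_v, forward_v, resistor_ohm):
    """Segment current in mA for a given series resistor."""
    return (supply_v - forward_v) / resistor_ohm * 1000.0


def resistor_for_current(supply_v, forward_v, current_ma):
    """Series resistance in ohms needed for a target segment current."""
    return (supply_v - forward_v) / (current_ma / 1000.0)


# 5 V supply and 220 ohm from the report; 2.0 V forward drop is assumed.
print(round(led_current_ma(5.0, 2.0, 220.0), 1), "mA per segment")
print(round(resistor_for_current(5.0, 2.0, 10.0)), "ohm for 10 mA")
```

With these assumptions each lit segment draws roughly 14 mA, so a digit showing 8 draws about seven times that.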

________________________________________________________________________

_____________________________________________________________________ 26

Applications and Drivers

A numeral to be displayed on a seven-segment display is usually encoded in BCD form, and a logic circuit switches the proper segments of the display ON or OFF. This logic circuit is called a decoder. Various decoders are available to drive common anode and common cathode displays. Two easily available ones are the 7447 and 7448 TTL decoders. The 7447 has active-low open-collector outputs that pull down the segments of a common anode display, while the 7448 has active-high outputs that drive the segments of a common cathode display; in both cases external current-limiting resistors are used.

We used the 7448 decoder chip driving a common cathode seven-segment display.

Circuit Diagram of voice recognition module:

8K x 8 RAM: It stores the voice patterns of the trained commands, written by the chip at their assigned locations.

________________________________________________________________________

_____________________________________________________________________ 27

Output of Voice recognition module

The 8-bit output is taken from the 74LS373 octal data latch. The output is not a standard 8-bit binary byte; it is split into two 4-bit binary coded decimal (BCD) nibbles. BCD code is related to standard binary numbers as the table below illustrates.

As you can see, the binary and BCD numbers remain the same until reaching decimal 10. At decimal 10, BCD carries into the upper nibble and the lower nibble resets to zero. The binary numbers continue to decimal 15, and only carry into the upper nibble at 16, where the lower nibble resets. If a computer expects an 8-bit binary number and is given BCD, errors will result. Further, the module outputs 55, 66 and 77 as default values for errors, and these outputs must not be treated as commands. For both reasons we use a microcontroller.
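The decoding step can be sketched in a few lines. The Python below is an illustration of the idea only, not the firmware used in the project; the function name decode_latch_byte and the decision to accept only word numbers 1 to 40 are assumptions made for this example, while the error values 55, 66 and 77 are the ones the module reports.

```python
# Error values reported by the module, as two BCD nibbles.
ERROR_CODES = {0x55: "word too long", 0x66: "word too short", 0x77: "word no match"}


def decode_latch_byte(value):
    """Return the recognized word number (1..40), or None if not a valid word."""
    if value in ERROR_CODES:
        return None
    tens, units = (value >> 4) & 0x0F, value & 0x0F
    if tens > 9 or units > 9:  # not a valid BCD digit
        return None
    number = tens * 10 + units
    return number if 1 <= number <= 40 else None


print(decode_latch_byte(0x25))  # latch shows BCD "25"
print(int(0x25))                # the same byte read as plain binary
```

The last two lines show why the conversion matters: the byte that the display shows as 25 would be read as 37 if it were treated as an ordinary binary number.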

________________________________________________________________________

_____________________________________________________________________ 28

Microcontroller and driver circuit

Decoder: It is the second most important part of the project. The output from the chip is given to the decoder (microcontroller), which acts as a DMC, i.e. a Digital Motor Controller. The DMC reads the output port of the HM2007 module and produces the proper output for the commands forward, backward, left, right, load, release and stop. The proper functioning of the system depends on correct decoding logic.

We use port 0 as the input port and port 1 as the output port. P0.0 to P0.6 receive the 7 output pins of the voice recognition module, while P0.7 is kept grounded.

________________________________________________________________________

_____________________________________________________________________ 29

Microcontroller circuit:

________________________________________________________________________

_____________________________________________________________________ 30

The tables show the output codes generated for the different commands after programming the microcontroller.

Commands   P0.7  P0.6  P0.5  P0.4  P0.3  P0.2  P0.1  P0.0  Code
Stop         1     1     1     1     1     1     1     1    FF
Right        1     1     1     1     0     1     1     1    F7
Left         1     1     1     0     1     1     1     1    EF
Backward     1     1     0     1     1     1     1     1    DF
Forward      1     0     1     1     1     1     1     1    BF
Crane        0     1     1     1     1     1     1     1    7F

(For the wireless car, this is the input to the RF module and then to the motors through the driver circuit.)

Commands   P0.0  P0.1  P0.2  P0.3  P0.4  P0.5  P0.6  P0.7  Code
Stop         0     0     0     0     0     0     0     0    00
Right        0     0     0     0     0     1     0     0    04
Left         0     0     0     0     0     0     0     1    01
Backward     0     0     0     0     1     0     1     0    0A
Forward      0     0     0     0     0     1     0     1    05
Crane        0     1     1     1     1     1     1     0    FC

(For the wired car, this is the input directly to the driver circuit.)
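The codes for the wireless car follow a simple pattern: all lines idle high and exactly one line is pulled low per command. The Python sketch below reproduces those codes from the bit positions in the first table. It is an illustration only; the names COMMAND_BIT and wireless_code are made up for this example and do not come from the project firmware.

```python
# Bit pulled low for each command, read off the wireless-car table.
COMMAND_BIT = {"right": 3, "left": 4, "backward": 5, "forward": 6, "crane": 7}


def wireless_code(command):
    """Return the 8-bit port value: all ones, with one bit cleared per command."""
    if command == "stop":
        return 0xFF  # no line pulled low
    return 0xFF & ~(1 << COMMAND_BIT[command])


for name in ("stop", "right", "left", "backward", "forward", "crane"):
    print(name, format(wireless_code(name), "02X"))
```

Keeping the bit positions in one table makes it easy to check the generated codes against the report's table row by row.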

________________________________________________________________________

_____________________________________________________________________ 31

Keil 2 µVision

• It is software that lets us write the program in C or another simple high-level language, as convenient for the user, and then converts it into hex code. This makes programming simpler, since there is no need to look up opcodes for instructions.

________________________________________________________________________

_____________________________________________________________________ 32

Aec_isp_v3 µC Programmer

• It is used to program 89S51, 89S52, 89S53.

• It reads hex files and programs them into the microcontroller.

• Running the software: Your code needs to be in Intel Hex format. AEC_ISP will open the file you specify and load it into a buffer. You can specify a default file on the command line; e.g., to specify TEST.HEX as the default file, start by typing 'AEC_ISP TEST.HEX'.

________________________________________________________________________

_____________________________________________________________________ 33

RF module:

Let's take a closer look at the RC truck we saw in the first chapter. We will assume that the exact frequency used is 27.9 MHz. Here's the sequence of events that takes place when you use the RC transmitter:

• You press a trigger to make the truck go forward.

• The trigger causes a pair of electrical contacts to touch, completing a circuit

connected to a specific pin of an integrated circuit (IC).

• The completed circuit causes the transmitter to transmit a set sequence of

electrical pulses.

Each sequence contains a short group of synchronization pulses, followed by the pulse

sequence. For our truck, the synchronization segment -- which alerts the receiver to

incoming information -- is four pulses that are 2.1 milliseconds (thousandths of a second)

long, with 700-microsecond (millionths of a second) intervals. The pulse segment, which

tells the antenna what the new information is, uses 700-microsecond pulses with 700-

microsecond intervals.

________________________________________________________________________

_____________________________________________________________________ 34

A typical RC signal transmission

Here are the pulse sequences used in the pulse segment:

1. Forward: 16 pulses

2. Backward: 40 pulses

3. Forward/Left: 28 pulses

4. Forward/Right: 34 pulses

5. U-turn: 52 pulses

6. Crane movement: 46 pulses
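The frame format described above can be modelled as a list of (level, duration) pairs. The Python sketch below is an illustration under stated assumptions: every pulse, including the last one, is taken to be followed by one 700-microsecond interval, and the names build_frame, decode_frame and frame_duration_us are invented for this example. The pulse counts and timings are the ones listed in the text.

```python
SYNC_PULSES = 4       # synchronization pulses per frame
SYNC_PULSE_US = 2100  # 2.1 ms each
DATA_PULSE_US = 700   # pulses in the pulse segment
GAP_US = 700          # interval after every pulse (assumed for the last one too)

PULSE_COUNTS = {"forward": 16, "backward": 40, "forward_left": 28,
                "forward_right": 34, "u_turn": 52, "crane": 46}


def build_frame(command):
    """Return the frame as a list of (level, duration_us) pairs."""
    frame = []
    for _ in range(SYNC_PULSES):
        frame += [(1, SYNC_PULSE_US), (0, GAP_US)]
    for _ in range(PULSE_COUNTS[command]):
        frame += [(1, DATA_PULSE_US), (0, GAP_US)]
    return frame


def decode_frame(frame):
    """Count the short pulses and look the count up, as the receiver IC would."""
    count = sum(1 for level, dur in frame if level == 1 and dur == DATA_PULSE_US)
    for name, pulses in PULSE_COUNTS.items():
        if pulses == count:
            return name
    return None


def frame_duration_us(frame):
    return sum(duration for _, duration in frame)


forward = build_frame("forward")
print(decode_frame(forward), frame_duration_us(forward), "us")
```

Under these assumptions a forward frame lasts 4 x 2800 + 16 x 1400 = 33,600 microseconds, so the truck can receive a fresh command many times per second.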

The transmitter sends bursts of radio waves that oscillate with a frequency of 27,900,000

cycles per second (27.9 MHz).

The truck is constantly monitoring the assigned frequency (27.9 MHz) for a signal. When

the receiver receives the radio bursts from the transmitter, it sends the signal to a filter

that blocks out any signals picked up by the antenna other than 27.9 MHz. The remaining

signal is converted back into an electrical pulse sequence.

The pulse sequence is sent to the IC in the truck, which decodes the sequence and starts

the appropriate motor. For our example, the pulse sequence is 16 pulses (forward), which

means that the IC sends positive current to the motor running the wheels. If the next pulse

sequence were 40 pulses (reverse), the IC would invert the current to the same motor to

make it spin in the opposite direction.

The motor's shaft actually has a gear on the end of it, instead of connecting directly to the

axle. This decreases the motor's speed but increases the torque, giving the truck adequate

power through the use of a small electric motor!

The truck moves forward.

________________________________________________________________________

_____________________________________________________________________ 35

Buffer: We used the 74LS245 as the buffer IC. It solved the current supply problem. It is a 3-state octal bus transceiver designed for asynchronous two-way communication between data buses. The device passes data from the A bus to the B bus or vice versa, depending on the logic level at the direction control (DIR) input. The active-low enable input (G) can be used to disable the device so that the buses are effectively isolated.
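The behaviour of the transceiver can be summarized as a small truth-table model. The Python sketch below is an illustration only; the function name transceiver_245 and the use of None to stand for a high-impedance (undriven) bus are conventions chosen for this example.

```python
def transceiver_245(a_bus, b_bus, direction, enable_n):
    """Model one 74LS245: return (value driven onto A, value driven onto B).

    None means the device is not driving that bus (high impedance).
    """
    if enable_n:            # enable input high: both buses isolated
        return None, None
    if direction:           # DIR high: data flows from A to B
        return None, a_bus & 0xFF
    return b_bus & 0xFF, None  # DIR low: data flows from B to A


# Microcontroller code 0xBF on the A side, passed through to the B side.
print(transceiver_245(0xBF, 0x00, direction=1, enable_n=0))
```

In a one-way buffer application such as this one, DIR can simply be tied to a fixed level so the device always drives in the same direction.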

Batteries

Batteries are by far the most commonly used electric power supply for robotics. Batteries

are so commonplace that it’s easy to take them for granted. An understanding of batteries

will help you choose batteries that will optimize your robot’s design.

Primary batteries

Primary batteries are one-time-use batteries. The batteries we will look at in this class

deliver 1.5 V per cell. They are designed to deliver their rated electrical capacity and then

be discarded. When building robotic systems, discarding depleted primary batteries can

become expensive. However, one advantage to using primary batteries is that they

typically have a greater electrical capacity than rechargeables. If one is engaged in a function (e.g., a robotic war) that requires the highest power density available for one-shot use, primary batteries may be the way to go.

Secondary batteries

Secondary batteries are rechargeable. The most common rechargeable batteries are NiCds

and lead-acid. Secondary batteries, while initially more expensive, are cheaper in the long

run. Typically secondary batteries can be recharged 200 to 1000 times.

________________________________________________________________________

_____________________________________________________________________ 36

Chapter8. TRAINING AND RECOGNITION

To record or train a command, the chip stores the pattern and amplitude of the analog signal in the 8K x 8 SRAM. In recognition mode, the chip compares the analog signal coming from the microphone with the patterns stored in the SRAM and, if it recognizes a command, sends the command identifier to the microcontroller through pins D0 to D7 of the chip. The keypad and 7-segment display are used for training, for testing (whether a word is recognized properly) and for clearing the memory.

To Train:

To train the circuit, begin by pressing the number of the word you want to train on the keypad. Use any number between 1 and 40. For example, press "1" to train word number 1. When you press the number(s) on the keypad, the red LED turns off and the number is shown on the digital display. Next press the "#" key for train. Pressing "#" signals the chip to listen for a training word, and the red LED turns back on. Now speak the word you want the circuit to recognize clearly into the microphone. The LED should blink off momentarily; this is a signal that the word has been accepted.

Continue training new words using the procedure outlined above. Press the "2" key then the "#" key to train the second word, and so on. The circuit will accept up to forty words, but you do not have to enter all 40 words into memory to use the circuit; you can use as many word spaces as you want.
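The key sequence for training (word number, then "#") can be illustrated with a short simulation. The Python sketch below only models the sequence described above; how the real chip reacts to out-of-range numbers or other keys is not covered by this report, so dropping them silently is an assumption, and the name training_requests is hypothetical.

```python
def training_requests(keys, max_words=40):
    """Return the word numbers selected for training by a key sequence.

    Digits build up a word number; "#" requests training for that number.
    Numbers outside 1..max_words are ignored (an assumption of this sketch).
    """
    requests, digits = [], ""
    for key in keys:
        if key.isdigit():
            digits += key
        elif key == "#":
            if digits and 1 <= int(digits) <= max_words:
                requests.append(int(digits))
            digits = ""
    return requests


print(training_requests("1#2#25#"))
```

Each accepted entry corresponds to one point at which the user would speak the word to be stored in that slot.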

Recognition:

The circuit is continually listening. Repeat a trained word into the microphone, and the number of the word should appear on the digital display. For instance, if the word "directory" was trained as word number 25, saying "directory" into the microphone will cause the number 25 to be displayed.

Error Codes:

The chip provides the following error codes:

55 = word too long

66 = word too short

77 = word no match

________________________________________________________________________

_____________________________________________________________________ 37

Chapter9. APPLICATIONS

We believe such a system would find a wide variety of applications. Menu-driven systems such as e-mail readers, household appliances like washing machines and microwave ovens, and pagers and mobile phones will become voice controlled in future.

• The robot is useful in places that humans find difficult to reach but a human voice can reach, e.g. in a small pipeline, in a fire or in highly toxic areas.

• The robot can be used as a toy.

• It can be used to fetch and place small objects.

• It is one of the important stages on the way to humanoid robots.

• Command and control of appliances and equipment

• Telephone assistance systems

• Data entry

• Speech and voice recognition security systems

________________________________________________________________________

_____________________________________________________________________ 38

Chapter10. COMPONENTS USED

Parts list for speech-recognition circuit

• (1) IC1 HM2007 IC

• (1) IC2 SRAM 8K X 8

• (1) IC3 74LS373

• (2) IC4 and IC5 7448

• (1) XTAL 3.57 MHz

• (1) Speech-recognition PCB

• (1) 12-contact keypad

• (2) 7-segment displays

• (2) 16-pin, 220-ohm, 1/4W resistor packs

• (1) 22K-ohm, 1/4-W resistor

• (1) 5.6K-ohm, 1/4-W resistor

• (1) 0.0047-uF cap

• (1) C2 100-uF, 16V cap

• (1) C5 0.1-uF cap

• (1) 7805 voltage regulator

• (1) Microphone

• (1) 9V battery clip

Parts list for interface circuit

• (1) Microcontroller 89S51

• (1) 74LS373 octal D flip-flop, tri-state

• (4) 220-ohm, 7-pin resistor banks

• (10) Miniature LEDs

• (1) L298N

• (1) RF module

• (1) 40 MHz crystal

• (3) DC motors

• (1) Antenna

• (4) Male-female 7-pin connectors

Parts available from: Images Company

39 Seneca Loop

Staten Island, NY 10314

http://www.imagesco.com

________________________________________________________________________

_____________________________________________________________________ 39

Chapter11. DATASHEET

HM2007

Features

• Single-chip voice-recognition CMOS LSI

• Speaker-dependent

• External RAM support

• Maximum of 40-word recognition

• Maximum word length of 1.92 s

• Microphone support

• Manual and CPU modes available

• Response time less than 300 milliseconds (ms)

• 5 volt (5V) power supply

________________________________________________________________________

_____________________________________________________________________ 40


________________________________________________________________________

_____________________________________________________________________ 46

Chapter12. Project Progress Report Summary

Calendar year 2006:

June - Work started.

July - Gathered useful information on voice processing techniques and microphone properties. (Chapters 3, 4)

August - We tried another chip, the AP7003-02, manufactured by the Indian company A-plus India. (Page 20)

September - We built a voice recognition module using the AP7003-02.

October - Our attempts with the AP7003-02 did not succeed.

November - Tried to find a better alternative, finally decided to go with the HM2007 and to get it imported from the US. (Pages 19, 20; Chapter 11)

December - Project work was on hold.

Calendar year 2007:

January - In the 2nd week of January we worked on the voice recognition part and the circuit was soldered. In the last week we got the desired output from the voice recognition module. (Page 26)

February - We worked on the microcontroller part. After solving a lot of minor problems we managed to complete it. At the end of February we had some success with our wired model using a proper driver circuit. (Pages 28, 29, 30, 31, 32)

March - We took our discarded toy car, decoded its remote control logic and matched it with our microcontroller output. Finally, with buffers added between the microcontroller and the RF module, we were able to bring the entire circuit together. At this point we also won a certificate in project paper presentation. (Pages 33, 34, 35)

April - Project complete.

________________________________________________________________________

_____________________________________________________________________ 47

Chapter13. BIBLIOGRAPHY

Web:

• www.imagesco.com/articles/hm2007/SpeechRecognitionTutorial01.html

• www.migi.com for selecting motors and other robotic concepts.

• www.migindia.com/modules.php?name=News&file=article&sid=22

• www.datasheetcatalog.com

• www.alldatasheet4u.com

• http://arts.ucsc.edu/ems/music/tech_background/TE-20/teces_20.html#I. For

microphones types and properties.

• www.Howstuffworks.com for understanding microphone concepts, rf radio

working and other related concepts.

Book:

• The 8051 Microcontroller - Kenneth Ayala, 3rd reprint, 2005; Thomson Asia Ltd., Singapore; Chapters 3, 6, 7 & 8. For programming the 89S51.

• Modern Digital Electronics - R. P. Jain, 3rd edition; Tata McGraw Hill; Chapters 6 & 10. For A/D converter and 7-segment display connections.

Others:

• Keil2 software

Used for simulating the microcontroller program

• Aec_isp_v3

Used for burning/programming the microcontroller
