Final Ppt on Speech Processing

8/4/2019 Final Ppt on Speech Processing

1/20


2/20

Speech is message information that is converted into a set of neural signals, which control the articulatory mechanism; this generates an acoustic waveform containing the information in the original message.

Message information → Neural signal → Articulatory mechanism → Acoustic waveform

Speech


3/20

Speech is a concatenation of elements from a finite set of phonemes.
Each language has a distinct set of phonemes, typically in the range of 30 to 50.
English has around 42 phonemes.

A six-bit numerical code is sufficient for numbering the phonemes.
Speech averages about 10 phonemes per second.
This gives a total of 60 bits per second as the average information rate.

Concerns:
Representation of message content
Representation in a form convenient for transmission/storage
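The information-rate figure on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and it assumes a fixed-length code in which every phoneme gets the same number of bits.

```python
def bits_per_phoneme(num_phonemes: int) -> int:
    """Smallest whole number of bits that gives each phoneme a distinct code."""
    # (n - 1).bit_length() is the number of bits needed to count from 0 to n - 1.
    return (num_phonemes - 1).bit_length()


def information_rate(num_phonemes: int, phonemes_per_second: int) -> int:
    """Bits per second for a fixed-length phoneme code."""
    return bits_per_phoneme(num_phonemes) * phonemes_per_second


# Figures taken from the slide: about 42 English phonemes, 10 phonemes per second.
english_bits = bits_per_phoneme(42)
english_rate = information_rate(42, 10)
print(english_bits, "bits per phoneme,", english_rate, "bits per second")
```

Six bits cover any inventory of up to 64 phonemes, so the same code length holds at the top of the 30 to 50 range.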


4/20

Speech processing is the study of speech signals and of the methods used to process these signals.

The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing.

Information source → Measurement or observation → Signal representation → Signal transformation → Extraction and utilization of information

Signal processing covers the signal representation and signal transformation stages.


5/20

SPEECH RECOGNITION
SPEAKER RECOGNITION
SPEECH CODING
VOICE ANALYSIS
SPEECH SYNTHESIS
SPEECH ENHANCEMENT


6/20

AUTOMATIC SPEECH RECOGNITION CONVERTS SPOKEN WORDS TO TEXT


7/20

TEXT DEPENDENT
TEXT INDEPENDENT


8/20

In a system using text-dependent speech, the individual presents either a fixed phrase (a password) or a prompted phrase ("Please say the numbers 33-54-63") that is programmed into the system. This can improve performance, especially with cooperative users.


9/20

A text-independent system has no advance knowledge of the presenter's phrasing. It is much more flexible in situations where the individual submitting the sample may be unaware of the collection or unwilling to cooperate, which presents a more difficult challenge.


10/20

Speech coding is the application of data compression to digital audio signals containing speech.

Speech coding uses speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal. This is combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bit stream.

The two most important applications of speech coding are mobile telephony and Voice over IP.
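To make "speech-specific parameter estimation" more concrete, the sketch below uses linear prediction (LPC), one widely used way of modeling a speech frame with a handful of coefficients. The slide does not name a particular method, so LPC is an assumption here, and the function names and the synthetic test frame are purely illustrative.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns (coeffs, error), where x[n] is predicted as
    sum(coeffs[k] * x[n - 1 - k]) and error is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[1..order] hold the coefficients; a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # all-zero or perfectly predicted frame: nothing left to model
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Hypothetical test frame: a decaying exponential, x[n] = 0.9 * x[n - 1].
frame = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(frame, order=2)
print(coeffs, residual_energy)
```

For this synthetic frame the first coefficient comes out close to 0.9 and the second close to zero, so 200 samples are summarized by two numbers plus a residual. A real coder would then quantize and pack such parameters into the bit stream.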


11/20

Voice analysis is the study of speech sounds for purposes other than their linguistic content, which is the concern of speech recognition.

Such studies include mostly medical analysis of the voice, i.e. phoniatrics, but also speaker identification.

More controversially, some believe that the truthfulness or emotional state of speakers can be determined using Voice Stress Analysis or Layered Voice Analysis.


12/20

Speech synthesis is the artificial production of human speech.

A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware.

A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database.

Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity.

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood.

An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer.


13/20

Speech enhancement aims to improve speech quality by using various algorithms.

The objective of enhancement is to improve the intelligibility and/or overall perceptual quality of a degraded speech signal using audio signal processing techniques.

Enhancement of speech degraded by noise, or noise reduction, is the most important field of speech enhancement. It is used in many applications, such as mobile phones, VoIP, teleconferencing systems, speech recognition, and hearing aids.


14/20

As basic parameters in speech processing we regard:
Pitch
Duration
Intensity
Voice quality
Signal-to-noise ratio
Voice activity detection
Strength of the Lombard effect


15/20

In speech recognition, speech synthesis and speaker characterization, basic parameters are needed that are crucial for good system performance.

There are two sets of parameters. The first is related to prosody:
Pitch
Duration
Intensity

The second characterizes the acoustic properties of the environment, including its impact on the speaker's voice:
Voice quality
Signal-to-noise ratio
Voice activity detection
Strength of the Lombard effect

When adverse conditions are also taken into account, the performance of many published algorithms for extracting these parameters automatically from the speech signal is not known. A framework based on competitive evaluation is proposed to push algorithmic research and to make progress comparable.
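As an illustration of what extracting a prosodic parameter from the signal can look like, here is a minimal sketch that estimates pitch by autocorrelation and intensity as RMS amplitude. It is not one of the published algorithms the slide refers to: the function names, the 60 to 400 Hz search range and the synthetic 200 Hz test tone are all assumptions, and clean single-frame input is taken for granted.

```python
import math


def rms_intensity(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Return the pitch in Hz, or None if no positive correlation peak is found.

    Picks the lag with the largest autocorrelation inside the lag range
    that corresponds to fmin..fmax.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag is not None else None


# Hypothetical test signal: 50 ms of a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
print(estimate_pitch(tone, sample_rate), rms_intensity(tone))
```

On noisy or unvoiced frames a plain autocorrelation peak like this becomes unreliable, which is the kind of adverse condition the slide has in mind.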


16/20

Personal voice qualities differ in the speaker's use of temporal structures, articulation precision, vocal effort and type of phonation.

Temporal structures can be measured directly in the acoustic signal, and conclusions about articulation precision can be drawn from the formant structure.

The remaining voice quality percepts are a combination of several acoustic voice quality parameters.

An investigation of emotionally loaded speech material showed that these acoustic parameters are useful for differentiating between the emotions happiness, sadness, anger, fear and boredom.


17/20

The signal-to-noise ratio (SNR) is an important feature in determining the quality of audio data.

This is particularly important in speech recognition technology, since it is well known that recognition performance is strongly influenced by the SNR.

In most applications the SNR cannot be easily derived, since the noise energy is not known.

Further, the question arises as to what is "signal" and what is "noise". For example, would a cough or breath noise be considered part of the "signal" in spontaneous speech? Does it convey information?
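For reference, when the speech and the noise are available separately (for example in a simulation where noise is added to clean recordings), the SNR in decibels is ten times the base-10 logarithm of the ratio of their average powers. The sketch below assumes exactly that situation, which the slide notes is usually not the case in practice; the function names and sample values are illustrative.

```python
import math


def power(samples):
    """Average power: the mean of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """SNR in decibels, given separate signal and noise sample sequences."""
    return 10.0 * math.log10(power(signal) / power(noise))


# Hypothetical data: the noise amplitude is one tenth of the signal amplitude.
signal = [1.0, -1.0] * 50
noise = [0.1, -0.1] * 50
print(round(snr_db(signal, noise), 2), "dB")
```

A tenfold amplitude ratio is a hundredfold power ratio, so this example comes out at about 20 dB.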


18/20

Voice activity detection (VAD), also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is detected.

The main uses of VAD are in speech coding and speech recognition.

It can facilitate speech processing, and can also be used to deactivate some processes during non-speech sections of an audio session.

It can avoid unnecessary coding/transmission of silence packets in Voice over Internet Protocol applications, saving on computation and on network bandwidth.
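The simplest form of VAD compares the short-time energy of each frame with a threshold. The sketch below shows only that idea; practical detectors use more robust features and adaptive thresholds. The frame length of 160 samples (20 ms at 8 kHz), the threshold value and the function names are assumptions chosen for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean energy of each complete, non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_speech(samples, frame_len=160, threshold=0.01):
    """One boolean per frame: True where the energy exceeds the threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_len)]


# Hypothetical signal: two silent frames, two loud frames, two silent frames.
silence = [0.0] * 320
burst = [0.5 if n % 2 == 0 else -0.5 for n in range(320)]
decisions = detect_speech(silence + burst + silence)
active_frames = sum(decisions)
print(decisions, active_frames, "of", len(decisions), "frames need coding")
```

In a VoIP sender, only the frames flagged True would be coded and transmitted, which is where the savings in computation and bandwidth come from.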


19/20

The Lombard effect, or Lombard reflex, is the involuntary tendency of speakers to increase the intensity of their voice when speaking in loud noise, to enhance its audibility.

This change includes not only loudness but also other acoustic features, such as pitch, rate, and the duration of syllables.

This compensation effect results in an increase in the auditory signal-to-noise ratio of the speaker's spoken words.

The effect is linked to the needs of effective communication: it is reduced when words are repeated or lists are read, where communication intelligibility is not important.

Since the effect is also involuntary, it is used as a means to detect malingering in those simulating hearing loss.

The effect was discovered in 1909 by Étienne Lombard, a French otolaryngologist.


20/20

Health care

Military

High-performance fighter aircraft

Helicopters

Battle management

Training air traffic controllers

Telephony and other domains

Date post:	07-Apr-2018
Upload:	bhavik-patel