
Final Ppt on Speech Processing

Date post: 07-Apr-2018
Upload: bhavik-patel

Transcript
  • 8/4/2019 Final Ppt on Speech Processing

    1/20

    2/20

    Speech is message information that is converted into a set of neural signals, which control the articulatory mechanism; this in turn generates an acoustic waveform carrying the information in the original message.

    Message information → Neural signals → Articulatory mechanism → Acoustic waveform (speech)

    3/20

    • Speech is a concatenation of elements from a finite set of phonemes.
    • Each language has a distinct set of phonemes, typically in the range of 30 to 50.
    • English has around 42 phonemes.
    • A six-bit numerical code is sufficient to number them.
    • At an average of 10 phonemes per second, this gives 60 bits per second as the average information rate.
    • Concerns:
      • Representation of the message content
      • Representation in a form convenient for transmission or storage
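    The arithmetic behind the six-bit code and the 60 bits per second figure can be checked with a short sketch. The function names are illustrative only; the phoneme count and speaking rate are the figures quoted on the slide, and a fixed-length code is assumed.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest whole number of bits that can number every symbol."""
    # (n - 1).bit_length() is the integer form of ceil(log2(n)) for n >= 1.
    return (alphabet_size - 1).bit_length()


def information_rate(alphabet_size: int, symbols_per_second: int) -> int:
    """Bits per second for a fixed-length code at the given symbol rate."""
    return bits_per_symbol(alphabet_size) * symbols_per_second


english_phonemes = 42      # figure quoted on the slide
phonemes_per_second = 10   # average rate quoted on the slide

print(bits_per_symbol(english_phonemes))                        # 6
print(information_rate(english_phonemes, phonemes_per_second))  # 60
```

    Five bits would cover only 32 symbols, so any inventory of 33 to 64 phonemes needs six.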

    4/20

    Speech processing is the study of speech signals and of the methods used to process them.

    The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing.

    Information source → Measurement or observation → Signal representation → Signal transformation → Extraction and utilization of information

    Signal representation and signal transformation together make up the signal processing stage.

    5/20

    • SPEECH RECOGNITION
    • SPEAKER RECOGNITION
    • SPEECH CODING
    • VOICE ANALYSIS
    • SPEECH SYNTHESIS
    • SPEECH ENHANCEMENT

    6/20

    AUTOMATIC SPEECH RECOGNITION CONVERTS SPOKEN WORDS TO TEXT

    7/20

    • TEXT DEPENDENT
    • TEXT INDEPENDENT

    8/20

    In a system using text-dependent speech, the individual presents either a fixed (password) or prompted ("Please say the numbers 33-54-63") phrase that is programmed into the system and can improve performance, especially with cooperative users.

    9/20

    A text-independent system has no advance knowledge of the presenter's phrasing and is much more flexible in situations where the individual submitting the sample may be unaware of the collection or unwilling to cooperate, which presents a more difficult challenge.

    10/20

    Speech coding is the application of data compression to digital audio signals containing speech.

    Speech coding uses speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bit stream.

    The two most important applications of speech coding are mobile telephony and Voice over IP.
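    As a rough illustration of what "parameter estimation to model the speech signal" can mean, the sketch below fits a first-order linear predictor to one frame and compares the energy of the prediction residual with that of the frame. Linear prediction is swapped in here as one widely used modelling technique; the slide does not name a specific model. The synthetic sine frame, the 8 kHz sample rate and all function names are assumptions for the example.

```python
import math


def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def first_order_predictor(x):
    """Coefficient a that minimizes the error of predicting x[n] as a * x[n-1]."""
    r0 = autocorr(x, 0)
    return autocorr(x, 1) / r0 if r0 else 0.0


def residual(x, a):
    """Prediction error; the first sample has no predecessor and is kept as is."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def energy(x):
    return sum(v * v for v in x)


# Assumed test frame: 20 ms of a 100 Hz tone sampled at 8 kHz (not real speech).
frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
a = first_order_predictor(frame)
e = residual(frame, a)

print(round(a, 3))
print(energy(e) < energy(frame))
```

    A coder built along these lines would transmit the coefficient plus a coarse description of the low-energy residual, rather than the raw samples.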

    11/20

    Voice analysis is the study of speech sounds for purposes other than extracting linguistic content, which is the task of speech recognition.

    Such studies mostly involve medical analysis of the voice, i.e. phoniatrics, but also speaker identification.

    More controversially, some believe that the truthfulness or emotional state of speakers can be determined using Voice Stress Analysis or Layered Voice Analysis.

    12/20

    Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware.

    A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

    Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity.

    The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer.
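    The idea of concatenating stored units can be sketched in a few lines. Everything here is hypothetical: the unit labels, the tiny sample lists standing in for recorded audio, and the `synthesize` helper. A real system would store recorded waveforms and smooth the joins between units.

```python
# Hypothetical unit database: label -> short list of waveform samples.
UNIT_DATABASE = {
    "h": [0.0, 0.1, 0.2],
    "e": [0.3, 0.2],
    "l": [0.1, 0.0, -0.1],
    "o": [-0.2, -0.1, 0.0],
}


def synthesize(units, database):
    """Concatenate the stored samples for each requested unit, in order."""
    samples = []
    for unit in units:
        if unit not in database:
            raise KeyError(f"no recorded unit for {unit!r}")
        samples.extend(database[unit])
    return samples


waveform = synthesize(["h", "e", "l", "l", "o"], UNIT_DATABASE)
print(len(waveform))  # 14
```

    Smaller units such as these cover more possible outputs with a small database, which is the trade-off against clarity described above.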

    13/20

    Speech enhancement aims to improve speech quality by using various algorithms.

    The objective of enhancement is improvement in intelligibility and/or overall perceptual quality of a degraded speech signal using audio signal processing techniques.

    Enhancement of speech degraded by noise, or noise reduction, is the most important field of speech enhancement, and is used in many applications such as mobile phones, VoIP, teleconferencing systems, speech recognition, and hearing aids.

    14/20

    As basic parameters in speech processing we regard:
    • Pitch
    • Duration
    • Intensity
    • Voice quality
    • Signal-to-noise ratio
    • Voice activity detection
    • Strength of the Lombard effect

    15/20

    In the areas of speech recognition, speech synthesis and speaker characterization, basic parameters are needed that are crucial for good system performance.

    There are two sets of parameters. The first is related to prosody:
    • Pitch
    • Duration
    • Intensity

    The second characterizes the acoustic properties of the environment, including its impact on the speaker's voice:
    • Voice quality
    • Signal-to-noise ratio
    • Voice activity detection
    • Strength of the Lombard effect
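    To make the three prosodic parameters concrete, here is a minimal sketch that measures duration, intensity (as RMS amplitude) and pitch (by picking the strongest autocorrelation lag) on a synthetic tone. The autocorrelation method, the 50 to 400 Hz search range, the sample rate and the function names are all assumptions for illustration; the slides do not prescribe an extraction method, and real speech needs far more robust handling.

```python
import math


def duration_seconds(samples, sample_rate):
    return len(samples) / sample_rate


def intensity_rms(samples):
    """Root-mean-square amplitude, a simple stand-in for intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pitch_autocorrelation(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch as sample_rate / lag for the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n - lag] for n in range(lag, len(samples)))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Assumed test signal: 0.1 s of a 200 Hz tone at 8 kHz with amplitude 0.5.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

print(duration_seconds(tone, sample_rate))
print(round(intensity_rms(tone), 3))
print(pitch_autocorrelation(tone, sample_rate))
```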

    Once adverse conditions are also taken into account, the performance of many published algorithms for automatically extracting these parameters from the speech signal is not known. A framework based on competitive evaluation is proposed to push algorithmic research and to make progress comparable.

    16/20

    Personal voice qualities differ in the speaker's use of temporal structures, articulation precision, vocal effort and type of phonation.

    Temporal structures can be measured directly in the acoustic signal, and conclusions about articulation precision can be drawn from the formant structure.

    These voice quality percepts are a combination of several acoustic voice quality parameters.

    In an investigation on emotionally loaded speech material, it could be shown that the named acoustic parameters are useful for differentiating between the emotions happiness, sadness, anger, fear and boredom.

    17/20

    The signal-to-noise ratio (SNR) is an important feature in determining the quality of audio data.

    This is particularly important in speech recognition technology, since it is well known that recognition performance is strongly influenced by the SNR.

    In most applications the SNR cannot be easily derived, since the noise energy is not known.

    Further, the question arises as to what is "signal" and what is "noise". For example, would a cough or breath noise be considered part of the "signal" in spontaneous speech? Does it convey information?
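    For reference, the SNR itself is simple to compute in the idealized case where the clean signal and the noise are available separately, which is exactly what real applications usually lack. The sketch below assumes that idealized case; the sample values and function names are made up for the example.

```python
import math


def power(samples):
    """Mean squared amplitude."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """SNR in decibels, given the signal and the noise as separate sequences."""
    return 10 * math.log10(power(signal) / power(noise))


signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]

print(round(snr_db(signal, noise), 1))  # 20.0
```

    In practice only the mixture is observed, so the noise power has to be estimated, for instance from stretches judged to contain no speech.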

    18/20

    Voice activity detection (VAD), also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is detected.

    The main uses of VAD are in speech coding and speech recognition.

    It can facilitate speech processing, and can also be used to deactivate some processes during non-speech sections of an audio session.

    It can avoid unnecessary coding and transmission of silence packets in Voice over Internet Protocol applications, saving on computation and on network bandwidth.
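    One very simple way to approximate this decision is to compare the short-term energy of each frame against a fixed threshold. The sketch below does only that; the frame length, the threshold and the synthetic "silence" and "speech" samples are assumptions for the example, and practical detectors use adaptive thresholds and additional features.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_voice(samples, frame_length, threshold):
    """Return one decision per full frame: True where energy exceeds the threshold."""
    decisions = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        decisions.append(frame_energy(frame) > threshold)
    return decisions


# Synthetic stand-ins: low-level background, then a louder burst, then background.
silence = [0.01, -0.01] * 4
speech = [0.5, -0.5] * 4
audio = silence + speech + silence

print(detect_voice(audio, frame_length=8, threshold=0.01))
# [False, True, False]
```

    Frames marked False are the ones a coder or VoIP sender could skip.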

    19/20

    The Lombard effect or Lombard reflex is the involuntary tendency of speakers to increase the intensity of their voice when speaking in loud noise to enhance its audibility.

    This change includes not only loudness but also other acoustic features such as pitch, rate, and duration of syllables.

    This compensation effect results in an increase in the auditory signal-to-noise ratio of the speaker's spoken words.

    The effect links to the needs of effective communication, as there is a reduced effect when words are repeated or lists are read, where communication intelligibility is not important.

    Since the effect is also involuntary, it is used as a means to detect malingering in those simulating hearing loss.

    The effect was discovered in 1909 by Étienne Lombard, a French otolaryngologist.

    20/20

    Health care

    Military

    High-performance fighter aircraft

    Helicopters

    Battle management

    Training air traffic controllers

    Telephony and other domains

