Automatic Speech Recognition System_Updated_20.09.13

8/10/2019 Automatic Speech Recognition System_Updated_20.09.13


Automatic Speech Recognition System

Research Supervisor

Prof. Dr. K. Lakshmi

Asst. Dean, SCSE

PMU

Research Scholar

Ms. T. Kavitha



Basic Model of Speech Recognition (Contd)

ACOUSTIC MODEL

P(O|W) is the probability of the sequence of acoustic observations O, conditioned on the word string W.

For a large-vocabulary speech recognition system (SRS):

o Build statistical models for sub-word speech units.

o Build up word models from these sub-word speech unit models.

o Postulate word sequences.

o Evaluate the acoustic model probabilities via standard concatenation methods.



Basic Model of Speech Recognition

LANGUAGE MODEL

P(W) describes the probability associated with a postulated sequence of words.

Language models can incorporate both syntactic and semantic constraints of the language and the recognition task.
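The acoustic model P(O|W) and the language model P(W) are usually combined through the Bayes decision rule. The surviving slide text does not state the rule, so the form below is the standard textbook formulation rather than a quotation from the slides; the symbol \hat{W} for the recognized word string is introduced here for illustration.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

P(O) does not depend on W, so it can be dropped from the maximization.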



Language Modeling

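The probability P(W) of the sentence (string of words) W = w_1 w_2 ... w_n is defined as:

\[ P(W) = P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \]

Usually P(W) is approximated by an N-gram language model as follows:

\[ P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) \]

where n is the number of words in the sentence and

N is the order of the N-gram model.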

To estimate the probability of the different N-grams in the language,

a large text corpus is required to train the language model.
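As a rough illustration of training an N-gram model from text, the sketch below estimates a bigram model (N = 2) by maximum likelihood from a tiny toy corpus. The corpus, the function names, and the sentence-boundary tokens `<s>` and `</s>` are assumptions made for this example; a practical system would use a much larger corpus and some form of smoothing for unseen N-grams.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])                 # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    return histories, bigrams


def sentence_probability(words, histories, bigrams):
    """P(W) under the bigram approximation with maximum-likelihood estimates."""
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if histories[prev] == 0:
            return 0.0  # history never seen in training (no smoothing in this sketch)
        prob *= bigrams[(prev, cur)] / histories[prev]
    return prob


# Toy corpus (hypothetical, for illustration only).
corpus = [
    ["recognize", "speech"],
    ["recognize", "words"],
    ["wreck", "a", "nice", "beach"],
]
histories, bigrams = train_bigram(corpus)
p_seen = sentence_probability(["recognize", "speech"], histories, bigrams)
p_unseen = sentence_probability(["speech", "recognize"], histories, bigrams)
```

Without smoothing, any sentence containing an unseen bigram receives probability zero, which is one reason a large training corpus is needed.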



Left-to-right (Bakis) hidden Markov model (HMM)

[Figure: three-state HMM used to model one phoneme, where a_ij is the transition weight from state i to state j.]
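The left-to-right topology can be written down as a transition matrix in which each state may only stay where it is or move to the next state. The sketch below builds such a matrix; the function name, the self-loop probability of 0.6, and the choice to let the last state loop on itself (instead of modeling an exit transition) are simplifying assumptions for illustration.

```python
def bakis_transitions(n_states=3, self_loop=0.6):
    """Return an n_states x n_states left-to-right transition matrix a[i][j]."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            a[i][i] = self_loop            # stay in the same state
            a[i][i + 1] = 1.0 - self_loop  # advance to the next state
        else:
            a[i][i] = 1.0                  # simplification: final state only loops
    return a


a = bakis_transitions()
for row in a:
    print(row)
```

All entries below the diagonal are zero, which is what prevents the model from moving backwards in time.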



The conditional probability P(O|W) is computed by the acoustic model.

It is typically a statistical model for all the phonemes in the language to be recognized.

This type of acoustic model is known as a phoneme-based acoustic model.

Statistical modeling of the different phonemes is performed using hidden Markov models (HMMs).



Isolated Words

Connected Words

Continuous Speech

Spontaneous Speech



Acoustic Phonetic Approach

Pattern Recognition Approach

o Template Based Approach

o Stochastic Approach

o Dynamic Time Warping (DTW)

o Vector Quantization (VQ)

Artificial Neural Networks

Artificial Intelligence Approach (Knowledge Based approach)

Support Vector Machine



Acoustic Phonetic Approach

First step: Spectral analysis of the speech, combined with feature detection that converts the spectral measurements to a set of features describing the broad acoustic properties of the different phonetic units.

Second step: A segmentation and labeling phase, resulting in a phoneme lattice characterization of the speech.

Third step: Determine a valid word (or string of words) from the phonetic label sequences produced by the segmentation and labeling.

In the validation process, linguistic constraints on the task are invoked in order to access the lexicon for word decoding based on the phoneme lattice.




Pattern Recognition Approach

Two essential steps:

o Pattern training

o Pattern comparison

The approach uses a well-formulated mathematical framework and establishes consistent speech pattern representations.

Pattern representation: a template or a statistical model.

Pattern comparison: an unknown pattern is compared with the known patterns.

There are two methods:

o Template based

o Stochastic


Template Based Approach

A collection of prototypical speech patterns is stored as reference patterns representing the dictionary of candidate words.

Recognition is then carried out by matching an unknown spoken utterance with each of these reference templates and selecting the category of the best-matching pattern.

Usually, templates for entire words are constructed.

Each word must have its own full reference template.

Advantage

Disadvantage

2 Key Ideas


Stochastic Approach

Uses probabilistic models to deal with uncertain or incomplete information.

Stochastic models are a particularly suitable approach to speech recognition (SR).

The most popular stochastic approach today is the HMM.

A hidden Markov model is characterized by a finite-state Markov model and a set of output distributions.

The transition parameters in the Markov chain model temporal variability, while the parameters in the output distributions model spectral variability.
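To make the two parameter sets concrete, the sketch below evaluates the likelihood of an observation sequence under a small discrete-output HMM with the forward algorithm. The slides do not name this algorithm; it is shown here as the standard way to evaluate an HMM. All numbers, the symbols "a" and "b", and the function name are hypothetical, and real acoustic models use continuous feature vectors rather than two discrete symbols.

```python
def forward_likelihood(observations, initial, transitions, emissions):
    """Return P(O | model) for a discrete-output HMM (observations must be non-empty)."""
    n_states = len(initial)
    # alpha[j] = P(o_1 .. o_t, state at time t = j)
    alpha = [initial[j] * emissions[j][observations[0]] for j in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transitions[i][j] for i in range(n_states)) * emissions[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical three-state left-to-right model.
initial = [1.0, 0.0, 0.0]          # always start in the first state
transitions = [                    # temporal variability
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
emissions = [                      # output distributions (spectral variability)
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
    {"a": 0.5, "b": 0.5},
]
likelihood = forward_likelihood(["a", "b"], initial, transitions, emissions)
```

Staying longer in a state (a higher self-loop weight) lets the same model cover slower speech, while the output distributions decide which sounds each state accepts.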


Dynamic Time Warping (DTW)

DTW is an algorithm for measuring the similarity between two sequences that may vary in time or speed.

In general, DTW is a method that allows a computer to find an optimal match between two given sequences, subject to certain restrictions.

The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity that is independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in the context of HMMs.


Dynamic Time Warping (DTW) (Contd)

Continuity is less important in DTW than in other pattern matching algorithms; DTW is an algorithm particularly suited to matching sequences with missing information, provided there are long enough segments for matching to occur.

The optimization process is performed using dynamic programming, hence the name.
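A minimal sketch of the dynamic-programming recursion, assuming one-dimensional numeric sequences and an absolute-difference local cost (real systems compare frames of feature vectors). The function names, the toy templates, and the nearest-template classifier are illustrative assumptions and are not taken from the slides.

```python
def dtw_distance(seq_a, seq_b):
    """Return the DTW alignment cost between two non-empty numeric sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_b[j-1] is matched to several seq_a items
                cost[i][j - 1],      # seq_a[i-1] is matched to several seq_b items
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def classify(utterance, templates):
    """Pick the label whose reference template has the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical reference templates and a slower rendition of "yes".
templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1, 1]}
slow_yes = [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]
label = classify(slow_yes, templates)
```

The slow utterance is twice as long as its template, yet the warping path absorbs the difference in speaking rate, so the alignment cost to the "yes" template is zero.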


Artificial Intelligence Approach (Knowledge Based Approach)

The knowledge-based approach uses linguistic, phonetic, and spectrogram information.

Although template-based approaches have been very effective in the design of a variety of speech recognition systems, they provide little insight into human speech processing, thereby making error analysis and knowledge-based system enhancement difficult.

Drawbacks:

Limited success, due to the difficulty of quantifying expert knowledge.

Integrating many levels of human knowledge (phonetics, lexical access, syntax, semantics) is difficult.

Combining independent and asynchronous knowledge sources optimally remains an unsolved problem.


Connectionist Approaches (Artificial Neural Networks)

The approach is modeled on the way a person applies intelligence in visualizing, analyzing, and characterizing speech based on a set of measured acoustic features.

It uses an expert system (e.g., a neural network) that integrates phonemic, lexical, syntactic, semantic, and even pragmatic knowledge for segmentation and labeling, and uses tools such as ANNs for learning the relationships among phonetic events.

Its focus is mostly on the representation of knowledge and the integration of knowledge sources.


Pronunciation Modeling

A pronunciation model consists of:

1. A definition of the elemental sounds in a language

2. A dictionary

3. Post-lexical rules

Pronunciation dictionary: a link between the acoustic and language models.
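One way to picture the dictionary as this link: the language model works with words, the acoustic model works with phoneme units, and the dictionary maps one to the other. The sketch below uses a tiny hand-written lexicon with ARPAbet-style symbols; the entries and the function name are illustrative assumptions and do not come from any particular published dictionary. Post-lexical rules and pronunciation variants are left out.

```python
# Hypothetical toy lexicon: each word maps to a single phoneme sequence.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "model": ["M", "AA", "D", "AH", "L"],
}


def words_to_phonemes(words, lexicon):
    """Expand a word sequence into the phoneme sequence the acoustic model scores."""
    phonemes = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(lexicon[word])
    return phonemes


units = words_to_phonemes(["speech", "model"], LEXICON)
```

Each phoneme in the resulting sequence would then be scored by its own phoneme HMM, with the per-phoneme models concatenated to form the word model.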



Date post:	02-Jun-2018
Category:	Documents
Upload:	jeysam