Automatic Speech Recognition System_Updated_20.09.13

Date post: 02-Jun-2018
Upload: jeysam

  • 8/10/2019 Automatic Speech Recognition System_Updated_20.09.13

    Automatic Speech Recognition System

    Research Supervisor

    Prof. Dr.K.Lakshmi

    Asst. Dean, SCSE

    PMU

    Research Scholar

    Ms.T.Kavitha

    P(O|W) estimates the probability of the sequence of acoustic observations O, conditioned on the word string W.

    For large-vocabulary speech recognition systems (SRS):

    1. Build statistical models for sub-word speech units.

    2. Build up word models from these sub-word speech unit models.

    3. Postulate word sequences.

    4. Evaluate the acoustic model probabilities via standard concatenation methods.

    Basic Model of Speech Recognition (contd.)

    ACOUSTIC MODEL

    Basic Model of Speech Recognition

    P(W) describes the probability associated with a postulated sequence of words.

    Language models can incorporate both syntactic and semantic constraints of the language and the recognition task.

    LANGUAGE MODEL
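
    The acoustic model P(O|W) and the language model P(W) are usually combined through the Bayes decision rule. The following is a sketch of that standard formulation, not a formula taken from these slides; \hat{W} denotes the recognized word string and P(O) is dropped because it does not depend on W.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```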

    Language Modeling

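    The probability of the sentence (string of words) W = w_1 w_2 ... w_n is defined as:

    \[ P(W) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \]

    Usually P(W) is approximated by an N-gram language model as follows:

    \[ P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) \]

    where n is the number of words in the sentence and N is the order of the N-gram model.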

    To estimate the probability of the different N-grams in the language,

    a large text corpus is required to train the language model.
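
    As a minimal sketch of how N-gram probabilities might be estimated from a text corpus, the Python example below trains a bigram model (N = 2) by relative-frequency counting. The three-sentence corpus, the function names, and the `<s>` / `</s>` boundary markers are illustrative assumptions; a practical system would use a far larger corpus and some form of smoothing for unseen N-grams.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (assumed toy setup)."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]  # sentence boundary markers
        histories.update(tokens[:-1])        # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams


def sentence_probability(words, histories, bigrams):
    """P(W) under the bigram approximation, with unsmoothed relative frequencies."""
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if histories[prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        prob *= bigrams[(prev, cur)] / histories[prev]
    return prob


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["recognize", "speech"],
    ["recognize", "words"],
    ["wreck", "a", "nice", "beach"],
]
histories, bigrams = train_bigram(corpus)
p_seen = sentence_probability(["recognize", "speech"], histories, bigrams)
p_unseen = sentence_probability(["speech", "recognize"], histories, bigrams)
```

    The unseen word order receives probability zero here, which is the reason a large training corpus (and, in practice, smoothing) matters.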

    Left-to-right (Bakis) hidden Markov model (HMM)

    Three-state HMM modeling one phoneme, where a_ij is the transition weight from state i to state j.
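
    A left-to-right topology of this kind can be sketched as a transition matrix in which every backward entry is zero. The Python example below is a simplified illustration: the self-loop values are made up, skip transitions are assumed absent, and the probability of leaving the last state is not stored in the matrix.

```python
def bakis_transitions(self_loop_probs):
    """Build a left-to-right transition matrix a[i][j].

    self_loop_probs[i] is a_ii; the remaining mass 1 - a_ii moves to state i + 1.
    For the last state the remainder is the exit probability, kept outside the matrix.
    """
    n = len(self_loop_probs)
    a = [[0.0] * n for _ in range(n)]
    for i, stay in enumerate(self_loop_probs):
        a[i][i] = stay
        if i + 1 < n:
            a[i][i + 1] = 1.0 - stay
    return a


def is_left_to_right(a):
    """True if no transition goes from a state back to an earlier state."""
    return all(a[i][j] == 0.0 for i in range(len(a)) for j in range(i))


# Hypothetical self-loop probabilities for a three-state phoneme model.
A = bakis_transitions([0.6, 0.7, 0.8])
```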

    The conditional probability P(O|W) is computed by the acoustic model.

    It is typically a statistical model for all the phonemes in the language to be recognized.

    This type of acoustic model is known as a phoneme-based acoustic model.

    Statistical modeling of the different phonemes is performed using hidden Markov models (HMMs).
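
    One common way to obtain an observation likelihood from an HMM is the forward algorithm. The Python sketch below assumes a discrete-output HMM with made-up parameters and a non-empty observation sequence; real acoustic models typically use continuous output distributions over feature vectors, which this example does not attempt to show.

```python
def forward_likelihood(observations, initial, transitions, emissions):
    """Return P(O | model) for a discrete-output HMM via the forward algorithm.

    Assumes `observations` is non-empty and every symbol appears in `emissions`.
    """
    n = len(initial)
    # Initialization: start in state s and emit the first symbol.
    alpha = [initial[s] * emissions[s][observations[0]] for s in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * transitions[r][s] for r in range(n)) * emissions[s][obs]
            for s in range(n)
        ]
    # Termination: total probability over all final states.
    return sum(alpha)


# Hypothetical three-state left-to-right model over the symbols "a" and "b".
initial = [1.0, 0.0, 0.0]
transitions = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
emissions = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
    {"a": 0.5, "b": 0.5},
]
likelihood = forward_likelihood(["a", "b"], initial, transitions, emissions)
```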

    Isolated Words

    Connected Words

    Continuous Speech

    Spontaneous Speech

    Acoustic Phonetic Approach

    Pattern Recognition Approach

    o Template Based Approach

    o Stochastic Approach

    o Dynamic Time Warping (DTW)

    o Vector Quantization (VQ)

    Artificial Neural Networks

    Artificial Intelligence Approach (Knowledge Based approach)

    Support Vector Machine

    Acoustic Phonetic Approach

    First step: Spectral analysis of the speech, combined with feature detection that converts the spectral measurements into a set of features describing the broad acoustic properties of the different phonetic units.

    Second step: Segmentation and labeling phase, resulting in a phoneme lattice characterization of the speech.

    Third step: Determine a valid word (or string of words) from the phonetic label sequences produced by segmentation and labeling.

    In the validation process, linguistic constraints on the task are invoked in order to access the lexicon for word decoding based on the phoneme lattice.

    Pattern Recognition Approach

    Two essential steps:

    Pattern training and

    Pattern comparison

    The approach has a well-formulated mathematical framework and establishes consistent speech pattern representations.

    Pattern representation: a template or a statistical model

    Pattern comparison: the unknown pattern is compared with known patterns

    There are two methods:

    Template based

    Stochastic based

    Template Based Approach

    A collection of prototypical speech patterns is stored as reference patterns representing the dictionary of candidate words.

    Recognition is then carried out by matching an unknown spoken utterance against each of these reference templates and selecting the category of the best-matching pattern.

    Usually, templates are constructed for entire words, so each word must have its own full reference template.

    Advantage

    Disadvantage

    2 Key Ideas

    Stochastic Approach

    Stochastic modeling uses probabilistic models to deal with uncertain or incomplete information.

    Stochastic models are a particularly suitable approach to speech recognition (SR).

    The most popular stochastic approach today is the HMM.

    A hidden Markov model is characterized by a finite-state Markov model and a set of output distributions.

    The transition parameters in the Markov chain model temporal variabilities, while the parameters in the output distributions model spectral variabilities.

    Dynamic Time Warping (DTW)

    An algorithm for measuring similarity between two sequences which may vary in time or speed.

    In general, DTW is a method that allows a computer to find an optimal match between two given sequences under certain restrictions.

    The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity that is independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in the context of HMMs.

    Continuity is less important in DTW than in other pattern matching algorithms.

    DTW is an algorithm particularly suited to matching sequences with missing information, provided there are long enough segments for matching to occur.

    The optimization process is performed using dynamic programming, hence the name.
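
    The dynamic programming recurrence behind DTW can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the sequences are one-dimensional lists of numbers (real systems compare sequences of feature vectors), the local cost is the absolute difference, and no warping-window restriction is applied.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best non-linear alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b stays (stretch b)
                cost[i][j - 1],      # b advances while a stays (stretch a)
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# The same contour spoken at two speeds aligns with zero cost.
reference = [1, 2, 3]
slower = [1, 2, 2, 3]
same_shape = dtw_distance(reference, slower)
different = dtw_distance([0, 0], [1, 1])
```

    In a template-based recognizer, a distance of this kind could be computed between the unknown utterance and every stored reference template, with the closest template giving the recognized word.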

    Artificial Intelligence Approach (Knowledge based approach)

    The knowledge-based approach uses linguistic, phonetic, and spectrogram information.

    Although template-based approaches have been very effective in the design of a variety of speech recognition systems, they provide little insight into human speech processing, which makes error analysis and knowledge-based system enhancement difficult.

    Drawbacks:

    Limited success, due to the difficulty of quantifying expert knowledge.

    Another difficulty is the integration of many levels of human knowledge: phonetics, lexical access, syntax, and semantics.

    Combining independent and asynchronous knowledge sources optimally remains an unsolved problem.

    Connectionist Approaches (Artificial Neural Networks)

    This approach attempts to mechanize recognition according to the way a person applies intelligence in visualizing, analyzing, and characterizing speech based on a set of measured acoustic features.

    It uses an expert system (e.g., a neural network) that integrates phonemic, lexical, syntactic, semantic, and even pragmatic knowledge for segmentation and labeling, and uses tools such as ANNs for learning the relationships among phonetic events.

    Its focus is mostly on the representation of knowledge and the integration of knowledge sources.

    Pronunciation Modeling

    A pronunciation model consists of:

    1. a definition of the elemental sounds in a language

    2. a dictionary

    3. post-lexical rules

    Pronunciation dictionary: a link between the acoustic and language models
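
    The linking role of the pronunciation dictionary can be illustrated with a small Python sketch: the language model works with words, the acoustic model works with sub-word units, and the dictionary maps one to the other. The phone symbols, the two entries, and the function name are illustrative assumptions, and post-lexical rules are not modeled here.

```python
# Hypothetical phone inventory (item 1 of the pronunciation model).
PHONE_SET = {"K", "AE", "T", "S", "P", "IY", "CH"}

# Hypothetical dictionary (item 2): each word maps to one or more pronunciations.
LEXICON = {
    "cat": [["K", "AE", "T"]],
    "speech": [["S", "P", "IY", "CH"]],
}


def words_to_phones(words, lexicon=LEXICON):
    """Expand a word sequence into phones, using the first listed pronunciation."""
    phones = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word][0])
    return phones


phone_sequence = words_to_phones(["speech", "cat"])
```

    Each phone in the resulting sequence would then select one phoneme-level acoustic model, so that word-level hypotheses can be scored acoustically.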
