23Speech v104 Vg

Date post: 14-Apr-2018
Upload: senthil-kumar

  • 7/30/2019 23Speech v104 Vg

    1/70

    DSP C5000

    Chapter 23

    Mobile Communication Speech Coders

    Copyright 2003 Texas Instruments. All rights reserved.

    Copyright 2003 Texas Instruments. All rights reserved. ESIEE, Slide 2

    Outline

    Speech Coding, CELP Coders

    Implementation using C54x

    Outline: Speech Coding

    Generalities on speech and coding

    Linear Prediction based coders

    Short term and long term prediction

    Vector Quantization

    CELP coders

    Structure and calculations

    Standards

    Applications of Speech Coding

    Digital Transmissions

    On wired telephone:

    Multiplexing

    Integration of services

    On wireless channels:

    Spectral efficiency

    For better protection against errors

    Voice mail/messaging. Storage: telephone answering machine.

    Secure phone

    Characteristics of Coders

    Bit Rate D: 50 bps < D < 96 kbps

    Coding Delay ~ frame delay

    Quality

    Objective measurements: SNR, PSQM

    Subjective measurements: MOS (excellent, good, fair, poor, unacceptable)

    Intelligibility:

    Objective measure STI or subjective DRT

    Acceptability: E model of ETSI standard, communicability

    Immunity to noise

    Complexity

    Objective Evaluation of the Quality

    The PSQM method:

    Objective evaluation

    Based on a model of auditory perception

    Takes into account the masking effects

    Good correlation with the MOS grade in basic conditions:

    Low bit rate speech coding, tandem, transmission errors, ...

    But sometimes not very reliable:

    Loss of frames, effect of the automatic control

    Still under development (PSQM+)

    Subjective Evaluation of Quality using the ACR Method, Yielding the MOS Score

    A large number of listeners grade a large number of speech sequences.

    Database with phonetically balanced sentences

    Presentation in random order

    Naive listeners

    Statistical processing of the results gives the MOS.

    MOS = Mean Opinion Score

    Grading scale: 1 = Bad, 2 = Mediocre, 3 = Average, 4 = Good, 5 = Excellent

    Speech Production

    [Figure: speech production apparatus. Labels: diaphragm, lungs, wind pipe, larynx, vocal cords, vocal tract, nasal cavity, mouth, palate, tongue, lips, jaws. Functional groups: wind tunnel, oscillator, articulators.]

    Speech Signal

    Speech Spectrum for a Voiced Sound

    Speech Spectrogram

    Non stationary

    Voiced / unvoiced

    Calculation of Spectrograms

    Processing chain: speech signal -> Preac -> Window -> FFT -> Log(| |) -> power spectral density

    Preac = pre-emphasis, enhances high frequencies

    Window = limits the edge effects

    [Figure: example of window, plotted against time]
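The chain above can be sketched in a few lines. This is a minimal illustration, not the slide author's code: the pre-emphasis coefficient 0.95, the Hamming window, the frame length and the hop size are all assumed values, and a direct DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math

def preemphasis(x, mu=0.95):
    """First-order high-pass y(n) = x(n) - mu * x(n-1); mu is an assumed value."""
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]

def hamming(n_points):
    """Hamming window, one common choice for limiting edge effects."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def log_spectrum(frame):
    """Log magnitude of the DFT of one frame (bins 0 .. N/2)."""
    n_points = len(frame)
    out = []
    for k in range(n_points // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n in range(n_points))
        out.append(math.log10(abs(acc) + 1e-12))
    return out

def spectrogram(x, frame_len=64, hop=32):
    """One log-spectrum column per frame: Preac -> Window -> DFT -> Log(| |)."""
    w = hamming(frame_len)
    y = preemphasis(x)
    columns = []
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = [y[start + n] * w[n] for n in range(frame_len)]
        columns.append(log_spectrum(frame))
    return columns

# Usage: a 1 kHz tone sampled at 8 kHz falls in bin 1000 / (8000 / 64) = 8.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
S = spectrogram(tone)
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
```

Each column of `S` is one time slice of the spectrogram; stacking the columns side by side gives the time/frequency picture.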

    Example: Time Signal and Spectrogram

    [Figure: two panels. TIME SIGNAL: amplitude against time. SPECTROGRAM: frequency (axis ticks from 0 to 3500) against time.]

    Equivalent Electrical Model

    [Figure: source-filter model. A voicing switch (V / NV) selects between a periodical excitation and a stochastic excitation; the excitation is scaled by a gain and passed through the spectrum shaping filter (transfer function of the vocal tract) to produce speech.]

    Simplified Speech Production Model

    e(t) -> G(z) (glottis) -> V(z) (vocal tract) -> L(z) (radiation at the lips) -> y(t)

    $$y(t) = h(t) * e(t) \;\Leftrightarrow\; Y(z) = H(z)\,E(z)$$

    $$G(z) = \frac{1}{\left(1 - e^{-cT} z^{-1}\right)^{2}}$$

    $$V(z) = \frac{1}{\prod_{k=1}^{K} \left(1 - e^{(-c_k + j b_k)T} z^{-1}\right)\left(1 - e^{(-c_k - j b_k)T} z^{-1}\right)}$$

    $$L(z) = 1 - z^{-1}$$

    All Pole Model of the Spectrum Shaping Filter

    The filter H(z) represents the spectral envelope since the excitation has a white spectrum.

    $$H(z) = G(z)\,V(z)\,L(z) = \frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}}$$

    Short Term Linear Prediction

    The coefficients of H(z) = 1/A(z) can be obtained by linear prediction.

    Short term analysis on the speech signal x(n)

    Frames of 10 to 30 ms.

    Least square error criterion:

    $$\hat{x}(n) = \sum_{i=1}^{p} a_i\, x(n-i) \qquad \text{(linear prediction)}$$

    $$e(n) = x(n) - \hat{x}(n), \qquad \text{criterion: } \min \sum_{n} e^{2}(n)$$

    Determination of the Spectral Envelope by Linear Prediction

    The prediction error e(n) (the residual) is nearly white, so the spectral envelope of x(n) can be approximated by S_x(f):

    $$S_x(f) = \frac{\sigma^{2}}{\left|A(f)\right|^{2}}$$

    where $\sigma^{2}$ is the least square prediction error.

    Calculation of the Prediction Coefficients

    The prediction coefficients a_i are the solution of the normal equations:

    $$\begin{pmatrix} r_{1,1} & \cdots & r_{1,p} \\ \vdots & \ddots & \vdots \\ r_{p,1} & \cdots & r_{p,p} \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix} = \begin{pmatrix} r_{0,1} \\ \vdots \\ r_{0,p} \end{pmatrix}, \qquad r_{i,j} = \sum_{n} x(n-i)\, x(n-j)$$

    The Levinson-Durbin algorithm is often used to solve these equations.
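A minimal sketch of the Levinson-Durbin recursion follows. It assumes the autocorrelation method, where r(i,j) depends only on |i - j| so that the matrix is Toeplitz, and the sign convention x_hat(n) = sum of a_i x(n-i). The function names are illustrative, not taken from the slides.

```python
def autocorrelation(x, p):
    """r[k] = sum_n x(n) x(n+k) for k = 0..p (autocorrelation method)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for a_1..a_p.

    Convention: x_hat(n) = sum_i a[i] * x(n - i), i = 1..p.
    Returns (a, err): a[0] is unused (0.0), err is the final prediction
    error energy.
    """
    a = [0.0] * (p + 1)
    err = r[0]
    for m in range(1, p + 1):
        # Reflection coefficient of order m.
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Usage: the autocorrelation of a first-order process with coefficient 0.5
# yields a_1 = 0.5 and a_2 = 0.
coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The recursion costs on the order of p^2 operations instead of the p^3 of a general linear solver, and the intermediate values k are the reflection coefficients.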

    Example of Linear Prediction

    [Figure: amplitude of the speech signal and amplitude of the residual signal]

    Example of Linear Prediction: Spectral Envelope Estimation

    [Figure: spectral envelope estimate, with the formants marked]

    Estimation of the Pitch Period

    Pitch period T0 estimated by correlation of the speech signal or of the residual.

    Other methods exist (e.g. cepstrum).

    F0 = fundamental frequency = 1/T0

    Fractional pitch estimation if the precision is better than the sampling period.

    $$50\ \text{Hz} \le F_0 \le 400\ \text{Hz}$$

    $$T_0 = k\,T_S, \qquad 20 \le k \le 160 \quad \text{if } F_S = \frac{1}{T_S} = 8000\ \text{Hz}$$
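As an illustration of pitch estimation by correlation, here is a minimal sketch that scans the lags k = 20..160 (50 Hz to 400 Hz at 8 kHz) and keeps the lag with the largest normalized correlation. The normalization by the square root of the delayed energy is an assumption of this sketch; practical coders add voicing decisions and checks against pitch doubling.

```python
import math

def estimate_pitch(x, fs=8000, f0_min=50.0, f0_max=400.0):
    """Return (lag k, F0 in Hz) maximizing the normalized correlation."""
    k_min = int(fs / f0_max)          # 20 samples at 8 kHz
    k_max = int(fs / f0_min)          # 160 samples at 8 kHz
    best_k, best_score = k_min, float("-inf")
    for k in range(k_min, k_max + 1):
        num = sum(x[n] * x[n - k] for n in range(k, len(x)))
        den = sum(x[n - k] ** 2 for n in range(k, len(x)))
        score = num / math.sqrt(den) if den > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k, fs / best_k

# Usage: a synthetic 100 Hz pulse train sampled at 8 kHz has a period of 80 samples.
signal = [1.0 if n % 80 == 0 else 0.0 for n in range(480)]
lag, f0 = estimate_pitch(signal)
```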

    Long Term Prediction (LTP)

    The idea is to predict one period of the signal from the preceding one:

    $$\hat{x}(n) = b\, x(n-M)$$

    2 unknowns: b and M.

    M is the pitch period (when voiced).

    Least square error criterion is used.

    Long Term Prediction (LTP)

    For a given value of M, the optimal b is:

    $$b = \frac{\sum_{n} x_n\, x_{n-M}}{\sum_{n} x_{n-M}^{2}}$$

    The best M value maximizes:

    $$\frac{\left(\sum_{n} x_n\, x_{n-M}\right)^{2}}{\sum_{n} x_{n-M}^{2}}$$

    All possible values of M must be tested.
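The exhaustive search over M can be sketched as follows. The function name, the block boundaries and the lag range are assumptions for illustration; the two formulas are the ones given for b and for the criterion on M.

```python
import math

def ltp_search(x, start, length, m_min=20, m_max=160):
    """Exhaustive LTP search over the block x[start : start + length].

    For each lag M: b = sum(x_n x_{n-M}) / sum(x_{n-M}^2); the best M
    maximizes (sum(x_n x_{n-M}))^2 / sum(x_{n-M}^2).
    Requires start >= m_max so that x[n - M] always exists.
    """
    best = (m_min, 0.0, float("-inf"))   # (M, b, score)
    for m in range(m_min, m_max + 1):
        num = sum(x[n] * x[n - m] for n in range(start, start + length))
        den = sum(x[n - m] ** 2 for n in range(start, start + length))
        if den == 0.0:
            continue
        score = num * num / den
        if score > best[2]:
            best = (m, num / den, score)
    return best[0], best[1]

# Usage: a made-up signal whose 50-sample period decays by 0.9 per period.
pattern = [math.sin(2 * math.pi * i / 50) + 0.5 * math.cos(4 * math.pi * i / 50)
           for i in range(50)]
x = [pattern[n % 50] * 0.9 ** (n // 50) for n in range(200)]
M, b = ltp_search(x, start=100, length=40, m_min=20, m_max=90)
```

In this example the lag range is capped at 90 so that the multiples of the true period (100, 150), which predict the block equally well, are excluded.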

    Example of Long Term Prediction

    LPC 10 Vocoder

    One of the oldest speech coders is the LPC10 vocoder:

    The analysis (coder) calculates for each frame:

    Pitch period, prediction coefficients, energy, voicing.

    The synthesis (decoder) uses these parameters to synthesize speech from the equivalent electrical model.

    LPC 10 Vocoder (Order 10)

    [Figure: LPC10 coder and decoder. Coder: each speech frame goes through linear prediction (coefficients a_i), energy computation (E) and pitch/voicing analysis (V/UV, F0); the parameters are multiplexed, with 600 bps for the excitation and 1800 bps for the spectrum shaping filter. Decoder: V/UV and F0 drive the excitation, which is scaled by the gain (E) and filtered by 1/A(z), built from the a_i, to give the synthetic speech.]

    Frame = 22.5 ms

    Prediction Spectral Parameters

    The a_i coefficients are sensitive to coding and interpolation.

    They are replaced by other coefficients:

    Reflection coefficients k_i, log area ratios LAR_i.

    Line spectrum frequencies LSF_i.

    In the LPC10 vocoder:

    The pitch and voicing are coded on 7 bits

    The log of energy on 5 bits

    The 10 prediction coefficients a_i (transformed into k_i and LAR_i) are coded on 41 bits.

    This gives 53 bits, plus 1 synchronization bit: 54 bits per frame of 22.5 ms = 2400 bps

    Vector Quantization (2-dimensional example)

    [Figure: three panels in the (k1, k2) plane, each axis running from -1 to 1: the input vectors (training base), scalar quantization, and vector quantization with its code vectors.]

    The index of the code vector is transmitted.

    Bit rate can be decreased by applying VQ to the coefficients.
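A minimal sketch of the encoding side of vector quantization: the coder searches the codebook for the nearest code vector and sends only its index. The 4-entry codebook below is made up for illustration; real codebooks are trained on a large base of input vectors.

```python
def nearest_code_vector(codebook, v):
    """Return the index of the code vector closest to v (squared Euclidean distance)."""
    def dist2(c):
        return sum((ci - vi) ** 2 for ci, vi in zip(c, v))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

# Hypothetical 2-bit codebook for pairs (k1, k2) in [-1, 1] x [-1, 1].
codebook = [(-0.8, 0.6), (-0.2, 0.3), (0.4, -0.1), (0.9, -0.7)]
index = nearest_code_vector(codebook, (0.5, -0.2))   # only this index is transmitted
decoded = codebook[index]                            # the decoder looks the vector up
```

With 4 code vectors the pair (k1, k2) costs 2 bits in total, whereas scalar quantization would spend bits on each coefficient separately.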

    Line Spectrum Frequencies LSF, LSP

    The Line Spectrum Frequencies $f_i$ and the Line Spectrum Pairs $\cos(2\pi f_i)$ have good properties for quantization and interpolation.

    The LSF and LSP are derived from the inverse filter A(z).

    Build the symmetrical and antisymmetrical polynomials F1(z) and F2(z) by (for order 10):

    $$F_1(z) = \frac{A(z) + z^{-11} A(z^{-1})}{1 + z^{-1}}$$

    $$F_2(z) = \frac{A(z) - z^{-11} A(z^{-1})}{1 - z^{-1}}$$

    LSF and LSP

    The roots of F1 and F2 lie on the unit circle and are interleaved.

    5 conjugate root pairs $\exp(\pm j\omega_i)$ for each polynomial, with $f_i = \omega_i / (2\pi)$.

    $$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2 q_i z^{-1} + z^{-2}\right)$$

    $$F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2 q_i z^{-1} + z^{-2}\right)$$

    $$q_i = \cos\omega_i = \cos 2\pi f_i, \qquad q_i = \text{LSP}, \quad f_i = \text{LSF}$$
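The construction of F1 and F2 and the search for their roots can be sketched as below. This is an illustrative implementation under stated assumptions: `A` holds the coefficients of z^-k in A(z) (so A[k] = -a_k with the prediction convention used earlier), the order is even, and the roots are located by a grid scan followed by bisection rather than by the Chebyshev-polynomial method used in standard coders. The test filter is made up.

```python
import math

def lsf_from_lpc(A, grid=1024):
    """Line spectrum frequencies (radians in 0..pi) of A(z) = sum_k A[k] z^-k.

    A[0] must be 1 and the order p = len(A) - 1 must be even.
    Returns (w1, w2): the root angles of F1 and of F2.
    """
    p = len(A) - 1
    ext = list(A) + [0.0]
    f1 = [ext[k] + ext[p + 1 - k] for k in range(p + 1)]   # A(z) + z^-(p+1) A(1/z)
    f2 = [ext[k] - ext[p + 1 - k] for k in range(p + 1)]   # A(z) - z^-(p+1) A(1/z)
    for k in range(1, p + 1):
        f1[k] -= f1[k - 1]      # synthetic division by (1 + z^-1)
        f2[k] += f2[k - 1]      # synthetic division by (1 - z^-1)

    def on_circle(q, w):
        # A symmetric q of even degree p equals exp(-j w p/2) times this real value.
        h = p // 2
        return q[h] + 2.0 * sum(q[k] * math.cos((h - k) * w) for k in range(h))

    def roots(q):
        found = []
        ws = [math.pi * i / grid for i in range(grid + 1)]
        for lo, hi in zip(ws, ws[1:]):
            flo = on_circle(q, lo)
            if flo * on_circle(q, hi) < 0.0:
                for _ in range(50):              # bisection on the sign change
                    mid = 0.5 * (lo + hi)
                    fmid = on_circle(q, mid)
                    if flo * fmid <= 0.0:
                        hi = mid
                    else:
                        lo, flo = mid, fmid
                found.append(0.5 * (lo + hi))
        return found

    return roots(f1), roots(f2)

def poly_mul(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# Hypothetical stable 10th-order A(z): five resonances with pole radius 0.9.
A = [1.0]
for theta in (0.3, 0.9, 1.5, 2.1, 2.7):
    A = poly_mul(A, [1.0, -2.0 * 0.9 * math.cos(theta), 0.81])
w1, w2 = lsf_from_lpc(A)
lsf = sorted(w1 + w2)                  # omega_i; f_i = omega_i / (2 pi)
lsp = [math.cos(w) for w in lsf]       # q_i = cos(omega_i)
```

The interleaving of `w1` and `w2` is what makes the representation convenient: a quantized set that stays sorted corresponds to a stable synthesis filter.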

    Coders using Short Term and Long Term Prediction: RELP, MPE-LP, CELP

    [Figure: analysis and synthesis block diagrams.

    Analysis: the speech signal x(n) is filtered by A(z) to give e(n), then by B(z) to give the residual signal r(n). The r_{i,j} and a_i are calculated for A(z), b and M are calculated for B(z), and the coefficients b, M, a_i are quantized. Coding of r(n) gives the quantized residual u(n).

    Synthesis: the quantized residual u(n) is filtered by 1/B(z) and then by 1/A(z), using the quantized coefficients, to give the synthetic speech signal.]

    RPE-LTP GSM Full Rate Coders

    The GSM Full Rate Coder is called:

    RPE-LTP = Regular Pulse Excited, Long Term Prediction coder

    The signal u = the best down-sampled version ( 4) of the residual signal r.

    In CELP coders, vector quantization is applied to the signal.

    CELP = Code Excited Linear Prediction coder

    Each frame of the residual signal is compared to sequences of signal stored in a codebook.

    The codebook sequences are white and the codebook is called the stochastic codebook.

    CELP Coder Basic Scheme

    Analysis by synthesis (closed loop) to find the best excitation sequence.

    [Figure: a sequence from the normalized stochastic codebook is scaled by the gain g and filtered by 1/B(z) and 1/A(z) to give the synthetic speech signal; it is subtracted from the original speech signal x, and a least square (LS) criterion on the error q selects the best sequence.]

    Structure of CELP Coder: Perceptual Filter

    Perceptual filter: the reconstruction error is spectrally weighted, exploiting the noise masking properties of formants.

    $$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad 0 \le \gamma_1, \gamma_2 \le 1$$

    $A^{*}(z) = A(z/\gamma)$ (poles towards zero)
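Replacing z by z/gamma simply scales the k-th coefficient of A(z) by gamma^k, which is what makes W(z) cheap to build. A minimal sketch, assuming `A` holds the coefficients of z^-k in A(z) and using illustrative values for gamma1 and gamma2 that are not taken from the slides:

```python
def bandwidth_expand(A, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [a * gamma ** k for k, a in enumerate(A)]

def weighting_filter(x, A, gamma1=0.9, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2), zero initial state.

    A = [1, alpha_1, ..., alpha_p] are the coefficients of z^-k in A(z).
    gamma1 and gamma2 are assumed example values.
    """
    num = bandwidth_expand(A, gamma1)
    den = bandwidth_expand(A, gamma2)
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y

# Usage: impulse response of W(z) for the first-order example A(z) = 1 - 0.9 z^-1.
w_impulse = weighting_filter([1.0, 0.0, 0.0, 0.0], [1.0, -0.9])
```

When gamma1 equals gamma2 the numerator and denominator cancel and the filter reduces to the identity, which is a quick sanity check on an implementation.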

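    [Figure: closed loop with perceptual weighting. The normalized stochastic codebook output is scaled by the gain and filtered by 1/B(z) and 1/A(z) to give the synthetic speech signal; the error q between the original speech signal x and the synthetic speech signal passes through W(z) before the LS criterion.]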

    CELP Coder with Perceptual Filter

    [Figure: CELP coder. LPC analysis of the original signal s gives 1/A(z); pitch estimation (analysis by synthesis) gives 1/B(z). Each waveform 0, 1, 2, ..., M-1 of the codebook is scaled by the gain g and filtered by 1/B(z) and 1/A(z) to give the synthetic speech; the difference with the original signal is weighted by W(z) and fed to the perceptual criterion. The search for the best code and gain iterates on the whole codebook.]

    [Figure: frequency responses W(f) and 1/A(f), frequency axis from 0 to 3500.]

    Basic CELP Structure: Perceptual Filter Inserted in the 2 Branches

    [Figure: the speech frame is filtered by W(z) to give the weighted speech frame s. The codebook vector c_j, scaled by the gain, passes through 1/B(z) to give the excitation e, which is filtered by H(z) (including W(z)) to give p. The least square criterion is applied to the difference between s and p.]

    H(z) = W(z)/A(z)

    CELP Structure: Memory of H(z)

    Memory of H(z) = output for a zero input

    h_i = impulse response of H(z)

    $$\hat{p}(n) = \sum_{i=0}^{\infty} h_i\, e_{n-i}$$

    $$\hat{p}(n) = \sum_{i=0}^{n} h_i\, e_{n-i} + \sum_{i=n+1}^{\infty} h_i\, e_{n-i} = \sum_{i=0}^{n} h_i\, e_{n-i} + \hat{p}^{0}(n), \qquad \hat{p}^{0}(n) \text{ known}$$

    $$p(n) = s(n) - \hat{p}^{0}(n)$$

    CELP Coder: Memory of H(z)

    [Figure: the speech frame is filtered by W(z) to give s; the memory of H(z), p0, is subtracted from s to give the target p. The codebook vector c_j, scaled by the gain, passes through 1/B(z) to give the excitation e, which is filtered by H(z) (including W(z)) with initial conditions equal to zero; the least square criterion is applied to the difference with p.]

    CELP: Adaptive Codebook

    LTP can be realized by an adaptive codebook.

    $$\hat{p}_n - \hat{p}^{0}_n = \sum_{i=0}^{n} h_i\, e_{n-i} = \sum_{i=0}^{n} h_i \left( g\, c^{(j)}_{n-i} + b\, e_{n-i-M} \right) = \hat{p}^{2}_n + \hat{p}^{1}_n$$

    $$\hat{p}^{1}_n = b \sum_{i=0}^{n} h_i\, e_{n-i-M}$$

    $$\hat{p}^{2}_n = g \sum_{i=0}^{n} h_i\, c^{(j)}_{n-i}$$

    CELP with Stochastic Codebook

    [Figure: the adaptive codebook vector, scaled by g1, and the stochastic codebook vector, scaled by g2, are each filtered by H(z) and summed to give the synthesized p. The speech frame is filtered by W(z) to give s, the H filter memory p0 is subtracted to give the target p, and the least square criterion is applied to the difference e.]

    The adaptive codebook stores the past residual frames. It is called adaptive because its content changes with time.

    CELP Decoder

    [Figure: the adaptive codebook vector, scaled by g1, and the stochastic codebook vector, scaled by g2, are summed and filtered by 1/A(z) to give the synthetic speech.]

    Decode received parameters:

    Index of stochastic codebook

    Gain of stochastic codebook

    Index of adaptive codebook

    Gain of adaptive codebook

    Linear prediction filter coefficients

    CELP Equations. Example: Searching through Codebooks

    The main load is the filtering of all the codebook vectors.

    [Figure: the j-th original codebook, with vectors c_{j,i(j)}, is filtered by H(z) with null initial conditions to give the j-th filtered codebook, with vectors f_{j,i(j)}.]

    $$\hat{p}_j = g_j\, f_{j,i(j)}, \qquad f_{j,i(j)}(n) = h(n) * c_{j,i(j)}(n)$$

    $$f_{j,i(j)}(n) = \sum_{k=0}^{n} h(k)\, c_{j,i(j)}(n-k), \qquad f_{j,i(j)} = H\, c_{j,i(j)}$$

    Filtering Matrix H

    $$H = \begin{pmatrix} h_0 & 0 & \cdots & 0 & 0 \\ h_1 & h_0 & 0 & \cdots & 0 \\ h_2 & h_1 & h_0 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ h_{N-1} & \cdots & h_2 & h_1 & h_0 \end{pmatrix}$$

    h(n) is the impulse response corresponding to H(z).

    N = length of the codebook vectors.
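A small sketch of the filtering matrix: multiplying a code vector by the lower-triangular Toeplitz matrix H gives the first N samples of the convolution h * c with zero initial conditions. The impulse response and code vector below are made-up values with N = 4.

```python
def filtering_matrix(h, n):
    """N x N lower-triangular Toeplitz matrix with H[r][c] = h[r - c] for r >= c."""
    return [[h[r - c] if r >= c else 0.0 for c in range(n)] for r in range(n)]

def mat_vec(H, c):
    """Matrix-vector product H c."""
    return [sum(hij * cj for hij, cj in zip(row, c)) for row in H]

# Usage with a short, made-up impulse response and code vector (N = 4).
h = [1.0, 0.5, 0.25, 0.125]
c = [1.0, 0.0, -1.0, 0.0]
f = mat_vec(filtering_matrix(h, 4), c)   # truncated convolution h * c
```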

    Finding the Best Excitation in the Coder: Equation of the Solution

    J: least square criterion

    For a set of 2 vectors c_{j,i(j)}, F is the 2-column matrix of filtered vectors f_{j,i(j)}.

    $$J = \left\| p - \hat{p} \right\|^{2} = \left\| p - F g \right\|^{2}$$

    Optimal gain:

    $$F^{T} F\, g = F^{T} p$$

    $$\min J \;\Leftrightarrow\; \max \left\| \hat{p} \right\|^{2} = p^{T} F \left(F^{T} F\right)^{-1} F^{T} p$$
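For two filtered vectors the normal equations form a 2 x 2 system that can be solved in closed form. The sketch below is illustrative; the function name and the test vectors are assumptions, and the two filtered vectors are assumed linearly independent so that the determinant is non-zero.

```python
def joint_gains(p, f1, f2):
    """Solve F^T F g = F^T p for F = [f1 f2]; return (g1, g2, J)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    a11, a12, a22 = dot(f1, f1), dot(f1, f2), dot(f2, f2)
    b1, b2 = dot(f1, p), dot(f2, p)
    det = a11 * a22 - a12 * a12          # non-zero if f1, f2 are independent
    g1 = (b1 * a22 - b2 * a12) / det
    g2 = (b2 * a11 - b1 * a12) / det
    err = [pn - g1 * x - g2 * y for pn, x, y in zip(p, f1, f2)]
    return g1, g2, dot(err, err)

# Usage: p = 1*f1 + 2*f2 + a component orthogonal to both vectors.
g1, g2, J = joint_gains([3.0, 2.0, 1.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
```

Here p^T p = 14 and the maximized term p^T F (F^T F)^-1 F^T p equals 13, so J = 14 - 13 = 1, which is the energy of the orthogonal component.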

    CELP Optimal Solution

    The optimal algorithm finds the combination of code vectors that maximizes the norm, together with the optimal gains g_j.

    But the number of combinations of codebook vectors is very high, and so is the complexity. Example:

    M = 1024 for the stochastic codebook and M = 256 for the adaptive codebook

    lead to 1024 x 256 = 262 144 solutions to test and 1024 + 256 = 1280 vectors to filter.

    Iterative Suboptimal Algorithm for 2 Codebooks

    First step:

    Target vector = p

    Find the best vector in the adaptive codebook and its gain.

    Calculate the new target vector p1:

    $$p^{1}_n = p_n - \hat{p}^{1}_n$$

    Second step:

    Target vector = p1

    Find the best vector in the stochastic codebook and its gain.

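    Iterative Algorithm

    [Figure: two-stage search. The speech is filtered by W(z) and the memory contribution p0 is subtracted to give the target p. First stage: the past excitation e(n-M), scaled by its gain, is filtered by H(z); a least square (LS) search gives M and the gain, and subtracting the result from p gives p1. Second stage: the codebook vector c(n,j), scaled by its gain, is filtered by H(z); a least square (LS) search against p1 gives j and the gain.]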

    Operations of the Iterative Algorithm

    At step j, the optimal codebook vector has index i:

    $$i = \arg\min_{k} \left\| p_j - \hat{p}_j \right\|^{2} = \arg\min_{k} \left\| p_j - g_j\, f_{j,k} \right\|^{2}$$

    For each k, the optimal gain is:

    $$g_{\text{opt},k} = \frac{p_j^{T} f_{j,k}}{f_{j,k}^{T} f_{j,k}} = \frac{p_j^{T} H c_{j,k}}{c_{j,k}^{T} H^{T} H c_{j,k}}$$

    so that:

    $$i = \arg\max_{k} \frac{\left(p_j^{T} H c_{j,k}\right)^{2}}{c_{j,k}^{T} H^{T} H c_{j,k}} \qquad \text{with } g_j = g_{\text{opt},i}$$
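The two-step search can be sketched as follows, assuming the codebooks have already been filtered (each entry is f = H c) and that every codebook contains at least one non-zero vector. Function names and the toy vectors are illustrative only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_codebook(target, filtered):
    """Return (index, gain) maximizing (p^T f_k)^2 / (f_k^T f_k)."""
    best_k, best_score = 0, float("-inf")
    for k, f in enumerate(filtered):
        energy = dot(f, f)
        if energy == 0.0:
            continue                      # a null vector cannot be selected
        score = dot(target, f) ** 2 / energy
        if score > best_score:
            best_k, best_score = k, score
    f = filtered[best_k]
    return best_k, dot(target, f) / dot(f, f)

def two_stage_search(p, adaptive_filtered, stochastic_filtered):
    """Adaptive codebook first, then stochastic codebook on the updated target."""
    i1, g1 = search_codebook(p, adaptive_filtered)
    p1 = [pn - g1 * fn for pn, fn in zip(p, adaptive_filtered[i1])]
    i2, g2 = search_codebook(p1, stochastic_filtered)
    return (i1, g1), (i2, g2)

# Usage with toy 3-sample vectors.
result = two_stage_search([2.0, 1.0, 0.0],
                          [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                          [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
```

The cost is one pass per codebook instead of one evaluation per pair of entries, at the price of a solution that is no longer guaranteed to be the joint optimum.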

    Iterative Algorithm

    Numerical Example :

    FS=8000Hz,

    M=256 size of the stochastic codebook

    Ma=128 size of the adaptive codebook

    Frame size NT=160, 20ms

    Frames split in 4 subframes of N=40 samples

    p=10 linear prediction order

    10 Mips to filter the stochastic codebook.

    Iterative Algorithm

    The main processing load is the filtering of the codebook vectors.

    Many algorithms have been proposed to decrease the computation load:

    Special structures of the codebook:

    VSELP: Vector Sum

    Algebraic codebook: ACELP

    Linear codebook (the adaptive codebook is linear).

    Structure of H avoiding the filtering:

    Diagonalization of H^T H

    CELP Coding

    Standards from 4.8 kbps to 16 kbps

    Federal standard (DOD) (4.8 kbps)

    frame = 240 samples (30 ms)

    LPC 10 --> (LSP coding 34 bits)

    adaptive codebook (256 vectors (fractional pitch))

    stochastic codebook (512 vectors (-1,0,1))

    VSELP (Vector Sum Excited Linear Prediction)

    Codebook vectors v are combinations of basis vectors (b1, b2, ..., bk):

    v = +/- b1 +/- b2 +/- ... +/- bk

    Only the basis vectors are filtered.

    Motorola (8 kbps)

    GSM (half rate) (5.6 kbps)
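Because filtering is linear, the filtered version of any sign combination is the same combination of the filtered basis vectors. The sketch below illustrates this with made-up data; the function names are assumptions, and a real VSELP search does not enumerate the combinations explicitly.

```python
from itertools import product

def convolve_truncated(h, c):
    """First len(c) samples of h * c (zero initial conditions)."""
    return [sum(h[k] * c[n - k] for k in range(min(n, len(h) - 1) + 1))
            for n in range(len(c))]

def vselp_filtered_codebook(h, basis):
    """Filter only the K basis vectors, then form all 2^K sign combinations."""
    filtered_basis = [convolve_truncated(h, b) for b in basis]
    n = len(basis[0])
    book = {}
    for signs in product((+1, -1), repeat=len(basis)):
        book[signs] = [sum(s * fb[i] for s, fb in zip(signs, filtered_basis))
                       for i in range(n)]
    return book

# Usage: 2 basis vectors give 4 code vectors, with only 2 filtering operations.
h = [1.0, 0.5]
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
book = vselp_filtered_codebook(h, basis)
direct = convolve_truncated(h, [1.0, -1.0, 0.0, -1.0])   # filtering b1 - b2 directly
```

With K basis vectors the codebook holds 2^K entries, but only K vectors go through H(z).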

    Fractional Pitch

    The precision of the pitch period is a fraction of the sampling period T_S. An interpolation filter is used.

    $$B(z) = 1 - b\, z^{-M_f} \qquad \text{with } M_f = M + \tau$$

    $x(n - M - \tau)$ can be written as:

    $$FT^{-1}\!\left( X(f)\, e^{-j 2\pi f (M+\tau) T_S} \right) = x(n-M) * FT^{-1}\!\left( e^{-j 2\pi f \tau T_S} \right) = x(n-M) * h(n)$$

    $$h(n) = \mathrm{sinc}(n - \tau)$$

    [Figure: interpolation filter H. |H(f)| is shown between -F_S/2 and +F_S/2; the impulse response is plotted for tau = 1/4 T_S, against time in units of T_S.]
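A minimal sketch of fractional delay by sinc interpolation. The truncation of the ideal filter to a finite number of taps is an assumption of this sketch (practical coders use short windowed interpolation filters), and the test tone is made up.

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def fractional_delay(x, n, m, tau, half_width=8):
    """Approximate x(n - M - tau) with a truncated sinc interpolation filter.

    Implements sum_k h(k) x(n - M - k) with h(k) = sinc(k - tau), keeping
    2 * half_width + 1 taps; tau is expressed in sampling periods.
    """
    acc = 0.0
    for k in range(-half_width, half_width + 1):
        idx = n - m - k
        if 0 <= idx < len(x):
            acc += x[idx] * sinc(k - tau)
    return acc

# Usage: delay a tone at 0.2 cycles per sample by 10.25 samples.
x = [math.sin(2 * math.pi * 0.2 * n) for n in range(200)]
value = fractional_delay(x, 110, 10, 0.25)
```

With tau = 0 the filter reduces to a unit impulse and the function returns x[n - M]; more taps bring the result closer to the ideal band-limited interpolation.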

    Standards

    Wired Telephony (ITU-T):

    G 711 (1972): PCM 64 kbps

    G 721 (1984): ADPCM 32 kbps

    G 728 (1991): LD-CELP 16 kbps

    G 729: CS-ACELP 8 kbps

    Mobile Communications (ETSI - CTIA):

    GSM (FR): RPE-LTP 13 kbps

    GSM (HR): VSELP 5.6 kbps

    GSM (EFR): ACELP 12.2 kbps

    UMTS (AMR): ACELP 12.2 to 4.75 kbps

    Military applications (NATO):

    FS 1015 (1976): LPC10 2.4 kbps

    FS 1016 (1991): CELP 4.8 kbps

    AMR Adaptive MultiRate Coder for 3G Application

    8 narrow band (NB-AMR) source coders: 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbps

    9 wide band (WB-AMR) coders

    Based on ACELP

    Frame of 20 ms, fs = 8000 Hz

    ITU Civil Standards

    ITU standard   Method   Year   Bit rate (kbps)   Delay (ms)   Quality (MOS)   Complexity (Mips)
    G711           PCM      1972   64                0.125        4.3

    ETSI and Inmarsat Standards

    Inmarsat standard         Method    Year   Bit rate (kbps)   Delay (ms)   Quality (MOS)   Complexity (Mips)
    IMBE                      IMBE      1990   4.15              78.75        3.4             7
    Mini-M                    AMBE      1995   3.6               -            3.6             25

    ETSI standard (Europe)    Method    Year   Bit rate (kbps)   Delay (ms)   Quality (MOS)   Complexity (Mips)
    GSM                       RPE-LTP   1987   13                40           3.47            6
    TETRA                     ACELP     1994   4.567             67.5         -               12
    GSM HR                    VSELP     1994   5.6               45           -               30
    EFR-GSM (identical DCS)   ACELP     1995   12.2              40           -               15

    TIA and RCR Standards

    TIA standard (USA)     Method     Year   Bit rate (kbps)   Delay (ms)   Quality (MOS)   Complexity (Mips)
    IS54                   VSELP      1989   7.95              45           3.5             20
    IS96                   QCELP      1993   8.5-4.2-0.8       45           4               20
    IS136                  ACELP      1995   7.4               45           -               -
    IS127 EVRC             RCELP      1995   8.5-4-0.8         >45          -               20
    PCS 1900               ACELP      1995   -                 40           -               15

    RCR standard (Japan)   Method     Year   Bit rate (kbps)   Delay (ms)   Quality (MOS)   Complexity (Mips)
    RCR STD-27B            VSELP      1990   6.7               45           -               20
    JDC HR                 PSI-CELP   1993   3.45              90           -               50

    Implementation of CELP Coders on C54x

    Example of the G729 Annex A.

    Specific instruction for codebook search

    Some functions of DSPLIB

    Profiling Example for G729 Annex A using C Compiler

    G729 is a CS-ACELP coder (ITU 1995): 8 kbps with the quality of ADPCM at 32 kbps (G726).

    DSVD (Digital Simultaneous Voice & Data): G729 Annex A

    Voice over internet, voice e-mail

    G729 Annex A: Main Blocks of the Coder Algorithm

    Frame = 10 ms = 80 samples.

    Short term LPC analysis on a 30 ms window (240 samples).

    LSP derived from the a_i coefficients and quantized using split VQ.

    Long term (LTP) analysis, 2 subframes of 40 samples.

    LTP lag and gain. LTP fractional lag (1/3): 8 bits for the 1st subframe and 5 bits for the 2nd.

    Search of the fixed codebook: 2 subframes of 40 samples. Index and gains.

    Code length = 40 with 4 non-zero pulses (+/- 1).

    Structures of Frames

    [Figure: layout of the speech buffer, sizes in samples]

    Previous speech: 120

    Subframe 1: 40

    Subframe 2: 40

    Next frame: 40

    Present frame, 80 samples = L_Frame

    New speech, 80 samples = L_Frame

    Total speech vector, 240 samples = L_Total

    LPC analysis window, 240 samples = L_Window

    G729 Annex A, Bit Allocation

    Parameter             1st subframe (5 ms)   2nd subframe (5 ms)   Total (bits)
    LPC filter                                                        18
    Adaptive codebook:
      pitch period        8                     5                     13
      parity check        1                                           1
    Fixed codebook:
      pulse positions     13 = (3x3+4)          13 = (3x3+4)          26
      pulse signs         4                     4                     8
    Codebook gains        7 = 3+4               7 = 3+4               14
    Total                                                             80
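The totals can be checked with a few lines of arithmetic: the per-frame fields sum to 80 bits, and at one frame every 10 ms this gives the 8 kbps rate. The dictionary keys are descriptive labels chosen for this sketch.

```python
# Bits per 10 ms frame (1st subframe + 2nd subframe where applicable).
bits = {
    "LPC filter (LSP)": 18,
    "pitch period": 8 + 5,
    "parity check": 1,
    "pulse positions": 13 + 13,
    "pulse signs": 4 + 4,
    "codebook gains": 7 + 7,
}
total_bits = sum(bits.values())
frames_per_second = 1000 // 10          # one frame every 10 ms
bit_rate = total_bits * frames_per_second
```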

    G729 Annex A: Main Blocks of the Decoder Algorithm

    The serial received bits are converted into parameters:

    LSP vector, 2 fractional pitch lags and gains, 2 fixed codebook indices and gains.

    The LSP are converted to LP filter coefficients a_i and interpolated at each subframe.

    At each subframe:

    The excitation is constructed and scaled.

    The speech is synthesized by filtering the excitation through the LP synthesis filter.

    Postprocessing by an adaptive postfilter.

    Using the C Compiler

    Use the C program of the standard and the C compiler with maximum optimization.

    Autocorrelation = 488 445 cycles

    Levinson = 164 843 cycles

    Conversion a_i to LSF = 410 404 cycles

    LSF Quantization = 883 853 cycles

    Synthesis filtering = 501 472 cycles

    Pitch open loop = 793 533 cycles

    Fractional Pitch = 2 x 618 354 cycles

    Search Algebraic code = 2 x 617 582 cycles

    Gains quantization = 2 x 108 480 cycles

    Assembly Language Instructions for Codebook Search

    Better results can be obtained with assembly language than with C.

    Specific instructions for codebook search: conditional stores.

    CodeBook Search (Conditional Stores)

    STRCD Xmem, cond        Xmem = T if condition is true      (STRCD = Store T Conditionally)
    SRCCD Xmem, cond        Xmem = BRC if condition is true    (SRCCD = Store Block Repeat Counter Conditionally)
    SACCD src, Xmem, cond   Xmem = src if condition is true    (SACCD = Store Accumulator Conditionally)

    Assembly Language Codebook Search

    Best codebook index:

    $$i_{\text{opt}} = \arg\max_{i} \frac{c_i^{2}}{G_i}$$

    To avoid division:

    $$\frac{c_i^{2}}{G_i} > \frac{c_{\text{opt}}^{2}}{G_{\text{opt}}} \;\Leftrightarrow\; c_i^{2}\, G_{\text{opt}} > c_{\text{opt}}^{2}\, G_i$$
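The division-free comparison can be mirrored in a few lines of Python. This is a sketch of the idea only, not a transcription of the C54x routine: it assumes all G values are positive, starts from Gopt = 1 and Copt2 = 0 as in the assembly initialization, and keeps the latest index in case of a tie.

```python
def best_index_no_division(c, g):
    """argmax_i c[i]^2 / g[i] by cross-multiplication (all g[i] > 0 assumed)."""
    c2_opt, g_opt, i_opt = 0, 1, 0
    for i, (ci, gi) in enumerate(zip(c, g)):
        c2 = ci * ci
        # c2 / gi >= c2_opt / g_opt  <=>  c2 * g_opt - gi * c2_opt >= 0
        if c2 * g_opt - gi * c2_opt >= 0:
            c2_opt, g_opt, i_opt = c2, gi, i
    return i_opt

# Usage: the ratios are 1, 4 and 2, so index 1 wins.
winner = best_index_no_division([3, -4, 2], [9, 4, 2])
```

Each candidate costs two multiplications and a subtraction, which maps directly onto multiply-accumulate instructions and conditional stores.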

    Assembly Language Codebook Search

            .mmregs
            .text
    CBS:    STM   #C, AR5            ; AR5 -> C(0), ...
            STM   #G, AR2            ; AR2 -> G(0), ...
            STM   #G-opt, AR3        ; AR3 -> Gopt, followed by Copt2
            STM   #I-opt, AR4        ; AR4 -> Iopt
            ST    #0, *AR4           ; Iopt = 0
            ST    #1, *AR3+          ; Gopt = 1
            ST    #0, *AR3-          ; Copt2 = 0
            STM   #N-1, BRC
            RPTBD done
            SQUR  *AR5+, A           ; A = C(i)^2
            MPYA  *AR3+              ; B = C(i)^2 * Gopt, T = Gopt
            MAS   *AR2+, *AR3-, B    ; B = C(i)^2 * Gopt - G(i) * Copt2
            SRCCD *AR4, BGEQ         ; if (B >= 0) then BRC -> Iopt
            STRCD *AR3+, BGEQ        ; if (B >= 0) then T -> Gopt
            SACCD A, *AR3-, BGEQ     ; if (B >= 0) then A -> Copt2
            SQUR  *AR5+, A           ; A = C(i)^2 for the next index
    done:   MPYA  *AR3+              ; B = C(i)^2 * Gopt, T = Gopt