
Speech Coders for Wireless Communication

Date post: 04-Jun-2018
Upload: vikas-ps

Transcript
  • 8/13/2019 Speech Coders for Wireless Communication

    1/53

    Speech Coders for Wireless Communication

  • 8/13/2019 Speech Coders for Wireless Communication

    2/53

    Courtesy: Communication Networks Research (CNR) Lab., EECS, KAIST

    Digital representation of the speech waveform

    x(t) -> Sampler -> x(n) = x(nT) -> Quantizer -> x̂(n)

    x(t): continuous-time, continuous-amplitude

    x(n) = x(nT): discrete-time, continuous-amplitude

    x̂(n): discrete-time, discrete-amplitude

  • 8/13/2019 Speech Coders for Wireless Communication

    3/53

    Three acoustic signals

    Signal             Frequency range    Sampling rate   PCM bits per sample   PCM bit rate
    Telephone speech   300-3,400 Hz*      8 kHz           8                     64 kb/s
    Wideband speech    50-7,000 Hz        16 kHz          14                    224 kb/s
    Wideband audio     10-20,000 Hz       48 kHz          16                    768 kb/s

    * Bandwidth in Europe; 200-3,200 Hz in the United States and Japan

    Figure: Frequency response of the telephone transmission channel
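The PCM bit rates in the table follow from multiplying the sampling rate by the number of bits per sample. A minimal Python check of that arithmetic, where the function name `pcm_bit_rate` and the dictionary layout are illustrative and not from the slides:

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample):
    """Return the uncompressed PCM bit rate in bits per second."""
    return sampling_rate_hz * bits_per_sample

# (sampling rate in Hz, bits per sample), as listed in the table
signals = {
    "telephone speech": (8_000, 8),
    "wideband speech": (16_000, 14),
    "wideband audio": (48_000, 16),
}
rates = {name: pcm_bit_rate(fs, bits) for name, (fs, bits) in signals.items()}
for name, rate in rates.items():
    print(f"{name}: {rate / 1000:.0f} kb/s")
```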

  • 8/13/2019 Speech Coders for Wireless Communication

    4/53

    Record/store: Talk -> A/D -> Encoder (compress) -> Storage

    Play: Storage -> Decoder (decompress) -> D/A -> Listen

    Bandwidth scale: 0 Hz, 4 kHz (telephone), 7 kHz (speech), 20 kHz (music, CD quality)


  • 8/13/2019 Speech Coders for Wireless Communication

    6/53

    Hybrid coders

    Multi-Pulse Excitation

    Efficient at medium bit rates.

    A sequence of nonuniformly spaced pulses as an excitation signal

    Amplitudes and positions are excitation parameters

    Regular-Pulse Excitation (RPE)

    Efficient at medium bit rates.

    A sequence of uniformly spaced pulses as an excitation signal

    The position of the first pulse within a vector and the amplitudes are excitation parameters

    Code-Excited Linear Prediction (CELP)

    Efficient at low bit rates (below 8 kbps)

    A code book of excitation sequences

    Two key issues: the design and the search of a codebook

  • 8/13/2019 Speech Coders for Wireless Communication

    7/53

    Figure: Examples of excitations

    a) Multipulse: pulses with amplitudes g1 ... gk at nonuniform positions n1 ... nk

    b) Regular-pulse: pulses with amplitudes g1 ... g6 at uniform spacing K

    c) Code-excited linear prediction: codebook of N codevectors (#1 ... #N), with 2^M = N (M = number of transmitted bits)

  • 8/13/2019 Speech Coders for Wireless Communication

    8/53

    Speech Compression Standards

    64 kbps           μ-law/A-law PCM (CCITT G.711)
    64 kbps           7 kHz Subband/ADPCM (CCITT G.722)
    32 kbps           ADPCM (CCITT G.721)
    16 kbps           Low Delay CELP (CCITT G.728)
    13 kbps           RPE-LTP (GSM 06.10)
    12.2 kbps         ACELP (GSM 06.60)
    13 kbps           QCELP (US CDMA Cellular)
    8 kbps            QCELP (US CDMA Cellular)
    8 kbps            VSELP (US TDMA Cellular)
    8 kbps            CS-ACELP (ITU G.729)
    6.7 kbps          VSELP (Japan Digital Cellular)
    6.4 kbps          IMBE (Inmarsat Voice Coding Standard)
    5.3 & 6.3 kbps    True Speech Coder (ITU G.723)
    4.8 kbps          CELP (Fed. Standard 1016, STU-3)
    2.4 kbps          LPC (Fed. Standard 1015, LPC-10E)

  • 8/13/2019 Speech Coders for Wireless Communication

    9/53


    Performance of speech codec

    Speech Quality (SNR/SEGSNR, MOS, etc)

    Bit Rate (bits per second)

    Complexity (MIPS)

    Coding Delay (msec)

  • 8/13/2019 Speech Coders for Wireless Communication

    10/53


    Requirements of speech codec

    for digital cellular

    More channel capacity

    Noise immunity

    Encryption

    Reasonable complexity and encoding delay

  • 8/13/2019 Speech Coders for Wireless Communication

    11/53

    Vocoders

  • 8/13/2019 Speech Coders for Wireless Communication

    12/53

    Anatomy of Speech Organs:

    The source of most speech occurs in the larynx.

    It contains two folds of tissue called the vocal folds

    or vocal cords which can open and shut like a pair of

    fans.

    The gap between the vocal cords is called the glottis

    and as air is forced through the glottis the vocal cords

    will start to vibrate and modulate the air flow.

    This process is known as phonation.

    The frequency of vibration determines the pitch of the voice:

    for a male voice it is typically in the range 50-200 Hz;

    for a female voice it can be up to 500 Hz.

  • 8/13/2019 Speech Coders for Wireless Communication

    13/53

    Glottal Pulse

    Figure: glottal pulse waveform (amplitude vs. time in ms), showing the opening phase, the closing phase and closure

    Period = 12.5 ms

    Fundamental frequency = 1/0.0125 s = 80 Hz

    Rosenberg, JASA 49, 1971

  • 8/13/2019 Speech Coders for Wireless Communication

    14/53

    Spectrum of glottal pulse

    Figure: intensity vs. frequency (Hz)

    Harmonics of the spectrum are spaced at 80 Hz, corresponding to a pitch period of 12.5 ms.

  • 8/13/2019 Speech Coders for Wireless Communication

    15/53

    Spectrum of glottal pulse filtered by the vocal tract

    Figure: intensity vs. frequency (Hz)

    Harmonics of the spectrum are spaced at 80 Hz, corresponding to a pitch period of 12.5 ms.

  • 8/13/2019 Speech Coders for Wireless Communication

    16/53

    /ee/ /ar/ /uu/

  • 8/13/2019 Speech Coders for Wireless Communication

    17/53

    Properties of Speech in Brief

    Vowels (ee in key, o in spot, oo in blue, e in again)

    Quasi-periodic

    Relatively high signal power

    Consonants (s in spot, k in key)

    Non-periodic (random)

    Relatively low signal power

  • 8/13/2019 Speech Coders for Wireless Communication

    18/53

    Wrong /r/ /o/ /ng/

  • 8/13/2019 Speech Coders for Wireless Communication

    19/53

    Moving /m/ /uu/ /v/ /i/ /ng/

  • 8/13/2019 Speech Coders for Wireless Communication

    20/53

    Southampton /s/ /ou/ /th/ /aa/ /m/ /p/ /t/ /a/ /n/

  • 8/13/2019 Speech Coders for Wireless Communication

    21/53

    Digital speech model

    A basic digital model for speech production

    Block diagram: periodic signal gen. or random signal gen. -> x (multiplication by the gain) -> linear time-variant filter

  • 8/13/2019 Speech Coders for Wireless Communication

    22/53

    Vocoder

    Send three kinds of information to the

    receiver:

    (1) voiced or unvoiced signal,

    (2) if it is voiced, the period of the excitation

    signal,

    (3) the parameters of the prediction filter

  • 8/13/2019 Speech Coders for Wireless Communication

    23/53

    Vocoder

    Encoder: voice classification, pitch recognition, determine filter coeff.

    Decoder: excitation signal gen -> digital filter

  • 8/13/2019 Speech Coders for Wireless Communication

    24/53

    LPC Introduction

    These speech coders are called vocoders (voice coders).

    They usually provide more bandwidth compression than is possible with waveform coding (2400-9600 bps).

    Basic Idea

    Estimate parameters -> Encode parameters -> Transmit parameters -> Decode parameters -> Synthesize speech

  • 8/13/2019 Speech Coders for Wireless Communication

    25/53

    Generalities

    LP Model

    Parameter Estimation

    Typical Memory requirements

  • 8/13/2019 Speech Coders for Wireless Communication

    26/53

    LP Model

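    Block diagram:

    Voiced branch: impulse generator, driven by the pitch period

    Unvoiced branch: white noise generator

    Voice/unvoice switch -> gain -> all-pole filter -> speech signal

    The all-pole filter combines the glottal filter, the vocal tract filter and the lip radiation filter.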
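The LP model above can be sketched in a few lines of Python: an impulse train (voiced) or white noise (unvoiced) is scaled by the gain and fed through an all-pole filter. This is only an illustration under stated assumptions: the function name `synthesize_frame`, the first-order coefficient 0.9, the gains, and the sign convention A(z) = 1 - sum of a_i z^-i are all chosen here, not taken from the slides.

```python
import random

def synthesize_frame(a, gain, n_samples, voiced, pitch_period=None, seed=0):
    """Drive the all-pole filter 1/A(z) with an impulse train or white noise.

    a holds the predictor coefficients a_1..a_p, so that
    s(n) = gain * e(n) + sum_i a_i * s(n - i).
    """
    rng = random.Random(seed)
    if voiced:
        # One unit pulse every pitch_period samples
        excitation = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    out = []
    for n in range(n_samples):
        acc = gain * excitation[n]
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * out[n - i]
        out.append(acc)
    return out

# Hypothetical first-order filter; 100 samples at 8 kHz is a 12.5 ms pitch period
voiced = synthesize_frame([0.9], gain=1.0, n_samples=200, voiced=True, pitch_period=100)
unvoiced = synthesize_frame([0.9], gain=0.1, n_samples=200, voiced=False)
```

A real vocoder would use a higher-order filter (for example order 10) and update the coefficients every frame.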

  • 8/13/2019 Speech Coders for Wireless Communication

    27/53

    Parameter Estimation

    Therefore, for each frame:

    estimate the LP coefficients (the a_i)

    estimate the gain

    estimate the type of excitation (voiced or unvoiced)

    estimate the pitch
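One common way to estimate the LP coefficients and the gain for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below uses that technique as an assumption; the slides themselves mention the covariance method (LPC-10) and the Schur algorithm (GSM), which are different solvers for the same problem. The function names and the test signal are illustrative, and the sign convention of the reflection coefficients varies between texts.

```python
def autocorrelation(frame, max_lag):
    """r(k) = sum_m s(m) * s(m + k) for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a, reflection, error).

    a[i-1] is a_i in the predictor s_hat(n) = sum_i a_i * s(n - i).
    """
    a = [0.0] * (order + 1)  # a[0] is unused padding
    error = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
        reflection.append(k)
    return a[1:], reflection, error

# Test signal: impulse response of 1 / (1 - 0.9 z^-1), so a_1 should come out near 0.9
frame = [0.9 ** n for n in range(200)]
r = autocorrelation(frame, 2)
coeffs, reflection, error = levinson_durbin(r, 2)
gain = error ** 0.5  # gain estimated from the residual energy
```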

  • 8/13/2019 Speech Coders for Wireless Communication

    28/53

    V/UV Estimation

    Several Methods:

    Energy of Signal

    Zero Crossing Rate

    Autocorrelation Coefficient

    Figure: speech waveform with segments labelled S (silence), U (unvoiced) and V (voiced)

  • 8/13/2019 Speech Coders for Wireless Communication

    29/53

    Speech Measurements (1)

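    Zero Crossing Rate

    Log Energy E_s

    $$E_s = 10 \log_{10}\left(\frac{1}{N}\sum_{n=1}^{N} s^2(n)\right)$$

    Normalized Autocorrelation Coefficient C_1

    $$C_1 = \frac{\sum_{n=1}^{N} s(n)\,s(n-1)}{\sqrt{\left(\sum_{n=1}^{N} s^2(n)\right)\left(\sum_{n=0}^{N-1} s^2(n)\right)}}$$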
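The three measurements on this slide can be sketched in Python as follows. The frame is taken to hold the samples s(0)..s(N). The small constant `eps`, which guards the logarithm against an all-zero frame, and the two test signals are assumptions added for illustration.

```python
import math

def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0))

def log_energy_db(frame, eps=1e-5):
    """10 * log10(eps + mean of s(n)^2 over n = 1..N); eps is an assumed guard."""
    n = len(frame) - 1
    return 10.0 * math.log10(eps + sum(x * x for x in frame[1:]) / n)

def unit_lag_autocorrelation(frame):
    """Normalized autocorrelation coefficient at a delay of one sample."""
    num = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    den = math.sqrt(sum(x * x for x in frame[1:]) * sum(x * x for x in frame[:-1]))
    return num / den if den > 0.0 else 0.0

fs = 8000
# Voiced-like frame: a 100 Hz tone; unvoiced-like frame: weak, rapidly alternating samples
tone = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(161)]
alternating = [0.01 * (-1) ** n for n in range(161)]
for name, frame in (("tone", tone), ("alternating", alternating)):
    print(name, zero_crossing_count(frame),
          round(log_energy_db(frame), 1), round(unit_lag_autocorrelation(frame), 3))
```

The voiced-like frame shows few zero crossings, high energy and a coefficient near +1, while the unvoiced-like frame shows the opposite pattern. A classifier would compare these values against thresholds.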

  • 8/13/2019 Speech Coders for Wireless Communication

    30/53

    Figure: Comparison between actual data and V/U/S determination results (segments labelled V, U and S).

  • 8/13/2019 Speech Coders for Wireless Communication

    31/53

    Pitch Detection

    Voiced sounds

    Produced by forcing air through the glottis

    Vocal cords oscillate and modulate the air flow into quasi-periodic pulses

    Pulses excite resonances in the remainder of the vocal tract

    Different sounds are produced as muscles work to change the shape of the vocal tract

    Resonant frequencies, or formant frequencies

    Fundamental frequency, or pitch: the rate of the pulses

  • 8/13/2019 Speech Coders for Wireless Communication

    32/53

    Pitch Detection

    Figure: short sections of voiced speech and unvoiced speech (amplitude vs. sample number)

  • 8/13/2019 Speech Coders for Wireless Communication

    33/53

    Time-domain pitch estimation

    Well studied area

    Variations of fundamental frequency are evident

    Time-domain speech processing should be capable of detecting pitch frequency

    Figure: speech waveform (amplitude vs. sample number)

  • 8/13/2019 Speech Coders for Wireless Communication

    34/53

    Pitch Period Estimation Using the Autocorrelation Function

    Periodic signals have a periodic autocorrelation function

    Basic problem in choosing the window length N:

    Speech changes over time, which favors a short window (low N), but the window must cover at least 2 periods of the waveform

    Approaches:

    Choose the window to catch the longest period

    Adaptive N

    Use the modified short-time autocorrelation function

    $$R_n(k) = \sum_{m=0}^{N-1-k} \left[x(n+m)\,w'(m)\right]\left[x(n+m+k)\,w'(k+m)\right]$$
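A minimal Python sketch of autocorrelation-based pitch estimation follows. It assumes that the "modified" variant sums a fixed number of products for every lag, taking the extra samples from beyond the window, as in common textbook treatments. Rectangular windows, the lag range 20-160 samples (400 Hz down to 50 Hz at 8 kHz), and the synthetic test signal are all assumptions made for illustration.

```python
import math

def modified_autocorrelation(x, start, window, lag):
    """Sum of `window` products x(n+m) * x(n+m+lag) for m = 0..window-1."""
    return sum(x[start + m] * x[start + m + lag] for m in range(window))

def estimate_pitch_period(x, start, window, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: modified_autocorrelation(x, start, window, lag))

fs = 8000
period = 100  # samples, i.e. 80 Hz at 8 kHz (the 12.5 ms example period)
# Synthetic voiced-like signal: fundamental plus two harmonics
x = [sum(amp * math.sin(2 * math.pi * h * n / period)
         for h, amp in ((1, 1.0), (2, 0.5), (3, 0.3)))
     for n in range(400)]
lag = estimate_pitch_period(x, start=0, window=200, min_lag=20, max_lag=160)
f0 = fs / lag
print(lag, f0)
```

The window of 200 samples spans two periods of the test signal, matching the "at least 2 periods" requirement.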

  • 8/13/2019 Speech Coders for Wireless Communication

    35/53

    Pitch Period Estimation Using the Autocorrelation Function (Contd)

    The autocorrelation representation retains too much of the information in the speech signal

    => the autocorrelation function has many peaks

    Figure: autocorrelation function of a speech segment (lag axis 0-700 samples)

  • 8/13/2019 Speech Coders for Wireless Communication

    36/53

    Spectrum-flattening techniques

    Remove the effects of the vocal tract transfer function

    Center clipping: a nonlinear transformation; the clipping value depends on the maximum amplitude

    => Strong peak at the pitch frequency

    Figure: speech waveform (amplitude vs. sample number) and two further plots over the same 0-700 sample range
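Center clipping can be sketched as below. The slide only says that the clipping value depends on the maximum amplitude, so the fraction `ratio` and the "shift toward zero" form of the clipper are assumptions chosen for illustration.

```python
def center_clip(frame, ratio=0.3):
    """Zero the samples within +/- C_L and shift the rest toward zero by C_L.

    C_L = ratio * max|x|; ratio is an assumed tuning value.
    """
    level = ratio * max(abs(x) for x in frame)
    out = []
    for x in frame:
        if x > level:
            out.append(x - level)
        elif x < -level:
            out.append(x + level)
        else:
            out.append(0.0)
    return out

clipped = center_clip([0.0, 0.5, 1.0, -1.0, -0.25], ratio=0.5)
print(clipped)
```

Only the largest excursions survive, so an autocorrelation computed on the clipped signal is dominated by the pitch pulses rather than by the formant structure.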

  • 8/13/2019 Speech Coders for Wireless Communication

    37/53

    Fundamental Frequency

    F0 estimation (Hess): determining the main period in a quasi-periodic waveform, usually using the autocorrelation function (ACF) and the average magnitude difference function (AMDF)

    $$\mathrm{AMDF}_t(m) = \frac{1}{N_p} \sum_{n} \left| s_t(n) - s_t(n+m) \right|, \qquad 0 \le n,\; n+m \le L-1$$

    where L is the frame length and N_p is the number of point pairs (a peak in the ACF and a valley in the AMDF indicate F0)
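The AMDF search for the pitch period can be sketched in Python as follows. The lag range and the synthetic test frame are assumptions made for illustration; the deepest valley of the AMDF is taken as the pitch period.

```python
import math

def amdf(frame, lag):
    """Mean |s(n) - s(n+lag)| over all pairs with 0 <= n and n+lag <= L-1."""
    pairs = len(frame) - lag  # N_p, the number of point pairs
    return sum(abs(frame[n] - frame[n + lag]) for n in range(pairs)) / pairs

def amdf_pitch_period(frame, min_lag, max_lag):
    """Return the lag with the deepest AMDF valley."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))

# Synthetic frame with a period of 100 samples (80 Hz at 8 kHz)
frame = [math.sin(2 * math.pi * n / 100) + 0.5 * math.sin(4 * math.pi * n / 100)
         for n in range(400)]
lag = amdf_pitch_period(frame, min_lag=20, max_lag=160)
print(lag, amdf(frame, lag))
```

Unlike the autocorrelation, the AMDF needs no multiplications, which is one reason it suits low-complexity coders.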

  • 8/13/2019 Speech Coders for Wireless Communication

    38/53

    Typical Memory Requirements

    Pitch coefficient (6 bits)

    Gain (5 bits)

    Model parameters:

    LP coefficients (8-10 bits)

    Small changes in the LP coefficients result in large changes in the pole positions.

    Reflection coefficients (6 bits)

    If |r_k| is near 1, quantization causes large distortion.

    Log-Area Ratio:

    A non-linear transformation of the reflection coefficients that expands the scale where |r_k| is near 1.
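The expansion near |r_k| = 1 can be seen numerically. The sketch below uses one common definition of the log-area ratio, LAR = ln((1 + r) / (1 - r)); the sign convention and the base of the logarithm differ between texts and standards, so treat this form as an assumption.

```python
import math

def reflection_to_lar(r):
    """Log-area ratio for |r| < 1 (sign and log base are assumed conventions)."""
    return math.log((1.0 + r) / (1.0 - r))

def lar_to_reflection(lar):
    """Inverse mapping: r = tanh(LAR / 2)."""
    return math.tanh(lar / 2.0)

# The same step of 0.05 in r covers a much larger LAR interval near |r| = 1
step_low = reflection_to_lar(0.05) - reflection_to_lar(0.0)
step_high = reflection_to_lar(0.95) - reflection_to_lar(0.90)
print(round(step_low, 3), round(step_high, 3))
```

A uniform quantizer applied to the LAR values therefore resolves reflection coefficients near 1 more finely than a uniform quantizer applied to the coefficients directly.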

  • 8/13/2019 Speech Coders for Wireless Communication

    39/53

    The main difference between LP vocoders is how the excitation source is calculated.

    LPC 10

  • 8/13/2019 Speech Coders for Wireless Communication

    40/53

    LPC-10

    Encoder:

    Speech signal -> ADC (8 kHz) -> sampled speech -> window (180 samples)

    LP analysis (covariance method) -> reflection coefficients (4 bits); non-linear warping -> LAR coefficients (4 bits and 5 bits)

    AMDF and zero crossing -> voice/unvoice switch (1 bit) and pitch frequency (7 bits)

    Channel

    Decoder:

    Impulse generator (pitch period, 7 bits) or white noise generator -> voice/unvoice switch (1 bit) -> gain (5 bits) -> 1/A(z) -> synthesized speech signal

    1/A(z) uses 10 reflection coefficients (5 bits for one and 4 bits for the others).
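Adding up the decoder-side bit allocations shown on this slide gives the frame size and bit rate. The sketch below uses only the slide's numbers; the exact allocation in the published standard may differ in detail, so treat the dictionary as an illustration of the arithmetic rather than a statement of the standard.

```python
# Bits per frame, as annotated on the slide
bits = {
    "pitch period": 7,
    "voiced/unvoiced switch": 1,
    "gain": 5,
    "reflection coefficients": 5 + 9 * 4,  # one at 5 bits, nine at 4 bits
}
frame_samples = 180
sampling_rate_hz = 8000

bits_per_frame = sum(bits.values())
frame_duration_ms = 1000 * frame_samples / sampling_rate_hz
bit_rate = bits_per_frame * sampling_rate_hz / frame_samples
print(bits_per_frame, frame_duration_ms, bit_rate)
```

With these figures the total is 54 bits per 22.5 ms frame, which agrees with the 2.4 kbps listed for LPC-10 among the standards.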

  • 8/13/2019 Speech Coders for Wireless Communication

    41/53

    RELP

    A simple vocoder offers poor sound quality and is usually unsatisfactory.

    An improvement is to use the prediction error, rather than the periodic pulse train (for voiced signals) or the random noise (for unvoiced signals), to excite the digital filter that reproduces the speech. The prediction error is also called the residual.

    This scheme is called Residual Excited Linear Prediction (RELP) coding.
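The idea can be sketched in Python: the encoder computes the residual with the prediction filter, and the decoder drives the synthesis filter with that residual. Without quantization the speech is recovered exactly. The coefficient values and sample values below are hypothetical, and samples before n = 0 are taken as zero.

```python
def prediction_residual(speech, a):
    """e(n) = s(n) - sum_i a_i * s(n - i), with samples before n = 0 taken as zero."""
    return [s - sum(c * speech[n - i] for i, c in enumerate(a, start=1) if n - i >= 0)
            for n, s in enumerate(speech)]

def synthesize_from_residual(residual, a):
    """s(n) = e(n) + sum_i a_i * s(n - i): the residual excites the synthesis filter."""
    out = []
    for n, e in enumerate(residual):
        out.append(e + sum(c * out[n - i] for i, c in enumerate(a, start=1) if n - i >= 0))
    return out

speech = [0.0, 1.0, 1.5, 0.5, -1.0, -1.5, -0.5, 1.0]
a = [1.2, -0.6]  # illustrative predictor coefficients, not from the slides
residual = prediction_residual(speech, a)
rebuilt = synthesize_from_residual(residual, a)
print([round(e, 3) for e in residual])
```

In a real coder the residual is quantized before transmission, so the reconstruction is only approximate.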

  • 8/13/2019 Speech Coders for Wireless Communication

    42/53

    RELP

    Block diagram: determine filter coeff. -> digital filter -> subtraction (-) -> quantization -> encoder

  • 8/13/2019 Speech Coders for Wireless Communication

    43/53

    RELP

    RELP follows essentially the same idea as DPCM.

    However, in RELP the speech signal is divided into blocks (20 ms/block). The optimum linear predictor is designed for each block.

    For each block, the filter coefficients and the prediction error are sent to the receiver.

    In DPCM, the predictor can be fixed or adaptive. Only the prediction error is sent to the receiver.

  • 8/13/2019 Speech Coders for Wireless Communication

    44/53

    Modeling of the prediction

    error

    In each block of speech signal (a frame), the

    prediction error may also be correlated.

    To decorrelate the prediction error, each frame is further divided into 4 sub-frames (5 ms each). The prediction error u(n) is then modelled as

    where M (40

  • 8/13/2019 Speech Coders for Wireless Communication

    45/53

    Long-term prediction

    The decorrelation of the prediction error is

    called long-term prediction.

    Block diagram: s(n) -> determine filter coeff. -> digital filter A(z) -> subtraction (-) -> u(n) -> long-term prediction U(z) -> e(n) -> encoder
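A long-term predictor search can be sketched as below. The model used here, u(n) = b * u(n - M) + e(n) with a single lag M and gain b, and the lag range 40-120 samples are assumptions: the model equation did not survive in this transcript. The function name and the synthetic data are illustrative.

```python
import random

def long_term_predict(history, subframe, min_lag=40, max_lag=120):
    """Return (lag, gain, residual) minimising sum (u(n) - gain * u(n - lag))^2.

    history must hold at least max_lag past samples of u(n).
    """
    buf = history + subframe
    base = len(history)
    best = None
    for lag in range(min_lag, max_lag + 1):
        delayed = [buf[base + n - lag] for n in range(len(subframe))]
        energy = sum(d * d for d in delayed)
        if energy == 0.0:
            continue
        gain = sum(u * d for u, d in zip(subframe, delayed)) / energy
        residual = [u - gain * d for u, d in zip(subframe, delayed)]
        err = sum(e * e for e in residual)
        if best is None or err < best[0]:
            best = (err, lag, gain, residual)
    if best is None:  # all-zero history: nothing to predict from
        return min_lag, 0.0, list(subframe)
    return best[1], best[2], best[3]

# Synthetic data: the sub-frame is 0.8 times the past signal delayed by 57 samples
rng = random.Random(0)
history = [rng.uniform(-1.0, 1.0) for _ in range(120)]
subframe = [0.8 * history[120 + n - 57] for n in range(40)]
lag, gain, residual = long_term_predict(history, subframe)
print(lag, round(gain, 3))
```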

  • 8/13/2019 Speech Coders for Wireless Communication

    46/53

    RPE-LTP

    The RPE-LTP has been adopted as the

    speech coding method in the GSM 06.10

    standard

    Block diagram: determine filter coeff. -> digital filter -> subtraction (-) -> long-term prediction -> regular pulse selection and coding

  • 8/13/2019 Speech Coders for Wireless Communication

    47/53

    RPE-LTP

    Speech is sampled at 8 kHz, quantised to 8 bits/sample

    The speech signal is pre-processed to remove any DC component and to pre-emphasize the high-frequency components, partly compensating for their low energy.

    The signal is then divided into frames (20 ms, 160 samples). An eighth-order optimum linear predictor is designed using the Schur algorithm.

    The reflection coefficients (related to the filter coefficients) are nonlinearly mapped to another set of values called log-area ratios (LAR).

  • 8/13/2019 Speech Coders for Wireless Communication

    48/53

    RPE-LTP

    The 8 LAR parameters are quantized using 6,6,5,5,4,4,3,3 bits.

    So a total of 36 bits for the LAR (or for the filter coefficients).

    The frame is filtered using this filter and produces u(n).

    u(n) is then divided into 4 sub-frames (5ms each, 40 samples).

    Long-term prediction is performed for each sub-frame. The lag M is quantized to 7 bits and the gain is represented by 2 bits.

    Long-term prediction produces e(n).

  • 8/13/2019 Speech Coders for Wireless Communication

    49/53

    RPE-LTP

    e(n) is down-sampled by a factor of 3. For each sub-frame, there are 4 down-sampling patterns, so 2 bits are needed to specify the pattern used.

    The down-sampled e(n) has 13 samples. The maximum of

    them is quantized to 6 bits, others are normalised then

    represented by 3 bits.

    So in each sub-frame, e(n) is represented by 6+13*3=6+39 bits.

    A frame has 4 sub-frames, 4*(6+39)=180 bits

    The above method is called regular-pulse-excitation (RPE)
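The pattern selection and normalisation steps can be sketched in Python. The slides do not state how the pattern is chosen, so picking the grid with the largest energy is an assumption here, as are the function name and the synthetic residual. Quantization of the block maximum (6 bits) and of the normalised pulses (3 bits each) is left out.

```python
def select_rpe_grid(residual, spacing=3, n_grids=4, n_pulses=13):
    """Pick the down-sampling grid with the most energy (assumed criterion).

    Returns (grid_index, pulses, block_maximum).
    """
    best = None
    for offset in range(n_grids):
        pulses = [residual[offset + spacing * i] for i in range(n_pulses)]
        energy = sum(p * p for p in pulses)
        if best is None or energy > best[0]:
            best = (energy, offset, pulses)
    _, offset, pulses = best
    block_max = max(abs(p) for p in pulses)
    return offset, pulses, block_max

# Synthetic 40-sample sub-frame whose energy sits on grid 1 (samples 1, 4, ..., 37)
residual = [0.1] * 40
for i in range(13):
    residual[1 + 3 * i] = 1.0 if i % 2 == 0 else -1.0
grid, pulses, block_max = select_rpe_grid(residual)
normalised = [p / block_max for p in pulses]
print(grid, len(pulses), block_max)
```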

  • 8/13/2019 Speech Coders for Wireless Communication

    50/53

    13 kbps RPE-LTP coder

    Encoder

    Block diagram: input signal -> pre-processing -> short-term LPC analysis and short-term analysis filter -> subtraction (-) -> RPE grid selection and coding; a local feedback path runs through RPE grid decoding, an adder (+), the synthesis filter 1/A(z/5) and LTP analysis

    Outputs: reflection coefficients (36 bits / 20 ms), LTP parameters, RPE parameters (13 pulses / 5 ms)

  • 8/13/2019 Speech Coders for Wireless Communication

    51/53

    Decoder

    Block diagram: RPE parameters -> RPE grid decoding -> adder (+) with the synthesis filter 1/A(z/5), controlled by the LTP parameters -> short-term synthesis, controlled by the reflection coefficients -> post-processing -> output signal

  • 8/13/2019 Speech Coders for Wireless Communication

    52/53

    RPE-LTP

    Summary

    Parameter              Bits
    8 LAR coefficients     36
    For each sub-frame:
      pattern code         2
      lag                  7
      gain                 2
      regular pulse        6+39
      total                56
    4 sub-frames           4*56 = 224
    Total for one frame    224+36 = 260
    Bit rate               260 bits / 20 ms = 13 kb/s
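The bit budget in this summary can be checked with a few lines of Python. All numbers come from the slides; only the variable names are introduced here.

```python
lar_bits = [6, 6, 5, 5, 4, 4, 3, 3]  # bits for the 8 LAR coefficients
subframe_bits = {
    "pattern code": 2,
    "lag": 7,
    "gain": 2,
    "regular pulse": 6 + 13 * 3,  # block maximum plus 13 pulses at 3 bits
}
subframes_per_frame = 4
frame_duration_ms = 20

bits_per_subframe = sum(subframe_bits.values())
bits_per_frame = sum(lar_bits) + subframes_per_frame * bits_per_subframe
bit_rate_kbps = bits_per_frame / frame_duration_ms  # bits per ms equals kb/s
print(bits_per_subframe, bits_per_frame, bit_rate_kbps)
```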
