
Lecture 2 Acoustic Theory

Date post: 28-Feb-2018
Upload: marian-b

Transcript
  • 7/25/2019 Lecture 2 Acoustic Theory

    1/34

    Acoustic Theory of Speech Production

    Overview
    - Sound sources
    - Vocal tract transfer function
      - Wave equations
      - Sound propagation in a uniform acoustic tube
    - Representing the vocal tract with simple acoustic tubes
    - Estimating natural frequencies from area functions
    - Representing the vocal tract with multiple uniform tubes

    6.345 Automatic Speech Recognition, Lecture # 2, Session 2003

  • Anatomical Structures for Speech Production

  • Phonemes in American English

    PHONEME  EXAMPLE    PHONEME  EXAMPLE    PHONEME  EXAMPLE
    /i/      beat       /s/      see        /w/      wet
    /ɪ/      bit        /ʃ/      she        /r/      red
    /e/      bait       /f/      fee        /l/      let
    /ɛ/      bet        /θ/      thief      /y/      yet
    /æ/      bat        /z/      z          /m/      meet
    /ɑ/      Bob        /ʒ/      Gigi       /n/      neat
    /ɔ/      bought     /v/      v          /ŋ/      sing
    /ʌ/      but        /ð/      thee       /č/      church
    /o/      boat       /p/      pea        /ǰ/      judge
    /ʊ/      book       /t/      tea        /h/      heat
    /u/      boot       /k/      key
    /ɝ/      Burt       /b/      bee
    /ɑʸ/     bite       /d/      Dee
    /ɔʸ/     Boyd       /g/      geese
    /ɑʷ/     bout
    /ə/      about

  • Places of Articulation for Speech Sounds

    [Figure: vocal tract diagram with the places of articulation labelled Labial, Dental, Alveolar, Palato-Alveolar, Palatal, Velar, Uvular]

  • Speech Waveform: An Example

    [Figure: waveform of the utterance "Two plus seven is less than ten"]

  • A Wideband Spectrogram

    [Figure: wideband spectrogram of the utterance "Two plus seven is less than ten"]

  • Acoustic Theory of Speech Production

    The acoustic characteristics of speech are usually modelled as a sequence of source, vocal tract filter, and radiation characteristics

    [Figure: source-filter diagram with the signals labelled U_G, U_L and P_r]

    $P_r(j\Omega) = S(j\Omega)\,T(j\Omega)\,R(j\Omega)$

    For vowel production:

    $S(j\Omega) = U_G(j\Omega)$
    $T(j\Omega) = U_L(j\Omega) / U_G(j\Omega)$
    $R(j\Omega) = P_r(j\Omega) / U_L(j\Omega)$

  • Sound Source: Vocal Fold Vibration

    Modelled as a volume velocity source at the glottis, $U_G(j\Omega)$

    [Figure: glottal volume velocity $U_G(t)$ and radiated pressure $P_r(t)$ against time $t$, with pitch period $T_o = 1/F_o$; source spectrum $U_G(f)$ falling off as $1/f^2$]

                F0 ave (Hz)   F0 min (Hz)   F0 max (Hz)
    Men             125            80           200
    Women           225           150           350
    Children        300           200           500
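
The relation $T_o = 1/F_o$ in the figure turns the F0 table into pitch periods. The following is a minimal sketch of that arithmetic; the names `F0_TABLE` and `pitch_period_ms` are illustrative and do not come from the lecture.

```python
# F0 statistics copied from the table above, in Hz.
F0_TABLE = {
    "Men": {"ave": 125.0, "min": 80.0, "max": 200.0},
    "Women": {"ave": 225.0, "min": 150.0, "max": 350.0},
    "Children": {"ave": 300.0, "min": 200.0, "max": 500.0},
}


def pitch_period_ms(f0_hz: float) -> float:
    """Return the pitch period T_o = 1 / F_o in milliseconds."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    return 1000.0 / f0_hz


if __name__ == "__main__":
    for group, stats in F0_TABLE.items():
        # A high F0 gives a short period, so "max" F0 maps to the shortest period.
        print(
            f"{group}: average period {pitch_period_ms(stats['ave']):.2f} ms, "
            f"range {pitch_period_ms(stats['max']):.2f} to "
            f"{pitch_period_ms(stats['min']):.2f} ms"
        )
```

For the average male F0 of 125 Hz this gives a period of 8 ms.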

  • Sound Source: Turbulence Noise

    - Turbulence noise is produced at a constriction in the vocal tract
      - Aspiration noise is produced at the glottis
      - Frication noise is produced above the glottis
    - Modelled as a series pressure source at the constriction, $P_S(j\Omega)$

    [Figure: source spectrum $P_s(f)$ against $f$, with the frequency $0.2\,V/D$ marked]

    $V$: velocity at constriction
    $D$: critical dimension $= \sqrt{4A/\pi}$

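  • Vocal Tract Wave Equations

    Define:

    $u(x, t)$ = particle velocity
    $U(x, t)$ = volume velocity ($U = uA$)
    $p(x, t)$ = sound pressure variation ($P = P_O + p$)
    $\rho$ = density of air
    $c$ = velocity of sound

    Assuming plane wave propagation (for a cross dimension $\ll \lambda$), and a one-dimensional wave motion, it can be shown that

    $-\dfrac{\partial p}{\partial x} = \rho\,\dfrac{\partial u}{\partial t}$, $\quad -\dfrac{\partial u}{\partial x} = \dfrac{1}{\rho c^2}\,\dfrac{\partial p}{\partial t}$, $\quad \dfrac{\partial^2 u}{\partial x^2} = \dfrac{1}{c^2}\,\dfrac{\partial^2 u}{\partial t^2}$

    Time and frequency domain solutions are of the form

    $u(x, t) = u^+(t - \frac{x}{c}) - u^-(t + \frac{x}{c})$, $\quad u(x, s) = \dfrac{1}{\rho c}\left[P^+ e^{-sx/c} - P^- e^{sx/c}\right]$

    $p(x, t) = \rho c\left[u^+(t - \frac{x}{c}) + u^-(t + \frac{x}{c})\right]$, $\quad p(x, s) = P^+ e^{-sx/c} + P^- e^{sx/c}$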

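  • Propagation of Sound in a Uniform Tube

    [Figure: uniform tube of area A driven by U_G, with its ends marked x = -l and x = 0]

    The vocal tract transfer function of volume velocities is

    $T(j\Omega) = \dfrac{U_L(j\Omega)}{U_G(j\Omega)}$

    where $U_G(j\Omega) = U(0, j\Omega)$ and $U_L$ is the volume velocity at the open (lip) end.

    Using the boundary conditions $U(0, s) = U_G(s)$ and $P = 0$ at the open end:

    $T(s) = \dfrac{2}{e^{s\ell/c} + e^{-s\ell/c}}$, $\quad T(j\Omega) = \dfrac{1}{\cos(\Omega\ell/c)}$

    The poles of the transfer function $T(j\Omega)$ are where $\cos(\Omega\ell/c) = 0$:

    $\dfrac{(2\pi f_n)\,\ell}{c} = \dfrac{(2n-1)\,\pi}{2}$, $\quad f_n = \dfrac{c}{4\ell}\,(2n-1)$, $\quad \lambda_n = \dfrac{4\ell}{2n-1}$, $\quad n = 1, 2, \ldots$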

  • Propagation of Sound in a Uniform Tube (cont.)

    - For $c = 34{,}000$ cm/sec, $\ell = 17$ cm, the natural frequencies (also called the formants) are at 500 Hz, 1500 Hz, 2500 Hz, ...

    [Figure: pole locations (x) along the $j\Omega$ axis, and $20\log|T(j\Omega)|$ against frequency (0 to 5 kHz)]

    - The transfer function of a tube with no side branches, excited at one end and with the response measured at another, only has poles
    - The formant frequencies will have finite bandwidth when vocal tract losses are considered (e.g., radiation, walls, viscosity, heat)
    - The length of the vocal tract, $\ell$, corresponds to $\frac{1}{4}\lambda_1, \frac{3}{4}\lambda_2, \frac{5}{4}\lambda_3, \ldots$, where $\lambda_i$ is the wavelength of the $i$th natural frequency
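
The quoted formants follow directly from $f_n = \frac{c}{4\ell}(2n-1)$. The sketch below reproduces them and the quarter-wavelength relation; the function names are illustrative, and the only inputs are the $c$ and $\ell$ values stated above.

```python
def uniform_tube_formants(length_cm: float, c_cm_per_s: float = 34000.0,
                          count: int = 3) -> list[float]:
    """Natural frequencies f_n = (2n - 1) c / (4 l) of a lossless uniform tube
    that is closed at one end and open at the other."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


def length_in_wavelengths(length_cm: float, c_cm_per_s: float = 34000.0,
                          count: int = 3) -> list[float]:
    """Tube length expressed in wavelengths of each natural frequency."""
    return [length_cm * f / c_cm_per_s
            for f in uniform_tube_formants(length_cm, c_cm_per_s, count)]


if __name__ == "__main__":
    print(uniform_tube_formants(17.0))   # formants in Hz
    print(length_in_wavelengths(17.0))   # fractions of a wavelength
```

With $\ell = 17$ cm the tube length comes out as 1/4, 3/4 and 5/4 of the first three wavelengths, which is the quarter-wavelength pattern described in the last bullet.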

  • Standing Wave Patterns in a Uniform Tube

    A uniform tube closed at one end and open at the other is often referred to as a quarter wavelength resonator

    [Figure: standing wave patterns (SWP) of $|U(x)|$ for F1, F2 and F3 along $x$ from glottis to lips, with positions marked at 2/3 of the tube length for F2 and at 2/5 and 4/5 for F3]

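  • Natural Frequencies of Simple Acoustic Tubes

    [Figure: two uniform tubes of area A spanning x = -l to x = 0, each viewed from the end at x = -l]

    Quarter wavelength resonator (closed at $x = 0$):

    $P(x, j\Omega) = 2P^+ \cos\dfrac{\Omega x}{c}$
    $U(x, j\Omega) = -j\,\dfrac{A}{\rho c}\,2P^+ \sin\dfrac{\Omega x}{c}$
    $Y_{-\ell} = j\,\dfrac{A}{\rho c}\tan\dfrac{\Omega\ell}{c} \approx j\Omega\,\dfrac{A\ell}{\rho c^2} = j\Omega C_A$ for $\Omega\ell/c \ll 1$
    $C_A = A\ell/\rho c^2$ = acoustic compliance
    $f_n = \dfrac{c}{4\ell}\,(2n-1)$, $\quad n = 1, 2, \ldots$

    Half-wavelength resonator (open at $x = 0$):

    $P(x, j\Omega) = -j\,2P^+ \sin\dfrac{\Omega x}{c}$
    $U(x, j\Omega) = \dfrac{A}{\rho c}\,2P^+ \cos\dfrac{\Omega x}{c}$
    $Y_{-\ell} = -j\,\dfrac{A}{\rho c}\cot\dfrac{\Omega\ell}{c} \approx -j\,\dfrac{A}{\Omega\rho\ell} = \dfrac{1}{j\Omega M_A}$ for $\Omega\ell/c \ll 1$
    $M_A = \rho\ell/A$ = acoustic mass
    $f_n = \dfrac{c}{2\ell}\,n$, $\quad n = 0, 1, 2, \ldots$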

  • Approximating Vocal Tract Shapes

    [Figure: vocal tract shapes for [i], [a] and [u], approximated by two tubes of areas A_1, A_2 and lengths l_1, l_2]

  • Estimating Natural Resonance Frequencies

    Resonance frequencies occur where the impedance (or admittance) function equals natural (e.g., open circuit) boundary conditions

    [Figure: two-tube model driven by U_G with output U_L; tube areas A_1, A_2 and lengths l_1, l_2; at the junction $Y_1 + Y_2 = 0$]

    For a two tube approximation it is easiest to solve for $Y_1 + Y_2 = 0$:

    $j\,\dfrac{A_1}{\rho c}\tan\dfrac{\Omega\ell_1}{c} - j\,\dfrac{A_2}{\rho c}\cot\dfrac{\Omega\ell_2}{c} = 0$

    $\sin\dfrac{\Omega\ell_1}{c}\,\sin\dfrac{\Omega\ell_2}{c} - \dfrac{A_2}{A_1}\cos\dfrac{\Omega\ell_1}{c}\,\cos\dfrac{\Omega\ell_2}{c} = 0$
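
The second form of the two-tube condition has no singularities, so its roots can be bracketed by a simple frequency scan and refined by bisection. The sketch below does that; the tube dimensions (a 1 cm² by 9 cm back tube and a 7 cm² by 8 cm front tube) and the value $c = 35{,}000$ cm/s are illustrative assumptions, and the function names are hypothetical.

```python
import math


def two_tube_residual(f_hz, a1, l1, a2, l2, c=35000.0):
    """Left-hand side of sin(t1) sin(t2) - (A2/A1) cos(t1) cos(t2) = 0,
    with t_i = 2 pi f l_i / c."""
    t1 = 2.0 * math.pi * f_hz * l1 / c
    t2 = 2.0 * math.pi * f_hz * l2 / c
    return math.sin(t1) * math.sin(t2) - (a2 / a1) * math.cos(t1) * math.cos(t2)


def two_tube_resonances(a1, l1, a2, l2, f_max=3000.0, step=1.0, c=35000.0):
    """Scan 0 < f <= f_max for sign changes of the residual and refine each
    bracket by bisection. Returns the resonance frequencies in Hz."""
    roots = []
    f_lo = step
    g_lo = two_tube_residual(f_lo, a1, l1, a2, l2, c)
    while f_lo < f_max:
        f_hi = f_lo + step
        g_hi = two_tube_residual(f_hi, a1, l1, a2, l2, c)
        if g_lo == 0.0:
            roots.append(f_lo)
        elif g_lo * g_hi < 0.0:
            lo, hi, g_left = f_lo, f_hi, g_lo
            for _ in range(60):  # bisection: halves the bracket each pass
                mid = 0.5 * (lo + hi)
                g_mid = two_tube_residual(mid, a1, l1, a2, l2, c)
                if g_left * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_left = mid, g_mid
            roots.append(0.5 * (lo + hi))
        f_lo, g_lo = f_hi, g_hi
    return roots


if __name__ == "__main__":
    # Assumed example geometry: narrow back tube, wide front tube.
    for f in two_tube_resonances(a1=1.0, l1=9.0, a2=7.0, l2=8.0):
        print(f"{f:.0f} Hz")
```

A 1 Hz scan step is much finer than the spacing between resonances for tubes of this size, so no root should be skipped; a coarser step would need checking against the shortest expected spacing.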

  • Decoupling Simple Tube Approximations

    If $A_1 \gg A_2$, or $A_1 \ll A_2$, the tubes can be decoupled and the natural frequencies of each tube can be computed independently

    For the vowel /i/, the formant frequencies are obtained from:

    [Figure: two-tube model with areas A_1, A_2 and lengths l_1, l_2]

    $f_n = \dfrac{c}{2\ell_1}\,n$ plus $f_n = \dfrac{c}{2\ell_2}\,n$

    At low frequencies:

    $f = \dfrac{c}{2\pi}\left(\dfrac{A_2}{A_1 \ell_1 \ell_2}\right)^{1/2} = \dfrac{1}{2\pi}\left(\dfrac{1}{C_{A_1} M_{A_2}}\right)^{1/2}$

    This low resonance frequency is called the Helmholtz resonance
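
The low-frequency formula can be recovered from the two-tube condition $Y_1 + Y_2 = 0$ by replacing the tangent and cotangent with their small-argument approximations. The steps below are a sketch of that reasoning, using the acoustic compliance $C_A = A\ell/\rho c^2$ and acoustic mass $M_A = \rho\ell/A$ defined for the simple tubes earlier in the lecture.

```latex
\begin{align*}
A_1 \tan\frac{\Omega \ell_1}{c} &= A_2 \cot\frac{\Omega \ell_2}{c}
  && \text{(from } Y_1 + Y_2 = 0\text{)} \\
A_1 \frac{\Omega \ell_1}{c} &\approx A_2 \frac{c}{\Omega \ell_2}
  && \text{(} \tan\theta \approx \theta,\ \cot\theta \approx 1/\theta
     \text{ for small } \theta\text{)} \\
\Omega^2 &\approx \frac{c^2 A_2}{A_1 \ell_1 \ell_2} \\
f = \frac{\Omega}{2\pi} &\approx \frac{c}{2\pi}
  \left(\frac{A_2}{A_1 \ell_1 \ell_2}\right)^{1/2}
  = \frac{1}{2\pi}\left(\frac{1}{C_{A_1} M_{A_2}}\right)^{1/2},
\qquad C_{A_1} = \frac{A_1 \ell_1}{\rho c^2},\quad
       M_{A_2} = \frac{\rho\,\ell_2}{A_2}
\end{align*}
```

The last equality holds because $1/(C_{A_1} M_{A_2}) = c^2 A_2 / (A_1 \ell_1 \ell_2)$: the back tube acts as a compliance and the narrow front tube as a mass.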

  • Vowel Production Example

    [Figure: two two-tube vocal tract models. First model: areas 7 cm² and 1 cm², tube lengths 9 cm and 8 cm. Second model: areas 8 cm² and 1 cm², tube lengths 9 cm and 6 cm. The estimates pool the per-tube frequencies: 972, 2917, ... + 1093, ... for the first model, and 268 + 1944, ... + 2917, ... for the second.]

    First model:

    Formant   Actual (Hz)   Estimated (Hz)
    F1            789            972
    F2           1276           1093
    F3           2808           2917

    Second model:

    Formant   Actual (Hz)   Estimated (Hz)
    F1            256            268
    F2           1905           1944
    F3           2917           2917
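
The "Estimated" columns can be reproduced with the decoupled-tube formulas. The sketch below is one reading of the example, not something the slide spells out: it assumes $c = 35{,}000$ cm/s (the value that appears to match the tabulated numbers), treats both tubes of the first model as quarter-wavelength resonators, and treats the second model as an 8 cm² by 9 cm back cavity with a 1 cm² by 6 cm front tube. All function names are illustrative.

```python
import math

C_CM_PER_S = 35000.0  # assumption: inferred from the tabulated estimates


def quarter_wave(length_cm, count, c=C_CM_PER_S):
    """f_n = (2n - 1) c / (4 l), n = 1..count (closed at one end, open at the other)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


def half_wave(length_cm, count, c=C_CM_PER_S):
    """f_n = n c / (2 l), n = 1..count (same boundary condition at both ends)."""
    return [n * c / (2.0 * length_cm) for n in range(1, count + 1)]


def helmholtz(a_back, l_back, a_front, l_front, c=C_CM_PER_S):
    """Low-frequency resonance f = (c / 2 pi) sqrt(A2 / (A1 l1 l2))."""
    return c / (2.0 * math.pi) * math.sqrt(a_front / (a_back * l_back * l_front))


def estimate_first_model():
    # Both sections decouple into quarter-wavelength resonators (9 cm and 8 cm).
    return sorted(quarter_wave(9.0, 2) + quarter_wave(8.0, 1))[:3]


def estimate_second_model():
    # Helmholtz resonance plus half-wavelength resonances of each section.
    pooled = [helmholtz(8.0, 9.0, 1.0, 6.0)] + half_wave(9.0, 1) + half_wave(6.0, 1)
    return sorted(pooled)[:3]


if __name__ == "__main__":
    print([round(f, 1) for f in estimate_first_model()])
    print([round(f, 1) for f in estimate_second_model()])
```

Under these assumptions the computed values agree with the tabulated estimates to within about 1 Hz; the small differences look like truncation in the table rather than a different formula.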

  • Example of Vowel Spectrograms

    [Figure: two analysis displays, for /bit/ and /bat/, each with panels for zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), waveform, and wide band spectrogram (0 to 8 kHz), against time (0.0 to 0.7 seconds)]

  • Estimating Anti-Resonance Frequencies (Zeros)

    Zeros occur at frequencies where there is no measurable output

    [Figure: a branched model driven by U_G with output U_N (areas A_p, A_o, A_n; admittances Y_p, Y_o, Y_n; lengths l_p, l_o, l_n), and a three-section model with source P_s and output U_L (areas A_b, A_c, A_f; lengths l_b, l_c, l_f), annotated with $Y_1 = 0$ and $Y_3 + Y_4 = 0$]

    - For nasal consonants, zeros in $U_N$ occur where $Y_O = \infty$
    - For fricatives or stop consonants, zeros in $U_L$ occur where the impedance behind the source is infinite (i.e., a hard wall at the source)
    - Zeros occur when measurements are made in the vocal tract interior

  • Consonant Production

    [Figure: three-section model with source P_s, areas A_b, A_c, A_f and lengths l_b, l_c, l_f, showing how the poles and zeros are pooled from the individual sections]

            A_b    A_c    A_f    l_b    l_c    l_f
    [g]      5     0.2     4      9      3      5
    [s]      5     0.5     4     11      3     2.5

              [g]                [s]
        poles    zeros     poles    zeros
         215        0       306        0
        1750     1944      1590     1590
        1944     2916      3180     2916
        3888     3888      3500     3180
         ...      ...       ...      ...
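
The tabulated poles and zeros can be approximated by decoupling the three sections. The sketch below is one reading that matches the table to within about 2 Hz, and every modelling choice in it is an assumption: dimensions in cm² and cm, $c = 35{,}000$ cm/s, poles pooled from the Helmholtz resonance of the back cavity with the constriction, the half-wavelength resonances of the back cavity, and the quarter-wavelength resonances of the front cavity; zeros pooled from 0 Hz, the half-wavelength resonances of the back cavity, and the quarter-wavelength resonances of the constriction. The function names are hypothetical.

```python
import math

C_CM_PER_S = 35000.0  # assumption: not stated on the slide


def quarter_wave(length_cm, count, c=C_CM_PER_S):
    """f_n = (2n - 1) c / (4 l), n = 1..count."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


def half_wave(length_cm, count, c=C_CM_PER_S):
    """f_n = n c / (2 l), n = 1..count."""
    return [n * c / (2.0 * length_cm) for n in range(1, count + 1)]


def helmholtz(a_b, l_b, a_c, l_c, c=C_CM_PER_S):
    """Back cavity (compliance) resonating with the constriction (mass)."""
    return c / (2.0 * math.pi) * math.sqrt(a_c / (a_b * l_b * l_c))


def consonant_poles(a_b, a_c, l_b, l_c, l_f, count=4):
    pooled = [helmholtz(a_b, l_b, a_c, l_c)]
    pooled += half_wave(l_b, count)      # back cavity, closed at both ends
    pooled += quarter_wave(l_f, count)   # front cavity, open at the lips
    return sorted(pooled)[:count]


def consonant_zeros(l_b, l_c, count=4):
    pooled = [0.0]
    pooled += half_wave(l_b, count)      # back cavity
    pooled += quarter_wave(l_c, count)   # constriction behind the source
    return sorted(pooled)[:count]


if __name__ == "__main__":
    print("[g] poles:", [round(f) for f in consonant_poles(5, 0.2, 9, 3, 5)])
    print("[g] zeros:", [round(f) for f in consonant_zeros(9, 3)])
    print("[s] poles:", [round(f) for f in consonant_poles(5, 0.5, 11, 3, 2.5)])
    print("[s] zeros:", [round(f) for f in consonant_zeros(11, 3)])
```

In this reading the back-cavity resonances appear in both lists, which is why several poles and zeros in the table coincide and would largely cancel in the output spectrum.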

  • Example of Consonant Spectrograms

    [Figure: two analysis displays, for /kip/ and /si/, each with panels for zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), waveform, and wide band spectrogram (0 to 8 kHz), against time (0.0 to 0.7 seconds and 0.0 to 0.8 seconds respectively)]

  • Perturbation Theory

    Consider a uniform tube, closed at one end and open at the other

    [Figure: uniform tube of area A and length l along x, with a short section of length $\Delta l$ near the open end, where $Y \approx -j\,\dfrac{A}{\Omega\rho\,\Delta l}$ for small $\Delta l$]

    - Reducing the area of a small piece of the tube near the opening (where U is max) has the same effect as keeping the area fixed and lengthening the tube
    - Since lengthening the tube lowers the resonant frequencies, narrowing the tube near points where $U(x)$ is maximum in the standing wave pattern for a given formant decreases the value of that formant

  • Perturbation Theory (cont'd)

    [Figure: uniform tube of area A and length l along x, with a short section of length $\Delta l$ near the closed end, where $Y \approx j\Omega\,\dfrac{A\,\Delta l}{\rho c^2}$ for small $\Delta l$]

    - Reducing the area of a small piece of the tube near the closure (where p is max) has the same effect as keeping the area fixed and shortening the tube
    - Since shortening the tube will increase the values of the formants, narrowing the tube near points where $p(x)$ is maximum in the standing wave pattern for a given formant will increase the value of that formant

  • Summary of Perturbation Theory Results

    [Figure: standing wave patterns (SWP) of $|U(x)|$ for F1, F2 and F3 from glottis to lips, with positions marked at 2/3 for F2 and at 2/5 and 4/5 for F3; below them, the sign (+ or -) of the change in F1, F2 and F3 along the tract as a consequence of decreasing A]

  • Illustration of Perturbation Theory

    Audio example: http://d%7C/media/6345/ah_ba_wa.wav

  • Illustration of Perturbation Theory

    The ship was torn apart on the sharp (reef)

  • Illustration of Perturbation Theory

    (The ship was torn apart on the sh)arp reef

  • Multi-Tube Approximation of the Vocal Tract

    - We can represent the vocal tract as a concatenation of $N$ lossless tubes with constant area $\{A_k\}$ and equal length $\Delta x = \ell/N$
    - The wave propagation time through each tube is $\tau = \dfrac{\Delta x}{c} = \dfrac{\ell}{Nc}$

    [Figure: vocal tract drawn as seven concatenated tube sections, with areas up to A_7, each of length $\Delta x$]

  • Wave Equations for Individual Tube

    The wave equations for the $k$th tube have the form

    $p_k(x, t) = \dfrac{\rho c}{A_k}\left[U_k^+(t - \frac{x}{c}) + U_k^-(t + \frac{x}{c})\right]$

    $U_k(x, t) = U_k^+(t - \frac{x}{c}) - U_k^-(t + \frac{x}{c})$

    where $x$ is measured from the left-hand side ($0 \le x \le \Delta x$)

    [Figure: adjacent tubes of areas A_k and A_{k+1}, each of length $\Delta x$, with forward waves $U_k^+(t)$, $U_k^+(t - \tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t - \tau)$ and backward waves $U_k^-(t)$, $U_k^-(t + \tau)$, $U_{k+1}^-(t)$, $U_{k+1}^-(t + \tau)$ marked at the tube ends]

  • Update Expression at Tube Boundaries

    We can solve update expressions using continuity constraints at tube boundaries, e.g., $p_k(\Delta x, t) = p_{k+1}(0, t)$ and $U_k(\Delta x, t) = U_{k+1}(0, t)$

    [Figure: signal flow graph of the junction between the $k$th and $(k+1)$st tubes, with DELAY blocks of $\tau$ in each tube and junction gains $1 + r_k$, $1 - r_k$, $r_k$ and $-r_k$ linking $U_k^+(t)$, $U_k^+(t - \tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t - \tau)$, $U_k^-(t)$, $U_k^-(t + \tau)$, $U_{k+1}^-(t)$ and $U_{k+1}^-(t + \tau)$]

    $U_{k+1}^+(t) = (1 + r_k)\,U_k^+(t - \tau) + r_k\,U_{k+1}^-(t)$

    $U_k^-(t + \tau) = -r_k\,U_k^+(t - \tau) + (1 - r_k)\,U_{k+1}^-(t)$

    $r_k = \dfrac{A_{k+1} - A_k}{A_{k+1} + A_k}$, note $|r_k| \le 1$
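
The two update expressions can be checked numerically against the continuity constraints they were derived from. The sketch below computes $r_k$ from a list of areas and applies one junction update; the function names and the sample wave amplitudes are illustrative assumptions.

```python
def reflection_coefficients(areas):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each junction."""
    if any(a <= 0 for a in areas):
        raise ValueError("tube areas must be positive")
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


def junction_update(r_k, u_fwd_in, u_bwd_in):
    """Scatter the waves arriving at the junction between tubes k and k+1.

    u_fwd_in  is U_k^+(t - tau), the forward wave arriving from tube k.
    u_bwd_in  is U_{k+1}^-(t), the backward wave arriving from tube k+1.
    Returns (U_{k+1}^+(t), U_k^-(t + tau)).
    """
    u_fwd_out = (1.0 + r_k) * u_fwd_in + r_k * u_bwd_in
    u_bwd_out = -r_k * u_fwd_in + (1.0 - r_k) * u_bwd_in
    return u_fwd_out, u_bwd_out


if __name__ == "__main__":
    areas = [1.0, 7.0]             # assumed example: narrow tube into wide tube
    (r,) = reflection_coefficients(areas)
    fwd_out, bwd_out = junction_update(r, u_fwd_in=1.0, u_bwd_in=0.5)

    # Volume velocity on each side of the junction (forward minus backward).
    print(1.0 - bwd_out, fwd_out - 0.5)
    # Pressure on each side, up to the common factor rho * c.
    print((1.0 + bwd_out) / areas[0], (fwd_out + 0.5) / areas[1])
```

Both printed pairs should agree, which is the continuity of volume velocity and of pressure at the junction. Equal areas give $r_k = 0$, so the waves pass through unchanged.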

  • Digital Model of Multi-Tube Vocal Tract

    - Updates at tube boundaries occur synchronously every $2\tau$
    - If excitation is band-limited, inputs can be sampled every $T = 2\tau$
    - Each tube section has a delay of $z^{-1/2}$

    [Figure: z-domain signal flow graph of one junction, with $z^{-1/2}$ delays and gains $1 + r_k$, $1 - r_k$, $r_k$ and $-r_k$ linking $U_k^+(z)$, $U_{k+1}^+(z)$, $U_k^-(z)$ and $U_{k+1}^-(z)$]

    - The choice of $N$ depends on the sampling period $T$:

      $T = 2\tau = \dfrac{2\ell}{Nc} \implies N = \dfrac{2\ell}{cT}$

    - Series and shunt losses can also be introduced at tube junctions
      - Bandwidths are proportional to energy loss to storage ratio
      - Stored energy is proportional to tube length
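
The relation $N = 2\ell/(cT)$ ties the number of tube sections to the sampling rate. The sketch below evaluates it for a 17 cm tract with $c = 34{,}000$ cm/sec, the values used for the uniform tube earlier in the lecture; the 10 kHz sampling rate and the function names are assumptions chosen for illustration.

```python
def tubes_for_sampling_rate(length_cm: float, fs_hz: float,
                            c_cm_per_s: float = 34000.0) -> float:
    """N = 2 l / (c T) with sampling period T = 1 / fs."""
    return 2.0 * length_cm * fs_hz / c_cm_per_s


def section_delay_s(length_cm: float, n_tubes: float,
                    c_cm_per_s: float = 34000.0) -> float:
    """Propagation time through one section, tau = l / (N c)."""
    return length_cm / (n_tubes * c_cm_per_s)


if __name__ == "__main__":
    fs = 10000.0  # assumed sampling rate in Hz
    n = tubes_for_sampling_rate(17.0, fs)
    tau = section_delay_s(17.0, n)
    print(f"N = {n:.1f} sections, tau = {tau * 1e6:.1f} microseconds, "
          f"T = {1e6 / fs:.1f} microseconds")
```

For these values the model needs 10 sections, and each section delay is half a sampling period, consistent with $T = 2\tau$. In practice $N$ has to be an integer, so either the sampling rate or the assumed tract length is adjusted to make it one.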

  • Assignment 1

  • References

    - Zue, 6.345 Course Notes
    - Stevens, Acoustic Phonetics, MIT Press, 1998.
    - Rabiner & Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.

