  • 8/4/2019 Presents Ti On

    1/12

    STQ Workshop, Sophia-Antipolis, February 11th, 2003

    Packet loss concealment using audio morphing

    Franck Bouteille, Pascal Scalart, Balazs Kövesi

    PRESCOM SA, Lannion, FRANCE

    France Telecom R&D, Lannion, FRANCE


    Motivation

    In packet data networks, excess traffic leads to delays or losses in the delivery of information. In voice communication, long delays are intolerable, and network delay budgets strongly influence the design of packet voice systems.

    Several techniques have been developed to increase the tolerance of packet voice systems to lost packets. However, those techniques are not suited to long loss periods (>15 ms), because the speech signal is not stationary over the long term.

    These techniques do not use the a posteriori information carried by the next packet, whose arrival reveals that one or several frames have been lost. This a posteriori information is generally available thanks to playout buffer management and the real-time network protocol.

    The proposed technique uses knowledge of the frame received after the last lost one, the models of the last received frames, and model interpolation to synthesize the missing signal.


    Morphing audio principle: context of the loss

    [Figure: the missing signal lies between the previous frame (Frame A) and the next frame (Frame B).]

    Voiced/unvoiced strategy

    The pitch is estimated on Frame A (P0) and on Frame B (P1), within the range

    $$2.5\ \text{ms}\ (400\ \text{Hz}) \le P_0, P_1 \le 15\ \text{ms}\ (67\ \text{Hz})$$

    Frame A    Frame B    P0, P1
    V          V          P0, P1
    V          UV         P0, P1 = P0
    UV         V          P0 = P1, P1
    UV         UV         Unvoiced signal

    When the missing signal is classified as unvoiced, Frame A is copied into the missing signal or comfort noise is generated.
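The voiced/unvoiced strategy can be illustrated with a minimal Python sketch. It assumes pitch periods expressed in samples at 8 kHz (so the 2.5 ms to 15 ms range becomes 20 to 120 samples) and uses `None` to mark an unvoiced frame. The function name `select_pitches`, the constants, and the reading of the two mixed cases (the pitch of the voiced frame is reused for the unvoiced side) are assumptions of this sketch, not details confirmed by the slides.

```python
from typing import Optional, Tuple

SAMPLE_RATE_HZ = 8000   # assumption: 8 kHz narrowband speech
MIN_PITCH = 20          # 2.5 ms (400 Hz) in samples at 8 kHz
MAX_PITCH = 120         # 15 ms (67 Hz) in samples at 8 kHz


def select_pitches(p0: Optional[int], p1: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return the (P0, P1) pair used for morphing, or None if unvoiced.

    p0 and p1 are pitch periods in samples; None marks an unvoiced frame.
    """
    for p in (p0, p1):
        if p is not None and not MIN_PITCH <= p <= MAX_PITCH:
            raise ValueError(f"pitch period {p} outside [{MIN_PITCH}, {MAX_PITCH}]")
    if p0 is None and p1 is None:
        return None            # UV/UV: missing signal treated as unvoiced
    if p0 is None:
        return (p1, p1)        # UV/V: P0 = P1
    if p1 is None:
        return (p0, p0)        # V/UV: P1 = P0
    return (p0, p1)            # V/V: both estimates are kept
```

A `None` result corresponds to the unvoiced branch, where Frame A is copied or comfort noise is generated instead of morphing.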


    Morphing audio principle: modelling and interpolation

    P0 and P1 are used to estimate the number of intermediate blocks needed (NbBloc) and the size of these blocks (SizeBloc):

    $$\text{SizeBloc} = \max(P_0, P_1), \qquad \text{NbBloc} = \operatorname{round}\!\left(\frac{\text{NbSampleLoss}}{\text{SizeBloc}}\right)$$

    We model the last pitch period vector (X0) of Frame A, giving ModP0, and the first pitch period vector (X1) of Frame B, giving ModP1. A DCT (Discrete Cosine Transform) is used to model X0 and X1. The resolution is 120 points at a sampling frequency of 8 kHz.

    Intermediate blocks, Block_i, transform the model vector ModP0 continuously into the model vector ModP1 by linear interpolation of the model parameters:

    $$\text{Block}_i(n) = \text{IDCT}_{120}\!\left[\text{ModP0}(k) + i \cdot \frac{\text{ModP1}(k) - \text{ModP0}(k)}{\text{NbBloc}}\right], \quad 0 \le k \le 120 - 1,\ 0 \le i \le \text{NbBloc} - 1,\ 0 \le n \le \text{SizeBloc} - 1$$

    IDCT: Inverse Discrete Cosine Transform.
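The block layout and the model interpolation can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the orthonormal DCT-II/DCT-III pair, the zero-padding of each pitch period to 120 points, the interpolation weight `i / NbBloc`, truncation of each block to SizeBloc samples, and the helper names (`dct`, `idct`, `block_layout`, `morph_blocks`) are all choices made for this sketch.

```python
import math

N_DCT = 120  # model resolution from the slide: 120 points (15 ms at 8 kHz)


def dct(x):
    """Orthonormal DCT-II of the sequence x."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coeffs.append(scale * sum(
            x[m] * math.cos(math.pi * (m + 0.5) * k / n) for m in range(n)))
    return coeffs


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    samples = []
    for m in range(n):
        samples.append(sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
            * math.cos(math.pi * (m + 0.5) * k / n) for k in range(n)))
    return samples


def block_layout(p0, p1, nb_sample_loss):
    """SizeBloc = max(P0, P1); NbBloc = round(NbSampleLoss / SizeBloc)."""
    size_bloc = max(p0, p1)
    # floor(x + 0.5) rounds halves up; forcing at least one block is an assumption.
    nb_bloc = max(1, math.floor(nb_sample_loss / size_bloc + 0.5))
    return size_bloc, nb_bloc


def morph_blocks(x0, x1, nb_sample_loss):
    """Interpolate from the model of x0 to the model of x1, block by block."""
    size_bloc, nb_bloc = block_layout(len(x0), len(x1), nb_sample_loss)
    # Assumption: each pitch period is zero-padded to N_DCT before the DCT.
    mod_p0 = dct(list(x0) + [0.0] * (N_DCT - len(x0)))
    mod_p1 = dct(list(x1) + [0.0] * (N_DCT - len(x1)))
    blocks = []
    for i in range(nb_bloc):
        w = i / nb_bloc  # interpolation weight: 0 for the first block
        coeffs = [a + w * (b - a) for a, b in zip(mod_p0, mod_p1)]
        blocks.append(idct(coeffs)[:size_bloc])
    return blocks


# Usage: 30 ms lost at 8 kHz (240 samples), P0 = 40 and P1 = 50 samples.
x0 = [math.sin(2 * math.pi * n / 40) for n in range(40)]
x1 = [math.sin(2 * math.pi * n / 50) for n in range(50)]
blocks = morph_blocks(x0, x1, 240)
```

In this example SizeBloc is 50 and NbBloc is 5, so the blocks cover 250 samples for a 240-sample gap; how the remainder is trimmed or absorbed is not specified in the slides.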


    Morphing audio principle: block concatenation and smoothing

    Each block is then copied into the synthesis frame.

    [Figure: the synthesis frame, made of Block_0, Block_1, ..., Block_i, ..., Block_{NbBloc-1}, lies between Frame A and Frame B; smoothing is applied at each junction.]

    Smoothing between blocks is performed according to:

    $$x(0) = \alpha(0)\, x(-1) + (1 - \alpha(0))\, y(0)$$

    $$x(i) = \alpha(i)\, x(i-1) + (1 - \alpha(i))\, y(i), \qquad 0 \le i \le \text{NbPSmoothing} - 1$$

    where x(-1) is the last sample of the previous block (or frame), y(0) is the first sample of the current block (or frame), and α(i) is the smoothing factor:

    $$\alpha(i) = 1 - \frac{i + 1}{\text{NbPSmoothing}}$$
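A minimal Python sketch of the concatenation and junction smoothing is given below. It implements the recursion x(i) = α(i)·x(i-1) + (1 - α(i))·y(i) over the first NbPSmoothing samples of each block. The linear ramp used for α(i), and the names `smooth_junction` and `concatenate_blocks`, are assumptions of this sketch rather than details confirmed by the slides.

```python
def smooth_junction(prev_last, block, nb_p_smoothing):
    """Smooth the first samples of `block` towards the previous signal.

    Applies x(i) = a(i) * x(i-1) + (1 - a(i)) * y(i) for
    0 <= i < nb_p_smoothing, with x(-1) = prev_last.
    """
    out = list(block)
    x_prev = prev_last
    for i in range(min(nb_p_smoothing, len(out))):
        # Assumed ramp: a(i) falls linearly and reaches 0 on the last
        # smoothed sample, so the output rejoins the block exactly.
        alpha = 1.0 - (i + 1) / nb_p_smoothing
        out[i] = alpha * x_prev + (1.0 - alpha) * out[i]
        x_prev = out[i]
    return out


def concatenate_blocks(frame_a_last, blocks, nb_p_smoothing):
    """Copy each non-empty block into the synthesis frame, smoothing every junction."""
    synthesis = []
    prev_last = frame_a_last
    for block in blocks:
        smoothed = smooth_junction(prev_last, block, nb_p_smoothing)
        synthesis.extend(smoothed)
        prev_last = smoothed[-1]
    return synthesis
```

The same `smooth_junction` call can be applied to the start of Frame B, using the last sample of the synthesis frame as `prev_last`, since the slide defines y as the current block or frame.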


    Morphing audio principle: some results of concealed signal

    [Figure: original frame and concealed frame, each plotted against sample number.]

    Case of voiced frames of a female speech signal (30 ms of missing signal).


    Morphing audio principle: some results of concealed signal

    Behaviour of the morphing technique during a transition frame (30 ms) for a male speech signal.

    [Figure: original frame and concealed frame, each plotted against sample number.]

    Note that the concealed speech-to-noise transition is more voiced than in the original frame. In an enhanced morphing technique, the voiced duration could be controlled.


    Comparisons and performances

    Configuration

    Two speech coders (G.711 and G.723.1) were tested independently; the frame size is 30 ms.

    Five concealment techniques were compared: Previous Frame Copy (PFC), Double Sided Periodic Substitution (DSPS) [1], the ITU-T recommended technique defined for each specific coder (G.711 and G.723.1), the GFEC technique [2], and audio morphing.

    Two loss rates were tested: 5% and 10%. Losses can occur in bursts, but are usually isolated.

    Fifteen sentences were used (8 female and 7 male speech files).

    Ten subjects took part in an informal test: they were asked to listen to coded speech signals that had been corrected by the different concealment techniques.

    [1] J. Tang, "Evaluation of Double Sided Periodic Substitution (DSPS) Method for Recovering Missing Speech in Packet Voice Communications," IEEE Computers and Communications, pp. 454-458, 1991.

    [2] B. Kövesi, D. Massaloux, "Method of Packet Errors Cancellation Suitable for any Speech and Sound Compression Scheme," ETSI STQ Workshop, February 2003, Sophia-Antipolis.


    Comparisons and performances: results for the G.711 codec

    [Bar chart: score (/15) at loss rates of 5% and 10% for PFC, FEC G.711, DSPS, GFEC and Morphing; the score axis runs from 0 to 7.]


    Comparisons and performances: results for the G.723.1 codec

    [Bar chart: score (/15) at loss rates of 5% and 10% for PFC, FEC G.723.1, DSPS, GFEC and Morphing; the score axis runs from 0 to 7.]


    Conclusion

    The proposed technique improves the quality of frame correction at high loss rates (5% and 10%);

    Audio morphing adds latency (Frame B is required), but this is acceptable for VoIP applications;

    Other models are possible, and the voicing condition can be controlled to improve restitution quality.

