COMFORT NOISE DETECTION AND GSM-FR-CODEC DETECTION FOR SPEECH-QUALITY EVALUATIONS IN TELEPHONE NETWORKS

Thorsten Ludwig

Institute for Circuits and Systems Theory, Faculty of Engineering, Christian-Albrechts University of Kiel, Kaiserstraße 2, 24143 Kiel, Germany
Phone: +49 (431) 880-6129, Fax: +49 (431) 880-6125
E-Mail: [email protected]

7th International Conference on Spoken Language Processing [ICSLP2002], Denver, Colorado, USA, September 16-20, 2002
ISCA Archive, http://www.isca-speech.org/archive

ABSTRACT

This paper proposes two algorithms to measure special quality-relevant characteristics of telephone links. The first algorithm presented here allows detection of the GSM-FR codec in transmission systems. For this purpose, the spectral region of the decoded signal around 2700 Hz is evaluated. The GSM-FR coding principle inserts a spectral attenuation in this frequency region that can be detected. The error rate is below 5%. The purpose of the second algorithm is to detect comfort noise in telephone connections. For this purpose, frequency points of the background-noise spectrum are sampled throughout the duration of speech utterances by making use of minimum statistics in frequency tracks of speech segments. These frequency points are compared to the noise in speech pauses in a statistical manner to evaluate differences and to decide whether comfort noise is present. The error rate for the database used is below 5%, but further investigations are necessary to verify this algorithm.

1. INTRODUCTION

An In-service, Non-intrusive Measurement Device (INMD) [1, 2, 3] extracts quality-defining parameters from an existing telephone link without disturbing the link. Such a device serves as a network monitor. A large number of these devices, placed at switches (on PCM-coded lines) throughout the network, can observe a multitude of telephone calls. A central evaluation gives evidence about the quality-of-service parameters of the network. Classical parameters to be measured are the noise level, the active speech level, and the echo loss and delay.

However, in modern networks these typical parameters are not sufficient for reliable statements about the speech quality, as there are additional disturbances, e.g. packet loss in IP-telephone services, frame loss in mobile communications, comfort noise inserted by DTX-Systems (Discontinuous Transmission), cascading of different digital transmission systems, and others. This paper proposes two algorithms to measure additional characteristics of telephone links for speech-quality evaluations. The first algorithm, described in section 2, allows the detection of the GSM-FR [4] codec in telephone links. This codec for mobile communication degrades the quality of speech more than most other standard codecs. Section 3 presents an algorithm for the detection of comfort noise that is inserted by DTX-Systems in speech pauses. If the inserted comfort noise is not matched to the real background noise, it is disturbing and degrades the quality of the telephone link. For both algorithms, results are presented within each section. The paper finishes with a conclusion.

2. DETECTION OF THE GSM-FR CODEC

[Figure 1: x-axis Frequency [kHz], 0 to 4; y-axis Power Spectrum Density, 30 to 70; curves "fixed-network connection signal" and "GSM-coded signal"]
Fig. 1. Power spectral density (PSD) of a solely fixed-network connection signal compared to the PSD of a mobile to fixed-network connection signal.

Fig. 1 shows the long-term power spectral densities of two signals, one from a pure fixed-network connection and one from a partly mobile (GSM-FR) connection. The
spectral reduction in the area around 2700 Hz is a feature of the GSM-FR transmission system. The residual signal after LPC- and LTP-analysis in the coder is band limited to 1333 Hz. The area around 0 Hz has low intensity because of the telephone bandpass filtering in the mobile before coding. The reconstruction technique in the decoder mirrors this spectral low-intensity part of the received residual signal to the area around 2700 Hz to reconstruct an excitation signal for the following synthesis filters. This intensity loss results in the spectral attenuation around 2700 Hz in the decoded signal and can be used to detect the GSM-FR transmission system. For this purpose, the signal x(k) is divided into overlapping signal blocks x_i(k) of length 16 ms (128 samples, fs = 8 kHz):

$$x_i(k) = x(k + i \cdot M), \quad k = 0 \ldots 2M - 1, \quad M = 64, \tag{1}$$

where i is the time index of the i-th block. Each speech-signal block, determined by a voice-activity detector (VAD) [5], is then windowed and transformed by a DFT to the spectral domain to determine the magnitude spectrum

$$X_i(\mu) = \left| \mathrm{DFT}\{ x(k + i \cdot M) \cdot w(k) \} \right|. \tag{2}$$
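The block segmentation of (1) and the magnitude spectrum of (2) can be illustrated with a minimal sketch. It is not the implementation used for the paper: the window w(k) is not named in the text, so a Hann window is assumed here, the VAD selection of speech blocks is left out, and the function names `frame_signal`, `hann` and `magnitude_spectrum` are hypothetical.

```python
import cmath
import math

def frame_signal(x, M=64):
    """Split x into 50%-overlapping blocks of length 2*M, following Eq. (1)."""
    n_blocks = (len(x) - 2 * M) // M + 1 if len(x) >= 2 * M else 0
    return [x[i * M : i * M + 2 * M] for i in range(n_blocks)]

def hann(n):
    """Hann window (assumption: the paper does not specify w(k))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def magnitude_spectrum(block, window):
    """|DFT{block * window}| for bins 0..N/2, following Eq. (2).

    A direct O(N^2) DFT is used to keep the sketch dependency-free.
    """
    n = len(block)
    xw = [b * w for b, w in zip(block, window)]
    return [
        abs(sum(xw[k] * cmath.exp(-2j * math.pi * mu * k / n) for k in range(n)))
        for mu in range(n // 2 + 1)
    ]

fs = 8000
# Test tone at 1000 Hz; with 128-point blocks the bin spacing is fs / 128 = 62.5 Hz.
x = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(512)]
blocks = frame_signal(x)
spectrum = magnitude_spectrum(blocks[0], hann(128))
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(len(blocks), peak_bin, peak_bin * fs / 128)
```

With M = 64 the block advance is 8 ms and each block covers 16 ms, so neighbouring blocks share half of their samples.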

The spectral minima in the area of interest are determined, and the evaluation of all spectral minima leads to their distribution for this spectral region (2400 Hz - 3000 Hz). Fig. 2 shows two examples, one for a connection with a GSM-FR codec involved and one without.

[Figure 2: two histograms; x-axis Frequency [Hz], 2400 to 3000; y-axis Frequency of Occurrence; left panel "GSM-FR connection", right panel "G.729 connection"]
Fig. 2. Distribution of spectral minima for a GSM to fixed-network connection (left) and a pure fixed-network connection (right).

Each distribution can be approximated by a polynomial of second order. For the GSM-FR transmission system the polynomial shows a typical downward-opened parabola with its maximum around 2700 Hz (Fig. 2, left side). For other transmission systems the maximum can be found at a higher frequency, as speech normally has low-pass character. This results in more spectral minima with increasing frequency and leads to an upward-opened parabola (Fig. 2, right side). So the quadratic coefficient of the polynomial indicates whether a GSM-FR transmission system is involved.
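The decision rule described above can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: exactly one minimum per block is counted (the lowest bin between 2400 Hz and 3000 Hz), the frequency axis is normalised before the fit, and the threshold value `THRESHOLD = 0.0` is only a placeholder. All function names are hypothetical, and the input spectra are synthetic.

```python
def minima_distribution(spectra, fs=8000, n_fft=128, f_lo=2400.0, f_hi=3000.0):
    """Relative frequency with which each bin in [f_lo, f_hi] holds a block's minimum."""
    bins = [mu for mu in range(n_fft // 2 + 1) if f_lo <= mu * fs / n_fft <= f_hi]
    counts = {mu: 0 for mu in bins}
    for spec in spectra:
        counts[min(bins, key=lambda mu: spec[mu])] += 1
    freqs = [mu * fs / n_fft for mu in bins]
    return freqs, [counts[mu] / len(spectra) for mu in bins]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a*x^2 + b*x + c via the normal equations (Cramer's rule)."""
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    A = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(A)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

def quadratic_coefficient(spectra):
    freqs, dist = minima_distribution(spectra)
    xs = [(f - 2700.0) / 300.0 for f in freqs]  # normalise the axis (assumption)
    return fit_quadratic(xs, dist)[0]

THRESHOLD = 0.0  # placeholder; the paper places the threshold slightly above zero

def is_gsm_fr(coefficient):
    return coefficient < THRESHOLD

def spectra_with_minima(min_bins, n_bins=65):
    """Synthetic spectra: flat, with a single zero at the requested bin."""
    return [[0.0 if mu == b else 1.0 for mu in range(n_bins)] for b in min_bins]

# Minima clustered around 2700 Hz (bins 43/44) versus minima rising towards 3000 Hz.
gsm_bins = [40] + [41] * 2 + [42] * 3 + [43] * 5 + [44] * 5 + [45] * 3 + [46] * 2 + [47]
other_bins = [43, 44] + [45] * 2 + [46] * 3 + [47] * 5 + [48] * 8
a_gsm = quadratic_coefficient(spectra_with_minima(gsm_bins))
a_other = quadratic_coefficient(spectra_with_minima(other_bins))
print(is_gsm_fr(a_gsm), is_gsm_fr(a_other))
```

A distribution that peaks inside the band yields a negative quadratic coefficient, while a distribution that keeps rising towards 3000 Hz yields a positive one, which is the property the detector relies on.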

For the test of the GSM-FR detection, a database with approximately 2000 short signals with all kinds of background noise was used. The database contains data from 7 different sources, where we also considered real transmission scenarios. The range of active speech within each signal was from 4 to 16 seconds, which corresponds to only 60 - 240 overlapping speech-signal blocks for the evaluation of the spectral minima distributions (Fig. 2).

[Figure 3: x-axis Value of quadratic coefficient, -0.01 to 0.01; y-axis Frequency of Occurrence, 0 to 0.12]
Fig. 3. Normalized distributions of the quadratic coefficients, separated for GSM-FR involved (dashed-dotted line) and all other telephone connections (solid line) for the whole database, and approximations by Gaussian distributions.

Fig. 3 shows the distributions of the quadratic coefficients for the whole database, separated for GSM-FR involved and all other telephone connections. Both distributions are approximated by Gaussian distributions. An optimal decision threshold can be found at the point of intersection, slightly above zero. With this threshold the overall error rate for the GSM-FR coder detection is about 4%. Most errors occur in case of additional complex coders following the GSM-FR transmission system.

Connection         No. of Signals    False decision
GSM-FR involved    1030              3.2%
other              881               4.0%

3. DETECTION OF COMFORT NOISE INSERTED BY DTX-SYSTEMS

DTX-Systems are placed into networks or mobiles for different reasons. They detect speech pauses in signals by means of a VAD and interrupt the transmission of the signal for the duration of the speech pauses. Thereby, bandwidth is saved and the capacity of a transmission channel is increased. In mobile communication the main advantage is saving energy in the mobile. As the interruption of the signal can be very
disturbing, most modern DTX-Systems insert comfort noise in speech pauses [6]. The comfort noise describes the real background noise with only a few parameters compared to normal coding. So even fewer parameters are used for the description of the background noise, and they are transmitted less often. If the comfort noise does not match the real background noise or the VAD does not work properly, resulting in frequent switching points, the subscribers are heavily disturbed by this mismatch or they might even think that the link is interrupted. As a result, the overall quality of the telephone link degrades.

One possible approach to detect comfort noise is to search for the points in time when the coder switches between signal transmission and comfort noise insertion. Another approach, which is used here, compares the structure of the background noise during speech pauses with the structure of the background noise during speech. To isolate background noise during speech, the following algorithm is used: The first step is to separate speech and pause segments of the examined signal by means of a voice-activity detector (e.g. GSM VAD). After excluding uncertain segments and introducing safety margins before and after speech segments, both parts are transformed block by block with a DFT to the spectral domain and form two matrices M_s(µ, i) (speech-signal blocks) and M_p(µ, i) (pause blocks), where µ corresponds to the frequency bins and i to the time. Within each matrix, time regions that do not correspond to speech (M_s(µ, i)) or noise (M_p(µ, i)), respectively, are set to zero.
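The separation into the two matrices can be sketched as below, assuming that block-wise magnitude spectra and a per-block VAD decision are already available. The VAD itself is not modelled, the width of the safety margin (`margin`, in blocks) is a hypothetical parameter, and the matrices are stored as lists indexed [i][µ] rather than (µ, i).

```python
def split_speech_pause(spectra, vad, margin=2):
    """Return (Ms, Mp): copies of the spectrogram with non-speech / non-pause blocks zeroed.

    spectra: list of magnitude spectra, one per block (time index i)
    vad:     list of booleans, True where the VAD reports speech
    margin:  number of blocks on each side of a transition treated as uncertain
    """
    n = len(spectra)
    uncertain = [False] * n
    for i in range(1, n):
        if vad[i] != vad[i - 1]:
            # Safety margin around every speech/pause transition.
            for j in range(max(0, i - margin), min(n, i + margin)):
                uncertain[j] = True
    zero = [0.0] * len(spectra[0])
    Ms = [list(s) if v and not u else list(zero)
          for s, v, u in zip(spectra, vad, uncertain)]
    Mp = [list(s) if not v and not u else list(zero)
          for s, v, u in zip(spectra, vad, uncertain)]
    return Ms, Mp

# Ten blocks with three frequency bins each; speech starts at block 4.
spectra = [[1.0, 1.0, 1.0] for _ in range(10)]
vad = [False] * 4 + [True] * 6
Ms, Mp = split_speech_pause(spectra, vad, margin=1)
speech_blocks = [i for i, row in enumerate(Ms) if any(row)]
pause_blocks = [i for i, row in enumerate(Mp) if any(row)]
print(speech_blocks, pause_blocks)
```

Blocks inside the margin appear in neither matrix, which mirrors the exclusion of uncertain segments described above.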

[Figure 4: three spectrogram panels; x-axis Time [sec], 30 to 50; y-axis Frequency [kHz], 0 to 4]
Fig. 4. Upper: Signal clip. Middle: Isolated background noise frequency points during speech (M_s(µ, i)). Lower: Pause segments (M_p(µ, i)).

To isolate background noise during speech it is assumed that a defined part (namely, the spectral minima) of the spectral points in each speech region belongs to the background noise. Additionally, it is assumed that a defined part of the spectral points in each frequency bin µ belongs to the background noise, to ensure that spectral points are localized in each frequency bin. In pause segments only 1% of the spectral peaks (spectral maxima) are dropped to exclude transient effects in the background noise. The dropped or excluded values are set to zero in both matrices M_s(µ, i) and M_p(µ, i). Fig. 4 shows an example. The upper plot shows a clip of the spectrum of the signal under test. In the middle, the determined speech segments are visualized with the high-intensity spectral points of the pitch contour dropped. The remaining points are assumed to belong to the real background noise and are the non-zero values of the matrix M_s(µ, i). The lower picture shows the determined pause segments (M_p(µ, i)). Also visible are the safety margins at the beginning and the end of each speech segment and some excluded areas to ensure that there is no false decision for the voice activity.

In a next step, 2D-histograms H_s(µ, γ) and H_p(µ, γ) of the residual spectral frequency points, separated for speech and pause segments, are generated over frequency µ and magnitude γ. The division of the magnitude axis is chosen signal dependent. Fig. 5 shows an example for a signal without comfort noise. Both pictures show similarities with regard to the distribution of the spectral points. Figs. 6 and 7 show examples for signals with inserted comfort noise in pause segments. In Fig. 6 the real background noise was car noise, in Fig. 7 it was office noise. There are still similarities in some regions of the 2D-histograms, but overall the differences prevail.

[Figure 5: two 2D-histograms; axes Frequency, Magnitude, Normalized No. of Occurrence (x 10^-3); panels "2D-Histogram Speech Segments" and "2D-Histogram Pause Segments (no Comfort Noise)"]
Fig. 5. Upper: 2D-histogram of spectral frequency points during speech segments. Lower: 2D-histogram of spectral frequency points during pause segments (no comfort noise).

To find a mathematical expression, the correlation ρ_k between N chosen magnitude bins γ_k is
calculated and averaged:

$$\rho = \frac{1}{N} \sum_{k} \rho_k = \frac{1}{N} \sum_{k} \mathrm{corr}\{ H_s(\mu, \gamma_k),\, H_p(\mu, \gamma_k) \}. \tag{3}$$

Only those magnitude bins are considered in (3) where the sum Σ_µ H_p(µ, γ_k) exceeds a predefined value, to ensure that there is a relevant number of spectral points in the magnitude bin. A low correlation ρ < 0.6 then indicates comfort noise. For a database with 106 signals (90 signals with comfort noise, 16 without) and all kinds of background noise (clear, office, car, street, Hoth), there were only 3 wrong decisions (3%). For a correlation threshold of ρ < 0.55, there is only one additional error.

[Figure 6: two 2D-histograms; axes Frequency, Magnitude, Normalized No. of Occurrence; panels "2D-Histogram Speech Segments (Car Noise)" and "2D-Histogram Pause Segments (Comfort Noise)"]
Fig. 6. Upper: 2D-histogram of spectral frequency points during speech segments. Lower: 2D-histogram of spectral frequency points during pause segments (car noise ↔ comfort noise).

[Figure 7: two 2D-histograms; axes Frequency, Magnitude, Normalized No. of Occurrence; panels "2D-Histogram Speech Segments (Office Noise)" and "2D-Histogram Pause Segments (Comfort Noise)"]
Fig. 7. Upper: 2D-histogram of spectral frequency points during speech segments. Lower: 2D-histogram of spectral frequency points during pause segments (office noise ↔ comfort noise).

4. CONCLUSION

In this paper, two algorithms to measure additional characteristics of telephone links are proposed. The first one makes use of a spectral attenuation introduced by the GSM-FR codec to detect this codec in a transmission system and works with high accuracy. The decision can be used directly in models to estimate the perceived quality of the telephone link. The second one is a method to detect comfort noise in telephone links. For this purpose, spectral points during speech which belong to the background noise are localized. These frequency points are compared to the noise in speech pauses in a statistical manner to decide whether comfort noise is present. For the tested database, the results with error rates below 4% are very good. Nevertheless, a larger database is necessary to verify the algorithm. For quality evaluations an additional step is necessary that judges the degree of the perceived degradation caused by the inserted comfort noise in comparison to the real background noise.

5. ACKNOWLEDGMENT

The author would like to thank T-Nova Innovationsgesellschaft mbH, Berlin, for the support of this research work.

6. REFERENCES

[1] ITU-T Recommendation P.561, In-Service, Non-Intrusive Measurement Device – Voice Service Measurements.

[2] ITU-T Recommendation P.562, Analysis and Interpretation of INMD Voice-Service Measurements.

[3] T. Gansler, G. Salomonsson, Nonintrusive Measurements of the Telephone Channel, IEEE Transactions on Communications, Vol. 47, No. 1, January 1999.

[4] ETSI, European digital cellular telecommunications system (Phase 2); Full rate speech processing functions (GSM 06.01).

[5] ETSI, European digital cellular telecommunications system (Phase 2); Discontinuous Transmission (DTX) for full rate speech traffic channel (GSM 06.31).

[6] ETSI, European digital cellular telecommunications system (Phase 2); Comfort noise aspects for full rate speech traffic channels (GSM 06.12).

