Speaker Identification based on the statistical analysis of F0

Date post: 07-Jan-2016
Page 1: Speaker Identification based on the statistical analysis of F0

Speech Technology Center

Speaker Identification based on the statistical analysis of F0

Pavel Labutin, Sergey Koval, Andrey Raev

St. Petersburg, Russia

[email protected]

Page 2: Speaker Identification based on the statistical analysis of F0

24.07.2007 www.speechpro.com

Report overview

The problem of F0 usage in forensic speaker identification

Main challenges

Proposed method

Results

Conclusion

Page 3: Speaker Identification based on the statistical analysis of F0

The problem of F0 usage in forensic speaker identification

F0 analysis is an obligatory stage in forensic speaker identification.

Remedial legislation demands that the forensic investigation of speech evidence be comprehensive. Because pitch reflects important properties of the human voice, it must be investigated in the forensic examination of a speech recording.

Typical F0 usage in speaker identification:

Automatic F0 detection

Some data smoothing

Simple comparison of F0 statistics
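The typical pipeline listed above can be sketched in a few lines. This is only an illustration, not the authors' implementation: it assumes F0 has already been detected as a list of per-frame values in Hz, and the names `median_smooth` and `mean_f0_difference` as well as the window width are assumptions made for this example.

```python
from statistics import mean, median


def median_smooth(f0_hz, width=5):
    """Running median over an F0 track (Hz); suppresses isolated outliers."""
    half = width // 2
    return [median(f0_hz[max(0, i - half): i + half + 1])
            for i in range(len(f0_hz))]


def mean_f0_difference(track_a, track_b):
    """Absolute difference (Hz) between the mean F0 of two smoothed tracks."""
    return abs(mean(median_smooth(track_a)) - mean(median_smooth(track_b)))


# Toy tracks (hypothetical values); track_a contains one octave-jump outlier.
track_a = [120, 122, 240, 121, 119, 123, 120]
track_b = [135, 137, 136, 134, 138, 135, 136]
difference_hz = mean_f0_difference(track_a, track_b)
```

A comparison of this kind relies on a single number per recording, which is the limitation the following slides address.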

Page 4: Speaker Identification based on the statistical analysis of F0

Main challenges in F0 usage for forensic speaker identification

Fig.1. F0 curve for a telephone conversation of the suspect. At the 15th second he received important information: average F0 rose by 70 Hz. Vertical axis: frequency (Hz); horizontal axis: time (sec).

Low speech quality in real police recordings

Typically SNR < 15 dB

Frequency range is limited

Speech signal distortions (compression, nonlinear frequency response (FR) of channel equipment, tape recorders, etc.)

High intra-speaker F0 variability

Strong dependence of F0 statistics on the speaker's state and style of speech
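To put the SNR figure above in perspective, the decibel value can be converted to a power ratio. The helper names below are assumptions for this example; the formula is the standard definition of SNR in decibels.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two mean-power values."""
    return 10.0 * math.log10(signal_power / noise_power)


def power_ratio(snr):
    """Inverse mapping: signal-to-noise power ratio for an SNR given in dB."""
    return 10.0 ** (snr / 10.0)


# An SNR of 15 dB means the speech power is only about 32 times the noise power.
ratio_at_15_db = power_ratio(15.0)
```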

Page 5: Speaker Identification based on the statistical analysis of F0

The method discussed

Three stages:

1. Reliable F0 detection

2. F0 detection control and correction

3. Analysis and comparison of F0 statistics.

F0 detection algorithm:

two-pass method;

uses summation of multiple harmonics in the spectral domain;

noise cancellation and adaptation for speech signals of very low quality;

good results in field applications;

implemented in expert software (SIS) and used for real forensic examinations.
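The idea of summing multiple harmonics in the spectral domain can be illustrated with a generic single-frame harmonic-summation estimator. This is a minimal sketch under stated assumptions, not the two-pass SIS algorithm: the function name `harmonic_sum_f0`, the Hann window, the 1/h harmonic weighting, the search range and the step are all choices made for this example, and noise cancellation is not covered.

```python
import cmath
import math


def harmonic_sum_f0(frame, fs, f_min=60.0, f_max=400.0, step=1.0, n_harm=5):
    """Return the candidate F0 (Hz) with the largest weighted harmonic sum."""
    n = len(frame)
    # A Hann window limits leakage between neighbouring harmonics.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]

    def magnitude(freq):
        # Spectral magnitude at one arbitrary frequency (direct DFT sum).
        w = -2j * math.pi * freq / fs
        return abs(sum(x * cmath.exp(w * i) for i, x in enumerate(windowed)))

    best_f0, best_score = f_min, float("-inf")
    for k in range(int(round((f_max - f_min) / step)) + 1):
        f0 = f_min + k * step
        # Sum the harmonics below the Nyquist frequency; the 1/h weighting
        # penalises candidates that only match through high-order harmonics.
        score = sum(magnitude(h * f0) / h
                    for h in range(1, n_harm + 1) if h * f0 < fs / 2)
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0


# Synthetic 40 ms frame at 8 kHz with a 150 Hz fundamental (illustrative only).
fs = 8000
frame = [sum(math.sin(2 * math.pi * h * 150.0 * i / fs) / h for h in range(1, 5))
         for i in range(320)]
estimate_hz = harmonic_sum_f0(frame, fs)
```

Running such an estimator frame by frame gives a raw F0 curve; on noisy recordings it still needs the control and correction stage described on the following slides.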

Fig.2. Waveform (upper window) and F0 curve (thin yellow curve) superimposed on the cepstrogram (bottom window). In the cepstrogram [7], the shading intensity corresponds to the degree of signal periodicity at the given frequency and time. Vertical axis: frequency (Hz); horizontal axis: time (sec).

Page 6: Speaker Identification based on the statistical analysis of F0

F0 detection accuracy control and correction

Fig.3. Waveform (upper window) and F0 curve (thin yellow line in the bottom window). Without further information, the correspondence between the real F0 and the calculated curve is unknown and uncontrolled. Vertical axis: frequency (Hz); horizontal axis: time (sec).

Page 7: Speaker Identification based on the statistical analysis of F0

F0 detection accuracy control and correction

Fig.4. Waveform (upper window), cepstrogram (signal periodicity function, middle window) and F0 curve (thin yellow curve) superimposed on the cepstrogram (bottom window). In the cepstrogram [7], the shading intensity corresponds to the degree of signal periodicity at the given frequency and time. Vertical axis: frequency (Hz); horizontal axis: time (sec).

Page 8: Speaker Identification based on the statistical analysis of F0

F0 detection accuracy control and correction

Fig.5. Waveform (upper window), initially detected F0 curve (yellow curve) superimposed on the cepstrogram (middle window), and the F0 curve graphically corrected by the expert, superimposed on the cepstrogram (bottom window). In the cepstrogram [7], the shading intensity corresponds to the degree of signal periodicity at the given frequency and time. Vertical axis: frequency (Hz); horizontal axis: time (sec).

Page 9: Speaker Identification based on the statistical analysis of F0

Statistical F0 features used

Pitch values are transformed to a logarithmic scale, and then the statistical pitch features are calculated.

The typical set of statistical parameters:

Average value, Hz;

Maximum, Hz;

Minimum, Hz;

Maximum -3%, Hz;*

Minimum +1%, Hz;

Median, Hz;

Percent of areas with rising pitch, %;*

Pitch logarithm variation;*

Pitch logarithm distribution asymmetry;*

Pitch logarithm distribution excess;

Average velocity of pitch change, %/sec;

Pitch logarithm variation derivative;

Pitch logarithm derivative distribution asymmetry;

Pitch logarithm derivative distribution excess;

Average velocity of pitch rise, %/sec;*

Average velocity of pitch fall, %/sec.

*The asterisk indicates the statistical features that are weighted more heavily in the general metric for speaker identification.
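Several of the parameters listed above can be computed directly from an F0 track. The sketch below is an interpretation, not the SIS definitions: it assumes a list of F0 values in Hz for consecutive voiced frames with a fixed frame step, takes "variation" to mean the standard deviation of log F0, "asymmetry" and "excess" to mean skewness and excess kurtosis, and "percent of areas with rising pitch" to mean the share of frame transitions with rising F0. The "Maximum -3%" and "Minimum +1%" features are left out because their exact definition is not given here.

```python
import math
from statistics import mean, median, pstdev


def f0_features(f0_hz, frame_step_s=0.01):
    """A few log-scale F0 statistics from voiced-frame F0 values (Hz)."""
    logs = [math.log(f) for f in f0_hz]
    mu, sigma = mean(logs), pstdev(logs)
    if sigma > 0:
        z = [(x - mu) / sigma for x in logs]
        asymmetry = mean(v ** 3 for v in z)      # skewness of log F0
        excess = mean(v ** 4 for v in z) - 3.0   # excess kurtosis of log F0
    else:
        asymmetry = excess = 0.0
    # Frame-to-frame relative change, in percent per second.
    rates = [100.0 * (b - a) / a / frame_step_s
             for a, b in zip(f0_hz, f0_hz[1:])]
    rising = [r for r in rates if r > 0]
    falling = [-r for r in rates if r < 0]
    return {
        "average_hz": mean(f0_hz),
        "median_hz": median(f0_hz),
        "max_hz": max(f0_hz),
        "min_hz": min(f0_hz),
        "percent_rising": 100.0 * len(rising) / len(rates),
        "log_variation": sigma,
        "log_asymmetry": asymmetry,
        "log_excess": excess,
        "avg_change_rate": mean(abs(r) for r in rates),
        "avg_rise_rate": mean(rising) if rising else 0.0,
        "avg_fall_rate": mean(falling) if falling else 0.0,
    }


# Hypothetical five-frame track, 10 ms frame step.
features = f0_features([100, 110, 121, 110, 100])
```

The function expects at least two frames; a real track would also need unvoiced gaps excluded before the frame-to-frame rates are computed.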

Page 10: Speaker Identification based on the statistical analysis of F0

General identification metric

The deviation of every statistical parameter was calculated for every pair of files in the corpus.

The distributions of the deviations were built for "same–different" (different-speaker) pairs and "same–same" (same-speaker) pairs.

The False Acceptance (FA) and False Rejection (FR) functions and the EER (Equal Error Rate) were calculated for every statistical parameter.
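The FA and FR functions and the EER for one parameter can be computed from the two deviation lists as follows. This is a sketch under assumptions: a pair is accepted as "same speaker" when its deviation does not exceed the threshold, the EER is read off at the threshold where FA and FR are closest, and the function names are illustrative.

```python
def fa_fr(same_devs, diff_devs, threshold):
    """FA and FR rates at a threshold (accept when deviation <= threshold)."""
    fr = sum(d > threshold for d in same_devs) / len(same_devs)
    fa = sum(d <= threshold for d in diff_devs) / len(diff_devs)
    return fa, fr


def equal_error_rate(same_devs, diff_devs):
    """Return (EER, threshold) at the point where FA and FR are closest."""
    best = None
    for t in sorted(set(same_devs) | set(diff_devs)):
        fa, fr = fa_fr(same_devs, diff_devs, t)
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            best = (gap, (fa + fr) / 2, t)
    return best[1], best[2]


# Hypothetical deviations for same-speaker and different-speaker pairs.
same = [0.1, 0.2, 0.3, 0.6]
diff = [0.4, 0.5, 0.7, 0.9]
eer, threshold = equal_error_rate(same, diff)
```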

The general identification metric was constructed as a weighted sum of the separate statistical parameters.

The weights were selected to minimize the EER for the given speech database.

For the general weighted metric, the FR and FA curves and the EER were calculated.
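The weighted-sum metric and the weight selection can be illustrated with a toy grid search over two parameters. The optimisation procedure actually used is not described here, so everything below is an assumption made for the example: the function names, the two-feature restriction, the grid step and the toy data.

```python
def equal_error_rate(same_scores, diff_scores):
    """EER at the threshold where FA and FR are closest (accept if score <= t)."""
    best_gap, best_eer = None, None
    for t in sorted(set(same_scores) | set(diff_scores)):
        fr = sum(s > t for s in same_scores) / len(same_scores)
        fa = sum(d <= t for d in diff_scores) / len(diff_scores)
        if best_gap is None or abs(fa - fr) < best_gap:
            best_gap, best_eer = abs(fa - fr), (fa + fr) / 2
    return best_eer


def weighted_metric(deviations, weights):
    """General metric: weighted sum of per-parameter deviations."""
    return sum(w * d for w, d in zip(weights, deviations))


def best_two_feature_weights(same_pairs, diff_pairs, steps=10):
    """Grid search over (w, 1 - w); returns (lowest EER, weights)."""
    best = None
    for k in range(steps + 1):
        weights = (k / steps, 1.0 - k / steps)
        eer = equal_error_rate(
            [weighted_metric(p, weights) for p in same_pairs],
            [weighted_metric(p, weights) for p in diff_pairs])
        if best is None or eer < best[0]:
            best = (eer, weights)
    return best


# Toy data: the first parameter separates speakers, the second does not.
same_pairs = [(0.1, 0.9), (0.2, 0.1), (0.3, 0.8)]
diff_pairs = [(0.7, 0.2), (0.8, 0.9), (0.9, 0.1)]
best_eer, best_weights = best_two_feature_weights(same_pairs, diff_pairs)
```

With the full parameter set, an exhaustive grid becomes impractical and some other search over the weight vector would be needed; weights tuned this way are also specific to the database they were fitted on.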

Page 11: Speaker Identification based on the statistical analysis of F0

Speech database used for training and testing

A speaker identification algorithm was developed and trained using the STC corpus RUSTEN.

RUSTEN includes:

126 speakers (67 women and 59 men) in 5 sessions over 5 different analog telephone lines (including public telephones on noisy streets and in underground stations), real spontaneous dialogs

and 130 speakers (61 women and 69 men) in 2–10 sessions over different digital telephone lines, about 1000 files of high-quality digital phone channel conversations.

RUSTEN: Russian Switched Telephone Network speech database (STC), 2003. S0050, ELDA - Evaluations and Language resources Distribution Agency.

Page 12: Speaker Identification based on the statistical analysis of F0

An example of F0 feature detection in SIS software

Fig.6. An example of a working window of the SIS software showing the results of the F0 statistics comparison for two speakers.

Such screenshots are typically inserted into the expert examination conclusion to illustrate F0 statistical analysis results.  

Page 13: Speaker Identification based on the statistical analysis of F0

Pitch of two files with different average values

Fig.7. Cepstrograms of two compared speech files: the same speaker with different styles of speech. According to the pitch statistical analysis the speakers are the same, although the average pitch values differ significantly: 154 Hz and 135 Hz, respectively.
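The size of the gap between the two averages in Fig.7 is easier to judge on a logarithmic scale. The helper below uses the standard semitone formula; its name and the choice of semitones and percent as units are assumptions for this example.

```python
import math


def interval_semitones(f_a_hz, f_b_hz):
    """Musical interval between two frequencies, in semitones."""
    return 12.0 * math.log2(f_a_hz / f_b_hz)


def relative_difference_percent(f_a_hz, f_b_hz):
    """Difference of f_a relative to f_b, in percent."""
    return 100.0 * (f_a_hz - f_b_hz) / f_b_hz


# The two average values from Fig.7: roughly 2.3 semitones, about 14 % apart.
gap_semitones = interval_semitones(154.0, 135.0)
gap_percent = relative_difference_percent(154.0, 135.0)
```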

Page 14: Speaker Identification based on the statistical analysis of F0

Results of method testing

Tonal speech duration

Test      Group    10 sec     20 sec     40 sec     80 sec
                   template   template   template   template
10 sec    All      17.7
          Men      25.2
          Women    26.6
20 sec    All      16.7       15.2
          Men      23.7       21.7
          Women    24.9       22.6
40 sec    All      16.1       14.4       13.2
          Men      23.0       20.6       19.1
          Women    23.8       21.1       19.0
80 sec    All      15.6       13.6       12.3       10.9
          Men      22.1       19.5       17.8       16.2
          Women    23.1       19.8       17.5       15.0

Table 1 shows the results of speaker identification using F0 statistics analysis. The test database includes about 1600 speech files from 256 speakers: real dialogs over the public telephone network, on both analog and digital channels.

Page 15: Speaker Identification based on the statistical analysis of F0

Results of speaker discrimination using only averaged F0 value.

Tonal speech duration; template durations: 10 sec, 20 sec, 40 sec, 80 sec

10 sec test: Men 32.0

20 sec test: Men 31.1, 30.1

40 sec test: Men 30.5, 30.1

80 sec test: Men 30.1, 28.8, 27.9, 27.5; All 17.4

Table 2 shows the results of speaker identification using only one commonly used F0 feature: the average F0 value. The test database includes about 1600 speech files from 256 speakers: real dialogs over the public telephone network, on both analog and digital channels.

Page 16: Speaker Identification based on the statistical analysis of F0

An example of FA and FR curves. Ave F0

Page 17: Speaker Identification based on the statistical analysis of F0

An example of FA and FR curves. F0 min+ 3%

Page 18: Speaker Identification based on the statistical analysis of F0

An example of FA and FR curves. General metric

Page 19: Speaker Identification based on the statistical analysis of F0

CONCLUSION

A method for forensic speaker identification based on the statistical analysis of F0 is described.

The reliability of the method was tested on a large amount of real speech material from telephone conversations.

An effective method is described for detecting F0 and for checking and correcting the detected F0 curve in real forensic speech recordings.

The method is implemented in expert software (SIS) and used in everyday forensic examination practice.

Page 20: Speaker Identification based on the statistical analysis of F0

PERSPECTIVES

The same method of statistical F0 analysis is used to diagnose anthropometric features of an unknown speaker, such as age, height, weight, etc.

Preliminary results are promising.

In addition to the statistical F0 analysis, we propose that experts perform a detailed structural analysis of the F0 curve.

In particular, we propose measuring the Max, Min, Range and Timing of the F0 movement over the accented syllable of a phrase or over voiced hesitation pauses.

Page 21: Speaker Identification based on the statistical analysis of F0

Thank you for your attention

