
Preliminary F0 Statistics for Young Swedish Males and Forensic Phonetics

Jonas Lindh – jonas.lindh@ling.gu.se
http://www.ling.gu.se/~jonas
Department of Linguistics, Göteborg University, and GSLT (Graduate School of Language Technology)

IAFPA 2006

Outline

• Background and Introduction
  – F0 and Forensic Phonetics
  – Modulation theory of speech
• Hypotheses
• Methods
• Results
  – F0 statistics for young Swedish males
  – Robustness test
  – Vocal effort test
  – Liveliness illustration
• Conclusions
• Future Work

Background and Introduction

• F0 as a reliable parameter for speaker identification (French, 1990; Hollien, 1990; Künzel, 1987; Nolan, 1983; all cited in Braun, 1995).

• Technical, physiological and psychological factors (Braun, 1995).

• Fundamental frequency measures.

• Some previous studies and results.

Background and Introduction (Braun, 1995)

• Technical factors
  – Tape speed (unfortunately still a problem)
  – Sample durations (50, 75, 14, 120 s?)

• Physiological factors
  – Age, smoking, operations
  – Larynx size, shape and mass
  – Between-speaker variation

• Psychological factors
  – Noise level, emotions, time of day
  – Vocal effort, speaking rate, F0 dynamics, voice quality
  – Within-speaker variation

Background and Introduction

• Fundamental frequency measures
  – Average
  – Standard deviation
  – Median
  – Interquartile range
  – F0 mode
  – Base value! (Modulation theory of speech)
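As an illustration of the measures listed above, here is a minimal Python sketch that computes them for one speaker's F0 track. It assumes the track is a plain list of frame values in Hz with unvoiced frames marked as 0; the function name `f0_measures` and the 10 Hz histogram bin width used for the mode are hypothetical choices, not taken from the study.

```python
import math
import statistics
from collections import Counter


def f0_measures(f0_hz, bin_width=10.0):
    """Summary measures for one speaker's F0 track in Hz.

    Frames with F0 <= 0 are treated as unvoiced and dropped (an assumption
    about how the pitch tracker marks unvoiced frames).
    """
    voiced = [f for f in f0_hz if f > 0]
    # Quartiles with Python's default 'exclusive' method; other tools may differ.
    q1, _, q3 = statistics.quantiles(voiced, n=4)
    # Histogram-based mode: centre of the most populated bin (ties -> lowest bin).
    bins = Counter(math.floor(f / bin_width) for f in voiced)
    modal_bin = max(bins, key=lambda b: (bins[b], -b))
    return {
        "mean": statistics.mean(voiced),
        "sd": statistics.stdev(voiced),  # sample SD (n - 1)
        "median": statistics.median(voiced),
        "iqr": q3 - q1,
        "mode": (modal_bin + 0.5) * bin_width,
    }


m = f0_measures([0, 100, 105, 110, 110, 115, 120, 0, 130])
print(m["median"], m["iqr"], m["mode"])  # 110 15.0 115.0
```

The mode depends on the bin width and the IQR on the quartile method, so both should be reported together with those choices when comparing speakers.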

Modulation theory of speech

• “The theory /…/ considers speech signals as the result of allowing conventional gestures to modulate a carrier signal that has the personal characteristics of the speaker. This implies that in general the conventional information can only be retrieved by demodulation. In order to perceive the phonetic quality of a speech signal, listeners evaluate the deviations of the properties of the signal (F0, formant frequencies, etc.) from those they expect of a neutral vocalization produced by the speaker with properties given by his age, sex, vocal effort, speech rate, etc.” (from the abstract of Traunmüller, 1994)

F0 Liveliness

Liveliness class   SD (st), European lang.   SD (st), Chinese lang.
(4) Very high      4.8
(3) High           4.0
(2) Moderate       2.8                       4.0
(1) Low            2.1

Average F0 variation (SD in semitones) as a function of the type of speech.

Under ‘Type’, the speech samples are classified according to their expected liveliness (Traunmüller & Eriksson, 1995).
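The liveliness figures are standard deviations in semitones (st), not Hz. A minimal Python sketch of that conversion follows; the 100 Hz reference, the use of the sample SD and the function name `sd_semitones` are assumptions for illustration, not details given on the slides.

```python
import math
import statistics


def sd_semitones(f0_hz, ref_hz=100.0):
    """Standard deviation of F0 in semitones (st).

    Each voiced value is converted with 12 * log2(f / ref_hz). Changing ref_hz
    shifts every value by the same amount, so the SD does not depend on it.
    Unvoiced frames (<= 0 Hz) are skipped; the sample SD (n - 1) is assumed.
    """
    st = [12.0 * math.log2(f / ref_hz) for f in f0_hz if f > 0]
    return statistics.stdev(st)


track = [100, 200]  # two values exactly one octave (12 st) apart
print(round(sd_semitones(track), 3))  # 8.485, i.e. sqrt(72)
```

Because the semitone scale is logarithmic, the same SD in st corresponds to a larger SD in Hz for a higher-pitched voice, which is one reason to compare liveliness in st.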

F0 Mean, SD and ‘liveliness’

Investigation                          Type  n    Sex  Age     F0 (Hz)  SD (st)
Rappaport (1958), German               1     190  m            129      2.3
Chevrie-Muller et al. (1967), French   2     21   m    20–61   145      2.5
Boë et al. (1975), French              2     30   m            118      2.8
Takefuta et al. (1972), English        4     24   m            127      3.8
Chen (1974), Mandarin Chinese          2     2    m    30–50   108      4.1
Rose (1991), Wú                        2     4    m    25–62   170      4.1
Kitzing (1979), Swedish                2     51   m    21–70   110      3.0
Pegoraro Krook (1988), Swedish         2     198  m    20–79   113      2.6

F0 Mean, SD and ‘liveliness’

Investigation                          Type  n    Sex  Age     F0 (Hz)  SD (st)
Johns-Lewis (1986), English:
  Conversation                         2     5    m    24–49   101      3.4
  Reading                              3     5    m    24–49   128      4.35
  Acting                               4     5    m    24–49   142      4.85
Graddol (1986), English:
  Reading passage A                    2     12   m    25–40   119      3.6
  Reading passage B                    3     12   m    25–40   131      4.55

Average per investigation                    10   m            124      3.4
Average, balanced per speaker                471  m            119      2.8

F0 Liveliness (Traunmüller & Eriksson, 1995)

• The SD of F0 increases with increasing ‘liveliness’ of the discourse.

• The SD of F0 seems to be larger in tone languages than in non-tone languages.

F0 baseline (Traunmüller & Eriksson, 1995)

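• $F_b = F_{\mathrm{mean}} - k\,\sigma(F)$, where $\sigma(F)$ is the standard deviation of F0 and $k$ is a constant (approx. 1.43).

• Approx. 5% of F0 values lie below $F_b$.

• Different liveliness, same $F_b$.

• Tested by changing the factor $k_e$, but not $F_b$, when resynthesizing natural speech.

• $k_e$ = 0.156, 0.414, 0.704, 1.000, 1.290, 1.566, 1.830

• Test sentence: “Det finns folkstammar som äter både kattkött och hundkött” (“There are peoples who eat both cat meat and dog meat”).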
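The base-value formula can be sketched in Python as follows. Computing it in Hz on voiced frames only, using the sample SD, and the function names `base_value` and `share_below` are assumptions for illustration; the slides give only the formula and the approximate constant k = 1.43.

```python
import statistics

K = 1.43  # approximate constant k from the slide


def base_value(f0_hz, k=K):
    """Fb = Fmean - k * SD(F), here in Hz on voiced frames (F0 > 0) only."""
    voiced = [f for f in f0_hz if f > 0]
    return statistics.mean(voiced) - k * statistics.stdev(voiced)


def share_below(f0_hz, threshold):
    """Fraction of voiced frames whose F0 lies below the threshold."""
    voiced = [f for f in f0_hz if f > 0]
    return sum(f < threshold for f in voiced) / len(voiced)


track = [100, 110, 120, 130, 140]
fb = base_value(track)
print(round(fb, 1))            # 97.4 (mean 120 minus 1.43 * 15.81)
print(share_below(track, fb))  # 0.0
```

With such a short, evenly spaced toy track no value falls below $F_b$; the approx. 5% figure refers to F0 distributions of real speech, so `share_below` is a check to run on actual data rather than a guaranteed property of the formula.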

Hypotheses concerning F0 for young Swedish males

• The F0 median is more robust than the F0 mean when it comes to technical factors, i.e. less sensitive to outliers.

• The base value shows the least within-speaker variation of the presented measures within a voice modality (e.g. creaky voice, shouting or raising one’s voice).

• The 5% limit frequency (alternative baseline) is more robust than the base value when the technical factor consists of positive octave jumps.
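The two robustness hypotheses can be illustrated with synthetic data (not the study's recordings): an evenly spread F0 track in which every 20th frame (5%) is doubled to mimic positive octave jumps. The helper names and this corruption scheme are hypothetical, and the 5% limit is taken here as the 5th percentile from Python's `statistics.quantiles`.

```python
import statistics


def base_value(f0, k=1.43):
    """Fb = mean - k * SD (sample SD, in Hz)."""
    return statistics.mean(f0) - k * statistics.stdev(f0)


def limit_5pct(f0):
    """5th percentile: the first of 19 cut points splitting the data into 20 groups."""
    return statistics.quantiles(f0, n=20)[0]


def add_octave_jumps(f0, every=20):
    """Hypothetical corruption: double every `every`-th frame (positive octave jump)."""
    return [2 * f if i % every == 0 else f for i, f in enumerate(f0)]


clean = [90 + 0.25 * i for i in range(200)]  # synthetic F0 values, 90-139.75 Hz
jumped = add_octave_jumps(clean)             # 10 of 200 frames (5%) doubled

measures = {
    "mean": statistics.mean,
    "median": statistics.median,
    "base value": base_value,
    "5% limit": limit_5pct,
}
shift = {name: abs(fn(jumped) - fn(clean)) for name, fn in measures.items()}
for name, s in shift.items():
    print(f"{name:>10}: shifted by {s:.2f} Hz")
```

With this track the mean moves by 5.625 Hz but the median by only 1.5 Hz, and the base value drops by roughly 14.6 Hz while the 5% limit moves by about 0.25 Hz: the doubled frames inflate the SD, which pulls the base value down, whereas the bottom of the distribution is barely touched.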

Methods

• The software Praat (Boersma & Weenink, 2005) was used to automatically extract F0 data from 109 young male speakers (20–30 years old).
  – The group exists as such in the Swedia database.
  – 62% of convicted criminals in Sweden in 2004 (25–35).

• The recordings (spontaneous speech) were taken from the Swedia database (http://www.swedia.nu).

• Mean sample duration: 52.3 s.

Methods

• Interviewer speech was edited out.

• Octave jumps were checked manually.

• Ongoing: collection of the 5% limit frequency, the F0 mode (from histograms of each speaker’s F0 distribution) and the interquartile range.
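Octave jumps were checked manually in the study. As a hypothetical aid for such a check (not the author's procedure), frames lying about one octave (12 semitones) above or below the speaker's median F0 can be flagged for inspection. The function name `flag_octave_jumps` and the 3-semitone tolerance are arbitrary assumptions.

```python
import math
import statistics


def flag_octave_jumps(f0_hz, tol_st=3.0):
    """Return indices of voiced frames about one octave from the track median.

    A frame is flagged when its distance from the median, in semitones, is
    within tol_st of 12 st (upward or downward). Frames <= 0 Hz are unvoiced.
    """
    voiced = [f for f in f0_hz if f > 0]
    med = statistics.median(voiced)
    flagged = []
    for i, f in enumerate(f0_hz):
        if f <= 0:
            continue
        dev_st = 12.0 * math.log2(f / med)  # signed distance from median in st
        if abs(abs(dev_st) - 12.0) <= tol_st:
            flagged.append(i)
    return flagged


print(flag_octave_jumps([110, 112, 0, 224, 108, 55, 115]))  # [3, 5]
```

A lively speaker can produce genuine excursions of this size, so flagged frames still need a manual check before they are discarded or corrected.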

Methods

• A small robustness test was made by measuring F0 in a simultaneous recording on four different devices (material from Livijn, 2004).

  – The North Wind and the Sun (in Swedish).

  – MCA, cassette, mobile and digital (reference).

Methods

• Vocal effort test.

• 5 male speakers from Traunmüller & Eriksson (2000).

• High-quality recordings.

• 5 distances per subject, outdoors (0.3, 1.5, 7.5, 37.5 and 187.5 m).

  – “Jag tog ett violett, åtta svarta och sex vita.” (“I took one violet, eight black and six white.”)

Methods

• A liveliness illustration

• Recordings of a simulated carrier signal + a neutral, happy, sad and angry voice.

Results

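Mean distribution of F0 for YM (young males)

[Figure: histogram of per-speaker F0 means; x-axis: F0 mean (Hz) in 10 Hz bins; y-axis: number of speakers]

Upper bin limit (Hz):  70  80  90  100  110  120  130  140  150  160  170  >170
Number of speakers:     0   0   1    8   21   28   22   14   10    1    4     0

• Mean of means 120.8 Hz – 65% between 100 and 130 Hz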
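The percentages quoted for the histograms can be checked from the bin counts. A minimal Python sketch, assuming (as the bin labels suggest) that each label is a bin's upper limit, so the 100–130 Hz range covers the bins labelled 110, 120 and 130; the function name `share_in_range` is hypothetical, and it is only correct when the range limits coincide with bin edges.

```python
def share_in_range(counts, low, high):
    """Share of speakers in (low, high], with counts keyed by bin upper limit (Hz)."""
    total = sum(counts.values())
    inside = sum(n for upper, n in counts.items() if low < upper <= high)
    return inside / total


# Speakers per bin for the mean-F0 histogram (upper bin limit in Hz -> count).
# The open-ended top bin (more than 170 Hz) is keyed as infinity.
mean_f0_counts = {70: 0, 80: 0, 90: 1, 100: 8, 110: 21, 120: 28, 130: 22,
                  140: 14, 150: 10, 160: 1, 170: 4, float("inf"): 0}

print(sum(mean_f0_counts.values()))                        # 109
print(round(share_in_range(mean_f0_counts, 100, 130), 2))  # 0.65
```

The counts sum to the 109 speakers in the study, and 71 of them fall between 100 and 130 Hz, which gives the reported 65%.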

Results

F0 mean trend

[Figure: F0 mean (Hz) for each speaker; x-axis: speakers (0–110); y-axis: F0 mean (Hz), 70–180]

Results

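Median distribution of F0 for YM

[Figure: histogram of per-speaker F0 medians; x-axis: F0 median (Hz) in 10 Hz bins; y-axis: number of speakers]

Upper bin limit (Hz):  70  80  90  100  110  120  130  140  150  160  170  >170
Number of speakers:     0   0   5   10   31   22   21   10    6    2    2     0

• Mean of medians 115.8 Hz – 68% between 100 and 130 Hz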

Results

F0 median trend

[Figure: F0 median (Hz) for each speaker; x-axis: speakers (0–110); y-axis: median (Hz), 70–170]

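Results

Standard deviations of F0 for YM

[Figure: histogram of per-speaker F0 standard deviations; x-axis: SD (Hz) in 5 Hz bins; y-axis: number of speakers]

Upper bin limit (Hz):   5  10  15  20  25  30  35  40  45  50  55  >55
Number of speakers:     0   2  15  27  19  14  15  11   4   1   1    0

• Mean of SDs 24.1 Hz – 56% between 10 and 25 Hz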

Results

• Mean of baselines 86.3 Hz – 68% between 70 and 100 Hz

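Baseline frequencies for YM

[Figure: histogram of per-speaker F0 base values; x-axis: base value (Hz) in 10 Hz bins; y-axis: number of speakers]

Upper bin limit (Hz):  30  40  50  60  70  80  90  100  110  120  130  >130
Number of speakers:     0   1   1   1  15  16  31   27   13    3    1     0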
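Assuming each speaker's base value was computed as Fmean − 1.43 · SD in Hz (the slides do not state the unit explicitly), the mean of the baselines should equal the mean of means minus 1.43 times the mean of SDs, because averaging over speakers commutes with this linear formula. A quick arithmetic check in Python with the reported group values:

```python
K = 1.43              # constant k of the base-value formula
mean_of_means = 120.8  # Hz, reported mean of F0 means
mean_of_sds = 24.1     # Hz, reported mean of F0 standard deviations

# Fb = Fmean - k * SD is linear, so the mean of Fb over speakers is
# mean(Fmean) - k * mean(SD).
predicted_mean_baseline = mean_of_means - K * mean_of_sds
print(round(predicted_mean_baseline, 1))  # 86.3
```

The result matches the reported mean of baselines (86.3 Hz), which is consistent with the base values having been computed in Hz with k = 1.43.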

Results

F0 baseline trend

[Figure: F0 base value (Hz) for each speaker; x-axis: speakers (0–110); y-axis: base value (Hz), 40–140]

Results

F0 measure robustness

[Figure: F0 measures (Mean, STD, Base, Median, Alt-IQ-base, Alt-base) for each recording device: REF, REF_band, MOB, MOB_band, MCA, MCA_band, CAS, CAS_band; y-axis: frequency (Hz), 20–140]

Results

F0 measures from modal voice to shouting

[Figure: F0 measures (Mean, STD, Base, Median, Alt-IQ-base, Alt-base) for speakers Harald, Henrik, Niclas, Peter, Prefekt and Stark at vocal effort levels 1–5; y-axis: Hz, 5–345]

Results

Liveliness illustration

[Figure: F0 measures (Mean, STD, Base, Median, Alt-IQ-base, Alt-base) for the carrier signal and for neutral, happy, sad and angry voice; y-axis: F0 (Hz), 0–110]

Conclusions

• The median is more robust than the mean when it comes to technical factors, i.e. less sensitive to outliers.
  – Yes. Manual check and results confirm this.

• The base value shows the least within-speaker variation of the presented measures within a voice modality.
  – Yes. Shouting or raising one’s voice can raise one’s base value.
  – 68% within 30 Hz, the same as for the median.

• The 5% limit frequency is more robust than the base value when the technical factor consists of positive octave jumps.
  – Yes, according to the robustness test.

Conclusions

• F0 should be measured in case work.

• Differing baseline values should be taken to indicate different speakers unless there is a reasonable explanation for the difference.
  – Such as differences in ‘voice modality’ (creak, shout, etc.).

Future work

• F0 mode (ongoing) and individual histograms.

• More measures on different “liveliness” levels for the same and different speakers on different recording devices.

• Sample size vs. content.

• Authentic case material.

• Separate study of creaky voice.

Thank you for your attention.

Questions?

jonas@ling.gu.se

http://www.ling.gu.se/~jonas

References

Boersma, P. & Weenink, D. (2005) Praat: doing phonetics by computer (Version 4.3.27) [Computer program]. Retrieved October 7, 2005, from http://www.praat.org/

Braun, A. (1995) Fundamental frequency – how speaker-specific is it? In Braun and Köster (eds) (1995): 9–23.

Brottsförebyggande Rådet (Swedish National Council for Crime Prevention) [www]. Retrieved November 26, 2005, from http://www.bra.se/

Bruce, G. (1982) Developing the Swedish Intonation Model. In Working Papers 22, Lund University, Dept. of Linguistics: 51–116.

Jassem, W., Steffen-Batog, S. & Czajka, M. (1973) Statistical characteristics of short-term average F0 distributions as personal voice features. In W. Jassem (ed.) (1973) Speech Analysis and Synthesis vol. 3: 209–225. Warsaw: Polish Academy of Science.

Kitzing, P. (1979) Glottografisk frekvensindikering: En undersökningsmetod för mätning av röstläge och röstomfång samt framställning av röstfrekvensdistributionen [Glottographic frequency indication: a method for measuring voice pitch level and voice range and for presenting the voice frequency distribution]. Lund University, Malmö.

Nolan, F. (1983) The Phonetic Bases of Speaker Recognition. Cambridge: Cambridge University Press.

Traunmüller, H. (1994) Conventional, biological, and environmental factors in speech communication: A modulation theory. Phonetica 51: 170–183.

Traunmüller, H. & Eriksson, A. (1995) The frequency range of the voice fundamental in the speech of male and female adults. Unpublished manuscript (can be retrieved from http://www.ling.su.se/staff/hartmut/aktupub.htm).

Traunmüller, H. & Eriksson, A. (1995) The perceptual evaluation of F0-excursions in speech as evidenced in liveliness estimations. J. Acoust. Soc. Am. 97: 1905–1915.

Traunmüller, H. & Eriksson, A. (2000) Acoustic effects of variation in vocal effort by men, women, and children. J. Acoust. Soc. Am. 107: 3438–3451.

Rose, P. (2002) Forensic Speaker Identification. New York: Taylor & Francis.

Rose, P. (1991) How effective are long term mean and standard deviation as normalisation parameters for tonal fundamental frequency? Speech Communication 10: 229–247.