8/12/2019 Tutorial Slides Eriksson Drzygajlo
1/81
Forensic Speech Science
Part I: Forensic Phonetics
Anders Eriksson
Department of Linguistics,
Gothenburg University, Gothenburg, Sweden
Historical background
Man has always had strong intuitions about the reliability of voice recognition:

"The voice of the speaker is as easily distinguished by the ear as the face is by the eye."

Quintilian, 35-96 AD
Historical background
An early court case:
In 1660, William Hulet was
accused of having executed King
Charles I. A witness, Richard
Gittens, testified that he knew that
it was Hulet by his speech.
Historical background
On March 1, 1932, the son of the famous aviator Charles Lindbergh was kidnapped and was later found dead. The crime has been called the Crime of the Century because of the enormous publicity it attracted. Its relevance to forensic phonetics, however, has to do with voice recognition and memory.

Before it became known that the boy was dead, a ransom was paid to the kidnapper by a negotiator. On that occasion, April 2, 1932, Lindbergh heard the kidnapper's voice, but could not see him.
Historical background
In September 1934, 29 months after hearing the voice of the kidnapper, Lindbergh (in disguise) was confronted with the suspected kidnapper, Bruno Hauptmann, who was instructed to repeat the phrase Lindbergh had heard. Lindbergh then claimed that he recognized the voice as the one he had heard 29 months earlier.
Historical background
At the trial in January 1935, Lindbergh testified under oath that the suspect's voice was the one he had heard 29 months earlier.
Historical background
The invention of the sound spectrograph marked a breakthrough in speech analysis. A first model was built at Bell Labs in the early forties.
Historical background
The original motivation behind the development of the spectrograph was the phonetic study of speech:

"a method of approach to studies of speech production and measurement"

Steinberg, 1934
Historical background
A real-time spectrograph called the Direct Translator was also produced, to be used for pronunciation training for the deaf and for foreign language students.
Historical background
In spite of the general interest in the spectrograph as a tool and the suggested applications, no publications describing the work appeared from Bell Labs until 1945.

Why? Because the work was rated as a war project.
Historical background
The reason could hardly have been military applications of pronunciation training for the deaf. It must have been something else. We have reason to believe that speaker identification by the use of spectrograms was what gave the research its war project rating.
Historical background
It has been suggested that one of the intended applications was identifying enemy warships by identifying their radio operators, but very little is known about it.

The term voiceprint appears in some publications, but without explicit reference to speaker identification.
Historical background
If the people at Bell Labs, sponsored by the military, secretly worked on voiceprints for speaker identification purposes, as we have reason to believe, then the early history of voiceprints follows parallel tracks in the USSR, including the fact that we know very little about it.

The only (?) account of the Soviet efforts we have is the novel The First Circle by Solzhenitsyn.
Historical background
The plot of the novel takes place within a time span of only three days during the Christmas holiday of 1949, and the setting is the Mavrino prison on the outskirts of Moscow, where the Stalinist regime held unreliable scientists imprisoned.

The prison had its own acoustics laboratory and the so-called Clipped Speech Laboratory, where work on speech coding took place.
Historical background
One day the focus shifted, at least temporarily, from voice clipping to voice recognition, when the people working in the lab were given the task of identifying an anonymous speaker in a tapped telephone conversation by comparing the recorded call with sample recordings of five suspects. They were given only two days to complete the task.
Historical background
Given that Siberia was a likely alternative, it comes as no surprise that they succeeded with their task.

There is no detailed information in the novel about the methods they used, but it is obvious that they were familiar with similar efforts outside the USSR.
Historical background
Based on the account in the novel, it seems likely that the spectrograph they used was modeled on the one described by Steinberg in JASA in 1934.
Historical background
This diagram in Steinberg's paper fits the description in the novel very well.
Historical background
Screen shots from the American television series (1991) based on The First Circle.
Historical background
Two quotations from the novel:

"The science of phonoscopy, born today, December 26th, 1949, does have a rational core."

"They envisioned the system, like fingerprinting ... Any criminal conversation would be recorded ... and the criminal would be caught straight off, like a thief who had left his fingerprints on the safe door."
Historical background
The term the inmates at Mavrino coined for the use of acoustic analysis as a means of speaker identification was phonoscopy. In Russia and many former Eastern European countries this is still the term used today.
Some fundamental issues
In the following sections we will present a selection of important issues in forensic phonetics, trying to describe problems as well as solutions, and what we know at present and what we do not yet know.
Voiceprints
Much of the story of voiceprinting in forensic phonetics revolves around one particular man, Lawrence G. Kersta, who was an engineer at Bell and head of the lab until he resigned in 1966 to start his own company dedicated to forensic phonetics.
Voiceprints
Between 1945, when people at Bell started to publish again, and 1962, there was no mention of voiceprints.

But in 1962 Kersta, still at Bell, published a paper in Nature titled "Voiceprint identification".
Voiceprints
He also gave a paper at the ASA meeting that same year, called "Voiceprint-identification infallibility".

In both papers he described how spectrograms could be used for speaker identification.
Voiceprints
What made his claims so remarkable was, however, the accuracy he claimed for his method. Based on visual comparison of key words, his examiners achieved no less than 99% correct identification or better.
Voiceprints
In spite of his rather sensational claims and the fact that his description of the method was vague, to say the least, the scientific community was slow to react. Up until 1966, when he resigned from Bell to start his own company, he remained largely unchallenged.
Voiceprints
He therefore enjoyed some initial success, and his testimonies were accepted as evidence by courts in some, but not all, states.

He later began to meet with resistance, however, when other researchers tested the method of visual voice recognition from spectrograms.
Voiceprints
Subjects in a study by Young and Campbell (1967), for example, using the voiceprint technique, obtained 78.4% correct identifications for two words spoken in isolation but only 38.3% when the same words were taken from different contexts.
Voiceprints
Many others joined in as more and more results indicated that the method was by no means as reliable as Kersta had claimed.

But there were also those who supported him, most notably Tosi, who was a qualified phonetician.
Voiceprints
A weak point, in addition to the fact that the results could not be reproduced, was that there was never a detailed, explicit description of the method. We may rather safely assume, however, that it was largely intuitively based.
Voiceprints
The controversy continued until the late eighties. Voiceprinting is still done by private detectives and other non-academic experts, but nobody in the speech science community believes in its usefulness for forensic purposes any more.
Voiceprints
What we, as forensic phoneticians, may learn from this experience is not so much that the methods were not sufficiently reliable, but that they were put to use in forensic field work without having been thoroughly tested, and that professional phoneticians were far too slow to react.
Voice recognition and memory
As we mentioned, the Lindbergh case raised questions about voice recognition accuracy and memory. A researcher who questioned whether it would be possible to accurately remember an unknown voice over a period of two years was a psychologist by the name of Frances McGehee.
Voice recognition and memory
In the first of her experiments the listeners heard a speaker read a 56-word passage. They were then assigned to groups who heard the speaker as one of the speakers in a voice line-up with five foils, at intervals of 1, 2, and 3 days, 1, 2, and 3 weeks, and 1, 3, and 5 months respectively.
Voice recognition and memory
Recognition rate varied as a function of time, starting at a little over 80% correct identifications after a lapse of 1 day or 1 week. After 2 weeks the recognition rate had fallen to 69%, after a month to 57%, after 3 months to 35%, and after 5 months it was down to 13%, which is less than chance.
Voice recognition and memory
Later studies have in general confirmed her
findings although the precise decay rate
may vary from study to study.
[Figure: correct identifications (%) as a function of time lapse (0-20 weeks), showing the decline in recognition accuracy over time.]
Non-contemporary speech samples
The term refers to speech samples which are obtained at different points in time and later used in an identification process. The relevant question in forensic phonetics is how large the separation in time between speech samples can be before change over time becomes a problematic factor.
Non-contemporary speech samples
In forensic cases time spans of a year
or more between a suspect recording
and a later attempt at identifying the
speaker are not unusual. It is therefore
important to know if voice changes
that take place over a period of one or
a few years may affect the accuracy of
speaker recognition.
Non-contemporary speech samples
This question has been addressed in
a series of studies by Hollien and
Schwartz (2000).
They tested latencies between
recordings from 4 weeks up to 20
years.
Non-contemporary speech samples
There was a drop in correct identification from around 95% for contemporary samples to 70-85% for latencies from 4 weeks to 6 years (with no observable time trend in the interval). For the 20-year latency, however, a sharp drop down to 35% could be observed.
Non-contemporary speech samples
For similar voices, however, there was a dramatic effect. Performance dropped from around 95% for contemporary samples to 40% for samples recorded only 4 weeks later.

In the normal case, non-contemporary speech thus seems to affect identification only marginally.
Other issues involving the sample
Other factors that may influence identification accuracy are primarily sample duration and acoustic quality.

If we first consider the influence of sample duration, we may observe that in real-life investigations samples may be very short, often just a few words or a phrase or two, which means that sample duration is on the order of a few seconds.
Other issues involving the sample
In an early study by Pollack et al. (1954) the authors observed that identification accuracy increased with sample size, but only up to about 1.2 seconds. For longer samples, phonetic variation took over as the most important factor. They conclude that duration per se is "relatively unimportant, except insofar as it admits a larger or smaller statistical sampling of the speaker's speech repertoire."
Other issues involving the sample
This somewhat surprising finding has, however, been confirmed in other studies. Bricker and Pruzansky (1966) presented stimuli which varied in duration as well as phonemic variation. They found that identification rate increased with duration only if the longer stimuli also contained more phonemic variation.
Other issues involving the sample
It is important to point out, however, that while an increase in correct identifications is desirable, it is equally desirable to keep the number of false alarms down.

Yarmey and Matthys (1992) found that: "The facilitating effect on identification of longer voice-sample durations was counteracted by the high false alarm rates in both suspect-present and suspect-absent line-ups."
Other issues involving the sample
A large proportion of threats and abuse are made over the telephone. Telephone-quality speech has therefore received some attention in forensic phonetics studies.

An important question in the forensic context is whether the poorer sound quality of recorded telephone conversations adversely affects voice identification.
Other issues involving the sample
It is a common belief that because of the difference in sound quality, speaker identification of voices heard over the telephone must necessarily be performed using voices recorded over the telephone, the underlying assumption being that the difference in sound quality would make identification less reliable if directly recorded voice samples were used.
Other issues involving the sample
There are surprisingly few studies that address this question, but the results that exist indicate that the problem might not be as serious as one might expect.

Rathborn et al. (1981) did not find any significant differences in identification of a target voice heard over the telephone and tested using a taped line-up over the telephone, in contrast to voice identification tested directly with a taped line-up.
Other issues involving the sample
A question that has received some attention lately is the influence on acoustic analysis of voice samples of the band-pass filtering that occurs in telephone transmissions.

Künzel (2001) found that the lower cut-off frequency had the effect of shifting F1 in German vowels upwards compared to the corresponding tokens in a simultaneous DAT recording. The average frequency shift was on the order of 6%.
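The band limiting itself is easy to simulate. The sketch below is not Künzel's analysis; it merely imposes a telephone-like pass band (assumed here to be roughly 300-3400 Hz) by zeroing FFT bins with NumPy, and shows that a component below the band, near a typical close-vowel F1, is removed while a mid-band component passes through:

```python
import numpy as np

fs = 16000                       # sampling rate (Hz)
t = np.arange(fs) / fs           # one second of signal

# Two components: 250 Hz (in the F1 region of close vowels, below the
# assumed telephone band) and 1000 Hz (well inside the band).
x = np.sin(2 * np.pi * 250 * t) + np.sin(2 * np.pi * 1000 * t)

def bandpass(signal, rate, lo=300.0, hi=3400.0):
    """Crude brick-wall band-pass by zeroing FFT bins (illustration only)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / rate)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def band_energy(signal, rate, centre, width=20.0):
    """Summed spectral magnitude in a narrow band around `centre` Hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / rate)
    return spectrum[(freqs > centre - width) & (freqs < centre + width)].sum()

y = bandpass(x, fs)

# The 250 Hz component is removed; the 1000 Hz component passes through.
print(band_energy(y, fs, 250) / band_energy(x, fs, 250))
print(band_energy(y, fs, 1000) / band_energy(x, fs, 1000))
```

A real telephone channel rolls off gradually rather than as a brick wall, which is precisely why formant estimates near the band edge are biased rather than simply missing.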
Familiarity with the speaker
Hollien et al. studied speaker identification as a function of familiarity under three speaking conditions: normal, stressed, and disguised. Listeners who were familiar with the speakers performed significantly better under all conditions.
Familiarity with the speaker
These results have generally been confirmed in other studies.

It is important to point out, however, that although recognition rates are generally high for familiar speakers, recognition is by no means always perfect. For individual speakers and listeners the error rates can be very high if the utterances are short and belong to a fairly large open set (Ladefoged & Ladefoged, 1980).
Familiarity with the speaker
An influence of utterance length on the recognition of familiar speakers has also been found in other studies. In a series of experiments reported by Rose and Duncan (1995), recognition of familiar speakers varied from chance level to nearly perfect as a function of utterance length.
Familiarity with the speaker
It has been generally assumed that in voice recognition, discrimination constitutes the initial step, with recognition occurring as a later phase. But Van Lancker et al. have shown that discrimination and recognition are not stages in one process, but are dissociated, unordered abilities.
Familiarity with the speaker
It is therefore entirely possible that a listener who is good at recognizing familiar speakers may perform badly if the task is to discriminate between unfamiliar speakers.
Disguise
Voice disguise, to the extent that it is used, may be a serious problem for speaker identification. At the extreme end of the spectrum we find electronic manipulation or even communication via speech synthesis, which would make speaker identification virtually impossible.
Disguise
In the world of real forensic work, however, voice disguise tends to be of a rather unsophisticated nature.

Künzel, based on experience from the German Federal Police (BKA), notes that falsetto, persistent creaky voice, whispering, faking a foreign accent, and pinching one's nose are the most common types.
Disguise
Even unsophisticated types of disguise may have a considerable detrimental effect on speaker identification. In a study by Reich and Duke, all types produced significantly fewer correct identifications. Hypernasality produced the greatest effect.

Whisper resulted in markedly fewer correct identifications in a study by Orchard and Yarmey.
Disguise
Voice disguise is not as common as one might think. Künzel reports that: "Over the last two decades, between 15 and 25 per cent of the annual cases dealt with at the BKA speaker identification section exhibited at least one kind of disguise."
Disguise
Electronically manipulated messages are still rare, but Künzel notes that there has been an increase in recent years, mainly in the form of editing recorded voices.

While at present electronic manipulation is rare and therefore not a significant problem, that may soon change with the increasing availability of such devices.
Foreign Accents

It is generally found that foreign accent makes identification more difficult, but the difference is often small and not always present.

McGehee found no difference at all using speakers with a German accent.

Doty (1998), on the other hand, found substantial differences using speakers from the US and England speaking English as a native language and speakers from France and Belize speaking English as a foreign language, with native speakers of English as listeners (88% vs. 13%).

Foreign Accents
Results by Goldstein et al. (1981) fall somewhere in between: with relatively long speech samples, accented voices were no more difficult to recognize than unaccented voices; reducing the speech sample duration decreased recognition memory for accented and unaccented voices, but the reduction was greater for accented voices.
Foreign languages
Thompson (1987) recorded six bilingual male students reading messages in English, Spanish, and English with a strong Spanish accent.

Voices were best identified by monolingual English-speaking listeners when speaking English and worst when speaking Spanish. Identification accuracy was intermediate for the accent condition.
Foreign languages

Schiller and Köster (1996) tested Americans with no knowledge of German, Americans who knew some German, and native German speakers, using recordings of German speakers.

Subjects with no knowledge of German made significantly more errors than the other subjects. Subjects who knew some German performed similarly to native German speakers.
Foreign languages

Köster and Schiller (1997) used Spanish and Chinese listeners.

Spanish and Chinese listeners who were familiar with German showed better recognition rates than listeners with no knowledge of German.

Spanish and Chinese listeners with a knowledge of German performed measurably worse than the German and English listeners with a knowledge of German.
Foreign languages

We may summarize the results by saying that listeners with no knowledge of a language perform worse on voice recognition than listeners with some knowledge or native speakers, while listeners with some knowledge of the language tend to perform on the same level as native speakers or only slightly below.
Earwitnesses
Factors that are relevant for speaker recognition in general, like memory, familiarity, disguise etc., are also relevant for earwitnesses, but there are additional factors about which we presently do not know as much as we would like.
Earwitnesses
The first such factor is stress.

"The majority of (the relatively few) studies of earwitnessing bear little resemblance to real-life witnessing circumstances. Most have used nonstressful situations with prepared subjects participating in laboratory situations." Bull and Clifford (1984)
Earwitnesses
The stress that witnesses may experience in a real-life situation can never be fully recreated in a laboratory experiment. Neither can we, or the witness, have much experience to draw on that will help us determine just how, and how much, the capabilities of a traumatized victim to recognize a voice or discriminate between voices may be affected.
Earwitnesses
Another factor is familiarity.

"Personal experience of voice recognition is always of familiar voices, the voices that are not usually those to be identified in criminal situations." (Bull and Clifford)

And as we know from the work by Van Lancker and Kreiman, recognizing a familiar voice and discriminating between unfamiliar ones are independent abilities.
Earwitnesses
A third factor is preparedness. Whereas subjects in a laboratory experiment are, to a greater or lesser degree, prepared for the situation, real-life witnesses are in most cases not.

Studies have shown that voice identification accuracy under unprepared conditions is much lower.
Earwitness line-ups
An earwitness line-up (or voice parade) is meant to be the auditory equivalent of an eyewitness line-up. It is used when a person has heard but not seen the perpetrator.

Recordings of a suspect's voice and a number of foils are presented, and the witness is asked to compare the voices with the memory of the perpetrator's voice and determine if any of them matches that memory.
Earwitness line-ups
Two important questions in connection with earwitness line-ups are:

1) How many voices should be present in the line-up?
2) How similar to the suspect's voice should the voices of the foils be?
Earwitness line-ups
It has been found that with few voices there may be marked position effects, and that the number of correct identifications decreases as line-up size increases. So the question is whether there is an optimal size where the position effect is minimized and the decrease in correct identifications has bottomed out.
Earwitness line-ups
A number of studies have addressed the question of line-up size. They are in reasonable agreement that the decrease in identification accuracy bottoms out at about 6 foils and that position effects only appear if the target voice comes first. Thus, as a rule of thumb at least, 5 or 6 foils should be used.
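One side of this trade-off can be made concrete with simple arithmetic. In a suspect-absent line-up, a witness who feels obliged to pick someone and merely guesses will select the innocent suspect with probability 1/n for n voices, so each added foil dilutes the risk of a false identification, while (as noted above) correct identifications also decrease with size. A minimal sketch of the guessing risk only (our own illustration, not a result from the studies cited):

```python
# Probability that a purely guessing witness who always makes a pick
# selects the (innocent) suspect in a line-up of n voices: 1/n.
# Illustrative only; the cited studies measure real witnesses, not guessers.

def false_id_risk(n_voices: int) -> float:
    if n_voices < 2:
        raise ValueError("a line-up needs at least two voices")
    return 1.0 / n_voices

for n in (2, 4, 6, 7):  # 6-7 voices = suspect plus 5-6 foils
    print(f"{n} voices: {false_id_risk(n):.1%} risk from guessing alone")
```

With 6 or 7 voices the guessing risk is already down to 14-17%, while further foils buy progressively less and add position effects of their own.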
Earwitness line-ups
How similar to the target should the foils be? At least the two extremes must be avoided. The target voice must not stand out as different: the speakers must be reasonably matched with respect to characteristics like speaker age, dialect, etc. On the other hand, they should not be sound-alikes.
Earwitness line-ups
When Rothman (1977) used sound-alikes (brothers, fathers, sons), identification dropped from 94% (ordinary foils) to 58% (sound-alikes). Similar results were obtained by Hollien and Schwartz (2000).

Thus foils should be chosen so as to represent a reasonable degree of variation while avoiding the extremes.
Lie detection
Attempts have been made recently to use brain scanning methods to study the possibility of consistent differences in brain activity patterns which separate lies or deception from truthful statements. Although this research is only in its infancy, some highly interesting results have been obtained.
Lie detection
Langleben et al. (2002) used Functional Magnetic Resonance Imaging (fMRI) to detect differences in brain activity when their subjects told a lie compared to when they told the truth. Their results indicate that: "There is a neurophysiological difference between deception and truth at the brain activation level that can be detected with fMRI." Similar results have been obtained in other studies.
Lie detection
High-resolution thermal imaging, which can detect minor regional changes in the blood flow in the face, for example, has also been used in an attempt to develop methods to detect lies and deception (Pavlidis and Levine, 2002).
Lie detection
We should be aware that these are very preliminary results. Whether, and when, these methods can be put to use in forensic fieldwork we will not know for many years to come. We must also be aware that there may be a very long way to go between research results and reliable field applications.
Lie detection
Unfortunately this is not always the case. "Unproven technologies are becoming increasingly attractive to US law enforcement and security agencies ... Laboratory tools from infrared sensors to eye trackers are being converted into lie detectors." (Knight, 2004)
Overgeneralization, charlatanry, fraud
The most well-known lie detector is the so-called Polygraph. Its first appearance can be dated back to 1917. A more refined version was used in a court case in 1923, and Polygraphs have been used ever since, with some refinements.
Overgeneralization, charlatanry, fraud
The basic idea behind the Polygraph is that lying increases the level of stress, and that if you can register the involuntary reactions we know to be correlated with stress (respiration, pulse, blood pressure, and galvanic skin response, e.g. palm sweat), these signs can be used to detect lies and deception.
Overgeneralization, charlatanry, fraud
A typical Polygraph setup.
Overgeneralization, charlatanry, fraud
The problem with the Polygraph as a lie detector lies in the interpretation. Correlations between stress levels and pulse, for example, are found as group results. To generalize from group results to individuals is, of course, not a valid step. Neither is it valid to conclude that a person who experiences stress must necessarily be lying.
Overgeneralization, charlatanry, fraud
The basic idea behind lie detectors based on voice analysis is that there are properties in the voice signal that may be reliably correlated with lies or deception. Voice stress analysis (VSA), based on the monitoring of so-called micro tremor, is such a method.
Overgeneralization, charlatanry, fraud
But whereas there are scientifically established correlations between stress and the indicators used by the Polygraph, there is no scientific basis for voice stress analysis whatsoever. The few in-depth studies there are of micro tremor in the larynx indicate that it does not even exist.
Overgeneralization, charlatanry, fraud
But it does make pretty diagrams!
Overgeneralization, charlatanry, fraud
So what the VSA analyzers do is measure the variation in something that isn't even there, in itself an achievement of sorts.

If the people who use these gadgets don't know any better, we may be generous enough to call it charlatanry, the alternative of course being fraud.
Overgeneralization, charlatanry, fraud
Finally, an example which without the slightest doubt may be classified as fraud. An Israel-based company markets the most wonderful tools, including both lie detectors and love detectors. The technique behind the lie detector is said to be something called Layered Voice Analysis (LVA).
Overgeneralization, charlatanry, fraud
Here is how they claim it works: "every event that passes through the brain will leave its fingerprints on the speech flow. LVA Technology ignores what your subject is saying, and focuses only on his brain activity. In other words, the how it is said is crucial and not the what."
Overgeneralization, charlatanry, fraud
They are careful not to explicitly call the gadget a lie detector, but there is absolutely no question that that is what they want us to believe it is: "LVA is capable of detecting the intention behind the lie, and by so doing can lead you in identifying and revealing the lie itself."
Overgeneralization, charlatanry, fraud
There is, of course, not a shred of evidence for a relationship between voice and brain activity of the proposed kind. And a thorough scrutiny of the description of the method in the American patent documents confirms the suspicion that the method is pure nonsense, perhaps best described as statistics based on digitization artefacts.
Overgeneralization, charlatanry, fraud
The statistics are based upon what are defined as "thorns" and "plateaus", which have no relevance at all for voice analysis and are moreover dependent on how the signal is sampled.
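The sampling dependence is easy to demonstrate. In the sketch below (our own illustration; the patent's exact definitions are not reproduced here), a "plateau" is read as a run of consecutive identical sample values. A pure tone quantized to 8 bits shows dozens of such plateaus, while the same waveform in floating point shows none, so any statistic built on counting them measures the digitization, not the voice:

```python
import math

FS = 8000                          # telephone-style sampling rate (Hz)
N = FS                             # one second of signal
x = [math.sin(2 * math.pi * 440 * n / FS) for n in range(N)]

def count_plateaus(samples):
    """Count runs of consecutive identical sample values ('plateaus')."""
    count = 0
    in_run = False
    for a, b in zip(samples, samples[1:]):
        if a == b:
            if not in_run:
                count += 1
                in_run = True
        else:
            in_run = False
    return count

x_float = x                            # full floating-point precision
x_8bit = [round(v * 127) for v in x]   # crude 8-bit quantization

print(count_plateaus(x_float))  # float samples are all distinct
print(count_plateaus(x_8bit))   # plateaus created purely by quantization
```

The same waveform thus yields completely different "plateau" statistics depending only on how it was digitized, which is the point being made above.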
Overgeneralization, charlatanry, fraud
Gadgets like these do not deserve to be taken seriously as such, but their use in forensic investigations must be. If bogus lie detectors like the ones described here are used not just by shady private investigators, but by insurance companies, police departments and security agencies, this poses a threat that we must oppose more actively.
Speech Processing and Biometrics Group (GTPB)
Signal Processing Institute (ITS), LIAP
FORENSIC SPEECH SCIENCE
Forensic Automatic Speaker Recognition
Dr. Andrzej Drygajlo
Speech Processing and Biometrics Group
Signal Processing Institute (ITS-LIAP)
Swiss Federal Institute of Technology Lausanne (EPFL)
School of Criminal Sciences
University of Lausanne
Biometric characteristics in forensic applications
- Biological traces: DNA (deoxyribonucleic acid), blood, saliva, etc.
- Biological (physiological) characteristics: fingerprints, eye irises and retinas, hand palms and geometry, and facial geometry
- Behavioral characteristics: dynamic signature, gait, keystroke dynamics, lip motion
- Combined: voice
Popular biometric characteristics (modalities)
Fingerprint
Voice
Face
Retina
Signature
Iris
Forensic Biometric Applications
Forensic biometrics: individualisation of human beings
Challenge: to automate forensic biometric methods
Existing systems and databases:
Automatic Fingerprint Identification System (AFIS, US-made) and fingerprint databases
DNA sequencers and DNA databases
Challenge: large-scale automatic systems and databases for speech, handwriting, face images, earmarks, etc.
Constraints
Systems developed according to specified recommendations from:
Tool perspective (recognition and computer technology)
Forensic expert perspective (methodology)
Criminal policy perspective (investigation)
Legal perspective (impact of the application of the data and privacy protection law on the efficiency of the methods used)
Judicial perspective (the role of the court)
Law enforcement and forensic applications
The law enforcement applications include the use of biometrics to recognize individuals:
Apprehended or incarcerated because of criminal activity
Suspected of criminal activity
Whose movement is restricted as a result of criminal activity
The biometric may be used to identify non-cooperative and unknown subjects, to ensure that the correct inmates are released, or to verify that individuals under home arrest are in compliance.
Forensic Speaker Recognition
Aural-perceptual methods: earwitnesses, line-ups
Visual methods and the "voiceprint"? Visual comparison of spectrograms of linguistically identical utterances (utterly misleading!)
Aural-instrumental methods: analytical acoustic approach combined with an auditory-phonetic analysis
Automatic methods: speaker verification (not adequate), speaker identification (not adequate), a Bayesian framework for the evaluation of identity
Forensic specificity
Short utterances
Questioned recording: uncontrolled environment
Investigations in controlled conditions (longer utterances)
Telephone quality (95%)
Clear understanding of the inferential process
Respective duties of the actors involved in the judicial process: jurists, forensic experts, judges, etc.
The forensic expert's role is to testify to the worth of the evidence by using, if possible, a quantitative measure of this worth.
It is up to the judge and/or the jury to use this information as an aid to their deliberations and decision.
Forensic Expert's Role
A forensic expert testifying in court to a conclusion in an individual case is not an advocate, but a witness who presents factual information and offers a professional opinion based upon that factual information.
Expert opinion testimony is, and will remain, one of the most powerful forms of evidence in the courtroom.
In order for it to be effective, it must be carefully documented, and expressed with precision, but without overstatement, in as neutral and objective a way as the adversary system permits.
Professional concepts must be articulated in a way laypersons (like the judge and the lawyers) can understand.
Individual Case
[Diagram] Casework compares the trace (the questioned recording) with the suspect's data (the suspected speaker reference database, or a single recording of the suspected speaker).
Adversary System
The two competing claims: the speaker at the origin of the questioned recording is not the suspected speaker, versus: the suspected speaker is the source of the questioned recording.
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology: Univariate Scoring Method, Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Automatic Speaker Recognition
Speaker recognition is the general term used to include all of the many different tasks of discriminating people based on the sound of their voices.
Speaker identification is the task of deciding, given a sample of speech, who among many candidate speakers said it. This is an N-class decision task, where N is the number of candidate speakers.
Speaker verification is the task of deciding, given a sample of speech, whether a specified candidate speaker said it. This is a 2-class decision task and is sometimes referred to as a speaker detection task.
Principal structure of speaker recognition systems [block diagram]: the speech wave undergoes feature extraction; in training, reference templates/models are built for each speaker; in recognition, the similarity (distance) to these models is computed, leading to the recognition results. Decision / interpretation?
Principal structure of speaker recognition systems [block diagram]: feature extraction is applied to the speech wave; the similarity (distance) to the models for each speaker, built in training, gives a score.
Text-dependent methods: Dynamic Time Warping (DTW), Hidden Markov Models (HMMs)
Text-independent methods: Vector Quantization (VQ), Gaussian Mixture Models (GMMs)
Feature Extraction [diagram]: the speech wave is divided into frames, each frame is multiplied by a window, and a feature vector is computed per frame.
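The framing and windowing step can be sketched as follows; the 25 ms frame length, 10 ms shift and Hamming window are common illustrative choices, not values taken from these slides.

```python
import numpy as np

def frame_signal(x, sr, frame_ms=25, shift_ms=10):
    """Split a waveform into overlapping, Hamming-windowed frames."""
    flen = int(sr * frame_ms / 1000)   # samples per frame
    step = int(sr * shift_ms / 1000)   # samples between frame starts
    n = 1 + (len(x) - flen) // step    # number of full frames
    win = np.hamming(flen)
    return np.stack([x[i * step : i * step + flen] * win for i in range(n)])

sr = 8000                              # telephone-band sampling rate
x = np.random.randn(sr)                # one second of dummy "speech"
frames = frame_signal(x, sr)           # shape (98, 200): one row per frame
# a feature vector (e.g. MFCCs) would then be computed from each row
```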
Gaussian Mixture Model (GMM)
[Diagram] Acoustic vectors for training, v(1), v(2), ..., v(T), each of dimension D, are used to fit a GMM; histograms over Feature 1, Feature 2, ..., Feature D illustrate the distributions being modelled.
score = log-likelihood(speech | model)
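A minimal sketch of GMM speaker modelling on synthetic 2-D "acoustic vectors"; scikit-learn's GaussianMixture and all the numbers below are assumptions for illustration, not the system described in the slides.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# dummy acoustic vectors for a suspect and for a different speaker
suspect_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# one GMM per speaker, trained on that speaker's vectors
gmm_suspect = GaussianMixture(n_components=4, random_state=0).fit(suspect_train)

suspect_test = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # same speaker
other_test = rng.normal(loc=3.0, scale=1.0, size=(200, 2))    # other speaker

# score = average log-likelihood of the vectors under the model
same_score = gmm_suspect.score(suspect_test)
diff_score = gmm_suspect.score(other_test)
# the suspect's own speech scores higher under the suspect's model
```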
Speaker Verification
The odds form of Bayes' theorem:
H0: the speaker's model (λ) and the tested recording (T) have the same source
H1: the speaker's model (λ) and the tested recording (T) have different sources

P(H0 | T) / P(H1 | T) = [P(T | H0) / P(T | H1)] × [P(H0) / P(H1)]

The likelihood ratio P(T | H0) / P(T | H1) is compared with a decision threshold to accept or reject H0.
Interpretation of Evidence
Bayesian interpretation (BI): principle
The Bayesian model, proposed for forensic speaker recognition by Lewis in 1984, allows for revision, based on new information, of a measure of uncertainty (the likelihood ratio of the evidence, the province of the forensic expert) which is applied to the pair of competing hypotheses.
The Bayesian model shows how new data (the questioned recording) can be combined with prior background knowledge (the prior odds, the province of the court) to give posterior odds (the province of the court) for judicial outcomes or issues.
prior odds × ? = posterior odds
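The odds-form update can be illustrated numerically; the prior odds and the likelihood ratio below are invented purely for illustration.

```python
# Odds-form Bayes: posterior odds = prior odds x likelihood ratio.
# All numbers are invented for illustration.
prior_odds = 1 / 100      # court's prior odds on H0 (province of the court)
lr = 75.0                 # likelihood ratio (province of the forensic expert)
posterior_odds = prior_odds * lr                        # 0.75
posterior_prob = posterior_odds / (1 + posterior_odds)  # odds -> probability
```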
Strength of Evidence
Bayesian interpretation (BI):

P(H0 | E) / P(H1 | E) = [P(E | H0) / P(E | H1)] × [P(H0) / P(H1)]

Prior odds (prior background knowledge; province of the court) × Likelihood Ratio (LR; the new data; province of the forensic expert) = Posterior odds (posterior knowledge on the issue; province of the court).
Voice as Evidence
In the case of a questioned recording (trace), the evidence does not consist in the speech itself, but in the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of the suspect, represented by his/her model.
Voice as Evidence
[Diagram] Feature extraction is applied to the questioned recording (trace); its similarity (distance) to the suspected speaker model, trained from the suspected speaker reference database (R), gives a score: the evidence (E). Its signification is established through Bayesian interpretation.
Bayesian Interpretation of Evidence
The odds form of Bayes' theorem:
H0: the suspected speaker is the source of the questioned recording (within-source variability)
H1: the speaker at the origin of the questioned recording is not the suspected speaker (between-sources variability)

P(H0 | E) / P(H1 | E) = [P(E | H0) / P(E | H1)] × [P(H0) / P(H1)]

Likelihood ratio = P(E | H0) / P(E | H1): the strength of the evidence, expressing the similarity of the trace to the suspect (numerator) relative to its typicality in the potential population (denominator).
Uni- and Multivariate Methods
Scoring method: the likelihood is calculated from distributions of scores modelling within-source and between-sources variability.
H0: distribution of scores of within-source variability
H1: distribution of scores of between-sources variability
3 databases: suspect reference database (R), potential population database (P), suspect control database (C)
Direct method: the likelihood is calculated directly from the GMM of the suspect and the GMM of the potential population.
H0: GMM of the suspect
H1: GMMs of the potential population
2 databases: suspect reference database (R), potential population database (P)
Databases used: R = 5 utterances per speaker (2-3 min each); P = 100 speakers (2-3 min each); C = 30-40 utterances per speaker (10-20 sec each)
Corpus Based Methodology
3 databases (DBs):
Potential population database (P): a large-scale database used to model the potential population of speakers, to evaluate the between-sources variability
Suspected speaker reference database (R): a database recorded with the suspected speaker to model her/his speech
Suspected speaker control database (C): a database recorded with the suspected speaker to evaluate her/his within-source variability
Scoring Method
[Diagram] Casework: the trace is compared with the suspect's databases (reference database R and control database C) and with the relevant population (potential population database P).
Within-source variability
[Diagram] The suspected speaker model is trained from the reference database (R); feature extraction and similarity (distance) computation against the utterances of the suspected speaker control database (C) yield scores that form the distribution of the within-source variability.
Between-sources Variability
[Diagram] Feature extraction is applied to the trace (questioned recording); similarity (distance) computation against the speaker models of the potential population database (P) yields scores that form the distribution of the between-sources variability.
Evaluation of the within-source variability
[Histogram: occurrences vs. similarity scores]
Comparison of the suspected speaker models with the utterances of his control database (C).
Evaluation of the between-sources variability
[Histogram: occurrences vs. similarity scores]
Comparison of the trace with the speaker models of the potential population database (P).
Likelihood ratio
[Plot: estimated probability vs. similarity scores for the two score distributions]
At the observed evidence score E = 6: P(E | H0) / P(E | H1) = 0.15 / 0.002 = 75.
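The worked LR above can be reproduced in principle by estimating the two score densities and taking their ratio at the evidence score; the synthetic score distributions and the evidence value below are invented for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# dummy similarity scores: within-source (H0) high, between-sources (H1) low
h0_scores = rng.normal(6.0, 1.0, 400)   # suspect model vs. control recordings
h1_scores = rng.normal(2.0, 1.0, 400)   # trace vs. potential-population models

E = 6.0                                 # score of the trace vs. suspect model
# the LR is the ratio of the two estimated densities at the evidence score
lr = gaussian_kde(h0_scores)(E)[0] / gaussian_kde(h1_scores)(E)[0]
```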
Strength of Evidence - Likelihood ratio
A likelihood ratio of 9.16 means that it is 9.16 times more likely to observe the score (E) given the hypothesis H0 (the suspect is the source of the questioned recording) than given the hypothesis H1 (that another speaker from the relevant population is the source of the questioned recording).
DET (Detection Error Trade-off) Curve
A DET curve can be computed from the distributions of scores with a variable threshold.
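Sweeping the threshold over the two score distributions can be sketched as follows; the synthetic scores and threshold grid are assumptions for illustration.

```python
import numpy as np

def det_points(target_scores, nontarget_scores, thresholds):
    """Miss and false-alarm rates as the decision threshold varies."""
    target = np.asarray(target_scores)
    nontarget = np.asarray(nontarget_scores)
    miss = np.array([(target < t).mean() for t in thresholds])      # targets rejected
    fa = np.array([(nontarget >= t).mean() for t in thresholds])    # non-targets accepted
    return miss, fa

rng = np.random.default_rng(2)
miss, fa = det_points(rng.normal(2, 1, 1000), rng.normal(0, 1, 1000),
                      thresholds=np.linspace(-3, 5, 81))
# misses grow and false alarms shrink as the threshold rises;
# plotting miss vs. fa (on normal-deviate axes) gives the DET curve
```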
Analysis and comparison
[Diagram] Feature extraction is applied to the trace and to the suspected speaker control database (C); feature extraction and modelling are applied to the suspected speaker reference database (R), giving the suspected speaker model, and to the potential population database (P), giving the relevant speaker models. Comparative analysis of the trace with the suspected speaker model yields the evidence (E); comparative analysis of the control data with the suspected speaker model, and of the trace with the population models, yields the two sets of similarity scores.
Interpretation of the evidence
[Diagram] The similarity scores are used for modelling the within-source and the between-sources variability; the distribution of the within-source variability evaluated at the evidence (E) gives the numerator of the likelihood ratio, the distribution of the between-sources variability gives the denominator, and together they yield the likelihood ratio (LR).
Individual Case
[Diagram] Casework compares the trace (the questioned recording) with the suspect's data (the suspected speaker reference database, or a single recording of the suspected speaker).
Scoring Method with Limited Suspect Data
The odds form of Bayes' theorem:
H0: the two recordings have the same source
H1: the two recordings have different sources

P(H0 | E) / P(H1 | E) = [P(E | H0) / P(E | H1)] × [P(H0) / P(H1)]

Likelihood ratio P(E | H0) / P(E | H1): the strength of the evidence with respect to the new hypotheses.
Direct Method
The odds form of Bayes' theorem:
H0: the speaker's model (λ) and the questioned recording (T) have the same source
H1: the speaker's model (λ) and the questioned recording (T) have different sources

P(H0 | T) / P(H1 | T) = [P(T | H0) / P(T | H1)] × [P(H0) / P(H1)]

Likelihood ratio P(T | H0) / P(T | H1): the strength of the evidence?
Multivariate (Direct) Method - LR Numerator
[Diagram] Features extracted from the trace (questioned recording) are scored for similarity against the suspected speaker model, trained from the suspected speaker reference database (R).
score = log-likelihood(trace | H0), the numerator of the likelihood ratio
Multivariate (Direct) Method - LR Denominator
[Diagram] Features extracted from the trace (questioned recording) are scored for similarity against the model of the potential population, trained from the potential population database (P).
score = log-likelihood(trace | H1), the denominator of the likelihood ratio
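The direct method's numerator and denominator can be sketched with two GMMs, one for the suspect and one for the pooled potential population; the toolkit (scikit-learn) and all data are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
suspect_data = rng.normal(0.0, 1.0, (500, 2))       # suspect reference data (R)
population_data = rng.normal(1.5, 2.0, (2000, 2))   # pooled potential population (P)

gmm_suspect = GaussianMixture(n_components=4, random_state=0).fit(suspect_data)
gmm_population = GaussianMixture(n_components=8, random_state=0).fit(population_data)

trace = rng.normal(0.0, 1.0, (100, 2))   # questioned recording; same source here
# average log-likelihood ratio of the trace: log P(T|H0) - log P(T|H1)
log_lr = gmm_suspect.score(trace) - gmm_population.score(trace)
# positive log-LR: the trace is better explained by the suspect's model
```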
Evaluation of the Strength of Evidence
Univariate (Scoring) Method
Cumulative Distribution Functions
Tippett plots (reliability-survival functions)
Univariate (Scoring) Method
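A Tippett plot reports, for each LR threshold, the proportion of trials whose LR is at or above it, separately for same-source and different-source trials; the LR values below are synthetic and purely illustrative.

```python
import numpy as np

def tippett(lrs, thresholds):
    """Proportion of likelihood ratios at or above each threshold."""
    lrs = np.asarray(lrs)
    return np.array([(lrs >= t).mean() for t in thresholds])

rng = np.random.default_rng(5)
# dummy LRs: same-source trials mostly > 1, different-source trials mostly < 1
lr_h0 = 10 ** rng.normal(1.0, 0.8, 500)
lr_h1 = 10 ** rng.normal(-1.0, 0.8, 500)

ts = 10 ** np.linspace(-3, 3, 61)        # thresholds on a log scale
curve_h0 = tippett(lr_h0, ts)            # survival curve for H0-true trials
curve_h1 = tippett(lr_h1, ts)            # survival curve for H1-true trials
# at ts[30] == 1, most H0 trials exceed the threshold and most H1 trials do not
```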
Evaluation of the Strength of Evidence
Multivariate (Direct) Method
Tippett plots (reliability-survival functions)
Multivariate (Direct) Method
Using databases with mismatched recording conditions
FBI NIST 2002 database: 2 conditions (microphone and telephone)
The extent of mismatch can be measured using statistical testing.
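One way to test whether two recording conditions yield different score distributions is a two-sample test; the slides do not name a specific test, so the Kolmogorov-Smirnov test and the synthetic scores below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
# dummy scores of the same speakers under two recording conditions
scores_mic = rng.normal(0.0, 1.0, 300)     # microphone condition
scores_tel = rng.normal(-0.8, 1.2, 300)    # telephone condition (shifted)

stat, p_value = ks_2samp(scores_mic, scores_tel)
# a small p-value indicates the two conditions produce different score
# distributions, i.e. a measurable mismatch that should be compensated
```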
Compensating for Mismatch
[Plot: score distributions showing the evidence E against the H0 scores (matched conditions), the potential-population H1 scores (matched conditions), and the H1 scores (mismatched conditions)]
Not compensating for mismatch can be the difference between an LR < 1 and an LR > 1.
Aural Speaker Recognition - Experimental Framework
Listeners: 90 listeners whose mother tongue is French; laypersons with no phonetic training; same computer and headphones
Training: no limitation on the number of listening trials
Testing: verbal score scale from 1 through 7; perceptual cues
Perceptual Verbal Scale and Perceptual Cues
Score 1: I am sure that the two speakers are not the same
Score 2: I am almost sure that the two speakers are not the same
Score 3: It is possible that the two speakers are not the same
Score 4: I cannot decide
Score 5: It is possible that the two speakers are the same
Score 6: I am almost sure that the two speakers are the same
Score 7: I am sure that the two speakers are the same
Strength of Evidence for Aural Recognition
[Histogram: estimated probability of each perceptual verbal score (1-7) under H0 and under H1]
The verbal scores are discrete, so histograms are used to estimate the probability of each score under each hypothesis.
Likelihood Ratio (LR) = P(E | H0) / P(E | H1): the ratio of the heights of the two histograms at the observed score E.
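The discrete-LR computation can be sketched directly from score counts; the counts below are invented and are not the data from the study.

```python
import numpy as np

# counts of verbal scores 1..7 from dummy listening trials
# (illustrative numbers, not from the experiments in the slides)
counts_h0 = np.array([2, 5, 10, 15, 40, 90, 138])   # same-speaker pairs
counts_h1 = np.array([120, 80, 50, 25, 15, 7, 3])   # different-speaker pairs

p_h0 = counts_h0 / counts_h0.sum()   # estimated P(score | H0)
p_h1 = counts_h1 / counts_h1.sum()   # estimated P(score | H1)

E = 6                      # verbal score observed for the questioned pair
lr = p_h0[E - 1] / p_h1[E - 1]   # ratio of the histogram heights at E
```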
Evaluating Strength of Evidence in Matched ConditionsEvaluating Strength of Evidence in Matched Conditions
AuralAutomat ic
Similar separations between curves for aural and automatic systems
Ref. PSTN vs Traces PSTN
Evaluating Strength of Evidence in Mismatched ConditionsEvaluating Strength of Evidence in Mismatched Conditions
Aural
Automat ic
Better curve separation in
aural recogniti onBetter evaluation of LR for aural
recognition in mismatched conditions
Ref. PSTN vs Traces Noisy PSTN
Evaluating Strength of Evidence in Adapted Conditions
[Plot: estimated probability vs. likelihood ratio (LR); curves for H0 and H1 under aural and adapted automatic recognition; reference: PSTN vs. adapted noisy PSTN traces]
Adaptation for noisy conditions results in an improvement of the performance of automatic recognition.
Admissibility of Scientific Evidence (USA)
Daubert criteria:
whether the theory or technique can be, and has been, tested
whether the technique has been published or subjected to peer review
whether actual or potential error rates have been considered
whether standards exist and are maintained to control the operation of the technique
whether the technique is widely accepted within the relevant scientific community
References
Ph. Rose, Forensic Speaker Identification, Taylor and Francis, London, 2002.
D. Meuwly, A. Drygajlo, "Forensic Speaker Recognition Based on a Bayesian Framework and Gaussian Mixture Modelling (GMM)", The Workshop on Speaker Recognition 2001: A Speaker Odyssey, Crete, Greece, June 2001, pp. 145-150.
A. Drygajlo, D. Meuwly, A. Alexander, "Statistical Methods and Bayesian Interpretation of Evidence in Forensic Automatic Speaker Recognition", EUROSPEECH 2003, Geneva, Switzerland, Sept. 2003, pp. 689-692.
A. Alexander, A. Drygajlo, "Scoring and Direct Methods for the Interpretation of Evidence in Forensic Speaker Recognition", ICSLP 2004, Jeju, Korea, 2004.
F. Botti, A. Alexander, A. Drygajlo, "An interpretation framework for the evaluation of evidence in forensic automatic speaker recognition with limited suspect data", Odyssey 2004: The Speaker and Language Recognition Workshop, Toledo, Spain, 2004, pp. 63-68.
A. Alexander, F. Botti, A. Drygajlo, "Handling Mismatch in Corpus-Based Forensic Speaker Recognition", Odyssey 2004: The Speaker and Language Recognition Workshop, Toledo, Spain, May 2004, pp. 69-74.
A. Alexander, F. Botti, D. Dessimoz, A. Drygajlo, "The Effect of Mismatched Recording Conditions on Human and Automatic Speaker Recognition in Forensic Applications", Forensic Science International, 146S (2004), pp. S95-S99.
D. Meuwly, A. Drygajlo, "A Bayesian Interpretation of Evidence in Forensic Automatic Speaker Recognition", to be published in Forensic Science International.
J. Gonzalez-Rodriguez, A. Drygajlo, D. Ramos-Castro, M. Garcia-Gomar, J. Ortega-Garcia, "Robust Estimation, Interpretation and Assessment of Likelihood Ratios in Forensic Speaker Recognition", to be published in Computer Speech and Language.
Conclusions
The Bayesian model, the current interpretation framework used in forensic science, is adapted for forensic automatic speaker recognition.
The corpus-based methodology provides a coherent way of assessing and presenting the evidence of a questioned recording.
Distributions of likelihood ratios can be used for the evaluation of the performance of automatic and aural methods in forensic speaker recognition applications.
While there is certainly no perfect solution available in the field of forensic speaker recognition at present, the scientific community is under a moral obligation to contribute whatever possible to aid the course of justice and to establish a scientifically founded methodology and techniques.
What is clearly needed is joint research initiatives of forensic scientists and speech engineers, in order to study problems arising from the actual technology and from the practical work of forensic experts, and to gain a more complete insight into the concept of the individuality of voice.