Tutorial Slides Eriksson Drzygajlo

  • 8/12/2019 Tutorial Slides Eriksson Drzygajlo

    1/81

Forensic Speech Science

Part I: Forensic Phonetics

Anders Eriksson

Department of Linguistics, Gothenburg University, Gothenburg, Sweden

Historical background

Man has always had strong intuitions about the reliability of voice recognition:

"The voice of the speaker is as easily distinguished by the ear as the face is by the eye."

Quintilian, c. 35–96 AD


Historical background

An early court case: In 1660, William Hulet was accused of having executed King Charles I. A witness, Richard Gittens, testified that he knew that it was Hulet by his speech.

Historical background

On March 1, 1932, the son of the famous aviator Charles Lindbergh was kidnapped and was later found dead. The crime has been called the "Crime of the century" because of the enormous publicity it attracted. Its interest for forensic phonetics, however, has to do with voice recognition and memory.

Before it became known that the boy was dead, a ransom was paid to the kidnapper by a negotiator. On that occasion, April 2, 1932, Lindbergh heard the kidnapper's voice, but could not see him.


Historical background

In September 1934, 29 months after hearing the voice of the kidnapper, Lindbergh (in disguise) was confronted with the suspected kidnapper, Bruno Hauptmann, who was instructed to repeat the phrase Lindbergh had heard. Lindbergh then claimed that he recognized the voice as the one he had heard 29 months earlier.

Historical background

At the trial in January 1935, Lindbergh testified under oath that the suspect's voice was the one he had heard 29 months earlier.


Historical background

The invention of the sound spectrograph meant a breakthrough in speech analysis. A first model was built at Bell Labs in the early forties.

Historical background

The original motivation behind the development of the spectrograph was the phonetic study of speech:

"a method of approach to studies of speech production and measurement"

Steinberg, 1934


Historical background

A real-time spectrograph called the Direct Translator was also produced, to be used for pronunciation training for the deaf and for foreign-language students.

Historical background

In spite of the general interest in the spectrograph as a tool and the suggested applications, no publications describing the work appeared from Bell Labs until 1945. Why? Because the work was rated as a war project.


Historical background

The reason could hardly have been military applications of pronunciation training for the deaf. It must have been something else. We have reasons to believe that speaker identification by the use of spectrograms was what gave the research its war-project rating.

Historical background

It has been suggested that one of the intended applications was identifying enemy warships by identifying their radio operators, but very little is known about it. The term "voiceprint" appears in some publications, but without explicit reference to speaker identification.


Historical background

If the people at Bell Labs, sponsored by the military, secretly worked on voiceprints for speaker identification purposes, as we have reasons to believe, then the early history of voiceprints follows very parallel tracks in the USSR, including the fact that we know very little about it. The only (?) account of the Soviet efforts we have is the novel The First Circle by Solzhenitsyn.

Historical background

The plot of the novel takes place within a time span of only three days during the Christmas holiday of 1949, and the setting is the Mavrino prison on the outskirts of Moscow, where the Stalinist regime held unreliable scientists imprisoned. The prison had its own acoustics laboratory and the so-called Clipped Speech Laboratory, where work on speech coding took place.


Historical background

One day the focus shifted, at least temporarily, from voice clipping to voice recognition, when the people working in the lab were given the task of identifying an anonymous speaker in a tapped telephone conversation by comparing the recorded call with sample recordings of five suspects. They were given only two days to complete the task.

Historical background

Given that Siberia was a likely alternative option, it comes as no surprise that they succeeded with their task. There is no detailed information in the novel about the methods they used, but it is obvious that they were familiar with similar efforts outside the USSR.


Historical background

Based on the description in the novel, it seems likely that the spectrograph they used may have been based on the description by Steinberg published in JASA in 1934.

Historical background

This diagram in Steinberg's paper fits the description in the novel very well.


Historical background

Screen shots from the American television series (1991) based on The First Circle.

Historical background

Two quotations from the novel:

"The science of phonoscopy, born today, December 26th 1949, does have a rational core."

"They envisioned the system, like fingerprinting ... Any criminal conversation would be recorded ... and the criminal would be caught straight off, like a thief who had left his fingerprints on the safe door."


Historical background

The term the inmates at Mavrino coined for the use of acoustic analysis as a means of speaker identification was phonoscopy. In Russia today, and in many former Eastern European countries, this is still the term used.

Some fundamental issues

In the following sections we will present a selection of important issues in forensic phonetics, trying to describe problems as well as solutions, and what we know at present and what we do not yet know.


Voiceprints

Much of the story of voiceprinting in forensic phonetics revolves around one particular man, Lawrence G. Kersta, who was an engineer at Bell and head of the lab until he resigned in 1966 to start his own company dedicated to forensic phonetics.

Voiceprints

Between 1945, when people at Bell started to publish again, and 1962, there was no mention of voiceprints. But in 1962 Kersta, still at Bell, published a paper in Nature titled "Voiceprint identification".


Voiceprints

He also gave a paper at the ASA meeting that same year called "Voiceprint-identification infallibility". In both papers he described how spectrograms could be used for speaker identification.

Voiceprints

What made his claims so remarkable was, however, the accuracy he claimed for his method. Based on visual comparison of key words, his examiners achieved no less than 99% correct identification, or better.


Voiceprints

In spite of his rather sensational claims and the fact that his description of the method was vague, to say the least, the scientific community was slow to react. Up until 1966, when he resigned from Bell to start his own company, he remained largely unchallenged.

Voiceprints

He therefore enjoyed some initial success, and his testimonies were accepted as evidence by courts in some, but not all, states. He later began to meet with resistance, however, when other researchers tested the method of visual voice recognition from spectrograms.


Voiceprints

Subjects in a study by Young and Campbell (1967), for example, using the voiceprint technique, obtained 78.4% correct identifications for two words spoken in isolation, but only 38.3% when the same words were taken from different contexts.

Voiceprints

Many others joined in as more and more results indicated that the method was by no means as reliable as Kersta had claimed. But there were also those who supported him, most notably Tosi, who was a qualified phonetician.


Voiceprints

A weak point, in addition to the fact that the results could not be reproduced, was that there was never a detailed, explicit description of the method. We may rather safely assume, however, that it was largely intuitively based.

Voiceprints

The controversy continued until the late eighties, and voiceprinting is still done by private detectives and other non-academic "experts", but nobody in the speech science community believes in its usefulness for forensic purposes any more.


Voiceprints

What we, as forensic phoneticians, may learn from this experience is not so much that the methods were not sufficiently reliable, but that they were put to use in forensic field work without having been thoroughly tested, and that professional phoneticians were far too slow to react to it.

Voice recognition and memory

As we mentioned, the Lindbergh case raised questions about voice recognition accuracy and memory. A researcher who questioned whether it would be possible to accurately remember an unknown voice over a period of two years was a psychologist by the name of Frances McGehee.


Voice recognition and memory

In the first of her experiments the listeners heard a speaker read a 56-word passage. They were then assigned to groups who heard the speaker as one of the speakers in a voice line-up with five foils, at intervals of 1, 2, and 3 days, 1, 2, and 3 weeks, and 1, 3, and 5 months, respectively.

Voice recognition and memory

Recognition rate varied as a function of time, starting at a little over 80% correct identifications after a lapse of 1 day or 1 week. After 2 weeks the recognition rate had fallen to 69%, after a month to 57%, after 3 months to 35%, and after 5 months it was down to 13%, which is less than chance.
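The "less than chance" claim can be checked with a quick calculation. With the target plus five foils there are six voices, so a listener guessing at random picks the target one time in six; a minimal sketch, assuming every voice in the line-up is equally likely to be chosen:

```python
# Chance level for a voice line-up: one target among (n_foils + 1) voices.
# Assumes a guessing listener picks each voice with equal probability.
def chance_level(n_foils: int) -> float:
    return 1.0 / (n_foils + 1)

guess_rate = chance_level(5)  # McGehee's line-ups: five foils plus the target
print(f"{guess_rate:.1%}")    # prints 16.7%, so 13% correct is indeed below chance
```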


Voice recognition and memory

Later studies have in general confirmed her findings, although the precise decay rate may vary from study to study.

[Figure: correct identifications (%) plotted against time lapse (weeks), illustrating the decline in recognition rate.]

Non-contemporary speech samples

The term refers to speech samples which are obtained at different points in time and later used in an identification process. The relevant question in forensic phonetics is at what separation in time between speech samples change over time becomes a problematic factor.


Non-contemporary speech samples

In forensic cases, time spans of a year or more between a suspect recording and a later attempt at identifying the speaker are not unusual. It is therefore important to know if voice changes that take place over a period of one or a few years may affect the accuracy of speaker recognition.

Non-contemporary speech samples

This question has been addressed in a series of studies by Hollien and Schwartz (2000). They tested latencies between recordings from 4 weeks up to 20 years.


Non-contemporary speech samples

There was a drop in correct identification from around 95% for contemporary samples to 70–85% for latencies from 4 weeks to 6 years (with no observable time trend in the interval). For the 20-year latency, however, a sharp drop down to 35% could be observed.

Non-contemporary speech samples

For similar voices, however, there was a dramatic effect. Performance dropped from around 95% for contemporary samples to 40% for samples recorded only 4 weeks later. In the normal case, non-contemporary speech thus seems to affect identification only marginally.


Other issues involving the sample

Other factors that may influence identification accuracy are primarily sample duration and acoustic quality. If we first consider the influence of sample duration, we may observe that in real-life investigations samples may be very short, often just a few words or a phrase or two, which means that sample duration is on the order of a few seconds.

Other issues involving the sample

In an early study by Pollack et al. (1954) the authors observed that identification accuracy increased with sample size, but only up to about 1.2 seconds. For longer samples, phonetic variation took over as the most important factor. They conclude that duration per se is "relatively unimportant, except insofar as it admits a larger or smaller statistical sampling of the speaker's speech repertoire".


Other issues involving the sample

This somewhat surprising finding has, however, been confirmed in other studies. Bricker and Pruzansky (1966) presented stimuli which varied in duration as well as in phonemic variation. They found that identification rate increased with duration only if the longer stimuli also contained more phonemic variation.

Other issues involving the sample

It is important to point out, however, that while an increase in correct identifications is desirable, it is equally desirable to keep the number of false alarms down. Yarmey and Matthys (1992) found that: "The facilitating effect on identification of longer voice-sample durations was counteracted by the high false alarm rates in both suspect-present and suspect-absent line-ups."


Other issues involving the sample

A large proportion of threats and abuse is conducted over the telephone. Telephone-quality speech has therefore received some attention in forensic phonetics studies. An important question in the forensic context is whether the poorer sound quality of recorded telephone conversations adversely affects voice identification.

Other issues involving the sample

It is a common belief that, because of the difference in sound quality, speaker identification of voices heard over the telephone must necessarily be performed using voices recorded over the telephone, the underlying assumption being that the difference in sound quality would make identification less reliable if directly recorded voice samples were used.


Other issues involving the sample

There are surprisingly few studies that address this question, but the results there are indicate that the problem might not be as serious as one might expect. Rathborn et al. (1981) did not find any significant differences in identification of a target voice heard over the telephone and tested using a taped line-up over the telephone, in contrast to voice identification tested directly with a taped line-up.

Other issues involving the sample

A question that has received some attention lately is the influence on acoustic analysis of voice samples of the band-pass filtering that occurs in telephone transmissions. Künzel (2001) found that the lower cut-off frequency had the effect of shifting F1 in German vowels upwards compared to the corresponding tokens in a simultaneous DAT recording. The average frequency shift was on the order of 6%.
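The size of such a shift is straightforward to quantify when the same vowel tokens exist in both recordings; a minimal sketch, with made-up F1 values in Hz (the numbers are illustrative, not Künzel's data):

```python
# Average relative F1 shift between a direct (DAT) recording and the
# telephone transmission of the same tokens. Values are hypothetical.
dat_f1 = [310, 420, 550, 640]   # F1 of four vowel tokens, direct recording
tel_f1 = [329, 446, 583, 678]   # F1 of the same tokens over the telephone

shifts = [(t - d) / d for d, t in zip(dat_f1, tel_f1)]
avg_shift = sum(shifts) / len(shifts)
print(f"average F1 shift: {avg_shift:+.1%}")  # on the order of +6%
```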


Familiarity with the speaker

Hollien et al. studied speaker identification as a function of familiarity under three speaking conditions: normal, stressed, and disguised. Listeners who were familiar with the speakers performed significantly better under all conditions.

Familiarity with the speaker

These results have generally been confirmed in other studies. It is important to point out, however, that although recognition rates are generally high for familiar speakers, recognition is by no means always perfect. For individual speakers and listeners the error rates can be very high if the utterances are short and belong to a fairly large open set (Ladefoged & Ladefoged, 1980).


Familiarity with the speaker

An influence of utterance length on the recognition of familiar speakers has also been found in other studies. In a series of experiments reported by Rose and Duncan (1995), recognition of familiar speakers varied from chance level to nearly perfect as a function of utterance length.

Familiarity with the speaker

It has been generally assumed that in voice recognition, discrimination constitutes the initial step, with recognition occurring as a later phase. But Van Lancker et al. have shown that discrimination and recognition are not stages in one process, but are dissociated, unordered abilities.


Familiarity with the speaker

It is therefore entirely possible that a listener who is good at recognizing familiar speakers may perform badly if the task is to discriminate between unfamiliar speakers.

Disguise

Voice disguise, to the extent that it is used, may be a serious problem for speaker identification. At the extreme end of the spectrum we find electronic manipulation, or even communicating via speech synthesis, which would make speaker identification virtually impossible.


Disguise

In the world of real forensic work, however, voice disguise tends to be of a rather unsophisticated nature. Künzel, based on experience from the German Federal Police (BKA), notes that falsetto, persistent creaky voice, whispering, faking a foreign accent, and pinching one's nose are the most common types.

Disguise

Even unsophisticated types of disguise may have a considerable detrimental effect on speaker identification. In a study by Reich and Duke, all types produced significantly fewer correct identifications; hypernasality produced the greatest effect. Whisper resulted in markedly fewer correct identifications in a study by Orchard and Yarmey.


Disguise

Voice disguise is not as common as one might think. Künzel reports that: "Over the last two decades, between 15 and 25 per cent of the annual cases dealt with at the BKA speaker identification section exhibited at least one kind of disguise."

Disguise

Electronically manipulated messages are still rare, but Künzel notes that there has been an increase in recent years, mainly in the form of editing recorded voices. While at present electronic manipulation is rare and therefore not a significant problem, that may soon change with the increasing availability of such devices.


Foreign Accents

It is generally found that a foreign accent makes identification more difficult, but the difference is often small and not always present. McGehee found no difference at all using speakers with a German accent.

Doty (1998), on the other hand, found substantial differences (88% vs. 13%) using speakers from the US and England speaking English as a native language and speakers from France and Belize speaking English as a foreign language, with native speakers of English as listeners.

Foreign Accents

Results by Goldstein et al. (1981) fall somewhere in between: with relatively long speech samples, accented voices were no more difficult to recognize than unaccented voices; reducing the speech sample duration decreased recognition memory for accented and unaccented voices, but the reduction was greater for accented voices.


Foreign languages

Thompson (1987) recorded six bilingual male students reading messages in English, Spanish, and English with a strong Spanish accent. Voices were best identified by monolingual English-speaking listeners when speaking English and worst when speaking Spanish. Identification accuracy was intermediate for the accent condition.

Foreign languages

Schiller and Köster (1996) tested Americans with no knowledge of German, Americans who knew some German, and native German speakers, using recordings of German speakers. Subjects with no knowledge of German made significantly more errors than the other subjects. Subjects who knew some German performed similarly to native German speakers.


Foreign languages

Köster and Schiller (1997) used Spanish and Chinese listeners. Spanish and Chinese listeners who were familiar with German showed better recognition rates than listeners with no knowledge of German. Spanish and Chinese listeners with a knowledge of German, however, performed measurably worse than the German and English listeners with a knowledge of German.

Foreign languages

We may summarize the results by saying that listeners with no knowledge of a language perform worse on voice recognition than listeners with some knowledge or native speakers, while listeners with some knowledge of the language tend to perform on the same level as native speakers, or only slightly below.


Earwitnesses

Factors which are relevant for speaker recognition in general, like memory, familiarity, disguise, etc., are also relevant for earwitnesses, but there are additional factors about which we presently do not know as much as we would like.

Earwitnesses

The first such factor is stress.

"the majority of (the relatively few) studies of earwitnessing bear little resemblance to real-life witnessing circumstances. Most have used nonstressful situations with prepared subjects participating in laboratory situations" Bull and Clifford (1984)


Earwitnesses

The stress that witnesses may experience in a real-life situation can never be fully recreated in a laboratory experiment. Neither can we, or the witness, have much experience to draw on that will help us determine just how, and how much, the capabilities of a traumatized victim to recognize a voice or discriminate between voices may be affected.

Earwitnesses

Another factor is familiarity.

"personal experience of voice recognition is always of familiar voices, the voices that are not usually those to be identified in criminal situations" (Bull and Clifford)

And as we know from the work by Van Lancker and Kreiman, recognizing a familiar voice and discriminating between unfamiliar ones are independent abilities.


Earwitnesses

A third factor is preparedness. Whereas subjects in a laboratory experiment are, to a greater or lesser degree, prepared for the situation, real-life witnesses are in most cases not. Studies have shown that voice identification accuracy under unprepared conditions is much lower.

Earwitness line-ups

An earwitness line-up (or voice parade) is meant to be the auditory equivalent of an eyewitness line-up. It is used when a person has heard but not seen the perpetrator. Recordings of a suspect's voice and a number of foils are presented, and the witness is to compare the voices with the memory of the perpetrator's voice and determine if any of them matches that memory.


Earwitness line-ups

Two important questions in connection with earwitness line-ups are:

1) how many voices should be present in the line-up?

2) how similar to the suspect's voice should the voices of the foils be?

Earwitness line-ups

It has been found that with few voices there may be marked position effects, and that the number of correct identifications decreases as line-up size increases. So the question is whether there is an optimal size where the position effect is minimized and the decrease in correct identifications has bottomed out.


Earwitness line-ups

A number of studies have addressed the question of line-up size. They are in reasonable agreement that the decrease in identification accuracy bottoms out with about 6 foils and that position effects only appear if the target voice comes first. Thus, as a rule of thumb at least, 5 or 6 foils should be used.
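These two rules of thumb (use 5 or 6 foils; position effects appear when the target voice comes first) translate directly into a procedure for assembling a parade order. A minimal sketch, with hypothetical file names that are not from the tutorial:

```python
import random

def parade_order(target, foils, seed=None):
    """Shuffle the target in among the foils, keeping the target out of
    first position, since position effects appear when the target comes first."""
    if not 5 <= len(foils) <= 6:
        raise ValueError("rule of thumb: use 5 or 6 foils")
    rng = random.Random(seed)
    voices = foils + [target]
    rng.shuffle(voices)
    while voices[0] == target:  # re-draw until the target is not first
        rng.shuffle(voices)
    return voices

# Hypothetical recordings for illustration:
foil_files = [f"foil{i}.wav" for i in range(1, 6)]
print(parade_order("suspect.wav", foil_files, seed=1))
```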

Earwitness line-ups

How similar to the target should the foils be? At least the two extremes must be avoided. The target voice must not stand out as different: the speakers must be reasonably matched with respect to characteristics like speaker age, dialect, etc. On the other hand, they should not be sound-alikes.


Earwitness line-ups

When Rothman (1977) used sound-alikes (brothers, fathers, sons), identification dropped from 94% (ordinary foils) to 58% (sound-alikes). Similar results were obtained by Hollien and Schwartz (2000). Thus foils should be chosen so as to represent a reasonable degree of variation, but avoiding the extremes.

Lie detection

Attempts have been made recently to use brain-scanning methods in order to study the possibility of consistent differences in brain activity patterns which separate lies or deception from truthful statements. Although this research is only in its infancy, some highly interesting results have been obtained.


Lie detection

Langleben et al. (2002) used functional magnetic resonance imaging (fMRI) to detect differences in brain activity when their subjects told a lie compared to when they told the truth. Their results indicate that: "There is a neurophysiological difference between deception and truth at the brain activation level that can be detected with fMRI." Similar results have been obtained in other studies.

Lie detection

High-resolution thermal imaging, which can detect minor regional changes in the blood flow in the face, for example, has also been used in an attempt to develop methods to detect lies and deception (Pavlidis and Levine, 2002).


Lie detection

We should be aware that these are very preliminary results. When, and if, these methods can be put to use in forensic fieldwork we will not know for many years to come. We must also be aware that there may be a very long way to go between research results and reliable field applications.

Lie detection

Unfortunately this is not always the case: "Unproven technologies are becoming increasingly attractive to US law enforcement and security agencies ... Laboratory tools from infrared sensors to eye trackers are being converted into lie detectors" (Knight 2004).


Overgeneralization, charlatanry, fraud

The most well-known "lie detector" is the so-called polygraph. Its first appearance can be dated back to 1917. A more refined version was used in a court case in 1923, and polygraphs have been used ever since, with some refinements.

Overgeneralization, charlatanry, fraud

The basic idea behind the polygraph is that lying increases the level of stress, and that if you can register the involuntary reactions we know to be correlated with stress (respiration, pulse, blood pressure, and galvanic skin response, e.g. palm sweat), these signs can be used to detect lies and deception.


Overgeneralization, charlatanry, fraud

A typical polygraph setup.

Overgeneralization, charlatanry, fraud

The problem with the polygraph as a lie detector lies in the interpretation. Correlations between stress levels and pulse, for example, are found as group results. To generalize from group results to individuals is, of course, not a valid step. Neither is it a valid step to conclude that a person who experiences stress must necessarily be lying.


Overgeneralization, charlatanry, fraud

The basic idea behind lie detectors based on voice analysis is that there are properties in the voice signal that may be reliably correlated with lies or deception. Voice stress analysis (VSA), based on the monitoring of so-called micro tremor, is such a method.

    Overgeneralization, charlatanry, fraud

But whereas there are scientifically established correlations between stress and the indicators used by the Polygraph, there is no scientific basis for the voice stress analysis whatsoever. The few in-depth studies there are of micro-tremor in the larynx indicate that it does not even exist.


    Overgeneralization, charlatanry, fraud

    But it does make pretty diagrams!

    Overgeneralization, charlatanry, fraud

So what the VSA analyzers do is measure the variation in something that isn't even there, in itself an achievement of sorts.

If the people who use these gadgets don't know any better, we may be generous enough to call it charlatanry, the alternative being fraud, of course.


    Overgeneralization, charlatanry, fraud

Finally, an example which without the slightest doubt may be classified as fraud. An Israeli-based company markets the most wonderful tools, including both lie detectors and love detectors. The technique behind the lie detector is said to be something called Layered Voice Analysis (LVA).

    Overgeneralization, charlatanry, fraud

Here is how they claim it works: "every event that passes through the brain will leave its finger prints on the speech flow. LVA Technology ignores what your subject is saying, and focuses only on his brain activity. In other words, the how it is said is crucial and not the what."


    Overgeneralization, charlatanry, fraud

They are careful not to explicitly call the gadget a lie detector, but there is absolutely no question that that is what they want us to believe it is: "LVA is capable of detecting the intention behind the lie, and by so doing can lead you in identifying and revealing the lie itself."

    Overgeneralization, charlatanry, fraud

    There is, of course, not a shred of

    evidence for a relationship between

    voice and brain activity of the proposed

    kind. And a thorough scrutiny of the

    description of the method in the

    American patent documents confirms

    the suspicion that the method is pure

nonsense, perhaps best described as statistics based on digitization artefacts.


    Overgeneralization, charlatanry, fraud

The statistics are based upon what are defined as "thorns" and "plateaus", which have no relevance at all for voice analysis and are, moreover, dependent on how the signal is sampled.

    Overgeneralization, charlatanry, fraud

    Gadgets like these do not deserve to be

    taken seriously as such, but their use in

    forensic investigations must be. If bogus

    lie detectors like the ones described here

    are used not just by shady private

    investigators, but by insurance

    companies, police departments and

security agencies, this poses a threat that we must oppose more actively.


    Speech Processing and Biometrics Group (GTPB)

    Signal Processing Institute (ITS), LIAP

    Forensic Automatic Speaker Recognition

    FORENSIC SPEECH SCIENCE

    Dr. Andrzej Drygajlo

    [email protected]

    Speech Processing and Biometrics Group

Signal Processing Institute (ITS-LIAP)

Swiss Federal Institute of Technology Lausanne (EPFL)

    School of Criminal Sciences

    University of Lausanne


    Biometric characteristics in forensic applications

- Biological traces: DNA (DeoxyriboNucleic Acid), blood, saliva, etc.
- Biological (physiological) characteristics: fingerprints, eye irises and retinas, hand palms and geometry, and facial geometry
- Behavioral characteristics: dynamic signature, gait, keystroke dynamics, lip motion
- Combined: voice


    Popular biometric characteristics (modalities)

    Fingerprint

    Voice

    Face

    Retina

    Signature

    Iris


    Forensic Biometric Applications

Forensic Biometrics: individualisation of human beings

Challenge: to automate forensic biometric methods

Existing systems and databases:
- Automatic Fingerprint Identification System (AFIS, US-made) and fingerprint databases
- DNA sequencers and DNA databases

Challenge: large-scale automatic systems and databases for speech, handwriting, face images, earmarks, etc.


    Constraints

Systems developed according to specified recommendations from:
- Tool perspective (recognition and computer technology)
- Forensic expert perspective (methodology)
- Criminal policy perspective (investigation)
- Legal perspective (impact of the application of the data and privacy protection law on the efficiency of the methods used)
- Judicial perspective (the role of the court)


    Law enforcement and forensic applications

The law enforcement applications include the use of biometrics to recognize individuals:
- Apprehended or incarcerated because of criminal activity
- Suspected of criminal activity
- Whose movement is restricted as a result of criminal activity

The biometric may be used to identify non-cooperative and unknown subjects, to ensure that the correct inmates are released, or to verify that individuals under home arrest are in compliance.


    Forensic Speaker Recognition

- Aural-perceptual methods: earwitnesses, line-ups
- Visual methods and "voiceprint"?: visual comparison of spectrograms of linguistically identical utterances (utterly misleading!)
- Aural-instrumental methods: analytical acoustic approach combined with an auditory phonetic analysis
- Automatic methods:
  - Speaker verification: not adequate
  - Speaker identification: not adequate
  - Bayesian framework for the evaluation of identity


    Forensic specificity

- Short utterances
- Questioned recording: uncontrolled environment
- Investigations in controlled conditions (longer utterances)
- Telephone quality (95%)
- Clear understanding of the inferential process
- Respective duties of the actors involved in the judicial process: jurists, forensic experts, judges, etc.
  - The forensic expert's role is to testify to the worth of the evidence by using, if possible, a quantitative measure of this worth.
  - It is up to the judge and/or the jury to use this information as an aid to their deliberations and decision.


Forensic Expert's Role

A forensic expert testifying in court to a conclusion in an individual case is not an advocate, but a witness who presents factual information and offers a professional opinion based upon that factual information.

Expert opinion testimony is, and will remain, one of the most powerful forms of evidence in the courtroom.

In order for it to be effective, it must be carefully documented, and expressed with precision, but without overstatement, in as neutral and objective a way as the adversary system permits.

Professional concepts must be articulated in a way laypersons (like the judge and the lawyers) can understand.


Individual Case

[Diagram: casework compares the trace (questioned recording) with the suspect (suspected speaker reference database, or a single recording of the suspected speaker)]


    Adversary System

- The speaker at the origin of the questioned recording is not the suspected speaker
- The suspected speaker is the source of the questioned recording


Outline

- Automatic Speaker Recognition
- Voice as Evidence
- Bayesian Interpretation of Evidence
- Corpus Based Methodology
  - Univariate Scoring Method
  - Multivariate Direct Method
- Strength of Evidence
- Evaluation of the Strength of Evidence
- Mismatched Recording Conditions
- Aural Speaker Recognition


    Automatic Speaker Recognition

Speaker recognition is the general term used to include all of the many different tasks of discriminating people based on the sound of their voices.

Speaker identification is the task of deciding, given a sample of speech, who among many candidate speakers said it. This is an N-class decision task, where N is the number of candidate speakers.

Speaker verification is the task of deciding, given a sample of speech, whether a specified candidate speaker said it. This is a 2-class decision task and is sometimes referred to as a speaker detection task.


Principal structure of speaker recognition systems

[Block diagram: the speech wave undergoes feature extraction; in training, reference templates/models are built for each speaker; in recognition, a similarity (distance) computation against these models feeds a decision / interpretation step that yields the recognition results]


Principal structure of speaker recognition systems

[Block diagram: speech wave → feature extraction → similarity (distance) against trained models for each speaker → score]

Text-dependent methods:
- Dynamic Time Warping (DTW)
- Hidden Markov Models (HMMs)

Text-independent methods:
- Vector Quantization (VQ)
- Gaussian Mixture Models (GMMs)


Feature Extraction

[Figure: the speech wave is divided into overlapping frames; each frame is weighted by a window and converted into a feature vector]
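The framing-and-windowing step can be sketched in a few lines of Python. This is a minimal illustration, not the system used in the tutorial; the 16 kHz sampling rate and the 25 ms frame / 10 ms hop values are assumed for the example.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D speech signal into overlapping frames and apply a
    Hamming window; each windowed frame would then be turned into a
    feature vector (e.g. by cepstral analysis, not shown here)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)

# 1 s of dummy "speech" sampled at 16 kHz: 25 ms frames, 10 ms hop
frames = frame_signal(np.random.randn(16000))
print(frames.shape)  # (98, 400)
```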


Gaussian Mixture Model (GMM)

[Figure: D-dimensional acoustic vectors v(1), v(2), ..., v(T) are extracted for training; the histogram of each feature (1 to D) is modelled by a Gaussian mixture]

score = log-likelihood (speech | model)
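As a rough sketch of the "score = log-likelihood (speech | model)" idea, the following hand-rolled diagonal-covariance GMM scorer (toy parameters, not a real speaker model) shows how a recording's frames are averaged into a single log-likelihood score:

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Mean per-frame log-likelihood of feature vectors X under a
    diagonal-covariance Gaussian mixture model."""
    ll = []
    for w, mu, var in zip(weights, means, variances):
        # log of w * N(x; mu, diag(var)) for every frame
        log_norm = -0.5 * np.sum(np.log(2 * np.pi * var))
        log_exp = -0.5 * np.sum((X - mu) ** 2 / var, axis=1)
        ll.append(np.log(w) + log_norm + log_exp)
    # log-sum-exp over mixture components, then average over frames
    ll = np.logaddexp.reduce(ll, axis=0)
    return float(np.mean(ll))

# Toy 2-component model in 2 dimensions
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
variances = [np.ones(2), np.ones(2)]

rng = np.random.default_rng(0)
near = rng.normal(0.0, 1.0, size=(200, 2))   # frames near one component
far = rng.normal(10.0, 1.0, size=(200, 2))   # frames far from the model
print(gmm_log_likelihood(near, weights, means, variances) >
      gmm_log_likelihood(far, weights, means, variances))   # True
```

Speech that resembles the training data scores higher than speech that does not, which is exactly what the verification step exploits.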


Speaker Verification

The odds form of Bayes' theorem:
- H0: the speaker's model and the tested recording (T) have the same source
- H1: the speaker's model and the tested recording (T) have different sources

P(H0 | T) / P(H1 | T) = [P(T | H0) / P(T | H1)] x [P(H0) / P(H1)]

The likelihood ratio P(T | H0) / P(T | H1) is compared with a decision threshold.



Interpretation of Evidence

Bayesian interpretation (BI): principle

The Bayesian model, proposed for forensic speaker recognition by Lewis in 1984, allows for revision, based on new information, of a measure of uncertainty (the likelihood ratio of the evidence, the province of the forensic expert) which is applied to the pair of competing hypotheses.

The Bayesian model shows how new data (questioned recording) can be combined with prior background knowledge (prior odds, the province of the court) to give posterior odds (the province of the court) for judicial outcomes or issues.

prior odds x ? = posterior odds


Strength of Evidence

Bayesian interpretation (BI):

P(H0 | E) / P(H1 | E) = [P(E | H0) / P(E | H1)] x [P(H0) / P(H1)]

Prior odds (prior background knowledge; province of the court) x Likelihood Ratio (LR) of the new data E (province of the forensic expert) = posterior odds (posterior knowledge on the issue; province of the court)


Voice as Evidence

In the case of a questioned recording (trace), the evidence does not consist of the speech itself, but of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect, represented by his/her model.


Voice as Evidence

[Block diagram: the suspected speaker reference database (R) is used, via feature extraction, to train the suspected speaker model; the questioned recording (trace) undergoes feature extraction and is scored for similarity (distance) against that model; the resulting score is the evidence (E), whose signification is given by Bayesian interpretation]


Bayesian Interpretation of Evidence

The odds form of Bayes' theorem:
- H0: the suspected speaker is the source of the questioned recording (within-source variability)
- H1: the speaker at the origin of the questioned recording is not the suspected speaker (between-sources variability)

P(H0 | E) / P(H1 | E) = [P(E | H0) / P(E | H1)] x [P(H0) / P(H1)]

The likelihood ratio P(E | H0) / P(E | H1) expresses the strength of evidence: its numerator measures similarity, its denominator typicality.
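A worked illustration of the odds form with hypothetical numbers (neither the prior odds nor the LR comes from the slides):

```python
# Hypothetical numbers: the court supplies the prior odds, the expert the LR
prior_odds = 1 / 100          # court: 1 to 100 that the suspect is the source
likelihood_ratio = 75         # expert: E is 75x more likely under H0 than H1

posterior_odds = likelihood_ratio * prior_odds
posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_odds, 2), round(posterior_prob, 3))  # 0.75 0.429
```

Note that even a large LR need not yield high posterior odds; the outcome depends on the prior, which is the province of the court, not of the expert.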


Uni- and Multivariate Methods

Scoring Method: the likelihood is calculated from distributions of scores modelling the within-source and between-sources variability
- H0: distribution of scores of within-source variability
- H1: distribution of scores of between-sources variability
- 3 databases: Suspect Reference Database (R), Potential Population Database (P), Suspect Control Database (C)

Direct Method: the likelihood is calculated directly from the GMM of the suspect and the GMM of the potential population
- H0: GMM of the suspect
- H1: GMMs of the potential population
- 2 databases: Suspect Reference Database (R), Potential Population Database (P)

Databases used:
- R = 5 utterances per speaker (2-3 min each)
- P = 100 speakers (2-3 min each)
- C = 30-40 utterances per speaker (10-20 sec each)


Corpus Based Methodology

3 databases (DBs):
- Potential population database (P): large-scale database used to model the potential population of speakers, to evaluate the between-sources variability
- Suspected speaker reference database (R): database recorded with the suspected speaker, to model her/his speech
- Suspected speaker control database (C): database recorded with the suspected speaker, to evaluate her/his within-source variability


Scoring Method

[Diagram: casework compares the trace (questioned recording) with the suspect, represented by the suspected speaker reference database (R) and the suspected speaker control database (C), and with the relevant population, represented by the potential population database (P)]


Within-source variability

[Block diagram: the suspected speaker reference database (R) trains the suspected speaker model; utterances from the suspected speaker control database (C) are scored against this model, and the resulting similarity scores give the distribution of the within-source variability]


Between-sources Variability

[Block diagram: speaker models of the potential population are trained on the potential population database (P); the trace (questioned recording) is scored against these models, and the resulting similarity scores give the distribution of the between-sources variability]


Evaluation of the within-source variability

[Histogram: occurrences vs. similarity scores, obtained by comparing the suspected speaker models with the utterances of his control database (C)]


Evaluation of the between-sources variability

[Histogram: occurrences vs. similarity scores, obtained by comparing the trace with the speaker models of the potential population database (P)]


    Likelihood ratio

P(E | H0) / P(E | H1) = 0.15 / 0.002 = 75

[Figure: estimated probability densities of the two score distributions as a function of the similarity score; the likelihood ratio is the ratio of the two densities at the evidence score E = 6]
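A sketch of how such an LR is read off from the two estimated densities, with Gaussians standing in for the within-source and between-sources score distributions (all parameter values hypothetical):

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical Gaussian models of the two score distributions
within_mean, within_std = 7.0, 1.5        # H0: within-source scores
between_mean, between_std = 2.0, 1.2      # H1: between-sources scores

E = 6.0                                   # observed evidence score
lr = (normal_pdf(E, within_mean, within_std) /
      normal_pdf(E, between_mean, between_std))
print(lr > 1)   # here the evidence supports H0
```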


Strength of Evidence: Likelihood ratio

A likelihood ratio of 9.16 means that it is 9.16 times more likely to observe the score (E) given the hypothesis H0 (the suspect is the source of the questioned recording) than given the hypothesis H1 (that another speaker from the relevant population is the source of the questioned recording).


DET (Detection Error Tradeoff) Curve

The DET curve can be computed from the distributions of scores with a variable threshold.
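A minimal sketch of that computation: sweeping a threshold over the pooled scores and recording the miss rate and false-alarm rate at each value (the score distributions here are synthetic, not from the tutorial's data):

```python
import numpy as np

def det_points(target_scores, nontarget_scores):
    """Miss and false-alarm rates for every candidate decision threshold:
    the raw points of a DET curve."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    miss = np.array([(target_scores < t).mean() for t in thresholds])
    fa = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    return miss, fa

rng = np.random.default_rng(1)
target = rng.normal(2.0, 1.0, 1000)      # same-speaker scores
nontarget = rng.normal(0.0, 1.0, 1000)   # different-speaker scores

miss, fa = det_points(target, nontarget)
# As the threshold rises, the miss rate grows and the false-alarm rate shrinks
print(miss[0], fa[0], miss[-1], fa[-1])
```

Plotting miss against false alarm (usually on normal-deviate axes) gives the DET curve.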


Analysis and comparison

[Block diagram: the trace undergoes feature extraction; the potential population database (P) undergoes feature extraction and modelling to give the relevant speakers' models; the suspected speaker reference database (R) undergoes feature extraction and modelling to give the suspected speaker model; the suspected speaker control database (C) undergoes feature extraction. Comparative analysis of the trace features against the suspected speaker model yields the evidence (E); the other comparative analyses yield the two sets of similarity scores]


Interpretation of the evidence

[Block diagram: one set of similarity scores models the within-source variability, the other the between-sources variability; the two distributions, evaluated at the evidence (E), give the numerator and the denominator of the likelihood ratio (LR)]


Individual Case

[Diagram: casework compares the trace (questioned recording) with the suspect (suspected speaker reference database, or a single recording of the suspected speaker)]


Scoring Method with Limited Suspect Data

The odds form of Bayes' theorem:
- H0: the two recordings have the same source
- H1: the two recordings have different sources

P(H0 | E) / P(H1 | E) = [P(E | H0) / P(E | H1)] x [P(H0) / P(H1)]

The likelihood ratio P(E | H0) / P(E | H1) gives the strength of evidence with respect to the new hypotheses.


Direct Method

The odds form of Bayes' theorem:
- H0: the speaker's model and the questioned recording (T) have the same source
- H1: the speaker's model and the questioned recording (T) have different sources

P(H0 | T) / P(H1 | T) = [P(T | H0) / P(T | H1)] x [P(H0) / P(H1)]

Likelihood ratio: P(T | H0) / P(T | H1). Strength of evidence?


Multivariate (Direct) Method: LR Numerator

[Block diagram: the suspected speaker reference database (R) trains the suspected speaker model; the trace (questioned recording) is scored against this model to give the numerator of the likelihood ratio]

score = log-likelihood (trace | H0)


Multivariate (Direct) Method: LR Denominator

[Block diagram: the potential population database (P) trains a single model of all speakers (the model of the potential population); the trace (questioned recording) is scored against this model to give the denominator of the likelihood ratio]

score = log-likelihood (trace | H1)
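In the direct method the log-LR is simply the difference of the two mean log-likelihoods. A toy sketch with 1-D Gaussians standing in for the suspect and potential-population GMMs (all numbers hypothetical):

```python
import math

def mean_log_likelihood(frames, mean, std):
    """Mean per-frame log-likelihood under a 1-D Gaussian 'model'."""
    return sum(-0.5 * ((x - mean) / std) ** 2
               - math.log(std * math.sqrt(2 * math.pi))
               for x in frames) / len(frames)

trace = [1.1, 0.8, 1.3, 0.9, 1.0]        # features of the questioned recording

ll_suspect = mean_log_likelihood(trace, mean=1.0, std=0.5)     # H0 model
ll_population = mean_log_likelihood(trace, mean=0.0, std=1.0)  # H1 model

log_lr = ll_suspect - ll_population
print(log_lr > 0)   # in this toy case the evidence supports H0
```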


    Evaluation of the Strength of Evidence

    Univariate (Scoring) Method


Cumulative Distribution Functions


    Tippett plots (reliability-survival functions)

    Univariate (Scoring) Method
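A Tippett plot is the empirical survival function of the likelihood ratios obtained in same-source (H0 true) and different-source (H1 true) trials: for each threshold, the proportion of cases whose LR exceeds it. A sketch with synthetic LRs (the distributions are invented for illustration):

```python
import numpy as np

def tippett(lrs):
    """Proportion of cases whose LR is at least each observed value:
    the survival function plotted in a Tippett plot."""
    thresholds = np.sort(lrs)
    proportion = 1.0 - np.arange(len(lrs)) / len(lrs)
    return thresholds, proportion

rng = np.random.default_rng(2)
# Hypothetical LRs from same-source (H0) and different-source (H1) trials
lr_h0 = np.exp(rng.normal(2.0, 1.0, 500))
lr_h1 = np.exp(rng.normal(-2.0, 1.0, 500))

t0, p0 = tippett(lr_h0)
t1, p1 = tippett(lr_h1)
# In a well-behaved system most H0 LRs exceed 1 and most H1 LRs fall below 1
print((lr_h0 > 1).mean(), (lr_h1 > 1).mean())
```

Plotting both survival curves against log(LR) gives the two branches of the Tippett plot; the wider their separation, the stronger the method.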


    Evaluation of the Strength of Evidence

    Multivariate (Direct) Method


    Tippett plots (reliability-survival functions)

    Multivariate (Direct) Method


Using databases with mismatched recording conditions

FBI NIST 2002 Database: 2 conditions (microphone / telephone)

The extent of mismatch can be measured using statistical testing.


Compensating for Mismatch

[Figure: distributions of the H0 scores (matched conditions), the potential-population H1 scores (matched conditions), and the H1 scores (mismatched conditions), shown relative to the evidence E]

Not compensating for mismatch can be the difference between an LR < 1 and an LR > 1.


Aural Speaker Recognition

Experimental Framework

Listeners:
- 90 listeners whose mother tongue is French
- Laypersons with no phonetic training
- Same computer and headphones

Training:
- No limitation on the number of listening trials

Testing:
- Verbal scores: scale from 1 through 7
- Perceptual cues


Perceptual Verbal Scale and Perceptual Cues

Perceptual Verbal Scale:
- Score 1: I am sure that the two speakers are not the same
- Score 2: I am almost sure that the two speakers are not the same
- Score 3: It is possible that the two speakers are not the same
- Score 4: I cannot decide
- Score 5: It is possible that the two speakers are the same
- Score 6: I am almost sure that the two speakers are the same
- Score 7: I am sure that the two speakers are the same


Strength of Evidence for Aural Recognition

[Figure: histograms of the estimated probability of each perceptual verbal score (1-7) under H0 and H1]

LR = P(E | H0) / P(E | H1)

The Likelihood Ratio (LR) is the ratio of the heights of the histograms for the two hypotheses at the point E (the perceptual verbal score). Since the scores are discrete, histograms are used to estimate the probabilities of the scores for each hypothesis.
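For discrete perceptual scores the LR is literally a ratio of histogram heights. A sketch with hypothetical listener counts (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical counts of perceptual verbal scores 1..7 collected from
# listeners under each hypothesis (same-speaker vs different-speaker pairs)
h0_counts = np.array([2, 3, 5, 10, 20, 30, 30])   # same-speaker pairs
h1_counts = np.array([30, 30, 20, 10, 5, 3, 2])   # different-speaker pairs

h0_prob = h0_counts / h0_counts.sum()
h1_prob = h1_counts / h1_counts.sum()

E = 6                                  # a listener answers "almost sure same"
lr = h0_prob[E - 1] / h1_prob[E - 1]   # ratio of histogram heights at E
print(lr)   # 10.0
```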


Evaluating Strength of Evidence in Matched Conditions

[Tippett plots: aural and automatic; reference PSTN vs. traces PSTN]

Similar separations between curves for aural and automatic systems.


Evaluating Strength of Evidence in Mismatched Conditions

[Tippett plots: aural and automatic; reference PSTN vs. traces noisy PSTN]

Better curve separation in aural recognition: better evaluation of the LR for aural recognition in mismatched conditions.


Evaluating Strength of Evidence in Adapted Conditions

[Figure: estimated probability vs. likelihood ratio (LR) for H0 and H1, aural and automatic-adapted; reference PSTN vs. traces adapted noisy PSTN]

Adaptation for noisy conditions results in the improvement of performance of automatic recognition.


Admissibility of Scientific Evidence (USA)

Daubert criteria:
- whether the theory or technique can be, and has been, tested,
- whether the technique has been published or subjected to peer review,
- whether actual or potential error rates have been considered,
- whether standards exist and are maintained to control the operation of the technique,
- whether the technique is widely accepted within the relevant scientific community.


    References

Ph. Rose, Forensic Speaker Identification, Taylor and Francis, London, 2002.

D. Meuwly, A. Drygajlo, "Forensic Speaker Recognition Based on a Bayesian Framework and Gaussian Mixture Modelling (GMM)", The Workshop on Speaker Recognition 2001: A Speaker Odyssey, Crete, Greece, June 2001, pp. 145-150.

A. Drygajlo, D. Meuwly, A. Alexander, "Statistical Methods and Bayesian Interpretation of Evidence in Forensic Automatic Speaker Recognition", EUROSPEECH'2003, Geneva, Switzerland, Sept. 2003, pp. 689-692.

A. Alexander, A. Drygajlo, "Scoring and Direct Methods for the Interpretation of Evidence in Forensic Speaker Recognition", ICSLP 2004, Jeju, Korea, 2004.


F. Botti, A. Alexander, and A. Drygajlo, "An Interpretation Framework for the Evaluation of Evidence in Forensic Automatic Speaker Recognition with Limited Suspect Data", Odyssey 2004, The Speaker and Language Recognition Workshop, Toledo, Spain, 2004, pp. 63-68.

A. Alexander, F. Botti, and A. Drygajlo, "Handling Mismatch in Corpus-Based Forensic Speaker Recognition", Odyssey 2004, The Speaker and Language Recognition Workshop, Toledo, Spain, May 2004, pp. 69-74.

A. Alexander, F. Botti, D. Dessimoz, A. Drygajlo, "The Effect of Mismatched Recording Conditions on Human and Automatic Speaker Recognition in Forensic Applications", Forensic Science International, 146S (2004), pp. S95-S99.

D. Meuwly, A. Drygajlo, "A Bayesian Interpretation of Evidence in Forensic Automatic Speaker Recognition", to be published in Forensic Science International.

J. Gonzalez-Rodriguez, A. Drygajlo, D. Ramos-Castro, M. Garcia-Gomar, J. Ortega-Garcia, "Robust Estimation, Interpretation and Assessment of Likelihood Ratios in Forensic Speaker Recognition", to be published in Computer Speech and Language.


    Conclusions

The Bayes model, the current interpretation framework used in forensic science, is adapted for forensic automatic speaker recognition

The corpus-based methodology provides a coherent way of assessing and presenting the evidence of a questioned recording

Distributions of likelihood ratios can be used for the evaluation of the performance of automatic and aural methods in forensic speaker recognition applications
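One common way to summarise such LR distributions (a toy sketch, not necessarily the metric used in the tutorial) is the rate of "misleading evidence": how often the LR points away from the true hypothesis. The log-LR distributions below are simulated assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy log10-LR distributions for a method under the two hypotheses.
log_lr_h0 = rng.normal(1.5, 1.0, 5000)   # suspect is the source (H0 true)
log_lr_h1 = rng.normal(-1.5, 1.0, 5000)  # suspect is not the source (H1 true)

# Rates of misleading evidence: LR pointing the wrong way.
p_mislead_h0 = np.mean(log_lr_h0 < 0)  # LR < 1 although H0 is true
p_mislead_h1 = np.mean(log_lr_h1 > 0)  # LR > 1 although H1 is true
print(p_mislead_h0, p_mislead_h1)
```

The better separated the two LR distributions (as in the aural and adapted-automatic curves above), the lower both misleading-evidence rates become.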


While there is certainly no perfect solution available in the field of forensic speaker recognition at present, the scientific community is under a moral obligation to contribute whatever it can to aid the course of justice and to establish a scientifically founded methodology and techniques

What is clearly needed are joint research initiatives of forensic scientists and speech engineers, in order to study the problems arising from the actual technology and from the practical work of forensic experts, and to gain a more complete insight into the concept of the individuality of voice

