Talker familiarity effects on speech-in-speech perception

Date post: 23-Aug-2020

Participants
➢ 43 American English listeners (Exp. 1, n=24; Exp. 2, n=19)

Stimuli
Talker Training
➢ 15 sentences extracted from Le Petit Prince, produced by 3 female American English talkers

Speech-in-Speech Task
➢ Experiment 1:
  • Target: 60 novel HINT sentences produced by 1 female talker not heard in training
  • Background: 48 different HINT sentences produced by 2 training talkers and 2 novel talkers

Angela Cooper & Ann Bradlow
Department of Linguistics, Northwestern University

Previous research
➢ Talker familiarity can facilitate speech recognition in broadband white noise [1].
➢ Speech-in-speech perception, which involves teasing apart at least 2 speech streams, is mediated by the similarity of the target and competing speech, including their voice characteristics (e.g. F0, vocal tract length) [2] and linguistic information (e.g. target and masker languages) [3]. However, listeners do not use all target-masker differences for stream segregation (e.g. clear speech) [4].

What influence does talker familiarity have on the processing of attended and unattended speech streams?
➢ Familiarity with the target talker was found to enhance speech shadowing accuracy; no familiarity effect was found for the background talker [5].

  • The target familiarity effect was only found when listeners were told that the target was someone they knew (i.e. their professor).

The current study
➢ We examined the effect of familiarity with the target and competing voices on sentence recognition by providing talker training prior to a sentence recognition task in 1-talker babble.

Three Possible Hypotheses
A. If enhancing dissimilarity between target and masker facilitates stream segregation, and if familiarity with a talker’s voice characteristics results in it becoming more distinctive, then any situation where the target and background talkers mismatch in familiarity should facilitate recognition accuracy relative to when they are matched for familiarity.
   Prediction: Exp. 1 - Better target recognition with familiar background talkers; Exp. 2 - Better recognition with familiar target talkers.

B. A familiar talker may acquire a general processing advantage for the listener and become easier to process in the target but more difficult to ignore in the background.
   Prediction: Exp. 1 - Worse target recognition with familiar background talkers; Exp. 2 - Better target recognition for familiar target talkers.

C. Talker familiarity only affects speech recognition after stream segregation.
   Prediction: Exp. 1 - No difference in target recognition as a function of background talker familiarity; Exp. 2 - Better target recognition for familiar target talkers.
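The three sets of predictions can be compared against an observed pattern mechanically. The following is only an illustrative sketch: the dictionary keys, the labels "better", "worse" and "none", and the function name `consistent_hypotheses` are hypothetical conventions introduced here, not part of the poster.

```python
# Predicted direction of the familiarity effect on target recognition,
# transcribed from the three hypotheses above.
# Labels and key names are illustrative assumptions.
PREDICTIONS = {
    "A": {"exp1_background": "better", "exp2_target": "better"},
    "B": {"exp1_background": "worse", "exp2_target": "better"},
    "C": {"exp1_background": "none", "exp2_target": "better"},
}


def consistent_hypotheses(observed):
    """Return the hypotheses whose predictions match the observed pattern."""
    return [name for name, predicted in PREDICTIONS.items()
            if predicted == observed]


# Pattern reported on this poster: no background-familiarity effect in
# Exp. 1 and a familiar-target benefit in Exp. 2.
observed = {"exp1_background": "none", "exp2_target": "better"}
matching = consistent_hypotheses(observed)
```

All three hypotheses make the same prediction for Exp. 2, so only the Exp. 1 outcome can tell them apart.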

 

Conclusions
➢ Enhanced speech recognition was found for target talkers with whom listeners were implicitly familiar, in the presence of competing speech from an unfamiliar talker.
  • In line with previous work [1,5], even a brief period of familiarization with a talker’s speech patterns can yield significant improvements in extracting linguistic content from a speech signal, even under masking conditions involving both energetic and informational masking.
➢ Consistent with [5], familiarity with the speech patterns of the competing talkers had no (facilitative or inhibitory) effect on unfamiliar target recognition.
➢ This provides support for Hypothesis C: talker familiarity affects processing at a level relevant for phonetic processing but not for stream segregation.
➢ While the initial segregation of auditory streams may rely on automatic, low-level signal-driven processes, higher-level listener-dependent information may only be utilized after the streams have been separated.
  • At this point, talker-contingent phonetic processing for recognizing words is only relevant in relation to the attended speech stream.
➢ While listener sensitivity to talker-specificity in the signal facilitates speech recognition, the cues underlying selective attention to a target rather than a background voice are more resistant to subjective, listener-specific influences.

References
[1] Nygaard, L. C. and Pisoni, D. B. (1998). “Talker-specific learning in speech perception,” Perception Psychophys. 60, 355–376.
[2] Darwin, C. J., Brungart, D. S. and Simpson, B. D. (2003). “Effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers,” J. Acoust. Soc. Amer. 114, 2913–2922.
[3] Van Engen, K. J. and Bradlow, A. R. (2007). “Sentence recognition in native- and foreign-language multi-talker background noise,” J. Acoust. Soc. Amer. 121, 519–526.
[4] Calandruccio, L., Van Engen, K. J., Dhar, S. and Bradlow, A. R. (2010). “The effectiveness of clear speech as a masker,” J. Speech Lang. Hear. Res. 53, 1458–1471.
[5] Newman, R. S. and Evers, S. (2007). “The effect of talker familiarity on stream segregation,” J. Phonetics 35, 85–103.
[6] Nilsson, M., Soli, S. D. and Sullivan, J. A. (1994). “Development of the hearing in noise test for the measurement of speech reception thresholds in quiet and in noise,” J. Acoust. Soc. Amer. 95, 1085–1099.

Acknowledgments
We would like to thank Pamela Souza, Chun Liang Chan and Vanessa Dopker for their invaluable research and technical support. This research was supported in part by NIH-NIDCD grant R01-DC005794.

Contact: [email protected]  http://akcooper.wordpress.com

 

Methods

Speech-in-Speech Task
➢ Listeners were instructed to listen to the softer voice and transcribe the target sentence
  • The background talker started 300 ms before the target talker
➢ 2 blocks of 30 trials: Block 1 at -10 dB SNR, Block 2 at -15 dB SNR
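A trial of this kind can be pictured as a target sentence added to a louder background that begins 300 ms earlier. The sketch below is one plausible way to build such a mixture and is not the authors' procedure: the poster does not say how levels were set, so the code assumes SNR is defined on whole-signal RMS levels and that the masker, not the target, is rescaled. The function names `rms` and `mix_at_snr` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(target, masker, snr_db, sample_rate, lead_ms=300):
    """Mix target and masker at snr_db, with the masker starting lead_ms early.

    Assumption: the target level is kept fixed and the masker is rescaled so
    that 20*log10(rms(target) / rms(scaled masker)) equals snr_db.
    Returns the mixture and the rescaled masker.
    """
    lead = int(round(sample_rate * lead_ms / 1000))
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    scaled = [gain * m for m in masker]
    mixed = [0.0] * max(len(scaled), lead + len(target))
    for i, m in enumerate(scaled):
        mixed[i] += m
    for i, t in enumerate(target):
        mixed[lead + i] += t  # target onset is delayed by the masker lead
    return mixed, scaled


# Toy signals standing in for recorded sentences (illustrative values only).
target = [0.1, -0.1] * 50
masker = [0.5, -0.5] * 100
mixed, scaled = mix_at_snr(target, masker, snr_db=-10, sample_rate=100)
```

At -10 dB and -15 dB SNR the rescaled masker is more intense than the target, which is why the instruction to attend to the softer voice identifies the target.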

Identification Test
➢ 10 Le Petit Prince sentences + 12 Hearing-in-Noise Test (HINT) sentences [6] produced by the training talkers

 

Results

Exp. 1 – Background Familiarity
➢ Significant main effect of SNR: listeners had significantly worse sentence recognition scores at -15 dB SNR relative to -10 dB SNR across familiarity conditions (p<.01)
➢ No significant main effect of background familiarity: listeners performed similarly when the background talker was familiar or unfamiliar (p>.05)
➢ No significant background familiarity x SNR interaction (p>.05)

Exp. 2 – Target Familiarity
➢ Significant main effect of SNR: listeners had significantly worse sentence recognition scores at -15 dB SNR relative to -10 dB SNR across familiarity conditions (p<.02)
➢ Significant main effect of target familiarity: listeners performed significantly better when the target talker was familiar vs. unfamiliar (p<.05)
➢ No significant target familiarity x SNR interaction (p>.05)

Experiment 1
  Target: Unfamiliar ‘X’
  Background: Familiar ‘A’, Unfamiliar ‘C’, Familiar ‘B’, Unfamiliar ‘D’

Experiment 2
  Target: Familiar ‘A’, Unfamiliar ‘C’, Familiar ‘B’, Unfamiliar ‘D’
  Background: Unfamiliar ‘X’

Talker Training (following [1])
➢ Familiarization: passive listening with talker names provided, 45 trials
➢ Recognition: talker identification with feedback, 45 trials

Identification Test
➢ Talker identification without feedback, 66 trials

[Figure: Experiment 1 – Unfamiliar Target; Familiar or Unfamiliar Background. Mean RAU score (+/- 1 standard error) by SNR (-10 dB, -15 dB) and talker familiarity (familiar, unfamiliar). Scatterplots at -10 dB SNR and -15 dB SNR: individual mean target recognition RAU score for the unfamiliar background talker (x-axis) and the familiar background talker (y-axis). Points in the shaded area indicate a familiarity advantage.]

[Figure: Experiment 2 – Familiar or Unfamiliar Target; Unfamiliar Background. Mean RAU by SNR (-10 dB, -15 dB) for familiar and unfamiliar talkers, and scatterplots of Familiar (RAU) against Unfamiliar (RAU) at -10 dB SNR and -15 dB SNR.]

➢ Experiment 2:
  • Target: 60 novel HINT sentences produced by 2 training talkers and 2 novel talkers
  • Background: 48 different HINT sentences produced by 1 female talker not heard in training

➢ The entire sentence had to be accurate to be considered correct
➢ Scores were converted to rationalized arcsine units (RAU) for analysis
➢ Listeners needed to achieve at least 70% correct on the talker ID test (M=95%) to be included in the analysis
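The scoring steps above can be sketched as follows. The poster does not give the formula it used, so this assumes the commonly used rationalized arcsine transform of Studebaker (1985); the case and whitespace normalization in `score_block` is also an assumption, and both function names are hypothetical.

```python
import math


def score_block(responses, targets):
    """Count transcriptions that match the whole target sentence.

    Assumption: matching ignores letter case and extra whitespace; the poster
    only states that the entire sentence had to be accurate.
    """
    def normalize(sentence):
        return " ".join(sentence.lower().split())

    return sum(normalize(r) == normalize(t) for r, t in zip(responses, targets))


def rau(correct, total):
    """Rationalized arcsine units for `correct` items out of `total`.

    theta = asin(sqrt(X / (N + 1))) + asin(sqrt((X + 1) / (N + 1)))
    RAU   = (146 / pi) * theta - 23
    """
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0


# Example: a listener with 15 of 30 sentences correct in one block.
half_correct = rau(15, 30)
```

The transform stretches scores near 0% and 100% so that variance is more uniform across the scale. As a result, RAU values can fall below 0 or above 100, which is consistent with the figure axes starting at -20.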

SPEECH-IN-SPEECH SENTENCE RECOGNITION TASK

Mean RAU score (+/- 1 standard error) by SNR and talker familiarity. Individual mean target recognition RAU score for unfamiliar background talker (x-axis) and familiar background talker (y-axis). Points in the shaded area indicate a familiarity advantage.
