
Participants
Ø  43 American English listeners (Exp. 1, n=24; Exp. 2, n=19)

Stimuli
Talker Training
Ø  15 sentences extracted from Le Petit Prince produced by 3 female American English talkers

Speech-in-Speech Task
Ø  Experiment 1:

•  Target: 60 novel HINT sentences produced by 1 female talker not heard in training

•  Background: 48 different HINT sentences produced by 2 training talkers and 2 novel talkers

Talker familiarity effects on speech-in-speech perception
Angela Cooper & Ann Bradlow
Department of Linguistics, Northwestern University

Previous research
Ø  Talker familiarity can facilitate speech recognition in broadband white noise [1].
Ø  Speech-in-speech perception, which involves teasing apart at least 2 speech streams, is mediated by the similarity of the target and competing speech, including their voice characteristics (e.g. F0, vocal tract length) [2] and linguistic information (e.g. target and masker languages) [3]. However, listeners do not use every target-masker difference for stream segregation (e.g. clear speech) [4].

What influence does talker familiarity have on the processing of attended and unattended speech streams?
Ø  Familiarity with the target talker was found to enhance speech shadowing accuracy; no familiarity effect was found for the background talker [5].

•  The target familiarity effect was only found when listeners were told that the target was someone they knew (i.e. their professor).

The current study
Ø  We examined the effect of familiarity with the target and competing voices on sentence recognition by providing talker training prior to a sentence recognition task in 1-talker babble.

Three Possible Hypotheses
A. If enhancing dissimilarity between target and masker facilitates stream segregation, and if familiarity with a talker’s voice characteristics makes that voice more distinctive, then any situation where the target and background talkers mismatch in familiarity should facilitate recognition accuracy relative to when they are matched for familiarity.

Prediction: Exp. 1 - Better target recognition with familiar background talkers; Exp. 2 - Better recognition with familiar target talkers.

B. A familiar talker may acquire a general processing advantage for the listener and become easier to process in the target but more difficult to ignore in the background.

Prediction: Exp. 1 - Worse target recognition with familiar background talkers; Exp. 2 - Better target recognition for familiar target talkers.

C. Talker familiarity only affects speech recognition after stream segregation.

Prediction: Exp. 1 - No difference in target recognition as a function of background talker familiarity; Exp. 2 - Better target recognition for familiar target talkers.
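All three hypotheses predict the same outcome for Experiment 2, so only Experiment 1 can tell them apart. The sketch below makes that explicit. The labels ("better", "worse", "none") and the dictionary layout are illustrative assumptions, not part of the study; the observed pattern is the one reported in this poster's results.

```python
# Predicted effect of talker familiarity on target recognition.
# exp1: effect of a familiar (vs. unfamiliar) background talker.
# exp2: effect of a familiar (vs. unfamiliar) target talker.
# The labels are hypothetical shorthand for the poster's wording.
PREDICTIONS = {
    "A": {"exp1": "better", "exp2": "better"},
    "B": {"exp1": "worse", "exp2": "better"},
    "C": {"exp1": "none", "exp2": "better"},
}

# Pattern reported in the poster's results: no background familiarity
# effect in Exp. 1, a target familiarity benefit in Exp. 2.
OBSERVED = {"exp1": "none", "exp2": "better"}


def supported_hypotheses(predictions, observed):
    """Return the hypotheses whose predictions match both experiments."""
    return [name for name, pred in predictions.items() if pred == observed]


def discriminating_experiments(predictions):
    """Return the experiments on which the hypotheses disagree."""
    experiments = sorted({exp for pred in predictions.values() for exp in pred})
    return [
        exp
        for exp in experiments
        if len({pred[exp] for pred in predictions.values()}) > 1
    ]
```

Calling `supported_hypotheses(PREDICTIONS, OBSERVED)` picks out Hypothesis C, and `discriminating_experiments(PREDICTIONS)` shows that only Experiment 1 separates the three accounts.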

Ø  Enhanced speech recognition was found for target talkers with whom listeners were implicitly familiar when the competing speech came from an unfamiliar talker.
•  In line with previous work [1,5], even a brief period of familiarization with a talker’s speech patterns can yield significant improvements in extracting linguistic content from a speech signal, even under conditions involving both energetic and informational masking.

Ø  Consistent with [5], familiarity with the speech patterns of the competing talkers had no (facilitative or inhibitory) effect on unfamiliar target recognition.
Ø  This provides support for Hypothesis C: talker familiarity affects processing at a level relevant for phonetic processing but not for stream segregation.
Ø  While the initial segregation of auditory streams may rely on automatic, low-level signal-driven processes, higher-level listener-dependent information may only be utilized after the streams have been separated.
•  At this point, talker-contingent phonetic processing for recognizing words is only relevant in relation to the attended speech stream.

Ø  While listener sensitivity to talker-specificity in the signal facilitates speech recognition, the cues underlying selective attention to a target rather than a background voice are more resistant to subjective, listener-specific influences.

References
[1] Nygaard, L. C. and Pisoni, D. B. (1998). “Talker-specific learning in speech perception,” Perception Psychophys. 60, 355–376.
[2] Darwin, C. J., Brungart, D. S. and Simpson, B. D. (2003). “Effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers,” J. Acoust. Soc. Amer. 114, 2913–2922.
[3] Van Engen, K. J. and Bradlow, A. R. (2007). “Sentence recognition in native- and foreign-language multi-talker background noise,” J. Acoust. Soc. Amer. 121, 519–526.
[4] Calandruccio, L., Van Engen, K. J., Dhar, S. and Bradlow, A. R. (2010). “The effectiveness of clear speech as a masker,” J. Speech Lang. Hear. Res. 53, 1458–1471.
[5] Newman, R. S. and Evers, S. (2007). “The effect of talker familiarity on stream segregation,” J. Phonetics 35, 85–103.
[6] Nilsson, M., Soli, S. D. and Sullivan, J. A. (1994). “Development of the hearing in noise test for the measurement of speech reception thresholds in quiet and in noise,” J. Acoust. Soc. Amer. 95, 1085–1099.

Acknowledgments
We would like to thank Pamela Souza, Chun Liang Chan and Vanessa Dopker for their invaluable research and technical support. This research was supported in part by NIH-NIDCD grant R01-DC005794.

Results

[email protected] http://akcooper.wordpress.com

Methods

Speech-in-Speech Task

Ø  Listeners were instructed to listen to the softer voice and transcribe the target sentence
•  The background talker started 300 ms before the target talker

Ø  2 blocks of 30 trials: Block 1 at -10 dB SNR, Block 2 at -15 dB SNR
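The mixing described above (a negative SNR, with the background leading the target by 300 ms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' stimulus-preparation code: the poster does not say how levels were set, so the sketch assumes whole-signal RMS levels and scales the masker rather than the target. The function names, the list-of-floats signal format, and the default sampling rate are all hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(target, masker, snr_db, fs=16000, lead_ms=300):
    """Mix a target with a masker at a given SNR (dB).

    Assumptions (not stated on the poster): SNR is defined on whole-signal
    RMS, the masker is rescaled while the target is left untouched, and the
    masker starts lead_ms before the target.
    Returns (mixture, scaled_masker).
    """
    lead = int(round(fs * lead_ms / 1000))  # masker head start in samples
    # Choose the gain so that 20*log10(rms(target)/rms(gain*masker)) == snr_db.
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    scaled = [gain * v for v in masker]

    length = max(len(scaled), lead + len(target))
    mixture = [0.0] * length
    for i, v in enumerate(scaled):
        mixture[i] += v
    for i, v in enumerate(target):
        mixture[lead + i] += v  # target is delayed by the lead
    return mixture, scaled
```

At -10 dB SNR the masker's RMS is about 3.2 times the target's, and at -15 dB about 5.6 times, which is why listeners were told to attend to the softer voice.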

Identification Test
Ø  10 Le Petit Prince sentences + 12 Hearing-in-Noise Test (HINT) sentences [6] produced by training talkers

Exp. 1 – Background Familiarity
Ø  Significant main effect of SNR: Listeners had significantly worse sentence recognition scores at -15 dB SNR relative to -10 dB SNR across familiarity conditions (p<.01)
Ø  No significant main effect of background familiarity: Listeners performed similarly when the background talker was familiar or unfamiliar (p>.05)
Ø  No significant background familiarity x SNR interaction (p>.05)

Exp. 2 – Target Familiarity
Ø  Significant main effect of SNR: Listeners had significantly worse sentence recognition scores at -15 dB SNR relative to -10 dB SNR across familiarity conditions (p<.02)
Ø  A significant main effect of target familiarity: Listeners performed significantly better when the target talker was familiar vs. unfamiliar (p<.05)
Ø  No significant background familiarity x SNR interaction (p>.05)

Experiment 1
Target: Unfamiliar ‘X’
Background: Familiar ‘A’, Unfamiliar ‘C’, Familiar ‘B’, Unfamiliar ‘D’

Experiment 2
Target: Familiar ‘A’, Unfamiliar ‘C’, Familiar ‘B’, Unfamiliar ‘D’
Background: Unfamiliar ‘X’

Talker Training (following [1])
Ø  Familiarization: passive listening with talker names provided (45 trials)
Ø  Recognition: talker identification with feedback (45 trials)

Identification Test
Ø  Talker identification without feedback (66 trials)
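The trial counts are consistent with each of the 3 training talkers producing every sentence once per phase: 15 sentences x 3 talkers = 45 trials, and (10 + 12) sentences x 3 talkers = 66 trials. The poster does not state this design explicitly, so the sketch below is an assumption. The talker labels, the function name, and the randomized order are hypothetical.

```python
import random

# Placeholder labels for the 3 training talkers (hypothetical).
TRAINING_TALKERS = ["T1", "T2", "T3"]


def build_trials(n_sentences, talkers=TRAINING_TALKERS, seed=0):
    """Cross every talker with every sentence index and shuffle the order.

    Assumes each talker's recording of each sentence is presented once;
    the seed only makes the illustrative order reproducible.
    """
    trials = [(talker, sentence) for talker in talkers for sentence in range(n_sentences)]
    random.Random(seed).shuffle(trials)
    return trials


familiarization = build_trials(15)        # passive listening phase
recognition = build_trials(15, seed=1)    # identification with feedback
identification = build_trials(10 + 12)    # Le Petit Prince + HINT sentences
```

Under these assumptions the list lengths reproduce the 45-trial training phases and the 66-trial identification test.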

[Figure: Experiment 1 – Unfamiliar Target; Familiar or Unfamiliar Background. Left panel: mean RAU score (+/- 1 standard error) by SNR (-10 dB, -15 dB) and talker familiarity (familiar, unfamiliar). Right panels (-10 dB SNR, -15 dB SNR): individual mean target recognition RAU score for unfamiliar background talker (x-axis) and familiar background talker (y-axis), both axes from -20 to 80. Points in the shaded area indicate a familiarity advantage.]

[Figure: Experiment 2 – Familiar or Unfamiliar Target; Unfamiliar Background. Left panel: RAU by SNR (-10 dB, -15 dB) for familiar and unfamiliar talkers. Right panels (-10 dB SNR, -15 dB SNR): Unfamiliar (RAU) on the x-axis against Familiar (RAU) on the y-axis, both axes from -20 to 80.]

Ø  Experiment 2:

•  Target: 60 novel HINT sentences produced by 2 training talkers and 2 novel talkers

•  Background: 48 different HINT sentences produced by 1 female talker not heard in training

Ø  The entire sentence had to be accurate to be considered correct
Ø  Scores were converted to rationalized arcsine units (RAU) for analysis
Ø  Listeners needed to achieve at least 70% correct on the talker ID test (M=95%) to be included in the analysis
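The scoring steps above can be sketched in a few lines. The poster does not give the exact RAU formula or the transcription-matching rules, so this sketch assumes the standard rationalized arcsine transform and a simple case- and punctuation-insensitive word match. All function names and the example sentences are hypothetical.

```python
import math
import string


def sentence_correct(response, target):
    """All-or-nothing scoring: every word of the target must be transcribed.

    Ignoring case and punctuation is an assumption made for this sketch.
    """
    def normalize(text):
        stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
        return stripped.split()

    return normalize(response) == normalize(target)


def rau(n_correct, n_items):
    """Rationalized arcsine units for n_correct out of n_items.

    Uses the standard transform:
        theta = asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1)))
        RAU   = (146 / pi) * theta - 23
    """
    theta = math.asin(math.sqrt(n_correct / (n_items + 1))) + math.asin(
        math.sqrt((n_correct + 1) / (n_items + 1))
    )
    return (146.0 / math.pi) * theta - 23.0


def include_listener(n_id_correct, n_id_trials=66, criterion=0.70):
    """Inclusion rule: at least 70% correct on the 66-trial talker ID test."""
    return n_id_correct / n_id_trials >= criterion
```

With this transform, scores near 50% stay close to the percent-correct scale (15 of 30 maps to 50 RAU), while scores at floor go below zero (0 of 30 maps to roughly -14.6 RAU). That is consistent with the individual-listener axes in the figures extending to -20.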

SPEECH-IN-SPEECH SENTENCE RECOGNITION TASK

Mean RAU score (+/- 1 standard error) by SNR and talker familiarity.
Individual mean target recognition RAU score for unfamiliar background talker (x-axis) and familiar background talker (y-axis). Points in the shaded area indicate a familiarity advantage.
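The per-listener comparison in these scatterplots amounts to a paired difference between the familiar and unfamiliar conditions. The sketch below assumes that the shaded area is the region above the diagonal, where the familiar-condition score (y-axis) exceeds the unfamiliar-condition score (x-axis). The function name and the example values are hypothetical and are not data from the study.

```python
def familiarity_advantage(unfamiliar_rau, familiar_rau):
    """Paired familiar-minus-unfamiliar differences, one per listener.

    Returns (differences, n_advantage), where n_advantage counts listeners
    whose familiar-condition score is strictly higher, i.e. points above
    the diagonal under the assumption stated in the lead-in.
    """
    if len(unfamiliar_rau) != len(familiar_rau):
        raise ValueError("each listener needs a score in both conditions")
    differences = [f - u for u, f in zip(unfamiliar_rau, familiar_rau)]
    n_advantage = sum(1 for d in differences if d > 0)
    return differences, n_advantage


# Made-up scores for three hypothetical listeners.
example_unfamiliar = [20.0, 35.0, 50.0]
example_familiar = [30.0, 33.0, 50.0]
example_diffs, example_count = familiarity_advantage(example_unfamiliar, example_familiar)
```

In this made-up example only the first listener falls above the diagonal; the third sits exactly on it and is not counted as showing an advantage.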