Page 1: Perception of major acoustic cues

Laboratory for Experimental ORL, KULeuven

Perception of major acoustic cues

Astrid van Wieringen

5th European Master school on Language and Speech

Bonn, 12-16 July 2004

Page 2: Perception of major acoustic cues

Astrid van Wieringen 12-16 July 2004 2

Lab Exp ORL

KULeuven

Content of Tutorial

• To understand why certain speech sounds are not perceived or recognized by hearing-impaired listeners or by automatic speech recognisers, one should understand:

– major categories and acoustic properties of speech sounds

– different types of tests and speech materials

– how to assess transmission of robust spectral and temporal cues by means of analytical (phoneme) tests

– data collection and analyses

• hearing loss (with focus on cochlear implants)

• Practical part:

– Test perception of filtered speech sounds


Speech sounds - major categories

• Vowel and consonant phonemes are classified in terms of:

• Manner of articulation

– concerns how the vocal tract restricts airflow

• complete stopping of the airflow by an occlusion creates a plosive (stop consonant)

• vocal tract constrictions of varying degree occur in liquids, fricatives, glides and vowels

• lowering the velum causes nasal sounds

• Place of articulation

– refers to the location of the constriction in the vocal tract

• Voicing

– presence or absence of vocal fold vibration


Manner of articulation of most consonants

• Stop consonants (plosives): complete closure and subsequent release of a vocal tract obstruction. Pressure build-up is followed by a burst.

• Liquids: like vowels, but the tongue is used for some degree of obstruction. For /l/, air escapes around the sides of the tongue tip or dorsum. The /r/ has a more variable articulation.

• Nasals: the velum is lowered and air flows out through the nostrils. In English only consonants are phonemically nasal (oral tract completely closed); French also has nasal vowels (air escapes through both the oral tract and the nasal cavity). Vowels may be nasalized in English, but the distinction is not phonemic (= vowel identity does not change). In French there are pairs of vowels that differ only in the presence or absence of nasalization.

• Fricatives: narrow constriction in the oral tract (in some languages also in the pharynx or at the glottis). If the pressure behind the constriction is high enough and the passage sufficiently narrow, the airflow becomes fast enough to generate turbulence at the end of the constriction.

• Strident fricatives: the noise amplitude is enhanced by the airflow striking a surface (e.g. /ʃ/ in ‘shy’)

• Affricate = stop + fricative: /dʒ/ (‘gin’)


Place of articulation (varies per language)


Place of articulation

• Labials

– bilabial: if both lips constrict

– labiodental: if the lower lip contacts the upper teeth

• Dental: the tongue tip or blade touches the edge or back of the upper teeth

– interdental: if the tip protrudes between the upper and lower teeth (as in ‘the’)

• Alveolar: the tongue tip or blade touches the alveolar ridge

• Palatals: the tongue blade or dorsum constricts with the hard palate

– retroflex: if the tongue tip curls up

• Velar: the dorsum approaches the soft palate

• Uvular: the dorsum approaches the uvula

• Pharyngeal: constriction in the pharynx

• Glottal: vocal folds close or constrict


Dutch vowel triangle

[Figure: second formant frequency (Hz, 0-3000) plotted against first formant frequency (Hz, axis running from 1200 to 200) for the Dutch vowels a, aa, e, ee, i, I, o, oo, u, y, yy, produced by speakers WD (female), MD (male), JW (male) and AG (female)]


Major acoustic cues of stop consonants

• /p, t, k, b, d, g/

• Phonetic features:

– manner: stop (plosive)

– place: bilabial, alveolar, velar

• Acoustic cues:

– silence: corresponds to the period of oral constriction (= stop gap). In voiced stops the gap contains low-frequency energy from vocal fold vibration, the so-called voice bar

– burst: corresponds to the articulatory release of the oral constriction and to the aerodynamic release of the pressure built up behind it. Bursts occur in initial and medial position but are rarely found in final position. Place of articulation may be signaled by the spectrum of the burst, but the transition is also very important

– transition: corresponds to the articulatory movement from the oral constriction of the stop to the more open vocal tract of the following sound (usually a vowel). Transitions are easier to identify for voiced than for voiceless sounds

• Most important features:

– stop gap

– release burst

– presence/absence of voice onset time

– transition

– voicing features


Duration of stop consonants

• Stop gap: 50-100 ms

• Burst: 5-40 ms (a ‘transient’ that disappears immediately; the shortest event in speech!)

• CV (consonant-vowel) and VC (vowel-consonant) transitions: 10-40 ms. They reflect changes in the vocal tract. Such short events are very difficult to measure and analyze, yet they are perceptually very important!
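The stop gap and burst durations above can be estimated from a waveform by tracking short-time energy: the gap shows up as a run of low-energy frames and the burst as the first energetic frame after it. The sketch below is a minimal illustration, not the lab's analysis procedure; it assumes a clean single /aCa/ token, 5 ms frames and a hypothetical silence threshold of -30 dB relative to the loudest frame, and the function names are illustrative. It is demonstrated on a synthetic token; a voiced stop with a strong voice bar would need a different threshold.

```python
import math
import random


def short_time_rms(x, fs, frame_ms=5.0):
    """RMS amplitude of consecutive, non-overlapping frames of frame_ms milliseconds."""
    n = int(fs * frame_ms / 1000)
    return [math.sqrt(sum(v * v for v in x[i:i + n]) / n)
            for i in range(0, len(x) - n + 1, n)]


def stop_gap_and_burst(x, fs, frame_ms=5.0, silence_db=-30.0):
    """Return (gap_ms, burst_onset_ms) for one /aCa/ token (hypothetical helper).

    The stop gap is taken to be the longest run of frames more than `silence_db`
    below the loudest frame (an assumed threshold); the burst onset is the start
    of the first frame after that run.
    """
    rms = short_time_rms(x, fs, frame_ms)
    peak = max(rms)
    silent = [20 * math.log10(r / peak + 1e-12) < silence_db for r in rms]

    best_start, best_len, run_start = 0, 0, None
    for i, is_silent in enumerate(silent + [False]):  # sentinel closes a final run
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None

    return best_len * frame_ms, (best_start + best_len) * frame_ms


# Synthetic /aCa/: 200 ms "vowel", 80 ms stop gap, 10 ms noise burst, 200 ms "vowel".
fs = 16000
rng = random.Random(0)


def samples(ms):
    return fs * ms // 1000


vowel = [0.3 * math.sin(2 * math.pi * 220 * n / fs) for n in range(samples(200))]
gap = [0.0] * samples(80)
burst = [rng.gauss(0.0, 0.1) for _ in range(samples(10))]
token = vowel + gap + burst + vowel

gap_ms, burst_ms = stop_gap_and_burst(token, fs)
print(gap_ms, burst_ms)  # 80.0 280.0
```

The 80 ms gap recovered here falls inside the 50-100 ms range quoted above; with real recordings, background noise and voice bars make the threshold choice the critical design decision.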


Time signals of Dutch plosives

[Figure: waveforms of /aba/, /apa/, /ada/, /ata/ and /aka/ (amplitude vs. time, about 0.7 s per token)]


Initial part of Dutch plosives

[Figure: waveforms of the onset of /b/ in /aba/, /d/ in /ada/, /t/ in /ata/ and /p/ in /apa/ (segments of about 70-110 ms)]


Spectrogram of Dutch plosives

[Figure: spectrograms (0-10 kHz) of /aba/, /apa/, /ada/, /ata/ and /aka/]


Fricatives

• Phonemes:

– voiced: /, , , /

– voiceless: /, , , , , /

• Phonetic features:

– manner: frication

– place: labiodental, linguadental, alveolar, palatal, glottal

• Acoustic cues:

– voicing

– frication noise: noise generated as air is forced through a narrow constriction and then filtered by the vocal tract

– transitions to and from the vowels, due to changes in the vocal tract

– sibilants/stridents: intense noise energy

– non-sibilants: weak noise energy
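Because the sibilant/non-sibilant contrast is essentially one of noise intensity, a simple measure is the level of the frication relative to the neighbouring vowel. The sketch below assumes the frication and vowel portions have already been segmented; the -20 dB boundary used to label a token as sibilant is a hypothetical value for illustration, not one given in this tutorial, and the signals are synthetic stand-ins.

```python
import math
import random


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def relative_level_db(frication, vowel):
    """Level of the frication noise in dB relative to the neighbouring vowel."""
    return 20 * math.log10(rms(frication) / rms(vowel))


def is_sibilant(frication, vowel, boundary_db=-20.0):
    """Label as sibilant if the noise is within boundary_db of the vowel level (assumed boundary)."""
    return relative_level_db(frication, vowel) > boundary_db


fs = 16000
rng = random.Random(1)
vowel = [0.3 * math.sin(2 * math.pi * 220 * n / fs) for n in range(fs // 10)]
strong_noise = [rng.gauss(0.0, 0.1) for _ in range(fs // 10)]   # /s/-like: intense noise
weak_noise = [rng.gauss(0.0, 0.01) for _ in range(fs // 10)]    # /f/-like: weak noise

for label, noise in (("strong", strong_noise), ("weak", weak_noise)):
    print(label, round(relative_level_db(noise, vowel), 1), is_sibilant(noise, vowel))
```

A relative measure is used rather than an absolute level because the recordings are RMS-equalized per token, so only the balance between noise and vowel carries the intensity cue.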


Spectrograms of a few Dutch fricatives

[Figure: spectrograms (0-10 kHz) of /afa/, /ava/, /asa/ and /aza/]


Nasals

• Phonemes: /m, n, ŋ/

• Phonetic features:

– manner: nasal

– place: bilabial, alveolar, velar

• Acoustic features:

– murmur: results from radiation of acoustic energy through the nose. Its spectrum is dominated by low-frequency energy (< 500 Hz). The murmurs of the three nasals are not exactly alike, but the differences are difficult to use as a distinctive cue

– transitions: preceding and following vowels are nasalized; the transitions provide cues to place of articulation

– voicing is always present (except during whispering)

• Spectrum of nasals reflects a combination of formants and antiformants
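The low-frequency dominance of the murmur can be quantified as the share of spectral energy below 500 Hz. The sketch below uses a naive DFT in pure Python for transparency (a real analysis would use an FFT library and a windowed frame); the 500 Hz split comes from the slide, while the function names and the synthetic "murmur-like" and "fricative-like" signals are illustrative assumptions. Test frequencies are chosen to fall exactly on DFT bins so that spectral leakage does not blur the result.

```python
import cmath
import math


def power_spectrum(x):
    """Power of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for short frames)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def low_frequency_fraction(x, fs, split_hz=500.0):
    """Fraction of spectral energy (excluding DC) in bins below split_hz."""
    spec = power_spectrum(x)
    n = len(x)
    low = sum(p for k, p in enumerate(spec) if 0 < k * fs / n < split_hz)
    return low / sum(spec[1:])


fs = 8000
n = 400  # 50 ms frame, 20 Hz bin spacing
murmur_like = [math.sin(2 * math.pi * 260 * t / fs) + 0.1 * math.sin(2 * math.pi * 2500 * t / fs)
               for t in range(n)]
fricative_like = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]

print(round(low_frequency_fraction(murmur_like, fs), 3))
print(round(low_frequency_fraction(fricative_like, fs), 3))
```

Because the formants and antiformants of real nasals overlap with those of the neighbouring vowels, this ratio is best read as a manner cue (nasal vs. non-nasal) rather than as a place cue, in line with the remark that murmur differences are hard to use distinctively.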


Spectrograms of a few Dutch nasals

[Figure: spectrograms (0-6000 Hz) of /ama/, /ana/ and /anga/]


Glides

• also called ‘approximants’ or semivowels:

– gradual articulatory movement

– vocal tract narrowed, not closed

• Phonemes: /j/ and /w/

• Phonetic features

– Manner: glide or semivowel

– Place: palatal or labiovelar

• Acoustic cues

– a relatively slow transition (75-150 ms)

– F1 of both glides starts at a very low value (a little higher than for stops)

– F2 of /w/: 800 Hz (compare with /b/!); F3 of /w/: 2200 Hz

– F2 of /j/: 2200 Hz (compare with /d/!); F3 of /j/: 3000 Hz

• longer glides become vowel-vowel sequences:

– [bi] - [wi] - [ui] and

– [du] - [ju] - [iu]
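The transition durations quoted for stops (10-40 ms) and glides (75-150 ms) suggest a simple duration check on a formant track. This is a minimal sketch, assuming F2 has already been tracked at 5 ms steps and that the transition ends when F2 first comes within a hypothetical 50 Hz of its final (vowel) value; the function names, the synthetic tracks and the "ambiguous" zone between 40 and 75 ms are illustrative choices, not part of the tutorial.

```python
def transition_duration_ms(f2_track, step_ms=5.0, tolerance_hz=50.0):
    """Time from the start of the track until F2 first comes within tolerance of its final value."""
    target = f2_track[-1]
    for i, f in enumerate(f2_track):
        if abs(f - target) <= tolerance_hz:
            return i * step_ms
    return len(f2_track) * step_ms


def manner_from_transition(duration_ms):
    """Rough label using the duration ranges quoted on the stop and glide slides."""
    if duration_ms <= 40:
        return "stop-like"
    if 75 <= duration_ms <= 150:
        return "glide-like"
    return "ambiguous"


def linear_track(start_hz, end_hz, transition_frames, steady_frames=20):
    """Hypothetical F2 track: linear transition followed by a vowel steady state."""
    ramp = [start_hz + (end_hz - start_hz) * i / transition_frames for i in range(transition_frames)]
    return ramp + [end_hz] * steady_frames


ba = linear_track(800, 1200, transition_frames=6)    # 30 ms rise from an assumed /b/-like F2
wa = linear_track(800, 1200, transition_frames=20)   # 100 ms rise from the /w/ F2 of 800 Hz

for name, track in (("ba", ba), ("wa", wa)):
    d = transition_duration_ms(track)
    print(name, d, manner_from_transition(d))
```

The tolerance makes the measured duration slightly shorter than the synthetic ramp (90 ms rather than 100 ms for the /w/-like track), which is one reason such boundaries are fuzzy in perception experiments as well.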


Spectrograms of 2 Dutch glides

[Figure: spectrograms (0-5000 Hz) of /w/ from /awa/ and /j/ from /aja/]


Liquids

• Phonemes: /l/ & /r/

• Phonetic features:

– Manner: lateral or rhotic

– Place: alveolar for /l/, palatal for /r/

• Acoustic cues (rather complex):

– both have relatively fast formant transitions

– similarity with glides: well-defined formant structure (less constriction than stops, fricatives and affricates)

– /l/: energy mainly in the low frequencies. Resonances and antiresonances due to the divided vocal tract. Resembles /n/. F1: 360 Hz, F2: 1300 Hz, F3: 2700 Hz

– /r/: F1 similar to /l/

• F2 somewhat lower than for /l/

• F3 especially lower (1650 Hz). Formant transitions last somewhat longer for /r/ than for /l/

– temporal cues:

• /r/: F1 has a short steady-state + relatively long transition

• /l/: F1 has a long steady-state + relatively short transition
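Since F1 is similar for both liquids, F3 (about 2700 Hz for /l/ versus 1650 Hz for /r/ on this slide) is the clearest spectral difference. A minimal nearest-prototype sketch using only those two values follows; real F3 values vary with speaker and vowel context, so in practice the prototypes would need to be estimated per speaker, and the function name is hypothetical.

```python
F3_PROTOTYPES_HZ = {"l": 2700.0, "r": 1650.0}  # F3 values quoted on the liquids slide


def classify_liquid(f3_hz):
    """Return the liquid whose prototype F3 is closest to the measured F3."""
    return min(F3_PROTOTYPES_HZ, key=lambda p: abs(f3_hz - F3_PROTOTYPES_HZ[p]))


for f3 in (2600.0, 1800.0, 2100.0):
    print(f3, classify_liquid(f3))
```

With two prototypes the decision boundary is simply their midpoint (2175 Hz); the temporal cues listed above (steady-state vs. transition length of F1) could be added as a second dimension.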


Spectrograms of 2 Dutch liquids

[Figure: spectrograms (0-5000 Hz) of /l/ from /ala/ and /r/ from /ara/]

• no clear distinction between vowel and consonant

• F3 of /r/ lower


Speech perception assessment for the hearing-impaired


Speech perception assessment

• required for diagnostic purposes

• monitoring progress in a rehabilitation programme

• comparison of different speech processing strategies (hearing aids and cochlear implants)

• understanding the “limited” technology, i.e. the number of channels available to hearing-impaired listeners and implantees

– hearing aid: speech is divided into frequency bands and acoustically enhanced

– cochlear implant: sound is picked up by a microphone, analyzed into frequency bands, coded and sent to a limited number of electrode pairs in the inner ear (electrical stimulation)


How a cochlear implant works... (MedEL)

• (1) Sounds are picked up by a microphone and turned into an electrical signal.

(2) This signal goes to the speech processor, where it is "coded" (turned into a special pattern of electrical pulses).

(3) These pulses are sent to the coil and are then transmitted across the intact skin (by radio waves) to the implant.

(4) The implant sends a pattern of electrical pulses to the electrodes in the cochlea.

(5) The auditory nerve picks up these tiny electrical pulses and sends them to the brain.

(6) The brain recognizes these signals as sound.


Introduction to cochlear implants, Philipos C. Loizou. Tutorial article on cochlear implants that appeared in the IEEE Signal Processing Magazine, pages 101-130, September 1998.


Figure of electrode array in the cochlea...

• It is necessary to ‘map’ (fit) acoustic information to electrical information.
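In a fitting, each electrode typically receives a threshold level and a most comfortable level (often called T and C, or THR and MCL), and the processor compresses the acoustic envelope into that electrical range. The sketch below assumes a hypothetical 30 dB acoustic input range mapped linearly in dB onto the electrode's T-C range, in arbitrary current units; real devices use manufacturer-specific compression functions, units and parameters, and the function name is illustrative.

```python
import math


def map_envelope(envelope, t_level, c_level, floor_db=-30.0):
    """Map an envelope amplitude (0..1 full scale) to a stimulation level.

    Envelopes more than -floor_db below full scale produce no stimulation (None);
    between floor_db and 0 dB the level is mapped linearly in dB onto the
    electrode's dynamic range [t_level, c_level]. All units are arbitrary here.
    """
    if envelope <= 0:
        return None
    level_db = 20 * math.log10(min(envelope, 1.0))
    if level_db < floor_db:
        return None
    fraction = (level_db - floor_db) / -floor_db
    return t_level + fraction * (c_level - t_level)


# Hypothetical map for one electrode: T = 100, C = 180 (arbitrary current units).
for env in (1.0, 0.1, 0.01):
    print(env, map_envelope(env, t_level=100, c_level=180))
```

Working in dB is the design choice that matters here: it squeezes a wide acoustic dynamic range into the narrow electrical range between threshold and comfort.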


Top: output of the CIS algorithm for the word ‘som’. The pulse channels reflect the envelopes of the band-pass filter outputs.

[Figure: pulse amplitude per channel (channels 1-8) versus time (ms, 100-900)]
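The CIS (continuous interleaved sampling) processing summarized in the caption, band-pass filtering followed by envelope extraction per channel, can be sketched in a few lines. This minimal version is pure Python and assumes second-order band-pass filters built from the "Audio EQ Cookbook" formulas, full-wave rectification followed by a one-pole low-pass smoother at 200 Hz, and four illustrative centre frequencies; it is not the filter bank of any particular device and it omits the pulse generation and compression stages.

```python
import math


def bandpass(x, fs, f0, q=2.0):
    """Second-order band-pass filter (Audio EQ Cookbook form, 0 dB peak gain), direct form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter type
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        out = b0 * v + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, v
        y2, y1 = y1, out
        y.append(out)
    return y


def envelope(x, fs, cutoff_hz=200.0):
    """Full-wave rectification followed by a one-pole low-pass smoother."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for v in x:
        state += a * (abs(v) - state)
        env.append(state)
    return env


def cis_envelopes(x, fs, centre_freqs):
    """One envelope per channel; in a CIS processor these modulate interleaved pulse trains."""
    return [envelope(bandpass(x, fs, fc), fs) for fc in centre_freqs]


fs = 16000
centres = [500, 1000, 2000, 4000]  # hypothetical 4-channel layout
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]  # 100 ms, 1 kHz
envs = cis_envelopes(tone, fs, centres)
steady = [sum(e[-800:]) / 800 for e in envs]  # mean envelope over the last 50 ms
print([round(s, 3) for s in steady])  # the 1000 Hz channel carries the largest envelope
```

The envelope cutoff trades temporal detail against ripple: a higher cutoff preserves more of the fast amplitude changes of bursts and transitions, which is exactly the kind of cue the analytical tests try to probe.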


Transmission of AMA & ASA by a CI device


Analytical tests: purpose and performance

• Many types of speech tests are used to evaluate CI performance:

– detection of environmental sounds

– identification of male/female voice

– identification of vowels and consonants (V & C) in nonsense context

– words

– sentences

• Each type of test yields a different level of performance.

• Why is a carefully balanced V & C test important?

– /paat/, /pit/, /poot/, etc., or /apa/, /ara/, /ana/, etc.

– it gives important information on the transmission of speech features via the implant or hearing aid (e.g. voiced vs. voiceless, nasality of /m/ or the high-frequency frication/turbulence of /s/)

• analytical: no contextual information

• therefore, the information can guide the fitting of an implant


Choice of test depends on objectives, BUT

• speech stimuli should be

– carefully pronounced and, if possible, adjusted to the same RMS level (so that other cues are kept under control)

– presented via the hard disc of a PC, CD or tape (recorded at the highest level of quality)

– administered to the subject in a quiet room (if presented acoustically)

– presented a sufficient number of times to obtain a reliable score

– Note: an analytical test does not replace other tests, but it measures speech perception based on auditory information alone. It can be used for several languages.

• At the Lab. Exp. ORL, recordings were made of Dutch vowels and consonants in different contexts. These were carefully selected from different tokens, segmented (with an additional Hamming window to avoid on- and offset clicks), equalized in RMS (root mean square) level and partly analyzed with regard to their main spectral and temporal properties.

– /aCa/: /p, t, k, b, d, r, l, m, n, s, f, z, v, w, j/

– /pVt/: /oe, ie, i, oo, o, ee, e, u, aa, a/

• All speech sounds were analyzed (frequency, duration, energy, …)
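The stimulus preparation described above, windowing the on- and offsets to avoid clicks and then equalizing RMS, can be sketched as follows. The 10 ms ramp length, the use of the two halves of a Hamming window as rising and falling ramps, and the target RMS value are assumptions for illustration; the tutorial does not specify them, and the token is a synthetic stand-in for a recording.

```python
import math


def hamming_ramps(x, fs, ramp_ms=10.0):
    """Multiply the first and last ramp_ms of x by the rising/falling halves of a Hamming window."""
    n = int(fs * ramp_ms / 1000)
    if 2 * n > len(x):
        raise ValueError("signal shorter than the two ramps")
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (2 * n - 1)) for i in range(2 * n)]
    y = list(x)
    for i in range(n):
        y[i] *= window[i]                    # rising half at the onset
        y[len(y) - n + i] *= window[n + i]   # falling half at the offset
    return y


def equalize_rms(x, target_rms=0.05):
    """Scale x so that its RMS equals target_rms (assumed common level for all tokens)."""
    current = math.sqrt(sum(v * v for v in x) / len(x))
    return [v * target_rms / current for v in x]


fs = 16000
token = [0.4 * math.sin(2 * math.pi * 300 * n / fs) for n in range(fs // 2)]  # stand-in for /aCa/
prepared = equalize_rms(hamming_ramps(token, fs))
rms_out = math.sqrt(sum(v * v for v in prepared) / len(prepared))
print(round(rms_out, 6))
```

Note that a Hamming window does not reach zero (its end points are 0.08), so the ramps soften rather than fully silence the edges; a Hann window would be the alternative if a true zero at the boundaries were required.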


Confusion matrix

Stimulus (rows) by response (columns):

       aPa  aBa  aMa  aKa  aZa  aSa  Total
apa      4    4    4    0    0    0     12
aba      1    5    5    0    0    1     12
ama      0    0   12    0    0    0     12
aka      0    0    0   12    0    0     12
aza      0    0    0    0    9    3     12
asa      0    0    0    0    3    9     12

In this example, consonant identification is 71% (51/72). Note that this score should always be considered together with the chance performance of the closed-set test (here 17%, i.e. 1/6). Here a score of 71% is clearly significantly above chance (p < 0.05).
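The scores above follow directly from the matrix: correct responses lie on the diagonal, and chance for a six-alternative closed set is 1/6. The sketch below recomputes them and adds an exact one-sided binomial test; that test is one common way to check a score against chance and is an assumption here, since the slide does not say which test was used.

```python
from math import comb

STIMULI = ["apa", "aba", "ama", "aka", "aza", "asa"]
# Rows: stimulus presented; columns: response given (same order as STIMULI).
CONFUSIONS = [
    [4, 4, 4, 0, 0, 0],
    [1, 5, 5, 0, 0, 1],
    [0, 0, 12, 0, 0, 0],
    [0, 0, 0, 12, 0, 0],
    [0, 0, 0, 0, 9, 3],
    [0, 0, 0, 0, 3, 9],
]

correct = sum(CONFUSIONS[i][i] for i in range(len(STIMULI)))
total = sum(map(sum, CONFUSIONS))
chance = 1 / len(STIMULI)


def binomial_p_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): probability of scoring k or more by guessing."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"{correct}/{total} = {100 * correct / total:.0f}% correct, chance = {100 * chance:.0f}%")
# 51/72 = 71% correct, chance = 17%
print(f"p(score >= {correct} by guessing) = {binomial_p_at_least(correct, total, chance):.1e}")
```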

• The distribution of errors is even more interesting:

– errors are not random

– they can be quantified by means of an information transmission analysis (Miller and Nicely, 1955)
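In the Miller and Nicely (1955) approach, a feature's transmission is expressed as the mutual information between stimulus and response categories divided by the stimulus entropy. A minimal sketch applying this to the voicing feature of the example matrix follows; grouping /p, k, s/ as voiceless and /b, m, z/ as voiced is the usual phonetic classification, while the function names are hypothetical.

```python
import math

STIMULI = ["apa", "aba", "ama", "aka", "aza", "asa"]
CONFUSIONS = [  # rows: stimulus, columns: response
    [4, 4, 4, 0, 0, 0],
    [1, 5, 5, 0, 0, 1],
    [0, 0, 12, 0, 0, 0],
    [0, 0, 0, 12, 0, 0],
    [0, 0, 0, 0, 9, 3],
    [0, 0, 0, 0, 3, 9],
]
VOICING = {"apa": "voiceless", "aka": "voiceless", "asa": "voiceless",
           "aba": "voiced", "ama": "voiced", "aza": "voiced"}


def collapse(matrix, labels, feature):
    """Pool a phoneme confusion matrix into a feature-by-feature matrix."""
    values = sorted(set(feature.values()))
    index = {v: i for i, v in enumerate(values)}
    pooled = [[0] * len(values) for _ in values]
    for i, stim in enumerate(labels):
        for j, resp in enumerate(labels):
            pooled[index[feature[stim]]][index[feature[resp]]] += matrix[i][j]
    return pooled


def relative_transmitted_information(matrix):
    """T(x;y) / H(x): transmitted information relative to the stimulus entropy."""
    n = sum(map(sum, matrix))
    row = [sum(r) / n for r in matrix]
    col = [sum(c) / n for c in zip(*matrix)]
    t = 0.0
    for i, r in enumerate(matrix):
        for j, count in enumerate(r):
            if count:
                p = count / n
                t += p * math.log2(p / (row[i] * col[j]))
    h_x = -sum(p * math.log2(p) for p in row if p > 0)
    return t / h_x


voicing = collapse(CONFUSIONS, STIMULI, VOICING)
print(voicing)  # [[31, 5], [11, 25]]  (rows/columns: voiced, voiceless)
print(f"voicing: {relative_transmitted_information(voicing):.2f}")
print(f"all consonants: {relative_transmitted_information(CONFUSIONS):.2f}")
```

Other features (manner, place) are handled the same way by supplying a different grouping dictionary, which is what makes the error pattern, rather than the overall score, informative for fitting.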


Effect of filtering

• Loss of auditory information can be examined in normal-hearing persons by filtering away acoustic information, i.e. allowing certain frequencies to be transmitted while attenuating others:

– a high-pass filter allows all components above a cutoff frequency to be transmitted

– a band-pass filter allows frequencies within a certain band to pass

– a low-pass filter allows all components below a cutoff frequency to be transmitted

• Demonstration of loss of acoustical cues!
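Filtered-speech conditions like these are straightforward to generate. The sketch below assumes second-order Butterworth-type low-pass and high-pass biquads from the "Audio EQ Cookbook" formulas (Q = 1/√2) and an illustrative 1000 Hz cutoff; it is checked on pure tones rather than speech. Steeper filters would be needed to remove a band more completely, and a band-pass condition can be obtained by cascading a high-pass and a low-pass.

```python
import math


def biquad(x, b, a):
    """Direct form I second-order filter; b and a are already normalised by a0."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        out = b[0] * v + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, v
        y2, y1 = y1, out
        y.append(out)
    return y


def design(kind, fc, fs, q=1 / math.sqrt(2)):
    """Low-pass or high-pass biquad coefficients (Audio EQ Cookbook formulas)."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError(f"unknown filter type: {kind}")
    a0 = 1 + alpha
    return [c / a0 for c in b], [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


fs = 16000
n = fs // 10  # 100 ms test tones
tones = {
    "200 Hz": [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)],
    "4000 Hz": [math.sin(2 * math.pi * 4000 * t / fs) for t in range(n)],
}

gains = {}
for kind in ("lowpass", "highpass"):
    b, a = design(kind, 1000, fs)
    for name, tone in tones.items():
        gains[(kind, name)] = rms(biquad(tone, b, a)) / rms(tone)
        print(kind, name, round(gains[(kind, name)], 3))
```

Applied to the /aCa/ tokens, a low-pass condition like this removes most of the high-frequency frication of /s/, while a high-pass condition removes the low-frequency murmur and voice-bar energy, which is the loss of cues the demonstration is meant to make audible.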

