Page 1

USC Linguistics

Resonance tuning in soprano singing and vocal tract shaping: Comparison of sung and spoken vowels

2pSC29

Shrikanth Narayanan*^, Erik Bresch*, Stephen Tobin^, Dani Byrd^, Krishna Nayak*, Jon Nielsen*

*USC Viterbi School of Engineering ^USC Department of Linguistics

Supported by NIH. Our thanks to the USC Imaging Science Center.

Page 2

USC Speech Articulation and kNowledge (SPAN) Group sail.usc.edu/span

Background: Singing Acoustics

• Signal characteristics of the singing voice
  • J. Sundberg, “The Acoustics of the Singing Voice,” Scientific American 236, 1977.
• Measure vocal tract resonances with external excitation
• Show tuning of F1 to F0 for softly sung vowels
  • E. Joliveau, J. Smith, and J. Wolfe, “Vocal tract resonances in singing: The soprano voice,” JASA 116, Oct. 2004.
• Estimating formants at high pitch from audio waveform is problematic
  • H. Traunmueller, A. Eriksson, “A method of measuring formant frequencies at high fundamental frequencies,” Proc. EuroSpeech’97, Vol. 1:477-480.

Page 3

Problem statement

• Long-term project goal: Investigate the relation between vocal tract shaping and source control in sung and spoken productions
• Specific focus: Soprano challenge
  • Investigate vocal tract shaping for different vowels with increasing pitch

Page 4

Data collection

• Subject in MRI scanner in supine position for approx. 60 min
• Soprano, trained Western opera singer
  • Sang various 30 s pieces
  • Spoke utterances “la”, “le”, “li”, “lo”, “lu” (3 realizations each)
  • Sang two-octave B-flat major scales on “la”, “le”, “li”, “lo”, “lu” (one realization each)
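The nominal pitches of the sung scale can be reproduced with a short calculation. This is a minimal sketch, assuming equal temperament, A4 = 440 Hz, and a starting note of B-flat 3; none of these are stated on the slides, and the function name `scale_f0` is hypothetical. Under these assumptions, notes 1, 5, 11, and 15 come out at the F0 values quoted on the later slides.

```python
# Nominal F0 of each note in a two-octave B-flat major scale.
# Assumptions (not from the slides): equal temperament, A4 = 440 Hz,
# and a starting note of B-flat 3 (11 semitones below A4).
A4_HZ = 440.0
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # semitone steps of a major scale


def scale_f0(start_semitones_from_a4=-11, octaves=2):
    """Return nominal F0 values (Hz) for an ascending major scale."""
    offsets = [0]
    for _ in range(octaves):
        for step in MAJOR_STEPS:
            offsets.append(offsets[-1] + step)
    return [A4_HZ * 2 ** ((start_semitones_from_a4 + k) / 12) for k in offsets]


notes = scale_f0()
for number in (1, 5, 11, 15):
    print(f"note {number}: {notes[number - 1]:.0f} Hz")
```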

Page 5

Real-time MR imaging

• GE 1.5T scanner
• Custom head/neck receiver coil
• RTHawk software
  • Santos et al., Proc. IEEE EMBS, 26th Annual Meeting
• 13-interleaf spiral pulse sequence
• TR = 6.5 ms
• True frame rate 11 fps
• Sliding window reconstruction 22 fps
• Slice thickness approx. 5 mm, mid-sagittal plane
• Resolution approx. 3 mm/pixel
• Resulting image size 68x68 pixels
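The frame rates follow from the sequence timing: a full image needs all 13 interleaves, each taking one TR. The sketch below computes this and shows how a sliding window reuses interleaves to produce frames more often. The window step of 7 interleaves is an assumption chosen because it gives roughly 22 fps; the slides do not state the step, and the function names are hypothetical.

```python
TR_S = 6.5e-3      # repetition time from the slide
INTERLEAVES = 13   # interleaves per full image, from the slide


def frame_rate_hz(tr_s, interleaves_per_step):
    """Frames per second when a new frame is emitted every N interleaves."""
    return 1.0 / (tr_s * interleaves_per_step)


def sliding_windows(num_acquired, window=INTERLEAVES, step=7):
    """Index ranges (start, stop) of the interleaves used for each frame.

    step=7 is an assumed value, not taken from the slides.
    """
    return [(s, s + window) for s in range(0, num_acquired - window + 1, step)]


print(f"full-frame rate: {frame_rate_hz(TR_S, INTERLEAVES):.1f} fps")
print(f"sliding-window rate (assumed step 7): {frame_rate_hz(TR_S, 7):.1f} fps")
print(sliding_windows(27))
```

Consecutive sliding-window frames share interleaves, so the higher rate smooths the display without adding independent temporal information.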

Page 6

Synchronized audio acquisition

• Phone-OR optical microphone
• Laptop with National Instruments 16-bit DAQ card
• Sampling rate 100 kHz (5x oversampling)
• Custom FPGA-based sync hardware

Page 7

Synchronized audio acquisition

• Offline gradient noise cancellation
  • Employs adaptive FIR filter and normalized LMS algorithm
  • Achieves approx. 30 dB SNR improvement
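For readers unfamiliar with the technique, a generic normalized LMS noise canceller can be sketched as follows. This is a textbook formulation, not the group's implementation: it assumes a separate noise reference signal is available, and the function name `nlms_cancel`, the tap count, and the step size are illustrative choices. How the reference is obtained for MRI gradient noise is not described on the slides.

```python
import numpy as np


def nlms_cancel(primary, reference, num_taps=8, mu=0.5, eps=1e-8):
    """Subtract the part of `primary` that is predictable from `reference`.

    primary   : noisy recording (speech + noise), 1-D array
    reference : noise reference correlated with the noise in `primary`
    Returns the error signal, which is the cleaned-up estimate.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)                       # adaptive FIR coefficients
    out = primary.copy()                         # first samples pass through
    for n in range(num_taps - 1, len(primary)):
        x = reference[n - num_taps + 1:n + 1][::-1]   # newest sample first
        e = primary[n] - w @ x                   # remove estimated noise
        w += mu * e * x / (eps + x @ x)          # normalized LMS update
        out[n] = e
    return out
```

The normalization by the reference energy `x @ x` keeps the update stable when the noise level varies, which is the main reason to prefer it over plain LMS.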

Page 8

Image analysis

• Manual tracking of MR images
  • Vocal tract outline for each individual frame
  • From larynx to lips
• Computation of midline
  • Finding start and end point at larynx and lips
  • Repeated recursive bisection
  • Smoothing spline fit

Page 9

Image analysis

• Find aperture cross sections perpendicular to smooth midline
• Computation of final midline
  • Along midpoints of cross sections
  • Coordinate system based on midline
  • Coordinate origin to be anchored in the future to an anatomical landmark, currently above epiglottis

Page 10

Image analysis

• Final result: Aperture function with midline-based coordinate system

[Figure: aperture function. Horizontal axis labeled front and back; vertical axis: constriction degree (aperture).]
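Given such an aperture function, the summary measures reported in the results (vocal tract length, minimum aperture value and location) reduce to a few array operations. The sketch below is an illustration under assumptions: the midline is taken to be an ordered array of 2-D points from larynx to lips, with one aperture value per point, and the function names are hypothetical rather than the authors' code.

```python
import numpy as np


def arc_length(midline_xy):
    """Cumulative distance along the midline, starting at 0 at the first point."""
    segments = np.linalg.norm(np.diff(midline_xy, axis=0), axis=1)
    return np.concatenate(([0.0], np.cumsum(segments)))


def summarize_aperture(midline_xy, aperture):
    """Tract length plus value and midline position of the tightest constriction."""
    s = arc_length(np.asarray(midline_xy, dtype=float))
    i = int(np.argmin(aperture))
    return {
        "tract_length": float(s[-1]),
        "min_aperture": float(aperture[i]),
        "min_location": float(s[i]),
    }
```

Positions are expressed as distance along the midline, which matches the midline-based coordinate system described above; units follow whatever units the input points use.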

Page 11

Acoustic analysis

• Using real-time noise-cancelled audio
• Pitch estimation using PRAAT
• Formant analysis using PRAAT for spoken utterances and for low-pitch notes
• Formant analysis difficult for high-pitch utterances (example /i/ on next slide)

Page 12

Acoustic analysis

• Formant analysis from audio is difficult at high pitch.

• “li” note 1 (F0 = 233Hz), note 5 (F0 = 349Hz): clear formant structure

• “li” note 11 (F0 = 622Hz), note 15 (F0 = 932Hz): formants are harder to identify

[Figure: sung vowel /i/, four panels at F0 = 233 Hz, 349 Hz, 622 Hz, and 932 Hz, each with an axis up to 5 kHz.]
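One way to see why formants become harder to identify: a voiced sound only carries energy at multiples of F0, so the vocal tract response is sampled at fewer points as pitch rises. The short calculation below counts the harmonics that fall within the 5 kHz range shown, for the four F0 values above. The function name is hypothetical.

```python
def harmonics_below(f0_hz, f_max_hz=5000.0):
    """Number of harmonics of f0_hz at or below f_max_hz."""
    return int(f_max_hz // f0_hz)


for f0 in (233, 349, 622, 932):
    print(f"F0 = {f0} Hz: {harmonics_below(f0)} harmonics up to 5 kHz")
```

At 932 Hz the harmonic spacing is wider than the typical distance between neighboring formants would need to be resolved, so the spectral envelope is badly undersampled.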

Page 13

Aperture function analysis

At high pitches, the acoustic identity of the vowels is “sacrificed,” i.e. they converge acoustically. However, in their articulation:

• The front half of the aperture function converges for all vowels at high pitch.
• The back half of the aperture function maintains a vowel-dependent shape.

Page 14

Aperture functions for vowels sung at different pitches

[Figure: four panels at F0 = 233 Hz, 349 Hz, 622 Hz, and 932 Hz.]

Page 15

Larynx position analysis results

The larynx rises with higher pitch for /e/, /i/, /o/, /u/.

[Figure: larynx position; horizontal axis: increasing pitch.]

Page 16

Vocal tract length analysis results

Vocal tract length decreases with pitch for /e/, /i/, /o/.

[Figure: vocal tract length, with spoken vowels marked; horizontal axis: increasing pitch.]

Page 17

Minimum aperture analysis results

Minimum aperture value increases with pitch for all vowels.

Minimum aperture location varies with pitch for /a/, /o/.

[Figures: minimum aperture value and minimum aperture location, each with spoken vowels marked; horizontal axis: increasing pitch.]

Page 18

Image analysis examples /a/

• F0 = 932 Hz, note 15
• F0 = 622 Hz, note 11
• F0 = 349 Hz, note 5
• F0 = 233 Hz, note 1
• Spoken

Page 19

Image analysis examples /i/

• F0 = 932 Hz, note 15
• F0 = 622 Hz, note 11
• F0 = 349 Hz, note 5
• F0 = 233 Hz, note 1
• Spoken

Page 20

Sung vowels

Resonance tuning can be shown for vowels with low F1.
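The link to low F1 can be illustrated with the tuning behavior cited in the background (F1 tuned to F0): tuning only becomes necessary once F0 rises above the F1 the vowel would otherwise have, and that happens earlier in the scale for low-F1 vowels. The sketch below uses the four F0 values quoted on the slides. The F1 values of 300 Hz and 800 Hz are illustrative placeholders for a low-F1 and a high-F1 vowel, not measurements from this study, and the function name is hypothetical.

```python
# Note number -> nominal F0 (Hz), as quoted on the slides.
SCALE_NOTES = {1: 233, 5: 349, 11: 622, 15: 932}


def notes_above_f1(f1_hz, notes=SCALE_NOTES):
    """Note numbers whose F0 exceeds the given (untuned) F1."""
    return [n for n, f0 in sorted(notes.items()) if f0 > f1_hz]


# Illustrative F1 values only (assumptions, not data from the study).
print("low-F1 vowel, F1 = 300 Hz:", notes_above_f1(300))
print("high-F1 vowel, F1 = 800 Hz:", notes_above_f1(800))
```

With these placeholder values the low-F1 vowel would need tuning on three of the four sampled notes, the high-F1 vowel on only the highest one, so the effect has more opportunity to show up for low-F1 vowels.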

Page 21

Vocal tract shapes in comparison

[Figure: grid of vocal tract shapes. Columns: /a/, /e/, /i/, /o/, /u/. Rows: spoken, F0 = 233 Hz, F0 = 932 Hz.]

Page 22

Aperture functions in comparison

[Figure: grid of aperture functions. Columns: /a/, /e/, /i/, /o/, /u/. Rows: spoken, F0 = 233 Hz, F0 = 932 Hz.]

Page 23

Discussion

• Several challenges in analysis:
  • Vocal tract resonances are difficult to estimate from the acoustic output at high pitch.
  • We plan in the future to estimate vocal tract resonances from MR-derived area function data. Cf. Joliveau et al., who estimated resonances directly by acoustic methods.
• 3 jointly controlled goals:
  • Pitch: critical goal; not compromised, as evidenced in audio
    • One component of implementation: raised larynx (except low vowel)
  • Intensity: another important goal; increases with pitch
    • One component of implementation: open front cavity/cone effect
  • Vowel identity: acoustic identity lost at high pitches
    • Front cavity shaping compromised, but back cavity distinction still maintained; effect depends on vowel (low vs. high, for example)

Page 24

Discussion

• Strategy for joint control and relative weighting of the goals is unknown.
• It appears that vowel identity is compromised but not completely ignored at high pitch.
  • Joliveau et al. data acquired at soft intensity: opening of front cavity for cone effect may have been minimized
• Generalizability of results limited
  • Need: data from more subjects and direct acoustic modeling for estimating vocal tract resonances
• Ongoing work: We have collected data from 5 more sopranos.

Page 25

Image analysis examples /a/

• Some /a/ images:

• F0 = 932Hz

• F0 = 622Hz

• F0 = 349Hz

• F0 = 233Hz

• spoken

Page 26

Image analysis examples /i/

• Some /i/ images:

• spoken

• F0 = 932Hz

• F0 = 622Hz

• F0 = 349Hz

• F0 = 233Hz

Page 27

Pitch and power estimation

Average power increases with pitch.

Pitch follows the nominal values very closely.

