The role of prosody in dialect synthesis and authentication
Kyuchul Yoon
Division of English, Kyungnam University
Spring 2008 Joint Conference of KSPS & KASS
Goals
1. Synthesize Masan utterances from matching Seoul utterances by prosody cloning
2. Examine the role of prosody in the authentication of synthetic Masan utterances (Listening experiment)
Background
• Differences among dialects
  – Segmental differences
    • Fricative differences in the time domain (Lee, 2002)
      – Busan fricatives have shorter frication/aspiration intervals than Seoul fricatives
    • Fricative differences in the frequency domain (Kim et al., 2002)
      – The low cutoff frequency of Kyungsang fricatives was higher than that of Cholla fricatives (> 1,000 Hz)
  – Non-segmental or prosodic differences
    • Intonation or fundamental frequency (F0) contour difference
    • Intensity contour difference
    • Segment durational difference
    • Voice quality difference
Synthesis
• Simulating (by prosody cloning) Masan dialect from Seoul dialect
• The simulated Masan utterances will have
  – the speech segments of Seoul dialect
  – the prosody of Masan dialect
    • F0 contour
    • Intensity contour
    • Segmental duration
Evaluation
• Through a listening experiment
• Stimuli consist of
– #1. Authentic, but synthetic, Masan utterance
– #2. Seoul utterance with Masan segmental durations (D)
– #3. Seoul utterance with Masan F0 contour (F)
– #4. Seoul utterance with Masan intensity contour (I)
– #5. Seoul utterance with Masan durations and F0 contour (D+F)
– #6. Seoul utterance with Masan durations and intensity contour (D+I)
– #7. Seoul utterance with Masan F0 contour and intensity contour (F+I)
– #8. Seoul utterance with Masan durations, F0 contour and intensity contour (D+F+I)
• Test sentences
(1) 동대구에 볼 일이 없습니다. ("I have no business in Dongdaegu.")
(2) 바다에 보물섬이 없다. ("There is no treasure island in the sea.")
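The eight stimulus types are the Masan utterance itself plus every non-empty combination of the three prosodic cues. The following is a minimal sketch of that enumeration; the names `CUES` and `stimulus_conditions` are illustrative and do not come from the original study.

```python
from itertools import combinations

# The three prosodic cues transferred from Masan to Seoul utterances:
# D = segmental durations, F = F0 contour, I = intensity contour.
CUES = ("D", "F", "I")

def stimulus_conditions():
    """Return the eight condition labels for one sentence."""
    conditions = ["Masan"]  # stimulus #1: the Masan utterance itself
    for size in range(1, len(CUES) + 1):
        for combo in combinations(CUES, size):
            conditions.append("+".join(combo))
    return conditions

conditions = stimulus_conditions()
print(conditions)
# Two test sentences, eight conditions each.
print(len(conditions) * 2)
```

With two test sentences, this gives 8 + 8 = 16 stimuli in total.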
Prosody transfer (PSOLA algorithm)
• Three aspects of the prosody
– Fundamental frequency (F0) contour
– Intensity contour
– Segmental durations
• Pitch-Synchronous OverLap and Add (PSOLA) algorithm (Moulines & Charpentier, 1990)
– Implemented in Praat (Boersma, 2005)
– Use of a script for semi-automatic segment-by-segment manipulation (Yoon, 2007)
Prosody transfer (PSOLA algorithm)
• Procedures for full prosody transfer
– Align segments between Masan and Seoul utterances
– Make the segment durations of the two identical
– Make the two F0 contours identical
– Make the two intensity contours identical
Prosody transfer (PSOLA algorithm)
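• Align segments between Masan and Seoul utterances
• Make the segment durations of the two utterances identical
[Figure: Masan and Seoul segment tiers (ㅂ ㅏ ㄹ ㅏ ㅁ) for "… 바람 …" ("wind"), with Seoul segments marked "stretch" or "shrink" to match the Masan durations]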
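The stretching and shrinking of aligned segments can be expressed as one scale factor per segment. The sketch below is only illustrative: the function name `duration_factors` and the millisecond values are hypothetical, not measurements from the study, and the actual manipulation was done with PSOLA in Praat.

```python
def duration_factors(source_durations, target_durations):
    """Return one scale factor per aligned segment.

    A factor above 1 means the source segment must be stretched,
    a factor below 1 means it must be shrunk.
    """
    if len(source_durations) != len(target_durations):
        raise ValueError("utterances must have the same number of aligned segments")
    return [target / source
            for source, target in zip(source_durations, target_durations)]

# Hypothetical durations in milliseconds for the five segments ㅂ ㅏ ㄹ ㅏ ㅁ.
seoul_ms = [50, 100, 40, 120, 80]
masan_ms = [50, 150, 40, 90, 80]

factors = duration_factors(seoul_ms, masan_ms)
print(factors)  # [1.0, 1.5, 1.0, 0.75, 1.0]
```

In this made-up case the first vowel would be stretched by a factor of 1.5 and the second shrunk to 0.75 of its length, while the consonants stay unchanged.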
Prosody transfer (PSOLA algorithm)
• Make the two F0 contours identical
[Figure: Masan F0 and Seoul F0 contours over the aligned segments ㅂ ㅏ ㄹ ㅏ ㅁ of the Masan and Seoul utterances]
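One way to picture the F0 transfer is to move each F0 point of the source contour onto the target timeline, keeping its relative position inside its segment. This is a sketch under stated assumptions, not the script used in the study: the helper names `warp_time` and `transfer_f0` and all numbers are hypothetical. When the segment durations have already been made identical, the mapping reduces to copying the contour unchanged.

```python
from bisect import bisect_right

def warp_time(t, src_bounds, dst_bounds):
    """Map time t from the source segmentation to the destination one.

    Both boundary lists hold the segment edges in ascending order and have
    the same length. The time is interpolated linearly within its segment.
    """
    i = bisect_right(src_bounds, t) - 1
    i = max(0, min(i, len(src_bounds) - 2))  # clamp to a valid segment index
    rel = (t - src_bounds[i]) / (src_bounds[i + 1] - src_bounds[i])
    return dst_bounds[i] + rel * (dst_bounds[i + 1] - dst_bounds[i])

def transfer_f0(points, src_bounds, dst_bounds):
    """Re-time (time, Hz) points so they follow the destination segments."""
    return [(warp_time(t, src_bounds, dst_bounds), hz) for t, hz in points]

# Hypothetical segment boundaries in milliseconds (two segments each).
masan_bounds = [0, 100, 300]
seoul_bounds = [0, 200, 300]
masan_f0 = [(50, 180.0), (200, 220.0), (300, 150.0)]

print(transfer_f0(masan_f0, masan_bounds, seoul_bounds))
# [(100.0, 180.0), (250.0, 220.0), (300.0, 150.0)]
```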
Prosody transfer (PSOLA algorithm)
• Make the two intensity contours identical
[Figure: Masan intensity and Seoul intensity contours over the aligned segments ㅂ ㅏ ㄹ ㅏ ㅁ of the Masan and Seoul utterances]
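Matching intensity contours amounts to applying a time-varying gain. Since intensity in dB is 20·log10 of the amplitude ratio, a difference of d dB corresponds to an amplitude factor of 10^(d/20). The sketch below assumes two contours sampled at the same frame times after duration matching; the function names and dB values are hypothetical, and Praat's own procedure may differ in detail.

```python
def gain_for(source_db, target_db):
    """Linear amplitude factor that moves source_db to target_db."""
    return 10 ** ((target_db - source_db) / 20)

def intensity_gains(source_contour_db, target_contour_db):
    """Per-frame amplitude factors for two equally sampled dB contours."""
    if len(source_contour_db) != len(target_contour_db):
        raise ValueError("contours must have the same number of frames")
    return [gain_for(s, t) for s, t in zip(source_contour_db, target_contour_db)]

# Hypothetical frame-wise intensities in dB.
seoul_db = [60.0, 70.0, 65.0]
masan_db = [80.0, 70.0, 45.0]

gains = intensity_gains(seoul_db, masan_db)
print(gains)  # +20 dB -> x10, 0 dB -> x1, -20 dB -> x0.1
```

Multiplying each frame of the Seoul waveform by its factor would bring its intensity contour to the Masan values.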
Synthetic (simulated) Masan stimulus
Synthetic authentic Masan stimulus
Listening experiment
• 16 stimuli (8 + 8)
• Presented to 13 Masan/Changwon listeners
– On a scale of 1 (worst) to 10 (best)
– Used Praat ExperimentMFC object
– Allowed repetition of stimulus: up to 10 times
Listening experiment
Results & Conclusion
[Figure: Histogram of listener responses]
Results & Conclusion
[Figure: F0 contour transfer; axis: listener responses, 1 to 10]
Results & Conclusion
[Figure: Seoul utterances with Masan prosody; conditions shown: D, F, I, DF, DI, FI, DFI, and Masan]
Results & Conclusion
• Main effects of
– Segmental durations: F(1,12) = 11.53, p = 0.005
– F0 contour: F(1,12) = 141.12, p = 0.00000005
• Regression analysis
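The degrees of freedom (1, 12) follow from a two-level within-listener factor (cue present or absent) and 13 listeners. For such a factor, the repeated-measures F statistic equals the square of the paired t statistic computed on each listener's mean ratings. The sketch below illustrates this relation only; the ratings are made up for the example and are not the study's data, and `paired_f` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

def paired_f(with_cue, without_cue):
    """Return (F, (df1, df2)) for a two-level within-subject factor.

    F is the squared paired t statistic of the per-listener differences.
    """
    diffs = [a - b for a, b in zip(with_cue, without_cue)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)

# Hypothetical mean ratings from 13 listeners (NOT the study's data).
with_f0 = [6, 7, 5, 8, 6, 7, 7, 5, 6, 8, 7, 6, 7]
without_f0 = [3, 4, 3, 5, 2, 4, 3, 3, 4, 5, 3, 2, 4]

f_value, df = paired_f(with_f0, without_f0)
print(df)  # (1, 12)
```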
Results & Conclusion
• Prosody cloning not sufficient for dialect simulation
– (Sub)Segmental differences may be at work
– Quality of synthetic stimuli
• F0 contour transfer (from Masan to Seoul)
– Most influential on shifting perception from Seoul to Masan utterances
References
[1] Kyung-Hee Lee, “Comparison of acoustic characteristics between Seoul and Busan dialect on fricatives”, Speech Sciences, Vol.9/3, pp.223-235, 2002.
[2] Hyun-Gi Kim, Eun-Young Lee, and Ki-Hwan Hong, “Experimental phonetic study of Kyungsang and Cholla dialect using power spectrum and laryngeal fiberscope”, Speech Sciences, Vol.9/2, pp.25-47, 2002.
[3] Kyuchul Yoon, “Imposing native speakers’ prosody on non-native speakers’ utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature, Vol.25/4, pp.197-215, 2007.
[4] E. Moulines and F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, Vol.9/5-6, pp.453-467, 1990.
[5] P. Boersma, “Praat, a system for doing phonetics by computer”, Glot International, Vol.5/9-10, pp.341-345, 2005.