An Objective Identification of Spectral Distinctiveness on Acoustic cue to Subjects with Hearing Loss
Adviser: Dr. Yeou-Jiunn Chen
Presenter: Ming-Da Lee
Outline
1. Introduction
   1.1 Motivation
   1.2 Types of articulation errors
   1.3 Paper review
   1.4 Purpose
2. Methods and results
   2.1 Linguistic Analysis
   2.2 Speech Cue Identification
       DTW
       Identification for Acoustic cue
3. Experimental Results and Discussions
   3.1 Materials
   3.2 Manipulation of Acoustic Cue
   3.3 Subjective Measurements
   3.4 Results of Subjective Measurements
4. Future work
5. References
Motivation
Hearing loss
   Conductive hearing loss
   Sensorineural hearing loss
   Mixed hearing loss
Articulation disorder
   Hard to talk with each other
   Hard to learn
Speech-language pathologists usually supervise auditory training
Introduction
Types of Articulation errors
Substitution: e.g., 公公 (gōnggōng) pronounced as 咚咚 (dōngdōng)
Omission: e.g., 鞋 (xié) pronounced as ㄧㄝˊ (yé)
Addition: e.g., 吃飯 (chīfàn) pronounced as ㄔㄨ飯 (chūfàn)
Tone errors
Distortion
Overall unintelligible speech (parsing ambiguities)
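Types of articulation errors
Fronting (also called tongue-tip articulation): the sound is produced further forward in the mouth.
Backing (also called velarization): the sound is produced at the back of the tongue.
Deaspiration: aspirated sounds are replaced by unaspirated sounds.
Stopping: other sounds are replaced by the stops ㄅ, ㄆ, ㄉ, ㄊ, ㄍ, ㄎ (b, p, d, t, g, k).
Affrication: the fricatives ㄒ, ㄕ, ㄙ (x, sh, s) are replaced by the affricates ㄐ, ㄓ, ㄗ (j, zh, z).
Initial-consonant omission: ㄧ ㄨㄚ (yi wa) replaces ㄒㄧ ㄍㄨㄚ (xi gua).
Medial omission: ㄚ (a) replaces ㄨㄚ (wa).
Final simplification:
   Nasal-ending deletion (lack of nasality).
   Compound-final simplification: a simple vowel replaces a compound vowel.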
Paper review
F. Li and J.B. Allen, (2011), “Manipulation of consonants in natural speech,” IEEE Transactions on Audio, Speech, and Language Processing 19, 496–504.
F. Li, A. Menon, and J.B. Allen, (2010), “A psychoacoustic method to find the perceptual cues of stop consonants in natural speech,” J. Acoust. Soc. Amer., vol. 127, no. 4, pp. 2599–2610.
Paper review
Automatic speech recognition (ASR)
   MFCC
   PLP
   RASTA
DTW
Purpose
Speech cue identification
Objective
Distinctiveness on acoustic cue
Methods and results
2.1 Linguistic Analysis
Consonant Affected By Vowel
DTW
Butterworth filter: order 6, band 500 Hz, overlap 250 Hz
FFT: frame 400, overlap 200
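The frame and overlap settings above can be sketched as follows. This is only an illustration under stated assumptions: "Frame 400" and "Overlap 200" are taken to be sample counts (the slide gives no unit), the function names are hypothetical, and a naive DFT with a Hann window stands in for the FFT so that the sketch needs no external library.

```python
import cmath
import math

FRAME_LEN = 400              # "Frame 400" on the slide, assumed to be samples
OVERLAP = 200                # "Overlap 200" on the slide, assumed to be samples
HOP = FRAME_LEN - OVERLAP    # step between the starts of consecutive frames


def split_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D sequence into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitude of a Hann-windowed frame (stand-in for an FFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    half = n // 2 + 1  # keep the non-negative frequencies only
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(half)]


# Usage: one second of a 1 kHz tone at 16 kHz (a hypothetical test signal).
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]
frames = split_frames(tone)
spectrum = magnitude_spectrum(frames[0])
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(len(frames), peak_bin * fs / FRAME_LEN)
```

With a hop of 200 samples, each frame shares half of its samples with the previous one, which keeps short consonant events from falling entirely between two analysis windows.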
DTW
Euclidean distance
$$\mathrm{dist}(x, y) = \lvert x - y \rvert = \left[ (x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2 \right]^{1/2}$$
Boundary conditions
$$(p_1, q_1) = (1, 1), \quad (p_k, q_k) = (m, n)$$
Recurrence
$$D(X, Y) = \lvert i(X) - j(Y) \rvert + \min\{ D(X-1, Y),\ D(X-1, Y-1),\ D(X, Y-1) \}$$
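The recurrence and boundary conditions above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: it assumes each utterance is a list of per-frame feature vectors (for example MFCC or PLP frames) and uses the Euclidean distance from this slide as the local cost in place of the scalar difference. The names `euclidean` and `dtw` and the toy sequences are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(seq_x, seq_y, dist=euclidean):
    """Return (total cost, warping path) for two sequences of feature vectors.

    D(i, j) = dist(x_i, y_j) + min(D(i-1, j), D(i-1, j-1), D(i, j-1)),
    with the path forced to start at (1, 1) and end at (m, n).
    """
    m, n = len(seq_x), len(seq_y)
    if m == 0 or n == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # Row 0 and column 0 are padding so the recurrence needs no special cases.
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            local = dist(seq_x[i - 1], seq_y[j - 1])
            D[i][j] = local + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])

    # Backtrack from (m, n) to (1, 1), always moving to the cheapest predecessor.
    path = [(m, n)]
    i, j = m, n
    while (i, j) != (1, 1):
        candidates = [(D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return D[m][n], path


# Usage: two toy one-dimensional "utterances" of different length.
x = [[0.0], [1.0], [2.0]]
y = [[0.0], [1.0], [1.0], [2.0]]
cost, path = dtw(x, y)
print(cost, path)
```

The path uses 1-based frame indices so that its first and last elements match the boundary conditions (1, 1) and (m, n) directly.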
[Figure: panels "da", "ga", and "MFCC DTW da ga"]
DTW Path
[Figure: panels "MFCC DTWPath da ga", "MFCC DTW da ga", "da", and "ga"]
Distance Table
[Figure: panels "MFCC DTWPath da ga", "MFCC DTW da ga", "MFCC da", and "da"]
Identification for Acoustic cue
[Figure: panels "MFCC da"]
Materials
TIMIT, 16 kHz, 16 bit
[Figure: panels "da", "ga", and "MFCC DTW da ga"]
Manipulation of Speech Cue
[Figure (MFCC): panels "da", "ga", "Manipulation of da", "MFCC DTWTable da ga", "auto da", "MFCC da", and "MFCC DTWPath da ga"]
[Figure (PLP): panels "da", "ga", "Manipulation of da", "PLP DTWTable da ga", "auto da", "PLP da", and "PLP DTWPath da ga"]
[Figure (MFCC): panels "ga", "da", "Manipulation of ga", "MFCC DTWTable ga da", "auto ga", "MFCC ga", and "MFCC DTWPath ga da"]
[Figure (PLP): panels "ga", "da", "Manipulation of ga", "PLP DTWTable ga da", "auto ga", "PLP ga", and "PLP DTWPath ga da"]
Results of Subjective Measurements (MFCC)
da ga ka za sa ta ba pa fa
da == == == <= =>
ga == ==
ka == == =>
za == <=
sa => => == =>
ta <= <= <= ==
ba == =>
pa <= ==
fa ==
Results of Subjective Measurements (PLP)
da ga ka za sa ta ba pa fa
da == => == <= =>
ga <= ==
ka == == =>
za ==
sa => => == ==
ta <= <= ==
ba == =>
pa <= ==
fa ==
Future work
Speech Cue Enhancement
Speech Cue Verification
References
李育穎 (1999), Research and implementation of a speech communication aid for people with speech impairments (in Chinese), Master's thesis, Department of Electrical Engineering, National Cheng Kung University.
葉向林 (2003), Speech enhancement and conversion for the hearing impaired (in Chinese), Master's thesis, Department of Electrical Engineering, National Tsing Hua University.
陳明聰 (2000), Design and effectiveness of a Chinese alternative keyboard and input-method assisted learning system for people with disabilities (in Chinese), Master's thesis, Graduate Institute of Special Education, National Taiwan Normal University.
陳雅菁 (1995), Application of neural networks to the diagnosis of articulation disorders in children (in Chinese), Master's thesis, Graduate Institute of Engineering Science, National Cheng Kung University.
M.S. Glassman and M.B. Starkey (1988), "Speech therapy using computer based minimal consonant pair discrimination," Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 4-7, pp. 1421–1422.
陳信全 (2003), Development of a portable pronunciation trainer for people with speech impairments (in Chinese), Master's thesis, Department of Electrical Engineering, Southern Taiwan University of Science and Technology.
F. Li and J.B. Allen (2011), "Manipulation of consonants in natural speech," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, pp. 496–504.
F. Li, A. Menon, and J.B. Allen (2010), "A psychoacoustic method to find the perceptual cues of stop consonants in natural speech," J. Acoust. Soc. Amer., vol. 127, no. 4, pp. 2599–2610.
T. Bocklet, A. Maier, U. Eysholdt, and E. Nöth (2010), "Improvement of a speech recognizer for standardized medical assessment of children's speech by integration of prior knowledge," IEEE Spoken Language Technology Workshop (SLT), pp. 259–264.
S.H. Park, D.J. Kim, J.H. Lee, and T.S. Yoon (1994), "Integrated speech training system for hearing impaired," IEEE Transactions on Rehabilitation Engineering.
林寶貴 (1994), Speech Disorders and Their Remediation (in Chinese), Wu-Nan Book Inc.
賴湘君 (1990), Diagnosis and remediation of articulation disorders (in Chinese), in Proceedings of the Seminar on Speech Therapy Education, Department of Education, Taipei City Government, pp. 123–133.
湯士民 (2004), Applying error pattern analysis to assisted learning of English pronunciation (in Chinese), Master's thesis, Department of Computer Science and Information Engineering, National Cheng Kung University.
王小川 (2004), Speech Signal Processing (in Chinese), Chuan Hwa Book Co.
K. Resmi, S. Kumar, H.K. Sardana, and R. Chhabra (2011), "Graphical Speech Training system for hearing impaired," in Proc. of International Conference of Image Information Processing, pp. 1–6.
E. Eriksson, O. Balter, O. Engwall, A.M. Öster, and H. Kjellström (2005), "Design Recommendations for a Computer-Based Speech Training System Based on End-User Interviews," in Proc. Tenth International Conference on Speech and Computers, Patras, Greece, pp. 483–486.
R. Sirichokswad, P. Chanyagorn, W. Charoensuk, K. Lertsukprasert, K. Wanichthanarak, and P. Boonpramook (2008), "Development of Auditory Training Program for Hearing Impaired Children," in Proc. of The 3rd International Symposium on Biomedical Engineering, Bangkok, Thailand.
V.S. Selvam, V. Thulasibai, and R. Rohini (2011), "Speech Training System Based on Resonant Frequencies of Vocal Tract," in Proc. of International Conference on Advanced Communication Technology, pp. 674–679.
American Speech-Language-Hearing Association (2005), "Roles and responsibilities of speech-language pathologists with respect to alternative communication: Position statement," retrieved May 31, 2009.
D. Beukelman and P. Mirenda (2005), Augmentative and alternative communication: Supporting children & adults with complex communication needs (3rd ed.), Baltimore, Maryland: Paul H. Brookes.