
ICA 2010 : 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia

[Wed, 25th Aug, R. 201, Speech processing & communication systems 2, 15.40]

Enhancement of Electrolaryngeal Speech by Spectral Subtraction, Spectral Compensation, and Introduction of Jitter and Shimmer

Prem C. Pandey, S. Khadar Basha
{pcpandey, basha}@ee.iitb.ac.in
http://www.ee.iitb.ac.in/~spilab
IIT Bombay, India

OVERVIEW

1. Introduction
2. Spectral subtraction
3. Estimation of noise spectrum
4. Jitter, shimmer, & spectral compensation
5. Results
6. Conclusion

1 INTRODUCTION

Natural speech: glottal excitation to vocal tract

Electrolaryngeal speech: excitation to vocal tract from external vibrator

Problems with electrolarynx

• Dynamic control of level, voicing, & pitch not feasible
• Background noise due to leakage of acoustic energy, affecting the intelligibility
• Unnatural quality due to
  ▪ Low frequency spectral deficit
  ▪ Constant pitch & level

Methods of noise reduction

• Acoustic shielding of vibrator (Espy-Wilson et al 1996)
• 2-input noise cancellation based on LMS algorithm (Espy-Wilson et al 1996)
• Single input noise cancellation using spectral subtraction
  ▪ Averaging based noise est. & pitch-synch. generalized spectral subtraction (Pandey et al 2002)
  ▪ Quantile based noise estimation (Pandey et al 2004)
  ▪ Parameter adaptation using freq. domain auditory masking (Liu et al 2006)
  ▪ Min. statistics based noise estimation (Mitra & Pandey 2006, Kabir et al 2008)

2 SPECTRAL SUBTRACTION

Noise generation (Pandey et al 2002)

• Leakage of vibrations produced by vibrator membrane
• Improper coupling of vibrations to the neck tissue

$s(n) = e(n) * h_v(n)$, $l(n) = e(n) * h_l(n)$, $x(n) = s(n) + l(n)$

(e(n): excitation from the vibrator; h_v(n), h_l(n): impulse responses of the vocal tract path and the leakage path; s(n): speech; l(n): leakage noise; x(n): noisy speech)

$X_n(e^{j\omega}) = E_n(e^{j\omega}) \, [H_{vn}(e^{j\omega}) + H_{ln}(e^{j\omega})]$

• Assuming h_v(n) & h_l(n) to be uncorrelated

$|X_n(e^{j\omega})|^2 = |E_n(e^{j\omega})|^2 \, [\, |H_{vn}(e^{j\omega})|^2 + |H_{ln}(e^{j\omega})|^2 \,]$

• For short-time spectra calculated using pitch-synchronous window, $|E_n(e^{j\omega})|^2$ may be considered as constant $|E(e^{j\omega})|^2$

• During non-speech intervals, s(n) will be negligible,

$|X_n(e^{j\omega})|^2 = |L_n(e^{j\omega})|^2 = |E(e^{j\omega})|^2 \, |H_{ln}(e^{j\omega})|^2$

Generalized spectral subtraction (Berouti et al 1979) using FFT

$E(k) = |X_n(k)|^{\gamma} - \alpha \, |L_n(k)|^{\gamma}$

Clean mag. spectrum

$|Y_n(k)| = [E(k)]^{1/\gamma}$, if $E(k) > [\beta \, |L_n(k)|]^{\gamma}$
$|Y_n(k)| = \beta \, |L_n(k)|$, otherwise

(α: subtraction, β: spectral floor, γ: subtraction power)

$y_n(m) = \mathrm{IDFT}\,[\, |Y_n(k)| \, e^{j\theta_n(k)} \,]$
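The subtraction rule above can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation: the function name, the list-based spectra, and the default parameter values (taken from the "Median" setting on the results slide) are assumptions made for the example.

```python
from typing import List, Sequence


def generalized_spectral_subtraction(
    noisy_mag: Sequence[float],
    noise_mag: Sequence[float],
    alpha: float = 1.5,   # subtraction factor (assumed default)
    beta: float = 0.001,  # spectral floor factor (assumed default)
    gamma: float = 1.0,   # subtraction power (assumed default)
) -> List[float]:
    """Return the enhanced magnitude |Y_n(k)| for one frame."""
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    enhanced = []
    for x, noise in zip(noisy_mag, noise_mag):
        diff = x ** gamma - alpha * noise ** gamma  # E(k)
        floor = beta * noise                        # beta * |L_n(k)|
        if diff > floor ** gamma:
            enhanced.append(diff ** (1.0 / gamma))
        else:
            enhanced.append(floor)                  # spectral floor
    return enhanced


if __name__ == "__main__":
    # Three bins: strong speech, weak speech, and a bin below the noise.
    print(generalized_spectral_subtraction([10.0, 2.0, 0.5], [1.0, 1.0, 1.0]))
```

The enhanced magnitude would then be combined with a phase estimate and passed through the inverse DFT, as in the last equation of the slide.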

Phase estimation

▪ Noisy phase: $\theta_n(k) = \angle X_n(k)$
▪ Zero phase: $\theta_n(k) = 0$
▪ Random phase: $\theta_n(k) = r$ (uniformly distributed over [0, 2π])
▪ Min. phase calculation
  - iterative tech. (Quatieri and Oppenheim 1981)
  - cepstrum based non-iterative calculation (Oppenheim & Schafer 1975, Rabiner & Schafer 1978, Yegnanarayana & Dhayalan 1983)
▪ Phase set for continuity across the frames: $\theta_n(k) = \theta_{n-1}(k) + (2\pi n_d k)/N$, where $n_d$ = window shift, N = FFT size
▪ Noisy phase resulted in better quality than others
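Three of the phase options listed above (noisy phase, random phase, and phase continuity across frames) can be sketched as below. The function names are hypothetical, and wrapping the continuity phase into [0, 2π) is a choice made for this example; the minimum-phase methods are not covered here.

```python
import cmath
import math
import random
from typing import List, Optional, Sequence


def noisy_phase(noisy_spectrum: Sequence[complex]) -> List[float]:
    """theta_n(k) = angle of X_n(k)."""
    return [cmath.phase(x) for x in noisy_spectrum]


def random_phase(num_bins: int, rng: Optional[random.Random] = None) -> List[float]:
    """theta_n(k) drawn uniformly from [0, 2*pi]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_bins)]


def continuity_phase(
    prev_phase: Sequence[float], window_shift: int, fft_size: int
) -> List[float]:
    """theta_n(k) = theta_{n-1}(k) + 2*pi*n_d*k/N, wrapped to [0, 2*pi)."""
    return [
        (theta + 2.0 * math.pi * window_shift * k / fft_size) % (2.0 * math.pi)
        for k, theta in enumerate(prev_phase)
    ]


def attach_phase(magnitude: Sequence[float], phase: Sequence[float]) -> List[complex]:
    """Form |Y_n(k)| * exp(j*theta_n(k)), ready for the inverse DFT."""
    return [cmath.rect(m, th) for m, th in zip(magnitude, phase)]


if __name__ == "__main__":
    previous = [0.0, 0.0, 0.0, 0.0]
    print(continuity_phase(previous, window_shift=2, fft_size=8))
```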

Block diagram of spectral subtraction (figure)

3 ESTIMATION OF NOISE SPECTRUM

• Variation in noise due to change in the electrolarynx orientation
• Voice activity detection difficult in electrolaryngeal speech
• Averaging based noise est. (Pandey et al 2002) unsuitable for long term use
• Quantile based noise est. (Stahl et al 2000) used for electrolaryngeal speech (Pandey et al 2004) difficult to implement for real-time processing
• Minimum statistics based method (Martin 1994) used for electrolaryngeal speech (Mitra & Pandey 2006, Kabir et al 2008) not effective with fixed subtraction parameters
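A dynamic noise estimate taken from a set of past frames can be sketched as a per-bin quantile over a sliding buffer of magnitude spectra. The class name, the buffer length, and the simple rank rule are assumptions for illustration. A quantile of 0.5 gives a median-style estimate; a quantile of 0 gives a plain running minimum, which is simpler than the minimum statistics method of Martin (1994).

```python
from collections import deque
from typing import Deque, List, Sequence


class QuantileNoiseEstimator:
    """Per-bin quantile of the magnitude spectra of the most recent frames."""

    def __init__(self, num_frames: int, quantile: float = 0.5) -> None:
        if num_frames < 1:
            raise ValueError("num_frames must be at least 1")
        if not 0.0 <= quantile <= 1.0:
            raise ValueError("quantile must lie in [0, 1]")
        self.quantile = quantile
        # deque with maxlen discards the oldest frame automatically
        self._history: Deque[List[float]] = deque(maxlen=num_frames)

    def update(self, magnitude: Sequence[float]) -> None:
        """Store the magnitude spectrum of the newest frame."""
        if self._history and len(magnitude) != len(self._history[0]):
            raise ValueError("all frames must have the same number of bins")
        self._history.append(list(magnitude))

    def estimate(self) -> List[float]:
        """Return the current noise magnitude estimate for every bin."""
        if not self._history:
            raise ValueError("no frames stored yet")
        rank = int(self.quantile * (len(self._history) - 1))
        num_bins = len(self._history[0])
        return [
            sorted(frame[k] for frame in self._history)[rank]
            for k in range(num_bins)
        ]


if __name__ == "__main__":
    estimator = QuantileNoiseEstimator(num_frames=3, quantile=0.5)
    for frame in ([1.0, 10.0], [3.0, 20.0], [2.0, 30.0]):
        estimator.update(frame)
    print(estimator.estimate())
```

Because the estimate is recomputed from recent frames only, it can follow slow changes in the leakage noise without a voice activity detector.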

4 INTRO. OF JITTER & SHIMMER & SPECTRAL COMPENSATION

• Introduction of jitter and shimmer, using LPC based analysis synthesis, after spectral subtraction for reducing unnaturalness
• Spectral compensation for low-frequency spectral deficit

Implementation of shimmer

Impulse amplitude = $a(1 + s r_1)$
  a = mean amplitude
  s = peak-to-peak shimmer
  r_1 = random number uniformly distributed over ±0.5

Implementation of jitter

Impulse repetition period = $N(1 + j r_2)$
  N = mean pitch period in number of samples
  j = peak-to-peak jitter
  r_2 = random number uniformly distributed over ±0.5
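The two rules above can be combined into an impulse-train generator for the LPC synthesis excitation. This is a sketch under stated assumptions: the function name and defaults are hypothetical, and placing each impulse at the nearest integer sample of an accumulated fractional position is a choice made here, not something specified on the slide.

```python
import random
from typing import List, Optional


def jitter_shimmer_impulse_train(
    num_samples: int,
    mean_period: float,           # N: mean pitch period in samples
    mean_amplitude: float = 1.0,  # a: mean impulse amplitude
    jitter: float = 0.06,         # j: peak-to-peak jitter (fraction)
    shimmer: float = 0.0,         # s: peak-to-peak shimmer (fraction)
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Impulse train with amplitude a(1 + s*r1) and period N(1 + j*r2)."""
    if mean_period <= 0:
        raise ValueError("mean_period must be positive")
    rng = rng or random.Random()
    excitation = [0.0] * num_samples
    position = 0.0  # fractional position of the next impulse
    while True:
        index = int(position + 0.5)  # nearest sample (assumed rounding rule)
        if index >= num_samples:
            break
        r1 = rng.uniform(-0.5, 0.5)
        r2 = rng.uniform(-0.5, 0.5)
        excitation[index] = mean_amplitude * (1.0 + shimmer * r1)
        period = mean_period * (1.0 + jitter * r2)
        position += max(1.0, period)  # never advance by less than one sample
    return excitation


if __name__ == "__main__":
    train = jitter_shimmer_impulse_train(40, mean_period=10, rng=random.Random(1))
    print([i for i, v in enumerate(train) if v != 0.0])
```

With jitter and shimmer both set to zero, the output reduces to a constant-pitch, constant-level impulse train.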

Spectral compensation

• Low frequency spectral deficit in electrolaryngeal speech
• High frequency spectral emphasis in resynthesized speech due to impulse train excitation in LPC analysis-synthesis
• Spectral compensation filter designed by comparing LPC smoothened spectra of natural and resynthesized /a/, /i/, /u/. Inserted in the excitation path for spectral compensation.
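The slides do not state how the compensation filter was designed beyond comparing the two smoothed spectra. One possible approach, shown here purely as an assumption, is a frequency-sampling FIR design: take the ratio of the natural envelope to the resynthesized envelope as the desired gain, compute its zero-phase impulse response, and taper it with a Hamming window. The function name and the envelope sampling (uniform from 0 to half the sampling frequency) are hypothetical.

```python
import math
from typing import List, Sequence


def design_compensation_fir(
    natural_env: Sequence[float],
    resynth_env: Sequence[float],
    floor: float = 1e-9,
) -> List[float]:
    """Linear-phase FIR whose gain approximates natural_env / resynth_env.

    Both envelopes are magnitude spectra sampled uniformly from 0 to half
    the sampling frequency (inclusive), so they have half + 1 points.
    """
    if len(natural_env) != len(resynth_env) or len(natural_env) < 3:
        raise ValueError("need two envelopes of equal length >= 3")
    gain = [nat / max(res, floor) for nat, res in zip(natural_env, resynth_env)]
    half = len(gain) - 1
    n_fft = 2 * half

    def zero_phase_response(n: int) -> float:
        # Inverse DFT of a real, even gain function at time index n.
        acc = gain[0] + gain[half] * math.cos(math.pi * n)
        for k in range(1, half):
            acc += 2.0 * gain[k] * math.cos(2.0 * math.pi * k * n / n_fft)
        return acc / n_fft

    taps = []
    for m in range(n_fft - 1):  # symmetric filter with 2*half - 1 taps
        n = m - (half - 1)      # n runs from -(half - 1) to +(half - 1)
        window = 0.54 + 0.46 * math.cos(math.pi * n / (half - 1))  # Hamming
        taps.append(window * zero_phase_response(n))
    return taps


if __name__ == "__main__":
    # Toy envelopes: the resynthesized spectrum lacks low-frequency energy.
    natural = [4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    resynth = [1.0] * 9
    print([round(t, 4) for t in design_compensation_fir(natural, resynth)])
```

When the two envelopes are identical the design collapses to a single unit tap at the centre, so the filter only delays the excitation.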

5 RESULTS

Spectral subtraction parameters (γ = 1):
  Averaging: α = 10, β = 0.001
  Min.: α = 25, β = 0.005
  Median: α = 1.5, β = 0.001

Material: “….Where were you a year ago? 1 2 3 4 5 6 7 8 9 10”
Electrolarynx: Solatone

Jitter (columns) and shimmer (rows), each at 0%, 6%, 12%, 20%, 40%

Electrolaryngeal speech; enhanced electrolaryngeal speech after spec. sub. with MBNE (α = 1.2, β = 0.001, γ = 1)

Material: “…Where were you a year ago?”, Electrolarynx: Solatone

6 CONCLUSION

▪ Median based noise estimation could be used for noise suppression without varying the oversubtraction factor.

▪ Phase estimation based on minimum phase and phase continuity did not improve the quality above that of the noisy speech.

▪ Introduction of shimmer did not improve speech quality.

▪ Introduction of peak-to-peak jitter of up to 6 % and spectral compensation helped in improving the quality.

Thank You

P. C. Pandey, S. K. Basha, “Enhancement of electrolaryngeal speech by spectral subtraction, spectral compensation, and introduction of jitter and shimmer”, Proc. 20th International Congress on Acoustics (ICA 2010), 23-27 August 2010, Sydney, Australia.

Abstract -- An electrolarynx, a verbal communication aid used by laryngectomy patients, is a vibrator held against the neck tissue to provide excitation to the vocal tract, as a substitute for that provided by the glottal vibrations. Although the user can set the vibration level and pitch, a dynamic control of level, voicing, and pitch during speech production is not feasible. In addition to this basic limitation, electrolaryngeal speech suffers from (i) presence of background noise caused by leakage of acoustic energy from the vibrator and vibrator-tissue interface, (ii) low-frequency spectral deficiency, and (iii) unnatural quality due to constant pitch and level. Background noise decreases the intelligibility, while the other two factors affect the speech quality. The present study involved investigations for improving the intelligibility and quality of electrolaryngeal speech. Pitch-synchronous application of generalized spectral subtraction was used for reducing the background noise. In order to track the variation in the spectrum of the leakage noise due to changes in vibrator orientation and pressure during speech production, a dynamic estimation of noise was carried out from a set of past frames. The estimated noise spectrum was subtracted from that of the noisy speech and the resulting magnitude spectrum was combined with the original phase spectrum. The speech signal was resynthesized using the overlap-add method, with two-pitch period analysis frames and one period overlap. Estimation of phase spectrum by minimum-phase assumption and the assumption of phase continuity did not improve the speech quality. An introduction of jitter and shimmer in the speech signal, using LPC based analysis-synthesis, was investigated for improving its naturalness. The excitation for synthesis was an impulse train with the frequency equal to that of the vibrator, with random frequency and amplitude modulations for providing the jitter and the shimmer, respectively. An FIR filtering of the excitation was used to match the long-term average spectral envelope of the processed electrolaryngeal speech to that of the normal speech. A peak-to-peak jitter of up to 6 % increased the naturalness, while introduction of shimmer decreased the quality.
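The overlap-add resynthesis mentioned in the abstract (frames of two pitch periods with one period of overlap) can be sketched as follows. The abstract does not name the analysis window, so the periodic Hann window used here is an assumption, chosen because its shifted copies sum to one at this overlap. The function name and the `process` callback are hypothetical; in practice the callback would apply spectral subtraction to each frame.

```python
import math
from typing import Callable, List, Sequence


def overlap_add(
    signal: Sequence[float],
    pitch_period: int,
    process: Callable[[List[float]], List[float]] = lambda frame: frame,
) -> List[float]:
    """Process frames of two pitch periods with a hop of one pitch period."""
    if pitch_period < 1:
        raise ValueError("pitch_period must be at least 1")
    frame_len = 2 * pitch_period
    # Periodic Hann window: w[n] + w[n + pitch_period] == 1 (assumed window)
    window = [
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
        for n in range(frame_len)
    ]
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, pitch_period):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        frame = process(frame)  # e.g. spectral subtraction on this frame
        for n in range(frame_len):
            output[start + n] += frame[n]
    return output


if __name__ == "__main__":
    x = [float((7 * i) % 11) for i in range(40)]
    y = overlap_add(x, pitch_period=5)
    # Interior samples are reconstructed when no processing is applied.
    print(max(abs(a - b) for a, b in zip(x[5:35], y[5:35])))
```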

REFERENCES

1 M. Weiss, G. Y. Komshian, and J. Heinz, “Acoustic and perceptual characteristics of speech produced with an electronic artificial larynx,” J. Acoust. Soc. Am., 65, 1298-1308 (1979).

2 H. L. Barney, F. E. Haworth, and H. K. Dunn, “An experimental transistorized artificial larynx,” Bell Systems Tech. J., 38, 1337-1356 (1959).

3 Q. Yingyong and B. Weinberg, “Low frequency energy deficit in electrolaryngeal speech,” J. Speech Hearing Res., 34, 1250-1256 (1991).

4 C. Y. Espy-Wilson, V. R. Chari, and C. B. Huang, “Enhancement of alaryngeal speech by adaptive filtering,” Proc. ICSLP, 764-771 (1996).

5 P. C. Pandey, S. M. Bhandarkar, G. K. Baccher, and P. K. Lehana, “Enhancement of alaryngeal speech using spectral subtraction,” Proc. 14th Int. Conf. Digital Signal Processing (DSP 2002), Santorini, Greece, 591-594 (2002).

6 P. C. Pandey, S. S. Pratapwar, and P. K. Lehana, “Enhancement of electrolaryngeal speech by reducing leakage noise using spectral subtraction with quantile based dynamic estimation of noise,” Proc. 18th Int. Congress Acoustics (ICA 2004), Kyoto, Japan, 3029-3032 (2004).

7 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Application of spectral subtraction method on enhancement of electrolaryngeal speech,” J. Acoust. Soc. Am., 120, 398-406 (2006).

8 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Enhancement of electrolarynx speech based on auditory masking,” IEEE Trans. Biomed. Eng., 53, 865-874 (2006).

9 P. Mitra and P. C. Pandey, “Enhancement of electrolaryngeal speech by spectral subtraction with minimum statistics-based noise estimation,” J. Acoust. Soc. Am., 120, 3039 (2006).

10 R. Kabir, A. Greenblatt, K. Panetta, and S. Agaian, “Enhancement of alaryngeal speech utilizing spectral subtraction and minimum statistics,” Proc. 7th International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July (2008).

11 S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., 27, 113-120 (1979).

12 M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc. IEEE ICASSP’79, 208-211 (1979).

13 V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and Wiener filtering,” Proc. IEEE ICASSP’00, 3, 1875-1878 (2000).

14 R. Martin, “Spectral subtraction based on minimum statistics,” Proc. 7th European Signal Processing Conf. (EUSIPCO–94), Edinburgh, Scotland, 1182-1185 (1994).

15 T. F. Quatieri and A. V. Oppenheim, “Iterative techniques for minimum phase signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process., 29, 1187-1193 (1981).

16 B. Yegnanarayana and A. Dhayalan, “Noniterative techniques for minimum phase signal reconstruction from phase or magnitude,” Proc. IEEE ICASSP, 639-642 (1983).

17 A. V. Oppenheim and R. W. Schafer, Digital Signal Processing (Prentice-Hall, Englewood Cliffs, New Jersey, 1975).

18 L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals (Prentice Hall, Englewood Cliffs, New Jersey, 1978).

