
Synopsis 3

Date post: 15-Apr-2016
Upload: atul-narkhede
Transcript
Page 1: Synopsis 3

Synopsis on

Performance Analysis of Combined Wavelet Transform and Artificial Neural Network for Isolated Marathi Digit

Recognition

By

Atul Dattatraya Narkhede

Under the Supervision of

Dr. Milind Nemade

Faculty of Engineering

PACIFIC ACADEMY OF HIGHER EDUCATION AND RESEARCH

UNIVERSITY, UDAIPUR.

Page 2: Synopsis 3

SUMMARY

• Abstract
• Introduction
• Review of Literature
• Research Gaps
• Scope of Research
• Research Objectives
• Hypothesis
• Tools & Techniques
• Research Plan
• Tentative Chapter Flow
• References

Page 3: Synopsis 3

ABSTRACT

Speech processing is useful in a variety of applications, such as mobile applications, healthcare, automatic translation, robotics, video games, transcription, audio and video database search, household applications, and language learning.

A speech recognition system has two major components, namely, feature extraction and classification.

There are two dominant approaches to acoustic measurement: the first is the temporal-domain, or parametric, approach; the second is the nonparametric frequency-domain approach.

Page 4: Synopsis 3

The objective of our research is to investigate the combined performance of the wavelet transform and an artificial neural network (ANN) for isolated Marathi digits, so as to improve the accuracy of the speech recognition system.

We propose to derive effective, efficient, and noise-robust features from the frequency subbands of each frame. Each frame of the speech signal is decomposed into different frequency subbands using the discrete wavelet transform (DWT), and each subband is then classified using an ANN.

Effective speech features and classification would improve speech quality, represent the speech signal in terms of frequency and bandwidth, and improve speech recognition.
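As a minimal sketch of this feature-extraction idea (assuming, purely for illustration, a Haar wavelet and per-subband log-energy features; the study itself may use other wavelets, such as Daubechies, and a different feature set):

```python
import numpy as np

def haar_dwt_step(x):
    # One Haar DWT level: low-pass (approximation) and high-pass (detail).
    pairs = x[: len(x) // 2 * 2].reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def dwt_subband_features(frame, level=3):
    # Decompose one frame into `level` detail subbands plus the final
    # approximation, and return the log-energy of each subband.
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(level):
        approx, detail = haar_dwt_step(approx)
        feats.append(np.log(np.sum(detail ** 2) + 1e-10))
    feats.append(np.log(np.sum(approx ** 2) + 1e-10))
    return np.array(feats)

frame = np.sin(2.0 * np.pi * 0.3 * np.arange(256))  # placeholder frame
print(dwt_subband_features(frame))  # 4 features: cD1, cD2, cD3, cA3
```

Each frame thus yields one compact feature vector (here four log-energies) that can be fed to the ANN classifier.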

Page 5: Synopsis 3

INTRODUCTION

The performance of a speech processing system is usually measured in terms of recognition accuracy.

All speech recognizers include an initial signal-processing front end that converts the speech signal into a more convenient and compressed form called feature vectors.

The feature extraction method plays a vital role in the speech recognition task.

The wavelet transform, with its flexible time-frequency window, is an appropriate tool for the analysis of non-stationary signals like speech.

Page 6: Synopsis 3

In a speech signal, high frequencies are present very briefly at the onset of a sound, while lower frequencies appear later and persist for a longer period. The DWT resolves all of these frequencies well: its parameters contain information at different frequency scales, which helps in extracting the speech information of the corresponding frequency band.

An artificial neural network (ANN) is an efficient pattern recognition mechanism that simulates the neural information processing of the human brain.

Page 7: Synopsis 3

The computational intelligence of neural networks derives from their processing units, their characteristics, and their ability to learn. During learning, the network's parameters vary over time; neural networks are characterized by local and parallel computation, simplicity, and regularity.

A wavelet transform is an elegant tool for the analysis of non-stationary signals like speech.

The results have shown that this hybrid architecture, using discrete wavelet transforms and neural networks, can effectively extract features from the speech signal for automatic speech recognition.

Page 8: Synopsis 3

REVIEW OF LITERATURE

Notable observations from the review of related work are as follows:

Speech features are usually obtained via Fourier transforms (FTs), short-time Fourier transforms (STFTs), or linear predictive coding (LPC) techniques and used for some form of automatic speech/speaker recognition (ASR); however, they may not be suitable for representing speech/voice.

Despite improvements in FFT computation time, the recognition time of the proposed systems is still too long for real-time applications.

Multi-core and parallel processing of the speech recognition algorithm are necessary to further improve recognition time and are worthwhile to examine in this research.

Page 9: Synopsis 3

REVIEW OF LITERATURE

Notable observations from the review of related work are as follows:

Conventional approaches such as Mel-frequency cepstral coefficients (MFCC) and linear predictive coefficients (LPC) focus on spectral features limited to lower frequency bands.

The best recognition was obtained from the DWT decomposition when compared to MFCCs, for both speaker-independent and speaker-dependent tasks.

Wavelet transform approaches provided good results in clean, noisy, and reverberant environments, and also have much lower computational complexity.

Page 10: Synopsis 3

REVIEW OF LITERATURE

Notable observations from the review of related work are as follows:

Wavelet decomposition results in a logarithmic set of bandwidths, which is very similar to the human ear's response to frequencies.
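For example, the logarithmic band split can be computed directly (the 8 kHz sampling rate and 3-level decomposition are illustrative assumptions, not values from the reviewed work):

```python
# A 3-level DWT at fs = 8 kHz splits the 0-4000 Hz range into
# logarithmically sized subbands, mirroring the ear's frequency response.
fs = 8000
level = 3
edges = [fs / 2 / (2 ** i) for i in range(level + 1)]   # 4000, 2000, 1000, 500 Hz
bands = [(0.0, edges[-1])] + [(edges[i + 1], edges[i]) for i in reversed(range(level))]
print(bands)  # [(0.0, 500.0), (500.0, 1000.0), (1000.0, 2000.0), (2000.0, 4000.0)]
```

Note how each successive band is twice as wide as the one below it: fine frequency resolution at low frequencies, coarse at high frequencies.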

The wavelet transform efficiently locates spectral changes in a speech signal, and the beginning and end of sounds can also be located.

Results show that a hybrid architecture using discrete wavelet transforms and neural networks can effectively extract features from the speech signal for automatic speech recognition.

Page 11: Synopsis 3

REVIEW OF LITERATURE

Notable observations from the review of related work are as follows:

Artificial neural network performance depends on the size and quality of training samples.

The simplification of the ANN architecture without reducing the recognition rate can also speed up the recognition time.

The recognition accuracy of the system can be improved by combining multiple classifiers.

Page 12: Synopsis 3

RESEARCH GAPS

Feature extraction and classification are major components that play a vital role in speech recognition systems, so an efficient representation of speech features and their classification is required.

To improve the accuracy of a speech recognition system, we can use a hybrid architecture consisting of a Wavelet Transform (WT) and an Artificial Neural Network (ANN).

ANN architecture can be simplified without reducing the recognition rate.

Isolated Marathi digits should be recognized quickly, which requires speeding up the recognition time.

Page 13: Synopsis 3

SCOPE OF RESEARCH

The scope of our research is limited to investigating the combined performance of the Wavelet Transform (WT) and Artificial Neural Network (ANN) for feature extraction and classification of isolated Marathi digits.

TOOLS & TECHNIQUES

MATLAB/Simulink programming environment

Page 14: Synopsis 3

RESEARCH OBJECTIVES

The objective of our research is to investigate the combined performance of the Wavelet Transform (WT) and Artificial Neural Network (ANN) for isolated Marathi digits, so as to improve the accuracy of the speech recognition system.

To derive effective, efficient, and noise-robust features from the frequency subbands of each frame using the discrete wavelet transform.

Each frame of the speech signal is decomposed into different frequency subbands using the discrete wavelet transform.

Classification of each subband using an artificial neural network (ANN).

Determination of the accuracy of the speech recognition system.
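A toy sketch of the ANN classification stage (assumptions: a one-hidden-layer network trained by plain batch gradient descent on synthetic two-class data standing in for subband feature vectors; the actual network size, training algorithm, and ten-digit vocabulary are not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Numerically safe logistic activation.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))

def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000):
    # One-hidden-layer network trained with batch gradient descent
    # on a squared-error loss.
    n_in, n_out = X.shape[1], y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                 # hidden activations
        out = sigmoid(h @ W2 + b2)               # output activations
        d_out = (out - y) * out * (1.0 - out)    # output-layer delta
        d_h = (d_out @ W2.T) * h * (1.0 - h)     # hidden-layer delta
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    return np.argmax(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), axis=1)

# Synthetic stand-in for subband feature vectors of two digit classes.
X = np.vstack([rng.normal(0.0, 0.3, (20, 4)), rng.normal(2.0, 0.3, (20, 4))])
y = np.zeros((40, 2)); y[:20, 0] = 1.0; y[20:, 1] = 1.0
params = train_mlp(X, y)
acc = float(np.mean(predict(X, *params) == np.array([0] * 20 + [1] * 20)))
print("training accuracy:", acc)
```

Recognition accuracy is then simply the fraction of test utterances whose predicted digit matches the spoken one.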

Page 15: Synopsis 3

ISOLATED DIGIT RECOGNITION

Page 16: Synopsis 3

RESEARCH METHODOLOGY

Page 17: Synopsis 3

Hypothesis

The hypothesis of our research is that combining the wavelet transform and an artificial neural network (ANN) for isolated Marathi digits improves the accuracy of the speech recognition system.

Tentative Chapter Flow

1. Introduction to Speech Processing, Wavelet Transform & ANN
2. Speech Feature Extraction using Wavelet Transform (WT)
3. Speech Feature Classification using Artificial Neural Network (ANN)
4. Performance Analysis of Speech Feature Extraction and Classification Techniques
5. Results & Conclusion

Page 18: Synopsis 3

RESEARCH PLAN

The work is planned in six phases (Phase I through Phase VI), covering the following activities:

• Literature survey
• Study of software tools such as MATLAB/Simulink, the Neural Network Toolbox, and its MATLAB link
• Survey of existing methods and algorithms
• Suggesting techniques for removing limitations of existing algorithms
• Simulation of the combined strategies
• Comparing results of the developed strategies with existing algorithms
• Performance evaluation and implementation
• Documentation
• Review and research paper preparation, presentation, and publication
• Thesis framework preparation and submission

Page 19: Synopsis 3

REFERENCES

[1] T. F. Quatieri, “Discrete Time Speech Signal Processing”, Pearson Education, 2002.

[2] R. M. Rao, A. S. Bopardikar, “Wavelet Transform”, Pearson Education, 2005.

[3] J. M. Zurada, “Introduction to Artificial Neural Network”, West, 1992.

[4] Yoshua Bengio, Renato De Mori, Regis Cardin, “Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge”, Department of Computer Science McGill University, pp.218-225, 1990.

[5] Bhiksha Raj, Lorenzo Turicchia, Bent Schmidt-Nielsen, and Rahul Sarpeshkar, “An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition”, EURASIP Journal on Audio, Speech, and Music Processing, vol.2007, pp.1-13, 2007.

[6] Adam Glowacz, Witold Glowacz, Andrzej Glowacz, “Sound Recognition of Musical Instruments with Application of FFT and K-NN Classifier with Cosine Distance”, AGH University of Science and Technology, 2010.

Page 20: Synopsis 3

REFERENCES

[7] Gil Lopes, Fernando Ribeiro, Paulo Carvalho, “Whistle Sound Recognition in Noisy Environment”, Universidade do Minho, Departamento de Electrónica Industrial, Guimarães, Portugal.

[8] Shing-Tai Pan, Chih-Chin Lai and Bo-Yu Tsai, “The Implementation of Speech Recognition Systems on FPGA-Based Embedded Systems with SOC Architecture”, International Journal of Innovative Computing, Information and Control, vol.7, no.11, pp.6161-6175, November 2011.

[9] Hemant Tyagi, Rajesh M. Hegde, Hema A. Murthy and Anil Prabhakar, “Automatic Identification of Bird calls using Spectral Ensemble Average Voice Prints”, 14th European IEEE Signal Processing Conference, pp. 1 – 5, 2006.

[10] Dwijen Rudrapal, Smita Das, S. Debbarma, N. Kar, N. Debbarma, “Voice Recognition and Authentication as a Proficient Biometric Tool and its Application in Online Exam for P.H People”, International Journal of Computer Applications (0975 – 8887), vol.39, no.12, pp.7-12, February 2012.

[11] Asm Sayem, “Speech Analysis for Alphabets in Bangla Language: Automatic Speech Recognition”, International Journal of Engineering Research, vol.3, no.2, pp.88-93, February 2014.

Page 21: Synopsis 3

REFERENCES

[12] W. M. Campbell, J. P. Campbell, D. A. Reynolds, E. Singer, P. A. Torres-Carrasquillo, “Support Vector Machines for Speaker and Language Recognition”, in Elsevier Journal of Computer Speech & Language, vol. 20, issue 2/3, pp 210 – 229, 2006.

[13] Siddheshwar S. Gangonda, Dr. Prachi Mukherji, “Speech Processing for Marathi Numeral Recognition using MFCC and DTW Features”, International Journal of Engineering Research and Applications (IJERA), pp.218-222, March 2012.

[14] Wahyu Kusuma R., Prince Brave Guhyapati V., “Simulation Voice Recognition System for Controlling Robotic Applications”, Journal of Theoretical and Applied Information Technology, vol.39, no.2, pp.188-196, May 2012.

[15] Thiang and Suryo Wijoyo, “Speech Recognition Using Linear Predictive Coding and Artificial Neural Network for Controlling Movement of Mobile Robot”, International Conference on Information and Electronics Engineering, vol.6, pp.179-183, 2011.

[16] Bishnu Prasad Das, Ranjan Parekh, “Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network Classifiers”, International Journal of Modern Engineering Research, vol.2, pp.854-858, May-June 2012.

Page 22: Synopsis 3

REFERENCES

[17] P. Zegers, “Speech recognition using neural network”, MS Thesis, Department of Electrical & Computer Engineering, University of Arizona, 1998.

[18] Paul A.K., Das D., Kamal M.M., “Bangla Speech Recognition System Using LPC and ANN”, 7th IEEE International Conference on Advances in Pattern Recognition, pp.171-174, 2009.

[19] Firoz Shah. A, Raji Sukumar. A and Babu Anto. P, “Discrete Wavelet Transforms and Artificial Neural Networks for Speech Emotion Recognition”, International Journal of Computer Theory and Engineering, vol. 2, no. 3, pp.319-322, June 2010.

[20] Jeih-Weih Hung, Hao-Teng Fan, and Syu-Siang Wang, “Several New DWT-Based Methods for Noise-Robust Speech Recognition”, International Journal of Innovation, Management and Technology, vol.3, no.5, pp.547-551, October 2012.

[21] Jagannath H Nirmal, Mukesh A Zaveri, Suprava Patnaik and Pramod H Kachare, “A novel voice conversion approach using admissible wavelet packet decomposition”, EURASIP Journal on Audio, Speech, and Music Processing, pp 1 – 10, 2013.

[22] T. B. Adam, M. S. Salam, T. S. Gunawan, “Wavelet Cepstral Coefficients for Isolated Speech Recognition”, Telkomnika, vol.11, no.5, pp.2731-2738, May 2013.

Page 23: Synopsis 3

REFERENCES

[23] Sanja Grubesa, Tomislav Grubesa, Hrvoje Domitrovic, “Speaker Recognition Method combining FFT, Wavelet Functions and Neural Networks”, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia.

[24] Mohammed Anwer and Rezwan-Al-Islam Khan, “Voice identification Using a Composite Haar Wavelets and Proper Orthogonal Decomposition”, International Journal of Innovation and Applied Studies, vol. 4, no. 2, pp.353-358, October 2013.

[25] Marco Jeub, Dorothea Kolossa, Ramon F. Astudillo, Reinhold Orglmeister, “Performance Analysis of Wavelet-based Voice Activity Detection”, NAG/DAGA-Rotterdam, 2009.

[26] Beng T Tan, Robert lang, Hieko Schroder, Andrew Spray, Phillip Dermody, “Applying Wavelet Analysis to Speech Segmentation and Classification”, Department of Computer Science.

[27] Bartosz Zioko, Suresh Manandhar, Richard C. Wilson and Mariusz Zioko, “Wavelet Method of Speech Segmentation”, University of York Heslington, YO10 5DD, York, UK.

Page 24: Synopsis 3

REFERENCES

[28] N. S. Nehe, R. S. Holambe, “New Feature Extraction Techniques for Marathi Digit Recognition”, International Journal of Recent Trends in Engineering, Vol 2, No. 2, November 2009.

[29] Sonia Sunny, David Peter S, K Poulose Jacob, “Discrete Wavelet Transforms and Artificial Neural Networks for Recognition of Isolated Spoken Words”, in International Journal of Computer Applications, volume 38, No.9, pp 9 – 13, January 2012.

[30] N. S. Nehe, R. S. Holambe, “DWT and LPC based feature extraction methods for isolated word recognition”, EURASIP Journal on Audio, Speech, and Music Processing, vol.2012, pp.1-7, 2012.

[31] Engin Avci, Zuhtu Hakan Akpolat, “Speech recognition using a wavelet packet adaptive network based fuzzy inference system”, in Elsevier Expert Systems & Applications, vol 31, pp 495 – 503, 2006.

Page 25: Synopsis 3

Thank you

Page 26: Synopsis 3

The training phase accepts speech samples from different people and trains the system to create acoustic models for each word in the vocabulary; it goes through two stages, data preparation and data recording. The verification phase displays some random numbers and then checks the pronounced number. Some systems include speech processing with digit-boundary detection and recognition, using zero-crossing and energy techniques. Mel-frequency cepstral coefficient (MFCC) vectors are used to provide an estimate of the vocal tract filter, while dynamic time warping (DTW) is used to detect the nearest recorded voice.

The general methodology of audio classification involves extracting discriminatory features from the audio data and feeding them to a pattern classifier. Different approaches and various kinds of audio features have been proposed, with varying success rates. The features can be extracted either directly from the time-domain signal or from a transform domain, depending on the choice of signal analysis approach. Among the audio features that have been used successfully for audio classification are Mel-frequency cepstral coefficients (MFCC).
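The DTW matching mentioned above can be sketched with the standard dynamic-programming recurrence (the two sequences here are short placeholders, not real MFCC vectors):

```python
import numpy as np

def dtw_distance(a, b):
    # Classic dynamic time warping distance between two 1-D sequences:
    # D[i, j] = local cost + min of the three predecessor cells.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

ref = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
query = np.array([0.0, 1.0, 1.0, 2.0, 1.0, 0.0])  # same shape, time-stretched
print(dtw_distance(ref, query))  # 0.0 -- the warping absorbs the stretch
```

Recognition then picks the recorded template with the smallest DTW distance to the test utterance.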

Page 27: Synopsis 3

MFCCs are commonly derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
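These steps can be sketched for a single frame (a simplified implementation; the 8 kHz sampling rate, 20-filter bank, and 13 coefficients are illustrative assumptions, and production code would typically use a tested library):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs=8000, n_filters=20, n_ceps=13):
    # Step 1: power spectrum of a Hamming-windowed frame.
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    # Step 2: triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((len(frame) + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for i in range(n_filters):
        lo, center, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, center):
            fbank[i, k] = (k - lo) / max(center - lo, 1)
        for k in range(center, hi):
            fbank[i, k] = (hi - k) / max(hi - center, 1)
    # Step 3: log of the filterbank energies.
    log_e = np.log(fbank @ spec + 1e-10)
    # Steps 4-5: DCT of the log energies; keep the first n_ceps amplitudes.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_e

frame = np.random.randn(256)       # placeholder frame
print(mfcc(frame).shape)           # (13,)
```

The resulting 13-dimensional vector per frame is the conventional baseline against which the DWT-based features are compared.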

Page 28: Synopsis 3

