8/6/2019 Design of a Limited Speech Recognition System for Use in a Braill
http://slidepdf.com/reader/full/design-of-a-limited-speech-recognition-system-for-use-in-a-braill 1/117
McMaster University
DigitalCommons@McMaster
EE 4BI6 Electrical Engineering Biomedical Capstones
Department of Electrical and Computer Engineering
4-23-2010
Design of a Limited Speech Recognition System for use in a Braille Teaching Device
Brett Lindsay McMaster University
This Capstone is brought to you for free and open access by the Department of Electrical and Computer Engineering at DigitalCommons@McMaster.
It has been accepted for inclusion in EE 4BI6 Electrical Engineering Biomedical Capstones by an authorized administrator of
DigitalCommons@McMaster. For more information, please contact [email protected].
Recommended Citation
Lindsay, Brett, "Design of a Limited Speech Recognition System for use in a Braille Teaching Device" (2010). EE 4BI6 Electrical Engineering Biomedical Capstones. Paper 34.
http://digitalcommons.mcmaster.ca/ee4bi6/34
Design of a Limited
Speech Recognition System
for use in a
Braille Teaching Device
by
Brett Lindsay
Electrical and Biomedical Engineering
Faculty Advisor: Dr. Thomas E. Doyle
Electrical and Biomedical Engineering Project Report
submitted in partial fulfillment of the degree of
Bachelor of Engineering
McMaster University
Hamilton, Ontario, Canada
April 23, 2010
Copyright © April 2010 by Brett Lindsay
Abstract
The report here submitted defines the scope and content of the Electrical and Biomedical Engineering
Capstone Project as submitted by Brett Lindsay. This project involved the creation of a limited Speech
Recognition system for use in a Braille Teaching Device. The greater project (that of the Braille
Teaching Device) was completed in tandem with Messrs. Chris Agam and Jonathon Hernandez. It was
felt that the Speech Recognition component would be a valuable addition to the project due to the
nature of a teaching device for use by the visually impaired (who would need an assistant to use said
device). The Speech Recognition system was created by breaking the problem into four subsections:
the collection of data upon call by the teaching program, the manipulation of data, the recognition
algorithms to categorize said data, and the passing of results back to the teaching program. For the
recognition block, the relatively simple method of Dynamic Time Warping was chosen over more
complex options such as Hidden Markov Models or Neural Networks. This method presented some
problems as documented, specifically a tendency to favour letters with larger file sizes (such as 'w').
The Speech Recognition system created during the course of this project failed to deliver the desired
efficiency of 60% with as few false positives as possible. While the Speech Recognition system presented is
viable, its effectiveness is below that which can be found on the market for a comparable price.
Acknowledgements
Chris Agam was a student at McMaster University in the Electrical and Biomedical Engineering
program and was a member of the group creating a Braille Teaching Device. His project was the
physical device itself. He provided the idea for the project.
Jon Hernandez was a student at McMaster University in the Electrical and Biomedical Engineering
program and was a member of the group creating a Braille Teaching Device. His project was the
programming of the microcontroller as well as software for use with the device. He took part in the
creation of the communication between his software, Mr. Agam's device, and Mr. Lindsay's speech
recognition system.
Billy Taj was a student at McMaster University in the Mechatronics Engineering program and provided
additional (basic) feedback in the testing of the device.
Dr. Thomas Doyle was a professor at McMaster University and functioned as the faculty adviser for the
duration of the project.
Contents
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 General Approach to the Problem
  1.4 Scope of the Project
2 Literature Review
  2.1 Speech Recognition Basics
  2.2 Common Methods of Implementing Speech Recognition
  2.3 Spectrograms
  2.4 Comparable Project Results
3 Problem and Methodology of Solutions
  3.1 Statement of Problem
  3.2 Methodology of Solutions
  3.3 Data Collection
  3.4 Data Manipulation
    3.4.1 Normalization
    3.4.2 Windowing
    3.4.3 Cepstral Filtering
  3.5 Recognition Algorithm: Dynamic Time Warping
    3.5.1 Spectrogram
    3.5.2 Match Matrix
    3.5.3 DTW
    3.5.4 Library
  3.6 Returning Results
4 Design Procedures
  4.1 Speech Recognition Program
  4.2 Data Collection
  4.3 Data Manipulation
    4.3.1 Normalization
    4.3.2 Windowing
    4.3.3 Cepstral Filtering
  4.4 Recognition Algorithm: Dynamic Time Warping
    4.4.1 Spectrogram
    4.4.2 Match Matrix
    4.4.3 DTW
    4.4.4 Library
  4.5 Returning Results
5 Testing Results and Discussion
  5.1 Data Collection
  5.2 Data Manipulation
    5.2.1 Normalization
    5.2.2 Windowing
    5.2.3 Cepstral Filtering
  5.3 Recognition Algorithm: Dynamic Time Warping
    5.3.1 Spectrogram
    5.3.2 Match Matrix
    5.3.3 DTW
    5.3.4 Library
  5.4 Returning Results
6 Conclusions and Recommendations
  6.1 Conclusions on Project Objectives
  6.2 Recommendations
Appendix A: Computer Software Design Tools
Appendix B: Additional Testing Notes
Appendix C: Code of Software Elements
References
Vitae
List of Tables
2.1 Results from [8]
5.1 Time Difference DTW/DTWTHREE
5.2 Average Time and Size of wav/txt
5.3 Results of Speech Recognition
List of Figures
1.1 Braille Representations
2.1 Neural Network and HMM
2.2 DTW Simplified
2.3 DTW Simplified Part Two
2.4 Mathematical Equations of Spectrogram
2.5 Spectrogram of an Audio Signal
3.1 Speech Recognition Flow Diagram
3.2 Data Acquisition Tutorial Code
3.3 Hamming Window in MatLab
3.4 Visualization of Cepstral Filtering
3.5 Visualization of Match Matrix
3.6 Visualization of Distortion Matrix Creation
4.1 Speech Recognition Program
4.2 Select Code From Recorder.m
4.3 Select Code From normalizer.m
4.4 Select Code From hamWindow.m & usefullSig.m
4.5 Select Code From cepAnal.m
4.6 Select Code From Comparison Loops
4.7 Select Code From specCreate.m
4.8 Select Code From matchMat.m
4.9 Select Code From DTW.m
4.10 Visualization of Faster Trace
4.11 Select Code From DTWTHREE.m
4.12 Select Code From libCreat.m
4.13 Select Code From speechRec.m
5.1 Phases of Data Manipulations
5.2 Timing Measurements of Data Manipulations
5.3 Spectrograms
5.4 Timing Measurements of Pattern Recognition
5.5 Results and c Values DTW Original
5.6 C Values for DTW Original
5.7 C Values Plotted Against File Size
5.8 Workings of matchMat and DTW
5.9 C Values for DTW w Trace
5.10 Results and c Values DTW w Trace
5.11 Visualization of DTW vs DTW w Breaking Code
5.12 Speed of Pattern Recognition for Library Sample Size
5.13 Time and Size of wav/txt
Nomenclature
Delimiter: a character used to separate independent pieces of data in text files or data streams.
DTW: Dynamic Time Warping
HMM: Hidden Markov Models
m-file: the format of MatLab files.
NN: Neural Networks
Phoneme: The smallest unit of sound, i.e. the sound 'ahh' or 'eee'.
Quefrency: a pseudo time domain resulting from Cepstral analysis.
Spectrogram: A representation of the frequencies native to a small portion of time in a signal.
SR: Speech Recognition
1 Introduction
1.1 Background
The greater group project, a collaboration between Chris Agam, Jon Hernandez, and myself, is a Braille
teaching device to be used by the visually impaired. In the USA and across the world, Braille literacy
rates are staggeringly low; as of 2009, only ten percent of blind children in America were Braille
literate [12].
Braille itself is a form of writing for the blind in which each character is a "cell" of six dot positions
arranged in two columns of three. By raising dots in various combinations within the cell, one creates
letters. For example, in Figure 1.1 below can be seen the letters 'a' and 'p', where the black dots
represent the raised bumps.
Figure 1.1: The Braille representations of the letters 'a' and 'p'.
The scarcity of those fluent in Braille makes it a prime candidate for an electronic teaching
device, allowing people to simply plug in and learn. This interactive nature vastly improves upon the
teaching capabilities of a book and assistant, as the assistant will most likely not be fluent in Braille
themselves. A teaching program will therefore make the assistant far more able to help
the learner.
Further, an important aspect of teaching methods and/or devices is the testing of the pupil on the
subjects being learned. The fact that the pupil will be blind poses a challenge. While it is possible for the
teaching program to allow an assistant to act as a supervisor of testing, a better solution would be direct
interaction between user and program.
To this end, this project is focusing on the development of efficient and lean speech recognition
software that will allow the user to test themselves as they learn. By independently creating our own
software we cut down on cost as well as on the superfluous abilities of commercially available software
(which focus on continuously reconstructing large, complex sentences independent of the speaker).
1.2 Objectives
The objective of the project is the creation of speech recognition software for the Braille Teaching
Device. The method will roughly follow the steps outlined by Jawed et al. [8] in their creation of a
similar system. Their result of 68% efficiency gave me confidence that a minimum efficiency of 60%
is deliverable.
It is also necessary for the program to run as fast as possible in order to be useful. It should take, on
average, no more than two seconds (plus recording time) to run.
1.3 General Approach to the Problem
The problem was to create a limited speech recognition program. This was to be achieved using
MathWorks' MATLAB programming environment. After researching the various methods of speech
recognition in use today, it was decided to use the simpler Dynamic Time Warping method. This
would allow for confidence in my ability to complete the project, as opposed to other
methods which could have proven too difficult to implement.
1.4 Scope of the Project
The scope of the project was necessarily limited. As this project was being undertaken individually,
there was some worry about its complexity, since speech recognition projects are rather
difficult for even professional entities to create. Therefore, it was decided that the software would only
recognize thirty entries: a-z of the alphabet as well as the commands 'enter', 'yes', 'no', and 'back'.
2 Literature Review
2.1 Speech Recognition Basics
Six diverse articles have been noted that cover the breadth of speech recognition theory and
implementation. Articles like [7] provided general information about the basics, while those like [2]
and [10] provided background on areas of speech recognition that will not necessarily be used in the
project but help build a full understanding of the options open. References [1] and [11] have provided a
background in the implementation of DTW with respect to speech recognition. [8] is the most useful piece, as it
outlines the general steps used to create a speech recognition software package similar to this project.
Three books were also consulted. [5] focused on the human creation and recognition of speech, and
while providing background was less useful from a practical standpoint. [4] was helpful in
understanding the concept of cepstral analysis as well as the need for a non-rectangular window in the
data-manipulation phase. [6] provided information on a broad range of topics in a more practical sense
than the other books (though still largely theoretical).
The field of speech recognition can be broken down into discrete or continuous recognition, as well as
speaker independent or dependent. Discrete systems require the user to pause between sounds, while
continuous systems operate without breaks [7]. Speaker dependence requires the user to have done
some training with the system to allow it to recognize the user, whereas independent systems will work
regardless of user speech patterns, tones, et cetera (e.g. automated phone services) [7]. Discrete,
dependent systems are the easiest to create.
2.2 Common Methods of Implementing Speech Recognition
For the actual speech recognition component of the system, there are three main methods found in
literature. The first is the Hidden Markov Model (HMM). It is a mathematical model in which the
likelihood of the future state depends on the current state, and in which the states are unobserved [2]. It is complex
and very good at identifying speech that is slurred and accented (as in reality, where the computer will
be unable to identify most information passed in and has to construct sentences out of what little it did
understand). It also far exceeds the complexity of the project’s goals, and so will not be used.
Also common to speech recognition systems are Neural Networks (NN) [2]. Features in a context
window are run through a system of weighted nodes, the output of which is a classification of each
input frame, measured in terms of the probabilities of phoneme-based categories.
Dynamic Time Warping (DTW), the final method found, will be used instead of the previously
mentioned methods. This involves the modification of the input data’s temporal characteristics to fit within the
realms of a standard template, followed by (relatively) simple matching techniques [11]. In practice, this
is achieved by taking the entered and manipulated data and creating a spectrogram. One then takes the
data one wishes to compare the input with, and creates a spectrogram of it as well. Next, a local match
matrix is created, defined as the cosine difference between the points in the two spectrogram matrices.
From this local match matrix, one can trace the "path of least resistance" to find the lowest-cost
path through. It is then a simple matter to use this value in a comparison structure to match an input
signal against a variety of template signals and achieve a "best fit".
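The steps just described can be sketched as follows. This is a minimal sketch in Python for illustration (the project itself was written in MatLab), and it rests on several assumptions: a spectrogram is taken to be a list of per-frame magnitude vectors, the "cosine difference" is interpreted as one minus the cosine similarity of two frames, and the names cosine_difference, local_match, dtw_cost and best_fit are hypothetical rather than taken from the project code.

```python
import math

def cosine_difference(u, v):
    # Assumed reading of "cosine difference": 1 - cosine similarity.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat a silent frame as a complete mismatch
    return 1.0 - dot / (norm_u * norm_v)

def local_match(spec_a, spec_b):
    # One row per frame of spec_a, one column per frame of spec_b.
    return [[cosine_difference(fa, fb) for fb in spec_b] for fa in spec_a]

def dtw_cost(match):
    # Accumulate the cheapest path from the first cell to the last cell,
    # allowing steps down, right, or diagonally.
    n, m = len(match), len(match[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = match[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = match[i][j] + best_prev
    return acc[n - 1][m - 1]

def best_fit(input_spec, templates):
    # templates maps a label (e.g. a letter) to its template spectrogram.
    costs = {label: dtw_cost(local_match(input_spec, spec))
             for label, spec in templates.items()}
    return min(costs, key=costs.get), costs

# Toy data: the input is a time-stretched copy of template "a".
templates = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}
label, costs = best_fit([[1, 0], [1, 0], [0, 1]], templates)
print(label, costs)
```

In this toy case the stretched input still matches template "a" at zero cost, which is the property that makes time warping attractive for speech spoken at different speeds.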
The concept of all three methods can be a little difficult to wrap one's head around. HMMs and NNs
were not used, so the project no longer concerns itself with their in-depth workings. DTW is a simple
enough concept once one takes the time to simplify the example. For example, look at Figure 2.2. Here,
for the purpose of demonstration, letters are matched rather than audio signals. On the left is
an example of "CHRIS" matched against "CHRIS", while on the right we find "CHRIS" matched
against "JON".
Figure 2.1: Diagrams of Neural Networks (left) and Hidden Markov Models (right).
The local match here is also simplified, and created with the "distance" away from a letter being equal
to 1. So in the left matrix, one can see in the bottom left that it starts at 0 as both axes have the same
value (blank). As one goes up the column, the value of the match gets further away from the wanted
value (blank) and so continuously increases. At the points where both axes have the same value (along the
centre diagonal) the match matrix continues to have values of zero due to these being matches, while
moving away from the diagonal the values continually increase due to increased mis-match.
On the right portion of Figure 2.2, where "CHRIS" is matched against "JON", one can comparatively
see what it is like when there are no matches between the letters in the words being tested. The further
into the comparison, the more mis-match there is.
In Figure 2.3 below, there is a comparison of two signals which are closer to being similar than
"CHRIS" and "JON". Both "CHRIS" and "KRIIS" share the feature of ending in "IS". Note how, while
the values of the match matrix increase along the first three mis-matched letters, the final two are
matched and so carry the current value along.
Figure 2.2: Example of function of DTW using names Chris and Jon.
Figure 2.3: Example of function of DTW using names Chris and a misspelling of Chris, "Kriis".
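The letter example can be reproduced in a few lines. This is a minimal sketch in Python, assuming the matrices in Figures 2.2 and 2.3 accumulate like an edit-distance table: a blank row and column that count up from 0, a step cost of 1 for any mis-match and 0 for a match. The function name letter_match is hypothetical, and the figures themselves are not reproduced here.

```python
def letter_match(word_a, word_b):
    n, m = len(word_a), len(word_b)
    # acc[i][j] holds the accumulated mis-match after i letters of
    # word_a and j letters of word_b; index 0 is the blank.
    acc = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        acc[i][0] = i  # moving away from the blank along one axis
    for j in range(1, m + 1):
        acc[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if word_a[i - 1] == word_b[j - 1] else 1
            acc[i][j] = min(
                acc[i - 1][j - 1] + step,  # diagonal: compare two letters
                acc[i - 1][j] + 1,         # off-diagonal moves always cost 1
                acc[i][j - 1] + 1,
            )
    return acc[n][m]  # the "final" corner value

print(letter_match("CHRIS", "CHRIS"))  # 0
print(letter_match("CHRIS", "KRIIS"))  # 3
print(letter_match("CHRIS", "JON"))    # 5
```

Under these assumptions the final values agree with the '0' and '3' quoted for Figure 2.3, and the "JON" comparison ends with the largest value because no letters match.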
The mathematics of DTW are discussed in more depth in section 3.5.
After the local match matrix is created, there are two methods for creating a comparison between
signals. The best is to use the "final" value in the local match matrix as the definition of the best match.
(In Figure 2.3, this would be the value in the top right corner. In the actual code created, this final value
will be represented in the bottom right corner; see sections 4 and 5.) In Figure 2.3, the two comparisons
work out to have a best of '0' for "CHRIS" vs "CHRIS" and a best of '3' for "CHRIS" vs "KRIIS". One
would therefore deem that the left comparison is the best match, and predict that that was the word
said.
Another method is to trace through the local match matrix and use the length of this trace as the
definition of the best match. This has some practical advantages over the other method, as
demonstrated in the results section of this report (section 5). However, there are some large negatives to
this method. As one can see in the example presented in Figure 2.3, the traces of least resistance for
matching "CHRIS" against both "CHRIS" and "KRIIS" are the same, despite the fact that one is a
much better match than the other.
These values will need to be normalized to a value which is equal for matrices of various sizes, as there
is the obvious problem of larger signals taking more steps and therefore producing larger final values.
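The text does not state here which normalization is applied. As a minimal sketch, assuming one simple choice (dividing the final value by the combined number of frames in the two signals), the idea looks like this in Python; the function name normalized_cost and the divisor are assumptions, not the project's method.

```python
def normalized_cost(final_value, n_input_frames, n_template_frames):
    # Assumed normalization: divide by the combined number of frames so
    # that long and short templates are compared on the same footing.
    total = n_input_frames + n_template_frames
    if total <= 0:
        raise ValueError("both signals are empty")
    return final_value / total

# A long template with a larger raw value can still be the closer match.
print(normalized_cost(12.0, 30, 50))  # 0.15
print(normalized_cost(9.0, 30, 15))   # 0.2
```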
2.3 Spectrograms
Understanding spectrograms is necessary to understanding the project. In the explanation of DTW the
input signals "CHRIS" etc. were somewhat glossed over. In actual practice, the match matrix will be
created by comparing the spectrograms of two audio signals. A spectrogram is a representation of the
power spectral density inherent to an audio signal over time. That is to say, it is the magnitude of the
frequencies native to a point in time of a signal. The exact mathematical formulas involved in its
creation are seen in Figure 2.4.
Figure 2.4: Mathematical equations for creation of a Spectrogram
STFT stands for Short Time Fourier Transform. This works by taking the Fourier Transform of the
signal x(t) for only one short area at a time. This area is determined by the windowing function w(t-
tau). The windowing function w slides along the signal, so as to zero everything in the signal except for
a very small part at which one wishes to find the frequency components. By repeatedly sliding the window
and taking the Fourier Transform, one builds up a series of frequency values for
the specific small amounts of time.
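The slide-and-transform procedure can be sketched directly. This is a minimal sketch in Python for illustration rather than the project's MatLab code; it assumes a Hamming window, a direct (slow) discrete Fourier transform, and squared magnitudes as the power estimate. The names hamming and spectrogram, and the parameters frame_len and hop, are hypothetical.

```python
import cmath
import math

def hamming(n):
    # Standard Hamming window of length n.
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def spectrogram(signal, frame_len, hop):
    # Returns one list of squared magnitudes per window position.
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Zero everything except the short stretch under the window.
        segment = [signal[start + k] * window[k] for k in range(frame_len)]
        power = []
        for b in range(frame_len // 2 + 1):  # non-negative frequency bins
            total = sum(segment[k] * cmath.exp(-2j * math.pi * b * k / frame_len)
                        for k in range(frame_len))
            power.append(abs(total) ** 2)
        frames.append(power)
    return frames

# A pure tone that completes 4 cycles per 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * k / 32) for k in range(128)]
spec = spectrogram(tone, frame_len=32, hop=16)
print(len(spec), len(spec[0]))
print([frame.index(max(frame)) for frame in spec])
```

For the test tone, every frame has its largest value in frequency bin 4, which is the kind of time-frequency picture described for Figure 2.5.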
This can hopefully be understood via Figure 2.5. This figure shows an audio signal (in y-magnitude vs
x-time) and its resulting spectrogram (magnitude of y-frequency at x-time). One can see for the first
pixel in the x-time, a time with very little signal, there are only smaller y-frequencies (less than
1kHz). But if one takes a pixel from the x-time closer to 0.2s, one can see that that pixel's corresponding
frequencies are much larger (up to 4kHz).
Figure 2.5: Spectrogram of an audio signal.
2.4 Comparable Project Results
The testing reported in [8] allowed the achievable efficiency to be gauged beforehand.
Their results are reproduced in Table 2.1 below.
Table 2.1: Results for project [8].
3 Problem and Methodology of Solutions
3.1 Statement of Problem
The basic problem is the identification of speech. The goal of the project as stated in the Proposal was
for the software to recognize a combination of discrete speaker dependent commands (e.g. 'Enter') and
discrete speaker independent characters (e.g. 'a') for testing.
The speech recognition system was created by breaking the problem into four subsections: the
collection of data when signaled by the teaching program, the manipulation of data, the recognition
algorithms to categorize said data, and the passing of results back to the teaching program.
3.2 Methodology of Solutions
There are four basic steps in the speech recognition system. The initial phase will be data acquisition:
the entering of data into the computer system from the user. Following this, the data must be
manipulated into a usable form. After this, robust recognition algorithms must be used to match the
input data with data saved in the library to correctly identify the sound. Finally, the identified sound
must be passed out to the teaching program.
Figure 3.1: Flow diagram of the Speech Recognition blocks.
3.3 Data Collection
Data collection was achieved through the MatLab Data Acquisition Toolbox. This toolbox allows one
to interact with Microsoft winsound and take in audio signals directly from a microphone installed on
the computer in use. The toolbox automatically brings in this data and stores it as a workable matrix
in the MatLab environment.
While it is possible to do continuous recognition with the Data Acquisition Toolbox, the decision had
already been made to build a discrete system. This would involve the use of triggers and set samples. It
was decided that a good length of time to allow the user to input would be 3 seconds. This was chosen as
it would allow the user enough time to say the letter even if they were somewhat unprepared.
It was deemed necessary to play a noise at the beginning and end of the data collection, to inform the
user that recording had begun or stopped.
The Data Acquisition Toolbox came with a tutorial in its use. The sample code provided, reproduced in
Figure 3.2, was a good place to start in learning how to use the toolbox.
This code is short and makes recording audio very simple. The first line sets the type of
analog input being used - in this case winsound. One could modify this to work with comparable
software on other systems (such as Mac Python).
The Sample Rate sets the sampling frequency (in Hz) and the Samples Per Trigger sets the length of
the signal to be recorded (in this case 3000 samples at 8000 Hz, or 3/8 s). Once all of the wanted
parameters are set, one starts the analog input and the recording is done automatically. The result is
then put into a matrix via the getdata function, in a form that one can easily manipulate.
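ai = analoginput('winsound');          %create a winsound analog input object
addchannel(ai, [1 2]);                 %record on channels 1 and 2
set(ai, 'SampleRate', 8000);           %sampling frequency in Hz
set(ai, 'SamplesPerTrigger', 3000);    %samples to acquire per trigger
set(ai, 'TriggerType', 'immediate');   %begin acquiring as soon as started
start(ai);
[data,time] = getdata(ai);             %bring the samples into the workspace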
Figure 3.2: Data Acquisition Toolbox tutorial code.
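The relationship between Sample Rate, Samples Per Trigger and recording length is simple arithmetic, worked through below with the tutorial's values. The variable names are chosen for this illustration only, and no toolbox calls are needed.

```matlab
sampleRate        = 8000;   % Hz, as in the tutorial code
samplesPerTrigger = 3000;   % samples captured per trigger, as in the tutorial code
duration = samplesPerTrigger / sampleRate;    % recording length in seconds (3/8 s)

% Samples needed for the three second recording chosen for this project,
% if the same sample rate were kept:
wantedSeconds = 3;
samplesNeeded = wantedSeconds * sampleRate;
```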
3.4 Data Manipulation
3.4.1 Normalization
Normalization is the process by which the signal is brought into a range consistent with expected
values. Original research into the creation of a normalization algorithm led to the thought that it would
require analysis of the signal in order to find the peak value, followed by a reduction of the signal's
amplitude. That is, one would have to run through the entire signal, record the maximum value, and then
go through the entire signal once again and reduce every point based on this maximum.
This posed the problem of being quite computationally wasteful, and as such initial thought was put
into a means by which this could be done at the same time as the program was checking the signal
values for the necessary windowing (see 3.4.2).
An alternative was found when looking for a way to maximize the potential provided by using the
MatLab program instead of another programming environment. MatLab has the advantage of being
built around the quick manipulation of whole matrices, and as such it was realised that one would be
able to normalize a signal by merely dividing it by the value returned by the built-in max function.
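The two approaches can be compared on a toy signal. This is a sketch under assumed values, not the project code: the sample values in x are invented, and the signal is scaled to a peak magnitude of 1 purely for illustration.

```matlab
x = [0.1; -0.8; 0.4; 0.2];          % toy signal (assumed values)

% Two-pass loop version: find the peak, then rescale every point.
peak = 0;
for k = 1:length(x)
    if abs(x(k)) > peak
        peak = abs(x(k));
    end
end
xLoop = zeros(size(x));
for k = 1:length(x)
    xLoop(k) = x(k)/peak;
end

% Whole-matrix version: one call to max, one division.
xVec = x / max(abs(x));

sameResult = isequal(xLoop, xVec);  % both versions produce the same vector
```

Taking abs(x) before max ensures that a large negative excursion is treated as the peak.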
3.4.2 Windowing
Windowing is the process of dividing the signal into small sections to be examined independently of
one another, and is simple to achieve (multiply the signal by zero except at the point of interest). For
this program, it is assumed that the only region of interest is the letter being spoken. As such, it is not
necessary to window the signal multiple times - one need only determine where the useful portion of
the signal is and cut away everything else.
A function will be created to handle the extraction of the useful signal from the total three seconds of
input data. A rough form of zero crossing will be used to determine when a useful signal has begun and
ended. This involves checking for a certain level - the "zero" - to be crossed.
Once the useful signal has been extracted, the harsh cutoff at the edges poses a problem: the abrupt
edges create frequency components approaching infinity. As the later pattern recognition stages depend on
creating spectrograms, this could be a problem (see 2.3 for a description of spectrograms) [4]. As such,
one is required to use a window which is capable of removing the high frequencies at the edges while
not wrecking the frequency information present in the useful signal. Techniques for this include use of
a Hamming window. A Hamming Window can be described by the equation:
$$w[n] = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
MatLab has a Hamming Window function built in, and so this will be used for ease (rather than using
the above formula). Figure 3.3 is a visualization in MatLab of a Hamming Window both in the time
domain and the frequency domain.
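For reference, the equation above can be evaluated directly. The sketch below is illustrative only: the window length N and the all-ones test segment are assumptions, and the project itself relies on MatLab's built-in window function rather than this hand-written form.

```matlab
N = 8;                                   % window length (small, for illustration)
n = (0:N-1)';
w = 0.54 - 0.46*cos(2*pi*n/(N-1));       % Hamming window from the equation above

x = ones(N,1);                           % toy segment with harsh edges (assumed)
xWindowed = x .* w;                      % edges taper to 0.08, centre stays near 1
```

The window never reaches zero at its ends; it tapers to 0.54 - 0.46 = 0.08, which softens the edges without discarding them entirely.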
3.4.3 Cepstral Filtering
Cepstral Analysis involves the use of the Inverse DFT to separate the person’s characteristic vocal tract
sounds from the actual speech. The process for Cepstral Analysis has been well detailed via
information from [4], and should not be difficult to implement with basic knowledge of MatLab coding
techniques.
Figure 3.3: Hamming Window in time and frequency domain as in MatLab
Cepstral filtering is very useful in the creation of a speech recognition system as one is required to
match speech, as opposed to voices. As such, the removal of sound distinctive to the user's vocal tract
will improve on the abilities of the pattern recognition.
Cepstral filtering works as follows:
The audio signal of one's voice has two components - the vocal excitation source (s) and the
vocal tract source (v). These two sources form the signal via a convolution such that:
f(t) = v(t)*s(t)
In order to remove the unwanted v(t), we take the Fourier Transform to get the frequency
domain representation, where a convolution in time becomes a multiplication.
|F(f)| = |V(f)|x|S(f)|
We can then make the two distinct by using the properties of the logarithm.
ln(|F(f)|) = ln(|V(f)|) + ln(|S(f)|)
If one now takes the inverse Fourier Transform, one ends up with a representation of the
original signal in what is termed the "quefrency" domain, where the two signals have been separated
(an addition in frequency is an addition in time). The quefrency is in units of time, but it is no
longer an accurate representation of time, hence the new name. The movement into the
quefrency domain is visualized in Figure 3.4.
The wanted s components of human speech are known to reside in the lower quefrencies. In Figure 3.4,
the spike at a quefrency of ~8.5 ms is the v component, and can be filtered out.
Native to the MatLab coding environment are the functions cceps and icceps. These perform the
forward and inverse cepstral transformations of a signal into and out of the quefrency domain.
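The log-then-inverse-transform steps can be sketched with base MatLab functions only. Note the assumptions: this uses the real cepstrum (magnitude only) rather than the complex cepstrum computed by cceps, the test signal is a pair of sine waves standing in for speech, and the cutoff value is arbitrary.

```matlab
fs = 8000;                                  % sampling frequency in Hz (assumed)
n  = (0:511)';
x  = sin(2*pi*200*n/fs) + 0.5*sin(2*pi*400*n/fs);   % toy signal standing in for speech

spec = fft(x);                              % into the frequency domain
c    = real(ifft(log(abs(spec) + eps)));    % real cepstrum: ln|F(f)| taken back to "time"

cutoff = 30;                                % quefrency bins kept at each end (assumed)
lifter = zeros(size(c));
lifter(1:cutoff) = 1;                       % low quefrencies
lifter(end-cutoff+2:end) = 1;               % their mirror image (the real cepstrum is symmetric)
cLow = c .* lifter;                         % mask out the unwanted middle quefrencies

smoothLogMag = real(fft(cLow));             % log-magnitude spectrum after masking
```

Because the phase is discarded, this sketch cannot return a time signal the way icceps does; it only shows how masking in the quefrency domain changes the log spectrum.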
Figure 3.4: Visualization of the steps involved in Cepstral Filtering.
3.5 Recognition Algorithm: Dynamic Time Warping
The methodology of creating the recognition algorithm will be as follows:
1. The audio signal has been input and manipulated into a form which is better for pattern
recognition. Now, its spectrogram will be created.
2. A match matrix is created using the spectrograms of the input signal and of the various
reference signals stored in the library.
3. The DTW process is run on the match matrix to get a value of relationship between the two signals.
3.5.1 Spectrogram
The concept of the spectrogram was detailed in section 2.3. MatLab allows one to easily create a
spectrogram of a signal with the built in specgram function.
3.5.2 Match Matrix
The match matrix is the overlay of two signals' spectrograms. Each point in the matrix is found by
taking the cosine of the angle between two vectors, one column from each spectrogram [3], and is an
example of a form of Euclidean distance [8]. In Figure 3.5, for example, pixel (1,1) was found using the
vectors A(:,1) and B(:,1), where A and B are the matrices of the two spectrograms being compared. Pixel
(1,2) was created from A(:,1) and B(:,2), and so on. It is also important to normalize the values in this
matrix back down to a reasonable level [3].
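The pixel-by-pixel construction can be written out explicitly. The two small matrices below are invented stand-ins for spectrograms (rows are frequency bins, columns are time frames); only the column-against-column cosine calculation is taken from the description above.

```matlab
A = [1 0; 0 1; 1 1];         % toy "spectrogram": 3 frequency bins x 2 frames (assumed)
B = [2 0 1; 0 3 1; 2 3 0];   % second toy spectrogram: 3 bins x 3 frames (assumed)

M = zeros(size(A,2), size(B,2));
for i = 1:size(A,2)
    for j = 1:size(B,2)
        a = A(:,i);          % frame i of the first signal
        b = B(:,j);          % frame j of the second signal
        M(i,j) = (a'*b) / (norm(a)*norm(b));   % cosine of the angle between the frames
    end
end
```

Dividing by the two vector lengths is what keeps every entry between -1 and 1; frames that point in the same direction, such as A(:,1) and B(:,1) here, give a value of 1 regardless of loudness.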
Figure 3.5: Visualization of Match Matrix
[Figure 3.5 plot: Match Matrix of two audio signals; "a" vs "garbage". Both axes are ticked from 2 to 14.]
3.5.3 DTW
Data recognition utilizing DTW has been detailed through various sources, mainly [3], [6], and [9]. The
method in [3] involves the conversion of the input and reference signals into their respective
spectrograms before DTW. The formula given by [6] for the cumulative distortion measure is:
$$D(i,j) = d(i,j) + \min_{p(i,j)} \left\{ D[p(i,j)] + T[(i,j), p(i,j)] \right\}$$
where d is a local measure between frame i of the input and frame j of the reference, p(i,j) is the set
of coordinates of possible predecessors, and T is the associated cost of the transition. This matches
well with the formula given by [3]:
$$D(i+1,j+1) = M(i+1,j+1) + \min\{D(i,j), D(i+1,j), D(i,j+1)\}$$
In essence, this formula describes the creation of the D (distortion) matrix from the M (match)
matrix. When one is creating the distortion matrix D(i,j), one begins at position (1,1) and sets this to a
null value. The D(i+1,j+1) value is then created using the match matrix value M(i+1,j+1) as the basis,
and adding the value of the lowest-cost "jump" into that pixel.
Simplified, if one is creating the distortion point D(4,2) one begins with the match point M(4,2). One
then looks at the accumulated values of D(3,1), D(4,1) and D(3,2) and adds the lowest - the quickest way
to get there. This can be seen in Figure 3.6. One can then trace through the distortion matrix to find the
quickest path, and use this value as a means of comparison.
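The accumulation step can be sketched on a small example. The local-cost matrix d below is invented and is assumed to be a cost (lower is better), and the padded row and column of infinities is a convenience chosen for this illustration; neither is taken from the project code.

```matlab
d = [0 2 3;
     2 0 2;
     3 2 0];                     % toy local-cost matrix (assumed), lower is better
[m, n] = size(d);

D = inf(m+1, n+1);               % padded so every cell has three predecessors
D(1,1) = 0;                      % the starting point carries no accumulated cost
for i = 1:m
    for j = 1:n
        best = min([D(i,j), D(i,j+1), D(i+1,j)]);   % diagonal, from above, from the left
        D(i+1,j+1) = d(i,j) + best;                 % local cost plus cheapest way in
    end
end
total = D(m+1, n+1);             % accumulated cost in the bottom right corner
```

For this matrix the cheapest route runs straight down the diagonal of zeros, so the bottom right value is 0; any detour off the diagonal would accumulate the off-diagonal costs.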
Thought was also put into a faster way to trace through the distortion matrix, by
breaking away once outside certain bounds. Its effectiveness would need to be tested to see if the time
saved by breaking early would be more than the time incurred by the added code.
Figure 3.6: Visualization of Distortion Matrix Creation
3.5.4 Library
The library will be stored in a folder alongside the m-files, so that the speech recognition program can
easily access it. A simple piece of code capable of entering new files into the library will need to be
created.
There were two main choices for the way in which to store the data:
1. Save the signals as Microsoft wav files using the MatLab function wavwrite, and access them
using the MatLab function wavread. Convert each accessed vector into its spectrogram every
time it is accessed.
2. Save the spectrograms of the signals as delimited text files using the MatLab function
dlmwrite, and access them using dlmread. Convert each vector into its spectrogram matrix
only once, before it's saved.
The reasoning is that saving as a delimited text file should take the program less time to
access the spectrogram - as it won't have to convert it every time - compared to saving as a wav. This
will come at the expense of the library being larger, as the spectrogram matrix is much larger in
size than the signal's wave vector. Testing will be needed to determine the better method.
3.6 Returning Results
Upon program activation, the teaching program will pass in the value of the entry it expects to be
recognized. Data Output will involve returning a signal of (correct) “character recognized”, “failure to
recognize”, or the (incorrect) recognized character (1-26=a-z; 27-30=Enter, Yes, No, Back) to the
teaching program. Outputs returned were to be sent in the form of:
• 1-30 - incorrect character, outputs the results of pattern recognition (1-30).
• 50 - no satisfactory match.
• 100 - correct character.
The original plan for the interaction of the MatLab speech recognition and the C# teaching program
was to use the MatLab Builder for .Net. This would have created a wrapper to allow the MatLab code
to be run in C#. Mr. Hernandez found another way to access this, using a C# program which only
required the m-files to be in the same directory. This was used instead.
4 Design Procedures
4.1 Speech Recognition Program
Figure 4.1 is the program speechRec.m. In this section, the design of its components will be outlined
by taking selected code from the relevant m-files. For the full code, see Appendix C.
function out = speechRec(in)
[audioIn fs]= recorder(); %get audio signal
audioIn = normalizer(audioIn);
audioIn = usefullSig(audioIn);
audioIn = cepAnal(audioIn);
audioIn = hamWindow(audioIn);
audioIn=specCreate(audioIn,fs);
%Comparison loop.
numLibEnt=30; %number of library entries, 1-30
              %1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2;  %highest sample index; samples of each entry are numbered 0 to numLibSam.
cmin=500;     %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
cp=0.7071; %From experimental data, if the DTW block produces a value of
           %0.7071 then this is a perfect match. This value is normalised
           %for any size difference, et cetera.
r=0; %r is the variable for which is the current lowest match c.
     %if r stays as 0, we therefore never achieved a c lower than
     %min and don't have a match.
for m=1:numLibEnt
    for n=0:numLibSam
        [x fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
        Y=specCreate(x,fs);
        M=matchMat(audioIn,Y);
        ctemp=abs(DTW(M)-cp);
        if (ctemp<c)
            c=ctemp;
            r=m;
        end
    end
end
%returning block.
if r==0
    out=50;
elseif r==in;
    out=100;
else
    out=r;
end
return
Figure 4.1: m-file code used for Speech Recognition Program
4.2 Data Collection
Data collection was designed through the function recorder.m. This function took in no arguments, and
output the recorded signal (as a 1xn vector) as well as the sampling frequency used to record the audio.
In Figure 4.2 one can see some of the code created to do this. The sampling frequency was set
permanently to 8.192kHz - this is a standard value for inputting sound using winsound devices. The
time to record was also permanently set to 3 seconds. If one wished to change this, one would have to go
into the code to modify it. Allowing these to be changed via an input was considered, but ultimately
discounted as pointless.
The analogue input was set to be of the type winsound, and only recorded on one channel. Sample Rate
was set to fs and Samples Per Trigger was set to t*fs (to record for the wanted time). Trigger Type was
set to manual - meaning that it would begin when told, as opposed to other options such as triggering on
a rising edge. Trigger Repeat was set to 0, so that there was no repeat.
Out was the data recorded from the analog input.
fs = 8192; %in Hz, default sampling frequency for sound(), etc.
t=3; %in s, number of seconds to record for.
ai_length = t*fs;
% Set up MatLab Oscilloscope / Winsound Analoginput
ai = analoginput('winsound');
addchannel(ai, 1);
set(ai, 'SampleRate', fs);
set(ai, 'TriggerType', 'manual');
set(ai, 'TriggerRepeat', 0);
set(ai, 'SamplesPerTrigger', ai_length);
% Get data from the microphone
beep on;
beep;
start(ai);
trigger(ai);
data = getdata(ai);
beep;
delete(ai);
out = data; %return the audio input.
Figure 4.2: m-file code used to input audio signals.
4.3 Data Manipulation
4.3.1 Normalization
Normalization was done via the function normalizer.m. This function took in an assumed (1xn) vector,
normalized it to a maximum value of 0.5 via the code seen in Figure 4.3, and returned the vector.
4.3.2 Windowing
Windowing was done via two functions: hamWindow.m and usefullSig.m. In Figure 4.4 one can see the
key aspects of both. The Hamming Window was created using the window function native to MatLab.
The useful signal extractor works by running through the signal from both ends, as can be seen in
the sampled code. When the magnitude of the signal is above a threshold, this position is recorded as the
point at which to clip, minus an offset. These end values are stored in a and b, and the ends past
these values are chopped off. Note that as and bs will be set to 0 once a value is found, ensuring that no
second value will be recorded (as the if statement will always be false).
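The same trimming idea can be expressed without a loop. This is an alternative sketch, not the code in usefullSig.m: the toy signal, the threshold thresh and the offset os are assumed values, and the clamping of a and b to the signal bounds is an addition made here for safety.

```matlab
x      = [0.01; -0.02; 0.01; 0.30; -0.50; 0.40; 0.02; -0.01];  % toy signal (assumed)
thresh = 0.1;     % magnitude treated as the "zero" level (assumed)
os     = 1;       % samples of margin kept on each side (assumed)

first = find(abs(x) > thresh, 1, 'first');   % first sample above the threshold
last  = find(abs(x) > thresh, 1, 'last');    % last sample above the threshold

a = max(first - os, 1);                      % clamp so indices stay inside x
b = min(last + os, length(x));
useful = x(a:b);                             % everything outside a..b is cut away
```

Here the loud region is samples 4 to 6, so with an offset of one sample the extracted signal is samples 3 to 7.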
4.3.3 Cepstral Filtering
Cepstral filtering was achieved via the creation of the function cepAnal.m. Figure 4.5 shows the
important code: the pushing of the audio signal into the quefrency domain, creation of a mask to remove
unwanted quefrencies, then the return to the time domain.
x = 0.5*x/max(abs(x));
Figure 4.3: m-file code used to normalize.
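w=window(@hamming,length(x)); %Hamming window as long as the signal.
x=x.*w;
____________________________________
for i=1:l
    if (as && abs(x(i,1))>thresh) %first sample above threshold, from the front.
        a=i-os;
        as=0; %stop searching from the front.
    end
    if (bs && abs(x(l-i,1))>thresh) %first sample above threshold, from the back.
        b=l-i+os;
        bs=0; %stop searching from the back.
    end
end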
Figure 4.4: m-file code used to window.
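c=cceps(x); %move into the quefrency domain.
pass=int16(length(c)/6); %width of the region kept at each end.
mask=ones(length(c),1);
mask(pass:length(c)-pass,1)=0; %zero the unwanted middle quefrencies.
c=c.*mask;
x=icceps(c); %return to the time domain.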
Figure 4.5: m-file code used to perform Cepstral Filtering.
4.4 Recognition Algorithm: Dynamic Time Warping
Figure 4.6 is the main comparison work of the program speechRec.m. The first step in this code is to
define the size of the library. There are two main components to this: the number of library entries, and
the number of samples. For the purpose of testing this program, a library with 30 entries had been
created, with entries 1-26 corresponding to a-z, as well as the four commands "Enter", "Yes", "No", and
"Back" (27-30, respectively). There were three samples of each entry (0-2).
It was decided that the variable to store the best match would be called "c". A cmin was then
established, being the minimum acceptable value which would be recorded. If no c returned in later
stages could beat the cmin, then there was no recognizable input. A ctemp was also made to
hold returned c values temporarily. Finally, cp (p for perfect) and r were initialized. The cp value was
found through testing to be 0.7071 - that is to say that if the DTW finds a perfect match, it will return a
value of 0.7071 (see 4.4.3). The r is a variable which will hold the entry number which the current best
c falls under, and will be used in the return phase.
The algorithm itself is very simple. There are two nested for loops which will run through every sample
%Comparison loop.
numLibEnt=30; %number of library entries, 1-30
              %1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2;  %highest sample index; samples of each entry are numbered 0 to numLibSam.
cmin=0.05;    %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
cp=0.7071; %From experimental data, if the DTW block produces a value of
           %0.7071 then this is a perfect match. This value is normalised
           %for any size difference, et cetera.
r=0; %r is the variable for which is the current lowest match c.
     %if r stays as 0, we therefore never achieved a c lower than
     %min and don't have a match.
for m=1:numLibEnt
    for n=0:numLibSam
        [x fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
        Y=specCreate(x,fs);
        M=matchMat(audioIn,Y);
        ctemp=abs(DTW(M)-cp);
        if (ctemp<c)
            c=ctemp;
            r=m;
        end
    end
end
Figure 4.6: m-file code for comparison loops.
4.4.2 Match Matrix
The match matrix is constructed using matchMat.m as seen in Figure 4.8. The two input spectrograms
A and B are manipulated in order to gain a normalized match matrix via the method described in 3.5.2.
4.4.3 DTW
DTW.m accepts the match matrix as an argument and returns a value of match goodness. It begins by
creating the distortion matrix as described in 3.5.3, as can be seen in Figure 4.9. As discussed in 2.2, the
value in the bottom right corner of the distortion matrix can be used for classification purposes.
Unfortunately, as seen in section 5.3, there were difficulties with this method relating to the size of the
library entries and an inability to effectively normalize them. As such, the second method for
classifying match goodness discussed in 2.2 had to be used. This involved tracing through the
distortion matrix from its top left corner using the phi values (which stored whether the path of least
resistance was going right, down, or right and down in one step).
This value can then be easily normalized for varying sizes of match matrices by dividing the trace
length by the diagonal (as a perfect trace should be a line straight down the diagonal). The method of
tracing code used (see Appendix C) resulted in a perfect match returning a value, after normalization,
of 0.7071.
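One plausible reading of where 0.7071 comes from is sketched below. It assumes a square match matrix and assumes that the trace length is counted in steps, one per diagonal move; the actual tracing code is in Appendix C and may count differently.

```matlab
n = 14;                                  % size of a square match matrix (assumed)
traceLength = n;                         % a perfect trace takes one diagonal step per row
diagonal    = sqrt(n^2 + n^2);           % straight-line length of the diagonal
cp = traceLength / diagonal;             % equals 1/sqrt(2), about 0.7071, for any n
```

Under these assumptions the ratio does not depend on n, which is what makes it usable as a fixed "perfect match" reference.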
sA= sqrt(sum(A.^2));
sB = sqrt(sum(B.^2));
M = (A'*B)./(sA'*sB);
Figure 4.8: m-file code used to create match matrix.
%create matrix and variables.
for i=1:m
    for j=1:n
        [dmin,tb]=min([D(i,j),D(i,j+1),D(i+1,j)]); %cheapest of the three predecessors.
        D(i+1,j+1)=D(i+1,j+1)+dmin;
        phi(i,j)=tb; %remember which predecessor was chosen, for the trace.
    end
end
% Tracing Code: see Appendix C.
out=out/sqrt(m^2+n^2); %divide by diagonal so that all answers are equally weighted.
Figure 4.9: m-file code used to do DTW.
Also created was an attempt to speed up DTW.m by stopping the trace if it went "out of bounds".
As one knows that a good match will go roughly straight down the centre diagonal, one can postulate
that if the trace is running outside a certain area, it can immediately be discounted as a poor match.
Figure 4.10 visualizes this. On the left is an ideal good region in the middle [6], while on the right is a
somewhat simpler means of implementing the concept.
In the breaking code used in the function DTWTHREE.m, seen in Figure 4.11, the p value is the
vertical position of the trace, and the q is the horizontal. To implement what is seen in Figure 4.10, one
takes the vertical size of the distortion matrix (e.g. 14). One then assumes that a sixth of this value
(e.g. ~2) is how close one wants the vertical trace to remain to the centre. If the vertical value (p) is
further than this distance above or below the ideal line (the diagonal), then it is out of bounds and the
trace is ended early. The ideal vertical point p for horizontal point q is found by finding the angle of
the ideal diagonal (tan-1(opposite/adjacent)), then finding opposite=adjacent*tan(angle) where the
adjacent is q.
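The bounds check can be illustrated on its own. The matrix size and the trace positions below are assumed values, and the check is written as a single distance comparison for brevity rather than copied from DTWTHREE.m.

```matlab
m = 14;  n = 14;                 % rows and columns of the distortion matrix (assumed)
band = m/6;                      % allowed vertical distance from the diagonal

q = 7;                           % current horizontal position of the trace (assumed)
ideal = q * (m/n);               % same value as q*tan(atan(m/n)): ideal vertical position

pNear = 8;                       % a vertical position close to the diagonal
pFar  = 12;                      % a vertical position far from the diagonal
nearInBounds = abs(pNear - ideal) <= band;   % true: the trace continues
farInBounds  = abs(pFar  - ideal) <= band;   % false: the trace would be abandoned
```

Since tan(atan(m/n)) is simply m/n, the ideal position reduces to scaling q by the slope of the diagonal.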
ideal=q(1,1)*tan(atan(m/n));
ideal1=ideal+m/6;
ideal2=ideal-m/6;
if (p(1,1)>ideal1)
i=-1; %easy way to stop the while loop.
p=m*n; %Some high value.
end
if (p(1,1)<ideal2)
i=-1; %easy way to stop the while loop.
p=m*n; %Some high value.
else
Figure 4.11: m-file code used to do modified trace.
Figure 4.10: Visualizing the idea behind a faster approach.
4.4.4 Library
The library was created manually with the function libCreate.m as seen in Figure 4.12.
4.5 Returning Results
Results were returned at the end of speechRec via the code seen in Figure 4.13. The r value (as noted in
section 4.4) is the library entry which the input audio best matches. If r remained 0 through
the program, this means that no library entry was close enough to warrant a match and so the output
will be 50 (the designated code number for "no recognition" as understood by Mr. Hernandez's teaching
software). If the r value matches what the teaching software told it upon starting is the correct answer,
it will output 100 (the designated code for "recognition of correct value"). If neither of these is the
case, it will return the value of the library entry which it thought it recognised.
fs=8192;
audioIn = recorder(); %get audio signal
audioIn = normalizer(audioIn);
audioIn = usefullSig(audioIn);
audioIn = hamWindow(audioIn);
audioIn = cepAnal(audioIn);
libNum=input('Please input number to be associated with file (ie 10-999).','s'); %String.
wavwrite(audioIn,fs,['Library/lib' libNum '.wav']);
Figure 4.12: m-file code used to create library entries.
if r==0
    out=50;
elseif r==in;
    out=100;
else
    out=r;
end
Figure 4.13: m-file code used to return results.
5 Results and Discussion
5.1 Data Collection
As seen in Figure 5.2, the recorder.m function had a response time of a little over three seconds. This is
expected, as the input was set to record for three seconds. It functioned as designed (section 4).
5.2 Data Manipulation
Figure 5.1 allows one to view the signal as it passes through the stages of data manipulation. Each
stage functioned as designed (section 4).
Figure 5.1: The phases of data manipulation.
[Figure 5.1 panels, each plotting Signal Magnitude against Time Axis: Normalized sound; Useful sound; Post Cepstral Filtering; Window'd (hamming) sound.]
Cutting away everything but the useful signal creates variable response times. In Figure 5.2, one can
see the times taken by the various functions (y-axis) based on the size of the input wav in kB (x-axis).
5.2.1 Normalization
normalizer.m times increased linearly with increased wav size. It functioned as designed (section 4).
5.2.2 Windowing
usefullSig.m times appeared irregular because the tested files had already been run through
usefullSig.m when they were created. In reality the time taken by usefullSig.m is constant, as it
depends only on the length of the recorded audio signal (i.e. 3 seconds * 8192 samples per second). It
functioned as
Figure 5.2: Timing measurements for input files of differing sizes.
[Figure 5.2 panels, each plotting t (in s) against Size in kB: Times of normalizer.m vs Size; Times of usefullSig.m vs Size; Times of cepAnal.m vs Size; Times of hamWindow.m vs Size; Times of specCreate.m vs Size. A final panel plots t (in s) against Trial: Times of recorder.m for Four trials.]
designed (section 4).
hamWindow.m times increased linearly with increased wav size. It functioned as designed (section 4).
5.2.3 Cepstral Filtering
cepAnal.m times increased linearly with increased wav size. It functioned as designed (section 4).
5.3 Recognition Algorithm: Dynamic Time Warping
5.3.1 Spectrogram
specCreate.m times increased linearly with increased wav size. It functioned as designed (section 4).
Figure 5.3 shows the spectrograms of three sounds: those of 'c', 'b', and 'w'. Note how 'c' and 'b' are
rather similar, while 'w' appears quite different. This becomes a minor problem, as when one is
testing for recognition of 'b', one will often see close matches with 'c', 'd', 'e' and other letters which
share similar sounds.
Figure 5.3: Examples of Spectrograms created by specCreate.m for letters 'c', 'b', and 'w'.
[Figure 5.3 panels, each plotting Frequency (0 to 4000) against Time: Spectrogram of Input Sound 'c'; Spectrogram of Close Library Sound 'b'; Spectrogram of Far Library Sound 'w'.]
5.3.2 Match Matrix
matchMat.m times increased linearly with increased wav size as seen in Figure 5.4. It functioned as
designed (section 4).
5.3.3 DTW
DTW.m times increased linearly with increased wav size as seen in Figure 5.4.
There were numerous challenges in the creation of DTW. What follows is a truncated discussion of the
results of the early versions of DTW, as well as the changes which were made as a result. For the full
thought process (in rough-form notes) refer to Appendix B.
Figure 5.4: Timing measurements for input files of differing sizes.
[Figure 5.4 panels, each plotting t (in s) against Size in kB: Times of matchMat.m vs Size; Times of DTW.m vs Size.]
Originally, as documented in 4.4.3 and 2.2, DTW.m used the bottom right value of the distortion
matrix as the measure of match goodness, with the diagonal length of the match matrix as an attempted
normalizing factor. When testing began to determine the efficiency of the program, results
were extremely poor. A constant feature was the tendency for the speech recognition program
to recognize the input speech as 'w' 90% of the time [14]. It was theorized that the normalization via
division by the diagonal was not working as intended.
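The first method can be sketched as follows. This is a minimal illustration in Python rather than the project's MATLAB, and it is not the code of DTW.m: the Euclidean local cost, the step pattern, and the names local_cost, dtw_bottom_right and normalized_by_diagonal are all assumptions made for the example.

```python
import math

def local_cost(frame_a, frame_b):
    """Euclidean distance between two equal-length feature frames (assumed cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(frame_a, frame_b)))

def dtw_bottom_right(seq_a, seq_b):
    """Fill the accumulated distortion matrix and return its bottom right value."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # Extra row and column of infinities so every cell has three predecessors.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def normalized_by_diagonal(seq_a, seq_b):
    """Divide the bottom right value by the diagonal length of the match matrix."""
    diagonal = math.hypot(len(seq_a), len(seq_b))
    return dtw_bottom_right(seq_a, seq_b) / diagonal

a = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
b = [[0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
print(normalized_by_diagonal(a, a))  # identical inputs accumulate no distortion
print(normalized_by_diagonal(a, b))
```

With a distance-style cost such as this one, a sequence matched against itself accumulates no distortion, so any normalizing factor leaves the perfect-match score at zero; the normalization only affects how imperfect matches of different lengths compare with each other.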
Figure 5.5, from Test File 5 (see Appendix C), involved running all library entries through the
pattern recognition code. This guaranteed a perfect match, and if the code were working the way it was
designed to, each perfect match would return a consistent c value.
As can be seen, the c values vary wildly. The same experiment was attempted repeatedly, with different
values in place of the diagonal, in an attempt to find a means of normalizing the c values. These included:
• No normalization.
• Multiplication by diagonal.
• Division/Multiplication by area.
Figure 5.5: Results of testing audio files against themselves to see results, as well as c values returned.
x-axis is the corresponding library entry (1-3=a, 4-6=b, etc.)
[Figure panels: Results (100=match); Value of C variable for match (scale x 10^-16).]
The results of these further tests were much like Figure 5.5.
Figure 5.6 from testFileSeven (see Appendix C) involved taking a known letter (w=23, a=1) and
running it through the pattern recognition code. This test recorded the c values returned by every
library entry sample.
One can see that for testing of w (the left of Figure 5.6) the c values vary wildly, though there is some
pattern. Because different people created the samples, the three samples of an entry have less in common
with each other than with the matching samples of other entries (i.e. sample 1 of entries 1 and 2 are more
similar than samples 1 and 2 of entry 1).
The minimum value is the perfect match, zero, as expected by the theory from the literature (section
2.2). The x-axis of Figure 5.6 represents the library entry samples, and 67-69 refers to w's.
In the right portion of Figure 5.6, the results for the testing of a are seen. While the perfect match is
found at a (as expected), the next best match is at w with a c value of 0.0027, despite a and w sounding
nothing alike. It should be noted that w @ 69 is also the largest file in the library.
Figure 5.6: Testing for known w (23) and a (1) to see what sort of c values are returned for all
library entries. x-axis is the corresponding library entry (1-3=a, 4-6=b, etc.)
While a perfect match will return the correct recognized character, even the slightest variation will be
beaten by w. It can therefore be surmised that the failure to find an effective way to normalize the
bottom right value of the distortion matrix is responsible for the extremely poor efficiency and
extremely high number of false positives (almost all of which are w).
Figure 5.7 is from testFileEight (see Appendix C). It allows the visualization of the c values given
when 'a' is run through the pattern recognition, plotted alongside the relative size of the library
entries.
C values are in red, sizes in blue. One can see that, other than for a perfect match, the best c values all
correspond to the largest data files. [Note that Figure 5.7 was created using area division in the DTW.]
This allows one to conclusively say that the reason for the "w problem" is that the larger file size needs
to be normalized in some way. A method to do this was not found, resulting in the necessity of
changing to the second method of determining match goodness as described in 2.2 and 4.4.3: the trace
back through the distortion matrix as seen in Figure 5.8.
Figure 5.7: Value of c returned for entry of a (1) in red plotted against the proportional size of the input
file. x-axis is the corresponding library entry (1-3=a, 4-6=b, etc.)
[Plot title: Value of C (red) v Size(b) for 1.]
Figure 5.8: Demonstrating the workings of matchMat.m and DTW.m. Left column is the local
match matrix for perfectly, somewhat, and poorly matched input spectrograms ('b'&'b', 'b'&'c',
and 'b'&'w' respectively). To the right of these are representations of the path of least resistance
taken by the DTW block in order to find a best match value.
[Figure rows: Perfectly Matching Input Specs; Somewhat Matching Input Specs; Poorly Matching Input Specs. Columns: Match Matrix (left), Quickest Path (right).]
Figure 5.9 below allows for the visualization of the returned c values using the trace method. It was
created using a (1) as the input signal to match against, and was run through the pattern recognition of
only one sample range (hence the 1-30 x-axis, as opposed to the 1-90 x-axis seen in previous figures).
With this setup, a perfect match returns the value 0.7071. Figure 5.9 shows that
library entries that are like 'a' are also found around this value, while library entries which are far off
(like 'w') are quite distant from this value. This greatly reduced the number of false positives being found
and cured the "w problem", coinciding with a noticeable improvement in efficiency.
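The value 0.7071 is consistent with one reading of the Appendix B notes ("Do p q trace through. Take these and divide by diagonal."): if the score is the number of steps in the traced path divided by the diagonal length of the match matrix, then a perfect match of an N-frame sound against itself follows the main diagonal in N steps and scores N / sqrt(N^2 + N^2) = 1/sqrt(2), roughly 0.7071. The sketch below illustrates that reading in Python rather than MATLAB. The cost matrix, the step pattern, and the names accumulate, traceback_length and trace_score are assumptions, not the contents of DTW.m.

```python
import math

def accumulate(cost):
    """Build the accumulated distortion matrix from a local cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    D[i - 1][j] if i > 0 else inf,
                    D[i][j - 1] if j > 0 else inf,
                    D[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            D[i][j] = cost[i][j] + best
    return D

def traceback_length(D):
    """Walk from the bottom right cell to the top left; return the number of cells visited."""
    i, j = len(D) - 1, len(D[0]) - 1
    steps = 1
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((D[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((D[i][j - 1], i, j - 1))
        # Tuple ordering makes the diagonal move win when accumulated values tie.
        _, i, j = min(candidates)
        steps += 1
    return steps

def trace_score(cost):
    """Path length divided by the diagonal length of the match matrix."""
    D = accumulate(cost)
    return traceback_length(D) / math.hypot(len(cost), len(cost[0]))

# Hypothetical cost matrices: zero on the diagonal means a perfect match.
perfect = [[abs(i - j) for j in range(5)] for i in range(5)]
stretched = [[abs(i - j) for j in range(6)] for i in range(3)]
print(trace_score(perfect))    # close to 1/sqrt(2)
print(trace_score(stretched))  # a longer path relative to the diagonal
```

Under this reading the score no longer depends on the magnitude of the accumulated distortion, only on how far the best path strays from the diagonal, which would explain why file size stopped dominating the result.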
Figure 5.9: The returned c values for the traceback method using a=1 as the input.
[Plot title: c values for library entries. Axes: library entry (horizontal), c value (vertical).]
This method did however have some unfortunate problems, as can be seen in Figure 5.10. The problem
is simple and very hard to correct: the portion of the pattern recognition detailed in section 4.4
includes the piece of code ctemp=abs(DTW(M)-cp). This is necessary so that the best match goodness
becomes a value close to 0, which can then be compared with the previous best match
goodness.
As can be seen in Figure 5.10, where a test similar to that of Figure 5.5 is run using the new
method, testing a file in the library through the pattern recognition does not always return a
match! The reason for this is that DTW(M)-cp has a minimum value of 6.781186547510920e-06; it
won't calculate 0 even when given a perfect match. This results in situations where more than one
library entry will give this value, meaning that the recognized character is the last entry to produce it
instead of the exact match.
While it would have been nice to fully solve this problem, the solution to this point was deemed good
enough for the purposes of this project.
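One possible explanation for the non-zero floor, offered here as an assumption since the report does not state how cp was defined: if cp was entered as the rounded constant 0.7071 while a perfect match actually scores 1/sqrt(2) = 0.70710678..., then abs(DTW(M)-cp) can never fall below about 6.78e-06, and every entry whose path length to diagonal ratio equals 1/sqrt(2) ties at that floor. The Python sketch below (not the project's MATLAB) shows the residual and the resulting tie; CP_ROUNDED, CP_EXACT and best_matches are hypothetical names.

```python
import math

CP_ROUNDED = 0.7071          # assumption: target typed to four decimal places
CP_EXACT = 1 / math.sqrt(2)  # N / sqrt(N^2 + N^2) for a perfect match

def residual(score, target):
    """Distance of a trace score from the perfect-match target."""
    return abs(score - target)

def best_matches(scores, target, tol=1e-12):
    """Return the indices of every entry that ties for the smallest residual."""
    residuals = [residual(s, target) for s in scores]
    smallest = min(residuals)
    return [k for k, r in enumerate(residuals) if r - smallest <= tol]

floor = residual(CP_EXACT, CP_ROUNDED)
print(floor)  # about 6.78e-06, never 0

# Hypothetical trace scores for four library entries; entries 1 and 3 both
# lie exactly on the perfect-match value and therefore tie.
scores = [0.90, CP_EXACT, 0.75, CP_EXACT]
print(best_matches(scores, CP_ROUNDED))
```

Returning every tied index, rather than keeping only the first or last entry to reach the minimum, makes the ambiguity visible so that a secondary measure (for example the accumulated distortion along the path) could be used to break the tie.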
Figure 5.10: Results of testing audio files against themselves to see results, as well as c values
returned. x-axis is the corresponding library entry (1-3=a, 4-6=b, etc.)
[Figure panels: Results (100=match); Value of C variable for match (constant at 6.7812 x 10^-6).]
In 4.4.3 a method for possibly speeding up the tracing was outlined. It involved breaking out of the
trace while loop if one ran out of certain bounds while tracing. In testFileTwenty (see Appendix C) a
large number of DTW and DTWTHREE (DTW+breaking code) are run and timed. The results can be
seen in Table 5.1 for three runs of testFileTwenty.
The addition of code to catch out of bounds appears, over an extreme sample, less efficient (or
negligibly different) than not having the code.
Figure 5.11 was created from testFileTwentyOne to help visualize this. It tests the time to run DTW
and DTWTHREE when first doing a known perfect match (x=1), then a known poor match which
would trigger the breaking code (x=2).
Conclusion: the increased chance of having done something wrong is not worth the negligible benefit.
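A timing comparison of this kind can be sketched as below. This is an illustrative Python harness, not testFileTwenty: trace_plain and trace_with_break are simplified stand-ins for DTW and DTWTHREE, and the bound, repeat counts and function names are assumptions.

```python
import timeit

def trace_plain(n):
    """Walk an n-step diagonal trace with no bounds check (stand-in for DTW)."""
    i = j = n
    steps = 0
    while i > 0 and j > 0:
        i -= 1
        j -= 1
        steps += 1
    return steps

def trace_with_break(n, bound):
    """Same walk, but stop early once |i - j| exceeds a bound (stand-in for DTWTHREE)."""
    i = j = n
    steps = 0
    while i > 0 and j > 0:
        if abs(i - j) > bound:
            break
        i -= 1
        j -= 1
        steps += 1
    return steps

def average_time(func, repeats=3, number=200):
    """Average seconds per call over repeats * number executions."""
    totals = timeit.repeat(func, repeat=repeats, number=number)
    return sum(totals) / (repeats * number)

t0 = average_time(lambda: trace_plain(500))
t1 = average_time(lambda: trace_with_break(500, bound=80))
print(f"plain: {t0:.2e} s per call, with break check: {t1:.2e} s per call")
```

Averaging over many calls matters here because the per-iteration cost of the extra if statement is paid on every step, while the saving from breaking early is only realized on poor matches.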
Table 5.1: Results of testing for time difference between DTW (t0) and DTWTHREE (t1-with breaking
code) over a large number of averaged runs.
Figure 5.11: Visualisations of time it takes for DTW (blue) and DTWTHREE (red). Done for a known
perfect match (x=1), then a known poor match which would trigger code (x=2).
[Plot title: DTW in blue, DTWTHREE in red. Vertical scale x 10^-3.]
5.3.4 Library
testFileNine was a simple test on time it takes to do comparison algorithms vs how many samples are
in the library. Figure 5.12 shows this to have a linear relationship.
Testing was also done on the two options discussed in 3.5.4, repeated here:
There were two main choices for the way in which to store the data:
1. Save the signals as Microsoft wav files using the MatLab function wavwrite, and access them
using the MatLab function wavread. Convert each accessed vector into its spectrogram every
time it is accessed.
2. Save the spectrograms of the signals as delimited text files using the MatLab function
dlmwrite, and access them using dlmread. Convert each vector into its spectrogram matrix
only once, before it is saved.
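The storage trade-off between a binary audio file and a delimited text file can be illustrated with the simplified Python sketch below. It is not the project's MATLAB test: it stores the same synthetic samples both ways instead of storing a spectrogram as text, and the sample rate, tone, file names and helper functions are all assumptions.

```python
import math
import os
import struct
import tempfile
import wave

def make_samples(n=8000, rate=8000, freq=440.0):
    """Synthetic 16-bit mono tone standing in for a recorded letter."""
    return [int(20000 * math.sin(2 * math.pi * freq * k / rate)) for k in range(n)]

def write_wav(path, samples, rate=8000):
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

def read_wav(path):
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))

def write_txt(path, samples):
    with open(path, "w") as f:
        f.write(",".join(str(s) for s in samples))

def read_txt(path):
    with open(path) as f:
        return [int(token) for token in f.read().split(",")]

def compare_storage():
    """Write the samples both ways; return (wav bytes, txt bytes, round trip ok)."""
    samples = make_samples()
    with tempfile.TemporaryDirectory() as folder:
        wav_path = os.path.join(folder, "sample.wav")
        txt_path = os.path.join(folder, "sample.txt")
        write_wav(wav_path, samples)
        write_txt(txt_path, samples)
        ok = read_wav(wav_path) == samples and read_txt(txt_path) == samples
        return os.path.getsize(wav_path), os.path.getsize(txt_path), ok

wav_size, txt_size, round_trip_ok = compare_storage()
print(wav_size, txt_size, round_trip_ok)
```

The binary file needs a fixed two bytes per sample, while the text file spends one byte per digit, sign and delimiter and must be parsed token by token when read back, which is one plausible reason a text reader can be slower than reading binary audio and recomputing the spectrogram.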
Testing concluded in the results seen in Table 5.2 and Figure 5.13.
Figure 5.12: Times to run pattern recognition vs size
of library (defined as number of samples).
[Plot title: Time to run through comparison vs number of samples in library. Axes: Number of samples (horizontal), t in s (vertical).]
Table 5.2: Average Time (T) of opening and Size (S) of wav and txt files.
Unexpectedly, it was found that not only did option 1 save on file size (as expected), it also operated
much faster. This means that the function dlmread is actually slower than wavread and specCreate
combined. An interesting result.
5.4 Returning Results
Due to constraints in finding an efficient way to do speech recognition, results were rather poor, with
efficiency around 25% as seen in Table 5.3. Interestingly, once the tester was able to get into a groove
of saying the letter in a certain way (presumably matching a library entry), efficiency could spike to
100% as seen in the second w testing. This suggests that efficiency would improve with a larger library.
Figure 5.13: Results of testing comparing size (bottom) and speed (top) of stored .wav's
(red) vs. .txt (blue) files producing spectrograms.
[Figure panels: t in s vs Trial number (top); size in kB vs Trial number (bottom). .wav in red, .txt in blue.]
Table 5.3: Results of Speech Recognition. Top row is trial number, values are those returned.
6 Conclusions and Recommendations
6.1 Conclusions on Project Objectives
The efficiency of ~25% was far below the wanted efficiency of 60%. However, it was determined that
this was due to the small library size (only three samples per entry). With a larger number of samples in
the library, it can be stated with confidence that efficiency would improve.
The use of Dynamic Time Warping for speech recognition was proven to be a viable method. That said,
its returns are poorer than one would hope, due in large part to the fact that it is very difficult to normalize
the match goodness values.
Two interesting and unexpected discoveries were made during the course of the project. The first
pertains to the creation of the library for use in DTW pattern recognition. While it was known that
saving files in the Microsoft wav format would save physical space compared to saving as a delimited
text file, it was assumed that not having to convert into a spectrogram after reading would give the
delimited text file an edge in computational speed. However, testing showed that the function dlmread
(when reading in the spectrogram) was in fact slower than the combination of wavread (reading in the
recorded audio) plus specCreate.m (which converts the audio into its spectrogram).
The other discovery was in the attempt to increase the speed of the DTW by putting a check in the trace
code. It was thought that by finishing once an out of bounds situation was reached, it might be
possible to reduce the computational time. However, it was found that any speed gained from
breaking early was negligible. Whether this was due to the added time of checking the if statements,
or whether the process was already fast enough that any difference was statistical noise, was not explored.
6.2 Recommendations
While the Speech Recognition presented is viable, its effectiveness is below that which can be found on
the market for a comparable price.
Appendix A: Computer Software Design Tools
C#
C# is a multi-paradigm programming language encompassing object-oriented (class-based)
programming disciplines. It is a Microsoft product within the .NET initiative.
Microsoft Visual C# 2008 Express Edition was used during the creation of this project (mainly the
component created by Mr. Hernandez). It was a free program with registration.
MathWorks MATLAB
MATLAB stands for "Matrix Laboratory" and is a numerical computing environment developed by
MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of
algorithms, and interfacing with programs written in other languages. It excels in the manipulation of matrices.
A 2007 Student Edition was used during the creation of this project. It is available from Mathworks at a
cost of $99USD. [I already had a copy.]
Data Acquisition Toolbox
From Mathworks: "Data Acquisition Toolbox™ software provides a complete set of tools for analog
input, analog output, and digital I/O from a variety of PC-compatible data acquisition hardware. The
toolbox lets you configure your external hardware devices, read data into MATLAB® and Simulink®
environments for immediate analysis, and send out data."
It is available from Mathworks for $29USD.
Appendix B: Additional Testing Notes
Note: In online copy only. For the physical copy, these have been removed. If one wishes to view this code, they may contact the writer for information.
TESTING A
From Test File 5
If the normalization was working as I wanted it to, this should be consistent value.
Try removing the divided by in DTW. Results:
The division is not what is causing the problem.
[Figure: Results (100=match) and Value of C variable for match (scale x 10^-16); x-axis 0 to 100.]
[Figure: Results (100=match) and Value of C variable for match (scale x 10^-15); x-axis 0 to 100.]
Want to check consistency, will return division and see if it is as before, then work on creating better system.
So, yes. Good, at least there's no funny bug. Consistent results.
Second replacement, using area instead of diagonal value.
Area and diagonal produce basically same results.
[Figure: Results (100=match) and Value of C variable for match (scale x 10^-16); x-axis 0 to 100.]
[Figure: Results (100=match) and Value of C variable for match (scale x 10^-17); x-axis 0 to 100.]
TESTING B
From testFileSeven
Testing to see what kind of c values a single letter gets.
Okay, we see 69 matching (ie good match for w).
[Figure: Value of C variable for 23; x-axis 0 to 90.]
Look at a:
3 matches at 4.9343e-019, which is expected. But next closest is 69 at 0.0027. So here's the w problem.
[Figure: Value of C variable for 1; x-axis 0 to 90.]
TESTING C
From testFileEight.
Want to look at correspondence between c value and file size.
Can actually see an anti-correlation. Note that this is using an area division in the DTW.
[Figure: Value of C (red) v Size(b) for 1; x-axis 0 to 90, vertical axis 0 to 50.]
Going to remove this and try again.
Again, anti-correlation between file size and the c value being produced.
That is to say, bigger files are producing smaller c values. Hmmmm.
Also note that 'w's are some of the biggest files, perhaps accounting for the tendency to run to 0.
[Figure: Value of C (red) v Size(b) for 1; x-axis 0 to 90, vertical axis 0 to 16.]
What if I multiplied instead? Here are the results if DTW multiplied by area:
We see that now we have correlation, which we don't want either.
[Figure: Value of C (red) v Size(b) for 1; x-axis 0 to 90, vertical axis 0 to 14000.]
Testing D
From testFileNine.
Simple test on time it takes to do comparison algorithms vs how many samples are in the library.
[Figure: Time to run through comparison vs number of samples in library. Axes: Number of samples (horizontal), t in s (vertical).]
Testing E
From testFileTwo
Here is the comparison between opening up the files as .wavs and converting to spectrograms and saving as .txt files.
Can see that the .wav's are, strangely, both processed faster and stored in smaller files.
Twav = 0.0046 s, Ttxt = 0.0155 s
Swav = 8.0350 kB, Stxt = 67.1000 kB
Interesting to note that the txt is proportional, but wav is not.
[Figure panels: t in s vs Trial number; size in kB vs Trial number. .wav in red, .txt in blue.]
Testing F
From testFileTen.
Very similar to test file two. Going to get timings for various bits in relation to the size of the wav.
[Figure panels, each t in s vs Size in kB: Times of nomalizer.m vs Size; Times of usefullSig.m vs Size; Times of capAnal.m vs Size; Times of hamWindow.m vs Size; Times of specCreat.m vs Size. Final panel: Times of recorder.m for Four trials (t in s vs Trial).]
Testing G
From testFileEleven
Looking at the matchMat and DTW timing for different sized wavs.
[Figure panels, each t in s vs Size in kB: Times of matchMat.m vs Size; Times of DTW.m vs Size.]
Testing H
From testFileTwelve
Timing speechRec.m for various sizes of arrays.
Note: sample 6 of rSpec was removed as it was an extremely large size and made other results hard to read.
[Figure panels, each over trials 0 to 20: Ratio between time and size (in s/ArraySize); Time of speechRec.m in blue for trials (in s); Size of audio in files in red (in Array Size).]
Trial  tSpec (s)          sSpec (array size)  rSpec (s/array size)
1      4.72751166698561   10454               0.000452220362252306
2      5.50082608264458   17240               0.000319073438668479
3      5.53807886197827   18265               0.000303207164630620
4      5.02077966613075   12890               0.000389509671538460
5      5.79221702758311   20603               0.000281134641925114
6      3.87255475207044   320                 0.0121017336002201
7      5.82172762180668   20223               0.000287876557474494
8      5.92860489252126   20626               0.000287433573767151
9      4.43773356669633   8183                0.000542311324293820
10     5.57504354603728   18018               0.000309415226220295
11     5.82292959021328   20620               0.000282392317663108
12     4.65210285106068   10064               0.000462251873118112
13     5.06504150667194   13402               0.000377931764413665
14     4.07166958370407   5017                0.000811574563225847
15     5.85625526428638   20596               0.000284339447673645
16     6.10156730813553   20603               0.000296149459211548
17     5.79194227199267   19634               0.000294995531832162
18     5.74865583474995   19305               0.000297780670020717
19     5.91190165230497   18471               0.000320063973380162
20     3.85032357464426   3005                0.00128130568207796
Testing I
From testFileThirteen
This is comparing the c values of the three a's in the library currently.
This is for DTW multiplying by area. Ideally, when we have a perfect match, the results should be the same no matter the size of the array.
[Figure: c value returned for letter "1" (scale x 10^-13). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
Here is the result for dividing by diagonal, and dividing by area respectively.
[Figure: c value returned for letter "1" (scale x 10^-16). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
[Figure: c value returned for letter "1" (scale x 10^-17). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
And here is with no factoring due to size:
[Figure: c value returned for letter "1" (scale x 10^-15). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
Testing J
From testFileFourteen
Getting c's across a certain numLibSam.
This is with DTW having no factoring:
MAJOR PROBLEM:
A PERFECT MATCH SHOULD BE A PERFECT MATCH SHOULD BE A PERFECT MATCH.
[Figure: c value returned for sample "2" (scale x 10^-15). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
Here's divided by area, multiplied by area, divided by diagonal:
[Figure: c value returned for sample "2" (scale x 10^-17). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
[Figure: c value returned for sample "2" (scale x 10^-11). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
[Figure: c value returned for sample "2" (scale x 10^-16). Axes: size of array (horizontal), c value returned for perfect match (vertical).]
Testing K
From testFileFifteen
Want to see if the size of the spectrograms is affecting the c. Not sure if there's a difference between array size and spectrogram size.
This is for no factoring.
[Figure panels: c value returned for sample "2" vs area of spec (scale x 10^-15); c value returned for perfect match vs area of match matrix (scale x 10^-15).]
WONDERING IF SIZE OF SPECTROGRAM HAS TO DO WITH IT.
IF I CAN STANDARDIZE THESE, WILL IT IMPROVE?
Changed specCreate.m to have X=specgram(x);
Result:
Note: This will cause an error in matchMat.m as dimensions will no longer agree.
Here, all the match matrices are either 7*7 or 8*8.
[Figure panels: c value returned for sample "2" vs area of spec (scale x 10^-14); c value returned for perfect match vs area of match matrix (scale x 10^-14).]
Testing L
From testFileSixteen
With division:
No division:
[Figure: c values for library entries (scale x 10^-17). Axes: library entry (horizontal), c value (vertical).]
[Figure: c values for library entries (scale x 10^-15). Axes: library entry (horizontal), c value (vertical).]
In respect to size, no division:
Putting division back in:
[Figure: c values for library entries (scale x 10^-14). Axes: library entry size (horizontal), c value (vertical).]
[Figure: c values for library entries (scale x 10^-17). Axes: library entry (horizontal), c value (vertical).]
Crazy thought!
Do p q trace through. Take these and divide by diagonal.
With these changes, results for c by size are:
[Figure: c values for library entries, 0 to 0.8. Axes: library entry (horizontal), c value (vertical).]
Here I'm going to go through the tests of the previous testing data with the new DTW.
Testing A
Woops. Not a success after all. Why does it seem to be working in testFileSixteen but not other test files?
Creating testFileSeventeen to mimic testFileSixteen except that will use DTW.m.
Here's with all perfect matches (x axis is library entry size):
[Figure: c values for library entries, 0 to 0.8. Axes: library entry (horizontal), c value (vertical).]
Now to test all against 'a':
Perfect match is about 0.7071.
Going to modify speechRec so that it is closest to 0.7071 rather than lowest which is match.
[Figure: c values for library entries, 0 to 1. Axes: library entry (horizontal), c value (vertical).]
Results of testFileFive post modification.
Doesn't make sense. A constant c value? And all of them are 6.781186547621942e-06?
Something's not working the way I think it's working.
Why do I have outs=0? That means r is never changing, ie we never get abs(DTW(M)-cp)<cmin.
But that should be impossible as I've determined that a perfect match produces a value of 0.7071, and I'm guaranteed to get at least one perfect match, producing a c~0.
Code must not work like I'm thinking it does.
Mistake in the code caused some problems: had it as 1:num instead of 0:num!
Running through code, problem becomes apparent. 6.781186547510920e-06 is the smallest value the comparison produces (it is not a limit of MatLab itself, whose double precision resolves far smaller numbers). As a result, multiple returns are all giving this value back to me.
[Figure panels: Results (100=match); Value of C variable for match (scale x 10^-6).]
Okay, getting shit results. Not much to do about that now.
New day, new idea.
By taking length of q or p, am I not getting the number of steps? I believe so. This makes results closer than they actually are.
Nope, this is good.
Testing B
From testFileTwenty.
Testing if the code I've written to pop out of the back trace early actually has an effect on speed.
With ideal1=ideal+m/6;
Results:
t0 = 37.1087 t1 = 37.2153
t0 = 37.0801 t1 = 38.6007
t0 = 37.1170 t1 = 37.4828
Result: As suspected, the addition of code to catch out of bounds appears, over an extreme sample, less efficient than not having the code.
Testing C
From testFileTwentyOne
Testing the results when first doing a perfect match (x=1), then a known broken one (x=2).
Graphical results of five trials below. Appears to be statistical noise.
[Figure: two panels. Left: Results (100=match). Right: Value of C variable for match (constant at about 6.7812 x 10^-6).]
Conclusion: the increased chance of having done something wrong is not worth the negligible benefit.
Testing D
From testFileTwelve
Showing c values of 'b' matches.
[Figure: DTW in blue, DTWTHREE in red (y-axis scaled by 10^-3).]
Testing Successful Recognition
Letter 1 2 3 4 5 6 7 8 9 10 %
[Figure: c values for library entries. x-axis: library entry; y-axis: c value.]
[Figure: size of match matrix for library entries. x-axis: library entry; y-axis: size of match matrix.]
[Figure: c values for size of match matrix. x-axis: size of match matrix; y-axis: c value.]
Appendix C: Code of Software Elements
Note: In online copy only. For the physical copy, these have been removed. Anyone wishing to view this code may contact the author.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: cepAnal.m
% Author: Brett A. Lindsay 0648981
% Required Files:
%
% Function: cepAnal.m will perform cepstral analysis on the input
% signal in order to remove the effects of the speaker's
% vocal tract.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = cepAnal(x)
c=cceps(x);
pass=int16(length(c)/6);
mask=ones(length(c),1);
mask(pass:length(c)-pass,1)=mask(pass:length(c)-pass,1)-1;
c=c.*mask;
x=icceps(c);
out = x;
return
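The mask that cepAnal applies to the cepstrum can be looked at on its own. The sketch below reproduces only the mask construction on a short dummy vector; it does not call cceps or icceps, and the vector c is a stand-in rather than a real cepstrum.

```matlab
% Sketch: the mask built in cepAnal, applied to a short dummy cepstrum.
c = (1:12)';                        % stand-in for cceps(x); values are arbitrary
pass = round(length(c)/6);          % same ratio as cepAnal (int16 also rounds)
mask = ones(length(c),1);
mask(pass:length(c)-pass) = 0;      % zero the middle samples of the cepstrum
kept = find(mask)';                 % indices that survive the mask
disp(kept);
```

For a length of 12 the surviving indices are 1, 11 and 12, so the mask as written keeps one more sample at the end of the cepstrum than at the start.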
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: DTW.m
% Author: Brett A. Lindsay 0648981
% Required Files:
%
% Function: DTW.m returns the value of the quickest path through the
% local match matrix of two audio signals (M), normalised.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = DTW(M)
M=1-M; %need to find lowest path.
[m,n]=size(M);
D=zeros(m+1,n+1); %create matrix to trace through.
D(1,:) = NaN;
D(:,1) = NaN;
D(1,1)=0;
D(2:m+1,2:n+1)=M;
phi=zeros(m,n);
for i=1:m
for j=1:n
[dmax,tb]=min([D(i,j),D(i,j+1),D(i+1,j)]);
D(i+1,j+1)=D(i+1,j+1)+dmax;
phi(i,j)=tb; %store the traceback direction for this cell.
end
end
% figure,imagesc(D),colormap(gray);
i=m;j=n;p=m;q=n;
% out=0;
while i>1 && j>1
tb=phi(i,j);
if tb==1
i=i-1;
j=j-1;
elseif tb==2
i=i-1;
elseif tb==3
j=j-1;
else
break;
end
p=[i,p];
q=[j,q];
% out=out+1;
end
%portion for returning trace value.
out=0;
if (p(1,1)>1)
out=p(1,1);
else
out=q(1,1);
end
out=(out+length(p)-1)*10000;
% D=D(2:m+1,2:n+1);
% out=D(size(D,1),size(D,2));
out=out/sqrt(m^2+n^2); %divide by diagonal so that all answers are equally
%weighted.
end
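The accumulate-and-trace-back idea used in DTW.m can be followed on a small example. This is a sketch under stated assumptions, not the project's code: the similarity matrix S is made up, the border is padded with Inf rather than NaN, and the score is simply the path length over the matrix diagonal.

```matlab
% Sketch: accumulate costs, then trace back the cheapest path (illustrative).
S = [1.0 0.2 0.1;                      % hypothetical similarity matrix
     0.2 1.0 0.3;                      % (1 = identical frames)
     0.1 0.3 1.0];
C = 1 - S;                             % turn similarity into cost
[m,n] = size(C);
D = inf(m+1,n+1);                      % padded accumulator, Inf on the border
D(1,1) = 0;
phi = zeros(m,n);                      % 1 = diagonal, 2 = up, 3 = left
for i = 1:m
    for j = 1:n
        [best,tb] = min([D(i,j), D(i,j+1), D(i+1,j)]);
        D(i+1,j+1) = C(i,j) + best;    % cheapest way to reach cell (i,j)
        phi(i,j) = tb;                 % remember which neighbour was used
    end
end
i = m; j = n; p = i; q = j;            % start the trace at the last cell
while i > 1 || j > 1
    switch phi(i,j)
        case 1
            i = i-1; j = j-1;
        case 2
            i = i-1;
        case 3
            j = j-1;
    end
    p = [i,p];                         % row indices of the path
    q = [j,q];                         % column indices of the path
end
score = length(p)/sqrt(m^2+n^2);       % path length over matrix diagonal
```

Because the strongest similarities lie on the diagonal here, the traced path is p = q = [1 2 3] and the score equals 1/sqrt(2). Storing the direction per cell, phi(i,j), is what lets the trace follow the path all the way back.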
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: DTW.m
% Author: Brett A. Lindsay 0648981
% Required Files:
%
% Function: .
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = DTWORIGINAL(M)
M=1-M; %need to find lowest path.
[m,n]=size(M);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Attempting to create a faster working DTW by capping the trace through.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = DTWTHREE(M)
M=1-M; %need to find lowest path.
[m,n]=size(M);
D=zeros(m+1,n+1); %create matrix to trace through.
D(1,:) = NaN;
D(:,1) = NaN;
D(1,1)=0;
D(2:m+1,2:n+1)=M;
phi=zeros(m,n);
for i=1:m
for j=1:n
[dmax,tb]=min([D(i,j),D(i,j+1),D(i+1,j)]);
D(i+1,j+1)=D(i+1,j+1)+dmax;
phi(i,j)=tb;
end
end
% figure,imagesc(D),colormap(gray);
i=m;j=n;p=m;q=n;
while i>1 && j>1
tb=phi(i,j);
if tb==1
i=i-1;
j=j-1;
elseif tb==2
i=i-1;
elseif tb==3
j=j-1;
else
break;
end
p=[i,p];
q=[j,q];
%Breaking code.
%p is vertical, q is horizontal of trace.
%Idea:
%Take the vertical size of the matrix, ie 14
%third it = ~4
%if the vertical value is greater or less than this distance from the
% ideal p, then it's too far out.
% ideal p=q*tan(atan(m/n));
%ie if at q=10, if p<3 or p>17, it's a poor match.
ideal=q(1,1)*tan(atan(m/n));
% Author: Brett A. Lindsay 0648981
% Required Files: speechRec.m
%
% Function: hamWindow.m will apply a hamming window to the signal,
% before returning the data.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = hamWindow(x)
w=window(@hamming,length(x));
x=x.*w;
out = x;
return
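For reference, the taper applied by hamWindow can be written out from the usual closed form of the symmetric Hamming window. The sketch below avoids the toolbox call and assumes a short dummy signal; N, k and y are illustrative names.

```matlab
% Sketch: symmetric Hamming window from its closed form (no toolbox call).
N = 8;                                   % illustrative signal length
k = (0:N-1)';
w = 0.54 - 0.46*cos(2*pi*k/(N-1));       % standard Hamming coefficients
x = ones(N,1);                           % dummy signal
y = x.*w;                                % element-wise taper, as in hamWindow
```

The end points of w are 0.08 rather than zero, so the start and end of the clipped word are attenuated but not removed.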
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: libCreate.m
% Author: Brett A. Lindsay 0648981
% Required Files: recorder.m
% preEmphasis.m
% normalizer.m
% hamWindow.m
% usefullSig.m
% specCreate.m
% cepAnal.m
%
% Function: libCreate.m will be used to input audio signals into a
% reference sound library for use with the speech
% recognition block of the project.
% Assumes fs=8192 Hz.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function libCreate()
fs=8192;
audioIn = recorder(); %get audio signal
%!!!!!!
%Need to look into effectiveness of preEmphasis network.
% Seems pointless for discrete SR.
%audioIn = preEmphasis(audioIn); %pass through pre-emphasis network
audioIn = normalizer(audioIn);
audioIn = usefullSig(audioIn);
%sound(audioIn,fs);
audioIn = hamWindow(audioIn);
%This may need more work.
audioIn = cepAnal(audioIn);
%Testing showed it was better to save these as .wav's and convert them to
% spectrograms when needed.
%audioIn=specCreate(audioIn);
%dlmwrite(['Library/test' libNum '.txt'], audioIn, 'delimiter',
% ...'\t','precision', 4);
libNum=input('Please input number to be associated with file (ie 10-999).','s');
%String.
%wavwrite(audioIn,fs,['Library/test' libNum '.wav']);
wavwrite(audioIn,fs,['Library/lib' libNum '.wav']);
%wavwrite(audioIn,fs,['Library/setUp' libNum '.wav'])
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: matchMat.m
% Author: Brett A. Lindsay 0648981
% Required Files:
%
% Function: matchMat.m creates the "local match matrix".
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = matchMat(A,B)
A=abs(A); B=abs(B); %Need absolute values
%Calculates the cosine of the angle between each column of A and each
% column of B.
%sA and sB hold the Euclidean norm of every column, so that when A and B
% are multiplied the products can be normalised back to the range 0 to 1.
sA= sqrt(sum(A.^2));
sB = sqrt(sum(B.^2));
Mat = (A'*B)./(sA'*sB);
out = Mat;
end
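A small worked example of the same calculation may help when reading the local match matrix. The matrix A below is made up (three frames stored as columns); it is compared with itself, so the diagonal should come out as 1.

```matlab
% Sketch: cosine similarity between the columns of two small matrices.
A = [1 0 1;                     % hypothetical spectrogram magnitudes,
     0 1 1];                    % one frame per column
B = A;                          % compare the signal with itself
sA = sqrt(sum(A.^2));           % Euclidean norm of each column of A
sB = sqrt(sum(B.^2));           % Euclidean norm of each column of B
Mat = (A'*B)./(sA'*sB);         % entry (i,j): cosine between A(:,i) and B(:,j)
disp(Mat);
```

Frames 1 and 2 are orthogonal, so their entry is 0, while frame 3 sits at 45 degrees to both and gives 1/sqrt(2).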
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: normalizer.m
% Author: Brett A. Lindsay 0648981
% Required Files: speechRec.m
%
% Function: normalizer.m will scale the data to a peak magnitude of 0.5,
% then pass it back.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = normalizer(x)
x = 0.5*x/max(abs(x));
out = x;
return
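The effect of this scaling on a short made-up vector, as a sketch:

```matlab
% Sketch: peak normalisation to 0.5, using the same expression as normalizer.
x = [0.1 -2 0.5]';                % hypothetical samples
y = 0.5*x/max(abs(x));            % largest magnitude becomes 0.5
disp(y');
```

The sign and relative shape of the signal are preserved; only the overall level changes.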
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: recorder.m
% Author: Brett A. Lindsay 0648981
% Required Files: speechRec.m
%
% Function: recorder.m will return a 3 second audio signal.
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [out fs]= recorder()
fs = 8192; %in Hz, default sampling frequency for sound(), etc.
t=3; %in s, number of seconds to record for.
ai_length = t*fs;
% Set up MatLab Oscilloscope / Winsound Analoginput
ai = analoginput('winsound');
addchannel(ai, 1);
set(ai, 'SampleRate', fs);
set(ai, 'TriggerType', 'manual');
set(ai, 'TriggerRepeat', 0);
set(ai, 'SamplesPerTrigger', ai_length);
%Look into changing this from a manual trigger to a rising edge:
%set(ai, 'TriggerType', 'software');
%set(ai, 'TriggerCondition', 'Rising');
%set(ai, 'TriggerConditionValue', 0.01);
%set(ai, 'TriggerChannel', ai.Channel(1));
%set(ai, 'TriggerDelay', -0.1);
%set(ai, 'TriggerDelayUnits', 'seconds');
%set(ai, 'TimeOut', 10);
% Get data from the microphone
beep on;
beep;
start(ai);
trigger(ai);
data = getdata(ai);
beep;
delete(ai);
out = data; %return the audio input.
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: specCreate.m
% Author: Brett A. Lindsay 0648981
% Required Files: speechRec.m
%
% Function: specCreate.m will create a spectrogram out of the input
% audio signal x.
% Assumes x is a (length,1) vector and fs=8192 Hz.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = specCreate(x,fs)
%var=min(256,length(x));
%S = specgram(a,nfft,fs,window,numoverlap)
%x is the signal;
%window is the window WIDTH. ->use 512.
%noverlap = length of the window/2
%nfft=min(256,length(a)) is the default, seems good.
%fs is assumed to be 8192 Hz.
%X = specgram(x,var,fs,var,var/2);
%Simpler form has max of 8 time windowing periods, not very accurate for
%DTW (?):
% X=spectrogram(x);
X = specgram(x,512,fs,512,384);
out = X;
return
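The size of the spectrogram that specCreate returns follows from the window and overlap settings. The sketch below works this out for an untrimmed 3 second recording; it assumes the column-count rule given in the specgram documentation, fix((n-noverlap)/(window-noverlap)), and a real-valued input. Signals that have been through usefullSig are shorter, so their frame counts will be smaller.

```matlab
% Sketch: expected spectrogram size for specgram(x,512,fs,512,384).
fs = 8192;                              % sampling rate in Hz
t = 3;                                  % recording length in s
n = t*fs;                               % number of samples
win = 512;                              % window length (also nfft)
noverlap = 384;                         % samples shared by adjacent windows
hop = win - noverlap;                   % step between windows
numFrames = fix((n - noverlap)/hop);    % columns of the spectrogram
numBins = win/2 + 1;                    % rows for a real input, even nfft
fprintf('%d bins x %d frames\n', numBins, numFrames);
```

The number of frames sets one dimension of the local match matrix, which is why the length of the clipped word affects the DTW timings.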
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: speechRec.m
% Author: Brett A. Lindsay 0648981
% Required Files: library
% recorder.m
% preEmphasis.m
% normalizer.m
% hamWindow.m
% usefullSig.m
% specCreate.m
% cepAnal.m
% matchMat.m
% DTW.m
%
% Function: speechRec.m will be called by Jon Hernandez's c# program,
% which will pass in a character to be checked.
% speechRec.m will signal the user for audio input (ie.
% their answer/command), record this, process this, and test
% against a library using the method of Dynamic Time Warping.
% speechRec.m will then return information about the
% character being tested or if a command was entered.
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = speechRec(in)
[audioIn fs]= recorder(); %get audio signal
%!!!!!!
%Need to look into effectiveness of preEmphasis network.
% Seems pointless for discrete SR.
%audioIn = preEmphasis(audioIn); %pass through pre-emphasis network
audioIn = normalizer(audioIn);
audioIn = usefullSig(audioIn);
audioIn = cepAnal(audioIn);
audioIn = hamWindow(audioIn);
%[audioIn,fs] = wavread(['Library/lib' int2str(in) int2str(1) '.wav']);
audioIn=specCreate(audioIn,fs);
%Comparison loop.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 5 entries of 'a').
cmin=500; %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
cp=0.7071*10000; %From experimental data, if the DTW block produces a value of
%0.7071 then this is a perfect match. This value is normalised
%for any size difference, et cetera.
r=0; %r is the variable for which is the current lowest match c.
%if r stays as 0, we therefore never achieved a c lower than
%min and don't have a match.
for m=1:numLibEnt
for n=0:numLibSam
[x fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(x,fs);
M=matchMat(audioIn,Y);
ctemp=abs(DTW(M)-cp);
if (ctemp<c)
c=ctemp;
r=m;
end
end
end
%returning block.
% 1-30 - incorrect character (1-30).
% 50 - no satisfactory match.
% 100 - correct character.
if r==0
out=50;
elseif r==in;
out=100;
else
out=r;
end
return
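The selection rule and return codes in speechRec can be traced by hand on a few numbers. In the sketch below the DTW scores are invented and there is only one score per library entry; cp, cmin and the codes 50 and 100 are taken from the listing above.

```matlab
% Sketch: pick the library entry whose score is closest to the target cp.
scores = [7500 7080 6900];        % hypothetical DTW outputs for entries 1..3
cp = 0.7071*10000;                % target value for a perfect match
cmin = 500;                       % largest distance still accepted
in = 2;                           % character the caller expects
c = cmin;
r = 0;                            % stays 0 if nothing is close enough
for m = 1:numel(scores)
    ctemp = abs(scores(m) - cp);  % distance from the perfect-match value
    if ctemp < c
        c = ctemp;
        r = m;
    end
end
if r == 0
    out = 50;                     % no satisfactory match
elseif r == in
    out = 100;                    % correct character
else
    out = r;                      % index of the character that was heard
end
```

Entry 2 is 9 away from cp, closer than entries 1 and 3, so r is 2 and out is 100. Note that an expected character numbered 50 or 100 would be indistinguishable from the status codes, which is not a problem with only 30 library entries.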
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% To be used in the
% Electrical and Computer Engineering Project
% Submitted in partial fulfillment of the requirements for the degree
% of Bachelor of Engineering at McMaster University
%
% To be used in conjunction with projects of
% Chris Agam & Jon Hernandez
%
% File: usefullSig.m
% Author: Brett A. Lindsay 0648981
% Required Files: speechRec.m
%
% Function: usefullSig.m will clip out the parts of the signal which
% are considered useful (ie. it will remove the beginning
% and end, before and after user has spoken).
% Assumes x is a (length,1) vector.
% Sensitivity (z) should be - if soft spoken, + if loud.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = usefullSig(x)
%sensitivity.
z=-1;
if z<0
s=0.4;
elseif z>0
s=0.2;
else
s=0.3;
end
%Note: Changing Threshold will dramatically change matching ability
% Would like to have more adaptive thresh.
thresh=s*max(x); %s% of maximum.
l=length(x);
%f's for os of 10,20,50,100 respectively, with length of ~24k
%f=0.0004069;
%f=0.0008138;
%f=0.002;
f=0.0041;
os=floor((1+s)*f*l); %offset.
a=0; %The lower bound.
as=1;
b=0;
bs=1;
for i=1:l
if (as && abs(x(i,1))>thresh)
a=i-os;
as=0;
end
if (bs && abs(x(l-i+1,1))>thresh) %l-i+1 runs from l down to 1, never 0.
b=l-i+1+os;
bs=0;
end
end
%Without these, there is the potential to go outside the vector bounds.
if a<1
a=1;
end
if b>l
b=l;
end
% Trying to solve w problem:
% out(length(x),1)=0;
% out(a:b,1) = x(a:b,1);
out = x(a:b,1);
return
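The same clipping idea can be expressed with find instead of an explicit loop. This is an alternative sketch, not the routine above: the signal is made up, the threshold is taken from max(abs(x)) rather than max(x), and the offset os is fixed at one sample for readability.

```matlab
% Sketch: threshold-based clipping of leading and trailing silence.
x = [0 0 0.01 0.5 -0.8 0.3 0.02 0 0]';   % hypothetical recording
s = 0.4;                                 % threshold as a fraction of the peak
thresh = s*max(abs(x));                  % 0.32 for this signal
os = 1;                                  % samples kept either side (assumed)
idx = find(abs(x) > thresh);             % samples louder than the threshold
a = max(idx(1) - os, 1);                 % lower bound, clamped to the vector
b = min(idx(end) + os, length(x));       % upper bound, clamped to the vector
out = x(a:b);
disp(out');
```

Here samples 4 and 5 exceed the threshold, so the clip runs from sample 3 to sample 6. A recording with no sample above the threshold would leave idx empty and would need a separate check before indexing.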
TEST FILE 1
clc;clear;close all;
fs=8192;
t=0:1/fs:3-1/fs;
audioIn=recorder();
figure,plot(t,audioIn);
sound(audioIn,fs)
%audioIn = preEmphasis(audioIn);
%figure,plot(audioIn);
audioIn = normalizer(audioIn);
figure,plot(t,audioIn);
sound(audioIn,fs)
audioIn = usefullSig(audioIn);
figure,plot(audioIn);
sound(audioIn,fs)
audioIn=cepAnal(audioIn);
figure,plot(audioIn)
sound(audioIn,fs);
audioIn = hamWindow(audioIn);
figure,plot(audioIn);
sound(audioIn,fs)
pause(1);
var=min(256,length(audioIn));
figure, specgram(audioIn,512,fs,512,384);
%figure, specgram(B,var,fs,var,var/2);
%figure, specgram(C,512,fs,512,384);
pause(1);close all;
TEST FILE 2
% This code is used to test the time it takes to open and convert a .wav file
% vs storing the data as spectrograms in a .txt file and reading them directly.
clc;clear;close all;
t=1:4;
t1=zeros(1,4);
t2=zeros(1,4);
T1=zeros(1,4);
T2=zeros(1,4);
stxt=[51.3 111 89.8 16.3];
swav=[6.15 13.4 10.2 2.39];
Stxt=zeros(1,4);Swav=zeros(1,4);
for i=1:4
tic;
C=dlmread(['Library/test00' int2str(i) '.txt']);
t2(1,i)=toc;
end
for i=1:4
tic;
[audioIn fs]=wavread(['Library/test00' int2str(i) '.wav']);
C=specCreate(audioIn,fs);
t1(1,i)=toc;
end
T1(1,:)=sum(t1)/length(t1) %#ok<NOPTS>
T2(1,:)=sum(t2)/length(t2) %#ok<NOPTS>
Swav(1,:)=sum(swav)/length(swav) %#ok<NOPTS>
Stxt(1,:)=sum(stxt)/length(stxt) %#ok<NOPTS>
figure(1), subplot(2,1,1),plot(t,t1,'r',t,t2,'b',t,T1,'--r',t,T2,'--b'),ylabel('t (in s)'),xlabel('Trial number'),title('.wav in red, .txt in blue');
subplot(2,1,2),plot(t,stxt,'b',t,swav,'r',t,Stxt,'--b',t,Swav,'--r'),ylabel('size (in kB)'),xlabel('Trial number');
TEST FILE 3
clc;clear;close all;
fs=8192;
A=wavread('Library/lib271.wav');
var=min(256,length(A));
A=specgram(A,var,fs,var,var/2);
B=wavread('Library/lib272.wav');
var=min(256,length(B));
B=specgram(B,var,fs,var,var/2);
M=matchMat(A,B);
c=DTW(M); %named c so that the built-in min() is not shadowed.
TEST FILE 4
clc;clear;close all;
a=1;tic;
out=speechRec(a);
toc;
TEST FILE 5
% The purpose of this test is to see if the algorithm can recognise files
% already in the system.
%
%If it can't, then the algorithm is fundamentally broken.
%
%The results will be stored in an array.
%
%Hopefully, it is all 100s.
%NOTE: Update, successful test.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 5 entries of 'a').
out=0;in=0; %predefining.
outArray=zeros(1,numLibEnt*(1+numLibSam));
outArrayCount=1;
cArray=zeros(1,numLibEnt*(1+numLibSam));
cArrayCount=1;
for a=1:numLibEnt
for b=0:numLibSam
%Read in each file.
[x fs]=wavread(['Library/lib' int2str(a) int2str(b) '.wav']);
X=specCreate(x,fs);
%comparison loops.
cmin=1; %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
cp=0.7071; %From experimental data, if the DTW block produces a value of
%0.7071 then this is a perfect match. This value is normalised
%for any size difference, et cetera.
r=0; %r is the variable for which is the current lowest match c.
%if r stays as 0, we therefore never achieved a c lower than
%min and don't have a match.
for m=1:numLibEnt
for n=0:numLibSam
[x fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(x,fs);
M=matchMat(X,Y);
ctemp=abs(DTW(M)-cp);
if (ctemp<c)
c=ctemp;
r=m;
end
end
end
cArray(1,cArrayCount)=c;cArrayCount=cArrayCount+1;
%returning block.
% 1-30 - incorrect character (1-30).
% 50 - no satisfactory match.
% 100 - correct character.
in=a;%want to see if the found r is a.
if r==0
out=50;
elseif r==in;
out=100;
else
out=r;
end
outArray(1,outArrayCount)=out;
outArrayCount=outArrayCount+1;
end
end
figure(1),
subplot(1,2,1),plot(outArray),title('Results (100=match)');
subplot(1,2,2),plot(cArray),title('Value of C variable for match');
TEST FILE 6
% This file is created to run through all the files in the library and listen
% to them, for personal understanding.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 5 entries of 'a').
out=0;in=0; %predefining.
outArray=zeros(1,numLibEnt*(1+numLibSam));
outArrayCount=1;
for a=1:numLibEnt
for b=0:numLibSam
%Read in each file.
[x fs]=wavread(['Library/lib' int2str(a) int2str(b) '.wav']);
sound(x,fs);
end
end
TEST FILE 7
% This test is to take a look at what c values a letter will generate over
% the whole testing algorithm.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 5 entries of 'a').
cArray=zeros(1,numLibEnt*(1+numLibSam));
cArrayCount=1;
%File to test (w=23)
in=1;
[x fs]=wavread(['Library/lib' int2str(in) '2.wav']);
X=specCreate(x,fs);
%comparison loops.
cmin=1; %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
r=0; %r is the variable for which is the current lowest match c.
%if r stays as 0, we therefore never achieved a c lower than
%min and don't have a match.
for m=1:numLibEnt
for n=0:numLibSam
[y fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(y,fs);
M=matchMat(X,Y);
ctemp=DTW(M);
%Added code here:
cArray(1,cArrayCount)=ctemp;
cArrayCount=cArrayCount+1;
if ctemp<c
c=ctemp;
r=m;
end
end
end
%returning block.
% 1-30 - incorrect character (1-30).
% 50 - no satisfactory match.
% 100 - correct character.
if r==0
out=50;
elseif r==in;
out=100;
else
out=r;
end
figure(1),plot(cArray),title(['Value of C variable for ' int2str(in)]);
TEST FILE 8
% Here I want to see the correspondence between the size of the audio file
% and the c values returned.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 5 entries of 'a').
cArray=zeros(1,numLibEnt*(1+numLibSam));
cArrayCount=1;
sArray=zeros(1,numLibEnt*(1+numLibSam));
sArrayCount=1;
%File to test
in=1;
[x fs]=wavread(['Library/lib' int2str(in) '2.wav']);
X=specCreate(x,fs);
%comparison loops.
cmin=1; %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
r=0; %r is the variable for which is the current lowest match c.
%if r stays as 0, we therefore never achieved a c lower than
%min and don't have a match.
for m=1:numLibEnt
for n=0:numLibSam
[y fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(y,fs);
M=matchMat(X,Y);
ctemp=DTW(M);
%Added code here:
cArray(1,cArrayCount)=ctemp;
cArrayCount=cArrayCount+1;
sArray(1,sArrayCount)=size(y,1);
sArrayCount=sArrayCount+1;
if ctemp<c
c=ctemp;
r=m;
end
end
end
%returning block.
% 1-30 - incorrect character (1-30).
% 50 - no satisfactory match.
% 100 - correct character.
if r==0
out=50;
elseif r==in;
out=100;
else
out=r;
end
% figure(1),subplot(1,2,1),plot(cArray,'r'),title(['Value of C (red) v Size(b) for ' int2str(in)]);
% subplot(1,2,2),plot(sArray,'b');
figure(1),plot(cArray,'r'),title(['Value of C (red) v Size(b) for ' int2str(in)]);
hold on;plot(sArray,'b');hold off;
TEST FILE 9
% Here, this is going to be a test for how much time a certain number of
% iterations of the library will take (ie if the library has x samples)
% in regards to the comparison block.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 5 entries of 'a').
mod=20; %number of times to modify the run through.
timeArray=zeros(1,mod);
%File to test
in=1;
[x fs]=wavread(['Library/lib' int2str(in) '2.wav']);
X=specCreate(x,fs);
cmin=1; %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
r=0; %r is the variable for which is the current lowest match c.
%if r stays as 0, we therefore never achieved a c lower than
%min and don't have a match.
for m=1:numLibEnt
for l=1:mod
tic;
for n=0:numLibSam
[y fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(y,fs);
M=matchMat(X,Y);
ctemp=DTW(M);
if ctemp<c
c=ctemp;
r=m;
end
end
t=toc;
t=t/3; %going to take an average for better results.
if l==1
timeArray(1,l)=t;
else
timeArray(1,l)=t+timeArray(1,l-1);
end
end
end
figure(1),plot(timeArray),title('Time to run through comparison vs number of samples in library'),xlabel('Number of samples'),ylabel('t (in s)');
TEST FILE 10
% Doing timings for varying sizes of wav files.
clc;clear;close all;
swav=[6.15 13.4 10.2 2.39]; %sizes of test wav files.
numSam=4;
numRun=20;
tNorm(1,numSam)=0;
tUsef(1,numSam)=0;
tCepA(1,numSam)=0;
tHamW(1,numSam)=0;
tSpec(1,numSam)=0;
tReco(1,numSam)=0;
for i=1:numSam
[x fs]=wavread(['Library/test00' int2str(i) '.wav']);
t=0;
for m=1:numRun
tic;
xp=normalizer(x);
t=t+toc;
end
t=t/numRun;
tNorm(1,i)=t;
figure(1),subplot(3,2,2),stem(swav,tNorm),title('Times of normalizer.m vs Size'),xlabel('Size in kB'),ylabel('t (in s)');
t=0;
for m=1:numRun
tic;
xp=usefullSig(x);
t=t+toc;
end
t=t/numRun;
tUsef(1,i)=t;
figure(1),subplot(3,2,3),stem(swav,tUsef),title('Times of usefullSig.m vs Size'),xlabel('Size in kB'),ylabel('t (in s)');
t=0;
for m=1:numRun
tic;
xp=cepAnal(x);
t=t+toc;
end
t=t/numRun;
tCepA(1,i)=t;
figure(1),subplot(3,2,4),stem(swav,tCepA),title('Times of cepAnal.m vs Size'),xlabel('Size in kB'),ylabel('t (in s)');
t=0;
for m=1:numRun
tic;
xp=hamWindow(x);
t=t+toc;
end
t=t/numRun;
tHamW(1,i)=t;
figure(1),subplot(3,2,5),stem(swav,tHamW),title('Times of hamWindow.m vs Size'),xlabel('Size in kB'),ylabel('t (in s)');
t=0;
for m=1:numRun
tic;
xp=specCreate(x,fs);
t=t+toc;
end
t=t/numRun;
tSpec(1,i)=t;
figure(1),subplot(3,2,6),stem(swav,tSpec),title('Times of specCreate.m vs Size'),xlabel('Size in kB'),ylabel('t (in s)');
t=0;
for m=1:numRun
tic;
xp=recorder();
t=t+toc;
end
t=t/numRun;
tReco(1,i)=t;
figure(1),subplot(3,2,1),stem([1 2 3 4],tReco),title('Times of recorder.m for Four trials'),xlabel('Trial'),ylabel('t (in s)');
end
TEST FILE 11
% Doing timing for matchMat and DTW for various sizes of wavs.
clc;clear;close all;
swav=[6.15 13.4 10.2 2.39]; %sizes of test wav files.
numSam=4;
numRun=20;
tMatc(1,numSam)=0;
tDTW(1,numSam)=0;
for i=1:numSam
[x fs]=wavread(['Library/test00' int2str(i) '.wav']);
X=specCreate(x,fs);
t=0;
for m=1:numRun
tic;
XP=matchMat(X,X);
t=t+toc;
end
t=t/numRun;
tMatc(1,i)=t;
figure(1),subplot(2,1,1),stem(swav,tMatc),title('Times of matchMat.m vs Size'),xlabel('Size in kB'),ylabel('t (in s)');
t=0;
for m=1:numRun
tic;
a=DTW(XP);
t=t+toc;
end
t=t/numRun;
tDTW(1,i)=t;
figure(1),subplot(2,1,2),stem(swav,tDTW),title('Times of DTW.m vs Size'),xlabel('Size in kB'),ylabel('t (in s)');
end
TEST FILE 12
% Doing timing for speechRec.m
function testFileTwelve()
numTri=20; %number of trials.
t=0;
tSpec(1,numTri)=0;
sSpec(1,numTri)=0;
for m=1:numTri
t=0;
tic;
[out fileSize]=speechRecTest(5);
t=toc;
tSpec(1,m)=t;
sSpec(1,m)=fileSize;
end
rSpec=tSpec./sSpec;
figure(1);
subplot(3,1,1),stem(tSpec,'r'),title('Time of speechRec.m in red for trials (in s)');
subplot(3,1,2),stem(sSpec,'b'),title('Size of audio in files in blue (in Array Size)');
subplot(3,1,3),stem(rSpec,'g'),title('Ratio between time and size (in s/ArraySize)');
t=0;%So I can stop debugger.
end
function [out fileSize]=speechRecTest(in)
[audioIn fs]= recorder(); %get audio signal
%!!!!!!
%Need to look into effectiveness of preEmphasis network.
% Seems pointless for discrete SR.
%audioIn = preEmphasis(audioIn); %pass through pre-emphasis network
audioIn = normalizer(audioIn);
audioIn = usefullSig(audioIn);
%ADDED CODE HERE TO GET SIZE
fileSize=size(audioIn,1);
audioIn = hamWindow(audioIn);
audioIn = cepAnal(audioIn);
audioIn=specCreate(audioIn,fs);
%Comparison loop.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 2 entries of 'a').
cmin=500; %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
r=0; %r is the variable for which is the current lowest match c.
%if r stays as 0, we therefore never achieved a c lower than
%min and don't have a match.
for m=1:numLibEnt
for n=0:numLibSam
[x fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(x,fs);
M=matchMat(audioIn,Y);
ctemp=DTW(M);
if ctemp<c
c=ctemp;
r=m;
end
end
end
%returning block.
% 1-30 - incorrect character (1-30).
% 50 - no satisfactory match.
% 100 - correct character.
if r==0
out=50;
elseif r==in
out=100;
else
out=r;
end
end
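The comparison loop and returning block above amount to a thresholded nearest-template search: keep the library entry with the lowest score, and report no match if nothing falls below cmin. The following is a minimal sketch of that decision only, using made-up scores in place of the DTW output. The matrix scores and its values are hypothetical and are not taken from the project's library.

```matlab
% Hypothetical DTW scores (not real data): one row per library entry,
% one column per recorded sample of that entry. Lower means a closer match.
scores = [620 540 700;
          480 455 510;
          530 495 650];
cmin = 500;                          % acceptance threshold, as in the listing above
bestPerEntry = min(scores, [], 2);   % best score found for each library entry
[c, r] = min(bestPerEntry);          % lowest score overall and the entry that produced it
if c >= cmin
    r = 0;                           % nothing beat the threshold: no satisfactory match
end
```

With these numbers the second entry is selected (r is 2, c is 455). Raising every score to cmin or above would leave r at 0, which corresponds to the "no satisfactory match" return value of 50.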
TEST FILE 13
%This is comparing the c values of the three a's in the library currently.
letter=1; %which letter to compare.
numLibSam=2; %number of library samples (ie. 0 to 2 entries of 'a').
s(1,(1+numLibSam))=0;
c(1,(1+numLibSam))=0;
for m=0:numLibSam
[x fs]=wavread(['Library/lib' int2str(letter) int2str(m) '.wav']);
X=specCreate(x,fs);
s(1,m+1)=size(x,1);
M=matchMat(X,X);
c(1,m+1)=DTW(M);
end
figure(1), stem(s,c),ylabel('c value returned for perfect match'),xlabel('size of array'),
title(['c value returned for letter "' int2str(letter) '"']);
TEST FILE 14
% Testing values of c's across libSamples.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 2 entries of 'a').
s(1,(numLibEnt))=0;
c(1,(numLibEnt))=0;
for m=1:numLibEnt
[x fs]=wavread(['Library/lib' int2str(m) int2str(numLibSam) '.wav']);
X=specCreate(x,fs);
s(1,m)=size(x,1);
M=matchMat(X,X);
c(1,m)=DTW(M);
end
figure(1), stem(s,c),ylabel('c value returned for perfect match'),xlabel('size of array'),
title(['c value returned for sample "' int2str(numLibSam) '"']);
TEST FILE 15
% Testing values of c's across libSamples, now comparing with size of
% spectrograms. C's will be for perfect match.
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=0; %number of library samples (ie. 0 to 2 entries of 'a').
aX(1,(numLibEnt))=0;
aM(1,(numLibEnt))=0;
c(1,(numLibEnt))=0;
for m=1:numLibEnt
[x fs]=wavread(['Library/lib' int2str(m) int2str(numLibSam) '.wav']);
X=specCreate(x,fs);
[mX,nX]=size(X);
aX(1,m)=mX*nX;
M=matchMat(X,X);
[mM,nM]=size(M);
aM(1,m)=mM*nM;
c(1,m)=DTW(M);
end
figure(1), subplot(2,1,1),stem(aX,c),ylabel('c value returned for perfect match'),xlabel('area of spec'),
title(['c value returned for sample "' int2str(numLibSam) '"']);
subplot(2,1,2),stem(aM,c),ylabel('c value returned for perfect match'),xlabel('area of match matrix');
TEST FILE 16
%Testing bits of Dr. Ellis' code.
clc;clear;close all;
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=0; %number of library samples (ie. 0 to 2 entries of 'a').
s(1,(numLibEnt))=0;
c(1,(numLibEnt))=0;
for m=1:numLibEnt
[d1,sr] = wavread(['Library/lib' int2str(1) int2str(numLibSam) '.wav']);
[d2,sr] = wavread(['Library/lib' int2str(m) int2str(numLibSam) '.wav']);
% Listen to them together:
ml = min(length(d1),length(d2));
soundsc(d1(1:ml)+d2(1:ml),sr)
% or, in stereo
soundsc([d1(1:ml),d2(1:ml)],sr);
D1 = specgram(d1,512,sr,512,384);
D2 = specgram(d2,512,sr,512,384);
SM = matchMat(D1,D2);
figure(1)
subplot(121)
imagesc(SM)
colormap(1-gray)
[p q C cp]=DTWTWO(1-SM);
hold on; plot(q,p,'r'); hold off
subplot(122)
imagesc(C)
hold on; plot(q,p,'r'); hold off
% c(1,m)=C(size(C,1),size(C,2))/(size(C,1)*size(C,2));
c(1,m)=cp;
s(1,m)=size(C,1);
end
figure(2),stem(s,c),xlabel('library entry'),ylabel('c value'),title('c values for library entries');
TEST FILE 17
%Testing my code in style of testFileSixteen
clc;clear;close all;
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=0; %number of library samples (ie. 0 to 2 entries of 'a').
s(1,(numLibEnt))=0;
c(1,(numLibEnt))=0;
for m=1:numLibEnt
[d1,sr] = wavread(['Library/lib' int2str(2) int2str(numLibSam) '.wav']);
[d2,sr] = wavread(['Library/lib' int2str(m) int2str(numLibSam) '.wav']);
% % Listen to them together:
% ml = min(length(d1),length(d2));
% soundsc(d1,sr)
% % or, in stereo
% soundsc(d2,sr);
D1 = specgram(d1,512,sr,512,384);
D2 = specgram(d2,512,sr,512,384);
M = matchMat(D1,D2);
cp=DTW(M);
c(1,m)=cp;
s(1,m)=size(M,2);
end
e(1,30)=0;
for m=1:30
e(1,m)=7071;
end
figure(1),subplot(3,1,1),stem(e,'r'),
hold on, stem(c,'b'),xlabel('library entry'),ylabel('c value'),title('c values for library entries');hold off;
subplot(3,1,2),stem(s,'b'),xlabel('library entry'),ylabel('Size of Match Matrix'),title('size of match matrix for library entries');
subplot(3,1,3), stem(e,'r'),
hold on,stem(s,c,'b'),xlabel('Size of Match Matrix'),ylabel('c value'),title('c values for Size of Match Matrix');hold off;
TEST FILE 18
%For getting pictures.
clc;clear;close all;
[audioIn fs]= recorder();% sound(audioIn,fs)
figure(1),plot(audioIn),xlabel('Time Axis'),ylabel('Signal Magnitude'),title('Input sound');
audioIn = normalizer(audioIn);
figure(2),subplot(2,2,1),plot(audioIn),xlabel('Time Axis'),ylabel('Signal Magnitude'),title('Normalized sound');
audioIn = usefullSig(audioIn);
figure(2),subplot(2,2,2),plot(audioIn),xlabel('Time Axis'),ylabel('Signal Magnitude'),title('Useful sound');
audioIn = cepAnal(audioIn);
figure(2),subplot(2,2,3),plot(audioIn),xlabel('Time Axis'),ylabel('Signal Magnitude'),title('Post Cepstral Filtering');
audioIn = hamWindow(audioIn);
figure(2),subplot(2,2,4),plot(audioIn),xlabel('Time Axis'),ylabel('Signal Magnitude'),title('Window`d (hamming) sound');
sound(audioIn,fs)
close all;
[y fs]=wavread(['Library/lib31.wav']);
figure(3),subplot(3,1,1),specgram(y,512,fs,512,384),title('Spectrogram of Input Sound `c`');
Y=specCreate(y,fs);
[x fs]=wavread(['Library/lib21.wav']);
subplot(3,1,2),specgram(x,512,fs,512,384),title('Spectrogram of Close Library Sound `b`');
X=specCreate(x,fs);
[z fs]=wavread(['Library/lib231.wav']);
subplot(3,1,3),specgram(z,512,fs,512,384),title('Spectrogram of Far Library Sound `w`');
Z=specCreate(z,fs);
MP=matchMat(X,X);
figure(4),subplot(3,2,1),imagesc(MP),colormap(1-gray),title('Perfectly Matching Input Specs');
[p q CP cp]=DTWTWO(1-MP);
subplot(3,2,2),imagesc(CP);hold on; plot(q,p,'r'); hold off;title('Match Matrix(left), Quickest Path (right)');
M=matchMat(X,Y);
subplot(3,2,3),imagesc(M),colormap(1-gray),title('Somewhat Matching Input Specs');
[p q C c]=DTWTWO(1-M);
subplot(3,2,4),imagesc(C);hold on; plot(q,p,'r'); hold off
MO=matchMat(X,Z);
subplot(3,2,5),imagesc(MO),colormap(1-gray),title('Poorly Matching Input Specs');
[p q CO co]=DTWTWO(1-MO);
subplot(3,2,6),imagesc(CO);hold on; plot(q,p,'r'); hold off
TEST FILE 19
%To test the newer versions of DTW.
clc;clear;
[y fs]=wavread(['Library/lib31.wav']);
Y=specCreate(y,fs);
[x fs]=wavread(['Library/lib21.wav']);
X=specCreate(x,fs);
[z fs]=wavread(['Library/lib231.wav']);
Z=specCreate(z,fs);
M=matchMat(X,Z);
imagesc(M),colormap(1-gray),title('Poorly Matching Input Specs');
c=DTWTHREE(M);
TEST FILE 20
%Will test if there's a time difference between DTW.m and DTWTHREE.m
%DTWTHREE has the breaking code.
clc;clear;
numLibEnt=30; %number of library entries, 1-30
%1-26 being alphabet, 27-30 being enter, yes, no, back.
numLibSam=2; %number of library samples (ie. 0 to 2 entries of 'a').
cmin=500; %minimum comparison accepted.
c=cmin;
ctemp=0; %#ok<NASGU>
cp=0.7071*10000; %From experimental data, if the DTW block produces a value of
%0.7071 then this is a perfect match. This value is normalised
%for any size difference, et cetera.
r=0;
tic
for a=1:numLibEnt
for b=1:numLibSam
[audioIn,fs] = wavread(['Library/lib' int2str(a) int2str(b) '.wav']);
audioIn=specCreate(audioIn,fs);
for m=1:numLibEnt
for n=0:numLibSam
[x fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(x,fs);
M=matchMat(audioIn,Y);
ctemp=abs(DTW(M)-cp);
if (ctemp<c)
c=ctemp;
r=m;
end
end
end
end
end
t0=toc %#ok<NOPTS>
tic
for a=1:numLibEnt
for b=1:numLibSam
[audioIn,fs] = wavread(['Library/lib' int2str(a) int2str(b) '.wav']);
audioIn=specCreate(audioIn,fs);
for m=1:numLibEnt
for n=0:numLibSam
[x fs]=wavread(['Library/lib' int2str(m) int2str(n) '.wav']);
Y=specCreate(x,fs);
M=matchMat(audioIn,Y);
ctemp=abs(DTWTHREE(M)-cp);
if (ctemp<c)
c=ctemp;
r=m;
end
end
end
end
end
t1=toc %#ok<NOPTS>
TEST FILE 21
%This tests more specific differences between DTW.m and DTWTHREE.m
clc;clear;
a=rand %#ok<NOPTS>
[x fs]=wavread(['Library/lib21.wav']);
X=specCreate(x,fs);
[z fs]=wavread(['Library/lib231.wav']);
Z=specCreate(z,fs);
L=30000000;
t0(1,2)=0;
t1(1,2)=0;
for m=L;
M=matchMat(X,X);
tic
c=DTW(M);
t0(1,1)=toc+t0(1,1);
end
for m=L;
M=matchMat(X,X);
tic
c=DTWTHREE(M);
t1(1,1)=toc+t1(1,1);
end
for m=L;
M=matchMat(X,Z);
tic
c=DTW(M);
t0(1,2)=toc+t0(1,2);
end
for m=L;
M=matchMat(X,Z);
tic
c=DTWTHREE(M);
t1(1,2)=toc+t1(1,2);
end
t0%#ok<NOPTS>
t1%#ok<NOPTS>
stem([1,2],t0,'b'),hold on, stem([1,2],t1,'r'),title('DTW in blue, DTWTHREE in red')
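The test files above time DTW.m and DTWTHREE.m without listing them here. For orientation only, the following is a generic textbook dynamic time warping sketch over a small local-cost matrix. It is not the report's DTW.m or DTWTHREE.m: the matrix costMat, its values, and the three-way step pattern are assumptions chosen for illustration, and no normalisation of the final cost is applied.

```matlab
% Generic dynamic time warping sketch (NOT the report's DTW.m or DTWTHREE.m).
% costMat is a hypothetical local-cost matrix: rows index frames of one
% signal, columns index frames of the other, lower values mean a closer match.
costMat = [0 1 4;
           1 0 1;
           4 1 0];
[nRows, nCols] = size(costMat);
accum = inf(nRows, nCols);        % accumulated cost of the cheapest path to each cell
accum(1,1) = costMat(1,1);
for r = 1:nRows
    for k = 1:nCols
        if r == 1 && k == 1
            continue;             % start cell is already initialised
        end
        best = inf;
        if r > 1,          best = min(best, accum(r-1, k));   end  % vertical step
        if k > 1,          best = min(best, accum(r, k-1));   end  % horizontal step
        if r > 1 && k > 1, best = min(best, accum(r-1, k-1)); end  % diagonal step
        accum(r,k) = costMat(r,k) + best;
    end
end
totalCost = accum(nRows, nCols);  % cost of the cheapest warping path, corner to corner
```

For this matrix the cheapest path runs down the zero-cost diagonal, so totalCost is 0. Every cell is visited once, so the work grows with the area of the matrix.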
References
[1] Chiba, S. and Sakoe, H., “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, 26 pp. 43-49.
[2] Dumitru, C.O. and Gavat, I., “Vowel, digit and continuous speech recognition based on statistical, neural and hybrid modelling by using ASRS_RL,” EUROCON 2007 - The International Conference on Computer as a Tool, pp. 856-863, September 2007.
[3] Ellis, Dan. "Dynamic Time Warp (DTW) in Matlab." Dan Ellis's Home Page (Columbia University Electrical Engineering). Web. http://www.ee.columbia.edu/~dpwe/resources/matlab/dtw/
[4] Flanagan, JL. Speech Analysis: Synthesis & Perception. New York: Academic In., 1965. Print.
[5] Fry, DB. The Physics of Speech. Cambridge: Cambridge UP, 1979. Print.
[6] Gold, B., and N. Morgan. Speech and Audio Signal Processing. John Wiley & Sons Inc., 2000. Print.
[7] Hart, P. "Voice recognition: what all the talk is about," Telecommunication, vol 29, no 7, July 1995.
[8] Jawed, F., Muzaffar, F. et al. “DSP implementation of voice recognition using dynamic time warping algorithm,” 2005 Student Conference on Engineering Sciences and Technology, SCONEST. Karachi, Pakistan, 2005.
[9] Kale, Kaustubh R. "Dynamic Time Warping." Computational NeuroEngineering Lab at the University of Florida. Web. http://www.cnel.ufl.edu/~kkale/dtw.html
[10] Mrvaljevic, N. and Ying, S. “Comparison between speaker dependent mode and speaker independent mode for voice recognition,” Bioengineering, Proceedings of the Northeast Conference, Boston, United States of America, April 2009.
[11] Nelson, B. and Runger, G., “Predicting processes when embedded events occur: Dynamic time warping,” Journal of Quality Technology, vol 35, no 2, pp. 213-226, April 2003.
[12] National Federation of the Blind, “Braille readers are leaders,” [Online] 2009. Available: http://www.nfb.org/nfb/Braille_coin.asp [Accessed: Oct. 7 2009]
[13] The MathWorks Store, [Online] 2009. Available: http://www.mathworks.com/store/ [Accessed: Oct. 4 2009]
[14] Lindsay, B. "4BI6 Group 13 Logbook," 2009-2010.
VITA
NAME: Brett Lindsay
PLACE OF BIRTH: Burlington, Ontario, Canada
YEAR OF BIRTH: 1988
SECONDARY EDUCATION: Lord Elgin High School (2002-2004)
Robert Bateman High School (2004-2006)
UNDERGRAD EDUCATION: McMaster University (2006-2010)
HONOURS and AWARDS: Queen Elizabeth II Aiming for the Top Scholarship 2006
McMaster Entrance Scholarship
Smurfit-Stone Scholarship 2006, 2007, 2008, 2009
Dean’s Honour List 2007, 2009