Results of Tagalog Vowel Speech Recognition Using Continuous HMM
Arnel C. Fajardo, Ph.D. student
(Under the supervision of Professor Yoon-Joong Kim)
Basic structure of HTK
1. Data Preparation
Speech data for training and testing (wave files)
- 625 wave files, 25 sets, 5 sets per speaker
- Training data: 5 sets per speaker, 25 sets
- Test data (speaker dependent): Test 1: 5 sets; Test 2: 10 sets
Format of speech data: *.wav, 16 kHz, 16-bit, linear PCM
a1001.wav => "a"
e1001.wav => "e"
i1001.wav => "i"
o1001.wav => "o"
u1001.wav => "u"
…………
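The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper name `vowel_label` is hypothetical and not part of HTK, and it assumes the first letter of every file name is the vowel label, as in the examples listed.

```python
from pathlib import PureWindowsPath

VOWELS = ("a", "e", "i", "o", "u")


def vowel_label(wav_name: str) -> str:
    """Return the vowel label taken from the first letter of the file name.

    Assumption: files follow the pattern shown on the slide (a1001.wav => "a").
    """
    # PureWindowsPath accepts both "\" and "/" as separators.
    stem = PureWindowsPath(wav_name).stem
    label = stem[:1].lower()
    if label not in VOWELS:
        raise ValueError(f"unexpected file name: {wav_name!r}")
    return label
```

A name that does not start with one of the five vowels raises an error instead of producing a wrong label silently.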
Variables:
2 tests: 5 speakers (1 set each) and 10 speakers (1 set each)
Hmmdefs: m5, m6
Compute Feature Vectors
• Use: HCopy -C configs\HCopy.config -S scripts\HCopy.scp
• HCopy.exe
– Computes the features from each wave file and saves them in the same folder.
– MFCC features were used.
-C configs\HCopy.config : configuration file for computing the features
-S scripts\HCopy.scp : script file listing each wave file and its feature file
• HCopy
• Number of input files: 3
  Waveform files: *.wav
  Configuration file: HCopy.config
  Script file: HCopy.scp
• Number of output files: 1
  MFCC file: *.mfc
Create Hcopy.config in ….Configs/Hcopy.config
Write:
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NIST
SOURCERATE = 625
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
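HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns, which makes the values above hard to read at a glance. The following sketch converts them to more familiar units; the function names are illustrative only and are not part of HTK.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def sample_rate_hz(source_rate: float) -> float:
    """Sampling frequency implied by SOURCERATE (the sample period)."""
    return HTK_UNITS_PER_SECOND / source_rate


def duration_ms(htk_units: float) -> float:
    """Convert a duration in 100 ns units to milliseconds."""
    return htk_units / 10_000


def samples_per(duration_units: float, source_rate: float) -> float:
    """Number of samples covered by a duration given in 100 ns units."""
    return duration_units / source_rate


def vector_size(num_ceps: int, with_c0: bool = True) -> int:
    """Static feature size: NUMCEPS cepstra plus C0 when the _0 qualifier is used."""
    return num_ceps + (1 if with_c0 else 0)


print(sample_rate_hz(625))          # 16000.0 Hz, matching the 16 kHz recordings
print(duration_ms(100000.0))        # 10.0 ms frame shift (TARGETRATE)
print(duration_ms(250000.0))        # 25.0 ms analysis window (WINDOWSIZE)
print(samples_per(250000.0, 625))   # 400.0 samples per window
print(vector_size(12))              # 13 coefficients per frame for MFCC_0
```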
Script file: Scripts/Hcopy.scp
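Each line of the HCopy script file pairs a source wave file with the feature file to be written. A minimal sketch of how such a list could be generated is given below, assuming the .mfc file is placed next to its .wav file as described earlier; `hcopy_script_lines` is a hypothetical helper, not an HTK tool.

```python
def hcopy_script_lines(wav_paths):
    """Build 'source target' lines for an HCopy script file.

    Assumption: each feature file sits in the same folder as its wave file
    and differs only in the extension (.wav -> .mfc).
    """
    lines = []
    for wav in wav_paths:
        if not wav.lower().endswith(".wav"):
            raise ValueError(f"not a wave file: {wav!r}")
        mfc = wav[:-4] + ".mfc"
        lines.append(f"{wav} {mfc}")
    return lines
```

Writing the returned lines to the script file, one per line, gives the list passed to HCopy with -S.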
Prepare the master label file
Master label file (word-level transcriptions): mlfs/words.mlf
Model list: modelList/wordList
HMM model name list
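A word-level master label file starts with the `#!MLF!#` header, followed by one entry per utterance: a quoted label-file pattern, the labels, and a closing period. The sketch below builds such text from a mapping of file stems to vowels. It assumes one label per utterance; whether the actual words.mlf also contains sil labels is not shown on the slides, and `words_mlf` is a hypothetical helper.

```python
def words_mlf(labels_by_stem):
    """Return the text of a word-level MLF.

    labels_by_stem maps a file stem such as "a1001" to its label such as "a".
    Assumption: a single label per utterance, with no explicit sil labels.
    """
    out = ["#!MLF!#"]
    for stem, label in sorted(labels_by_stem.items()):
        out.append(f'"*/{stem}.lab"')  # pattern matching the utterance
        out.append(label)              # word-level transcription
        out.append(".")                # end of this entry
    return "\n".join(out) + "\n"
```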
Generate the initial master macro file (hmmdefs)
HCompV -C configs\config -f 0.01 -m -S scripts\train.scp -M wordHmms\m0\ wordHmms\proto
• HCompV.exe
• Number of inputs: 3
Input 1: -C configs/config // parameters for computing the features
  -f 0.01 // the variance floor macro (called vFloors) is computed as 0.01 times the global variance
  -m // the means are computed as well as the variances
Input 2: -S scripts/train.scp // list of MFCC feature files used in training
Input 3: wordHmms/proto // the handwritten HMM prototype
• Number of outputs: 1
Output 1: -M wordHmms/m0 // directory for the results
  // vFloors: variance floor macro
  // proto: HMM prototype with its Gaussian parameters filled in
  // hmmdefs: written manually from proto
Input 1: configs/config
script/Hcopy.config => configs/config
wordHmms/m0/vFloors
Global constant values (variance floors) used when computing b_j(o_t)
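For reference, the output probability b_j(o_t) mentioned here has the following standard form when each state uses a single Gaussian with diagonal covariance. That is an assumption: the slides do not state the number of mixture components. The variance floor acts as a lower bound on each variance during re-estimation.

```latex
b_j(o_t) = \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi\sigma_{jk}^{2}}}
           \exp\!\left( -\frac{(o_{tk} - \mu_{jk})^{2}}{2\sigma_{jk}^{2}} \right),
\qquad \sigma_{jk}^{2} \ge v_k
```

Here n is the feature dimension, \mu_{jk} and \sigma_{jk}^{2} are the mean and variance of state j in dimension k, and v_k is the floor for dimension k.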
Input 2: scripts/train.scp
Input 3: wordHmms/proto
General HMM model (prototype) for monophone speech
It has 3 emitting states. Note: NumStates is 5 because states 1 and 5 are the non-emitting entry and exit states.
wordHmms/proto + global means and variances => wordHmms/m0/proto
Result of the HCompV command: wordHmms/m0/proto
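The flat start performed by HCompV amounts to computing one global mean and variance over all training frames and scaling the variance by the -f factor to obtain the floor. A rough sketch in plain Python follows. It uses the population variance (division by the number of frames), which is an assumption about the normalization, and it is not a substitute for HCompV itself.

```python
def global_stats(frames):
    """Per-dimension mean and variance over all feature frames.

    frames is a non-empty list of equal-length lists of floats.
    Assumption: variance is normalized by the number of frames.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return mean, var


def variance_floor(var, scale=0.01):
    """Variance floor: 'scale' times the global variance (the -f 0.01 option)."""
    return [scale * v for v in var]
```

Every state of the prototype receives the same global mean and variance, which is why all models start out identical before HERest is run.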
Input 3: wordHmms/m0/hmmdefs, the master macro file (MMF)
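Since hmmdefs is written manually from proto, the copying can be scripted: the definition that follows `~h "proto"` is repeated once per model name, with the name substituted. The sketch below shows one way to do this; `build_hmmdefs` is a hypothetical helper, and it assumes the prototype file contains exactly one `~h "proto"` line with the global options before it.

```python
def build_hmmdefs(proto_text, model_names):
    """Clone the prototype definition once for every model name.

    Assumption: proto_text contains one '~h "proto"' marker; everything
    before it is the global options header, everything after it is the
    <BEGINHMM> ... <ENDHMM> body.
    """
    marker = '~h "proto"'
    head, sep, body = proto_text.partition(marker)
    if not sep:
        raise ValueError('prototype does not contain ~h "proto"')
    if not body.endswith("\n"):
        body += "\n"
    parts = [head]
    for name in model_names:
        parts.append(f'~h "{name}"' + body)
    return "".join(parts)
```

The model names would normally be read from modelList/wordList so that hmmdefs and the model list stay in step.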
Step 2. Training
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m0\hmmdefs -M wordHmms\m1 modelList\wordList
HERest
Number of inputs: 5
-C configs/config // parameters for the features
-I mlfs/words.mlf // master label file (word labels for each speech file)
modelList/wordList // word name list (HMM list)
-S scripts/train.scp // list of MFCC files for training
-H wordHmms/m0/hmmdefs // hmmdefs (a set of HMM prototypes) for all words
Number of outputs: 1
-M wordHmms/m1 // directory for the re-estimated hmmdefs
Input 1: configs/config
Configuration for wordHmms/m1
Input 3: modelList/wordList
Input 2: mlfs/words.mlf
Input 4: scripts/train.scp
Input 5: wordHmms/m0/hmmdefs (MMF)
Output 1
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m0\hmmdefs -M wordHmms\m1 modelList\wordList
Result: wordHmms/m1/hmmdefs
Re-estimate hmmdefs:
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m1/hmmdefs -M wordHmms/m2 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m2/hmmdefs -M wordHmms/m3 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m3/hmmdefs -M wordHmms/m4 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m4/hmmdefs -M wordHmms/m5 modelList/wordList
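The re-estimation passes differ only in the input and output directory numbers, so the command lines can be generated rather than typed by hand. The sketch below only builds the strings and does not run HERest; `herest_commands` is a hypothetical helper that reuses the options shown above.

```python
def herest_commands(last_iteration=5):
    """Return the HERest command lines for passes m0 -> m1 up to m(last_iteration)."""
    commands = []
    for i in range(last_iteration):
        commands.append(
            "HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp "
            f"-H wordHmms/m{i}/hmmdefs -M wordHmms/m{i + 1} modelList/wordList"
        )
    return commands


for command in herest_commands():
    print(command)
```

Each output directory (wordHmms/m1, wordHmms/m2, and so on) has to exist before the corresponding pass is run.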
Step 3. Recognition Test
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
HVite
Number of inputs: 6
-C configs/config // parameters for the MFCC features
modelList/wordList // HMM name list
-S scripts/test.scp // list of MFCC files for testing
-w dic/tag_Net // word network for recognition
dic/dict // pronouncing dictionary
-H wordHmms/m5/hmmdefs // a set of HMMs
Number of outputs: 1
-i mlfs/recOutWordm5.mlf // recognition results
dic/dict: writing a pronouncing dictionary
Format: Word [outsym] models
– Word: the word to be recognized
– [outsym]: the string to output when the word is recognized
– models: the list of HMM models that make up the word
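In this task every word is modeled by a single whole-word HMM of the same name, so a dictionary entry can simply repeat the word as its model. The sketch below generates such entries under that assumption, omitting the optional [outsym] field. The actual contents of dic/dict are not shown on the slides, and `dictionary_lines` is a hypothetical helper.

```python
def dictionary_lines(words):
    """Return 'Word models' entries, one per word, in sorted order.

    Assumption: each word maps to one HMM with the same name and no
    [outsym] field is used.
    """
    return [f"{word} {word}" for word in sorted(set(words))]
```

Because the grammar also uses sil as a word, the caller would pass it in together with the five vowels.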
BNF
Grammar rule notation:
$ : variable
{} : zero or more repetitions
<> : one or more repetitions
[] : optional
(sil $words sil)
$words = a | e | i | o | u;
(sil $words sil)
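The grammar above has a fixed shape, a variable listing the alternatives followed by the sentence pattern, so it can be produced from the word list. A small sketch is given below; `grammar_text` is a hypothetical helper and the leading and trailing sil are taken directly from the grammar shown.

```python
def grammar_text(words, variable="words"):
    """Return an HParse grammar: one word from the list, surrounded by sil."""
    alternatives = " | ".join(words)
    return f"${variable} = {alternatives};\n(sil ${variable} sil)\n"


print(grammar_text(["a", "e", "i", "o", "u"]), end="")
```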
HParse -C configs/config dic/tag_v_Gram dic/tag_Net
(dic/tag_v_Gram)
HParse -C configs/config dic/tag_v_Gram dic/tag_Net
Results of HParse on tag_v_Gram:
dic/tag_Net configs/config
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Input files: configs/config, scripts/test.scp, modelList/wordList
Output file: mlfs/recOutWordm5.mlf
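The recognition output is itself a master label file, so the recognized labels can be read back with a small parser. The sketch below assumes the common layout of one quoted file name per utterance, label lines that may carry start and end times and a score, and a closing period. The exact fields written by HVite depend on its output options, and `parse_mlf` is a hypothetical helper.

```python
def parse_mlf(text):
    """Map each utterance stem to its list of labels.

    Assumption: label lines are either 'label' or 'start end label [score]'.
    """
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"') and line.endswith('"'):
            # Keep only the file stem, e.g. "*/a1001.rec" -> a1001
            name = line.strip('"').replace("\\", "/").rsplit("/", 1)[-1]
            current = name.rsplit(".", 1)[0]
            results[current] = []
        elif line == ".":
            current = None
        elif current is not None:
            fields = line.split()
            has_times = (len(fields) >= 3
                         and fields[0].isdigit() and fields[1].isdigit())
            results[current].append(fields[2] if has_times else fields[0])
    return results
```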
Step 4. Recognition results
HResults -I mlfs/words.mlf modelList/wordList mlfs/recOutWordm5.mlf
First test: 5 sets (each set represents 1 speaker) => 5 speakers
Second test: 10 sets (each set represents 1 speaker) => 10 speakers
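For isolated vowels, the percent-correct figure reduces to the number of utterances whose recognized vowel matches the reference, divided by the number of utterances. The sketch below computes this simplified score. It is not a reimplementation of HResults, which aligns label sequences and also reports insertions, deletions and substitutions; `percent_correct` and its arguments are hypothetical.

```python
def percent_correct(reference, recognized, ignore=("sil",)):
    """Percentage of utterances whose recognized word equals the reference.

    reference maps an utterance stem to its single reference label.
    recognized maps an utterance stem to a list of recognized labels.
    Labels listed in 'ignore' (sil by default) are dropped before comparing.
    """
    if not reference:
        raise ValueError("no reference labels given")
    hits = 0
    for utterance, ref_label in reference.items():
        labels = [l for l in recognized.get(utterance, []) if l not in ignore]
        if labels == [ref_label]:
            hits += 1
    return 100.0 * hits / len(reference)
```

An utterance missing from the recognition output counts as an error, so the score is always relative to the full reference set.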
Comparison of m5 and m6 (hmmdefs): slight difference
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m4\hmmdefs -M wordHmms\m5 modelList\wordList
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m5\hmmdefs -M wordHmms\m6 modelList\wordList
m5 m6
END