Date posted: 10-Apr-2017
Category: Software
Uploaded by: aavaas-gajurel
Supervisor: Dr. Basanta Joshi
Aavaas Gajurel (068/BCT/501)
Anup Pokhrel (068/BCT/505)
Manish K. Sharma (068/BCT/523)
System Overview
System Block Diagram - Training
Noise Reduction
Split Module (VAD Based)
Training Set
MFCC Features
Train HMM
System Block Diagram - Recognition
Audio Input Noise Reduction
Split Module (VAD Based)
MFCC Computation
HMM
Audio Classifier
Language Model
Output
SYSTEM DESIGN METHODOLOGY
NOISE REDUCTION
Creating Noise Profile
FOURIER TRANSFORM: FFT of 32 ms audio samples
AVERAGE OVER TIME: $\frac{1}{N}\,[\text{sum of FFT components over} > 10 \text{ frames}]$
BUILD NOISE PROFILE: Update the computed noise profile
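The noise-profile steps can be sketched as follows. This is a minimal illustration, not the project's code: the 16 kHz sample rate, non-overlapping frames, the use of magnitude spectra, and the names `frame_signal` and `build_noise_profile` are all assumptions.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split x into non-overlapping frames of frame_len samples (tail dropped)."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)

def build_noise_profile(noise, sample_rate=16000, frame_ms=32, min_frames=10):
    """Average the FFT magnitude of noise-only audio over more than min_frames frames."""
    frame_len = int(sample_rate * frame_ms / 1000)  # 512 samples at the assumed 16 kHz
    frames = frame_signal(np.asarray(noise, dtype=float), frame_len)
    if len(frames) <= min_frames:
        raise ValueError("need more than %d noise-only frames" % min_frames)
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per 32 ms frame
    return magnitudes.mean(axis=0)                    # (1/N) * sum over frames

# Synthetic noise-only recording (one second), used purely for illustration.
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(16000)
profile = build_noise_profile(noise)
```

The result is one average magnitude per frequency bin, which a later subtraction step can use.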
Spectral Subtraction
FOURIER TRANSFORM OF SIGNAL: FFT of 32 ms audio samples
SUBTRACT NOISE PROFILE (STATIC AND MUSICAL): Over-subtraction, short segment removal
INVERSE FOURIER TRANSFORM: Rebuild the signal
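A minimal sketch of spectral subtraction with over-subtraction, under stated assumptions: frames are non-overlapping 512-sample blocks, the noisy phase is reused when rebuilding, and the `over` and `floor` values are illustrative, not taken from the slides. The short-segment (musical noise) removal step is not implemented here.

```python
import numpy as np

def spectral_subtract(signal, noise_profile, frame_len=512, over=2.0, floor=0.01):
    """Subtract a per-bin noise magnitude estimate from each frame's spectrum."""
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectrum = np.fft.rfft(frames, axis=1)              # Fourier transform of signal
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    # Over-subtraction: remove `over` times the noise estimate, but never go
    # below a small fraction of the original magnitude (spectral floor).
    cleaned = np.maximum(magnitude - over * noise_profile, floor * magnitude)
    rebuilt = np.fft.irfft(cleaned * np.exp(1j * phase), n=frame_len, axis=1)
    return rebuilt.reshape(-1)                          # inverse transform, frames rejoined

# Illustration on synthetic noise; the profile is the mean magnitude of its frames.
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(512 * 20)
profile = np.abs(np.fft.rfft(noise.reshape(20, 512), axis=1)).mean(axis=0)
denoised = spectral_subtract(noise, profile)
```

Bins that still exceed the subtracted level survive as isolated peaks, which is the kind of residue the musical-noise step is meant to remove.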
[Figure panels: Before Noise Removal; Spectral Subtraction Output]
[Figure panels: After Spectral Subtraction; After Musical Noise Removal]
VOICE ACTIVITY DETECTION
Voice Activity Detection
Voice Activity Detection Process I
SAMPLE: 10-frame sampling
COMPUTE MEAN AND VARIANCE
CALCULATE THE TRIGGER: $t_w = \mu + \alpha\,\delta_w$
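The trigger computation can be sketched as below. The slides do not define $\delta_w$ or give a value for $\alpha$; this sketch assumes $\delta_w$ is the variance of the measure over the sampled frames and treats `alpha` as a free parameter. The function name `vad_trigger` is hypothetical.

```python
import statistics

def vad_trigger(measures, alpha=0.5):
    """Trigger t_w = mu + alpha * delta_w from the measures of the sampled frames.

    Assumption: delta_w is the (population) variance of the measures.
    """
    mu = statistics.mean(measures)
    delta_w = statistics.pvariance(measures)
    return mu + alpha * delta_w

# Measures of 10 sampled frames (made-up values for illustration).
sampled = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
trigger = vad_trigger(sampled, alpha=0.5)  # mean 2.0, variance 0.6
```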
Voice Activity Detection Process II
READ THE SAMPLE: Read the frame
COMPUTE CLASSIFICATION MEASURE: $W_{s1}(m) = P_{s1}(m)\,(1 - Z_{s1}(m))\,S_c$
CLASSIFY: If greater than trigger then voice
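A minimal sketch of the classification measure and decision. The slides do not define the symbols; this sketch assumes $P$ is the frame's average power, $Z$ is its zero-crossing rate in the range 0 to 1, and $S_c$ is a constant scale factor. The value 1000 and all function names are illustrative.

```python
def frame_power(frame):
    """Average power of the frame (assumed meaning of P)."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign (assumed meaning of Z)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classification_measure(frame, scale=1000.0):
    """W = P * (1 - Z) * S_c, with scale standing in for S_c."""
    return frame_power(frame) * (1.0 - zero_crossing_rate(frame)) * scale

def is_voice(frame, trigger, scale=1000.0):
    """Classify the frame as voice if its measure exceeds the trigger."""
    return classification_measure(frame, scale) > trigger

steady = [0.5, 0.5, 0.5, 0.5]          # high power, no zero crossings
alternating = [0.5, -0.5, 0.5, -0.5]   # same power, crosses zero every sample
```

High power with few zero crossings gives a large measure. A frame that crosses zero at every sample scores zero regardless of its power.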
Feature Extraction
Audio Feature Extraction
Feature Extraction Process I
FRAMING: Divide audio into sections of 20 ms to 40 ms
CALCULATE PERIODOGRAM ESTIMATE: $P_i(k) = \frac{1}{N}\,|S_i(k)|^2$
APPLY MEL FILTERBANK: Multiply filterbank (20-40 filters) by periodogram estimate
Feature Extraction Process II
SCALING: Take logarithm of filterbank energies
DISCRETE COSINE TRANSFORM OF ENERGIES: Take DCT of the log filterbank energies
KEEP REQUIRED COEFFICIENTS: Keep required number of coefficients
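The six feature extraction steps can be sketched end to end as follows. This is an illustration under assumptions, not the project's implementation: 16 kHz audio, 25 ms non-overlapping frames without windowing, 26 triangular mel filters, 13 retained coefficients, and an unnormalised DCT-II. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            bank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            bank[m - 1, k] = (right - k) / (right - center)
    return bank

def dct_ii(x):
    """Unnormalised DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)
    basis = np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n))  # rows: coefficient index
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_ms=25, n_filters=26, n_coeffs=13):
    frame_len = int(sample_rate * frame_ms / 1000)
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)    # framing
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len       # periodogram
    energies = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(energies + 1e-10)                            # scaling
    return dct_ii(log_energies)[:, :n_coeffs]                          # DCT, keep first

# One second of a synthetic 440 Hz tone, for illustration only.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `features` is the coefficient vector for one frame. The small constant inside the logarithm only guards against empty filter outputs.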
Language Model
Using Language Model
Language Model Training
Language Model Based Classification
READ PREVIOUS WORD
GET POSSIBLE CANDIDATES: From acoustic model
SELECT BEST: $\hat{P}(W_n \mid W_{n-1}) = \lambda_1\,P(W_n \mid W_{n-1}) + \lambda_2\,P(W_n)$
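A minimal sketch of the interpolated bigram/unigram score used to pick among candidates. The weights `lam1` and `lam2`, the count-based probability estimates, the toy training sentences, and the function names are all assumptions made for illustration.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and adjacent word pairs in tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def interpolated_prob(word, prev, unigrams, bigrams, lam1=0.7, lam2=0.3):
    """lam1 * P(word | prev) + lam2 * P(word), estimated from raw counts."""
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    # Simplification: count(prev) also includes sentence-final occurrences.
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam1 * p_bi + lam2 * p_uni

def select_best(prev, candidates, unigrams, bigrams):
    """Pick the candidate (e.g. from the acoustic model) with the highest score."""
    return max(candidates, key=lambda w: interpolated_prob(w, prev, unigrams, bigrams))

# Toy romanised training sentences, made up for this example.
sentences = [["ma", "ghar", "janchu"], ["ma", "ghar", "janchu"], ["ma", "bazaar", "janchu"]]
unigrams, bigrams = train_bigram_model(sentences)
best = select_best("ma", ["ghar", "bazaar"], unigrams, bigrams)
```

The unigram term keeps the score above zero for a candidate that never followed the previous word in training.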
ACOUSTIC MODEL
HMM Based Classification
Training the Acoustic Model
READ MFCC COEFFICIENTS AND WORD
SELECT HMM MODEL
TRAIN USING BAUM-WELCH ALGORITHM
Using the Acoustic Model
READ MFCC COEFFICIENTS OF WORD
FIND LOG PROBABILITY OF WORD FOR EACH MODEL
SELECT MODEL WITH MAXIMUM PROBABILITY
OUTPUT WORD CORRESPONDING TO MODEL
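The recognition steps can be sketched with the forward algorithm in log space. This assumes discrete-observation HMMs that are already trained, with MFCC vectors already quantised to symbol indices. The slides do not state the model type, and the two toy models and word labels below are invented for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """log(sum(exp(v))) computed stably; returns -inf if every term is -inf."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Log probability of the observation sequence under one HMM (forward algorithm)."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def classify(obs, models):
    """Return the word whose model gives the observations the highest log probability."""
    return max(models, key=lambda word: log_likelihood(obs, *models[word]))

# Two toy left-to-right models over symbols {0, 1}: (start, transitions, emissions).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "word_a": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
recognised = classify([0, 0, 1, 1], models)
```

Working in log space avoids numerical underflow on long observation sequences.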
RESULTS
Trained vs. Untrained Input
• 3 Speakers
• 5X10 Words Each
• 5 Testing Sets Each
[Bar chart: Accuracy of System (%), Using Trained and Untrained Input]
Trained Set: 86.67
Untrained Set: 66.67
Noise Reduced vs. Not Noise Reduced
• 3 Speakers
• 5X10 Words Each
• Untrained Input Files for Testing
• 5 Testing Sets Each
[Bar chart: Accuracy of System (%), Effect of Noise Reduction]
Noise Not Reduced: 46.67
Noise Reduced: 66.67
Gender Based Results
• 7 Speakers
• 3 Females, 4 Males
• Animal Names as Test
• Untrained Input Files for Testing
[Bar chart: Gender Based Result. Groups: training voice. Series: Male, Female, Male + Female]
Female Voice Training: Male 36, Female 66, Male + Female 51
Male Voice Training: Male 64, Female 44, Male + Female 54
Female and Male Voice Training: Male 59, Female 56, Male + Female 58
LIMITATIONS AND RECOMMENDATIONS
Limitations
Limited Vocabulary
User Specific Noise Profiles
Static MFCC Coefficients Only
Training Data Storage Absent
Non-Continuous Recognition
Recommendations
Using Dynamic Coefficients
Continuous HMM Model
Extensive Training
Better Phonemic Modeling
Dynamic Noise Modeling
USAGE SCENARIO
Usage Scenario I
Easy Nepali Input
Automated Telecom Assistance
Speech Controlled Interface
Automated Transcribing
Usage Scenario II
Military Sector for Automated Wire Tapping
Public Guidance System
Automated User Support (banks, corporate houses, etc.)
Thank You!