From last time: PR Methods
• Feature extraction + Pattern classification
• Training, testing, overfitting, overtraining
• Minimum distance methods
• Discriminant Functions
  • Linear
  • Nonlinear (e.g., quadratic, neural networks)
• -> Statistical Discriminant Functions
Statistical Pattern Recognition
• Many sources of variability in the speech signal
• Much more than known deterministic factors
• Powerful mathematical foundation
• More general way of handling discrimination
Statistical Discrimination Methods
• Minimum error classifier and Bayes rule
• Gaussian classifiers
• Discrete density estimation
• Mixture Gaussians
• Neural networks
[Figure: two decision regions, labeled "we decide x is in class 1" and "we decide x is in class 2"]
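The minimum-error (Bayes) rule assigns x to the class with the largest $p(x|\omega_i)p(\omega_i)$, which is what carves the feature axis into the two decision regions. A minimal Python sketch for two one-dimensional Gaussian classes; the priors, means, standard deviations, and the helper names `gaussian_pdf` and `bayes_decide` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_decide(x, classes):
    """Minimum-error rule: return the label with the largest p(x|w) * P(w).

    `classes` maps label -> (prior, mean, std) for a 1-D Gaussian class model.
    """
    def score(label):
        prior, mean, std = classes[label]
        return gaussian_pdf(x, mean, std) * prior
    return max(classes, key=score)

# Hypothetical classes: equal priors and variances, means 0 and 2.
equal = {1: (0.5, 0.0, 1.0), 2: (0.5, 2.0, 1.0)}
# Same densities, but class 1 is four times as likely a priori.
skewed = {1: (0.8, 0.0, 1.0), 2: (0.2, 2.0, 1.0)}

for x in (0.9, 1.1, 1.5):
    print(x, bayes_decide(x, equal), bayes_decide(x, skewed))
```

With equal priors and variances the boundary falls midway between the means (x = 1.0). With the skewed priors it moves toward class 2, to $1 + \tfrac{1}{2}\ln 4 \approx 1.69$, so x = 1.1 and x = 1.5 switch to class 1.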
How to approximate a Bayes classifier
• Parametric form with single-pass estimation
• Discretize, count co-occurrences
• Iterative training (mixture Gaussians, ANNs)
• Kernel estimation
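The "discretize, count co-occurrences" approach can be sketched as follows: quantize a scalar feature into bins, count (bin, class) pairs to estimate $P(\text{bin}|\omega)$ and $P(\omega)$, then apply the Bayes rule. The equal-width bins, the add-one smoothing, the toy data, and the helper names are illustrative assumptions, not part of the slides.

```python
from collections import Counter, defaultdict

def quantize(x, n_bins, lo, hi):
    """Map a scalar to one of n_bins equal-width bins on [lo, hi]; clip out-of-range values."""
    k = int((x - lo) / (hi - lo) * n_bins)
    return min(max(k, 0), n_bins - 1)

def train(samples, n_bins, lo, hi):
    """Count (bin, class) co-occurrences; return priors and smoothed likelihood tables."""
    class_counts = Counter(label for _, label in samples)
    joint = defaultdict(Counter)  # joint[label][bin] = count
    for x, label in samples:
        joint[label][quantize(x, n_bins, lo, hi)] += 1
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    # Add-one smoothing so a bin never seen with a class does not get probability zero.
    likelihoods = {
        c: [(joint[c][b] + 1) / (class_counts[c] + n_bins) for b in range(n_bins)]
        for c in class_counts
    }
    return priors, likelihoods

def classify(x, priors, likelihoods, n_bins, lo, hi):
    """Bayes rule on the discretized feature: argmax of P(bin|class) * P(class)."""
    b = quantize(x, n_bins, lo, hi)
    return max(priors, key=lambda c: likelihoods[c][b] * priors[c])

# Hypothetical toy data: (feature value, class label).
data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.35, "b"), (0.7, "b"), (0.8, "b"), (0.9, "b")]
priors, lik = train(data, n_bins=4, lo=0.0, hi=1.0)
print(classify(0.15, priors, lik, 4, 0.0, 1.0), classify(0.85, priors, lik, 4, 0.0, 1.0))
```

The number of bins trades resolution against the amount of data available per bin; with more features the number of cells grows quickly, which is one reason the other approaches in the list are used.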
Minimum distance classifiers
• If Euclidean distance is used, the classifier is optimum if:
  • Gaussian class densities
  • Equal priors
  • Uncorrelated features
  • Equal variance per feature
• If features have different variances or are correlated, Mahalanobis distance (MD) could be better
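To see why Euclidean distance can mislead when features are correlated, here is a small nearest-mean sketch comparing Euclidean and Mahalanobis distance, assuming a covariance matrix shared by both classes. The means, covariance, test point, and helper names are hypothetical; the 2x2 matrix algebra is written out by hand to keep the example dependency-free.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] (assumes it is non-singular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(v, m):
    """Compute v^T M v for a 2-vector v and a 2x2 matrix M."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def nearest_mean(x, means, cov_inv=None):
    """Label of the closest class mean.

    Uses Euclidean distance when cov_inv is None, otherwise Mahalanobis
    distance with the shared inverse covariance cov_inv.
    """
    m = [[1.0, 0.0], [0.0, 1.0]] if cov_inv is None else cov_inv
    def dist(label):
        d = [x[0] - means[label][0], x[1] - means[label][1]]
        return math.sqrt(quad_form(d, m))
    return min(means, key=dist)

# Hypothetical classes with strongly correlated features (shared covariance).
means = {"A": (0.0, 0.0), "B": (2.0, 0.0)}
shared_cov = [[1.0, 0.9], [0.9, 1.0]]
x = (1.2, 1.2)

print("Euclidean:  ", nearest_mean(x, means))
print("Mahalanobis:", nearest_mean(x, means, inv2(shared_cov)))
```

The point (1.2, 1.2) lies along the main axis of the correlated spread around A, so it is Mahalanobis-closer to A even though it is Euclidean-closer to B.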
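• If the class densities are Gaussian with a covariance matrix shared by all classes ($\Sigma_i = \Sigma$), then the discriminant function can be
$D_i(x) = w_i^T x + w_{i0}$
• where
$w_i = \Sigma^{-1}\mu_i$
• and
$w_{i0} = -\tfrac{1}{2}\mu_i^T \Sigma^{-1}\mu_i + \log p(\omega_i)$
• This is a linear classifier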
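The linear discriminant can be computed directly from the class means, priors, and shared covariance. A minimal sketch on a hypothetical two-class problem with correlated features; the function names and all numbers are illustrative.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] (assumes it is non-singular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    """Inner product of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]

def linear_discriminants(means, priors, shared_cov):
    """Return {label: (w_i, w_i0)} for D_i(x) = w_i^T x + w_i0, where
    w_i = Sigma^-1 mu_i and w_i0 = -1/2 mu_i^T Sigma^-1 mu_i + log p(w_i)."""
    s_inv = inv2(shared_cov)
    params = {}
    for label, mu in means.items():
        w = matvec(s_inv, mu)
        w0 = -0.5 * dot(mu, w) + math.log(priors[label])
        params[label] = (w, w0)
    return params

def decide(x, params):
    """Pick the class with the largest linear discriminant D_i(x)."""
    return max(params, key=lambda c: dot(params[c][0], x) + params[c][1])

# Hypothetical two-class problem with correlated features.
means = {"A": (0.0, 0.0), "B": (2.0, 0.0)}
priors = {"A": 0.5, "B": 0.5}
params = linear_discriminants(means, priors, [[1.0, 0.9], [0.9, 1.0]])
for x in ((1.2, 1.2), (2.0, 0.0)):
    print(x, decide(x, params))
```

With equal priors this rule makes the same decisions as a nearest-mean classifier using Mahalanobis distance under $\Sigma$, because the term $x^T\Sigma^{-1}x$ in the expanded distance is the same for every class and cancels.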
General Gaussian case
• Unconstrained covariance matrix $\Sigma_i$ per class
• Then the discriminant function is
$D_i(x) = x^T W_i x + w_i^T x + w_{i0}$
• This is a quadratic classifier
• Gaussians are completely specified by 1st- and 2nd-order statistics (mean and covariance)
• Is this enough for general populations of data?
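The quadratic discriminant's coefficients $W_i$, $w_i$, and $w_{i0}$ are not spelled out above. Taking the log of a $d$-dimensional Gaussian density plus the log prior, and expanding the quadratic form with $\Sigma_i$ symmetric, gives the following; the constant $-\tfrac{d}{2}\log 2\pi$ is the same for every class and is dropped.

```latex
\begin{aligned}
\log p(x\mid\omega_i) + \log p(\omega_i)
  &= -\tfrac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) - \tfrac{1}{2}\log|\Sigma_i|
     - \tfrac{d}{2}\log 2\pi + \log p(\omega_i) \\
W_i &= -\tfrac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i \\
w_{i0} &= -\tfrac{1}{2}\mu_i^T\Sigma_i^{-1}\mu_i - \tfrac{1}{2}\log|\Sigma_i| + \log p(\omega_i)
\end{aligned}
```

If every $\Sigma_i$ equals a common $\Sigma$, the terms $x^T W_i x$ and $-\tfrac{1}{2}\log|\Sigma_i|$ no longer depend on $i$ and can be dropped, which recovers the linear discriminant with $w_i = \Sigma^{-1}\mu_i$ and $w_{i0} = -\tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \log p(\omega_i)$.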
A statistical discriminant function
$D_i(x) = \log p(x \mid \omega_i) + \log p(\omega_i)$
Remember:
$P(a \mid b) = P(a,b)/P(b)$
$P(a,b) = P(a \mid b)P(b) = P(b \mid a)P(a)$
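These identities can be checked on a small joint table, which also shows why a discriminant of the form $\log p(x|\omega_i) + \log p(\omega_i)$ suffices: $P(b)$ is common to every $a$, so maximizing $P(b|a)P(a)$ picks the same $a$ as maximizing $P(a|b)$. The joint probabilities and helper names below are hypothetical; exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution P(a, b) over a in {a0, a1} and b in {b0, b1}.
joint = {
    ("a0", "b0"): Fraction(1, 10), ("a0", "b1"): Fraction(3, 10),
    ("a1", "b0"): Fraction(2, 10), ("a1", "b1"): Fraction(4, 10),
}

def p_a(a):
    """Marginal P(a) = sum over b of P(a, b)."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def p_b(b):
    """Marginal P(b) = sum over a of P(a, b)."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def p_a_given_b(a, b):
    """Conditional P(a|b) = P(a, b) / P(b)."""
    return joint[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    """Conditional P(b|a) = P(a, b) / P(a)."""
    return joint[(a, b)] / p_a(a)

# Product rule: P(a, b) = P(a|b) P(b) = P(b|a) P(a) for every cell.
for (a, b), p in joint.items():
    assert p_a_given_b(a, b) * p_b(b) == p == p_b_given_a(b, a) * p_a(a)

# The posterior argmax equals the argmax of likelihood times prior, since P(b) is shared.
b = "b1"
by_posterior = max(("a0", "a1"), key=lambda a: p_a_given_b(a, b))
by_score = max(("a0", "a1"), key=lambda a: p_b_given_a(b, a) * p_a(a))
print(by_posterior, by_score)
```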
Upcoming quiz etc.
• Monday, 1st: the guest talk on “deep” neural networks
• Then the quiz. Topics: ASR basics and the pattern recognition overview. Typical questions are multiple choice plus a short explanation. The quiz is aimed at about 30 minutes.
• There will be one more HW and one more quiz; after that, everything will be oriented toward the project.