Introduction
Phoneme: the basic abstract symbol representing a speech sound
Transcription: the process of converting text (a word) into its corresponding phonetic sequence
Letter-to-phoneme correspondence is generally not one-to-one
Examples: "lønnsoppgaver" transcribes to !l2nsOpgA:v@r; "natt" transcribes to nAt, while "rar" transcribes to rA:r
The Problem
Phoneme transcription is an instance of the more general problem of pattern recognition
Phonetic rules compiled by experts are time-consuming to produce and fixed for a particular language
What is required is an automatic approach, independent of any particular language
The Problem
A machine-learning approach using SVM was reported in an earlier paper
The phonemic data in a language shows regional variation
Distributed learning by SVM may be tried to adapt to geographically distributed phonemic data
Support Vector Machine
Distribution-free
Non-parametric
Non-linear
High-dimensional
Works with small training data sets
Convex QP problem
Good generalization performance
[Figure: linearly separable data in the (x1, x2) plane, with the support vectors and the margin width marked.]
Support Vector Machine
In a nutshell:
Map the data to a predetermined, very high-dimensional space via a kernel function
Find the hyperplane that maximizes the margin between the two classes
If the data are not separable, find the hyperplane that maximizes the margin and minimizes a weighted average of the misclassifications
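For reference, the textbook soft-margin objective behind the last point (standard SVM theory, not specific to this paper) is

    minimize over w, b, ξ:   (1/2)||w||² + C Σᵢ ξᵢ
    subject to:              yᵢ (w · φ(xᵢ) + b) ≥ 1 − ξᵢ,   ξᵢ ≥ 0

where φ is the kernel-induced feature map and C weights the misclassification penalty against the margin width.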
Maximizing the Margin
[Figure: two classes plotted against Var1 and Var2; the margin width around the separating hyperplane is indicated.]
IDEA 1: Select the separating hyperplane that maximizes the margin!
MultiClass SVMs
One-versus-all: train n binary classifiers, one for each class against all other classes. The predicted class is the class of the most confident classifier.
One-versus-one: train n(n-1)/2 classifiers, each discriminating between a pair of classes. Several strategies exist for selecting the final classification based on the outputs of the binary SVMs. A minimal sketch of both decompositions follows.
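The sketch below uses scikit-learn purely for illustration (an assumption; the experiments in this work used LIBSVM), with toy data standing in for the coded letter-window patterns:

from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.datasets import make_classification

# Toy 4-class data standing in for the coded letter-window patterns
X, y = make_classification(n_samples=300, n_classes=4, n_informative=6, random_state=0)

ova = OneVsRestClassifier(SVC()).fit(X, y)   # n binary classifiers
ovo = OneVsOneClassifier(SVC()).fit(X, y)    # n(n-1)/2 binary classifiers

print(ova.predict(X[:5]), ovo.predict(X[:5]))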
Outline
Introduction
Theory of Incremental SVM
Application
Discussion, further work and references
SVM in Incremental and Distributed Settings
Performance constraints with SVM training for large-scale problems
Cumulative learning algorithms that incorporate new data over time (incremental) and space (distributed)
Modifications to batch SVM learning to adapt to cumulative settings
Calls for provable convergence properties
A naive approach to cumulative learning
SVM learns D1 and generates a set of support vectors SV1
Add SV1 to D2 to get a data set D′2
SVM learns D′2 and generates a set of support vectors SV2
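A minimal sketch of this scheme, using scikit-learn's SVC for illustration (an assumption; the function name is hypothetical):

import numpy as np
from sklearn.svm import SVC

def naive_incremental_svm(X1, y1, X2, y2, C=1.0):
    # Learn D1 and keep only its support vectors SV1
    svm1 = SVC(kernel="linear", C=C).fit(X1, y1)
    sv = svm1.support_                    # indices of the support vectors in X1
    # Form D'2 = SV1 added to D2, and retrain
    X_merged = np.vstack([X1[sv], X2])
    y_merged = np.concatenate([y1[sv], y2])
    return SVC(kernel="linear", C=C).fit(X_merged, y_merged)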
Incremental SVM Learning
The convex hull of a set of points S is the smallest convex set containing S
The ∪-closure property is satisfied for convex hulls:
Vconv(Vconv(A1) ∪ A2) = Vconv(A1 ∪ A2), where Vconv(A) denotes the vertices of the convex hull of a set A
Incremental SVM Learning
The learning algorithm L computes Vconv(D1(+)) and Vconv(D1(-))
Add Vconv(D1(+)) to D2(+) to obtain D′2(+)
Add Vconv(D1(-)) to D2(-) to obtain D′2(-)
L computes Vconv(D′2(+)) and Vconv(D′2(-))
Generate a training set: D12 = Vconv(D′2(+)) ∪ Vconv(D′2(-))
Compute SVM(D12)
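A minimal sketch of these steps for the linear, low-dimensional case (SciPy and scikit-learn are assumptions for illustration; as discussed later, hull computation becomes expensive in high dimensions):

import numpy as np
from scipy.spatial import ConvexHull
from sklearn.svm import SVC

def vconv(points):
    # Vconv(A): the vertices of the convex hull of a point set A
    return points[ConvexHull(points).vertices]

def hull_incremental_svm(D1_pos, D1_neg, D2_pos, D2_neg):
    # By the ∪-closure property, Vconv(Vconv(D1) ∪ D2) = Vconv(D1 ∪ D2)
    pos = vconv(np.vstack([vconv(D1_pos), D2_pos]))
    neg = vconv(np.vstack([vconv(D1_neg), D2_neg]))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return SVC(kernel="linear").fit(X, y)   # SVM(D12)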
Outline
Introduction
Theory of Incremental SVM
Application
Discussion, further work and references
SAMPA for Norwegian
SAMPA (Speech Assessment Methods Phonetic Alphabet): a computer-readable phonetic alphabet
Consonants and vowels are classified into different subgroups:
Consonants: plosives (6), fricatives (7), sonorant consonants (5)
Vowels: long (9), short (9), diphthongs (7)
In our work, an estimated 43 phonemes plus 10 additional phonetic symbols are used
Example of a training data file
Some examples of transcription of words using the SAMPA notation:
Word                    Transcription
ape                     !!A:p@
apene                   !!A:p@n@
lønnsoppgaver           !l2nsOpgA:v@r
politiinspektørene      !puliti:inspk!t2:r@n@
regjeringspartiet       re!jeriNspArti:@
spesialundervisningen   spesi!A:l}n@rvi:sniN@n
Transcription Method
Each letter pattern is a window onto a segment of the word, with the letter whose phoneme is to be predicted in the middle of the window
The window size is set to 7 letters in all the experiments
Example window: * e l e v e n
(the middle letter is the active letter whose phoneme is predicted; the letters on either side are context, and "*" pads the word boundary)
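A minimal sketch of the window extraction (the function name and padding details are illustrative assumptions):

def windows(word, size=7, pad="*"):
    # Slide a fixed-size window over the word, padding the boundaries,
    # so each letter in turn sits in the middle of a window
    half = size // 2
    padded = pad * half + word + pad * half
    return [padded[i:i + size] for i in range(len(word))]

print(windows("eleven"))
# ['***elev', '**eleve', '*eleven', 'eleven*', 'leven**', 'even***']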
Pre-processing and coding
A pattern file consists of words and their transcriptions
Each pattern file is preprocessed before it is fed into the SVM
An internal coding table is defined in the program to represent each letter and its corresponding phoneme
Example data file for LIBSVM
0 4:52 5:51 6:38 7:51
0 3:52 4:51 5:38 6:51 7:37
0 2:52 3:51 4:38 5:51 6:37
0 1:52 2:51 3:38 4:51 5:37
0 1:51 2:38 3:51 4:37
1 4:55 5:54 6:53 7:55
0 3:55 4:54 5:53 6:55
0 2:55 3:54 4:53 5:55
0 1:55 2:54 3:53 4:55
0 4:55 5:54 6:53 7:51
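A sketch of how such lines can be produced from the letter windows (the helper name is hypothetical, and the actual letter codes come from the internal coding table, which is not reproduced here):

def to_libsvm(window, label, code):
    # Feature index = position in the window (1-based); value = letter code.
    # Padded "*" positions carry no feature, which is why the boundary
    # windows in the example above start at indices like 4:
    feats = [f"{i + 1}:{code[ch]}" for i, ch in enumerate(window) if ch != "*"]
    return str(label) + " " + " ".join(feats)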
Experiment
Various steps in the experiment:
One-versus-all with 30000 training patterns
Generation of 54 class files
Separate training for 54 corresponding models
Experiment
Various steps in the experiment (continued):
The test file, containing 10000 patterns, is tested by each model and voting is carried out
The output file and the true output are compared to find the accuracy
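A sketch of the one-versus-all voting step (the helper name is hypothetical, and a decision_function-style confidence score per binary model is an assumption):

import numpy as np

def predict_ova(models, x):
    # Each of the 54 binary models scores the pattern;
    # the class of the most confident model wins
    scores = [m.decision_function([x])[0] for m in models]
    return int(np.argmax(scores))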
Outline
Introduction
Theory of Incremental SVM
Application
Discussion, further work and references
Discussion and Future Work
The complexity of convex hull computation has an exponential dependence on the dimensionality of the feature space
Implementation of and modification to the standard batch-mode SVM to incorporate the convex hull algorithm
Extension to non-linear SVM classifiers
References
Caragea, D., Silvescu, A., and Honavar, V. "Agents that learn from distributed data sources." In Fourth International Conference on Autonomous Agents, 2000.
Burges, C. J. C. "A Tutorial on Support Vector Machines for Pattern Recognition." Data Mining and Knowledge Discovery, 2(2), 1998.
http://www.kernel-machines.org/tutorial.html