Optimum guard zone for self-supervised learning

S.K. Pal, M.Tech., Ph.D., Mem.I.E.E.E.

Indexing terms: Algorithms, Fuzzy sets, Pattern recognition, Vowel recognition

Abstract: A self-supervised learning algorithm using fuzzy sets and the concept of guard zones around the class representative vectors is presented and demonstrated for vowel recognition. An optimum guard zone having the best match with the fully supervised performance is determined. Results are also compared with those of the nonsupervised case for various orders of input patterns.

1 Introduction

An adaptive pattern recognition system can be viewed as a learning machine which improves the system's performance by acquiring necessary information for decision during the system's operation. In a supervised system, the machine, in general, requires an extra source of knowledge, usually of a higher order, for correcting the decision taken by the classifier. Bayesian estimation and stochastic approximation [1, 2] can be used for supervised learning to learn unknown parameters successively in a given form of distribution of each class. In a strictly nonsupervised adaptive system, these parameters are updated solely on the basis of the decision of the classifier. The convergence of the system to an optimal set of class representative parameters may thus be seriously affected by incorrect decisions of the classifier [1]. Again, when an extra source of knowledge on which the supervisory programme could be based is not readily available, the performance of the system becomes completely unpredictable.

In the present paper, a system of self-supervision based on inherent properties of the class distribution of features is proposed. For all practical problems, the distribution of the members of a class in the feature space has a central tendency, and it may be assumed that the probability of misclassification near these central tendencies is substantially low. Thus, one can construct a region around this central tendency of a class, so that an unrestricted updating procedure for the samples falling only in this region would assist the convergence of the system significantly. Such a region, defined as a 'guard zone', forms the basis of a supervisory programme which needs only to check whether the classified input is within the guard zone for the purpose of inhibition of the updating programme. A mathematical formulation of such guard zones would require a thorough knowledge of the distribution function of the features for each class. When these functions are not precisely known, the size of the guard zones has to be experimentally determined. Various guard zones, the semiaxes of which are the $1/\lambda$th ($\lambda^2$ = 0.5, 1, 2, 4, 6, 8) part of the corresponding standard deviations, are therefore considered around the selected (estimated) representative vectors for obtaining the optimum one.

The effectiveness of the algorithm with a classifier based on the concept of fuzzy sets [3] is demonstrated on different sequences of a set of about 900 vowel sounds when the first three formant frequencies are considered as input features.

2 Fuzzy sets and classification algorithm

A fuzzy set $A$ with its finite number of supports $x_1, x_2, \ldots, x_n$ in the universe of discourse $U$ is defined as

$$A = \{\mu_A(x_i)/x_i\}, \quad i = 1, 2, \ldots, n \tag{1}$$

where the membership function $\mu_A(x_i)$, having positive values in the interval [0, 1], denotes the degree to which an event $x_i$ may be a member of $A$. This characteristic function can be viewed as a weighting coefficient which reflects the ambiguity (fuzziness) in $A$.

Paper 1661E, received 24th June 1981. The author is with the Electrical Engineering Department of the Imperial College of Science and Technology, London SW7 2BT, England. He is on leave from the Electronics & Communication Sciences Unit, Indian Statistical Institute, Calcutta 700 035, India.

Similarly, the property $p$ defined on an event $x_i$ is a function $p(x_i)$ which can have values only in the interval [0, 1]. A set of these functions, which assigns the degree of possessing some property $p$ by the event $x_i$, constitutes what is called a property set [4].

Now we develop a multicategory classifier on the basis of the property set, where the input and output (decision) are deterministic but the process of classification is fuzzy. Let

$$X = [p_1/x_1, p_2/x_2, \ldots, p_n/x_n, \ldots, p_N/x_N]$$

be an unknown pattern in an $N$-dimensional vector space containing $m$ pattern classes to be recognised. $p_n/x_n$ denotes the degree of possessing the $n$th fuzzy property $p_n$ by the $n$th measurement $x_n$ of the pattern $X$ and has values between zero and one. The decision of the classifier is based on the magnitude of the fuzzy similarity vector $S_j^{(l)}(X) = \{s_{nj}^{(l)}\}$ of $X$ with respect to the $l$th prototype in the $j$th class, where $s_{nj}^{(l)}$ (eqn. 2), $p_n$ (eqn. 3) and $p_{nj}^{(l)}$ (eqn. 4) are given by expressions that are not legible in this copy, and

$$\bar{x}_n = \max_j \max_l \bar{x}_{nj}^{(l)} \tag{5}$$

for $n = 1, 2, \ldots, N$; $j = 1, 2, \ldots, m$ and $l = 1, 2, \ldots, h_j$.

$s_{nj}^{(l)}$ denotes the grade of similarity between the $n$th property of $X$ and that of the $l$th prototype in the $j$th class. $p_{nj}^{(l)}$ denotes the degree to which property $p_n$ is possessed by the $l$th prototype in the $j$th class. $\bar{x}_n$ is the $n$th reference constant. $h_j$ is the number of prototypes in the $j$th class. $\bar{x}_{nj}^{(l)}$ and $\sigma_{nj}^{(l)}$ correspond to the $l$th prototype and represent the mean and standard deviation corresponding to the $n$th component in the $j$th class. $(1/\sigma_{nj}^{(l)})$ is used as a weighting coefficient in measuring similarity. Positive constants $F_e$ and $F_d$ are the exponential and denominational fuzzifiers, respectively, which play the role of creating different amounts of fuzziness in the property set [5]. It is to be noted here that the property $p_n$ (eqn. 3) is defined using the $\pi$-function [6], which represents the compatibility function corresponding to a fuzzy set '$x_n$ is $\bar{x}_n$'.
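As an illustration of how a fuzzy property value such as $p_n$ might be computed, the following Python sketch assumes a bell-shaped compatibility function of the form $(1 + |(\bar{x} - x)/F_d|^{F_e})^{-1}$. Eqn. 3 is not legible here, so this form is an assumption rather than the paper's definition; the function name `fuzzy_property` and the sample formant values are likewise hypothetical. The default fuzzifier values are those quoted in Section 6.

```python
def fuzzy_property(x, ref, fd=40000.0, fe=0.5):
    """Degree to which measurement x is compatible with 'x is ref'.

    Assumed form (not confirmed by the source):
        1 / (1 + |(ref - x) / fd| ** fe)
    fd and fe play the roles of the denominational and exponential
    fuzzifiers respectively.
    """
    if fd <= 0 or fe <= 0:
        raise ValueError("fuzzifiers must be positive")
    return 1.0 / (1.0 + abs((ref - x) / fd) ** fe)


# Hypothetical first-formant values in Hz.
reference = 700.0
print(fuzzy_property(700.0, reference))  # measurement equal to the reference
print(fuzzy_property(300.0, reference))  # measurement 400 Hz below it
```

Under this assumed form the value is 1 at the reference constant and falls towards 0 as the measurement moves away from it, which is the behaviour the text attributes to a compatibility function.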

The pattern $X$ is then decided to be from the $k$th class if

$$|S_k(X)| = \max_j |S_j(X)| \tag{6}$$

where

$$|S_j(X)| = \max_l |S_j^{(l)}(X)|, \quad j, k = 1, 2, \ldots, m. \tag{7}$$
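The decision rule of eqns. 6 and 7 is a maximum taken first over the prototypes of each class and then over the classes. A minimal Python sketch follows; it takes the similarity magnitudes as given, and the function name, the dictionary layout and the numbers in the example are hypothetical.

```python
def decide_class(magnitudes):
    """Return (k, score) for the class whose best prototype is most similar.

    `magnitudes` maps each class label j to a list of similarity
    magnitudes |S_j^(l)(X)|, one entry per prototype l of that class.
    """
    if not magnitudes or any(len(v) == 0 for v in magnitudes.values()):
        raise ValueError("every class needs at least one prototype")
    # Eqn. 7: each class is scored by its best-matching prototype.
    class_scores = {j: max(v) for j, v in magnitudes.items()}
    # Eqn. 6: the decision is the class with the largest score.
    k = max(class_scores, key=class_scores.get)
    return k, class_scores[k]


# Hypothetical magnitudes for three classes, two of them with two prototypes.
example = {"a:": [0.62], "I": [0.48, 0.81], "E": [0.77, 0.70]}
print(decide_class(example))
```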

IEEPROC, Vol. 129, Pt. E, No. 1, JANUARY 1982 0143-7062/82/010009+06 $01.50/0

3 Self-supervision algorithm

The 'decision parameter of the supervisor' (DPS) for the $j$th class is defined as

$$(DPS)_j = \sum_{n=1}^{N} \left[\frac{x_n - \bar{x}_{nj}^{(l)}}{\sigma'^{(l)}_{nj}}\right]^2 \tag{8}$$

$$\sigma'^{(l)}_{nj} = \sigma_{nj}^{(l)}/\lambda \tag{9}$$

for $n = 1, 2, \ldots, N$; $l = 1, 2, \ldots, h_j$ and $j = 1, 2, \ldots, m$.

$\lambda$ (a positive constant) is termed the 'zone controlling parameter', which controls the dimension $\sigma'_{nj}$ of the guard zone in a class. It is to be mentioned here that this decision parameter would lead to ellipsoidal shapes of the guard zones. Since the system uses the inherent properties of the distribution of the same parameters as used by the classifier itself, it may be called a 'self-supervisory system'.

The supervisor then accepts the decision made by the classifier that $X$ is from the $k$th class only if

$$(DPS)_k \le 1 \tag{10}$$

and the parameters $\bar{x}_{nk}^{(l)}$, $\bar{x}_n$, $\sigma_{nk}^{(l)}$ and $p_{nk}^{(l)}$ for the $k$th class are then correspondingly updated for that input sample $X$. Otherwise, the decision is thought to be doubtful and no alteration of these class parameters is made.
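The guard-zone test can be sketched in Python as follows. The sketch assumes that the decision parameter is the usual quadratic form of an axis-aligned ellipsoid whose semiaxes are the class standard deviations divided by the zone controlling parameter, which is how the text describes the zone; the function names and the numerical values are hypothetical.

```python
def dps(x, mean, std, lam):
    """Decision parameter of the supervisor for one prototype.

    Assumes an axis-aligned ellipsoid centred on `mean` with semiaxes
    std[n] / lam; the returned value is <= 1 inside the guard zone.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return sum(((xn - mn) / (sn / lam)) ** 2
               for xn, mn, sn in zip(x, mean, std))


def in_guard_zone(x, mean, std, lam):
    """True when the classified sample may be used for updating."""
    return dps(x, mean, std, lam) <= 1.0


# Hypothetical class centre and standard deviations for three formants (Hz).
centre = [700.0, 1200.0, 2500.0]
spread = [100.0, 200.0, 300.0]
print(in_guard_zone([730.0, 1250.0, 2500.0], centre, spread, lam=2.0))
print(in_guard_zone([760.0, 1200.0, 2500.0], centre, spread, lam=2.0))
```

Raising `lam` shrinks the zone, so fewer classified samples are allowed to update the class parameters; lowering it relaxes the zone.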

4 Iterative algorithm for parameter estimation

In general, the input events which are to be classified are in a somewhat randomly mixed sequence. These samples, after being classified, become members of certain classes and modify the centres and the weight vectors of those classes.

Let $\bar{x}_{nj}(t)$, $\sigma^2_{nj}(t)$ and $p_{nj}(t)$ represent the mean, variance and fuzzy property $p$, respectively, along the $n$th co-ordinate axis of the $j$th class, and $\bar{x}_n(t)$ be the $n$th reference constant, estimated by the first $t$ samples. Then, after the addition of another sample $x_n(t+1)$, these parameters would be adjusted as follows:

[Eqns. 11a to 11d, which update the mean $\bar{x}_{nj}(t+1)$ and the variance $\sigma^2_{nj}(t+1)$, are not legible in this copy.]

$$\bar{x}_n(t+1) = \max_j \bar{x}_{nj}(t+1) \tag{11e}$$

[Eqn. 11f, which gives $p_{nj}(t+1)$ in terms of $\bar{x}_n(t+1)$, $\bar{x}_{nj}(t+1)$, $F_d$ and $F_e$, is not legible in this copy.]

For simplicity, we have omitted the superscript $l$ of these parameters. It is to be noted here that only the parameters $\bar{x}_{nj}(t+1)$ and $\sigma_{nj}(t+1)$ (eqns. 11a and c) are directly modified by the new input $x_n(t+1)$. The others follow from these two.
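The kind of sample-by-sample adjustment described here can be sketched in Python. The exact expressions of eqns. 11a to 11d are not legible, so the sketch substitutes the standard incremental formulas for the mean and the population variance (dividing by the number of samples); this substitution, the function names and the numbers are assumptions. The reference constant is taken as the largest class mean along each co-ordinate, as in eqn. 11e.

```python
def update_class(mean, var, t, x):
    """Fold one accepted sample x into per-feature mean and variance.

    `mean` and `var` are lists estimated from the first t samples.
    Uses the standard incremental mean and population-variance update,
    which is an assumption, not the paper's eqns. 11a to 11d.
    """
    new_mean, new_var = [], []
    for m, v, xn in zip(mean, var, x):
        delta = xn - m
        new_mean.append(m + delta / (t + 1))
        new_var.append((t / (t + 1)) * (v + delta * delta / (t + 1)))
    return new_mean, new_var, t + 1


def reference_constants(class_means):
    """n-th reference constant: largest n-th mean over all classes."""
    return [max(column) for column in zip(*class_means)]


# Two samples (2 and 4) already seen along one feature; a third sample, 6, arrives.
print(update_class([3.0], [1.0], 2, [6.0]))
print(reference_constants([[700.0, 1200.0], [300.0, 2300.0]]))
```

Only the mean and the variance are touched directly by the new sample; quantities such as the reference constant are then recomputed from the updated means, which mirrors the remark that the other parameters follow from these two.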

5 Method of recognition

Fig. 1 shows the block diagram of a self-supervised recognition system. The model uses a classifier based on the 'fuzzy property set', which measures the similarity between the different representative vectors and the input vector and then assigns the input to the class for which the representative vectors show maximum similarity. To study the adaptive capability of the system in recognising a pattern, the initial values of these parameters ($\langle\bar{x}_{nj}^{(l)}(t)\rangle$, $\langle\sigma_{nj}^{(l)}(t)\rangle$, $\langle\bar{x}_n(t)\rangle$, $\langle p_{nj}^{(l)}(t)\rangle$) are deliberately chosen to be different from their true values. $\langle\cdot\rangle$ denotes the estimated value. Of these four parameters, as mentioned in Section 4, we need primarily $\langle\bar{x}_{nj}^{(l)}\rangle$ and $\langle\sigma_{nj}^{(l)}\rangle$ to be estimated from some sets of training samples. The others are automatically derived from those estimates.

After the classification of $X$, it is the task of the supervisor to judge whether the sample $X$ is within the specified guard zone as defined around $\langle\bar{x}_{nk}^{(l)}(t)\rangle$. If it is, the decision of the classifier that $X$ is from the $k$th class is accepted by the supervisor and the parameters of the $k$th class are updated by $X$. Otherwise, there will be no alteration of the class parameters before the next input.

In fully supervised learning, the decision of the classifier is verified by an external supervisor and the class parameters are altered only if the classification is found to be correct.

Fig. 1 Block diagram of self-supervised recognition system
[Flowchart: start; read $X$; compute $p_n$; compute the similarity of $X$ to each stored prototype; determine the class $k$ of maximum similarity and decide $k$th class; compute $(DPS)_k$; if $(DPS)_k \le 1$, compute the updated parameters of class $k$ and store them; test for end of data; stop.]

For the nonsupervised case, the decision of the classifier is considered to be final and the parameters of the recognised classes are correspondingly modified.
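The three learning modes differ only in the condition under which the class parameters are updated after a classification. The following Python sketch states those conditions side by side; the function name, the mode strings and the argument names are hypothetical.

```python
def should_update(mode, predicted, actual=None, inside_guard_zone=None):
    """Decide whether the recognised class is updated with the sample.

    fully_supervised: an external supervisor confirms the decision.
    nonsupervised:    the classifier's decision is always taken as final.
    self_supervised:  the sample must fall inside the guard zone.
    """
    if mode == "fully_supervised":
        return predicted == actual
    if mode == "nonsupervised":
        return True
    if mode == "self_supervised":
        return bool(inside_guard_zone)
    raise ValueError(f"unknown mode: {mode}")


# A wrongly classified sample that lies outside the guard zone.
for mode in ("fully_supervised", "nonsupervised", "self_supervised"):
    print(mode, should_update(mode, "E", actual="I", inside_guard_zone=False))
```

Only the fully supervised mode needs the true label; the self-supervised mode replaces it with the guard-zone check, which uses nothing beyond the class parameters already held by the classifier.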

6 Implementation of vowel recognition

The previously mentioned algorithm was implemented on a set of 871 Telugu (an important Indian language) vowel sounds [7-10] uttered by three speakers in the age group of 30 to 35 years. The first three vowel formant frequencies were considered as recognition features to classify ten vowel classes (ə, a:, i, i:, e, e:, u, u:, o and o:), including long and short categories. Since the short and long categories of a vowel differ only in duration, these ten vowels were then divided into six groups (ə, a:, I, E, U and O) which differ only in phonetic features.

The set of data for each of the vowel classes has been found to follow the normal distribution [8]. Therefore, the assumption (as mentioned in Section 1) that 'the probability of misclassification of the input patterns falling within the guard zone constructed around the central tendency of a class distribution is substantially low' is well justified here. An unrestricted updating programme for those samples would thus assist the convergence of the system significantly. Of course, the convergence of this adaptive system (as mentioned in the following text) is also experimentally verified [9].

Now, we are interested here in studying the adaptive efficiency of the system in recognising vowel sounds with the nonappropriate prototype vectors representing the classes. Similar investigations have also been reported [9] which used nonadaptive, fully supervised and nonsupervised procedures, where the prototype points and corresponding weighting coefficients of a specified class were obtained from five utterances of one of the three speakers selected randomly from each of the classes. Such an initial incorrect set of class representative points was found, as the process of classification continued, to approach gradually the respective true mean values, demonstrating the convergence property of the learning algorithm. Once the optimum size of the training set (usually 16 to 20 samples per class) [8-10] is obtained by the classifier, further increase in the size of the set does not improve the system's adaptivity, and hence the performance, significantly.

The initial class representative vectors in this experiment were chosen just outside the boundary of an ellipsoid having the three axes equal to the respective standard deviations of the features and the mean of the class as the centre. The standard deviations for providing weighting coefficients corresponding to those representative points were obtained from a set of ten training samples selected randomly from each of the classes. Although the shorter and longer types of vowels I, E, U and O are treated similarly, they were given individual reference and weighting vectors. Thus, in our experiment, $m = 6$, $N = 3$, $h = 1$ for ə and a:, and $h = 2$ for I, E, U and O. $F_e$ and $F_d$ were considered to be 0.5 and 40000, respectively. These values of the fuzzifiers were found to be optimum [5, 8, 11] with respect to recognition score.

Fig. 2 System performance curves
[Plots a, b and c, one for each of three typical input sequences: recognition score (%) against step number. Curves: fully supervised, nonsupervised, and self-supervised for the different guard-zone sizes.]

7 Experimental results

Since the performance of an adaptive system depends much on the sequence of incoming samples [1], the experiment was repeated ten times for different orders of appearance of the events in the sample space. Fig. 2 illustrates, for three such typical instances, the variation of the cumulative recognition score after every 100 samples for different values of $\lambda$. To restrict the size of the paper, the performance curves for only three instances are presented. Results obtained with self-supervised learning were compared with those for the fully supervised and nonsupervised cases. It was revealed, under investigation, that the sequence corresponding to Fig. 2a provided a worse set of input events after the 5th step. The nonsupervised system, as expected, resulted in a poor performance, where the large number of wrong classifications further weakened the already weak representative points. The reverse is true for Fig. 2b, where the sequence provided a better set of input samples after the 3rd step. For Fig. 2c, the first 100 patterns contained the best events and resulted in higher initial recognition scores of 80% for fully supervised and 74% for nonsupervised learning; the following sequences of worse events reduced these scores to 75.5% and 67.75%, respectively.

As $\lambda$ (in eqns. 8 and 9) increases/decreases, the dimension of the guard zones $\sigma'_{nj}$ decreases/increases and the corresponding DPS values are increased/decreased. Therefore the chance of correct/wrong samples correcting/vitiating the representative vectors is decreased/increased and the system performance accordingly approaches the nonadaptive/nonsupervised cases. Based on the mean-square distance, defining the mean-square error (Table 1) at every instance of the self-supervised curves, the curve corresponding to $\lambda = 2$ shows the best match with that of the fully supervised case. The entries shown in Table 1 are the average values computed over ten different sequences of inputs. The classifier corresponding to $\lambda = 2$ has been found, on average, to make available the highest proportion of correct to incorrect samples, so that after several utterances have been dealt with by the classifier, the class-representative parameters are likely to approach their respective true values (as determined by the correct mean and standard deviations of the classes). The above results conform to our earlier findings [7], where the classifier was based on the 'minimum weighted distance function'.

Tables 2(i) to 2(iii) illustrate the confusion matrices of vowel recognition corresponding to Fig. 2a-c for both the fully supervised (upper score) and optimum self-supervised (lower score) cases. A figure in a cell denotes the number of times the machine took the same decision in recognising the vowel sounds for the two respective cases. The diagonal elements represent the number of utterances correctly identified. The confusion in vowel recognition, as expected from previous research [7-11], is seen to be restricted to two neighbouring classes.

Table 1: Mean-square distance (mean-square error) of self-supervised curves from fully supervised curve (averaged over ten observations)

$\lambda^2$             0.5     1       2       4       6       8
Mean-square distance    1.341   2.672   2.317   0.929   1.739   1.676
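The comparison summarised in Table 1 can be sketched as a mean-square distance between two performance curves sampled at the same steps. The exact averaging used in the paper is not spelled out, so the Python sketch below assumes the mean of the squared differences between recognition scores at corresponding steps; the function name and the scores are hypothetical.

```python
def mean_square_distance(curve, reference):
    """Mean of squared differences between two equally sampled curves."""
    if len(curve) != len(reference) or not curve:
        raise ValueError("curves must be non-empty and of equal length")
    return sum((c - r) ** 2 for c, r in zip(curve, reference)) / len(curve)


# Hypothetical cumulative recognition scores (%) at three successive steps.
fully_supervised = [76.0, 74.0, 75.0]
candidates = {
    "narrow zone": [75.0, 74.0, 73.0],
    "wide zone": [72.0, 71.0, 70.0],
}
# The guard zone whose curve lies closest to the fully supervised one.
best = min(candidates,
           key=lambda name: mean_square_distance(candidates[name], fully_supervised))
print(best)
```

Choosing the zone with the smallest distance corresponds to picking the self-supervised curve that best matches the fully supervised performance.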

8 Conclusions

The model of a self-supervised learning algorithm with a classifier based on the fuzzy properties of patterns has been applied to the real problem of vowel recognition.

System performance for different guard zones selected around the initial representative vectors is studied for different orders of input samples. With the shrinkage of the zone boundaries, the system behaves like a nonadaptive recognition system, whereas the nonsupervised performance is approached by relaxing the boundaries. Optimum results are obtained when the semiaxes of the guard zones defined for the classes correspond to one half the standard deviations along the respective co-ordinate axes.

The self-supervision algorithm does not involve any concept of the theory of fuzzy sets, and hence can be used with any other classifier to improve the system's performance in recognising patterns. In deciding the relative merit between fuzzy and statistical classifiers, one should study the computational procedures involved and their respective efficacies in a given circumstance. For example, a Bayesian maximum likelihood classifier [1] uses more prior information about the system description and needs a large number of training samples in its design stage in order to evaluate the mean vectors, and the magnitude and inverse of the dispersion matrices corresponding to each of the classes.

Its decision-making stage involves a product (corresponding to each of the classes) of three matrices having dimensions $(1 \times N)$, $(N \times N)$ and $(N \times 1)$, respectively, in order to classify an unknown pattern. The recognition scores for vowel and plosive sounds as obtained by such a nonadaptive statistical classifier using more prior information were found to be slightly better (2-3%) [5, 8, 10, 12] compared with those obtained by the fuzzy classifier reported in this paper. This fuzzy classifier, on the other hand, has lower computational complexity and a smaller memory requirement for storing the representative parameters. It also gives satisfactory performance even when the number of training samples is too small to design a statistical classifier [8, 10]. Again, the fuzzy approach has been found to have more computational flexibility because of the various operators and connectives, which can be exploited according to the problem at hand [13].
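The three-matrix product mentioned for the statistical classifier is a quadratic form in the difference between the pattern and a class mean. A minimal Python sketch of that product is given below; it assumes that the inverse of the dispersion matrix has already been computed at the design stage, and the function name and numbers are hypothetical.

```python
def quadratic_form(x, mean, inv_dispersion):
    """(1 x N) * (N x N) * (N x 1) product for one class.

    d is the difference between pattern and class mean; the result is
    d^T * inv_dispersion * d, with inv_dispersion given as a list of rows.
    """
    d = [xi - mi for xi, mi in zip(x, mean)]
    if any(len(row) != len(d) for row in inv_dispersion) or len(inv_dispersion) != len(d):
        raise ValueError("inv_dispersion must be N x N")
    # (N x N) times (N x 1): N * N multiplications.
    tmp = [sum(row[c] * d[c] for c in range(len(d))) for row in inv_dispersion]
    # (1 x N) times (N x 1): N further multiplications.
    return sum(di * ti for di, ti in zip(d, tmp))


print(quadratic_form([1.0, 2.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 0.5]]))
```

The full $N \times N$ matrix must be stored and used for every class, whereas a classifier that works with per-feature means and standard deviations needs only $2N$ numbers per prototype, which is the memory and computation contrast drawn in the text.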

9 Acknowledgments

The author wishes to thank Professors D. Dutta Majumder and A.K. Datta for their valuable discussion, and N.R. Ganguli and B. Mookherjee for processing the spectrograms. Thanks are also due to R.A. King for his interest in this work.

10 References

1 MENDEL, J.M., and FU, K.S. (Eds.): 'Adaptive, learning and pattern recognition systems - theory and applications' (Academic Press, NY, 1970)

2 YOUNG, T.Y., and CALVERT, T.W.: 'Classification, estimation and pattern recognition' (American Elsevier Publishing Co., London, 1974)

3 ZADEH, L.A.: 'Outline of a new approach to the analysis of complex systems and decision processes', IEEE Trans., 1973, SMC-3, pp. 28-44

4 ALLEN, A.D.: 'Measuring the empirical properties of sets', ibid., 1974, SMC-4, pp. 66-73

5 PAL, S.K., and DUTTA MAJUMDER, D.: 'On automatic plosive identification using fuzziness in property sets', ibid., 1978, SMC-8, pp. 302-308

6 ZADEH, L.A., FU, K.S., TANAKA, K., and SHIMURA, M. (Eds.): 'Fuzzy sets and their applications to cognitive and decision processes' (Academic Press, NY, 1975)

7 PAL, S.K., DATTA, A.K., and DUTTA MAJUMDER, D.: 'A self-supervised vowel recognition system', Pattern Recognition, 1980, 12, pp. 27-34

8 PAL, S.K.: 'Studies on the application of fuzzy set theoretic approaches in some problems of pattern recognition and man-machine communication by voice'. Ph.D. thesis, Calcutta University, 1978

9 PAL, S.K., DATTA, A.K., and DUTTA MAJUMDER, D.: 'Adaptive learning in classification of fuzzy patterns - an application to vowels in CNC context', Int. J. Syst. Sci., 1978, 9, pp. 887-897

10 DUTTA MAJUMDER, D., DATTA, A.K., and PAL, S.K.: 'Computer recognition of Telugu vowel sounds', J. Comput. Soc. India, 1976, 7, pp. 14-20

11 PAL, S.K., and DUTTA MAJUMDER, D.: 'Effect of fuzzification on automatic vowel sound recognition'. Proceedings of fourth international joint conference on pattern recognition, Kyoto, Japan, 1978, pp. 1044-1046

12 DATTA, A.K., GANGULI, N.R., and RAY, S.: 'Recognition of unaspirated plosives - a statistical approach', IEEE Trans., 1980, ASSP-28, pp. 85-91

13 JAIN, R.: 'Comments on "Fuzzy set theory versus Bayesian statistics"', ibid., 1978, SMC-8, pp. 332-333

Table 2: Confusion matrices for vowel recognition

(i) Corresponding to Fig. 2a, (ii) Corresponding to Fig. 2b, (iii) Corresponding to Fig. 2c
Each entry is given as upper score / lower score, i.e. fully supervised / optimum self-supervised ($\lambda = 2$); '-' marks a score left blank in the source. Entries are listed by actual class; the recognised-class (row) position of each entry is not recoverable from the source layout.

(i)

Actual class    Entries
I               106/117, 64/51, 2/4
E               -/3, 163/151, 30/39, 9/9, 5/5
ə               10/12, 51/49, 8/8, 2/2, 1/1
a:              26/27, 57/58, 6/4
O               6/7, 1/1, 5/5, 133/136, 35/31
U               2/2, 15/23, 134/126

(ii)

Actual class    Entries
I               107/120, 63/49, 2/3
E               -/1, 159/151, 33/40, 11/10, 4/5
ə               8/12, 53/49, 7/8, 3/2, 1/1
a:              30/27, 52/56, 7/6
O               5/8, 2/1, 5/5, 132/135, 36/31
U               2/2, 15/23, 134/126

(iii)

Actual class    Entries
I               99/117, 72/49, 1/6
E               -/3, 186/142, 14/48, 4/9, 3/5
ə               11/11, 51/50, 7/8, 2/2, 1/1
a:              28/29, 55/57, 6/3
O               11/7, 6/1, 2/5, 129/132, 32/35
U               1/-, 2/2, 11/11, 137/138

Dr. Pal was born in Calcutta in 1950. He obtained B.Sc. (Hons.) in physics and B.Tech., M.Tech., and Ph.D. in radiophysics and electronics in 1969, 1972, 1974 and 1979, respectively, from the University of Calcutta. At present he is working in the Electrical Engineering Department, Imperial College, London, as a Commonwealth scholar '79 and is on leave from the post of computer engineer in the Indian Statistical Institute, Calcutta. His research interests include pattern recognition and image processing using fuzzy logic. He has published 35 papers and is a co-author of three edited books.

He is one of the reviewers of Mathematical Reviews (USA) in the fields of fuzzy sets, logic and applications, and a member of the Computer Society of India, the British Pattern Recognition Association, and of the IEEE.

Book review
Processing of visible language 2
P.A. Kolers, M.E. Wrolstad and H. Bouma
NATO Conference Series, Series II: Human factors
Plenum Press, 1980, 616pp., $49.50
ISBN: 0-306-40576-8

This volume is the second in a series of conference proceedings intended to bring together graphic designers, engineers and psychologists. In fact, the 37 papers range from the nature of writing systems, including Egyptian hieroglyphics (Meltzer), through computer conferencing (Baer and Turoff) to philosophical aspects of representation (Howard). Faced with the difficult task of knitting such diverse papers together, the editors have grouped them in sections with alternating emphasis on textual material and pictorial material. Like so many interdisciplinary collections, overall coherence suffers because of wide variation in the target audiences of the papers. Several tutorial reviews lack 'a message' and are rather too basic in their coverage. Many of the specialised papers are aimed at restricted audiences, and make few concessions to interdisciplinary communication.

In spite of these problems, the open-minded interdisciplinary reader could take away some important object lessons from this collection, both on specific points and on the different perspectives adopted by the various disciplines on common issues. The main area of overlap between the disciplines involves the human factors of the use of written or symbolic material where user understanding is critical. In the context of statistical graphs, Wainer discusses the incorporation of unnecessary graphic embellishments ('chartjunk') which detract from the simple and clear abstraction of underlying statistical relationships. Interestingly, Wainer, a psychologist, advocates the practical resolution of interposing graphical specialists between the originator of the material and the actual artist.

As an industrial designer, Doblin incorporates the same thing in the guise of 'decodability' within the much broader context of 'getting the message across', including the design context, power and credibility. The central feature of Doblin's analysis is his categorisation of messages and his model for analysing the effectiveness of designs; this involves the questioning of various levels of the design and their consequences for the user.

A broad questioning process also underlies Wright's analysis of the usability of written material and textual literacy. Wright is a psychologist and her questions are derived from empirical research. Like Doblin, Wright advocates critical evaluation of draft material but, like Frase et al., she emphasises the importance of direct empirical evaluation by systematic pretesting of documents on an appropriate sample of users.

A theme which emerges from Wright's analysis is that understanding depends not only on the structure and content of material, but also on the purpose for which someone is reading or using it. This important point recurs in Perkins' analysis of pictorial material, in the form of matching presentation to the viewer's 'habits of information pickup', and in Shebilske's, who stresses 'the active contribution of the reader'.

The volume emphasises technological developments in computerised presentation of text and graphics. The interests of the technologists and psychologists reflect the current drive towards more usable interfaces. Here, there are marked contrasts. Baecker, a computer scientist, describes the design of human-computer dialogue as an 'art form' in which advantageous system features assume prominence. Most psychologists prefer a more empirical approach. Sadly, some of the representatives of the latter, although technically sound, should be interpreted cautiously. As Brown et al. point out in one paper, particular dependent measures can drastically influence the conclusions of experiments on character recognition. Another paper, by Treurniet, is a case in point. This reports a study of character spacing on a VDU using a simple character search task. It is by no means clear that these results can be translated into firm generalisable guidelines.

In sum then, this is a disappointing collection, but it is punctuated with highlights. The interdisciplinary exchange is vital and such collections can only be expected to yield dividends if these disciplines try harder to understand each other's points of view.

P. BARNARD
