106 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, JANUARY 1974

A Comparison of Classification by Two Versions of the Method of Generalized Portraits

T. F. SCHATZKI

Abstract—The modified method of generalized portraits is compared with the original method. Particular attention is paid to the existence of solutions for the two methods and to the shape of the resulting decision surfaces in the training space. One finds that solutions always exist in the original method, regardless of the placement of the training vectors. The indecision region is a hyperplane, which may, in extreme cases, cover the entire test space. In the modified method a solution exists only if the members of one class can be encapsulated in a hyperellipse to the exclusion of the other class. The effect of stepcoding is investigated. The results are displayed by means of two-dimensional examples.

DIRECT AND MODIFIED METHODS

We have recently [1] described a modification of the method of generalized portraits [2], [3] and have shown there the underlying geometry, the proofs required, and the implementation of the modified method. In this correspondence we compare the direct (original) and modified methods with regard to the assumptions made and the classification rules which result. The latter comparison will draw on two-dimensional examples, which we display. The notation is the same as in [1].

Either method attempts to separate the members of C_α and C_α' (= not C_α) in E^n. The separation is achieved by finding a surface φ · x = 1 in an augmented space E^n', with n' > n. In the direct method n' = 2n, and E^2n is obtained from E^n by replacing every vector (x_1, ..., x_n) by (x_1, ..., x_n, 1 − x_1, ..., 1 − x_n), with x_i = 0 or 1. The vector φ(S) is a linear combination of a subset S of the training set T; the algorithm to find the correct subset S_0 is described in [1], and φ = φ(S_0). Under certain conditions, linear inseparability can be proven during the execution of the algorithm; this was shown in the appendix of [1]. We shall wish to consider the role this theorem plays in the direct method. We shall also wish to display the shape of φ · x = 1 in E^n in this case.
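As a toy illustration of the augmentation and decision steps just described (the function names and the hand-picked φ in the usage note are ours, not the paper's; the subset-selection algorithm of [1] that actually produces φ is omitted), a minimal sketch:

```python
def augment(x):
    """Direct-method augmentation: (x1, ..., xn) -> (x1, ..., xn, 1-x1, ..., 1-xn)."""
    return list(x) + [1 - xi for xi in x]

def classify(phi, x):
    """Evaluate the decision surface phi . x' = 1 in E^2n for a binary
    training-space vector x; phi . x' = 1 is the indecision region."""
    s = sum(p * xi for p, xi in zip(phi, augment(x)))
    if s > 1:
        return "C"
    if s < 1:
        return "C-prime"
    return "undecided"
```

For instance, with n = 2 and a hand-picked phi = [1, 1, 0, 0], the point (1, 1) augments to (1, 1, 0, 0) and scores 2 > 1, assigning it to C, while (0, 0) augments to (0, 0, 1, 1) and scores 0 < 1.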

It is useful first to prove three lemmas for the direct method. All vectors refer to vectors in E^2n; all sums go over j.

Lemma 1: If S is a set of linearly independent vectors {x^j} and x' = Σ_j a_j x^j, then φ(S) · x' = Σ_j a_j. This follows directly by multiplication, since φ(S) · x^j = 1 in each right-hand term. (Lemma 1 applies to the modified method as well.)

Lemma 2: If x' = Σ_j a_j x^j, then Σ_j a_j = 1. This follows because the sum of any component k and its augmented counterpart is 1:

1 = x'_k + x'_{k+n} = Σ_j a_j (x^j_k + x^j_{k+n}) = Σ_j a_j.

Lemma 3: The rank of the training set in E^2n is n + 1 in the direct method. This follows from the fact that the augmented form of the null vector in E^n, (0,0,...,0,1,1,...,1), plus the n augmented unit vectors of the form (0,...,0,1,0,...,0, 1,...,1,0,1,...,1), span the training set in E^2n (the rank of the training set in E^n' is n + 1 in the modified method as well, of course).

We see now that the direct method always yields a feasible solution, which may be constructed using any subset S which spans T. By Lemmas 1 and 2 we are assured that φ(S) · x' = 1 for all x' ∈ (T − S), while φ(S) · x' = 1 for x' ∈ S by construction of φ(S). Of course, such a solution is of little use, since it classifies no member of T; an S of rank < n + 1 is desired. In practice the direct method will resolve all difficult separation problems by increasing the rank of S as much as required, increasing the indecision region. We shall see this in examples later. The modified method, on the other hand, indicates inseparability precisely at this point; the theorem of [1] proving inseparability applies when S is of rank n + 1 and φ(S) · x' fails to satisfy

φ(S) · x' ≥ 1, if x' ∈ C_α ∩ T;
φ(S) · x' ≤ 1, if x' ∈ C_α' ∩ T.

Furthermore, the shape of the decision function φ · x = 1 in E^n is of interest for the direct method. We obtain it directly by writing φ · x = 1 in E^2n and replacing x_{k+n} by 1 − x_k throughout. This yields a hyperplane

Σ_{k=1}^{n} (φ_k − φ_{k+n}) x_k = 1 − Σ_{k=1}^{n} φ_{k+n}.

Strictly speaking, φ · x = 1 is a dichotomy of the corners of the n-cube in the direct method. To obtain a surface, we have here relaxed the requirement that |x'| = √n in E^2n, maintaining solely that x'_{k+n} = 1 − x'_k while allowing continuous components. We may call this the continuous direct method. The resulting augmented points fall on a plane of dimension n embedded in E^2n which intersects the hypersphere of radius √n at those points for which x_k = 0 or 1. It is for this reason that the geometry presented in [1] is of little application here.

Fig. 1. φ1 decision function using stepcoding.

CONVERSION OF CONTINUOUS TO BINARY VARIABLES

Although a continuous version of the direct method is feasible, the direct method has commonly been applied to bivalued components. Now many physical variables used as input components to a pattern recognition scheme are by their nature continuous. A one-bit representation, such as replacing a variable by 1 if its value exceeds a threshold and by 0 otherwise, is frequently inadequate for classification purposes. It was suggested (as early as 1964) that such components might be included in the (direct) method of generalized portraits by using stepcoding [5].¹ In stepcoding a number of thresholds are

¹ Stepcoding involves breaking each continuous variable x_i into a set of r_i ranges and then replacing x_i by r_i − 1 binary variables, as follows. Let x_i be ranged by the limits x_i^(0) = −∞ < x_i^(1) < ... < x_i^(r_i − 1) < x_i^(r_i) = +∞; i.e., if x_i^(k−1) ≤ x_i < x_i^(k), then x_i is in range k. Now replace x_i by a sequence of r_i − k zeroes followed by k − 1 ones; e.g., if the limits in a certain variable are x = −10, −5, 15, 100, then r_i = 5, and if, for a certain pattern, x_i^j = 28, then for that pattern k = 4 and x_i^j becomes 0111.

Manuscript received March 9, 1973; revised July 27, 1973. The author was with the Shell Development Company, Emeryville, Calif. He is now with the Western Regional Research Laboratory, Agricultural Research Service, U.S. Department of Agriculture, Berkeley, Calif. 94710.


Fig. 2. φ2 decision function using stepcoding. (Panels A–H as in Fig. 1.)

Fig. 3. Recognition using stepcoding. (Panels A–H as in Fig. 1.)
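The "recognition" rule illustrated in Fig. 3, assigning a test pattern only when exactly one of the two portraits claims it, can be sketched as follows (phi1, phi2, and the example vectors in the tests are hypothetical; x_aug is assumed to be already stepcoded and augmented):

```python
def recognize(phi1, phi2, x_aug):
    """Vapnik-style recognition: assign x to C1 only if phi1 . x > 1 and
    phi2 . x < 1, to C2 in the symmetric case, otherwise leave unassigned."""
    s1 = sum(p * v for p, v in zip(phi1, x_aug))
    s2 = sum(p * v for p, v in zip(phi2, x_aug))
    if s1 > 1 and s2 < 1:
        return "C1"
    if s2 > 1 and s1 < 1:
        return "C2"
    return None  # unassigned
```

Points claimed by both portraits, or by neither, fall in the unassigned region, which is why Fig. 3 shows fewer misassignments but larger blank areas than Figs. 1 and 2.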

chosen for a variable; each threshold is then represented by a separate bit, adjacent thresholds corresponding to adjacent bits. Thus each variable x_i in the training space is mapped to a corner of an (r_i − 1)-cube, adjacent corners corresponding to adjacent ranges of x_i. A vector in the training space E^n is mapped to a corner of an m-cube, with Σ_i (r_i − 1) = m ≫ n.

We wish to study the decision function φ · x = 1 which results from stepcoding followed by the direct method. φ · x = 1 is a hyperplane in E^m. In the original E^n space, φ · x >, <, or = 1 corresponds to an assignment of all hyperrectangles (to C_α, C_α', or neither), and φ · x = 1 is not necessarily a plane. It would be satisfying if one could use the preceding lemmas, plus the transformation rules implied in stepcoding, to prove general properties regarding the shape of φ · x = 1 in E^n. We have not been able to do so, and from the examples we have tested we believe few general statements can be made. All that can be said is that the regions for which φ · x = 1 and for which φ · x > 1 are not unreasonable in a distance sense in view of the training examples given. We now display some examples which led us to this conclusion.

For visual reasons we chose n = 2, assuming each component to be bounded in the interval [0, 0.9]. We divided the interval into nine ranges and stepcoded each variable separately. From this m = 16, and the direct method operates in E^32. We thus classified the squares of a 9 × 9 checkerboard, using the information that a limited number of squares are assigned to one class C_1 or another C_2. Letting C_α = C_1 and solving for φ = φ_1, we evaluated φ_1 · x for each square and tentatively assigned test squares to C_1 or C_1' (= C_2) according to the sign of φ_1 · x − 1. We show some examples in Fig. 1, where we illustrate a number of different C_1 and C_2 "distributions." We see that for some unimodal distributions, like B, C, H, and possibly F, quite reasonable assignments are made with only small areas of indecision (φ_1 · x = 1). On the other hand, the nonassigned areas in A, E, and G are disappointingly large, particularly in the former. A comparison of A, E, and H indicates that the particular arrangement of the members of C_1 strongly influences the size of the unassigned region. The assignments made in the case of a bimodal distribution, as in D, are quite unacceptable on a distance criterion.

We may compare these results with those obtained by the modified method. Cases B, C, D, F, and H are illustrated in [1, fig. 3]. In general, for unimodal distributions as in B, C, and H both methods agree in the large; in D and F the modified method fails, indicating overlapping distributions. From inspection, we see that the modified method would fail for A (barely) and G (completely), while in E the upper left-hand corner would have φ_1 · x > 1. Of course, when the modified method finds linear separability it leaves only an elliptical curve unassigned.

As suggested by Vapnik et al. [2], [3], we can go further. Letting C_α = C_2, we can solve for φ_2, with the results shown in Fig. 2. Some of the comments we made with respect to the unpredictability of the decision region might be repeated here, in particular on comparing, say, F in both figures. However, combining φ_1 and φ_2 and assigning a test pattern only if it is included in exactly one class C_i (φ_i · x > 1, φ_i' · x < 1) yields reasonable results (Fig. 3). This is referred to as "recognition" in Vapnik's terminology. Surprisingly good results are obtained in cases such as F. Yet very large regions are clearly not assigned but should be, as in A, B, and H. Using the same recognition approach for the modified method, we realize that solutions are only possible when C_1 and C_2 are unimodal and nonoverlapping in E^2, as in E and H. Assignments would agree with the direct method in E, while in case H the modified method would divide the square approximately from lower left to upper right, a more reasonable assignment than that found in Fig. 3.

We mentioned previously that the success of the direct method depended heavily on stepcoding; i.e., adjacent ranges were assigned adjacent corners of the m-cube. We may illustrate this by renumbering the ranges in the previous examples with a binary code, so that adjacent ranges no longer correspond to adjacent corners. We obtain the results shown in Fig. 4, clearly a totally

unacceptable classification scheme. The "reasonableness" of the stepcoding approach thus results from the quasi-isometric aspect of the transformation, as we surmised.

Fig. 4. Decision functions (φ_1 and φ_2) using binary coding.

SUMMARY

We have presented two approaches to the algorithm of the method of generalized portraits [2], [3]. Both start with a training set in E^n and define additional components in terms of those already given. In the resulting E^n', a linear optimal separation is then sought. The methods differ in two important respects. In the direct (original) method n' = 2n, the added components being defined by x_{k+n} = 1 − x_k. The resulting surface is a hyperplane in E^n and can always be found. Cases of linear inseparability are handled by placing the hyperplane so that difficult points are placed in the boundary plane. This plane may, in difficult cases, be of high rank and in the extreme include every training and test point. In the modified method n' = n + 1, with |x'| = 1. The resulting surface is a hyperellipse, provided it exists. In the case of linear inseparability in E^n' the algorithm terminates, indicating that hyperellipses encapsulating one class to the exclusion of the other do not exist. If the hyperellipse is found, it allows the assignment of a scalar to each member of the test set which yields a measure of the "amount of inclusion within the hyperellipse," suggestive of fuzzy sets.

The methods also treat continuous variables in different ways. The modified method simply requires that variables be bounded in the unit square. The direct method may also be applied to continuous bounded variables (a variation we have termed the continuous direct method), in which case it becomes but another algorithm for finding hyperplanes. However, the method was designed for bivalued variables. When continuous variables are converted to binary ones by stepcoding (the most reasonable approach), the classification of test vectors seems reasonable in most cases (as judged by two-dimensional examples), but hardly predictable in any rigorous sense. The attachment of any scalar to the results, in particular, seems unreliable.

ACKNOWLEDGMENT

The author would like to thank his colleagues for valuable discussions and suggestions, in particular Dr. Robinson and Dr. Vogiatzis, who suggested the use of n' = n + 1, and Lemmas 2 and 3. He is also grateful to Prof. R. R. Meyer of the University of Wisconsin for reading and commenting on the manuscript.

REFERENCES

[1] T. F. Schatzki, "Hyperelliptic decision surfaces from the method of generalized portraits," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, pp. 428-433, July 1973.
[2] V. N. Vapnik and A. Ya. Lerner, "Pattern recognition using generalized portraits," Automat. Remote Contr., vol. 24, pp. 709-715, 1963.
[3] V. N. Vapnik and A. Ya. Chervonenkis, "On a class of perceptrons," Automat. Remote Contr., vol. 25, pp. 103-109, 1964.
[4] J. D. Robinson and J. P. Vogiatzis, Shell Development Co., Houston, Tex., private communication.
[5] V. N. Vapnik, A. Ya. Lerner, and A. Ya. Chervonenkis, Eng. Cybern., vol. 1, pp. 63-77, 1965.

An Analysis of Cut-Off Rules for Optimization Algorithms

HARVEY J. GREENBERG AND LAWRENCE W. T. LOH

Abstract—The problem of optimal stopping has long been of interest to various scientific disciplines. Most notably, the approaches taken have been based on sequential decision theory and modeling techniques. Aiming at optimization algorithms, we propose in this paper to combine the present approaches and study the cut-off rule problem, using asymptotic convergence behavior of sequences. A unifying framework for the design and analysis of stopping rules for algorithms generating a monotonically convergent sequence is presented, while methodology for optimal design is discussed. The concavity structure is observed to play an important role in our analysis.

Manuscript received March 21, 1973; revised July 24, 1973. The authors are with the Computer Science/Operations Research Center, Southern Methodist University, Dallas, Tex. 75275.

I. INTRODUCTION/BACKGROUND

This correspondence is a first step toward the analysis of cut-off rules for optimization algorithms. When do we stop searching for a better answer than we have already computed? We are confronted with an enigma, having complexities that include basic notions of "goodness." Can we obtain more effective rules? Is a rigorous analysis really possible or useful? Anyone who has considered this issue must have faced such questions. Our answer is yes, but we make no pretense of having solved the problem. However, we do provide a framework for such analysis, and we describe certain design principles with specific examples.

There have been a variety of approaches taken, though most studies do not treat the cases we shall describe. Breiman [1] provides a review representing general approaches. We shall use one aspect, namely, optimal design within a class. The (semi-)martingale theory exemplified by the work of Chow and Robbins [2] does not capture many of our aims with regard to simplicity and model realism. Robbins' [4] use of sequential sampling (akin to Wald's [5] method of inference) is more relevant to our goals, but his work is concerned with a different issue than we consider here. The work of Randolph [3] pertains to a specific technique which does not address the problem we pose. Thus, while there have been studies lending important background to this study, there does not appear to have been a framework upon which

