Reprinted from Pattern Recognition, Pergamon Press 1969, Vol. 1, pp. 301-313. Printed in Great Britain.

Learning Patterns in Terms of Other Patterns

R. SHERMAN and G. W. ERNST
Case Western Reserve University, Cleveland, Ohio 44106

Pergamon Press: Oxford, New York, London, Paris

(Received 9 December 1968)

Abstract—This paper investigates the learning of patterns in terms of previously learned patterns. The main problem is to efficiently select from the set of all previously learned patterns a few in terms of which the pattern to be learned can be described. A computer program for doing this is described and the importance of its memory organization is discussed.

1. INTRODUCTION

There seem to be (at least) two distinct approaches to pattern recognition. In the more classical approach a pattern is described by a discriminant function which is a polynomial in some basic features of the pattern. In the other approach (typified by Banerji,(1) Evans(2) and Ledley(4)) a pattern is described in some language specifically designed for describing patterns. In some sense both approaches are similar because in both a description of a pattern is an expression containing features of the pattern. One difference lies in the connectives used to combine the features. In the case of discriminant functions, the connectives are addition and multiplication. In the other case, the connectives are the basic connectives of the specialized language, which often turn out to be logical connectives.

A more important difference is that discriminant functions contain only basic features of the pattern while the other description often contains "non-basic" features. The class of "non-basic" features that we are interested in is the class of previously learned patterns because they often simplify the description of a new pattern. For example, if letters have been learned, it is easy to describe words in terms of letters regardless of whether the connectives are numerical, logical or whatever. Describing patterns in terms of previously learned patterns is particularly useful when the patterns are more complex than words. For more discussion on this issue see Banerji,(1) Evans(2) and Ledley.(4)

One difficulty is that most previously learned patterns do not facilitate the learning of a new pattern. For example, if digits in addition to letters have to be learned, it will be no easier (and perhaps harder) to learn words than if the digits had not been learned. Thus, if one is to learn a pattern in terms of previously learned patterns, one must be able to discover which previously learned patterns are "relevant" to the task at hand. And, in most cases, the number of irrelevant previously learned patterns is much larger than the number of relevant ones.

This paper describes a program that can learn patterns in terms of previously learned patterns. In the next section the task is defined in more detail and assumptions about the data are discussed. Section 3 describes how patterns are stored in memory and Section 4 describes the learning algorithms. An example is given in Section 5. The last section discusses the importance of the memory organization in the learning and recognition of patterns.


2. PROBLEM DEFINITION

A pattern is a set of objects from some universe of objects. Each object is encoded in terms of the basic features, f1, f2, ..., fn. We distinguish between two types of objects: A simple object is an n-tuple of binary digits. The interpretation is that the object possesses fi if and only if the ith component of the object is a 1.

A compound object is inductively defined as
1. a simple object, or
2. a list of compound objects.

The interpretation is that a compound object consists of a number of sub-objects. An example of a compound object is ((101)(111)). In this case the basic features are f1, f2, f3. This compound object is a list of two simple objects, the first of which possesses f1 and f3. In general, the list of objects that comprises a compound object can contain objects that are not simple.
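
For concreteness, the encoding can be sketched in code (an illustrative Python rendering rather than the authors' Univac implementation; the type aliases and function names are introduced only for this sketch):

from typing import List, Tuple, Union

SimpleObject = Tuple[int, ...]                    # an n-tuple of binary digits
CompoundObject = Union[SimpleObject, List["CompoundObject"]]

def is_simple(obj: CompoundObject) -> bool:
    # A simple object is a flat tuple of bits; anything else is a list of sub-objects.
    return isinstance(obj, tuple)

def possesses(obj: SimpleObject, i: int) -> bool:
    # True when the simple object possesses basic feature f_i (positions are 1-indexed).
    return obj[i - 1] == 1

# The compound object ((101)(111)) from the text:
example = [(1, 0, 1), (1, 1, 1)]
assert possesses(example[0], 1) and possesses(example[0], 3)   # the first sub-object has f1 and f3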

This encoding scheme assumes that the preprocessor, in addition to recognizing basic features, can recognize the boundaries between simple objects, groups of simple objects, groups of groups of simple objects, etc. This is a large assumption, but in the case of words it seems reasonable. Each letter is a simple object and each word is a compound object. In this case the boundary of simple objects is indicated by the space between printed letters. (For handwritten words the boundaries of simple objects are not so obvious.) At the next level, the boundary between words is indicated by a larger space than the spaces between letters in a word.

Compound objects are a convenient way to encode any input that is basically one-dimensional. However, for two-dimensional inputs, such as pictures, a more elaborate encoding is desirable which, of course, would assume more elaborate preprocessors. For example, see Ledley.(4)

Information is presented to the program in the form of a triplet, (object) (name) (sign). For example, the input ABC C1 + indicates that ABC is in the pattern C1. ABC denotes a compound object consisting of three subobjects, A, B, C. ABD C1 - indicates that ABD is not in C1. The only other (sign) is ? which is used to retrieve information. For example, ADE C1 ? represents the question "is ADE in the pattern C1?". The program answers + or - depending upon the description of the pattern it has formed.

The task of learning a pattern consists of accepting inputs whose (sign) is + or - and, on the basis of this information, forming a description of the pattern. The patterns are learned correctly if, when given inputs whose (sign) is ?, the program answers correctly.

The basic construct of the language for describing patterns is a test, which is described below. The connectives for combining tests are implied by the way tests are stored in memory, and will be described in the next section. A test has the form (argument) (relation) (name). An argument is a list of elements in the set {1st, 2nd, ..., nth}. The relation is = or ∈ and the name is the name of some pattern. For example, 1st ∈ C is a test and is true when applied to a compound object whose first subobject is a simple object that is an instance of a C. The test 1st = 1 is true of a simple object possessing f1 (whose 1st component is a 1). The test (1st 5th) = 0 is true of a compound object whose first subobject is a simple object not possessing f5. Thus, the argument specifies some subpart of an object. The subpart can be tested equal to some specific value or can be tested for membership in some pattern.
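
A test of the form (argument) (relation) (name) can likewise be sketched directly (again an illustrative Python rendering; the field names are assumptions made for the sketch):

from dataclasses import dataclass
from typing import List, Union

@dataclass
class Test:
    argument: List[int]           # e.g. [1] for "1st", [1, 5] for "(1st 5th)"
    relation: str                 # "=" for equality, "in" for set membership
    target: Union[int, str]       # a bit value for "=", a pattern name for "in"

def subpart(obj, argument):
    # Follow the argument positions (1-indexed) down into a compound object.
    for pos in argument:
        obj = obj[pos - 1]
    return obj

# The test (1st 5th) = 0: true of a compound object whose first subobject
# is a simple object not possessing f5.
t = Test(argument=[1, 5], relation="=", target=0)
obj = [(1, 0, 1, 1, 0), (1, 1, 1, 0, 0)]
assert subpart(obj, t.argument) == t.target       # the 5th bit of the 1st subobject is 0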

In passing we mention that the simple objects given to the program consisted of 15 binary digits, 8 of which were random to simulate noise.


3. ORGANIZATION OF MEMORY

The memory of the program is a binary tree. Each non-terminal node contains a test. Each terminal node contains a list of tests and the name of a pattern.* To retrieve information from the memory an object-name pair is "sorted" through the tree. At each non-terminal node a test is applied. If true, the object-name pair descends the left branch; if false, the right branch. When the object-name pair lands at a terminal node all of the tests are applied. If they are all true and the name at the terminal is the same as the name of the object-name pair, then the program assumes that the object is in the pattern and answers +.

An example will clarify how information is retrieved from the memory tree. Assume that the input (10110) A ? is given to the program. (In this example there are only 5 basic features; the inputs to the program contained 15 basic features.) If the tree in Fig. 1 is the memory then the object-name pair will first go to the right node and then to the left of the right

Fig. 1. A small memory tree for simple objects.

node. All of the tests at the terminal are true and both the name at the terminal and the name in the input are A. Therefore, the program assumes that (10110) is in the pattern A. The program answers negatively to the following inputs, assuming Fig. 1 is the memory:

1. (01110) A ?
2. (11110) A ?
3. (10010) A ?

(1) goes to the node that contains the name B. Thus, the program assumes that (01110) is a B and not an A. (2) goes to the same node but since the test, (1st) = 0, fails, the program assumes that the object is neither an A nor a B. (3) goes to the node that contains an A but the test, (3rd) = 1, fails.

From these examples it can be seen that memory is a number of tests and names combined together by the tree structure. The implicit connective is conjunction. That is, the description of a pattern whose name is at a terminal node is the conjunction of the signed tests on the path to the terminal and the tests at the terminal. For example, the description of an A in Fig. 1 is

(5th ≠ 1) and (2nd = 0) and (1st = 1) and (3rd = 1).

* This memory organization is similar to EPAM's (Feigenbaum(3)).


Thus, due to the fact that only five basic features are used in this example, there are only two A's: (10100) and (10110).
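
The retrieval procedure is simple enough to sketch in full. The following illustrative Python reconstructs the memory of Fig. 1 from the textual description above (since the figure itself is not reproduced here, the exact node layout is an assumption inferred from the text) and sorts object-name pairs through it:

def eq_test(pos, val):
    # Equality test on one component of a simple object (1-indexed position).
    return lambda obj: obj[pos - 1] == val

class Node:
    def __init__(self, test=None, left=None, right=None, tests=None, name=None):
        self.test, self.left, self.right = test, left, right    # non-terminal part
        self.tests, self.name = tests or [], name                # terminal part

# Fig. 1 as inferred from the text: the root tests "5th = 1"; its right child
# tests "2nd = 0"; the A terminal holds the remaining tests (1st = 1) and
# (3rd = 1); the B terminal holds (1st = 0).  The root's left subtree is left
# empty in this sketch.
fig1 = Node(test=eq_test(5, 1),
            left=Node(),
            right=Node(test=eq_test(2, 0),
                       left=Node(tests=[eq_test(1, 1), eq_test(3, 1)], name="A"),
                       right=Node(tests=[eq_test(1, 0)], name="B")))

def retrieve(tree, obj, name):
    # Sort the (object, name) pair through the tree and answer + or -.
    node = tree
    while node.test is not None:                  # descend: left if true, right if false
        node = node.left if node.test(obj) else node.right
    ok = all(t(obj) for t in node.tests) and node.name == name
    return "+" if ok else "-"

print(retrieve(fig1, (1, 0, 1, 1, 0), "A"))       # +  : (10110) is an A
print(retrieve(fig1, (0, 1, 1, 1, 0), "A"))       # -  : sorted to the B terminal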

Certainly other connectives besides conjunction are needed. For example, there are many different kinds of A's such as A, a, 𝒜, etc. It would be unrealistic to require that they all possess the same basic features. Such patterns can be easily represented in the tree by allowing the same name at several terminal nodes. For example, the leftmost node in Fig. 1 may represent the set of small roman a's and the other node with the name A may represent capital roman A's. In general, the description of a pattern, α, is the disjunction of the descriptions represented by the terminal nodes that contain the name, α. In Fig. 2 the description of the pattern A is

((5th ≠ 1) and (2nd = 0) and (1st = 1) and (3rd = 1))
∨ ((5th = 1) and (1st = 1) and (4th = 1)).

Of course, if type fonts vary radically, several terminal nodes may be required to represent a capital roman A.

Fig. 2. A small memory tree for simple objects.

Thus far, we have only described that part of memory pertaining to simple objects. The program distinguishes between compound and simple objects by a special test "simple" at the top of the tree. Figure 3 gives a tree for both compound and simple objects; thus, Figs. 1 and 2 are really only the left subtree of the memory tree.

The rules for combining tests remain the same. For example, the rightmost node of Fig. 3 represents the fact that if an object satisfies ((3rd ∉ D) and (3rd ∈ A)), then it is in the pattern C2. (Hopefully, this expression simplifies to 3rd ∈ A because a letter cannot be both an A and a D. But, in general, we cannot make this assumption.) Since the name C2 appears at two terminal nodes, the description of C2 is (3rd ∈ D) ∨ ((3rd ∉ D) and (3rd ∈ A)), which simplifies to (3rd ∈ D) ∨ (3rd ∈ A). If A is the set of capital roman A's and D is the set of capital roman D's, C2 is a rather trivial pattern. However, in the case of arbitrary A's and D's it is a rather complex pattern which contains, for example, cba, BXA, yAd, aDD, etc. Thus, allowing the description of a pattern to contain the names of other patterns gives the program a powerful way to express patterns. What is more, once the effort to learn the set of all A's has been expended, it is relatively easy to learn new patterns, such as C2.


Fig. 3. A memory tree.

In some cases it is necessary to have tests on names. Consider the two leftmost terminal nodes of the memory tree shown in Fig. 4. Some objects satisfy all tests on the path to these nodes except the test "name = A". A capital roman A is one such object. A test on name is used to direct the object-name pair to a unique terminal node. For example, if the name in an object-name pair arriving at this test is A, then it will go to the left and otherwise to the right of that node. The reason for needing tests on names is that, in general, an object is in several different patterns, e.g. capital, roman, A, vowel are some of the patterns that contain A.

Fig. 4. A memory tree containing a name test. Cap is the name of a pattern.

We have briefly described how information is retrieved from memory assuming that there is some algorithm for evaluating tests. The equality tests are trivial to evaluate but the set-membership tests are not trivial to evaluate. The algorithm for evaluating set-membership tests follows:

1. Find the subpart of the object designated by the (argument) of the test.
2. Form the question (subpart) (name in test) ? and answer it as if it were an input to the program.

For example, assume that Fig. 3 gives the memory and that the input is

((10110)(11100)(10011)) C2 ?

After noting that the object is not simple, the test 3rd ∈ D is evaluated. Using the above algorithm the input (10011) D ? is generated internally. This input, through a series of equality tests, finds its way to the leftmost node, where the name at the node is A instead of D. Thus, the original input goes to the rightmost node because the test 3rd ∈ D is false. Now, the test 3rd ∈ A must be evaluated. Another input (10011) A ? is generated by the program. The answer to this input is + and thus the program outputs a + to the original question because both names are C2.
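
The recursive evaluation just described can be sketched as follows (illustrative Python; the node layout and test encoding are assumptions carried over from the earlier sketches, restated here so the fragment stands alone):

class Node:
    def __init__(self, test=None, left=None, right=None, tests=(), name=None):
        self.test, self.left, self.right = test, left, right
        self.tests, self.name = list(tests), name
    @property
    def is_terminal(self):
        return self.test is None

def subpart(obj, argument):
    for pos in argument:                          # 1-indexed positions, e.g. [3] for "3rd"
        obj = obj[pos - 1]
    return obj

def eval_test(memory, obj, test):
    # A test is a triple (argument, relation, target), e.g. ([3], "in", "D") for 3rd ∈ D.
    argument, relation, target = test
    part = subpart(obj, argument)
    if relation == "=":                           # equality tests are trivial to evaluate
        return part == target
    # Set membership: form the internal question (part) (target) ? and answer it
    # by sorting it through the same memory tree.
    return retrieve(memory, part, target) == "+"

def retrieve(memory, obj, name):
    node = memory
    while not node.is_terminal:                   # left on true, right on false
        node = node.left if eval_test(memory, obj, node.test) else node.right
    ok = all(eval_test(memory, obj, t) for t in node.tests) and node.name == name
    return "+" if ok else "-"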

This example illustrates the recursive nature of the retrieval process. The recursion stems from the fact that the only description of a pattern whose name occurs in a test is one or more terminal nodes somewhere in the tree. Note that since all questions (the original one and those generated internally by the program) contain names, they will go to a unique terminal node and have a unique answer. If the name were missing then name tests, such as name = A, could not be evaluated.

In passing we note that an object-name pair is encoded as a two-element compound where 1st denotes the object and 2nd denotes the name. This convention simplifies the programming. The names can be either atomic symbols, which we have used exclusively above, or compound objects. If they are compound objects then the program assumes that they are members of some unique pattern and this pattern is the name. For example, the pattern (1st ∈ C) and (2nd ∈ A) and (3rd ∈ P) is used as the name of the set of all capital letters. Thus, the name never occurs in an input but only an instance of the name. However, throughout this paper we have assumed that names are atomic symbols to simplify the description. A description of the use of compound objects as names is given in Sherman.(5)

4. THE LEARNING ALGORITHM

The learning algorithm consists of two mechanisms: image elaboration and tree modification. Image elaboration adds new tests to a terminal node. Tree modification adds new nodes to the memory tree and notes that certain tests are irrelevant to describing certain patterns. Of these two processes the image elaboration is the more interesting even though it is the simpler of the two. Below, both of these processes are described and the importance of image elaboration is discussed in Section 6.

Image elaboration

Most previously learned patterns will not be relevant to the description of a new pattern. For example, in describing patterns of letters, digits are not relevant, and since each word only contains a few letters, most letters are also irrelevant. Image elaboration selects a few previously learned patterns that may be relevant in describing the new pattern.


Image elaboration finds, for some given object, all patterns containing a subobject of the given object. For each such pattern found a test is formed and these tests are used as the basis of a description for the pattern to be learned. For example, if the input aBD C2 + is given to the program and it knows nothing about C2, it will use the a in aBD for image elaboration. Since the 1st subobject is selected the tests (1st ∈ sm) and (1st ∈ A) are generated. This example assumes that the set of all small letters (sm) and the set of all A's are the only previously learned patterns containing the 1st subobject. Other possibilities might be that the first is a vowel or a roman letter.

These tests may not be sufficient to describe the pattern. For example, suppose that C2 is the set of all words whose 1st subobject is a D or an A. Such patterns require the image elaboration process to be executed several times on several different objects contained in the pattern.

Finding the previously learned patterns containing a given subobject is quite similar to processing an input whose sign is ?. The subobject is paired with a variable to form an object-name pair that is sorted through the tree. If no name tests are encountered on the path, the pair is sorted to a unique terminal node. If the tests at the terminal are true, the object is in the pattern whose name is at the terminal. For example, if the 1st subobject is (10011), it gets sorted to the leftmost node of Fig. 3 and the test 1st ∈ A is formed.

This case is the simplest because the subobject is in precisely one previously learned pattern. Another possibility is when the subobject is in no pattern. For example, if the 1st subobject is (10001), it is sorted to the leftmost terminal node in Fig. 3 and fails the test 4th = 1. Thus it is in no previously learned patterns and image elaboration is done on its subparts. But since it is a simple object, the tests produced by image elaboration are on the basic features, e.g., (1st 1st) = 1, (1st 2nd) = 0, etc. If the 1st subobject were not a simple object, image elaboration would be done on all its subparts, producing tests such as (1st 1st) ∈ A, (1st 3rd) ∈ D. Sherman(5) discusses the necessity of such tests for analogy problems on aptitude tests.

The third and most likely case is when the subobject is in several patterns. For example, if the 1st subobject is (10011) it is sorted to the test, name = A, in Fig. 4. This test cannot be answered because the program is not given the name but is trying to find names. Since it cannot answer the test, both branches are explored and thus the subobject goes to at least two terminal nodes. For each of these terminal nodes, if the subobject passes the tests at the terminal node, the name is retrieved. Thus, in this example image elaboration generates two tests: 1st ∈ A and 1st ∈ cap.
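
The three cases share one underlying step: sorting a subobject through the tree while exploring both branches of every name test. A minimal sketch of that step follows (illustrative Python; the node classes, the stand-in object test and the demo tree are assumptions made for the sketch):

class TNode:                                      # terminal node: tests and a pattern name
    def __init__(self, tests, name): self.tests, self.name = tests, name

class NTNode:                                     # non-terminal node: an object test or a name test
    def __init__(self, test, left, right, on_name=False):
        self.test, self.left, self.right, self.on_name = test, left, right, on_name

def patterns_containing(node, obj):
    # Names of all patterns whose terminal description the object satisfies.
    if isinstance(node, TNode):
        return [node.name] if all(t(obj) for t in node.tests) else []
    if node.on_name:                              # name unknown: explore both branches
        return patterns_containing(node.left, obj) + patterns_containing(node.right, obj)
    branch = node.left if node.test(obj) else node.right
    return patterns_containing(branch, obj)

def elaborate_image(tree, compound_obj, position):
    # Image elaboration on one subobject: one membership test per pattern found.
    names = patterns_containing(tree, compound_obj[position - 1])
    return [([position], "in", name) for name in names]

# Tiny demo: a name test at the root, two terminals for the patterns A and cap.
is_cap_a = lambda o: o[0] == 1 and o[2] == 1      # stand-in object test (assumed)
demo = NTNode(test=None, left=TNode([is_cap_a], "A"),
              right=TNode([is_cap_a], "cap"), on_name=True)
print(elaborate_image(demo, [(1, 0, 1, 1, 0)], 1))   # [([1], 'in', 'A'), ([1], 'in', 'cap')]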

Tree modification

When the program is given an input whose sign is + or -, the object-name pair is sorted to a terminal node. At this point, tests may be marked irrelevant or a new node may be added to the tree. This tree modification depends upon three things:

1. The sign of the input.
2. The result of applying the tests at the terminal node.
3. Whether the name in the object-name pair is the same as the name at the node.

Thus, there are eight different cases, which are summarized in Table 1; examples are given below:

Case 1. The input, ABC C3 +, is sorted to a terminal node containing the test 1st ∈ A and the name C3. In this case the object passes all tests and the names are the same. The tree is not modified because it appears that the program has correctly learned C3.


Table 1. A summary of tree modifications

Case number   Input sign   Names       Tests            Action
1             +            same        true             do nothing
2             +            same        false            make test X irrelevant or grow tree
3             -            same        all tests true   make all tests V irrelevant or erase V irrelevant on false test
4             -            same        false            do nothing
5             +            different   all tests true   grow tree on name tests
6             +            different   false            grow tree on object tests
7 & 8         -            different   true or false    do nothing

Case 2. The input, ABC C3 +, is sorted to a node containing 1st ∈ D, C3. Since the names are the same but a test fails, the test is marked X to indicate that it is irrelevant to describing C3.

Case 3. The input, ABC C3 -, is sorted to a node containing two tests, 1st ∈ A and 2nd ∈ B, and the name C3. Since all tests are true and the names are the same, mark all tests with a V. A V indicates conditional irrelevance because it may be removed later, but, if not, the test is irrelevant. An example of removing a V is when ABC C3 - is sorted to a node containing (1st ∈ A), (2nd ∈ D V), C3. In this case, all tests that are not marked irrelevant are true, but there is a V irrelevant test that is false. The V is removed and the node becomes (1st ∈ A), (2nd ∈ D), C3.

Case 4. The input, ABC C3 -, is sorted to a node containing 1st ∈ D, C3. One test is false and the names are the same. The program has learned that ABC is not in C3 and no modification takes place.

Case 5. The input, ABC C3 +, is sorted to a terminal node containing 1st ∈ A, C4. All tests are true but the names are different; thus the tree is grown by adding the non-terminal node, name = C4.

Case 6. The input, ABC C3 +, is sorted to a node containing 1st ∈ D, C4. A test is false and the names are different; thus the tree is grown by adding the non-terminal node 1st ∈ D.

Cases 7 and 8. If the sign of the input is - and the names are different, the tree is not modified because it appears that this information has already been learned by the program.
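
The eight cases can be compressed into a small decision procedure (an illustrative Python sketch; the marking and tree-growing routines are hypothetical stand-ins named only to make the case structure explicit):

def modify(sign, names_same, all_tests_true, node):
    # Choose the Table 1 action for an input sorted to a terminal node with tests.
    if names_same:
        if sign == "+" and not all_tests_true:
            mark_X_irrelevant(node)               # case 2 (or grow the tree)
        elif sign == "-" and all_tests_true:
            mark_V_irrelevant(node)               # case 3: conditionally irrelevant
        # cases 1 and 4: the answer already agrees with the sign; do nothing
    else:
        if sign == "+":
            if all_tests_true:
                grow_on_name_test(node)           # case 5
            else:
                grow_on_object_test(node)         # case 6
        # cases 7 and 8: sign is -, names differ; do nothing

# Hypothetical helpers, declared only so the sketch stands alone.
def mark_X_irrelevant(node): ...
def mark_V_irrelevant(node): ...
def grow_on_name_test(node): ...
def grow_on_object_test(node): ...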

Table 1 and the above description only include the cases when there are tests at the terminal node. When an input gets sorted to a terminal which contains no relevant tests, image elaboration is evoked and the new tests are added to the terminal. If no new tests are found by image elaboration, then the tree is grown on one of the irrelevant tests. These additional cases are illustrated by way of example in the next section.

5. EXAMPLES

The examples described in this section were run on the Univac 1107. The program was coded in assembly language using list processing macros. To learn patterns of letters the program must first learn the letters and other basic patterns. First, the learning of basic patterns is described and then a more complex example is described.

Basic patterns

The program does not process real data because of the difficulty involved in "preprocessing" real data. Instead, it is assumed that such preprocessing produces a 15-tuple of binary digits for each simple object. Each binary digit represents the presence or absence of some basic feature that the preprocessors look for. In general, the inputs will be compound objects (i.e. groups of simple objects). For such inputs, the output of the preprocessors is a tree whose terminal nodes are 15-place binary numbers.

Letters were encoded in terms of the basic features so that seven of the basic features gave good discrimination while the other eight gave no discrimination at all. For example, all capital roman A's had the same value for the good features while a random number generator supplied the values of the other features. Thus, there are many different simple objects that are capital roman A's. We denote these by A1, A2, .... Similarly, b1, b2, ... denote small roman b's, i.e., simple objects that have the key features of a b but have different values for the noise features.

The following inputs typify the inputs used to learn the set of A's:

(simple object)  (name)  (sign)
A1               A       +
A2               A       +
b1               A       -
...
b2               A       -
A4               A       +

Sixty inputs were sufficient to find the seven key features of an A. Of the sixty, forty-eight had a + sign.

All capital roman letters and several small roman letters were learned in 30 sec. The only other patterns of simple objects that were learned were the set of vowels (vow), the set of capital letters (cap), and the set of small letters (sm). This took an additional 20 sec of computer time.

Complex patterns

The learning of a pattern in terms of a basic pattern is described in detail to illustrate the mechanics of the learning process. The program learned the pattern that either the 3rd letter is a vowel or the second letter is a C. The inputs to the program were:

    (object)     (name)  (sign)
1.  A1 B1 E1     C1      +
2.  G1 C1 A2     C1      +
3.  C2 E2 B2     C1      -
4.  B3 C3 A3     C1      +
5.  C1 D1 F1     C1      -
6.  D2 C3 B4     C1      +
7.  A4 G3 B5     C1      -

Again subscripts denote different instances of a pattern, e.g. B2 and B3 are different simple objects but are both capital B's. There are many different descriptions of C1 that explain these seven inputs. (The number of inputs was restricted for ease of exposition.) Below we describe how the program learned one description of C1.



The initial action is to use image elaboration on A1, producing the new terminal node (1st ∈ A), (1st ∈ vow), (1st ∈ cap), C1. Input #2 causes the first two tests to be marked X irrelevant while input #3 causes the third test to be marked V irrelevant. At this point the node is (1st ∈ A X), (1st ∈ vow X), (1st ∈ cap V), C1.

Since there are no relevant tests at the node when input #4 arrives, image elaboration on C3 causes two new tests to be added to the node: (2nd ∈ C) and (2nd ∈ cap). Inputs #5, #6, #7 are cases 4, 1, 4 in Table 1, respectively; thus, no modification takes place. Since the pattern has not been learned yet, the same inputs are presented to the program again.

Input #1 causes 2nd ∈ C to be marked X irrelevant. Input #2 causes no modification (case 1 in Table 1). Input #3 causes 2nd ∈ cap to be marked with a V. When input #4 arrives at the terminal all tests are irrelevant. Image elaboration on A3 causes the following tests to be added to the node: (3rd ∈ A), (3rd ∈ cap), (3rd ∈ vow).

Input #5 causes no modification (case 4 of Table 1). Input #6 marks 3rd ∈ A and 3rd ∈ vow with X's. Input #7 marks 3rd ∈ cap with a V. Again the same seven inputs are given to the program. At this point all tests are irrelevant and image elaboration on input #1 produces no new tests because all three letter positions have been used in image elaboration. Thus, the program knows that the pattern cannot be a single terminal node and attempts to grow the tree.

The test selected for the new non-terminal node is a test marked X irrelevant because one member of the pattern passes this test and one member of the pattern fails it. To further ensure the importance of the test, the program will only use a test after it sees a new member of the pattern that passes the test. Such a test is 3rd ∈ vow; it is marked with an X and input #1 satisfies it. Figure 5 shows the new subtree that replaces the old terminal. The left terminal node contains no tests; the right contains 2nd ∈ C because input #1 does not pass this test.

Fig. 5. The subtree of the memory tree that describes pattern C1.

The program has learned the pattern because all seven inputs are either case 1 or case 4 in Table 1. The description of the pattern formed by the program is the set of three-letter words whose second letter is a C or whose third letter is a vowel. This pattern was learned in four seconds.
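
The learned description can be checked against the seven training inputs (an illustrative Python sketch; letters are represented here by plain characters rather than noisy 15-bit simple objects, so the subscripts are dropped):

VOWELS = set("AEIOU")

def in_C1(word):
    # Learned description of C1: second letter is a C or third letter is a vowel.
    return word[1] == "C" or word[2] in VOWELS

# The seven training triples with the noise stripped away.
inputs = [("ABE", "+"), ("GCA", "+"), ("CEB", "-"), ("BCA", "+"),
          ("CDF", "-"), ("DCB", "+"), ("AGB", "-")]
for word, sign in inputs:
    assert ("+" if in_C1(word) else "-") == sign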

This example illustrates how the program attempts to represent patterns by single terminal nodes. When this is impossible the program attempts to use two terminal nodes. Of course, in this example the program was lucky because, by growing the tree, the program was successful. In general, image elaboration and tree modification will occur at each of the two new terminal nodes.


6. DISCUSSION

The learning of patterns in terms of other patterns is based upon the assumption that at least some of the previously learned patterns are useful in describing a new pattern. However, this implies that pattern learning tasks should be presented to the program in a logical order. For example, if the program were asked to learn words before learning letters, its performance would be poor.

In some sense the program is adaptive because the program improves its performance by increasing its repertoire of previously learned patterns. However, the human is very much a part of this adaptive process because he composes the training sequence, i.e. he decides which pattern learning tasks are given to the program, and in what order. This is a flexible way to incorporate human "insight" into a pattern learner, which seems to be necessary at the present state of the art. The advantage of this scheme is that a human can give the program information about new classes of patterns merely by composing new training techniques.

The performance of the program is good when the pattern to be learned can be succinctly described in terms of the previously learned patterns. In other words, the program is relatively efficient when relatively few tests are needed to describe the patterns to be learned. In terms of the program, the number of tests in the description of a pattern is the sum of the number of tests at each node that contains the name of the pattern. There are other factors that affect the efficiency, such as the number of previously learned patterns and the extent to which patterns "overlap". (If an object is in two different patterns we say the two patterns overlap.) However, the main factor that determines the efficiency of the program is how succinctly a pattern can be described. This underscores the importance of the training sequence because certain patterns may be very useful for describing other patterns succinctly.

It is somewhat surprising that the efficiency of the program is not proportional to the number of previously learned patterns, since the program attempts to describe new patterns in terms of these. The reason is that the program only considers those previously learned patterns that are "relevant" to describing the new pattern, and the relevant ones are a small portion of the total. In addition, the time needed to find the relevant previously learned patterns is not proportional to the total number of previously learned patterns.

The program considers any pattern that contains a subpart of an object to be relevant to describing a pattern containing the object. For example, in describing a pattern containing bed, B, small and roman would be relevant because b is in all these patterns. Thus, the process of finding all previously learned patterns containing an object (or subobject) is a fundamental process of our learning algorithm. This process will be called FPLP (Finding Previously Learned Patterns). Since the previously learned patterns are stored in memory, the efficiency of FPLP depends upon the memory organization, i.e. the way patterns are stored in memory. The most interesting aspect of our pattern learning program is its memory organization because the FPLP process is relatively efficient.

To see that this claim is justifiable, first consider the alternative memory organization consisting of a list of pairs. The first element of a pair is the name of a pattern and the second element is its description (in some language). Most pattern recognition schemes seem to use this memory organization.* Its disadvantage is that the FPLP process is grossly inefficient.

* Most papers on pattern recognition do not discuss memory organization because they do not consider the possibility of having a large number of patterns stored in memory. From their description, e.g. Banerji(1) and Evans,(2) we infer that this memory organization is used.


The only way to find the patterns that contain a given object is, for each pair in memory, to apply the description (the second element of the pair) to the object. The point is that the process must apply every description in memory to the given object and, thus, the time required to execute this process is proportional to the number of previously learned patterns. If the process finds a description that the object satisfies it cannot terminate because the object may be in several different patterns, i.e. satisfy several different descriptions. To make matters worse, some seemingly simple patterns, such as the set of all A's, have quite complex descriptions because an A may be lower case or upper case, roman, script, etc.

Given an object and the memory organization described in Section 3, all patterns containing the given object can be found much more efficiently. All the tests on the path to a terminal node can be answered except the name tests (which cannot be answered because the name is missing). Thus, for each name test both branches must be explored and the object goes to several terminal nodes. The names at these terminal nodes are the only patterns that contain the given object. (Note the object also must pass the tests at a terminal node in order to be in the pattern.) The point is that this process requires a search of a very small part of memory instead of an exhaustive search of memory. The gain in efficiency is, of course, a function of the number of patterns stored in memory.

The next question is how important is FPLP, or how often does FPLP get executed? In the case of a pattern of three-letter words, this process is executed once for each letter in the word, if the pattern can be represented by a single terminal node (can be expressed as the conjunction of tests). The more terminal nodes required to represent the pattern, the more often the process must be executed. If the number of patterns in memory is not too large, several exhaustive searches of memory for the learning of a new pattern might be feasible. In fact, this might result in an efficient program because a simpler memory organization probably would cause other parts of the pattern learner to be more efficient.

However, FPLP seems too central to pattern recognition to be dismissed so easily. There are other classes of patterns that use FPLP not only in learning a description of a pattern but also in applying a description to an object. One such pattern is the set of three-letter words in which the first and third subobjects are the same, e.g. ABA, DAD, BZB, etc. A member of this pattern cannot be recognized merely by testing if the first and third subobjects are identical. Even the assumption that the preprocessors remove all noise (which seems unreasonable) would not be sufficient to allow testing identity of the first and third subobjects. For consider the pattern whose first letter is a capital roman letter and whose third letter is a small roman letter and whose first and third letters are the same, e.g. ABa, DAd, BZb. Certainly we cannot assume that A = a, D = d, B = b.

What is meant by "first and third subobjects are the same" is that they both are in a common pattern. But alas, the name of the pattern is not known. One way to evaluate the test is:

1. apply FPLP to the first subobject
2. for each name found by step 1, ask if the third subobject is in the pattern.

The original test is true if one of the questions in step 2 is true. The point is that often names are not available and, therefore, the FPLP process is of central importance and must be efficient.
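
A sketch of this two-step evaluation (illustrative Python; find_patterns_containing and is_in_pattern stand for the FPLP and retrieval procedures sketched earlier):

def same_first_and_third(memory, obj, find_patterns_containing, is_in_pattern):
    # Step 1: FPLP on the first subobject; step 2: ask whether the third
    # subobject is in any of the patterns found.
    first, third = obj[0], obj[2]
    for name in find_patterns_containing(memory, first):
        if is_in_pattern(memory, third, name):
            return True
    return False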

In the above example it may be required that the first and the third are the same letter, i.e. (1st ∈ X) and (3rd ∈ X) and (X ∈ letter). Thus, if we know which patterns are letters, we only have to worry about 26 patterns, instead of all patterns in memory. But a linear search through a list of 26 things seems too inefficient for practical purposes. One way to efficiently process equality tests, such as first and third are the same letter, is discussed by Sherman.(5)

However, it has not been incorporated into the program.

Equality is just one relation, although a very important one. Eventually, our pattern recognizer must be able to deal with more complex relations, such as larger than, next in sequence, etc. Since relations are sets they can be learned in the same way as other patterns. For example, to learn the relation next, the inputs to the program would be (A B) Next +, (B C) Next +, (A C) Next -, etc. To use "next" in describing a new pattern requires that we extend our language for tests. However, the technique used by Banerji(1) for dealing with relations seems to be directly applicable.

In this paper, most of our examples were patterns of words, some of which may seem unreal. However, we believe that these examples are not pathological but rather typical examples. Even the problems on simple aptitude tests require the recognition of patterns involving equality and other relations, such as "this shape is the same as that shape" or "this letter follows that letter in the alphabet" or "this number is larger than that number". And certainly, real life patterns, such as patterns involved in biomedical diagnosis, are more complex than aptitude tests.

We maintain that in learning and recognizing patterns it is often the case that the patterns containing a given object must be determined. Thus, the FPLP process must be efficient and its efficiency is determined largely by the memory organization. The program described in this paper can learn patterns in terms of previously learned patterns, and the FPLP process required for learning such patterns only searches a small portion of memory.

Acknowledgements—The authors are grateful to R. B. Banerji for his helpful suggestions throughout this work. This research was sponsored jointly by the USA Air Force Office of Scientific Research under Grant #AF-AFOSR-125-67 and by the National Science Foundation under Grant #GK-1386. In addition, R. Sherman was supported by an NDEA fellowship.

REFERENCES

1. R. B. Banerji, Pattern Recognition 1, 63 (1968).
2. T. G. Evans, Proc. of IFIP (1968).
3. E. A. Feigenbaum and H. A. Simon, Proc. of IFIP (1962).
4. R. S. Ledley, J. Jacobson and M. Belson, BUGSYS: A programming system for picture processing not for debugging, Comm. ACM, Feb. 1966.
5. R. Sherman, A model of concept learning, Systems Research Center, Report No. SR6-68-8, Case Western Reserve University.

