12
PERCEPTION, PICTURE PROCESSING AND COMPUTERS
DR M. B. CLOWES*
M.R.C. PSYCHO-LINGUISTICS RESEARCH UNIT
UNIVERSITY OF OXFORD
INTRODUCTION
The ability to interpret and respond to the significant content of pictures is shared by a large proportion of the animal kingdom and a small but growing number of machines. The machines (usually digital computers) will classify simple shapes and printed letters and digits represented (by means of a suitable television scanner) as a matrix of 1s and 0s. Animals, and specifically man, will do this and more; that is, not only will a human subject correctly recognise letters written badly but also comment, 'That's a funny way to write a two', or 'It's a good portrait, but the eyes are a little too far apart'. No machine can as yet approach this sort of performance. We may well ask, therefore, what if anything we can learn from a study of 'machine perception' which could conceivably help us to understand human perception. The answer lies in the fact that any realistic account of the mechanism underlying perceptual or any other human skills will be complex. The virtue of a computer lies in its ability to capture in a definite form processes of indefinite complexity and subtlety, and moreover, to permit an evaluation of the efficacy of the proposed description by trying it out on the actual task.
This paper will be concerned with a description of a new method of interpreting pictures, now under development. The method arises directly out of earlier attempts (Clowes & Parks 1961, Clowes 1962) to develop letter recognition machines for commercial use. However, it now appears that the approach is capable of a wider interpretation in terms both of psychological observations and recent physiological evidence.
* Present address: Computing Research Section, C.S.I.R.O., Canberra, Australia.
PATTERN RECOGNITION
HIERARCHIES OF DESCRIPTION
Psychologists concerned with analysing complex behaviour often have to select an appropriate level of description. For example, driving a motor car can be characterised in terms such as 'overtaking', 'giving way', 'filtering', or at a more detailed level in terms of 'changing from 2nd to 3rd gear', 'turning the wheel anti-clockwise', 'depressing the clutch pedal', or yet more detailed still in terms of the positions of the limbs as a function of time. Choosing the correct level of description might be vital to the assessment of the relative skills of two drivers, or the design of two different types of car. On the other hand perhaps all these levels of description should be employed simultaneously; if so, how are they related? This question forms the basis of a book, Plans and the structure of behaviour (Miller, Galanter & Pribram 1960), in which it is shown that the problem arises not only for manual skills, but also in many cognitive processes and especially in language. It is in this latter case that perhaps the clearest expression of a hierarchy of description is evident (phonemes, syllables, words, phrases, clauses, sentences), and it is also the area in which the most progress has been made in answering these questions.
LINGUISTIC GRAMMARS
The problem is to formulate a reasonably precise set of rules by means of which we can take an utterance, e.g., 'the boy kicked the red ball' and, by applying the rules in the prescribed manner, can obtain a description of this utterance at a number of different levels.

    (1)  S1  = NP + VP1
         S2  = NP + VP2
    (2)  VP1 = V + NP
    (3)  VP2 = V + ANP
    (4)  NP  = T + N
         ANP = T + N1
    (5)  N1  = A + N
    (6)  V   = kicked/rolled/pushed/ . . .
         T   = the/a/every/ . . .
         N   = man/boy/ball/ . . .
         A   = red/young/striped/ . . .

FIG. 1. A set of ordered substitution rules specifying the grammar of a simple sentence.

Let us confine our attention to three levels: words, phrases, sentences. Fig. 1 shows a set of rules sub-divided into numbered groups (1, 2, 3, . . . , 6). We can regard each rule as defining a permissible substitution. Thus group 6 rules state that we may replace any of the words red, young, striped (i.e., any adjective) by the symbol A, and similarly for verbs, nouns and articles. So that the sentence
the boy kicked the red ball
becomes by application of group 6
(6) T N V T A N
Group 5 states that a pair of symbols A, N appearing contiguously and in that order can be replaced by N1 so that the string becomes
(5) T N V T N1
and so by application of successive rules, in descending order of their group number, we obtain
(4) NP V ANP
(3) NP VP2
(2) NP VP2
(1) S2
If we now combine these different strings of symbols into one diagram (Fig. 2) we have a description of the utterance at a number of levels in a fashion which indicates the relation between the levels.

    S2
      NP
        T   the
        N   boy
      VP2
        V   kicked
        ANP
          T   the
          N1
            A   red
            N   ball

FIG. 2. The structural analysis of a simple sentence as obtained by applying the rules of FIG. 1.
The rules may be used to generate or analyse sentences. A typical generative sequence starting from the schema S2 would be S2 → NP VP2 → T N VP2 → the N VP2 → the boy VP2 → the boy V ANP → . . . → the boy kicked the red ball. The generative path can be diagrammed in tree form as in Fig. 2. This diagram, a so-called tree, indicates not only that the utterance is a sentence, but also what type of sentence it is (S2 rather than S1), as well as indicating what features (i.e., phrases like ANP, etc.) it had which caused the label S2 to be applicable. Let me hasten to add at this point that this is not presented as a serious piece of linguistics (there are grounds for believing it to be incorrect in several ways) but merely as an introduction to the notion of a set of rules (the grammar) and a procedure for applying them (a parsing algorithm) by which we can rigorously derive a structural description of a complex object like a sentence. A mechanism or mechanisms of this type undoubtedly underlie our ability to understand sentences.
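
The analysis just described can be sketched in a few lines of Python. This is an illustration only, not the procedure used by any program discussed in this paper: the rule table follows Fig. 1 as read here (in particular the subscripts S1/S2 and VP1/VP2), and the names `LEXICON`, `PHRASE_RULES`, `substitute` and `parse` are introduced purely for the example.

```python
# Group 6 of Fig. 1: each category symbol with the words it may replace.
LEXICON = {
    "V": {"kicked", "rolled", "pushed"},
    "T": {"the", "a", "every"},
    "N": {"man", "boy", "ball"},
    "A": {"red", "young", "striped"},
}

# Groups 5 to 1, applied in descending order of group number.
# Each rule is (right-hand side, symbol that replaces it).
PHRASE_RULES = [
    (5, [(("A", "N"), "N1")]),
    (4, [(("T", "N"), "NP"), (("T", "N1"), "ANP")]),
    (3, [(("V", "ANP"), "VP2")]),
    (2, [(("V", "NP"), "VP1")]),
    (1, [(("NP", "VP1"), "S1"), (("NP", "VP2"), "S2")]),
]


def substitute(symbols, rhs, lhs):
    """Replace every contiguous occurrence of rhs in symbols by lhs."""
    out, i = [], 0
    while i < len(symbols):
        if tuple(symbols[i:i + len(rhs)]) == rhs:
            out.append(lhs)
            i += len(rhs)
        else:
            out.append(symbols[i])
            i += 1
    return out


def categorise(word):
    """Apply a group 6 rule: return the category symbol for a word."""
    for symbol, words in LEXICON.items():
        if word in words:
            return symbol
    raise ValueError(f"word not in lexicon: {word!r}")


def parse(sentence):
    """Return the string of symbols obtained after each group of rules."""
    symbols = [categorise(word) for word in sentence.lower().split()]
    levels = [(6, symbols)]
    for group, rules in PHRASE_RULES:
        for rhs, lhs in rules:
            symbols = substitute(symbols, rhs, lhs)
        levels.append((group, symbols))
    return levels


for group, symbols in parse("the boy kicked the red ball"):
    print(f"({group})", " ".join(symbols))
```

Because every rule replaces two adjacent symbols by one, the string can only shrink, which is why a single label can eventually stand for the whole utterance.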
PICTURE PROCESSING
Comparable mechanisms for processing pictures have not appeared to any significant extent either in Artificial Intelligence or in Psychology. Instead the emphasis has been on classifying pictures, i.e., upon the equivalent of deciding the type (S1 or S2) of a sentence. The possibility that a structural description of a pictorial object is necessary has only recently emerged in Artificial Intelligence (Kirsch 1964). It appears to be crucial to the automation of many picture processing tasks (e.g., interpreting bubble-chamber photographs, recognising fingerprints). That perceptual behaviour involves description as well as classification could be supported by innumerable examples beyond the two already quoted in the introduction to this paper. However, it seems to have received scant attention in recent psychological literature. This may well be because overtly descriptive (rather than classificatory) behaviour has a large verbal element. There is a strong temptation to avoid this 'complication' by designing experiments which merely require simple YES/NO responses and which have, therefore, no significant verbal component.
It follows from the foregoing remarks that our own objective is the development of a formal (i.e., rigorous) theory of picture processing which aims to provide a structural description of pictorial objects as well as a classification of them (wherever appropriate). The particular system I shall describe is by no means the only possible approach; both Ledley (1964) and Narasimhan (1964) have proposed alternatives. I shall illustrate the technique in terms of a limited class of pictures: the handwritten numerals. There are several reasons for choosing this class: (1) they are (hopefully) relatively simple objects; (2) at least two levels of description naturally occur, e.g., the symbol 7 is readily described as 'a horizontal stroke above a diagonal stroke . . .'; (3) psycholinguistic skills, e.g., reading (with which the Unit† is primarily concerned), involve pictorial objects of this type.
† MRC Psycholinguistics Research Unit, University of Oxford.
A PICTURE GRAMMAR FOR HANDWRITTEN NUMERALS: NOTATION
The substitution rules in the linguistic grammar (Fig. 1) contain on their right-hand side (typically) two elements, e.g., T, N or NP, VP, linked by a relational operator '+' meaning 'followed by'. This operator expresses the notion of contiguity as it arises in a string of symbols; contiguity in a geometrical sense is the more general relationship 'next to'. Thus a simple pictorial property like 'Edge' (i.e., boundary between black and white regions of the picture) might be defined as 'black area next to white area'. Specifically we might define a North Edge as 'white area above black area', which can be expressed in a rigorous (computable) form with the aid of a suitable notation. For example, in a pictorial notation the definition of North Edge for a picture composed only of black or white elements might take the form illustrated in Fig. 3.
FIG. 3.
This asserts that the point arrowed will be said to have the property N EDGE if it and the points to the left and the right of it are black (i.e., True) and the three points denoted '0' are white (i.e., False). If we introduce a Cartesian co-ordinate system we can write this definition as a Boolean expression:
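$$
\begin{aligned}
\text{N EDGE}(x, y) ={}& \text{PICTURE}(x-1, y) \;\&\; \text{PICTURE}(x, y) \;\&\; \text{PICTURE}(x+1, y) \\
&\&\; \lnot\,\text{PICTURE}(x-1, y+2) \;\&\; \lnot\,\text{PICTURE}(x, y+2) \;\&\; \lnot\,\text{PICTURE}(x+1, y+2)
\end{aligned}
\tag{1}
$$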
where (x, y) is the 'arrowed' point, a black point has the value True and a white point the value False.
The assignment of a label (e.g., N EDGE) to a picture point involves a decision based upon the state of a number of neighbouring points. Fig. 3 is a picture of the decision criterion. The values of only 6 specified points are considered. All other points are ignored, for the purpose of assigning this label to the arrowed point.
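
As a minimal sketch of definition (1), the decision can be written as a Python function over a picture held as the set of its black points. The representation and the names `n_edge` and `n_edge_picture` are assumptions made for illustration. The three points at y + 2 are tested for white, following the verbal definition above, and points outside the picture are taken to be white.

```python
def n_edge(black, x, y):
    """True if the point (x, y) has the property N EDGE of definition (1).

    black is the set of (x, y) points that are black (True); every other
    point, including any point outside the picture, counts as white (False).
    """
    def picture(i, j):
        return (i, j) in black

    return (
        picture(x - 1, y) and picture(x, y) and picture(x + 1, y)
        and not picture(x - 1, y + 2)
        and not picture(x, y + 2)
        and not picture(x + 1, y + 2)
    )


def n_edge_picture(black):
    """Apply the rule at every point, giving the derived picture N EDGE.

    Only black points can carry the label, so it is enough to test those.
    """
    return {(x, y) for (x, y) in black if n_edge(black, x, y)}


# A horizontal bar five points wide and three points high (y = 0, 1, 2).
bar = {(x, y) for x in range(5) for y in range(3)}
print(sorted(n_edge_picture(bar)))
```

Only six points are examined for each label, so the two end columns of the bar (which lack a black neighbour on one side) and its bottom row (which has black two rows above) are not labelled.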
IMPLEMENTATION
This notation is used in the picture analysis program with some small differences for reasons of economy and ease of programming. Fig. 4 shows a system of such definitions as they are input to the computer. The first two lines specify various program options relating to print out, etc., and also define the size (48 x 48) of the input picture. The third line is the definition for N EDGE given above (1), but specified in Reverse Polish (i.e., parenthesis free) form.† Repetitions of the label PICTURE have been suppressed and the label N EDGE appears at the end of the definition rather than preceding it. The x, y appearing on both sides of the expression (1) above have also been omitted, it being understood that this rule is applied to the picture (Fig. 5(a)) for all values of (x, y) so as to derive another picture called (Z) N EDGE (Fig. 5(b)). The axes have been rotated so that x is measured down the
† The Reverse Polish form of the Boolean expression (A & B) v (C & D) is A, B, &, C, D, &, v.
FIG. 4. A partial 'picture grammar' for handwritten numerals. Each group of rules is preceded by the control instruction + CODE . . . [Program listing not reproduced.]
FIG. 5(a). The picture to be analysed. The origin of coordinates is the cell immediately beneath the Z of Z PICTURE. The picture is defined to be only 48 x 48 elements, hence in the x direction (down the page) we can only print lines 0 to 47. Each 'black' element is represented internally as a binary digit. For display purposes, however, the 'black' elements are numbered (on a scale of 8) in the horizontal direction. [Printout not reproduced.]
FIG. 5(b). The outcome of the first group of rules when applied to Fig. 5(a). Note that the 'origin cell' of the array ZN EDGE is at the absolute location x = 48, y = 48 as stated in the picture grammar. [Printout not reproduced.]
FIG. 5(c). The outcome of the second group of rules applied to the contents of Fig. 5(b). [Printout not reproduced.]
FIG. 5(d). The outcome of the third group of rules applied to the contents of Fig. 5(c). [Printout not reproduced.]
FIG. 5(e). 'Recognition' as obtained by applying the fourth group of rules to the contents of Fig. 5(d). Note that the array size is now 3 x 3 elements. In principle more than one figure could have been presented in the initial Z PICTURE. Their relative position and identity would have been reflected in this display. [Printout not reproduced.]
page and y across it, following the numbering appearing in the picture frame. Following the label (Z) N EDGE a further pair of coordinates (048, 48) specifies the location in computer memory at which the resultant picture (Z) N EDGE is to be stored. The first group of definitions in the grammar expresses eight EDGE definitions corresponding to eight points of the compass, and a further definition (Z) BLOB which merely collects together the results of all the EDGE definitions in a single picture called (Z) BLOB. The next set of rules in the grammar contains concatenations of EDGES to define LINES having four orientations 00, 45, 90, 135 and line ENDS with eight orientations. These in turn are concatenated to form CURVES (combinations of appropriately oriented lines), LIMBS (combinations of lines and ends) and LINES (combinations of lines). Each set of rules in the picture grammar is separated by a line beginning + CODE. . . .
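
The Reverse Polish form in which the definitions are written lends itself to evaluation with a stack. The following is a hedged sketch of that idea, not the program's own routine: the function name `eval_reverse_polish` and the token spellings it accepts ('&' or AND, 'v' or OR, and NOT) are assumptions. It evaluates the expression (A & B) v (C & D), whose Reverse Polish form is A, B, &, C, D, &, v.

```python
def eval_reverse_polish(tokens, values):
    """Evaluate a Boolean expression given in Reverse Polish form.

    tokens is a sequence of operand names and operators; values maps each
    operand name to True or False.
    """
    stack = []
    for token in tokens:
        if token in ("&", "AND"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left and right)
        elif token in ("v", "OR"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left or right)
        elif token == "NOT":
            stack.append(not stack.pop())
        else:
            # Anything that is not an operator is looked up as an operand.
            stack.append(values[token])
    if len(stack) != 1:
        raise ValueError("malformed Reverse Polish expression")
    return stack[0]


tokens = ["A", "B", "&", "C", "D", "&", "v"]
print(eval_reverse_polish(tokens, {"A": True, "B": False, "C": True, "D": True}))
```

No parentheses are needed because each operator acts on the results most recently placed on the stack.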
The action of the program is to apply each set of rules to the outcome of the previous set starting with a picture stored in computer memory (cf. the linguistic grammar). The performance of the program is monitored by having it print out those portions of its memory which currently contain a picture after each set of rules has been applied. The area of memory is, in effect, a matrix of 192 x 96 elements. At any one stage only a portion of this is in use, the portions being allocated by the coordinates associated with the picture name given in the grammar, i.e., (Z) N EDGE (048, 48) is stored starting from location 48 (down), 48 (across). Thus the output for the first set of rules is as shown in Fig. 5(b).
The size of the initial picture (Z) PICTURE is 48 x 48 cells; the EDGE versions are, however, reduced in size to occupy only 24 x 24 cells. The necessity for this reduction in size is intimately connected with the form of picture grammar we have chosen to implement. A feature of the linguistic grammar was that substitutions were always defined for pairs of adjacent symbols; nowhere do we find a rule which calls for the replacement of symbols widely separated in the string being analysed. In spite of this immediate constituent or phrase structure restriction it is still possible to reflect with a single label (S) the structure of an extended utterance, because as we proceed with the analysis the string automatically shrinks by the very nature of the substitution operation. In the geometrical analysis we perform on a picture, the same immediate constituent limitation is imposed. If, however, we are to be able to label an extended property of a picture, e.g., a long line, with 'next to' concatenations, we need to draw together these local properties; hence this reduction in the size of the EDGE versions of the picture.
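
The text does not spell out how a 48 x 48 picture becomes a 24 x 24 one. Purely as an illustration of how local properties might be 'drawn together', the sketch below marks a cell of the reduced picture whenever the property occurs anywhere in the corresponding 2 x 2 block of the original. The block-wise OR and the name `reduce_picture` are assumptions, not the documented behaviour of the program.

```python
def reduce_picture(picture, block=2):
    """Shrink a binary picture held as a list of rows.

    Each cell of the result is True if any cell in the corresponding
    block x block region of the input is True (an assumed reduction rule).
    """
    rows, cols = len(picture), len(picture[0])
    return [
        [
            any(
                picture[r][c]
                for r in range(top, min(top + block, rows))
                for c in range(left, min(left + block, cols))
            )
            for left in range(0, cols, block)
        ]
        for top in range(0, rows, block)
    ]


# A 48 x 48 picture containing a single labelled point.
picture = [[False] * 48 for _ in range(48)]
picture[10][31] = True

small = reduce_picture(picture)
print(len(small), len(small[0]), small[5][15])
```

Under this assumption two labels that were up to three cells apart in the original can become adjacent in the reduced picture, so a rule restricted to 'next to' relations can join them.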
We can summarise the outcome of this 'parsing' process in a different form by collecting together the contents of several pictures, e.g., ENDS and LINES (Fig. 5(c)), and combining them into a single picture with a suitable pictorial convention to indicate the type of picture from which the entry originates (Fig. 6(b)). The results of the next level of description (Fig. 5(d)) containing CURVES, LIMBS and LINES can similarly be displayed pictorially (Fig. 6(c)). They show that the program has adequately labelled the main features of this '2' and for the purposes of recognition we can define each numeral as being some combination of features. For a '2' this definition will involve a curve, a diagonal and a horizontal; and the last group of rules (Fig. 4) contains such a definition. Comparable definitions for the other numerals can be written.
Definitions of this type do not, strictly speaking, constitute a part of the grammar, and it is with the adequacy of the grammar that we are, at this point, concerned. Fig. 6(c) shows that certain regions of the '2', e.g., the intersection of the diagonal and the horizontal, are unlabelled. This arises from the adoption of 'lines' as syntactic categories: we are now studying a grammar having the same form but concerned exclusively with EDGES.
FIG. 6. A pictorial 'summary' of the results of the analysis. (a) The input picture Z PICTURE. (b) The contents of Fig. 5(c) superimposed and coded according to the convention given in the KEY. (c) The contents of Fig. 5(d) similarly encoded. [Figure and KEY not reproduced.]
RELATION TO PHYSIOLOGY
It is possible to compare the organisation of this system with the organisation of the visual system, as revealed by microelectrode studies in the cat (Hubel & Wiesel 1962, 1965) and the frog (Lettvin, Maturana, McCulloch & Pitts 1959). Briefly the following points emerge:
(1) Cells in the visual cortex only respond to local properties of the visual scene, e.g., edges, line segments. This mirrors the immediate constituent constraint imposed for economic reasons in the picture grammar.
(2) The visual scene is portrayed many times over in the cortex in terms of the different picture properties, each portrayal being 'laid-out' two-dimensionally and 'in register' with all other portrayals. This corresponds to the sets of arrays which are also preserved 'in register' in the computer.
(3) In addition to cells portraying the local properties of corresponding regions of the retinal image, there are also cells (so-called 'complex' cells; Hubel & Wiesel 1962) which indicate whether a particular local property occurs anywhere over a small area of the retinal image. Such cells are precisely what one would expect to find mediating the 'reduction' of scale (and resolution) injected into the picture grammar.
The detailed nature of the correspondence means that we can usefully consider the possible significance of discrepancies. For example, there are grounds for believing that the picture grammar is premature in beginning with EDGE definitions. In the visual system a considerable amount of processing (so-called 'lateral inhibition') precedes EDGE operations and is performed in the retina. Conversely we may ask of the physiologists what might correspond to certain features of our picture grammar which are designed to provide size invariance. This type of detailed cross-fertilisation is in many ways quite novel, at least with this subject matter.
The notation can also be used to express physiological results, e.g., the properties of a receptive field unit.
RELATION TO PSYCHOLOGY
Ultimately, the psychological relevance of this form of picture analysis will be in its ability to derive structural descriptions of pictorial objects which reflect our intuitions of their shape. To a limited extent the labelling already demonstrated achieves this: it will be more convincing, however, if we can, for example, show that the same grammar assigns adequate descriptions to objects which are generically different, e.g., fingerprints, chromosomes. It can already be shown, however, that by virtue of the way in which it computes picture labels, the system has some interesting weaknesses. Given a numerical value for the notion of local property, e.g., that no definition can refer to array elements separated by more than 10 units, the system is unable to judge the positions of an object relative to some frame to better than 10 per cent (i.e., 1/10). This limitation appears in many forms of sensory judgment (Miller 1956). Again the system is unable to 'pick out' (i.e., uniquely label) one member of a large matrix of similar objects, e.g., a matrix of A's. This is a familiar experience. We overcome the limitation by 'tracing' or counting along a line with finger or pencil point. Significantly this use of a pointer changes the structure of the picture in a way which would also greatly help the computer system.
CONCLUSION
We can now return to a general remark made in the introduction. The essential characteristic of the system we have described is the volume of operations it involves. To calculate by hand the structural analysis illustrated earlier would occupy perhaps 2-3 hours for each numeral and would inevitably contain many mistakes. The validation of a picture grammar in terms of a whole alphabet and subsequently for other classes of picture would, therefore, be unthinkable except by computer. In that we already sense directions in which our approach requires further complications due to over-simplifications like assuming pictures to be only black or white, the claim that psychological theories are necessarily complex, and require a computer to test them, will need no further elaboration. In one way it is unfortunate that Artificial Intelligence has been so called, since it suggests that it is concerned with problems quite different from those traditionally studied by the biologist. In fact, however, both disciplines are concerned with the study of very complex systems and collaboration is becoming increasingly fruitful.
ACKNOWLEDGEMENTS
I would like to acknowledge with gratitude the use of the computing facilities at the Culham Fusion Research Laboratory, U.K.A.E.A., the assistance of Douglas Brand and Barry Astbury, and valuable discussions with John Marshall, Professor N. S. Sutherland and Professor R. C. Oldfield.
REFERENCES
Clowes, M. B. (1962). The use of multiple auto-correlation in character recognition. Optical Character Recognition. Fischer, Pollack, Radack & Stevens, Eds. Pp. 305-318. Baltimore: Spartan Books, Inc.
Clowes, M. B., & Parks, J. R. (1961). A new technique in character recognition. Comput. J., 4, 121-126.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol., 160, 106-154.
Hubel, D. H., & Wiesel, T. N. (1965). Receptive fields and functional architecture in two non-striate visual areas (18 & 19) of the cat. J. Neurophysiol., 28, 229-289.
Kirsch, R. A. (1964). Computer interpretation of English text and picture patterns. Instn. elect. Engrs. Trans. on Electronic Computers, EC-13, 363-376.
Ledley, R. S. (1964). High speed automatic analysis of biomedical pictures. Science, 146, 216-223.
Lettvin, J. Y., Maturana, H. R., McCulloch, W. S., & Pitts, W. H. (1959). What the frog's eye tells the frog's brain. Proc. Inst. Radio Engrs., 47, 1940-1951.
Miller, G. A. (1956). The magical number seven. Psych. Rev., 63, 81.
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the Structure of Behavior. New York: Henry Holt and Company, Inc.
Narasimhan, R. (1964). Labelling schemata and syntactic descriptions of pictures. Inf. Control, 7, 151-179.