
Optical iconic filters for large class recognition

David Casasent and Abhijit Mahalanobis

Approaches are advanced for pattern recognition when a large number of classes must be identified. Multilevel encoded multiple-iconic filters are considered for this problem. Hierarchical arrangements of iconic filters and/or preprocessing stages are described. A theoretical basis for the sidelobe level and noise effects of filters designed for large class problems is advanced. Experimental data are provided for an optical character recognition case study.

1. Introduction

Advanced artificial intelligence, symbolic, and other processors required to operate on large knowledge bases1,2 need techniques to handle a large number of object classes. We consider pattern recognition applications when the number of object classes to be identified is large. Our approach can be applied to logic processors (in which the input is a query) and to symbolic and associative3 processors. However, pattern recognition offers a more easily defined problem, and thus we pursue this specific application. We employ an optical character recognition (OCR) case study example to quantify and demonstrate remarks and results, since such a data base is easily available. Much recent pattern recognition research has addressed algorithms to achieve distortion invariance, i.e., recognition of geometrically distorted versions of an object.4-6

In this paper we consider large class problems in which the number of different objects is large. Incorporation of distortion-invariant techniques into the filters we discuss can further broaden their use. Since the filters we discuss operate on input image pixel representations, we refer to them as iconic filters.7

Section II describes our OCR data base, and Sec. III reviews several basic iconic filter synthesis algorithms. In Sec. IV we advance a theoretical analysis of the effect of the number of training images and object classes on the output sidelobe level and the noise sensitivity of iconic filters. Section V describes several

The authors are with Carnegie Mellon University, Department of Electrical & Computer Engineering, Pittsburgh, Pennsylvania 15213.

Received 10 October 1986.

0003-6935/87/112266-08$02.00/0.

© 1987 Optical Society of America.

systems to achieve large class recognition without the iconic filter problems associated with large training sets of data. Experimental data are then provided (Sec. VI) to quantify and demonstrate all major points advanced.

II. Data Base and Case Study

As an easily obtainable data base we selected recognition of the 62 characters (26 lower-case and 26 upper-case letters, plus the 10 number digits) in a variety of fonts. We obtain 80 × 80 pixel images of the 62 characters from 15 different magazines: Time, Scientific American (Scienam), Datamation (Datama), Business Week (Busweek), etc. We will refer to the fifteen versions of each character as fonts (although they represent different point sizes of each character as well). In our experiments, we will view these as in-class variations. Font identification can be achieved by other methods.8 Our filters are thus designed to be able to provide the recognition of each character independent of the input font, but without the requirement to identify the input data font. This choice also allows us test data that are not present in the training set used to synthesize the filters. Figure 1 shows several characters from three of the magazines to demonstrate the similarity and differences in the fonts present in our data base.

III. Iconic Filter Synthesis

The basic filters considered are extensions of one type9,10 of distortion-invariant matched spatial filters with attention to our present application. For completeness we review three types of these filters and three classes of filters possible. This section also allows the terminology to be defined.

We denote objects in one class by {fn} and objects in a second class by {gn}. The members within each class are generally different 3-D geometrically distorted versions (e.g., aspect views) of each object. In our

2266 APPLIED OPTICS / Vol. 26, No. 11 / 1 June 1987


Fig. 1. Typical characters from three different publications: (a) The New York Times; (b) Datamation; and (c) Scientific American.

OCR application the members within each class will be different font representations of each input character/object. We denote vector versions (e.g., lexicographically ordered images) of the objects by fn and gn and the filters designed by hk (all are 2-D images, or vectors). When fn and gn are similar (such a filter to recognize one class must also have information on the other class), we specify a filter h so that

(fn * h) = 1,   (gn * h) = 0,   (1)

for all n, where ( ) denotes the vector inner product operation fTh. We restrict all filters to be linear combinations of all training set images

h(x,y) = Σ_{n=1}^{N1} an fn(x,y) + Σ_{n=N1+1}^{N1+N2} an gn(x,y).   (2)

For N1 images in {fn} and N2 images in {gn}, the N1 + N2

coefficients an define the filter function. The coefficient vector a and hence the filter function h are the solution of V a = u, where V is the vector inner product matrix of the data set, and u = u1 = [1 ... 1, 0 ... 0]T is

set by Eq. (1) to yield 1 outputs for all N1 images in class one and 0 outputs for all N2 images in class two. The filter is thus specified by

a = V⁻¹ u1.   (3)

To recognize {gn} and reject {fn}, the control vector u1 in Eq. (3) is simply changed to [0 ... 0, 1 ... 1]T, and a new set of weights a is determined.

A multilevel filter with outputs equal to one for class one objects and two for class two objects can easily be fabricated using the control vector u1 = [1 ... 1, 2 ... 2, 3 ... 3]T in Eq. (3). As shown, extensions of this filter to more than two classes are possible. Binary-encoded multiple filters can also be employed. In this case the outputs from the filters define a digital word (e.g., 10, 01, 11, for the case of F = 2 filters) that denotes the object class (e.g., if the outputs from the two filters are both 1, the code word is 11 and the input test object is in class three). Synthesis of these filters uses the same basic technique in Eq. (3) with different u control vectors.
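The synthesis of Eqs. (1)-(3) can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code; the random 8 × 8 binary "images" stand in for the character data:

```python
import numpy as np

def synthesize_filter(images, u):
    """Linear-combination filter (Eq. 2) whose projections equal u (Eq. 1).

    images: (N, M) array, one flattened training image per row.
    u:      (N,) control vector of desired projection values.
    Solves V a = u (Eq. 3), where V is the vector inner product matrix.
    """
    X = np.asarray(images, dtype=float)
    V = X @ X.T                    # V[n, m] = x_n . x_m
    a = np.linalg.solve(V, u)      # a = V^-1 u
    return X.T @ a                 # h = sum_n a_n x_n

# Toy data: two "classes" of random 8x8 binary images (stand-ins for characters).
rng = np.random.default_rng(0)
f = (rng.random((3, 64)) > 0.5).astype(float)    # class one, N1 = 3
g = (rng.random((3, 64)) > 0.5).astype(float)    # class two, N2 = 3
X = np.vstack([f, g])

# Two-class filter: projection 1 on class one, 0 on class two.
h = synthesize_filter(X, np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]))

# Multilevel variant: level 1 for class one, level 2 for class two.
h2 = synthesize_filter(X, np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0]))

print(np.round(X @ h, 6), np.round(X @ h2, 6))
```

Changing only the control vector u, as the text describes, reuses the same matrix V for every filter of the set.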

Fig. 2. Multichannel frequency plane correlator with F = 4 iconic matched spatial filters for large class pattern recognition.11

For large class problems we propose the use of multilevel multiple iconic filters (specifically F filters with L output levels). The output from such a system is now an F-digit word (one output/filter) and is thus capable of representing LF different states or object classes (in practice LF − 1 states are obtained, since the all-zero state can also occur for no input object). Prior work on such filters has shown quite promising results. However, attention has been given to their distortion invariance, and no more than four object classes have been considered for use in such filters.
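The F-digit, L-level word maps to a class index exactly like a base-L number. A small sketch (hypothetical decoding, with the all-zero word reserved for "no object" as in the text):

```python
def decode(word, L):
    """Map an F-digit output word (digits 0..L-1) to a class index.

    The all-zero word is reserved for "no input object" (None); the
    remaining L**F - 1 words each label one object class.
    """
    index = 0
    for digit in word:
        index = index * L + digit      # read the word as a base-L number
    return None if index == 0 else index

# F = 4 filters with L = 3 levels give 3**4 - 1 = 80 usable classes,
# enough for a 62-character alphabet.
print(decode((0, 0, 0, 0), 3))   # None: no object present
print(decode((2, 0, 1, 2), 3))   # 59
```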

Three different classes of such iconic filters can be identified.10 The filters described above are projection filters, since the formulation specifies only the central or peak value in the correlation of h and the input object. For many object classes (especially when the total number of training images NT is small), control of the central peak value in the correlation function allows sufficient performance and especially low sidelobe levels. We address this issue in detail in Sec. IV. For cases when the sidelobes for one object class are larger than the peak values for other classes (or larger than the value at the center of the correlation function for the same object class), correlation filters can be used. These filters10 use shifted versions (typically four) of each training set image to control the shape of the correlation peaks (i.e., they specify a fixed value at the center of the correlation function and zero values at ±d pixels away, horizontally and vertically). These filters require five times the number of training images that are needed in the projection filter, and hence NT effects for these filters will be worse. The best peak to sidelobe ratio (PSR) in the output correlation pattern is obtained with a PSR iconic filter.10 The disadvantage of this filter is that its peak value cannot be specified. Thus, since multilevel encoding is not possible with such a filter, the number of classes that one can accommodate using multiple PSR filters is significantly reduced.

These three filters are typically used as the filters in a frequency plane correlator. Figure 2 shows the classic frequency plane correlator with four frequency-multiplexed filters at P2 and four output correlation planes at P3. These F = 4 correlation planes are read out in parallel in raster format in synchronization. From the F = 4 digit output word obtained for each



pixel location in the output correlation planes, the class or category of each region of the input image at P1 can be obtained.11 The use of more than four parallel correlation planes is generally prohibitive, and thus such an architecture can accommodate LF = L4 object classes. To accommodate large class problems, multilevel filters (L > 2) are thus essential.

These filters can also be applied to associative memories, as detailed elsewhere.12 The classic system is shown in Fig. 3. Here the input 1-D vector data x at P1 describes an input object, and the F filters at P2 are the columns of the associative memory matrix M. The P3 output vector v is the F-digit encoding of the input object, from which one can decode the object into a member of one of LF classes.

IV. Large Training Class Effects on Iconic Filter Performance (Theory)

In numerous tests of the iconic filters described in Sec. III we noted that the performance of the projection and correlation filters degraded (i.e., large sidelobe levels occurred) as the number of training set images NT was increased. For our large class problems of present concern NT will also be large, and thus this issue is of significant concern. Thus we now address this issue theoretically for the case of correlation iconic filters. Solution of the large matrices that arise in large class problems can be addressed by advanced techniques and is not of immediate concern here. The analysis is simplified by considering the Fourier transform of the correlation plane. Specifically, we consider the average (or mean) μ and the scatter S of the magnitude of the Fourier transform of the correlation function. The average value μ equals the peak value in the correlation plane (this follows from Parseval's theorem)

Σi f(i)h(i) = (1/M) Σk F(k)H*(k),   (4)

where f and h are 1-D sequences, F and H are their Fourier transforms, and the summation is over the number of pixels M in each image. We thus write the average for an input image k and a linear combination filter h (described by coefficients a) as

μ = E[H*Fk] = Σn an E[Fn*Fk] = Σn an vkn = uk,   (5)

where vkn denotes element (k,n) of the matrix V, and uk is element k of the control vector u in Eq. (3). The scatter S in the Fourier transform of the correlation is a measure of the ripple or sidelobes present in the output correlation plane. Using Eq. (4) and the filter synthesis of Eq. (3), the scatter is shown to satisfy

S = E[|H*Fk|²] − μ² ≤ Σn Σm an am vnm vkk − μ²,

S ≤ vkk (aT V a) − μ².   (6)
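The Parseval relation of Eq. (4), on which both the mean and the scatter analysis rest, is easy to confirm numerically (a quick NumPy check, not from the paper):

```python
import numpy as np

# Eq. (4): the central correlation value sum_i f(i)h(i) equals
# (1/M) sum_k F(k)H*(k) for length-M sequences (Parseval's theorem).
rng = np.random.default_rng(1)
M = 64
f = rng.random(M)
h = rng.random(M)
lhs = np.sum(f * h)
rhs = np.sum(np.fft.fft(f) * np.conj(np.fft.fft(h))).real / M
print(abs(lhs - rhs) < 1e-9)   # True
```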

We now consider how S varies as the number of training images NT increases. Since the matrix V is symmetric and positive definite, we decompose it and easily show

Fig. 3. Multiple iconic projection filter associative processor system.

where λn are the eigenvalues of V, and αn are positive constants. The term vkk in Eq. (6) is positive (since these diagonal elements correspond to the autocorrelation of positive images). Similarly λn ≥ 0 in Eq. (7), since V is a positive definite matrix. Although the terms αnλn in Eq. (7) are positive, the values of the individual αn and λn change with NT. Hence for increasing NT the sum in Eq. (7) [and hence the scatter in Eq. (6)] may increase or decrease. It can be shown that

Σn αn λmax = cNT,

where c is a positive constant. This sum clearly increases with NT and is an upper bound on Eq. (7). Thus the scatter S in Eq. (6) (and hence the correlation plane sidelobes) increases as the number of training images increases. Extensions of this theoretical treatment to the various other classes of iconic filters yield the same trend for the correlation sidelobes and the scatter S to increase with NT.
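The trend can be illustrated numerically on synthetic data (an illustration only; the random "images" with a shared pattern are an assumption, not the paper's OCR set):

```python
import numpy as np

# Synthetic illustration of the trend above: for correlated training images,
# the quadratic form a^T V a of Eq. (7), and with it the scatter bound of
# Eq. (6), tends to grow as N_T increases.
rng = np.random.default_rng(2)
base = (rng.random(400) > 0.5).astype(float)     # shared pattern => correlated images

def quad_form(NT):
    # Each "image" is the shared pattern with ~15% of its pixels flipped.
    X = np.array([np.where(rng.random(400) < 0.15, 1 - base, base)
                  for _ in range(NT)])
    V = X @ X.T                                  # inner product matrix
    a = np.linalg.solve(V, np.ones(NT))          # projection value 1 for every image
    return float(a @ V @ a)                      # = a^T V a

print([quad_form(NT) for NT in (2, 8, 32)])
```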

In numerous tests we also observed (when more training images were used) that the dynamic range requirements of the filter and its noise requirements became more severe. We now advance a theoretical basis for this effect. We consider the average μF and the scatter SF of the pixels in the filter image (denoted by the subscript F). The average and scatter now considered apply to the image plane representation of the filter function and not the output correlation plane. As SF increases, the variations in the pixel values in the filter image itself increase, and hence so does the number of levels required in the filter image and also the effects of noise (we will demonstrate this experimentally in Sec. VI). The mean of the filter image is

μF = E[h] = E[Σn an fn] = Σn an E[fn] = Σn an vnn/M,   (8)

where a linear combination filter is again assumed, and where the last equality in Eq. (8) is obtained by estimating E[fn] by vnn/M, where M is the number of pixels in the image. This approximation is realistic for our binary images, where vnn is the dot product of image fn and itself. From Eq. (8), the mean of the filter is thus seen to be proportional to the sum of the diagonal V


aT V a = Σn αn λn,   (7)


elements weighted by the linear combination filter coefficients a.
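For binary images the estimate E[fn] ≈ vnn/M used in Eq. (8) is in fact exact, since vnn = fnTfn simply counts the 1-pixels. A quick check on random binary data (illustrative, not from the paper):

```python
import numpy as np

# mu_F = E[h] = sum_n a_n v_nn / M holds exactly when the f_n are binary,
# because v_nn = f_n . f_n is the number of 1-pixels and v_nn / M their mean.
rng = np.random.default_rng(6)
M = 400
F_imgs = (rng.random((5, M)) > 0.5).astype(float)   # five random binary "images"
a = rng.normal(size=5)                              # arbitrary filter coefficients
h = F_imgs.T @ a                                    # linear combination filter
V = F_imgs @ F_imgs.T                               # inner product matrix
mu_F = h.mean()
estimate = np.sum(a * np.diag(V)) / M               # sum_n a_n v_nn / M
print(abs(mu_F - estimate) < 1e-9)   # True
```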

Proceeding similarly, the scatter is found to be

SF = E[h²] − E²[h]   (9a)

   = (1/M)[Σn an² vnn(1 − vnn/M) + 2 Σn Σm an am (vnm − vnn vmm/M)]   (9b)

   ≈ (1/M) aT V a.   (9c)

For the cross products vnm we have used a similar estimation for the expected value E[fn fm] = vnm/M. The second double summation in Eq. (9b) does not include n = m. The final relation in (9c) assumes 1 − vnn/M ≈ 1 and (vnm − vnn vmm/M) ≈ vnm. These approximations are valid for our OCR character example, where the average auto projection value is vnn ≈ 100, the average cross projection value is vnm ≈ 50, and the number of pixels per image is M = 6400. From Eq. (7) we see that SF in Eq. (9) increases with the number of training images NT. This increases the filter's dynamic range. As we quantify in Sec. VI, this makes the effect of noise more significant in filters synthesized from a large number of training images NT. In Sec. V we advance various ways to reduce NT and yet achieve large class recognition.

V. Large Class Solutions

In this section we advance several solutions to the large class recognition problem with attention to the degraded performance of iconic correlation filters expected when a large set of training set images is used. In Sec. VI we advance experimental verifications of many of the suggested solutions. We note that our theory in Sec. IV applies not only to correlation filters, but also to projection filters if one does not look only at the correlation peak point. If projection filters are interrogated at the peak point only, the only limitation on NT is in solving the synthesis Eq. (3). We will use this fact in several of our suggested solutions. Figure 4 shows the block diagram of a hierarchical iconic filter system.11 The first stage of this processor employs multiple PSR filters in a shift-invariant correlator. The purpose of this first stage is only to locate candidate objects in the input field of view. The filters used are designed with this in mind, and thus they do not provide discrimination information. To provide enhanced detectability, PSR iconic filters are preferable for this stage of the processor. The second stage of the processor can employ multiple correlation or projection filters in the same processor. These filters allow large class identification (when multilevel outputs are provided), but they can have large sidelobe levels. By using the outputs from the PSR correlator in the first stage to determine where to look in the output correlation planes from the second stage, sidelobe effects can be avoided. In Fig. 4, we show a projection filter second stage, since it allows LF class identification with


Fig. 4. Block diagram of a hierarchical iconic filter system for large class pattern recognition.

F filters and with a simpler processor such as that of Fig. 3. This filter (and its associated matrix) also requires fewer training set images (fewer by a factor of 5) than are needed in the correlation iconic filter synthesis. An additional stage with correlation filters is often preferable in such a system, since some false peaks will occur in the first-stage processor, and the investigation of these points using only projection filters will force some object class decision for all regions of interest in the input scene (detected by the first filter stage). Error correction13 is another solution that can allow projection filters to be used directly, without an additional stage of correlation filters to remove false region of interest peaks from the PSR filter.
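The "where to look" coupling between the stages amounts to reading the second-stage correlation planes only at the peak locations reported by the PSR stage, so sidelobes elsewhere never enter the decision. A minimal sketch (the function name and the evenly spaced output levels are illustrative assumptions, not the authors' design):

```python
import numpy as np

def classify_at_peaks(planes, peaks, L):
    """planes: (F, H, W) stage-2 correlation planes; peaks: [(y, x), ...].

    Returns one F-digit word per candidate location; each digit is the
    nearest of L output levels, assumed evenly spaced in (0, 1]."""
    levels = (np.arange(L) + 1) / L          # e.g. 1/3, 2/3, 1 for L = 3
    words = []
    for y, x in peaks:
        samples = planes[:, y, x]            # read ONLY at the stage-1 peak points
        words.append(tuple(int(np.argmin(np.abs(levels - s))) + 1 for s in samples))
    return words

planes = np.zeros((4, 8, 8))
planes[:, 3, 5] = [0.34, 0.65, 0.99, 0.31]   # values at a true object location
planes[:, 0, 0] = [0.9, 0.9, 0.9, 0.9]       # a large sidelobe that is never read
print(classify_at_peaks(planes, [(3, 5)], L=3))   # [(1, 2, 3, 1)]
```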

Another modification to the system of Fig. 4 is to perform feature space analysis in windows around the candidate region of interest areas indicated by the PSR iconic filters in the first stage. When F feature space discrimination functions are used and encoded in an F-output, L-level manner, a larger number of classes (LF) can again be identified and classified. If we restrict analysis to only the central value of the output from the projection filters, these filters are in essence feature space linear discriminant functions that can operate on image pixel data (iconic filters) or on image features with equal facility.

In cases when the object size is known or can be bounded, the window around each region of interest image area can be set and simple techniques can be used to place the object in each region of interest into one of several super classes (e.g., one of 4 sets of 16 characters each). For the OCR case we have found simple object histograms and the number of pixels in the character and in different parts of it to work quite well to provide such super-class separation. Such information then allows the use of separate filters, each optimized on the smaller super class of possible objects and each with significantly fewer NT training images. We have demonstrated iconic multilevel multiple filters in which the object class is known and the purpose of the filters is to determine the object orientation.12

This represents yet another extension of this hierarchical filtering concept.

For a specific problem (such as OCR) other information is available, such as: letters lie on lines with regular




Table I. Correlation Plane PSR = μ/S for Multilevel Multiple Iconic Correlation Filters as a Function of the Number of Object Classes

Number of training images NT (5/class)    Correlation plane PSR
10      2.04
20      1.98
40      1.76
60      1.48
100     1.52
200     0.98
400     0.049
930     0.006

Table II. Filter Image Plane Scatter SF and Largest Pixel Value as a Function of the Number of Object Classes NT for Different Multilevel Multiple Iconic Projection Filters

NT      SF (scatter)    Maximum pixel value
2       0.02            0.05
15      0.03            0.06
25      0.18            0.10
35      0.35            0.22
75      0.78            0.82
115     0.87            0.95
130     0.89            0.96
150     0.92            1.29
170     1.05            1.62
190     1.16            1.66
248     1.33            2.31
930     18.10           9.90

spacings dependent on the font of the input data. For this case we find that simple horizontal and vertical projections can locate lines of text and isolate the letters on each line. In this case the center of each character can be determined quite simply with such a simple preprocessing step.
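A sketch of this preprocessing step (assumed convention: text pixels are 1 on a 0 background; the helper names are illustrative):

```python
import numpy as np

def runs(mask):
    """Start/end (exclusive) indices of each run of True values in a 1-D mask."""
    edges = np.flatnonzero(np.diff(np.concatenate(([0], mask.astype(np.int8), [0]))))
    return [(int(a), int(b)) for a, b in zip(edges[::2], edges[1::2])]

def segment(image):
    boxes = []
    for r0, r1 in runs(image.sum(axis=1) > 0):        # horizontal projection: text lines
        cols = image[r0:r1].sum(axis=0) > 0           # vertical projection within a line
        boxes.extend((r0, r1, c0, c1) for c0, c1 in runs(cols))
    return boxes

# Toy 8x8 "page" with two characters on one text line.
page = np.zeros((8, 8), dtype=np.int8)
page[2:5, 1:3] = 1
page[2:5, 5:7] = 1
print(segment(page))   # [(2, 5, 1, 3), (2, 5, 5, 7)]
```

The center of each returned box then gives the character location for the filter stage.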

A related issue of concern is training set selection. In many cases attention to this issue can significantly reduce NT. As an example we refer to our OCR case study with 15 fonts of each character available. We must select at least one image of each character. However, not all 15 fonts/character are required to be included in the training set. To select the fonts to be included, we look at the cross correlations of each and select those with the smallest vector inner product matrix entry vmn. This ensures the most new information for each additional training set image chosen. If the separation between output levels in a multilevel filter is ΔL, we select vmn < 0.5ΔL as a useful guideline to determine when to include a given font image in our training set. In Sec. VI we show quantitative data on the ability of iconic filters to recognize characters in new fonts not included in the training set data.
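The selection guideline can be sketched as a greedy pass over the candidate fonts (an illustrative reconstruction; here the cross projections are normalized, and the fixed threshold stands in for 0.5ΔL):

```python
import numpy as np

def select_fonts(images, threshold):
    """Keep a font image only if its normalized cross projection onto every
    image already selected stays below the threshold (most new information)."""
    X = np.asarray(images, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    chosen = [0]                                      # always keep the first font
    for n in range(1, len(X)):
        if all(X[n] @ X[m] < threshold for m in chosen):
            chosen.append(n)
    return chosen

rng = np.random.default_rng(3)
fonts = (rng.random((6, 100)) > 0.7).astype(float)    # six random binary "fonts"
fonts[3] = fonts[0]                                   # font 3 duplicates font 0
print(select_fonts(fonts, 0.9))                       # index 3 is excluded as redundant
```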

VI. Experimental Results

To obtain a quantitative estimate of the number of object classes one can include in a correlation filter, multilevel multiple iconic correlation filters were computed with one object (character)/class or font and five shifted versions of each (thus NT/5 equals the number of classes and fonts). For each case, μ and S of the FT of the correlation plane were calculated. The resultant PSR = μ/S is listed in Table I. Assuming PSR ≥ 1.5 is required, we find that only NT = 100, or 20 object classes, could be included in one OCR correlation filter. We note that we have found that this value is much less for characters than for other objects, and thus OCR appears to represent a worst-case guideline.

To quantify the effect of NT on the dynamic range of the filter and its image plane variance, we computed the mean μF and scatter SF in the filter's image for multilevel multiple projection iconic filters with different numbers of training images used (with one image/class and with NT now equal to the total number of object classes or fonts). These data are shown in Table II. In Table II we also include the value of the largest pixel in the iconic image plane filter. We note that the scatter (variance of the pixel values in the filter) increases with NT. The maximum pixel value in the filter image increases with NT. The number of filter image pixels with large values also increases with NT. Thus more dynamic range or gray levels are required to represent filters synthesized with large NT. Also, when noise is present, if the noise changes one of the large-valued (or key) image pixels, this will have a much larger effect than if other image pixels are changed. Since the number of such key pixels and their relative significance increases with NT, we expect noise effects to become worse for large class filters synthesized from a large number of images. We now quantify this result and the amount of noise allowable.

The filters considered in subsequent tests were synthesized from 62 characters with 4 fonts of each (the fonts used were NY Times, Datama, Busweek, and Forbes). The multilevel multiple projection filters used F = 4 filters with L = 3 levels (0.33, 0.66, and 0.99), thus allowing LF = 3⁴ = 81 classes, which is sufficient to accommodate the 62 character classes. When these F = 4 filters were shown any of the 62 × 4 = 248 characters, the projection values were ideal and perfect 100% recognition was obtained. Table III shows the worst-case outputs (all are within 10⁻³ of the exact projection values).

We now consider the effect of noise on the performance of these filters. To produce the noise we generated a random array of numbers between 0 and 1. By thresholding this array at a, we produced a binary noise array N(x,y) with pixels equal to 1 if their value was below a. We then applied the same N(x,y) to each character image, with image pixels changed (0 to 1 or 1 to 0) if the corresponding (x,y) pixel in N(x,y) is 1. We refer to the result as an image with binary noise. Test results for a = 0.5, corresponding to σ²noise = 0.25, for the font Busweek are shown in Table IV. Only the worst-case results are shown (those data with projection values which departed the most from the ideal values). The projection values are shown with their difference from the ideal values given in parentheses. As seen, 61 of the 62 images were correctly identified. We assume



Table III. Worst-Case Tests of 100% Perfect Performance on the 248 Class Set of Four Multilevel (0.33, 0.66, 0.99) Iconic Filters

Input test character    F1        F2        F3        F4
E       0.3299    0.6601    0.6600    0.6599
T       0.6600    0.3299    0.3301    0.9899
h       0.6599    0.6600    0.9899    0.6599
1       0.9900    0.3299    0.3300    0.9901
6       0.3301    0.3299    0.6599    0.3300

Table IV. Worst-Case Binary Noise Test Results (a = 0.5, σ² = 0.25, Busweek)

Input test character    F1             F2             F3             F4
E       0.24 (0.09)    0.55 (0.11)    0.62 (0.04)    0.75 (0.24)*
T       0.57 (0.09)    0.28 (0.05)    0.29 (0.04)    0.91 (0.08)
W       0.58 (0.08)    0.35 (0.02)    0.60 (0.06)    1.03 (0.04)
h       0.68 (0.02)    0.60 (0.06)    0.89 (0.10)    0.62 (0.04)
u       0.62 (0.04)    0.85 (0.14)    0.92 (0.07)    0.90 (0.09)
1       0.90 (0.09)    0.24 (0.09)    0.28 (0.05)    0.91 (0.08)
3       0.28 (0.05)    0.20 (0.13)    0.34 (0.01)    0.56 (0.10)
6       0.27 (0.06)    0.29 (0.04)    0.57 (0.09)    0.30 (0.03)
9       0.28 (0.05)    0.58 (0.08)    0.29 (0.04)    0.31 (0.02)

Table V. Worst-Case Gray Level Noise Test Results (σ² = 0.1, Forbes)

Input test character    F1             F2             F3             F4
Q       0.37 (0.04)    0.91 (0.08)    0.95 (0.04)    0.90 (0.09)
R       0.33 (0.33)    0.31 (0.02)    0.38 (0.05)    0.32 (0.01)
V       0.58 (0.08)    0.31 (0.02)    0.62 (0.04)    0.70 (0.04)
t       1.17 (0.18)    0.39 (0.06)    0.20 (0.13)    0.54 (0.12)
U       0.91 (0.08)    0.34 (0.01)    0.37 (0.04)    0.96 (0.03)
x       1.03 (0.04)    0.34 (0.01)    0.60 (0.06)    0.91 (0.08)
5       0.41 (0.08)    0.27 (0.06)    0.62 (0.04)    0.97 (0.02)
6       0.21 (0.12)    0.28 (0.05)    1.02 (0.03)    0.31 (0.02)
9       0.38 (0.05)    0.68 (0.02)    0.35 (0.02)    0.32 (0.01)
0       0.37 (0.04)    0.32 (0.01)    0.39 (0.06)    0.27 (0.06)

projection values with errors below ΔL/2 = 0.165 will be correctly thresholded.
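The binary-noise model described above can be realized directly (a sketch; the 80 × 80 random image is a stand-in for a character):

```python
import numpy as np

rng = np.random.default_rng(4)
image = (rng.random((80, 80)) > 0.5).astype(np.int8)   # stand-in binary character

# Threshold a uniform random array at a to get the flip mask N(x, y); a pixel
# flips (0 <-> 1) wherever the mask is 1.  Flip probability a gives a noise
# variance of a * (1 - a), i.e. 0.25 at a = 0.5.
a = 0.5
N = (rng.random(image.shape) < a).astype(np.int8)
noisy = np.where(N == 1, 1 - image, image)

print(float(N.mean()))   # fraction of flipped pixels, close to a
```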

Binary noise is typical of the noise expected in OCR applications.14 We next performed gray-level noise tests. We generated zero-mean Gaussian noise at different variances and added this to each image. We set pixels below 0 to 0 and pixels above 1 to 1, but retained all noise gray levels between 0 and 1. Tests were conducted of all 248 images with noise present. The worst-case results for the font Busweek are shown in Table V in the same format used in Table IV. As seen, 60 of the 62 images were correctly identified. The gray-level noise used had σ² = 0.1. When the noise variance was reduced to σ²noise = 0.08, we obtained 100% correct recognition of all characters. We note that the input SNR is about 31 for σ²noise = 0.08. Figure 5 shows several binary and gray-level noisy input images correctly identified.
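The gray-level test noise can be generated exactly as the text describes (a sketch, with an assumed stand-in image):

```python
import numpy as np

rng = np.random.default_rng(5)
image = (rng.random((80, 80)) > 0.5).astype(float)   # stand-in [0, 1] character image

# Add zero-mean Gaussian noise of variance sigma^2, then clip back into [0, 1]
# (pixels below 0 are set to 0 and pixels above 1 to 1, as in the text).
sigma2 = 0.08
noisy = np.clip(image + rng.normal(0.0, np.sqrt(sigma2), image.shape), 0.0, 1.0)
```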

We now return to Table II and our theoretical analysis indicating that noise sensitivity and the number of key image pixels increase with NT. Refer to Table IV, which shows that the projection of the letter E on filter F4 was 0.75 (in error by 0.24) with σ²noise = 0.25. We reduced the noise threshold to produce noise with σ²noise = 0.24 (only 0.01 different from the prior value). For this noisy image of the letter E we found the projection of the letter E on the fourth filter to be 0.98 (nearly the ideal 0.99 level). Thus with a slightly different noise realization or a slightly different noise level (such that key image pixels were not affected), much larger noise levels can be tolerated. By selecting different projection values for different images and by assigning similar projection codes to similar characters, control over the number of key filter pixels and a reduction in their value is possible.

We now consider tests of these iconic filters with input test images in fonts that were never seen during filter synthesis. Table VI shows the worst-case results for tests on input data in the font Scienam. As seen, only one error in all 62 characters occurred. Thus properly designed iconic filters can recognize test data that they have never seen. By including several fonts of selected characters in the training set, 100% recognition is possible. The present tests were included to show performance with a limited training set.

1 June 1987 / Vol. 26, No. 11 / APPLIED OPTICS 2271


Table VI. Worst-Case New Font (Scienam) Test Results (Error from Ideal Level in Parentheses)

Input test character  Font   Response (and error) for filters F1-F4
                             F1          F2          F3          F4

r                            0.48(0.18)  0.14(0.15)  0.98(0.01)  0.92(0.07)
S                            0.56(0.10)  0.31(0.02)  0.28(0.05)  0.67(0.01)
s                     Times  0.97(0.02)  0.30(0.03)  0.30(0.03)  0.36(0.03)
2                            0.37(0.04)  0.32(0.01)  0.36(0.03)  1.01(0.02)
4                            0.31(0.02)  0.33(0.00)  0.69(0.03)  0.31(0.02)

Fig. 5. Typical noisy characters with different noise variances: (a) σ² = 0.08 (gray-level noise); (b) σ² = 0.1 (gray-level noise); (c) σ² = 0.24 (binary noise).

For practical optical realization, the dynamic range of the filter function cannot be seven decimal digits as in digital simulations. To quantify the amplitude and phase dynamic range required in the frequency-domain iconic filter, we computed the filters to digital machine accuracy and then quantized them to different numbers of amplitude and phase levels. The worst-case test results were analyzed for the correlation of our multilevel (L = 3) multiple (F = 4) filters in tests against the 62 characters in the training-set font (New York Times). These results are typical of those obtained for other fonts. They showed that a filter quantized in the frequency domain to 32 amplitude levels and 360 phase levels (1° resolution) performed very well, with only two errors out of the 62 characters (96% recognition) at these low filter quantization levels. The use of slightly more amplitude levels and far fewer phase levels yielded 100% recognition. Other tests considered the uniformity of response of the input spatial light modulator used. These tests showed excellent performance for a 5% worst-case variation in the spatial uniformity of the input image plane data; up to a 30% worst-case nonuniform spatial response in the input device could be tolerated with acceptable results still obtained. Other tests involved rotations of the input object and showed no degradation for several degrees of rotation.
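The filter quantization just described can be sketched as follows. Uniform quantization of amplitude and phase is an assumption on my part; the paper specifies only the level counts (32 amplitude levels, 360 phase levels at 1° resolution), not the exact scheme.

```python
# Sketch of frequency-domain filter quantization: the complex filter
# is decomposed into amplitude and phase, each uniformly quantized
# to a fixed number of levels (32 amplitude, 360 phase per the text),
# and recombined. The uniform scheme is an assumption.
import numpy as np

def quantize_filter(H, n_amp=32, n_phase=360):
    amp, phase = np.abs(H), np.angle(H)          # polar decomposition
    a_max = amp.max() or 1.0
    # Uniform amplitude quantization to n_amp levels in [0, a_max]
    amp_q = np.round(amp / a_max * (n_amp - 1)) / (n_amp - 1) * a_max
    # Uniform phase quantization to n_phase levels (1 degree steps)
    step = 2 * np.pi / n_phase
    phase_q = np.round(phase / step) * step
    return amp_q * np.exp(1j * phase_q)

# Toy frequency-domain filter from a random 16x16 "image"
H = np.fft.fft2(np.random.default_rng(1).random((16, 16)))
Hq = quantize_filter(H)
# Quantization error should be small relative to the filter magnitude.
assert np.abs(H - Hq).max() < 0.1 * np.abs(H).max()
```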

VII. Summary and Conclusion

The issue of large class object recognition has been addressed. New filters for such problems have been described, and several hierarchical architectures using them have been discussed. Attention was given to filter synthesis problems foreseen when the number of classes is large. A theoretical basis for the sidelobe and noise performance of such filters was advanced and quantified by experiment. Initial results are quite attractive. Hierarchical correlators and multilevel multiple iconic filters are a viable and attractive solution. They appear preferable to an exhaustive search of all available training images.15 Training set selection can reduce the number of images necessary and hence clutter. Proper code selection can improve performance and reduce various error sources. Near-perfect recognition of a large number of objects (~1000) with only four filters with moderate filter dynamic range requirements appears possible. Initial OCR tests have quantified these remarks.

The support of this research by the Independent Research and Development Funds of General Dynamics Pomona and by a grant from the Air Force Office of Scientific Research is gratefully acknowledged.

References

1. R. Davis and D. B. Lenat, Knowledge-Based Systems in Artificial Intelligence (McGraw-Hill, New York, 1982).

2. R. Davis, B. Buchanan, and E. Shortliffe, "Production Rules as a Representation for a Knowledge-Based Consultation Program," Artif. Intell. 8 (1977).

3. T. Kohonen, Self-Organization and Associative Memory (Springer, New York, 1984).

4. D. Casasent and D. Psaltis, "Deformation-Invariant, Space-Variant Optical Pattern Recognition," Prog. Opt. 16, 291 (1979).

5. Y. N. Hsu and H. H. Arsenault, "Optical Pattern Recognition Using Circular Harmonic Expansion," Appl. Opt. 21, 4016 (1982).

6. H. J. Caulfield and M. H. Weinberg, "Computer Recognition of 2-D Patterns Using Generalized Matched Filters," Appl. Opt. 21, 1699 (1982).

7. A. Mahalanobis and D. Casasent, "Large Class Iconic Pattern Recognition: An OCR Case Study," Proc. SPIE 726, in press (1986).

8. R. G. Casey and C. R. Jih, "A Processor-Based OCR System," IBM J. Res. Develop. 27, 386 (1983).

9. D. Casasent, "Unified Synthetic Discriminant Function Computational Formulation," Appl. Opt. 23, 1620 (1984).



10. D. Casasent and W. T. Chang, "Correlation Synthetic Discriminant Functions," Appl. Opt. 25, 2343 (1986).

11. D. Casasent, "Optical AI Symbolic Correlators: Architecture and Filter Considerations," Proc. Soc. Photo-Opt. Instrum. Eng. 625, 220 (1986).

12. D. Casasent and S. A. Liebowitz, "Model-Based Knowledge-Based Optical Processors," Appl. Opt. 26, 15 May issue (1987).

13. S. A. Liebowitz and D. Casasent, "Error Correction Coding in an Associative Processor," Appl. Opt. 26, 999 (1987).

14. Y. X. Gu, Q. R. Wang, and S. Y. Suen, "Application of a Multilayer Decision Tree in Computer Recognition of Chinese Characters," IEEE Trans. PAMI 5, 83 (1983).

15. R. R. Kallman, "Construction of Low Noise Optical Correlation," Appl. Opt. 25, 1032 (1986).



