
488 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL. 50, NO. 4, APRIL 2003

Object-Oriented Image Analysis Using the CNN Universal Machine: New Analogic CNN Algorithms for Motion Compensation, Image Synthesis, and Consistency Observation

Giuseppe Grassi, Senior Member, IEEE, and Luigi Alfredo Grieco, Student Member, IEEE

Abstract—Image-analysis algorithms are of great interest in the context of object-oriented coding schemes. With reference to the utilization of the cellular neural network (CNN) universal machine for object-oriented image analysis, this paper presents new analogic CNN algorithms for obtaining motion compensation, image synthesis, and consistency observation. Along with the already developed segmentation and object-labeling technique, the proposed method represents a framework for implementing CNN-based real-time image analysis. Simulation results, carried out for different video sequences, confirm the validity of the approach developed herein.

Index Terms—Analogic CNN algorithms, cellular neural networks (CNNs), CNN-based video coding, neural circuits for image processing, object-oriented image analysis, spatiotemporal dynamics via cellular array.

I. INTRODUCTION

IN RECENT years, great efforts have been devoted to the study of coding techniques providing high compression ratios while maintaining good picture quality. Among these techniques, the object-oriented coding approach represents an interesting and sophisticated method for video coding [2]–[5]. It mainly consists of a strong image-analysis stage (see Fig. 1) that can locate and label the shape, color, and motion of objects appearing in a frame of a video sequence. In particular, objects of great importance for image understanding can be detected and coded using special parameter sets, whereas still background objects are not included in the coding procedure. This approach enables the transmission rate to be greatly reduced, since for reconstructing some objects the receiver can exploit the information already obtained from the previous frames. Namely, the "important" objects are fed adaptively into the channel until the upper bound of the transmission rate is reached, whereas the remaining parts of the frame can be obtained by restoring the information of the already received frames.

Since object-oriented coding schemes require very powerful and flexible devices [6], the idea recently proposed in [1], [7] is to exploit the great computational power offered by the cellular

Manuscript received March 1, 2001; revised July 12, 2001, and June 27, 2002. This paper was recommended by Associate Editor P. Szolgay.

The authors are with the Dipartimento di Ingegneria dell'Innovazione, Università di Lecce, 73100 Lecce, Italy (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TCSI.2003.809812

Fig. 1. Block diagram of the image-analysis algorithm of an object-oriented coding scheme.

neural network universal machine (CNNUM) for obtaining accurate image analysis. Notice that the CNNUM has already been implemented in the form of a CNN universal chip [8]–[15], which represents an analog, fully programmable supercomputer available on a single chip.

When applied to object-oriented coding schemes, the CNNUM is the engine responsible for the image-analysis operations, whereas the remaining coding and transmitting operations can be managed by digital coprocessors [7]. However, regarding the utilization of the CNNUM for image analysis, it should be pointed out that only a few results are available in the literature. This is because object-oriented image

1057-7122/03$17.00 © 2003 IEEE


GRASSI AND GRIECO: OBJECT-ORIENTED IMAGE ANALYSIS USING THE CNN UM 489

analysis (Fig. 1) mainly consists of four parts: segmentation and object labeling, motion compensation, image synthesis, and consistency observation. The results available in the literature only concern the development of CNN algorithms for obtaining the segmentation and object-labeling stage [1], [7]. No result is available concerning the utilization of the CNNUM for motion compensation, image synthesis, and consistency observation.

The aim of the paper is to bridge this gap by developing the remaining stages of the block diagram reported in Fig. 1. This objective is achieved by illustrating new analogic CNN algorithms, which are based on existing CNN templates [16] as well as on new ones. The paper is organized as follows. In Section II, the CNN algorithm for implementing the motion-compensation stage is described. This stage mainly consists of modeling the geometrical and motion properties of the objects and encoding them using a special parameter set. In Section III, the CNN algorithm for implementing the image-synthesis stage is illustrated. In particular, the algorithm enables all parameterized image fragments to be linked in order to obtain a synthesized image. In Section IV, the CNN algorithm for implementing the consistency-observation block is presented. This stage mainly consists of a comparison between the synthesized image and the original frame, with the aim of obtaining a "filtered difference image." In Section V, simulation results and comparisons are reported to show the effectiveness of the proposed approach. These simulations have been carried out using the Miss America, Claire, and Stefan video sequences. Finally, in Section VI, a discussion is reported, with the aim of illustrating the strengths and weaknesses of the approach developed herein.

II. MOTION COMPENSATION

Before illustrating the proposed CNN algorithm for motion compensation (Fig. 2), we briefly recall that the CNN model considered throughout the paper is the following [17]:

$$\dot{x}_{ij} = -x_{ij} + \sum_{C(k,l)\in N_r(i,j)} \big[ A(i,j;k,l)\,y_{kl} + \hat{A}(i,j;k,l)\,\hat{a}(y_{kl},y_{ij}) + B(i,j;k,l)\,u_{kl} + \hat{B}(i,j;k,l)\,\hat{b}(u_{kl},u_{ij}) \big] + I \tag{1a}$$

$$y_{ij} = \tfrac{1}{2}\big(|x_{ij}+1| - |x_{ij}-1|\big) \tag{1b}$$

where x_ij is the state, y_ij is the output, u_ij is the input, I is the bias, A and Â are the linear and nonlinear feedback parameters, respectively, B and B̂ are the linear and nonlinear control parameters, respectively, whereas C(k,l) is a grid point in the neighborhood N_r(i,j) within the radius r of the cell C(i,j). Five different images can describe a CNN layer, that is, the input U, the state X, the output Y, the bias I, and the mask M. Along with the model, we also recall that the CNNUM incorporates some advanced computational capabilities, such as the following [18]:

1) the capability to perform ADDITION and SUBTRACTION of grayscale images, also pixel by pixel;
2) the capability to combine, pixel by pixel, two binary images through logic operations such as AND and OR;
3) the capability to select which cells are going to be processed (the so-called mask M).
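These pixel-by-pixel capabilities can be mimicked in software for experimentation. The sketch below is illustrative only (on the CNNUM these operations run in analog hardware); the function name `pixelwise` and the list-of-lists image representation are assumptions of this sketch, not part of the paper.

```python
def pixelwise(op, a, b, mask=None):
    """Apply a pixel-by-pixel operation (e.g., SUBTRACTION, AND, OR) to two
    images; where the optional binary mask is False, the first operand is kept."""
    return [[op(x, y) if (mask is None or mask[r][c]) else x
             for c, (x, y) in enumerate(zip(row_a, row_b))]
            for r, (row_a, row_b) in enumerate(zip(a, b))]

# SUBTRACTION restricted to a mask, as used for the object-wise difference:
difference = pixelwise(lambda x, y: x - y, [[1.0, 2.0]], [[1.0, 1.0]],
                       mask=[[True, False]])
```

Here `difference` equals `[[0.0, 2.0]]`: the subtraction is applied only where the mask selects the cell, and the masked-out pixel keeps its original value.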

Fig. 2. Block diagram of the proposed “motion-compensation algorithm.”

The motion-compensation approach proposed herein consists of determining, in the frame t, the position of those objects belonging to frame t-1, which represent the result of the segmentation technique described in [1]. The position of these objects in the frame t can be computed by moving each object (belonging to frame t-1) in a p × q-pixel window and by comparing the result with the frame t. Notice that the dimensions p and q of the search window depend on the motion features of the video sequence to be processed (see Example 2 in Section V). Finally, the position of the object that corresponds to the smallest error is stored for the next stage.
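The exhaustive search just described can be sketched functionally as follows. This is a software sketch, not the analogic CNN implementation: the names `mean_abs_error` and `best_offset`, the list-of-lists frames, and the binary mask are assumptions, and the spiral visiting order of the paper is replaced here by a plain raster scan of the window (the minimum found is the same).

```python
def mean_abs_error(frame_t, frame_prev, mask, offset):
    """Mean absolute difference between the shifted object (taken from the
    previous frame through its mask) and the current frame."""
    dr, dc = offset
    rows, cols = len(frame_t), len(frame_t[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            sr, sc = r - dr, c - dc          # source pixel in the previous frame
            if 0 <= sr < rows and 0 <= sc < cols and mask[sr][sc]:
                total += abs(frame_t[r][c] - frame_prev[sr][sc])
                count += 1
    return total / count if count else float("inf")

def best_offset(frame_t, frame_prev, mask, p, q):
    """Exhaustive search in a p x q window for the offset with smallest error."""
    best, best_err = (0, 0), float("inf")
    for dr in range(-(p // 2), p // 2 + 1):
        for dc in range(-(q // 2), q // 2 + 1):
            err = mean_abs_error(frame_t, frame_prev, mask, (dr, dc))
            if err < best_err:
                best_err, best = err, (dr, dc)
    return best, best_err
```

For a single-pixel object that moved one pixel down between frames, the search returns the offset (1, 0) with zero error.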

More precisely, the proposed CNN algorithm can be described as follows. For each object individuated by the segmentation stage, the first step consists in computing the difference between the frames t and t-1, with reference to the object extension. This can be easily done by the CNNUM, since it is possible to exploit the SUBTRACTION operation, along with the mask (which describes the extension of the ith object belonging to frame t-1) [18]. The resulting difference image is processed by the ABSOLUTE VALUE template [7]

(2a)

(2b)

where the nonlinear function and the computation time are those illustrated in [7], whereas the resulting image is the "absolute


value" for the ith object. From the implementation point of view, the problem of having nonlinear templates can be easily solved by exploiting the results in [19] and [20], where it is shown how nonlinear templates can be decomposed into a sequence of linear template executions.

The third step consists in computing the following "mean absolute error":

$$\text{m\_error} = \frac{\sum \text{pixel values of the absolute-value image}}{\text{number of black pixels of the mask}} \tag{3}$$

Equation (3) can be implemented on the CNNUM by considering the following DIFFUSION template [16]:

(4a)

(4b)

Namely, the application of (4) to the image generates a diffusion process. It forces all the pixels to reach a unique gray-level value, which represents the average value of the image.
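The averaging effect of a diffusion process can be illustrated with a discrete sketch. This is not the DIFFUSION template of [16]: the periodic boundary conditions and the step size 0.2 are assumptions made here so that, provably, every pixel converges to the image's average value.

```python
def diffuse_to_average(image, steps=500):
    """Each pixel repeatedly moves toward the mean of its 4-neighborhood
    (periodic boundaries); the whole image converges to its average value."""
    rows, cols = len(image), len(image[0])
    img = [list(row) for row in image]
    for _ in range(steps):
        nxt = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                nbr_mean = (img[(r - 1) % rows][c] + img[(r + 1) % rows][c] +
                            img[r][(c - 1) % cols] + img[r][(c + 1) % cols]) / 4.0
                nxt[r][c] = img[r][c] + 0.2 * (nbr_mean - img[r][c])
        img = nxt
    return img
```

Starting from a 3 × 3 image with values 0 through 8, every pixel settles at the average value 4, which is exactly the quantity needed by the mean-absolute-error computation of (3).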

Now, it is necessary to take into account that, in order to find the position of the ith object in the frame t, the object and its mask have to be moved pixel by pixel within the frame. By choosing a spiral trajectory (that is, a trajectory with movements UP, RIGHT, DOWN, DOWN, LEFT, LEFT, and so on), the fourth step of the algorithm consists in moving the object using the following UP template [16]:

(5a)

(5b)

where the input is the image of the ith object (obtained from frame t-1 using its mask), the output represents the image where the ith object has been moved one pixel in the UP direction, whereas the black mask indicates that all the cells are in the active state. Along with the object, its mask is moved one pixel in the UP direction, so that the shifted mask is obtained. Subsequently, the shifted image has to be compared with the frame t (referring to the object extension). This objective is achieved using the SUBTRACTION capability offered by the CNNUM. The result is the difference image computed over the shifted mask.

Subsequently, after having computed the error using (2)–(4), both the object and the mask have to be moved, along the spiral trajectory, using the templates RIGHT, DOWN, and LEFT reported in [16]. Therefore, the proposed algorithm is iterated until all the movements of the object have been carried out within the

p × q-pixel search window of frame t. The last iteration gives the difference image corresponding to the last movement of the object and of its mask. When all the movements have been carried out, a comparison among the errors is made, until the smallest error is found. The iteration that corresponds to the minimum error gives the image

(6)

where the compensated image of the ith object and its compensated mask are obtained. These compensated images represent the results of the algorithm, which is applied to each object individuated by the segmentation stage described in [1]. If N segmented objects are detected in frame t-1, the compensated images available for the next stages are the compensated object images and the compensated masks for i = 1, ..., N.
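The spiral trajectory (UP, RIGHT, DOWN, DOWN, LEFT, LEFT, ...) used above to visit every position of the search window can be generated as follows; the function name `spiral_moves` and the p × q parameterization are assumptions of this sketch.

```python
def spiral_moves(p, q):
    """One-pixel moves of a spiral trajectory (UP, RIGHT, DOWN, DOWN,
    LEFT, LEFT, ...) visiting all p * q offsets of the search window."""
    moves, directions = [], ["UP", "RIGHT", "DOWN", "LEFT"]
    step, d, total = 1, 0, p * q - 1   # the start position needs no move
    while len(moves) < total:
        for _ in range(2):             # two directions share each step length
            name = directions[d % 4]
            for _ in range(step):
                if len(moves) < total:
                    moves.append(name)
            d += 1
        step += 1                      # the arm of the spiral grows by one
    return moves
```

For a 3 × 3 window this yields UP, RIGHT, DOWN, DOWN, LEFT, LEFT, UP, UP, matching the movement sequence described in Section II; the eight moves plus the start position cover all nine window offsets.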

III. IMAGE SYNTHESIS

In this section, a CNN algorithm for image synthesis is developed. The algorithm consists of two parts (Fig. 3). The objective of the first part is to link the objects belonging to frame t-1 in their new positions in frame t (i.e., in those positions already detected by the "motion-compensation algorithm" developed in Section II). Then, the objective of the second part is to combine the previous result (called "object composition") with the background.

A. Object Composition

In order to link the objects in their new positions, it is worth noting that the objects can be added to each other only if the common parts are taken into account. Roughly speaking, the behavior of the proposed algorithm [Fig. 3(a)] can be explained as follows. Given, for instance, the fourth object detected in the frame t-1 by the segmentation stage, let us add it to the first, the second, and the third object (using the block ADDITION). By considering that the fourth object and the remaining ones can share some parts, before adding the objects, it is necessary to delete these common parts from the fourth object. These parts can be deleted (using the block ZERO) only if a proper mask is computed. This mask is the result of the block AND between the compensated mask of the fourth object and the "partial mask" obtained by combining the compensated masks of the first, second, and third objects using the block OR. Therefore, the parts individuated by the mask can be deleted from the fourth object and the result can be added to the first, second, and third objects.

More precisely, the proposed CNN algorithm works as follows. Given the ith object, let us add it to the objects 1, 2, ..., i-1. This task can be carried out by computing a mask which takes into account the parts shared by the ith object and the remaining ones. This mask can be easily computed by exploiting the capability of the CNNUM of combining binary images through any user-selectable logic operation [18]. In particular, by applying the OR operation to the compensated masks of the objects 1, ..., i-1, it is possible to derive the "partial mask." Subsequently, by exploiting the logic AND between this partial mask and the compensated mask of the ith object, it is possible to obtain the desired mask.
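The OR/AND/ZERO/ADDITION sequence for object composition can be sketched arithmetically as below. This is an illustrative sketch, not the analogic implementation; the name `compose_objects` and the list-of-lists images are assumptions.

```python
def compose_objects(objects, masks):
    """Link compensated objects: parts shared with already-processed objects
    are removed (ZERO) before the object is added (ADDITION)."""
    rows, cols = len(masks[0]), len(masks[0][0])
    comp = [[0.0] * cols for _ in range(rows)]        # running object composition
    partial = [[False] * cols for _ in range(rows)]   # OR of masks processed so far
    for obj, mask in zip(objects, masks):
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    if not partial[r][c]:             # AND with partial mask; ZERO
                        comp[r][c] += obj[r][c]       # ADDITION of the remainder
                    partial[r][c] = True              # OR into the partial mask
    return comp, partial
```

With two overlapping objects, the overlap pixel keeps the value contributed by the first object, since the shared part is deleted from the second object before the addition.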


Fig. 3. Block diagram of the proposed "image-synthesis algorithm." (a) Object composition. (b) Combination with the background.

Now, the part individuated by the mask can be deleted from the compensated image of the ith object using the following ZERO template:

(7a)

(7b)

Therefore, the resulting image can be added, using the ADDITION operation, to the image which combines the objects 1, ..., i-1. The proposed algorithm is iterated until all the compensated objects are processed. The resulting image is the "object composition," that is

(8)

B. Combination With the Background

In order to combine the "object composition" with the background, the algorithm illustrated in Fig. 3(b) is proposed. First of all, it is necessary to compute the "mask composition." This mask is given by the OR operation applied to all the compensated masks. Subsequently, by using the ZERO template (7a) with

(9)

Fig. 4. Block diagram of the proposed “consistency observation algorithm.”

Fig. 5. Block diagram of the proposed "mask of significant difference" algorithm.

it is possible to delete the moving objects from the frame t-1, so that the background image is obtained. Finally, by exploiting the ADDITION, it is possible to combine the "object composition" with the background, so that the synthesized image is obtained

(10)
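The combination with the background, i.e., the ZERO step followed by the ADDITION of (10), can be sketched as follows; `combine_with_background` and the list-of-lists images are assumed names for this illustration.

```python
def combine_with_background(frame_prev, composition, mask_comp):
    """Zero-out the moving objects in the previous frame (ZERO with the mask
    composition), then ADD the object composition: the synthesized image."""
    rows, cols = len(frame_prev), len(frame_prev[0])
    synth = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            background = 0.0 if mask_comp[r][c] else frame_prev[r][c]
            synth[r][c] = background + composition[r][c]
    return synth
```

Pixels covered by the mask composition take their value from the composed objects; all other pixels keep the background of the previous frame.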

IV. CONSISTENCY OBSERVATION

In this section, a CNN algorithm for consistency observation is proposed (Fig. 4). Since the objective of the algorithm is to compute the "filtered difference image," the first step consists in making a comparison between the frame t and the synthesized image. This task, which can be carried out using the SUBTRACTION operation, leads to the "difference image." Subsequently, it is necessary to compute a proper mask, called the "difference mask." This mask, which identifies the parts of the "difference image" characterized by values greater than a prefixed threshold, can


Fig. 6. Video sequence of Miss America. (a) Frame 3. (b) Segmented frame 3. (c) Frame 7.

be computed using the block called "mask of significant difference." The algorithm for implementing this block (Fig. 5) works as follows.

Given the "difference image," it is necessary to compute the ABSOLUTE VALUE [see (2)], so that the absolute-value image is obtained. Subsequently, in order to detect only those pixels where the "difference image" is significant, a threshold is required. This objective can be achieved by applying the THRESHOLD template reported in [16]. However, owing to the bias value reported in [16], the "threshold operation" cannot be directly applied to this image. Namely, an amplification with a gain that equals 10 is required. Such an amplification can be achieved by recursively applying the ADDITION operation. The amplified image can now be used as input for the "threshold operation." The last step for obtaining the "difference mask" consists in removing small objects as well as groups of isolated pixels. This step can be carried out using the SMALL OBJECT REMOVER template reported in [16].

Finally, it is necessary to delete from the "difference image" those pixels that are not indicated by the "difference mask." This objective can be achieved using the following ZERO template:

(11a)

(11b)

where the resulting image represents the "filtered difference image."
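The consistency-observation chain (difference, absolute value, gain of 10, threshold, ZERO) can be sketched as below. This is an arithmetic illustration only: the SMALL OBJECT REMOVER step is omitted, and the function name `filtered_difference` and the list-of-lists images are assumptions.

```python
def filtered_difference(frame, synth, threshold):
    """Difference image, significance mask (|diff| amplified by 10 and
    thresholded), then ZERO-out of the non-significant pixels."""
    rows, cols = len(frame), len(frame[0])
    diff = [[frame[r][c] - synth[r][c] for c in range(cols)] for r in range(rows)]
    # ABSOLUTE VALUE + gain of 10 (emulated arithmetically) + THRESHOLD:
    mask = [[abs(diff[r][c]) * 10.0 > threshold for c in range(cols)]
            for r in range(rows)]
    # ZERO: keep only the pixels flagged by the difference mask.
    return [[diff[r][c] if mask[r][c] else 0.0 for c in range(cols)]
            for r in range(rows)]
```

Pixels whose amplified absolute difference stays below the threshold are zeroed, so only significant differences survive in the filtered image.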

V. SIMULATION RESULTS

Two examples are illustrated in this section. The aim of the first one is to show how the proposed approach works, whereas the aim of the second one is to evaluate the performance of the proposed technique for different video sequences.

Example 1: Herein, the Miss America sequence in the common intermediate format (CIF) is considered [1]. We start by reporting in Fig. 6(a) and Fig. 6(c) the third and the seventh frame of the sequence, respectively, whereas in Fig. 6(b), the result of the segmentation stage is illustrated. From Fig. 6(b), it can be argued that nine segmented objects are detected in frame 3. By applying the motion-compensation algorithm to the ninth object, with a 15 × 15-pixel window, the selected images reported in Fig. 7 are obtained. In particular, the first iteration of the algorithm (UP movement) gives the image reported in Fig. 7(a), whereas the mask is illustrated in Fig. 7(b). The last iteration of the algorithm gives the image illustrated in Fig. 7(c), whereas the mask is reported in Fig. 7(d). Since all the movements have been carried out, a comparison among the errors enables the compensated images to be found. Namely, the iteration that corresponds to the minimum error [see (6)] gives the image reported in Fig. 7(e), whereas the final result of the algorithm, that is, the compensated object image and the compensated mask, are reported in Fig. 7(f) and Fig. 7(g), respectively.

Now, by exploiting the compensated images and masks of all the objects, the algorithm for obtaining the object composition


Fig. 7. Motion-compensation algorithm applied to the ninth object. (a) Difference image (first iteration, m_error = 0.0464). (b) Mask (first iteration). (c) Difference image (last iteration, m_error = 0.1535). (d) Mask (last iteration). (e) Difference image (iteration at minimum error, m_error = 0.0343). (f) Compensated object image. (g) Compensated mask.

is applied. Selected images resulting from its application are reported in Figs. 8 and 9. In particular, by considering the first six objects, their composition is reported in Fig. 8(a), whereas the corresponding mask is illustrated in Fig. 8(b). By applying the AND operation between this mask and the mask of the seventh object [Fig. 8(c)], the mask in Fig. 8(d) is obtained. Such a mask is used for obtaining the image in Fig. 8(e), which can be added to the composition of the first six objects for obtaining the image in Fig. 8(f). Finally, by starting from the composition of the first eight objects [Fig. 9(a)] and by considering the AND operation between the


Fig. 8. Object-composition algorithm. (a) Composition of the first six objects. (b) Mask of the first six objects. (c) Mask of the seventh object. (d) Result of the AND operation. (e) Result of the ZERO operation. (f) Composition of the first seven objects.

mask of the first eight objects [Fig. 9(b)] and the mask of the ninth object [Fig. 9(c)], the mask in Fig. 9(d) is obtained. The utilization of such a mask leads to the image in Fig. 9(e), which can be added to the previous composition for obtaining the "object composition" [Fig. 9(f)].

Now, in order to combine the "object composition" with the

background, it is necessary to compute the "mask composition" [see Fig. 10(a)], which is required for obtaining the background [Fig. 10(b)]. Finally, by applying the ADDITION, the synthesized image

is obtained [Fig. 10(c)].

Now, the algorithm for consistency observation has to be applied for computing the "filtered difference image." At first, by applying the SUBTRACTION, the "difference image" is determined [Fig. 11(a)]. Then, the application of the algorithm in Fig. 5 leads to the "difference mask" [Fig. 11(b)]. By deleting from the "difference image" those pixels that are not indicated by the mask, it is possible to obtain the "filtered difference image" [Fig. 11(c)]. Finally, Fig. 12 shows the image obtained by adding the "filtered difference image" to the synthesized image. Notice that in object-oriented video coding the receiver has to reconstruct the synthesized image and add it to the "filtered difference image," which is provided by the transmitter.

We would conclude this example by giving an estimate of the complete execution time of the proposed CNN algorithms on the CNNUM. The results are summarized in Table I, where the estimated processing time has been given for each CNN algorithm without taking into account the I/O operations as well as the digital operations such as AND/OR. We would stress that the proposed approach will become more and more efficient in the future chip implementations.

Example 2: The aim of this example is to evaluate the performance of the proposed motion-compensation algorithm


Fig. 9. Object-composition algorithm. (a) Composition of the first eight objects. (b) Mask of the first eight objects. (c) Mask of the ninth object. (d) Result of the AND operation. (e) Result of the ZERO operation. (f) The "object composition."

in terms of precision of the prediction. To this purpose, a comparison among different video sequences is carried out. At first, we apply the motion-compensation algorithm to the first 50 frames of the Miss America sequence, starting from frame 3 with a 15 × 15-pixel window. Subsequently, we apply the motion-compensation algorithm to the first 50 frames of the Claire sequence, with a 19 × 19-pixel window. We start from frame 3, where 7 objects are found [see Fig. 13(a)]. Notice that, since the motion in Claire is larger than the motion in Missa, a larger search window has been chosen. Finally, we consider a worst-case benchmark, that is, the Stefan video sequence. We apply the algorithm to the first 10 frames by choosing a 31 × 31-pixel window. We start from frame 3, where 12 objects are found [see Fig. 13(b)]. Notice that, since in the Stefan sequence camera motions are applied and several objects move in different ways, it is necessary to consider a search window larger than the 15 × 15-pixel size. Regarding Missa and Claire, the results are reported in Table II, where for each sequence the energies of the difference image with motion compensation and without motion compensation have been evaluated. Notice that the energy has been computed by considering that the pixel values of the "difference image" are within the range [-1, 1]. Moreover, the gains in decibels are reported in Fig. 14. The obtained results (that is, the improvements in dB) clearly highlight the performance of the proposed algorithm in estimating the scene transformation. Regarding Stefan, the results from frame 5 to frame 10 are summarized in Table III. By analyzing these results, it can be argued that the gain obtained for frame 5 is satisfying, whereas from frame 6 to frame 10, there is a performance degradation due to the fast scene transformation induced by camera motion. We would point out that, in our opinion, this degradation does not depend on the behavior of the "motion-compensation" algorithm, but on the fact that the "area of the fans" in the Stefan sequence continuously changes due to camera motion. By taking into account the results in Table III, we can argue that for the Stefan sequence the "segmentation and object labeling" stage has to be carried out approximately every three frames.
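Energies and decibel gains such as those reported in Tables II and III can be computed as follows. The exact normalization used in the paper is not specified, so the definitions below (energy as the sum of squared pixel values over the [-1, 1] range, gain as 10 log10 of the energy ratio) are assumptions of this sketch.

```python
import math

def energy(diff):
    """Energy of a difference image with pixel values in [-1, 1]."""
    return sum(v * v for row in diff for v in row)

def gain_db(energy_without, energy_with):
    """Motion-compensation gain in decibels: the ratio of the difference-image
    energy without compensation to the energy with compensation."""
    return 10.0 * math.log10(energy_without / energy_with)
```

For instance, reducing the difference-image energy by a factor of ten corresponds to a 10-dB gain.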


Fig. 10. Combination with the background. (a) Mask composition. (b) Background. (c) Synthesized image.

Fig. 11. Consistency-observation algorithm. (a) "Difference image." (b) "Difference mask." (c) "Filtered difference image."

VI. DISCUSSION

The aim of this section is to illustrate the strengths and weaknesses of the proposed approach. In particular, the following issues are discussed.

1) In the context of object-oriented video coding, interesting block-matching and region-matching algorithms have already been developed (for instance, some of them are currently used in standards such as MPEG-4) [7]. These approaches have first been proposed for digital serial processor architectures. Subsequently, they have been improved through parallelization based on MMX [12]. On the other hand, the processor considered throughout the paper is a new analog dynamic processor array,


TABLE I. CNN-BASED IMAGE ANALYSIS: PROCESSING TIMES OF THE "SEGMENTATION," "MOTION COMPENSATION," "IMAGE SYNTHESIS," AND "CONSISTENCY-OBSERVATION" ALGORITHMS

Fig. 12. The image obtained by adding the "filtered difference image" to the synthesized image.

Fig. 13. Video sequence comparisons. (a) Segmented frame 3 of Claire. (b) Segmented frame 3 of Stefan.

called the CNN universal chip [10]. Therefore, we would point out that the algorithms proposed herein, designed for this new computational paradigm, represent a new implementation for CNN-based object-oriented video-coding architectures.

2) Referring to the objectives to be achieved, the proposed algorithms could appear close to those used in digital processors. However, we would stress that the approach illustrated herein consists of analogic CNN algorithms. Notice that the key idea in CNNUM processors is the concept of an "analog instruction" [10]. It consists of a CNN template sequence, which generates nonlinear spatio-temporal dynamics [10]. This idea is lacking in digital hardware processors. For this reason, the algorithms proposed for digital processors cannot be used for CNN architectures. As a consequence, we feel that our paper presents "new analogic CNN algorithms" for motion compensation, image synthesis, and consistency observation.

3) Advantages and drawbacks of standard block-matching algorithms and object-oriented motion-compensation algorithms are now discussed. Block-matching algorithms do not usually require complex image-analysis tasks [7], and at the same time they are able to guarantee satisfying motion prediction. However, for scenes characterized by fast motion, block-matching algorithms usually require a large number of blocks to be compensated, compared to the number of segmented objects [7]. This clearly represents a drawback for real-time hardware implementations of these algorithms. Furthermore, the presence of blocking artifacts can significantly degrade the quality of human visual perception. On the other hand, object-oriented motion-compensation algorithms usually require complex image-analysis tasks in order to detect object contours [7]. As shown throughout the paper, this drawback can be overcome by exploiting the great computational power offered by the CNNUM. Additionally, object-oriented motion-compensation algorithms do not generate blocking artifacts and, consequently, do not affect the image quality.

Finally, referring to Fig. 14, it should be pointed out that those results do not indicate whether an object-based compensation improves on the block-based approach used in the standard. We feel that this issue needs to be investigated further in the future.
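As a point of comparison, the standard block-matching scheme discussed above can be sketched in a few lines (a generic exhaustive-SAD illustration on assumed toy frames, not the algorithm or the test material of [7]). The final lines compute the "difference image" energies and the compensation gain in decibels in the sense of Fig. 14 and Tables II and III, assuming the usual sum-of-squared-errors definition of energy.

```python
import numpy as np

def full_search(prev, curr, top, left, bsize=8, radius=4):
    """Exhaustive block matching: return the displacement (dy, dx),
    |dy|, |dx| <= radius, minimizing the sum of absolute differences
    (SAD) between the block of `curr` at (top, left) and the
    displaced block in the previous frame `prev`."""
    block = curr[top:top + bsize, left:left + bsize].astype(int)
    best_mv, best_sad = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + bsize <= prev.shape[0] \
                    and x + bsize <= prev.shape[1]:
                s = int(np.abs(prev[y:y + bsize, x:x + bsize].astype(int) - block).sum())
                if best_sad is None or s < best_sad:
                    best_mv, best_sad = (dy, dx), s
    return best_mv, best_sad

# Toy frames: an 8x8 bright patch moves down 2 rows and right 1 column.
prev = np.zeros((32, 32), dtype=np.uint8)
prev[10:18, 12:20] = 200
curr = np.zeros_like(prev)
curr[12:20, 13:21] = 200

# Block-based compensated prediction of `curr` from `prev`.
pred = np.zeros_like(curr)
for top in range(0, 32, 8):
    for left in range(0, 32, 8):
        (dy, dx), _ = full_search(prev, curr, top, left)
        pred[top:top + 8, left:left + 8] = prev[top + dy:top + dy + 8,
                                                left + dx:left + dx + 8]

# "Difference image" energies (sums of squared errors) and the gain in
# decibels; this toy scene is perfectly predictable, so e_with is 0 and
# the energy ratio is floored at 1 before taking the logarithm.
e_without = float(((curr.astype(float) - prev) ** 2).sum())
e_with = float(((curr.astype(float) - pred) ** 2).sum())
gain_db = 10.0 * np.log10(e_without / max(e_with, 1.0))
```

Note how even this single rigidly translating patch spans four blocks of the search grid: this is the block-count overhead, relative to one segmented object, that the discussion above identifies as a drawback for fast-motion scenes.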

4) Regarding the motion model, only a simple translation model has been considered throughout the paper. Nevertheless, even this has led to a complex analogic CNN algorithm for motion compensation. However, it should be noted that when a region adapted to an object in the scene is used to estimate the spatio-temporal transformation between two frames, the simple translation model is less precise than more general affine motion models. This issue needs to



Fig. 14. Missa and Claire video sequences: gains in decibels.

TABLE II: MISSA AND CLAIRE VIDEO SEQUENCES: ENERGIES OF THE "DIFFERENCE IMAGE" WITH MOTION COMPENSATION AND WITHOUT MOTION COMPENSATION (FRAMES FROM 5 TO 50)

be further investigated, with the aim of developing even more sophisticated analogic CNN algorithms for motion compensation.
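The gap between the two motion models can be quantified with a small sketch (illustrative names and numbers, not taken from the paper): a region undergoing a rotation plus a shift is fitted exactly by a 6-parameter affine transformation, while the best 2-parameter translation, which is simply the mean point displacement, leaves a residual.

```python
import numpy as np

def translate(points, t):
    """Translation model: p' = p + t (2 motion parameters)."""
    return points + t

def affine(points, M, t):
    """Affine model: p' = M p + t (6 motion parameters), which also
    captures rotation, scaling, and shear of a region."""
    return points @ M.T + t

# Corners of a rectangular region, one point per row as [x, y].
pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [0.0, 2.0]])

# True motion: a 90-degree rotation followed by a shift; this is
# exactly representable by the affine model.
R = np.array([[0.0, -1.0], [1.0, 0.0]])
shift = np.array([3.0, 1.0])
target = affine(pts, R, shift)

# The least-squares pure translation is the mean point displacement;
# its residual shows what the 2-parameter model cannot capture.
t_best = (target - pts).mean(axis=0)
residual = np.abs(translate(pts, t_best) - target).max()
```

For purely translating objects the residual is zero and the simple model suffices; the larger the rotational or scaling component of the region's motion, the larger the prediction error that a translation-only analogic algorithm must leave uncompensated.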

VII. CONCLUSION

The CNN UM has proved to be a powerful tool for implementing image-processing algorithms via elementary CNN instructions. In this paper, the attention has been focused on object-oriented image analysis for video-coding systems. In particular, new analogic CNN algorithms for motion compensation, image synthesis, and consistency observation have been developed. Simulation results using different video sequences have been reported to show the effectiveness of the technique. It can be concluded that, along with the segmentation technique developed in [1], the proposed approach suggests the possibility of implementing the whole object-oriented image-analysis scheme on the CNN Universal Chip.

TABLE III: STEFAN VIDEO SEQUENCE: ENERGIES OF THE "DIFFERENCE IMAGE" WITH MOTION COMPENSATION AND WITHOUT MOTION COMPENSATION (FRAMES FROM 5 TO 10)

REFERENCES

[1] A. Stoffels, T. Roska, and L. O. Chua, "On object-oriented video-coding using the CNN Universal Machine," IEEE Trans. Circuits Syst. I, vol. 43, pp. 948-952, Nov. 1996.

[2] M. Kunt, A. Ikonomopoulos, and M. Kocher, "Second generation image coding techniques," Proc. IEEE, vol. 73, pp. 549-575, Apr. 1985.

[3] N. Diehl, "Object-oriented motion estimation and segmentation in image sequences," Signal Processing: Image Communication, vol. 3, pp. 23-56, 1991.

[4] M. Hotter, "Optimization and efficiency of an object-oriented analysis-synthesis coder," IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 181-194, Apr. 1994.

[5] J. Ostermann, "Object-oriented analysis-synthesis coding based on the source model of moving rigid 3D objects," Signal Processing: Image Communication, vol. 6, pp. 143-161, 1994.

[6] P. Pirsch, N. Demassieux, and W. Gehrke, "VLSI architectures for video compression: A survey," Proc. IEEE, vol. 83, pp. 220-246, Feb. 1995.

[7] A. Stoffels, T. Roska, and L. O. Chua, "Object-oriented image analysis for very-low-bitrate video-coding systems using the CNN universal machine," Int. J. Circuit Theory Applicat., vol. 25, pp. 235-258, 1997.

[8] T. Roska and L. O. Chua, "The CNN universal machine: An analogic array computer," IEEE Trans. Circuits Syst. II, vol. 40, pp. 163-173, Mar. 1993.

[9] R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vazquez, and R. Carmona, "A CNN universal chip in CMOS technology," in Proc. Third IEEE Int. Workshop Cellular Neural Networks Their Applications (CNNA '94), Rome, Italy, 1994, pp. 91-96.

[10] T. Roska, "Computer-sensors: Spatial-temporal computers for analog array signals, dynamically integrated with sensors," J. VLSI Signal Processing, vol. 23, no. 2/3, pp. 221-237, 1999.

[11] A. Rodriguez-Vazquez, E. Roca, M. Delgado-Restituto, S. Espejo, and R. Dominguez-Castro, "MOST-based design and scaling of synaptic interconnections in VLSI analog array processing CNN chips," J. VLSI Signal Processing, vol. 23, no. 2/3, pp. 239-266, 1999.

[12] T. Sziranyi, K. Laszlo, L. Czuni, and F. Ziliani, "Object-oriented motion-segmentation for video-compression in the CNN-UM," J. VLSI Signal Processing, vol. 23, no. 2/3, pp. 479-496, 1999.

[13] G. Linan, S. Espejo, R. Dominguez-Castro, and A. Rodriguez-Vazquez, "The CNNUC3: An analog I/O 64×64 CNN Universal Machine chip prototype with 7-bit analog accuracy," in Proc. 6th IEEE Int. Workshop Cellular Neural Networks Their Applications (CNNA 2000), 2000, pp. 201-206.

[14] A. Zarandy, "ACE box: High-performance visual computer based on the ACE4k analogic array processor chip," in Proc. Europ. Conf. Circuit Theory Design (ECCTD 2001), vol. I, 2001, pp. 361-364.

[15] G. Linan, R. Dominguez-Castro, S. Espejo, and A. Rodriguez-Vazquez, "ACE16K: A programmable focal plane vision processor with 128×128 resolution," in Proc. Europ. Conf. Circuit Theory Design (ECCTD 2001), vol. I, 2001, pp. 345-348.

[16] AnaLogic Computers Ltd., "CSL-CNN software library, version 1.1," Budapest, Hungary, 2000.

[17] L. O. Chua and T. Roska, "The CNN paradigm," IEEE Trans. Circuits Syst. I, vol. 40, pp. 147-156, Mar. 1993.

[18] G. Linan, P. Foldesy, A. Rodriguez-Vazquez, S. Espejo, and R. Dominguez-Castro, "Realization of nonlinear templates using the CNNUC3 prototype," in Proc. 6th IEEE Int. Workshop Cellular Neural Networks Their Applications (CNNA 2000), 2000, pp. 219-224.

[19] G. Linan, P. Foldesy, A. Rodriguez-Vazquez, S. Espejo, and R. Dominguez-Castro, "Implementation of nonlinear templates using a decomposition technique by a 0.5 μm CMOS CNN Universal Chip," in Proc. IEEE Int. Symp. Circuits Systems (ISCAS 2000), vol. II, 2000, pp. 401-404.

[20] L. Kek and A. Zarandy, "Implementation of large neighborhood nonlinear templates on the CNN universal machine," Int. J. Circuit Theory Applicat., vol. 26, pp. 551-566, 1998.

Giuseppe Grassi (S'93-M'95-SM'02) received the Laurea degree in electronic engineering from the Università di Bari, Bari, Italy, in 1991, and the Ph.D. degree in electrical engineering from the Politecnico di Bari, Bari, Italy, in 1995.

From October 1994 to August 2000, he was an Assistant Professor in the Department of Innovation Engineering, Università di Lecce, Lecce, Italy, where he is currently an Associate Professor and teaches basic circuit theory. His research interests are cellular neural network theory and applications, associative-memory design, object-oriented image analysis and motion compensation using cellular arrays, stability properties of nonlinear systems, dynamics of cellular arrays, chaotic and hyperchaotic circuits, synchronization properties, hyperchaos-based cryptography, and control of chaos and hyperchaos. He has published 30 papers in international journals, 45 papers in proceedings of international conferences, and four papers in international books.

Dr. Grassi serves as a Reviewer for several international journals, including the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I and the IEEE TRANSACTIONS ON NEURAL NETWORKS.

Luigi Alfredo Grieco (S'02) received the Laurea degree in electronic engineering (cum laude) from the Politecnico di Bari, Bari, Italy, in 1999. Currently, he is working toward the Ph.D. degree in information engineering at the Università di Lecce, Lecce, Italy.

His research interests are cellular neural network applications for real-time video processing and congestion control for packet-switching networks. He has published 15 papers in international conferences.

