
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-3, NO. 1, JANUARY 1981

Automatic Analysis of Moving Images

MASAHIKO YACHIDA, MINORU ASADA, AND SABURO TSUJI, SENIOR MEMBER, IEEE

Abstract-Cine film and videotape are used to record a variety of natural processes in biology, medicine, meteorology, etc. This paper describes a system which detects and tracks moving objects from these records to obtain meaningful measures of their movements, such as linear and angular velocities. Features of the system are as follows.

1) In order to detect moving objects that are usually blurred, temporal differences of gray values (differences between consecutive frames) are used to separate moving objects from stationary objects, in addition to spatial differences of gray values.

2) The results of previous frames are used to guide the feature extraction process of the next frame, so that efficient processing of moving pictures, which consist of a large number of frames, is possible.

3) Uncertain parts in the current frame, such as occluded objects, are deduced using information of previous frames.

4) Misinterpreted or unknown parts in previous frames are reanalyzed using the results of later frames where those parts could be found.

Index Terms-Change detection, feature extraction, motion analysis, moving images, occlusion, pattern recognition, picture processing, scene analysis, temporal difference.

I. INTRODUCTION

A WIDE variety of natural processes are recorded on videotape or cine film as experimental data for later analysis. Typical examples are moving pictures of living things, ranging from fish swimming in a vat to microorganisms moving on a microscope slide, taken to study their behavior under a variety of stimulus conditions. Automatic analysis of moving pictures of these real phenomena produces difficult problems as follows.

1) Since the moving objects are usually blurred, it is difficult to detect their boundaries. Thus, an effective method should be developed to extract the boundaries of moving images.

2) Since a large number of frames must be processed, efficient procedures should be developed to process the moving images.

3) Even if the shape of an object changes from frame to frame, we must be able to correspond or identify it.

4) Occlusion is an inherent problem for the analysis of moving pictures of natural processes, since objects move in different directions and at different velocities.

There have been several studies which deal with moving pictures [1], [2]. Potter [3] measured the velocity of picture points between consecutive frames and classified the points based on the velocity measurements. Jain et al. [4] measured gray-value differences between consecutive frames and separated nonstationary from stationary components. These methods will be useful as part of a motion analysis system for detecting candidates for moving objects. Chien and Jones [5] proposed a system to find and track prominent gray-value features in moving pictures. Although the method is efficient, since only prominent features are tracked, it seems difficult to apply to more complex scenes, such as ones including occlusion. Chow and Aggarwal [6] proposed a method to track moving objects by matching properties of objects between consecutive frames. The occlusion problem was studied under the assumption that the velocity and shape of each object are constant during occlusion, which seems difficult to satisfy in real-world problems. Since they work with objects showing high contrast against the background, there was little problem in detecting the boundaries of objects.

Manuscript received January 10, 1979; revised December 21, 1979. This work was supported in part by a Grant-in-Aid for Scientific Research from the Ministry of Education of the Government of Japan. The authors are with the Department of Control Engineering, Osaka University, Osaka, Japan.

In this paper we describe a system that detects and tracks moving objects from natural processes recorded on videotape or cine film. We further develop an approach which was recently proposed [7] and which seems to overcome the difficulties mentioned above. An advantage of analyzing moving pictures, compared with static pictures, is that we can use temporal or interframe information. Our key idea for solving the various problems mentioned above is to use this temporal information, and we incorporate the following techniques into the system.

1) In order to detect moving objects that are usually blurred, temporal differences of gray values (differences between consecutive frames) are used to separate moving objects from stationary objects, in addition to spatial differences of gray values.

2) The results of previous frames are used to guide the feature extraction process of the next frame, so that efficient processing of moving pictures, which consist of a large number of frames, is possible.

3) Uncertain parts in the current frame, such as occluded objects, are analyzed or deduced using information of previous frames.

4) Misinterpreted or unknown parts in previous frames are reanalyzed using the results of later frames, where those parts could be successfully found.

The system has been applied to moving pictures of fish to study the behavior of various fish under a variety of stimulus conditions.

II. HARDWARE CONFIGURATION AND INPUT SCENE

A. Hardware Configuration

Fig. 1 shows a block diagram of the hardware system. The computer used is an HP-2108, a 16-bit machine with 32K words of memory, a disk (15M bytes), a printer/plotter, a storage tube display, and three terminals under a multiuser operating system.


Fig. 1. Block diagram of the hardware system.

An external memory of 256K bytes is provided to store picture data of several consecutive frames so that processing between consecutive frames is easy.

The system can input digital pictures either from videotape or cine film. The video signal from a TV camera (or a videotape deck) is sampled and digitized to a 4-bit number by an A/D converter and stored in the picture memory. There are two modes for sampling a frame: COARSE MODE and FINE MODE. In COARSE MODE an entire frame is divided into 128 × 128 points; in FINE MODE the entire frame is divided into 256 × 256 points, but only a specified region of the frame is stored in the picture memory. When one frame has been sampled, the digitized picture stored in the picture memory is sent to and stored on the disk by a DMA channel via the computer. The frame rate can be controlled by the computer; however, the maximum frame rate is limited by the access time and transfer rate of the disk and is 15 frames/s for frames of 128 × 128 points. Any sequence of picture data stored on the disk or in the picture memory can be displayed as dynamic pictures on a color display, which was very useful for examining and evaluating the processed results.

B. Input Scene

Fish swimming in a vat are viewed from a vertical direction by an overhead TV camera and recorded on videotape. During experiments a variety of stimuli, such as light and tone, are introduced to study the behavior of the fish in response to these stimuli. Interesting sequences of data to be analyzed are marked by a tone recorded on an audio channel of the videotape deck by a biologist. The duration of each sequence is from 2 to 30 s; therefore, each sequence consists of 120-1800 frames.

Fig. 2 shows a typical input picture taken in COARSE MODE. Detection of moving objects is not easy, since they are usually blurred and there is much noise caused by shadows, waves, and the edge of the vat. Occlusion often occurs as the objects move in different directions and at different velocities. Since there are small gray-level differences between objects, it is difficult to distinguish whether they are occluding each other or not. Although objects in a neighborhood have similar gray levels, objects in different locations have different gray levels because of nonuniform lighting conditions. The objects move and rotate in different directions, and even their shapes change from frame to frame. However, it is assumed that the changes in consecutive frames are small because the pictures are taken at a fast rate.

Fig. 2. Example of input picture taken in COARSE MODE.

III. COARSE MODE ANALYSIS

A. Efficient Analysis of Moving Images

Since moving pictures contain an enormous amount of information, it takes too much computation time if whole areas of consecutive frames are processed in detail. However, it is usually not necessary to analyze all the information in the moving pictures; it is sufficient to analyze only small portions of them to extract the significant information. For example, in Fig. 2 it is not necessary to analyze the entire frame in detail; it is sufficient to analyze only a small subregion of the frame covering the moving objects. Similarly, we do not need to analyze every frame, but we do need to analyze abruptly changing frames more carefully than almost still ones. This means that the selection of regions and frames to be analyzed is very important for efficient processing of moving pictures.

Therefore, we have two levels of analysis, namely, coarse mode analysis and fine mode analysis. In the coarse mode analysis, we input pictures at low resolution and a slow frame rate, and plan which regions of which frames should be analyzed in more detail. Then, in the fine mode analysis, the selected regions of the selected frames are input at high resolution and analyzed in more detail.

B. Coarse Mode Analysis

Fig. 3 shows a block diagram of the coarse mode analysis. The data are first sampled in COARSE MODE at a slow rate, that is, 2 frames/s. Then the coarse mode analyzer finds approximate locations of the moving objects in each frame by a simple temporal-difference method, as follows.

Let us denote a sequence of pictures by $I_{i,j}(1), I_{i,j}(2), \ldots, I_{i,j}(N)$, where $N$ is the number of pictures in the sequence. Then, if we make a point-to-point subtraction between the pictures of the $k$th frame and the $(k+n)$th frame, defined as

$$D_{i,j}(k, k+n) = I_{i,j}(k+n) - I_{i,j}(k), \quad (1)$$

Fig. 3. Block diagram of the coarse mode analysis. [Take the data in COARSE MODE at a rate of 2 frames/s; compute temporal differences between consecutive frames and find approximate locations of moving objects in each frame; plan 1) the frame rate and 2) the locations to be sampled; rewind the videotape and replay for finer sampling.]

Fig. 4. Simple temporal-differentiation method.

then it will be obvious that the resulting picture $D_{i,j}(k, k+n)$ has much larger values for the moving components of the pictures than for the stationary components. A moving object produces two regions having large values in $D_{i,j}(k, k+n)$: a front region, caused by the object covering the background, and a rear region, caused by the object uncovering the background, as shown in Fig. 4. It is easy to discriminate these two regions because their values have opposite signs; in our case the rear region has positive values and the front region negative values, since the objects are darker than the background. Therefore, if we threshold the picture $D_{i,j}(k, k+n)$ as

$$D_{i,j}(k, k+n) = \begin{cases} 1 & \text{if } D_{i,j}(k, k+n) > T \\ 0 & \text{otherwise,} \end{cases} \quad (2)$$

then we can detect the rear region of the moving object at the $k$th frame. The threshold $T$ is determined from a histogram of the values in $D_{i,j}(k, k+n)$. This simple method is computationally fast and is sufficient to detect approximate locations of moving objects in the scene; therefore, it is used to locate objects in the coarse mode analysis. The parameter $n$ is set to 1 in the coarse mode analysis, since the coarse data are sampled at a slow rate (2 frames/s) and the fish move a substantial distance between consecutive frames.
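As a concrete illustration of eqs. (1) and (2), a minimal NumPy sketch follows. The function name and the percentile rule used to pick $T$ are assumptions; the paper says only that $T$ is determined from a histogram of the values of $D_{i,j}(k, k+n)$.

```python
import numpy as np

def rear_region_mask(frame_k, frame_kn, percentile=95.0):
    """Detect the rear region of moving objects between two frames.

    Implements eqs. (1)-(2): D(k, k+n) = I(k+n) - I(k), thresholded at T.
    Objects are darker than the background, so the rear region (background
    uncovered by the object) yields positive differences.
    """
    # Eq. (1): point-to-point temporal difference.
    diff = frame_kn.astype(np.int16) - frame_k.astype(np.int16)

    # The paper derives T from a histogram of the difference values;
    # a high percentile of the positive differences is one plausible choice.
    positive = diff[diff > 0]
    if positive.size == 0:
        return np.zeros(frame_k.shape, dtype=bool)
    T = np.percentile(positive, percentile)

    # Eq. (2): binarize; True marks the rear region of the moving object.
    return diff > T
```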

jects, a process of next finer sampling is planned. Let us repre-

sent the locations of moving objects at each frame by theircentroids Pi(xi(k), y1(k)), i = 1, * *ON, k = 1, * , N, whereON is a number of moving objects. Then, we first obtain thecentroid of moving objects P(x(k), y(k)) as

$$x(k) = \sum_{i=1}^{O_N} x_i(k) / O_N, \qquad y(k) = \sum_{i=1}^{O_N} y_i(k) / O_N$$

and calculate the movement of the centroid from the $k$th frame to the $(k+1)$th frame as

$$M(k, k+1) = \sqrt{(x(k) - x(k+1))^2 + (y(k) - y(k+1))^2}.$$

This $M(k, k+1)$ gives the average movement of the objects from the $k$th frame to the $(k+1)$th frame. Next, we obtain the ratio of this movement to a given constant $M_c$ as $R(k, k+1) = M(k, k+1)/M_c$. Then, in the fine mode analysis, frames are sampled at $R$ times the coarse sampling rate (that is, $2 \times R$ frames/s) between the $k$th and $(k+1)$th frames of the coarse data. The sampling regions of the selected frames are determined so that they cover the moving objects detected in the coarse mode analysis.
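The planning computation itself is little more than the two formulas above; the sketch below assumes the coarse analyzer delivers one array of object centroids per frame and that $M_c$ is a user-supplied constant, as in the text.

```python
import numpy as np

def plan_fine_sampling(centroids_per_frame, M_c, coarse_rate=2.0):
    """For each coarse interval [k, k+1], return the fine-mode frame rate.

    centroids_per_frame: list of (O_N, 2) arrays of object centroids
    (x_i(k), y_i(k)) found by the coarse mode analyzer.
    """
    rates = []
    for k in range(len(centroids_per_frame) - 1):
        # Mean centroid of all moving objects in frames k and k+1.
        p_k = centroids_per_frame[k].mean(axis=0)
        p_k1 = centroids_per_frame[k + 1].mean(axis=0)

        # Movement of the centroid, M(k, k+1).
        M = float(np.hypot(*(p_k - p_k1)))

        # Ratio R(k, k+1) = M / M_c; fine sampling runs at R times
        # the coarse rate (i.e., 2 * R frames/s) over this interval.
        R = M / M_c
        rates.append(R * coarse_rate)
    return rates
```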

C. Sampling of Fine Data

When the process of finer sampling has been determined, the videotape is rewound and replayed for finer sampling. At this time, the selected regions of the selected frames are sampled in FINE MODE. Since the average sampling rate is about 8 frames/s, each sequence usually consists of 20-250 pictures. Fig. 5 shows examples of input pictures taken in FINE MODE.

Fig. 5. Examples of input pictures taken in FINE MODE. (a) 1st frame. (b) 3rd frame. (c) 5th frame.


Fig. 6. Block diagram of the fine mode analysis system. [Detection process: finds moving objects in the first frame. Tracking process: tracks the moving objects in the subsequent frames using the results of the previous frames. Backtracking process: reanalyzes those objects that were misinterpreted in the tracking process.]

Fig. 8. Gradient picture of Fig. 5(a).

Fig. 7. Gray-level histogram of the local region enclosed by a rectangle in Fig. 5(a).

D. Fine Mode Analysis

These fine data are then analyzed by the fine mode analyzer to detect and track the moving objects in these sequences more precisely and to measure quantitative parameters, such as velocities, which are of interest to research biologists. A block diagram of the fine mode analysis system is shown in Fig. 6. The system consists of three processes: a detection process, which finds moving objects in the first frame; a tracking process, which tracks the moving objects in subsequent frames using the results of the previous frames; and a backtracking process, which reanalyzes those objects that were misinterpreted or unknown in the tracking process.

IV. DETECTION OF MOVING OBJECTS IN THE FIRST FRAME

A priori knowledge available for detecting objects is that the objects are darker than the background in their neighborhoods. A reasonable method to detect objects is therefore to scan the picture and detect regions having gray levels smaller than a certain threshold $G$. Since the gray levels of the objects and the background vary with location, the threshold $G$ should be adapted to these variations. A usual method to determine $G$ is to compute gray-level histograms of local regions and then choose $G$ as a gray level between the two prominent peaks corresponding to the object and the background [8]. However, it is difficult to apply this method to our problem, since no prominent peaks can be found in the local histogram of gray levels: the objects are usually blurred, and their shadows have intermediate gray values between those of the objects and the background, as shown in Fig. 7. The edge detection method is also difficult to apply, since it is not easy to discriminate the edges of the objects from the noise caused by shadows, waves, and the edge of the vat, as seen in Fig. 8. Since it is not easy to detect objects from the spatial information on gray values, we use temporal changes of gray values.

Fig. 9. Successive temporal-differentiation method. (a) n = 0. (b) n = 1. (c) n = 2. (d) n = 3. (e) n = 4.

A. Successive Temporal-Difference Method

Although the temporal-difference method described in Section III-B is good for locating objects, it is difficult to extract precise shapes of objects in the fine data by this method. For example, if consecutive frames ($n$ small) are compared, only a part of the object can be detected, since it will not have moved much, as shown in Fig. 9(b), while if distant frames ($n$ large) are compared, the object in $D_{i,j}(k, k+n)$ will be largely deformed by other objects, as shown in Fig. 9(d). To solve this problem, we use the following procedure.

1) Obtain the subtracted pictures $D_{i,j}(k, k+1), \ldots, D_{i,j}(k, k+N)$ between $I_{i,j}(k)$ and $I_{i,j}(k+n)$ for $n = 1, \ldots, N$, and threshold the pictures to obtain the rear regions of the objects.



Fig. 10. Result obtained by the moving object detector from the picture shown in Fig. 5(a).

Fig. 11. Method to determine the threshold of each object. [Edge points having large gradient values; boundary of object detected by temporal differentiation; search region for edges.]

2) Take the union of the obtained regions, $U_{i,j}(k, k+n)$, defined as

$$U_{i,j}(k, k+n) = U_{i,j}(k, k+n-1) \cup D_{i,j}(k, k+n).$$

The procedure is illustrated in Fig. 9. An advantage of this method is that the shape of the object is retained in the union $U(k, k+n)$ even if the object at the $k$th frame is occluded by other objects in later frames. An appropriate value for the parameter $N$ is one at which the object at the $(k+N)$th frame has separated from the object at the $k$th frame, as shown in Fig. 9(d). This is easily recognized by calculating the size of the detected region $U(k, k+n)$: the size increases with $n$ and becomes constant after the object has separated from the object at the $k$th frame. Therefore, the size of $U(k, k+n)$ is calculated for each object, and the above procedure is terminated when it becomes constant. A result obtained by this method from the input picture shown in Fig. 5(a) is shown in Fig. 10.
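The procedure might be coded as follows; the explicit bound max_N and the exact-equality stopping test are assumptions (the text says only that the procedure terminates when the size of $U(k, k+n)$ becomes constant).

```python
import numpy as np

def object_shape_by_union(frames, k, max_N, threshold_fn):
    """Accumulate U(k, k+n) = U(k, k+n-1) | D(k, k+n) until the size of the
    union stops growing, i.e., the object has separated from its position
    at frame k.

    frames: sequence of 2-D gray-value arrays; threshold_fn(a, b) returns
    the thresholded rear-region mask of eq. (2) between two frames.
    """
    union = np.zeros(frames[k].shape, dtype=bool)
    prev_size = -1
    for n in range(1, min(max_N, len(frames) - 1 - k) + 1):
        union |= threshold_fn(frames[k], frames[k + n])
        size = int(union.sum())
        if size == prev_size:   # size constant: the object has separated
            break
        prev_size = size
    return union
```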

B. Boundary Detection Guided by the Temporal Information

Since the objects detected by the temporal-difference method are deformed by shadows, as can be seen in Fig. 10, the gray values in their neighborhoods are examined to obtain more precise boundaries of the objects. Fig. 11 shows the method to determine the threshold that separates an object from the background. We first set a search region around the boundary of each detected object and then find edge points having large gradient values in this region. Since these points can be considered points on the true boundary of the object, the gray value most frequently observed at these points is taken as the threshold $G$ for finding the boundary. Then the boundary of each object is obtained by tracing those points having gray values less than the threshold $G$. Fig. 12 shows the boundary detected by this procedure from the input picture shown in Fig. 5(a). In the figure it will be noticed that the object $O_7$ is separated from $O_5$ and $O_6$, which could not be discriminated by the successive temporal-difference method.
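One plausible realization of this threshold selection, assuming SciPy's morphology and Sobel operators stand in for the paper's unspecified gradient and search-region machinery:

```python
import numpy as np
from scipy import ndimage

def object_threshold(gray, rough_mask, band=3, grad_percentile=90.0):
    """Estimate the threshold G for one object from gray values at edge points.

    gray: 2-D gray-value image; rough_mask: boolean mask of the object as
    detected by the successive temporal-difference method.
    """
    # Search region: a band around the rough boundary (dilation minus erosion).
    outer = ndimage.binary_dilation(rough_mask, iterations=band)
    inner = ndimage.binary_erosion(rough_mask, iterations=band)
    search = outer & ~inner

    # Gradient magnitude (Sobel, an assumption); keep the strongest points.
    gx = ndimage.sobel(gray.astype(float), axis=1)
    gy = ndimage.sobel(gray.astype(float), axis=0)
    grad = np.hypot(gx, gy)
    cut = np.percentile(grad[search], grad_percentile)
    edge_points = search & (grad >= cut)

    # G is the most frequently observed gray value at these edge points.
    values, counts = np.unique(gray[edge_points], return_counts=True)
    return values[np.argmax(counts)]
```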

Fig. 12. Boundaries of objects extracted by the boundary detector.

Fig. 13. Block diagram of the tracking process. [Boundary detection → matching routines → successful? If yes, update model; if no, run the analyzer. More objects? If yes, proceed to the next object; if no, process the next frame.]

Fig. 14. Prediction map generated from the model of the third frame $M_3$.

C. Model of the First Frame

When the moving objects have been detected in the first frame, a model of the first frame, $M_1(O_1, O_2, \ldots)$, is generated, where the model of each object, $M_1(O_i)$, is given the following descriptions: 1) area, 2) thinness ratio, 3) x, y coordinates of the centroid, 4) threshold, and 5) chain-coded boundary. Since there are small differences of gray levels between objects, we cannot discriminate whether an object is occluded or not. Therefore, if objects are occluded in the first frame, they are interpreted as a single object.
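A possible container for these descriptors is sketched below; the field names and the thinness-ratio formula (a standard compactness measure) are assumptions, since the paper lists the descriptors without defining them.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class ObjectModel:
    """Model M_k(O_i) of one object: the five descriptors named in the text."""
    area: float               # 1) area
    thinness_ratio: float     # 2) thinness ratio
    centroid: tuple           # 3) (x, y) coordinates of the centroid
    threshold: int            # 4) gray-level threshold G used to extract it
    boundary: list            # 5) chain-coded boundary (directions 0-7)

def thinness_ratio(area, perimeter):
    # A standard compactness measure (an assumption; the paper does not
    # define the ratio): 1.0 for a circle, smaller for elongated shapes.
    return 4.0 * np.pi * area / perimeter ** 2
```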

V. TRACKING THE OBJECTS USING THE PREVIOUS MODELS

A. Prediction from Previous Models

A block diagram of the tracking process is shown in Fig. 13. The models of previous frames, $M_1, \ldots, M_{k-1}$, are used to analyze the picture of a new frame $F_k$. First of all, a predictor generates a prediction map $P(O_1, O_2, \ldots)$ from the last model $M_{k-1}$ and the velocities estimated from the last two models. Since the velocities are not known at the second frame, the model of the first frame $M_1$ is used as the prediction map. Fig. 14 shows a prediction map for the fourth frame $F_4$ generated from the model of the third frame $M_3$. In the figure, thick and thin lines show the result of the third frame and the prediction for the fourth frame, respectively. The prediction map is first used to predict whether an object is occluded or not, and which objects are overlapping which others. If the prediction $P(O_i)$ of an object $O_i$ is intersected by the predictions of


other objects, then $O_i$ is predicted as "occluded" in the current frame. Those objects whose predictions intersect each other, such as the objects $O_5$ and $O_6$ in Fig. 14, are called overlapping objects and are denoted by $O_{c1}, O_{c2}, \ldots$.
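A rough sketch of the predictor and the overlap test, assuming objects are represented as boolean masks and velocities as pixel displacements estimated from the last two models:

```python
import numpy as np
from itertools import combinations

def prediction_map(prev_masks, velocities):
    """Predict each object's region by shifting its mask from M_{k-1}
    by the velocity estimated from the last two models."""
    preds = []
    for mask, (vx, vy) in zip(prev_masks, velocities):
        # np.roll wraps at the image border; real code would pad instead.
        shifted = np.roll(mask, (int(round(vy)), int(round(vx))), axis=(0, 1))
        preds.append(shifted)
    return preds

def overlapping_objects(preds):
    """Indices of objects whose predictions intersect: predicted 'occluded'."""
    occluded = set()
    for i, j in combinations(range(len(preds)), 2):
        if (preds[i] & preds[j]).any():
            occluded.update((i, j))
    return occluded
```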

B. Boundary Detection Guided by the Last Model

When the prediction map has been generated, each object is searched for in the picture of the current frame $F_k$. The threshold method described before is used to detect the boundary of an object $O_i$ at the predicted location $P(O_i)$. The threshold used to detect $O_i$ in the last frame $F_{k-1}$ is used to detect $O_i$ in the current frame, since the gray-level change of objects is usually small in consecutive frames. If the object $O_i$ is predicted as "occluded," then the boundary is searched for in the union of $P(O_{c1}), P(O_{c2}), \ldots$, where $O_{c1}, O_{c2}, \ldots$ are the overlapping objects defined before.

C. Comparing with the Last Model

When the boundary has been obtained, the obtained region $R$ is compared to the prediction. There are two matching routines: R1 and R2. If the object $O_i$ is predicted as "not occluded," then R1 is used. In R1, descriptions (area and thinness ratio) of the obtained region $R$ are compared to those of $M_{k-1}(O_i)$. If the relative error of each of the descriptors is less than a certain threshold (20 percent), then the obtained region $R$ is considered "matched," and the model $M_k$ is updated with the descriptions of the obtained region $R$.
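Routine R1 reduces to a relative-error test on the two descriptors; a minimal sketch, with the attribute names carried over from the model sketch above:

```python
def match_R1(region, model, tol=0.20):
    """Routine R1: 'matched' if the relative error of each descriptor
    (area and thinness ratio) is below 20 percent, as in the text.

    region and model each expose .area and .thinness_ratio attributes,
    following the ObjectModel sketch above.
    """
    for name in ("area", "thinness_ratio"):
        previous = getattr(model, name)
        observed = getattr(region, name)
        if abs(observed - previous) / abs(previous) > tol:
            return False
    return True
```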

D. Occlusion Problem

If the object is predicted as "occluded," then R2 is used to match the obtained region $R$ to the predictions of the overlapping objects $P(O_{c1}), P(O_{c2}), \ldots$. Chow and Aggarwal [6] proposed a method that matches descriptions of the obtained region $R$ to those of the union of $P(O_{c1}), P(O_{c2}), \ldots$. The disadvantages of this method are 1) that the positions and shapes of the objects are not updated from the obtained result $R$, but only estimated from their velocities and shapes before occlusion, and 2) that it must therefore be assumed that the velocity and shape of each object are constant as long as the objects are occluded.

To solve this problem, we propose a new method to match $R$ to the overlapping objects $O_{c1}, O_{c2}, \ldots$. In this method, the obtained region $R$ is first segmented into subregions $R_1, R_2, \ldots$ utilizing the predictions of the objects, and then each subregion is matched to each prediction. The region $R$ is segmented as follows.

1) An intersecting region of $R$ with $P(O_{ci})$ is colored as $R_i$ for $i = 1, 2, \ldots$, as shown in Fig. 15(a).

2) Those points in $R$ that are not colored in the above process are colored by the color that occurs most frequently in their 5 × 5 neighborhoods, as shown in Fig. 15(b).

When the region $R$ has been segmented into subregions $R_1, R_2, \ldots$, the boundary of $R_i$ is matched to the boundary of $P(O_{ci})$ to find the best matching location. In this boundary matching, only those points colored by a single color, shown by a solid line in Fig. 15(c), are used, since the boundary points colored by multiple colors are estimates derived from the predictions.

Fig. 15. Matching process of occluded objects. Solid and broken lines show the obtained region $R$ and the predicted location $P(O_{ci})$ of each object, respectively.

Fig. 16. Matching process for the objects $O_5$ and $O_6$ shown in Fig. 5(b). ,: points classified as $O_5$; #: points classified as $O_6$; $: points classified as "uncolored"; M: points classified as both $O_5$ and $O_6$.

Let us denote the boundary points of $R_i$ and $P(O_{ci})$ by $B_j$ and $B'_j$ ($j = 1, \ldots, n$), respectively. Then a matching error is measured by

$$E_i = \frac{1}{n} \sum_{j=1}^{n} a(B_j) \quad (3)$$

where $a(B_j) = 0$ if a point of $B'$ exists at the location of $B_j$, 1 if one exists in the 3 × 3 neighborhood of $B_j$, and 2 otherwise, and $n$ is the number of points of $B$. The prediction $P(O_{ci})$ is moved within a 5 × 5 neighborhood, and the best matching location is determined as the location having the least value of $E_i$.

When all of the overlapping objects have been matched and the prediction error has been corrected, the region $R$ is segmented again into subregions $R_1, R_2, \ldots$ by the previously mentioned procedure, to obtain a more precise segmentation using the updated predictions, as shown in Fig. 15(d). Each segmented region is used to update the models of these objects in $M_k$.
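The two coloring steps and the matching error of eq. (3) might look as follows in code; the handling of points claimed by several predictions is simplified relative to the paper, which keeps multiple colors for such points.

```python
import numpy as np

def color_region(R, preds):
    """Segment the obtained region R (boolean mask) into subregions.

    Step 1: points of R intersecting prediction i receive color i + 1.
    Step 2: remaining points of R take the color occurring most frequently
    in their 5 x 5 neighborhood (ties and scan order are simplifications).
    """
    colors = np.zeros(R.shape, dtype=int)
    for i, p in enumerate(preds, start=1):
        # Points claimed by several predictions keep the first color here.
        colors[R & p & (colors == 0)] = i
    for y, x in zip(*np.nonzero(R & (colors == 0))):
        window = colors[max(y - 2, 0):y + 3, max(x - 2, 0):x + 3]
        labels = window[window > 0]
        if labels.size:
            colors[y, x] = np.bincount(labels).argmax()
    return colors

def matching_error(B, B_pred):
    """Eq. (3): E_i = (1/n) * sum_j a(B_j), with a(B_j) = 0 for an exact
    coincidence with a predicted boundary point, 1 for a hit within the
    3 x 3 neighborhood, and 2 otherwise. B and B_pred are sets of (y, x)
    boundary points of R_i and P(O_ci), respectively."""
    total = 0
    for y, x in B:
        if (y, x) in B_pred:
            continue
        if any((y + dy, x + dx) in B_pred
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
            total += 1
        else:
            total += 2
    return total / len(B)
```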

Fig. 16 shows the result of the above process for the objects $O_5$ and $O_6$ in Fig. 5(b). These objects were not occluded in the previous frame but were predicted as occluded in this frame. Since the object $O_5$ moved faster than expected, many points are left uncolored when the obtained region $R$ is colored by taking the intersection of $R$ with the predictions, as shown in Fig. 16(a). These points are colored by examining the colors of their 5 × 5 neighborhoods, as shown in Fig. 16(b). Then each region is matched to each prediction, and $R$ is segmented again, as shown in Fig. 16(c).

E. Recovering from Prediction Error

If matching is unsuccessful, the analyzer first considers that the gray level of the object has changed and that the threshold should therefore be updated. The threshold is updated using the gray levels of those points having large gradient values in the neighborhood of the predicted location of the object, as described in Section IV-B.


Fig. 17. Four reasons for unsuccessful matching. Solid and broken lines show the obtained region $R$ and the predicted location of each object, respectively.

Fig. 18. Result of the fifth frame shown in Fig. 5(c). (a) Detected boundary. (b) Closeup of $O_5$, $O_6$, and $O_7$ and the segmentation result. ,: $O_5$; #: $O_6$; @: $O_7$; M: $O_5$ and $O_6$.

Then the boundary of the object is extracted with the updated threshold and matched to the prediction by the matching routines. If matching is successful, the model $M_k$ is updated. However, if it is still unsuccessful, the analyzer considers that the prediction was wrong and analyzes its cause by comparing the obtained region $R$ with the predictions. There are four cases, as follows.

1) The object that was predicted as "not occluded" is actually occluded by some objects, or the objects are occluded by objects other than those that were predicted as the overlapping objects. The analyzer detects this case by examining the intersection of the obtained region $R$ with the prediction map. If some objects other than those predicted as overlapping have a large intersection with $R$, such as $P(O_3)$ in Fig. 17(a), then the analyzer considers that this case has occurred and reanalyzes $R$ under the new prediction. For example, $O_1$ and $O_2$ in Fig. 17(a) were at first predicted as overlapping objects; however, since $P(O_3)$ has a large intersection with $R$, the analyzer considers that $O_1$, $O_2$, and $O_3$ are overlapping objects.

2) Some of the objects that were predicted as "overlapping" are actually not overlapping, such as $O_3$ in Fig. 17(b). If some objects predicted as overlapping do not have a large intersection with $R$, then the analyzer considers that this case has occurred and reanalyzes $R$ under the new prediction. For example, in Fig. 17(b), $O_1$, $O_2$, and $O_3$ are at first predicted as overlapping objects; however, since $P(O_3)$ does not have a large intersection with $R$, the analyzer considers that $O_1$ and $O_2$ are overlapping objects.

Fig. 19. Result obtained in the backtracking process for the objects $O_2$ and $O_3$ in Fig. 5(a). .: $O_2$; i: $O_3$; M: $O_2$ and $O_3$.

3) The object that was interpreted as a single object in previous models is separated into two objects in this frame. This case happens for objects that were occluded in the first frame, since that occlusion cannot be discriminated. The analyzer compares the area of the prediction with that of $R$. If the area of the prediction is much larger than that of $R$, then another object is searched for in the hatched part shown in Fig. 17(c). When another object is found, the analyzer considers that this case has occurred and generates its model in $M_k$. Since the object was misinterpreted as a single object in the previous models, those models are marked as "misinterpreted" and reanalyzed in the backtracking process.

4) Otherwise, the analyzer considers that the object cannot be found in this frame. This case occurs when an object is mostly occluded by other objects. In this case, its model is marked as "uncertain," and the prediction is given as its model for this frame.

The above process is iterated until all the objects in the prediction map are found in the picture of the current frame. Fig. 18 shows the result for the fifth frame shown in Fig. 5(c). As can be seen from this figure, the objects $O_5$, $O_6$, and $O_7$ overlap each other. However, since $P(O_7)$ does not intersect $P(O_5)$ or $P(O_6)$, $O_5$ and $O_6$ are predicted as overlapping, and $O_7$ is predicted as "nonoccluded." This prediction error is recovered while the objects $O_5$ and $O_6$ are processed, since the obtained region $R$ does not match $P(O_5)$ or $P(O_6)$. The analyzer then compares the obtained region with the predictions and recognizes that $O_5$, $O_6$, and $O_7$ overlap each other. Fig. 18(b) shows the result of segmentation of this region.
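The analyzer's four-way diagnosis could be sketched as below; the numeric cutoffs for "large intersection" and "much larger area" are assumptions, as the text leaves them unquantified.

```python
def diagnose_mismatch(R, predicted_overlaps, preds, large=0.5, factor=2.0):
    """Decide which of the four recovery cases applies (a sketch).

    R: boolean mask of the obtained region; predicted_overlaps: indices
    of objects predicted as overlapping; preds: list of predicted masks.
    """
    frac = [float((R & p).sum()) / max(int(p.sum()), 1) for p in preds]

    extra = [i for i, f in enumerate(frac)
             if f > large and i not in predicted_overlaps]
    if extra:                       # case 1: enlarge the overlap set
        return "case 1", extra

    missing = [i for i in predicted_overlaps if frac[i] <= large]
    if missing:                     # case 2: shrink the overlap set
        return "case 2", missing

    pred_area = sum(int(preds[i].sum()) for i in predicted_overlaps)
    if pred_area > factor * R.sum():
        return "case 3", []         # search for a newly separated object

    return "case 4", []             # object not found; keep the prediction
```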

VI. BACKTRACKING THE OBJECTS THAT WERE MARKED AS MISINTERPRETED OR UNCERTAIN

When all the frames have been processed, those objects marked as misinterpreted or uncertain are reanalyzed in the backtracking process, utilizing the results of later frames where those objects were successfully analyzed. The backtracking process uses the same procedure as the tracking process, except that the frames are processed in reverse order and only the objects marked as misinterpreted or uncertain are processed. For example, the objects $O_2$ and $O_3$ are overlapping in the first frame, as shown in Fig. 5(a), and thus they were interpreted as a single object. However, they are separated in the second frame; therefore, their models in $M_1$ are marked as misinterpreted in the tracking process. In the backtracking process they are reanalyzed using $M_2$ and $M_3$ and are recognized as two objects, as shown in Fig. 19.
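Since backtracking reuses the tracking machinery, only the control flow differs; a minimal sketch, assuming each model record carries the status mark described in the text:

```python
def backtrack(models, frames, track_one):
    """Backtracking pass: reprocess, in reverse frame order, only those
    objects whose models are marked 'misinterpreted' or 'uncertain'.

    models: list of per-frame dicts {object_id: ObjectModel-like record
    with a .status attribute}; track_one(later_model, frame) re-runs the
    prediction/boundary-detection/matching steps for a single object.
    """
    for k in range(len(frames) - 2, -1, -1):
        for obj_id, model in list(models[k].items()):
            if getattr(model, "status", "ok") in ("misinterpreted", "uncertain"):
                later = models[k + 1].get(obj_id)
                if later is not None:
                    # Use the successful result of the later frame as the
                    # source of the prediction, as in the tracking process.
                    models[k][obj_id] = track_one(later, frames[k])
```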


Fig. 20. Path trajectory of the centroid of each object.

Fig. 20 shows the path trajectory of the centroid of each object from the first frame to the tenth frame. When the locations of the objects are known, it is easy to obtain quantitative parameters, such as linear and angular velocities, the relative distance between fish, and so on.

VII. CONCLUSION

We have presented a system that detects and tracks moving objects and measures quantitative parameters, such as linear velocity, from records of natural processes. The system is very efficient since, by the coarse mode analysis, it can select the regions and frames to be analyzed from the enormous amount of data contained in moving pictures, and since the results of previous frames are used to restrict the search regions for objects and to predict the gray values of objects in the next frame. The computation time depends on the number, complexity, and velocities of the objects in the moving pictures. For the pictures shown in this paper, it was 0.3 s/frame for the coarse mode analysis, 11 s for the first frame, and about 1 s/frame for the other frames. Therefore, it takes about 870 s to process moving pictures lasting 100 s: 60 s for the coarse mode analysis and 810 s for the fine mode analysis.

The prediction map was used to estimate whether an object is occluded or not and, if so, which objects are overlapping each other. The predictions were correct more than 95 percent of the time, since the movements of objects are small in consecutive frames. However, wrong predictions were made when an object moved at a velocity different from the expected one in the neighborhood of other objects. Therefore, the analyzer was provided with options for examining the result obtained under a wrong prediction and for proposing a new prediction.

Those objects that are occluded in the first frame are reanalyzed using the result of a later frame where they are separated. However, if objects are occluded from the first frame to the last one, they are misinterpreted as a single object. The objects that could not be found in previous frames are also reanalyzed using the results of later frames where they are successfully found. However, if objects are missed and not found in subsequent frames, there are no provisions for discovering them in the current system. One method to discover them would be to invoke the moving object detector when some objects cannot be found for more than a certain number of frames. However, these two situations have not yet occurred in the experiments made so far on various fish under a variety of stimulus conditions. We have not applied the system to objects other than fish; however, we believe that the proposed system can be successfully applied to the analysis of a wide variety of moving objects, such as microorganisms.

ACKNOWLEDGMENT

The authors are grateful to Prof. R. Suzuki and Ms. S. Mashima of Osaka University for providing them with various kinds of experimental data on fish and for very constructive discussions.

REFERENCES

[1] W. N. Martin and J. K. Aggarwal, "Dynamic scene analysis," Comput. Graphics and Image Processing, vol. 7, pp. 356-374, 1978.

[2] H. H. Nagel, "Analysis for image sequences," in Proc. 4th Int. Joint Conf. Pattern Recognition, 1978, pp. 186-211.

[3] J. L. Potter, "Scene segmentation by velocity measurements obtained with a cross-shaped template," in Proc. 4th Int. Joint Conf. Artificial Intell., 1975, pp. 803-810.

[4] R. Jain, D. Militzer, and H. H. Nagel, "Separating non-stationary from stationary scene components in a sequence of real world TV-images," in Proc. 5th Int. Joint Conf. Artificial Intell., 1977, pp. 612-618.

[5] R. T. Chien and V. C. Jones, "Acquisition of moving objects and hand-eye coordination," in Proc. 4th Int. Joint Conf. Artificial Intell., 1975, pp. 737-741.

[6] W. K. Chow and J. K. Aggarwal, "Computer analysis of planar curvilinear moving images," IEEE Trans. Comput., vol. C-26, pp. 179-185, 1977.

[7] M. Yachida, M. Asada, and S. Tsuji, "Automatic motion analysis system of moving objects from the records of natural processes," in Proc. 4th Int. Joint Conf. Pattern Recognition, 1978, pp. 726-730.

[8] C. K. Chow and T. Kaneko, "Boundary detection of radiographic images by a threshold method," in Frontiers of Pattern Recognition, S. Watanabe, Ed. New York: Academic, 1972.

Masahiko Yachida was born in Okayama, Japan, in September 1945. He received the B.E. and M.Sc. degrees in electrical engineering and the Ph.D. degree, all from Osaka University, Osaka, Japan, in 1969, 1971, and 1976, respectively.

He joined the Intelligent Industrial Project at the Electrotechnical Laboratory of the Japanese Government from 1969 to 1970, and was a Research Associate at the Coordinated Science Laboratory, University of Illinois, Urbana, from 1973 to 1974. Since 1971 he has been a Research Fellow of the Faculty of Engineering Science, Osaka University. His interests are in the fields of artificial intelligence and picture processing.

Minoru Asada was born in Shiga, Japan, on October 1, 1953. He received the B.E. and M.Sc. degrees in control engineering from Osaka University, Osaka, Japan, in 1977 and 1979, respectively.

He is currently enrolled in the Ph.D. course at Osaka University. He is interested in the field of pattern recognition, particularly moving image analysis.

Saburo Tsuji (SM'78) was born in Okayama, Japan, on January 27, 1932. He received the B.E. and M.Sc. degrees in electrical engineering from Osaka University, Osaka, Japan, in 1953 and 1955, respectively, and the Ph.D. degree in 1965 from the same university.

He joined the staff of the Electrotechnical Laboratory of the Japanese Government in 1955, where he became involved in research on hybrid computer systems and optimal control theory. After receiving the Ph.D. degree, he was appointed for a year as a Post Doctorate Fellow in the Department of Industrial Engineering, University of Toronto. Since 1966 he has been working on bionics at the Electrotechnical Laboratory, where he became Chief of the Bionics Section and began the Intelligent Industrial Project in 1968. At present, he is a Professor of Control Engineering, Faculty of Engineering Science, Osaka University, doing research on machine intelligence and its labor-saving applications.

