
A Directional-Edge-Based Real-Time Object Tracking System Employing Multiple Candidate-Location Generation

Pushe Zhao, Hongbo Zhu, He Li, and Tadashi Shibata, Member, IEEE

Abstract—We present a directional-edge-based object tracking system based on a field-programmable gate array (FPGA) that can process 640 × 480 resolution video sequences and provide the location of a predefined object in real time. Inspired by a biological principle, directional edge information is used to represent the object features. Multiple candidate regeneration, a statistical method, has been developed to realize the tracking function, and online learning is adopted to enhance the tracking performance. Thanks to the hardware-implementation friendliness of the algorithm, an object tracking system has been built very efficiently on an FPGA to realize a real-time tracking capability. At a working frequency of 60 MHz, the main processing circuit can complete the processing of one frame of an image (640 × 480 pixels) in 0.1 ms in high-speed mode and 0.8 ms in high-accuracy mode. The experimental results demonstrate that this system can deal with various complex situations, including scene illumination changes, object deformation, and partial occlusion. Based on the system built on the FPGA, we discuss the issue of very large-scale integrated chip implementation of the algorithm and self-initialization of the system, i.e., the autonomous localization of the tracking object in the initial frame. Some potential solutions to the problems of multiple object tracking and full occlusion are also presented.

Index Terms—Directional edge feature, field-programmable gate array (FPGA) implementation, multiple candidate regeneration, object tracking, online learning, particle filter, real time.

    I. Introduction

OBJECT tracking plays an important role in many applications, such as video surveillance, human–computer interfaces, vehicle navigation, and robot control. It is generally defined as the problem of estimating the position of an object over a sequence of images. In practical applications, however, there are many factors that make the problem complex, such as illumination variation, appearance change, shape deformation, partial occlusion, and camera motion. Moreover, many of these applications require a real-time response. Therefore, the development of real-time working algorithms is of essential importance. In order to accomplish such a challenging task, a number of tracking algorithms [1]–[6] and real-time working systems [7]–[12] have been developed in recent years.

These algorithms usually improve the performance of the object tracking task in two major aspects, i.e., the target object representation and the location prediction. In location prediction, the particle filter [13] shows a superior tracking ability and has been used in a number of applications. It is a powerful method to localize the target, and it can achieve high-precision results in complex situations. Some works have proposed improvements based on the particle filter framework for better tracking abilities in very challenging tasks [6]. Despite the better performance of these algorithms with more complex structures, they suffer from a high computational cost that prevents their implementations from working in real time.

Implementations using dedicated processors tend to result in power-hungry systems [10], [14]. Many implementations parallelize the time-consuming parts of algorithms, thus increasing the processing speed to achieve real-time performance [15]–[17]. These solutions depend heavily on the nature of the algorithms, and the performance enhancement is limited if the algorithms are not designed for efficient hardware implementation. Specific implementations can also be employed to speed up a certain part of an algorithm, such as feature extraction [18] or localization [19]. In this case, it is necessary to consider how to integrate them into the total system most efficiently. Several problems may arise when building parallel systems, such as the transmission of large amounts of data.

In this paper, we have explored a solution to the object tracking task that considers an efficient implementation as the first priority. A hardware-friendly tracking framework has been established and implemented on a field-programmable gate array (FPGA), thus verifying its compatibility with very large-scale integration (VLSI) technology. Several problems that limit hardware performance, such as complex computation, data transmission, and the cost of hardware resources, have been resolved. The proposed architecture has achieved 150 frames per second (f/s) on the FPGA, and if it is implemented on VLSI with an on-chip image sensor, it could achieve a frame rate as high as 900 f/s.

Since our solution provides high flexibility in its configuration, it can be integrated as a subsystem into many other, more complex intelligent systems. Because its real-time performance is much faster than the video rate, it offers many opportunities for building real-time, highly intelligent systems.

In tracking algorithms, how to represent the target image is of particular importance because it greatly influences the tracking performance under a given tracking framework. Color, edge, and texture are typical attributes used for representing objects [20], [21]. A number of other features, including active contours [11], scale-invariant feature transform (SIFT) features [22], oriented energy [5], and optical flow [23], are also used in many works. Some works also combine these features or incorporate online learning of the model of an object and its background [2], [4], [21], [24]. In our research, we aim to establish both the robustness of the object representation and the real-time performance of the processing, because feature extraction is usually a time-consuming process.

It is well known that animals have an excellent ability in visual tracking, but the biological mechanism has not yet been clarified. However, it has been revealed that the visual perception of animals relies heavily on directional edges [25]. In this paper, therefore, the directional-edge-based image feature representation algorithm developed in [26] is employed to represent the object image. The robust performance of directional-edge-based algorithms has already been demonstrated in various image recognition applications. In addition, dedicated VLSI chips for efficient directional edge detection and image vector generation have also been developed for object recognition systems [27], [28].

The purpose of this paper is to develop a real-time object tracking system that is robust against disturbing situations like illumination variation, object shape deformation, and partial occlusion of target images. By employing the directional-edge-based feature vector representation, the system has been made robust against illumination variation and small variations in object shape. In order to achieve real-time performance in tracking, a VLSI hardware-implementation-friendly algorithm has been developed. It employs a statistical approach, in which multiple candidate locations are generated during tracking. The basic idea was inherited from the particle filter, but the algorithm has been greatly modified and simplified from the original particle filter so that it can be implemented in VLSI hardware very efficiently. The algorithm was first proposed in [29], and its performance was verified only by simulation. In this paper, however, the algorithm was actually implemented on an FPGA, and its real-time performance and robust nature have been demonstrated by measurement of the working system. In order to further enhance the robustness of the tracking ability, an online learning technique has been introduced to the system. When the target object changes its appearance beyond a certain range, the system autonomously learns the altered shape as one of its variations and continues its tracking. As a result, the system has also shown robust performance for large variations in shape and for partial occlusion. The system was implemented on a Terasic DE3 FPGA board. At an operating frequency of 60 MHz, the experimental system achieved a processing ability of 0.8 ms/frame in tracking a 64 × 64 object image in 640 × 480-pixel video sequences.

Fig. 1. Process of MCR.

Object tracking is still a challenging task for real-world applications due to the different requirements of complex situations. Based on the tracking system developed in this paper, we also propose solutions to some important tracking problems that were not included in the algorithm of [29]. We have designed a flexible architecture for multiple target tracking, using only a limited number of parallel processing elements. A new image scanning scheme has been explored to realize automated initialization of the tracking system instead of manual initialization. In this scheme, the image of the tracking target is autonomously localized in the initial frame of an image sequence. The same scheme has also been used to solve a group of similar problems — full occlusion, target disappearance from the scene, and accidental loss of the target image — while requiring only a few additional logic functions in the circuitry.

This paper is organized as follows. The directional edge features and the tracking algorithm are explained in Section II. The implementation of this tracking algorithm on hardware is described in Section III. Experiments and a performance comparison are presented in Section IV. Advanced architectures for more difficult situations are discussed in Section V. Finally, conclusions are drawn in Section VI.

    II. Algorithm

The most essential part of this algorithm is a recursive process called multiple candidate regeneration (MCR), which is similar to the prediction and update in the particle filter. The task of object tracking in a moving image sequence is defined as making a prediction for the most probable location of the target image in every consecutive frame. The iteration process is shown in Fig. 1.


Fig. 2. Simplified four-candidate example illustrating weight computation and candidate regeneration.

At the very beginning of the tracking (the initialization stage), the target image is specified manually by enclosing an image region with a square window, and the center coordinates (x, y) of the window are defined as the image location. The target image enclosed in the window serves as a template in the following tracking process. At the same time, a fixed number of candidate locations are generated as possible locations to search in the next frame. In the initialization, these candidate locations are placed uniformly around the target image location so that their average location coincides with the target location.

In the second frame, the similarity between the target image and the local image at each candidate location is calculated, and a weight is assigned to each location based on the similarity: the larger the similarity, the larger the assigned weight. Then, new candidate locations are regenerated reflecting the weight (similarity) at each location. Namely, a larger number of new candidate locations are generated where the weight is large. The average of the newly generated candidate locations yields the new target location in the second frame. The process continues iteratively for each coming frame.

Fig. 2 illustrates the procedure of weight computation and regeneration of new candidate locations using a simple example with only four candidates. In the previous frame, shown at the top, there are four candidate locations represented by black dots around the target image of a smiling face. In the present frame below, the target moves to the right and comes closer to location 3. The dotted-line squares indicate the local images at the candidate locations. The images at the candidate locations are matched with the template image, and the weights are calculated according to their similarities, which are shown as solid black circles below. A larger similarity corresponds to a larger weight, represented by a larger solid circle. Then the same number (four) of new candidate locations are generated in the regeneration process, following the rule that a candidate with a higher weight regenerates more new candidates around its location. The old candidates are discarded after regeneration so that the total number of candidates stays constant. As shown at the bottom, four new candidates are generated, and the average of their locations yields the most probable location of the target in the present frame.

The MCR inherits the basic philosophy of the particle filter. However, the algorithm has been greatly simplified so that it can be implemented most efficiently in VLSI hardware. In particular, its application is focused only on object tracking. Thanks to the high frame-rate processing capability of VLSI chips, the target object under pursuit does not move much between consecutive frames, and therefore the search area can be restricted to a small range. As a result, building a very efficient object tracking system has been made possible.

In the following, the entire algorithm is explained in detail, including the representation of the object image, weight generation, candidate regeneration, and the online learning function. They are all designed specifically for easy and efficient hardware implementation.

A. Algorithm Structure

Fig. 3 shows the structure of the algorithm. The principal component is the MCR block. The algorithm starts with the initialization block at the beginning, which sets up all necessary parameters, including the candidate locations and the target template. The candidate container and the template container are two memory blocks that store the candidate locations and the feature vectors of the templates, respectively. Initialization is carried out with the first frame image, where the target for pursuit is identified by enclosing the image with a square window, as shown at the top right. This is done manually. The points in the tracking window represent the locations of candidates. These points are distributed uniformly in the tracking window in the initialization step and stored in the candidate container. A feature vector of the target is generated from the image in the tracking window and stored in the template container. Throughout the algorithm, we use a reduced representation of local images, and the procedure of feature vector representation is explained later in this section.

There are two loops in this algorithm, loops A and B, as shown in Fig. 3. In loop A, the output of MCR is sent back to the candidate container as input to the next iteration. The MCR keeps updating the candidate distribution whenever a new frame arrives. One example is shown at the bottom right in Fig. 3, in which the points are candidate locations and the square is located at the center of gravity of all the candidates at the present time. This yields the most probable location of the target in the present frame. Loop B represents the process of learning feedback. The online learning block generates new templates during the tracking process and stores them in the template container. This process is explained in detail at the end of this section.

In summary, the algorithm starts from the initialization block using the first frame, then processes each new incoming frame and outputs the target location continuously until there is no more input.
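To make this control flow concrete, the following Python sketch gives our reading of the structure in Fig. 3; it is illustrative only, not the authors' implementation. The four callables are stand-ins for the blocks detailed in the rest of this section, and feature(frame, location) is assumed to crop the tracking window at that location and return its feature vector.

```python
# Structural sketch of Fig. 3 (illustrative, not the authors' code).
# Loop A: regenerated candidates are written back to the candidate container.
# Loop B: newly learned templates are written back to the template container.

def track(frames, init_candidates, init_template, feature, weigh, regenerate, learn):
    templates = [init_template]                   # template container
    candidates = list(init_candidates)            # candidate container
    for frame in frames:                          # frames after the initialization frame
        weights = [weigh(feature(frame, c), templates) for c in candidates]
        candidates = regenerate(candidates, weights)           # loop A
        cx = sum(x for x, y in candidates) // len(candidates)  # center of gravity of
        cy = sum(y for x, y in candidates) // len(candidates)  # all candidates
        learn(feature(frame, (cx, cy)), templates)             # loop B
        yield (cx, cy)                            # most probable target location
```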

B. Object Representation

As explained in Section II-A, in order to calculate the weight of each candidate, we need to evaluate the similarity between the candidate image and the template image. This is done by calculating the distance between the two feature vectors representing the two images. Therefore, employing a suitable feature representation algorithm is very important. We employed the directional-edge-based image representation algorithm [30]–[32], which was inspired by the biological principle found in the animal visual system [25]. This method needs only the grayscale information of an image as input, and the output is a 64-D feature vector. It consists of three successive steps: local feature extraction (LFE), global feature extraction (GFE), and averaged principal-edge distribution (APED) [30]. Fig. 4 shows the function of each step.

Fig. 3. (a) Main structure of the present object tracking algorithm. (b) Examples of candidate point distributions in the initial frame and in a new frame.

Fig. 4. Feature extraction from a 64 × 64-pixel grayscale image and conversion to a 64-D feature vector [30].

1) Local Feature Extraction: The function of LFE is to extract the edge and its orientation at each pixel location in an image. For every pixel location, the convolutions of a 5 × 5 pixel region with four directional filtering kernels (horizontal, +45°, vertical, −45°) are calculated, as shown in Fig. 5. Then, the absolute values of these four convolution results are compared, and the maximum value and its corresponding orientation are stored as the gradient and edge orientation at this pixel location, respectively.

Fig. 5. Process of directional edge detection using 5 × 5-pixel filtering kernels.

2) Global Feature Extraction: The gradient map produced in the previous step contains an edge orientation at every pixel site. In this step, only the significant edges are kept by applying a threshold to the gradient map. All the gradient data are sorted to find the pixels whose gradient values are larger than the rest, and the pixel locations with these larger gradients are marked as edges in four directional edge maps. The number of edges to be kept is specified as a percentage of the total pixel number.

3) Averaged Principal-Edge Distribution: Although the information has been compressed by extracting edges in LFE and GFE, the amount of information is still massive. Therefore, a method called APED [30] is employed to reduce the four edge maps to a 64-D vector. In the APED vector representation, each edge map is divided into 16 square bins, and the number of edge flags in each bin is summed up, constituting one element of the vector. The 64-D feature vector is the final output of the feature extraction processing and is used throughout the entire algorithm as the representation of local images, including candidate images as well as template images.
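As a concrete illustration of the three steps, the sketch below computes a 64-D APED vector in Python. The 5 × 5 kernel coefficients are simple difference operators standing in for the exact kernels of Fig. 5 (which are not reproduced in this text), and the edge ratio is an assumed parameter; only the structure — max-response orientation, top-percentage edge selection, and 4 × 16-bin histograms — follows the description above.

```python
import numpy as np

# Illustrative LFE -> GFE -> APED pipeline (kernel coefficients are stand-ins).
def directional_kernels():
    h = np.zeros((5, 5)); h[1, :], h[3, :] = 1.0, -1.0   # horizontal stand-in
    v = h.T.copy()                                       # vertical stand-in
    d = np.eye(5)[::-1] - np.eye(5)                      # +45 / -45 stand-ins
    return [h, d, v, -d]

def extract_feature(img, edge_ratio=0.1):
    img = np.asarray(img, dtype=float)                   # 68 x 68 grayscale patch
    ks = directional_kernels()
    n = img.shape[0] - 4                                 # 64 when the patch is 68 x 68
    resp = np.zeros((4, n, n))
    for i in range(n):                                   # LFE: 5 x 5 convolutions
        for j in range(n):
            patch = img[i:i + 5, j:j + 5]
            resp[:, i, j] = [abs((patch * k).sum()) for k in ks]
    orient = resp.argmax(axis=0)                         # orientation of max response
    mag = resp.max(axis=0)                               # gradient magnitude
    thr = np.sort(mag, axis=None)[-int(edge_ratio * mag.size)]  # GFE: keep top P%
    vec = []
    for d in range(4):                                   # APED: 16 bins per edge map
        emap = (orient == d) & (mag >= thr)
        for bi in range(4):
            for bj in range(4):
                vec.append(int(emap[bi*16:(bi+1)*16, bj*16:(bj+1)*16].sum()))
    return vec                                           # 64-D feature vector
```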

C. Weight Computation and Candidate Regeneration

Since the basic principle has already been explained, how to implement it is described here. In order to make all computations easily and efficiently implementable in VLSI hardware, each mathematical operation was replaced by a hardware-implementation-friendly analogue, different from those in the regular particle filter algorithm.

The local image taken at each candidate location is converted to a feature vector, and the Manhattan distances to the template vectors are calculated. In this algorithm, there is more than one template in the template container to represent the target. The first template is generated at the initialization step, while the others are generated during the online learning process. Therefore, the minimum Manhattan distance is used to determine the weight of a candidate, as follows:

$$MD_{i,j} = \sum_{k=1}^{n} \left| V_{Ci}[k] - V_{Tj}[k] \right| \qquad (1)$$

$$D_i = \min(MD_{i,1}, MD_{i,2}, \ldots, MD_{i,n}) \qquad (2)$$

$$W_i = \begin{cases} 0, & D_i \geq C \\ \mathrm{INT}\left[N_0\,(1 - D_i/C)\right], & D_i < C. \end{cases} \qquad (3)$$

Here, MD_{i,j} stands for the Manhattan distance between candidate i and template j, and V_{Ci}[k] and V_{Tj}[k] denote the kth elements of the candidate vector V_{Ci} and the template vector V_{Tj}, respectively. D_i is the minimum distance of candidate i over all the templates, and W_i represents the weight of candidate i. N_0 is a constant determining the scale of the weight. In (3), C is a threshold defining the scale of the weight values, which is determined by experiments, and INT means taking the integer part of the value. In this manner, all candidates that have at least one Manhattan distance smaller than the threshold C are preserved to regenerate new candidates in the next frame. At the same time, larger weight values are assigned to candidates with smaller distances.
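Equations (1)–(3) translate directly into a few lines of Python. The sketch below is our paraphrase; the threshold C here is an assumed value, since the paper only states that it is determined by experiment (N_0 = 15 is the value used in the FPGA system, as noted in Section III).

```python
# Weight computation of (1)-(3) for one candidate (C = 1000 is an assumed value).
def manhattan(vc, vt):
    # Eq. (1): Manhattan distance between candidate and template vectors.
    return sum(abs(a - b) for a, b in zip(vc, vt))

def weigh(vc, templates, n0=15, c=1000):
    d = min(manhattan(vc, vt) for vt in templates)       # Eq. (2)
    # Eq. (3): zero weight at or beyond C, integer weight up to N0 below it.
    return 0 if d >= c else int(n0 * (1 - d / c))
```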

In the third step, new candidates are regenerated as described below. First, the maximum weight value W_max is found, and it is used as the threshold number N_th for new candidate regeneration. Note that N_th = W_max (≤ N_0) is an integer. At each old candidate location whose weight equals W_max, a new candidate is generated in its vicinity. Then the threshold number is decreased by one, giving N_th = W_max − 1. All weight values are compared again with the new threshold, and at each old candidate location whose weight is greater than or equal to N_th, one more new candidate is generated in its vicinity. Then N_th is decreased by one again (N_th = W_max − 2). The process is repeated until the total number of new candidates reaches a constant number N. After obtaining the N new candidate locations, the old candidates are all discarded.
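The countdown procedure can be sketched as follows. The eight regeneration directions and the weight-dependent step sizes (1, 2, or 4 pixels) follow the hardware description in Section III-B, while the random direction choice stands in for the hardware's 3-bit direction counter.

```python
import random

# Sketch of candidate regeneration by threshold countdown (illustrative).
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def step(w):
    return 1 if w > 12 else 2 if w > 8 else 4            # small step for good matches

def regenerate(candidates, weights, n=64):
    new = []
    nth = max(weights)                                   # start at Wmax
    while len(new) < n and nth > 0:
        for (x, y), w in zip(candidates, weights):
            if w >= nth and len(new) < n:                # one new candidate per pass
                dx, dy = random.choice(DIRS)
                new.append((x + dx * step(w), y + dy * step(w)))
        nth -= 1                                         # lower the threshold, repeat
    return new  # old candidates are discarded (empty if every weight is zero)
```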

Fig. 6. Object tracking system implementing the algorithm developed in this paper.

D. Online Learning

In many practical applications, the target of interest is a nonrigid object, which may change its appearance and size. In addition, sufficient knowledge about the target is, in general, not available before tracking. This causes tracking failure if the algorithm does not flexibly learn the appearance change of the target. An online learning method is introduced in this paper to solve this problem. The learning process begins after the estimation of the target location. One feature vector is generated from the image at the target location in the present frame. Then the Manhattan distances between this feature vector and all the templates are calculated, and the minimum distance is found. If the minimum distance is larger than a certain threshold, it is interpreted as the target having changed its appearance substantially, and the feature vector is stored as a new template in the template container.
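A minimal sketch of this learning rule, reusing the manhattan() helper from the earlier sketch; the learning threshold is an assumed value, as the paper does not state it.

```python
# Online learning: store the current appearance as a new template when it
# differs substantially from every stored template (threshold is assumed).
def learn(feature_vec, templates, threshold=800):
    d = min(manhattan(feature_vec, t) for t in templates)
    if d > threshold:
        templates.append(list(feature_vec))              # learn altered appearance
```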

    III. Implementation

This tracking system has been implemented on a Terasic DE3 FPGA board that uses an Altera Stratix III chip. A Terasic TRDB-D5M camera is used as the image input device, and a Terasic DE2 FPGA board is used for saving and displaying the tracking result. A photo of this system is shown in Fig. 6. The following sections explain each part of the system and give an evaluation of the processing time.

A. Feature Extraction

The feature extraction stage is implemented in three serially connected functional blocks: LFE, GFE, and vectorization. In this system, the image transmission from the camera to the FPGA board is serial, one pixel per clock cycle. Therefore, at this stage, we built the feature extraction block to work as a pipeline for efficient computation. The whole system has eight such units working in parallel. The implementation of each part is explained in the following paragraphs, and a VLSI implementation for much faster processing is discussed later in Section V.

The structure of the LFE block is shown in Fig. 7. There are four 68-stage shift registers, serially connected, and the output of each shift register is input to the respective row of a 5 × 5 register array. Each stage shifts 8-bit pixel data. The shift register stores the minimum amount of image data necessary for computation. The 5 × 5 register array works as a buffer between the shift register and the logic block. The combinational logic block handles all the logic processing needed to calculate the gradient and orientation in two clock cycles, including the convolutions with the four 5 × 5 kernels, taking their absolute values, and storing the largest value. The intensity values of an image are sent into the first row of the shift register and, at the same time, into the top row of the register array pixel by pixel. The four rows of data in the shift register are shifted into the corresponding rows of the 5 × 5 register array. In this manner, the 5 × 5-pixel filtering kernel block scans the entire image pixel by pixel and generates a directional gradient map. Because gradient values centered on the peripheral two rows and two columns are not calculated, a 64 × 64 gradient map is produced from a 68 × 68 image in 4626 clock cycles (two more cycles for processing the last value).

Fig. 7. Implementation of LFE block.

Fig. 8. Implementation of GFE block.

The following GFE block, as explained in the algorithm section, must implement the sorting function. Since we employed a hardware-friendly sorting algorithm, the processing time is related only to the bit length of the data. This algorithm is briefly explained in the following; the details can be found in [27].

Suppose that we need to pick out the K largest data from a group of data. The sorting starts from the most significant bits (MSBs) of the data. Before sorting, all the data are assigned a mark of "UNKNOWN." First, according to the value of the MSB (1 or 0), the data are divided into two groups: the first group has all the data with 1 as the MSB, while the second group has all the data with 0 as the MSB. Then the system counts how many data are in the first group. If the number is less than K, it is certain that all the data in the first group belong to the K largest, and those data are marked "YES." If the number is greater than K, all the data in the second group can be discarded as not belonging to the K largest and are marked "NO." The remaining data stay marked "UNKNOWN."

In the second step, a similar computation is repeated on the second bit from the MSB. The unknown data are divided into two groups again, but the count in this step also includes the data already marked "YES." By repeating this procedure, all of the K largest data will be marked "YES" after processing all bits of the data. This is a parallel sorting method, which in theory can be completed in several clock cycles. The difficulty in implementation is that we need an adder that sums up single bits coming from all the data. In this tracking system, there are 4096 data in total to process in GFE. It is not easy to implement a 4096-input adder connected to 4096 15-bit registers. Therefore, we made a tradeoff between speed and complexity, dividing the 4096 data into 64 groups. The implementation of this part is shown in Fig. 8.
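The selection procedure can be expressed compactly in software. The sketch below is a sequential rendering of the bit-serial method (the FPGA version evaluates one bit of all data per pass using the FLAG/MARK registers described next); it is our paraphrase, not code from [27].

```python
# Bit-serial K-largest selection (sequential sketch of the hardware method).
def k_largest_marks(data, k, bits=15):
    mark = ["UNKNOWN"] * len(data)
    for b in range(bits - 1, -1, -1):                    # MSB down to LSB
        ones = [i for i, v in enumerate(data)
                if mark[i] == "UNKNOWN" and (v >> b) & 1]
        total = sum(m == "YES" for m in mark) + len(ones)
        if total <= k:
            for i in ones:                               # whole '1' group is in the top K
                mark[i] = "YES"
        else:
            for i, v in enumerate(data):                 # '0' group cannot be in the top K
                if mark[i] == "UNKNOWN" and not (v >> b) & 1:
                    mark[i] = "NO"
    return mark
```

For example, with data [5, 3, 2, 7] and k = 2, the values 7 and 5 end up marked "YES" and 2 is marked "NO"; 3 is left "UNKNOWN," since only the '0' group can be discarded in a pass, and data remaining "UNKNOWN" at the end are those never confirmed to be among the K largest.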

The 64 groups of data are processed in parallel and in a pipelined way. A FLAG and a MARK are used to represent the state of each datum: the FLAG indicates whether the decision has already been made or is still "UNKNOWN," while the MARK tells whether the datum is marked "YES" or "NO." The 64 groups of data and the default values of FLAG and MARK are all stored in their respective shift registers. Each shift register stores 64 data and has one output feeding back to its input. At the beginning, the shift register shifts data for 64 clock cycles, and a 64-input adder with an accumulator sums up the MSBs of all the data. In the next loop of 64 clock cycles, the FLAG and MARK are modified according to the summation result, following the rules explained above. At the same time, the next bit of all the data is summed up, to be used in the next loop. The calculation time for GFE is 1024 clock cycles.

Fig. 9. Implementation of MCR block composed of weight generation block (left) and candidate regeneration block (right).

The output of GFE is a binary map that contains the edge information. In the following step, this edge information is compressed into the feature vector representation explained in the algorithm section. We use shift registers and accumulators to realize this function in a standard way and do not describe it in detail here.

In computer vision, SIFT [33] is an effective algorithm for detecting and describing local features. From the viewpoint of hardware implementation, we compared APED with SIFT to illustrate the performance. In VLSI and FPGA implementations [18], [34], the time for computing one SIFT feature has been reduced to about 3300 clock cycles. To describe a subimage, at least three features are necessary, and more features are needed to describe a scene. In this paper, the feature extraction method takes about 5600 cycles to generate a global description of a candidate, and in total 64 candidates are needed. Since the processing unit is not complex, it is also convenient to realize parallel processing.

B. Multiple Candidate Regeneration

The next several blocks of the system, including weight generation and new location estimation, are explained in this section. Fig. 9 shows the hardware structure of the weight generation block and the candidate regeneration block. A shift register is used to store the templates. Each time this block receives a feature vector from the feature extraction block, it sends a start signal to the template container, and the template container shifts out all the templates to the weight generation block. Then, the Manhattan distances between the feature vector and all the templates are calculated one by one, and the minimum value is retained for calculating the weight. Finally, the weight is sent to the candidate regeneration block.

In the candidate regeneration block, a shift register is used to store the candidate locations; the number of candidates is 64 in our system. The candidate regeneration block first collects the weights of all 64 candidates and then counts down from the largest weight value, which is set to 15 [N0 = 15 in (3)] in this system. A new candidate is generated whenever a candidate has a weight value greater than the counter value. As shown in Fig. 9, there are eight directions from which the new candidate location can be chosen randomly. This avoids the problem that all the candidates might be generated at the same location. A 3-bit counter is used to represent the direction for regeneration. While new candidates are generated, the counter is incremented every clock cycle. Because the process of deciding whether to generate a new candidate is irregular, the directions read from the counter behave like random values. For the distance between the old and the new candidates, the system uses a small distance variance when the weight is large, because such candidates reflect the target location better. A large distance variance is assigned when the weight is small, in order to produce a wide distribution that can cover the area for detecting the target. For example, weights larger than 12, larger than 8, and less than 8 correspond to distances of 1 pixel, 2 pixels, and 4 pixels, respectively. After the regeneration of new candidates, the new locations are stored in the candidate container. The center of gravity of all the new candidate locations is sent to both the display block and the online learning block as the prediction of the new target location. These blocks together typically need 620 clock cycles, with a maximum of 1024 clock cycles.

C. Online Learning

After receiving the estimate of the new target location from the candidate regeneration block, the online learning block extracts the feature vector from the image at the new target location. This feature vector is compared with all the templates using the Manhattan distance to find the minimum distance. If the minimum distance is greater than a certain threshold, the feature vector is stored in the template container as a new template. For starting the search for new target locations, only a limited region of the input image, four times larger than the tracking window, is stored, to save memory resources.

Fig. 10. Hardware organization of this tracking system. After receiving the image data from the camera, this system first allocates the data into the corresponding memories. Then eight parallel candidate processing blocks process these data in parallel and output the weight of every candidate. These weight values are used to generate the new candidate locations and the target location. The online learning block updates the templates according to the tracking result in each iteration.

TABLE I
FPGA Resource Utilization Summary

Block                    Combinational ALUTs   Memory ALUTs   Dedicated Logic Registers   Time (Clock Cycles)
Edge map generator       2253                  1264           2175                        4626
Vector generator         553                   144            629                         64
Weight generator         541                   144            419                         64
Candidate regeneration   7272                  0              5073                        1024
Online learning          5980                  1660           8435                        6802
Total (entire system)    75 830 (28%)          22 504 (17%)   60 906 (23%)

D. Overall Structure

Fig. 10 illustrates the overall hardware organization of this system, and Table I summarizes the FPGA resource utilization of the main processing blocks together with their processing times. After receiving the image data from the camera, the system first allocates the data into the corresponding memories. Then eight parallel candidate processing blocks process these data in parallel and output the weight of each candidate. These weight values are used to generate the new candidate locations and the target location. At the same time, the online learning block updates the templates according to the tracking result in each iteration. In this system, we set up 64 candidates in total for tracking. Considering the resource limitations of the FPGA board, we divided the 64 candidates into eight groups: the eight candidates in each group are processed in parallel, and the eight groups are processed serially. In the experiments, when tracking with only eight candidates in total, the system still shows tracking ability, but with some degradation in performance. Therefore, this system can be operated in different modes to balance tracking speed against accuracy. In the high-speed mode, the system handles a smaller number of candidates for a higher speed search, while in the high-accuracy mode, the system takes more time and handles a larger number of candidates. At a working frequency of 60 MHz, the typical processing time for one frame (640 × 480 pixels) is 0.1 ms in the high-speed mode and 0.8 ms in the high-accuracy mode. Such a flexible configuration provides an opportunity to realize a multiple-target tracking function with a fixed number of processing elements, which is discussed in Section V.


Fig. 11. Diagram illustrating data transfer in the tracking system (processing of one candidate).

Data transfer is one of the most important issues in a video processing system. Fig. 11 illustrates the data bandwidth of the connections between the functional blocks and the memories. Only the processing of one candidate is shown, but all types of connections in the system are included. It can be observed that after the image data are transformed into a vector, the quantity of data for computation becomes very small; such vectors are convenient to transfer and store. Some intensive connections can be found in the GFE part, which uses row-parallel processing to reduce computational time. We considered the balance between parallelism and hardware resources, as discussed in detail for the GFE implementation. In summary, the data-handling strategy in this system has two aspects: first, the large amount of image data is transformed into feature vectors in an efficient way; second, we have balanced parallelism against resource consumption and confined massive data transfers to local regions.

    IV. Experiments

We evaluated the tracking system using a group of challenging video sequences and demonstrated its real-time performance. For the evaluation of accuracy, we performed the experiments through software simulation; the program was written such that every logic operation is the same as in the FPGA implementation. For the experiments on the real system, the output of the system was displayed on a monitor screen in real time and recorded by a video camera. The results shown in the figures are images extracted from that video. In all experiments, the parameters, such as the threshold and the number of candidates, were fixed.

A. Evaluation on Accuracy

In this section, we show the evaluation results of the proposed system using several challenging video sequences from a public database. For comparison, we adopted the evaluation methodology proposed in [35] and compared our system with the tracking systems in that work. Although this evaluation was made through software simulation, we programmed it in such a way that every logic operation in the program is the same as in the FPGA implementation.

In Fig. 12, tracking results on these video sequences are shown. The features of these videos are as follows: the Sylvester and David Indoor sequences present challenging lighting, scale, and pose changes; the Cola Can sequence contains a specular object, which adds difficulty; the Tiger sequences exhibit many challenges, including frequent occlusions and fast motion (which causes motion blur); and the Coupon Book clip illustrates a problem that arises when the tracker relies too heavily on the first frame.

TABLE II
Comparisons: Precision at a Fixed Threshold of 20

Sequence          OAB    SemiBoost   Frag   MILTrack   This Work
Sylvester         0.64   0.69        0.86   0.90       0.83
David Indoor     0.16   0.46        0.45   0.52       0.88
Cola Can          0.45   0.78        0.14   0.55       0.93
Occluded Face     0.22   0.97        0.95   0.43       0.12
Occluded Face 2   0.61   0.60        0.44   0.60       0.44
Surfer            0.51   0.96        0.28   0.93       0.60
Tiger 1           0.48   0.44        0.28   0.81       0.37
Tiger 2           0.51   0.30        0.22   0.83       0.50
Coupon Book       0.67   0.37        0.41   0.69       0.40

Results show the percentage of successful predictions over the total number of images in a video sequence.

In Table II, we show the tracking precision at a fixed threshold of 20. The threshold is a distance in pixels: if the distance between the predicted location and the ground truth in an image is larger than the threshold, the prediction for that image is considered failed. The data in Table II show the percentage of successful predictions over the total number of images in a video sequence. Detailed information about this measurement method can be found in [35]. Table III shows an evaluation of the average location error for the same algorithms, including one more tracking algorithm, DMLTrack [36].
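For reference, the two metrics of Tables II and III can be computed as follows; this is a straightforward rendering of the methodology in [35], not code from that work.

```python
import math

# Precision at a pixel threshold (Table II) and average center location
# error (Table III), given per-frame predicted and ground-truth centers.
def precision_at(preds, truths, threshold=20):
    hits = sum(math.dist(p, t) <= threshold for p, t in zip(preds, truths))
    return hits / len(preds)                             # fraction of successful frames

def average_center_error(preds, truths):
    return sum(math.dist(p, t) for p, t in zip(preds, truths)) / len(preds)
```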

In the experiments, this tracking system showed the ability to deal with illumination change, size change, deformation, and partial occlusion. In situations of severe partial occlusion or full occlusion, the system has limitations. This is mainly because the edge feature vector we employed is a global representation, which is sensitive to severe occlusion.

B. Tracking on FPGA System

We set up the parameters of this system by optimizing them through preliminary experiments and did not change any parameter during the experiments.


Fig. 12. Tracking results from software simulation. (a) Sylvester. (b) David Indoor. (c) Cola Can. (d) Occluded Face. (e) Occluded Face 2. (f) Tiger 1. (g) Tiger 2. (h) Surfer. (i) Coupon Book.

Fig. 13 shows the results of an experiment in which a cup was moved around continuously in a complex environment. There is a sudden illumination change on the object, produced by a spotlight coming from the right. The background also provides a disturbing brightness condition: there is light from the left side of the window, while the right side is covered by a window blind. The target changes its appearance and size while moving. According to the results, the system gave a stable trace of the target in this complex situation. In this experiment, we turned off the online learning function and stored several templates of the cup at different angles and sizes before tracking. Since the size of the tracking window is fixed in this system, we stored parts of the target as templates when the target size was larger than the window size. The total number of templates was eight.

Fig. 14 shows the online learning process of the system. When the hand changed to several different gestures, the system detected the changes and stored the new gestures as templates. After the system had learned a sufficient number of templates, it could track the target while it moved around and changed its appearance continuously, as shown in Fig. 15.

Fig. 16 shows a situation where partial occlusion occurs. The target goes behind an obstacle (a chair), and a part of the target is lost from the scene. The system learns the object image partially lost by occlusion as a new template and keeps on tracking the target successfully.


Fig. 13. Experiment showing tracking of a cup with illumination change and deformation. In this case, the templates are set up before tracking, including appearances of the cup at different angles and sizes. The online learning function is turned off in this case.

    Fig. 14. Online learning process. The tracking system stores new templates when the target changes its appearance.

Fig. 15. Experiment showing the tracking ability with a sufficient number of templates obtained by online learning. The system can continuously track the object while it moves and deforms.

    Fig. 16. Experiment showing the tracking ability when the target is partially occluded.


Fig. 17. Experiment on two-target tracking. In this experiment, each target had a template container, a candidate container, and 32 processing elements. The locations of the targets were initialized separately in the first frame. The result shows that the system can track two different objects well, without using additional memory or processing elements.

TABLE III
Comparisons: Average Center Location Errors (Pixels)

Sequence          SemiBoost   Frag    MILTrack   DMLTrack   This Work
Cola Can          13.09       63.44   20.13      12.84      12.42
Coupon Book       66.59       55.88   14.74      5.68       63.53
Sylvester         15.84       11.12   10.82      9.79       13.16
Tiger 2           61.20       36.64   17.85      31.39      23.80
David Indoor      38.87       46.27   23.12      8.82       12.95
Occluded Face     6.98        6.34    27.23      19.29      46.98
Occluded Face 2   22.86       45.19   20.19      4.97       29.03

In Table IV, we compare the performance of the proposed system with three other implementations [11], [10], [14]. All three other systems use the particle filter as the localization method but use different feature representation algorithms. These studies claimed real-time performance, but considering the calculation time for one frame, our method is much faster than the first two systems and is 40 times faster than the third system, with a tracking window 16 times larger. In addition, we also show the frame processing ability, which is, in fact, limited by the camera and the transmission between the camera and the processing elements. This common problem can be solved by implementing the image sensor and the processing elements on the same VLSI chip, which is discussed in the VLSI implementation section. We set the camera to work at 25 f/s. Because the system works faster than the camera, the same frame is processed repeatedly (six times per frame in this system) as if it were a new frame, until a truly new frame is captured by the camera. Since new candidates are regenerated for the same frame in each iteration, the tracking result remains stable even when the object moves faster than the movement step of the candidates. For this reason, we claim that the processing ability of the present system is 150 f/s. The implementation of this tracking system on VLSI chips for improved performance is discussed in the following section.

    V. Discussion

A. VLSI Implementation

From an analysis of the computational time of this system, it was found that most of the time is consumed in waiting for image data input and in the feature extraction computation. This is because we cannot process the image information from the camera efficiently, due to the data transmission limitation from the camera to the FPGA. This problem can be resolved if the algorithm is implemented directly on VLSI chips. If this algorithm is implemented with a high-performance image sensor, the performance will improve greatly. In fact, a VLSI processor has been developed for object recognition that is composed of an image sensor and a feature extraction block based on the same algorithm employed in this system [27]. For a 68 × 68 image, that processor is capable of reading image data directly from the on-chip image sensor array and calculating the intensity gradients in a row-parallel way. The GFE part can be finished in only 11 clock cycles, whereas it needs 960 cycles in this system. Therefore, a nearly six-times decrease in computational time and a higher frame rate can be expected by integrating the tracking system developed in this paper directly on the chip of [27].

B. Multiple Target Tracking

Multiple target tracking is in great demand in certain applications. The human brain acquired such an essential ability through evolution. Although the multiple target tracking mechanism in the human brain is not yet known, Pylyshyn and Storm's research [37] proposed a widely accepted theory that is well supported by experiments. Their data showed that participants can successfully track a subset of up to five targets from a set of ten, and that both accuracy and reaction times decline with an increasing number of targets. One possible interpretation of their findings is that targets are tracked by a strictly parallel preattentive process with limited resources.

In hardware systems, the problem of limited processing resources always exists, especially for real-time applications. In our system, this problem is resolved by flexibly allocating the processing elements to multiple targets. Based on the implemented tracking system, we verified that this mechanism really works in a two-target experiment. In the experiment, there were two targets, each with its own templates. The 64 candidates were allocated to the two targets, and the tracking process was the same as in single-target tracking. The experimental result in Fig. 17 shows that the system still tracks the targets successfully in multitarget tracking with limited resources. In tracking applications with different numbers of targets and hardware resources, the mechanism of the proposed system can provide flexible and highly efficient solutions.

C. System Initialization

The algorithm adopted in this tracking system is based on a regeneration mechanism; therefore, the initial target location must be specified manually. In this section, we explain how the initialization problem can also be resolved by employing the MCR mechanism developed in this paper. The merit of this solution is that it does not need any additional resources except for some simple logic elements.

TABLE IV
Comparisons of Three Object Tracking Implementations

                     [11]                    [10]                [8], [14]         This Work
Feature              Local-oriented energy   Haar-like feature   Color             Directional edge
Localization         Particle filter         Particle filter     Particle filter   MCR
Processing time      32.77 ms                4 ms                —                 0.1 ms
Frames per second    30                      30                  30                150 (25*)
Tracking window      Variable                15 × 15             —                 64 × 64
Image resolution     640 × 480               320 × 240           256 × 240         640 × 480
Implementation       FPGA                    Cell/B.E.           SIMD processor    FPGA

*Limited to 25 f/s by image capturing and transmission to the FPGA. All other processing operates at 150 f/s. See text.

Fig. 18. Process of searching for two targets in an image based on software simulation. Images in the first row show the candidate distribution in each iteration; the location of one object is detected, as shown in the rightmost image. Images in the second row show the candidate distribution after the suppressing feedback is applied to the original image. All candidates are initialized to the default locations again and converge on the location of the second object after eight iterations.

In our previous experiments, initialization was done by setting up the target location manually before tracking starts. However, different applications have other requirements. For example, in some cases the system already possesses some templates of the target and starts tracking when the target appears in the scene. The system must then first search for the target using the templates and, when it is found, use this location as the initial location. We focus on this situation here and propose a solution to the initialization problem in the following.

    First, the image is divided into half-overlapped subregions having the same size as the tracking window. Second, all such subregions are treated as candidates in the MCR. Similar to the tracking mechanism, all the candidates accumulate at the target location after several iterations. Finally, the location determined by the candidates is stored as the initial location. In the multitarget situation, a target that has already been found is suppressed by applying feedback to the image; namely, the target found in the first round is blanked by masking. Then, the searching operation explained above is repeated to find the second target. In this iteration, the candidates naturally accumulate at the second target location. An example of finding the initial locations of two black clips in an image is shown in Fig. 18.
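    A minimal software sketch of this MCR-based search follows, under stated assumptions: `match_score` is a hypothetical stand-in for the directional-edge template matching, while the window size (64 × 64), candidate count (64), and iteration count (8, cf. Fig. 18) follow the implemented system.

```python
# Sketch only: MCR-based target search for self-initialization.
import numpy as np

def initial_candidates(img_h, img_w, win=64):
    """Half-overlapped subregions of the tracking-window size."""
    step = win // 2
    return [(y, x) for y in range(0, img_h - win + 1, step)
                   for x in range(0, img_w - win + 1, step)]

def search_target(frame, template, match_score, win=64, iters=8, keep=64):
    """Let the candidates accumulate at the target location."""
    h, w = frame.shape[:2]
    cands = initial_candidates(h, w, win)
    best = cands[0]
    for _ in range(iters):
        cands.sort(key=lambda c: match_score(frame, c, template), reverse=True)
        best = cands[0]
        # MCR step: regenerate the pool near the top-scoring candidates by
        # small random perturbations, clamped to the image boundary.
        cands = [(min(max(y + dy, 0), h - win), min(max(x + dx, 0), w - win))
                 for (y, x) in cands[:keep // 4]
                 for dy, dx in np.random.randint(-8, 9, size=(4, 2))]
    return best

def search_two_targets(frame, template, match_score, win=64):
    """Find the first target, blank (mask) it, then search for the second."""
    first = search_target(frame, template, match_score, win)
    masked = frame.copy()
    masked[first[0]:first[0] + win, first[1]:first[1] + win] = 0  # suppress
    second = search_target(masked, template, match_score, win)
    return first, second
```

    The masking step in `search_two_targets` corresponds to the suppressing feedback of Fig. 18: once the first target is blanked, the regenerated candidates can only accumulate at the second target.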

    For a searching task, the simplest approach is to perform template matching at every location in the image. This exhaustive searching strategy needs a large amount of computation, especially for a large image. In this regard, the computational complexity has been reduced significantly by the proposed solution because only the candidate images are processed during the search.
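    As a rough, illustrative count (assuming the 64 × 64 tracking window, a half-overlap step of 32 pixels, and the 64-candidate pool of the implemented system): exhaustive template matching on a 640 × 480 image evaluates one comparison per window position, i.e., (640 − 64 + 1) × (480 − 64 + 1) ≈ 2.4 × 10^5 comparisons, whereas the MCR-based search evaluates only the 19 × 14 = 266 half-overlapped subregions in the first iteration and 64 candidates in each subsequent one, i.e., well under 10^3 comparisons in total for the eight iterations of Fig. 18.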

    D. Full Occlusion

    Full occlusion is a very challenging problem that may cause the tracker to lose the target, because the tracking algorithm relies on the previous target location as important information. The same searching mechanism explained for initialization using MCR can also be used to solve this problem. When a certain condition is met during tracking (for instance, the difference between the current target image and the templates exceeds a certain threshold), the tracking system enters the searching mode and keeps searching for the target as it does during initialization. When the system finds the target, it returns to the tracking mode. In this way, a target is found and tracked in real time even after disappearing from the scene for some time.
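    The mode switch can be summarized by the following sketch; `distance`, `track_step`, and `search_step` are hypothetical stand-ins for the template difference, one MCR tracking iteration, and the initialization-style search, and OCCLUSION_THRESHOLD is an assumed tuning parameter, not a value from the paper.

```python
# Sketch only: tracking/searching mode switch for full occlusion.
OCCLUSION_THRESHOLD = 0.5  # assumed value; tuned per application

def process_frame(state, frame, template, distance, track_step, search_step):
    if state["mode"] == "tracking":
        loc = track_step(frame, state["loc"], template)
        if distance(frame, loc, template) > OCCLUSION_THRESHOLD:
            state["mode"] = "searching"    # target probably fully occluded
        else:
            state["loc"] = loc             # normal tracking update
    else:  # searching mode: ignore the stale previous location
        loc, found = search_step(frame, template)
        if found:
            state["mode"] = "tracking"     # target reappeared; resume tracking
            state["loc"] = loc
    return state
```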

    VI. Conclusion

    In this paper, we proposed a real-time object tracking system based on the multiple candidate-location generation mechanism. The system employs directional-edge-based image features and an online learning algorithm for robust tracking performance. Since the algorithm is hardware friendly, we designed and implemented the real-time system on an FPGA, which processes a 640 × 480 resolution image in about 0.1 ms. The system achieved a 150-f/s frame rate on the FPGA and could reach about 900 f/s if implemented as a VLSI chip with an on-chip image sensor. Evaluations of the tracking system in terms of both accuracy and speed were shown and discussed, clarifying the features of this system. This paper also presented a detailed discussion of several issues in tracking, including VLSI chip implementation for faster operation, multiple-target tracking, the initialization problem, and the full-occlusion problem. The solutions presented in the discussion are based on our hardware system and are thus directly applicable to real-time applications.

    References

    [1] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Comput. Surveys, vol. 38, no. 4, pp. 1–45, 2006.

    [2] H. Wang, D. Suter, K. Schindler, and C. Shen, "Adaptive object tracking based on an effective appearance filter," IEEE Trans. Patt. Anal. Mach. Intell., vol. 29, no. 9, pp. 1661–1667, Sep. 2007.

    [3] B. Han, Y. Zhu, D. Comaniciu, and L. Davis, "Visual tracking by continuous density propagation in sequential Bayesian filtering framework," IEEE Trans. Patt. Anal. Mach. Intell., vol. 31, no. 5, pp. 919–930, May 2009.

    [4] Y.-J. Yeh and C.-T. Hsu, "Online selection of tracking features using AdaBoost," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 3, pp. 442–446, Mar. 2009.

    [5] Q. Chen, Q.-S. Sun, P. A. Heng, and D.-S. Xia, "Two-stage object tracking method based on kernel and active contour," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 4, pp. 605–609, Apr. 2010.

    [6] Z. Khan, I. Gu, and A. Backhouse, "Robust visual object tracking using multi-mode anisotropic mean shift and particle filters," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 1, pp. 74–87, Jan. 2011.

    [7] J. U. Cho, S. H. Jin, X. D. Pham, J. W. Jeon, J. E. Byun, and H. Kang, "A real-time object tracking system using a particle filter," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2006, pp. 2822–2827.

    [8] H. Medeiros, J. Park, and A. Kak, "A parallel color-based particle filter for object tracking," in Proc. IEEE Comput. Soc. Conf. CVPRW, Jun. 2008, pp. 1–8.

    [9] Z. Kim, "Real time object tracking based on dynamic feature grouping with background subtraction," in Proc. IEEE Conf. CVPR, Jun. 2008, pp. 1–8.

    [10] T. Ishiguro and R. Miyamoto, "An efficient prediction scheme for pedestrian tracking with cascade particle filter and its implementation on Cell/B.E.," in Proc. Int. Symp. ISPACS, Jan. 2009, pp. 29–32.

    [11] E. Norouznezhad, A. Bigdeli, A. Postula, and B. Lovell, "Robust object tracking using local oriented energy features and its hardware/software implementation," in Proc. 11th Int. Conf. Contr. Automat. Robot. Vision (ICARCV), Dec. 2010, pp. 2060–2066.

    [12] S.-A. Li, C.-C. Hsu, W.-L. Lin, and J.-P. Wang, "Hardware/software co-design of particle filter and its application in object tracking," in Proc. ICSSE, Jun. 2011, pp. 87–91.

    [13] A. Doucet, "On sequential simulation-based methods for Bayesian filtering," Dept. Eng., Univ. Cambridge, Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.310, 1998.

    [14] H. Medeiros, X. Gao, R. Kleihorst, J. Park, and A. C. Kak, "A parallel implementation of the color-based particle filter for object tracking," in Proc. ACM SenSys Workshop Applicat. Syst. Algorithms Image Sensing (ImageSense), 2008.

    [15] D. Cherng, S. Yang, C. Shen, and Y. Lu, "Real time color based particle filtering for object tracking with dual cache architecture," in Proc. 8th IEEE Int. Conf. AVSS, Aug.–Sep. 2011, pp. 148–153.

    [16] X. Lu, D. Ren, and S. Yu, "FPGA-based real-time object tracking for mobile robot," in Proc. ICALIP, 2010, pp. 1657–1662.

    [17] S. Liu, A. Papakonstantinou, H. Wang, and D. Chen, "Real-time object tracking system on FPGAs," in Proc. SAAHPC, 2011, pp. 1–7.

    [18] Y.-M. Lin, C.-H. Yeh, S.-H. Yen, C.-H. Ma, P.-Y. Chen, and C.-C. Kuo, "Efficient VLSI design for SIFT feature description," in Proc. ISNE, 2010, pp. 48–51.

    [19] H. A. Abd El-Halym, I. I. Mahmoud, and S. E.-D. Habib, "Proposed hardware architectures of particle filter for object tracking," EURASIP J. Adv. Signal Process., vol. 2012, no. 1, p. 17, 2012.

    [20] P. Li, "An adaptive binning color model for mean shift tracking," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 9, pp. 1293–1299, Sep. 2008.

    [21] J. Wang and Y. Yagi, "Integrating color and shape-texture features for adaptive real-time object tracking," IEEE Trans. Image Process., vol. 17, no. 2, pp. 235–240, Feb. 2008.

    [22] S. Fazli, H. Pour, and H. Bouzari, "Particle filter based object tracking with sift and color feature," in Proc. 2nd ICMV, Dec. 2009, pp. 89–93.

    [23] S. Avidan, "Support vector tracking," IEEE Trans. Patt. Anal. Mach. Intell., vol. 26, no. 8, pp. 1064–1072, Aug. 2004.

    [24] S. Avidan, "Ensemble tracking," IEEE Trans. Patt. Anal. Mach. Intell., vol. 29, no. 2, pp. 261–271, Feb. 2007.

    [25] D. Hubel and T. Wiesel, "Receptive fields of single neurones in the cat's striate cortex," J. Physiol., vol. 148, no. 3, pp. 574–591, 1959.

    [26] T. Shibata, M. Yagi, and M. Adachi, "Soft-computing integrated circuits for intelligent information processing," in Proc. Int. Conf. Inform. Fusion, vol. 1, 1999, pp. 648–656.

    [27] H. Zhu and T. Shibata, "A real-time image recognition system using a global directional-edge-feature extraction VLSI processor," in Proc. ESSCIRC, Sep. 2009, pp. 248–251.

    [28] N. Takahashi, K. Fujita, and T. Shibata, "A pixel-parallel self-similitude processing for multiple-resolution edge-filtering analog image sensors," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 11, pp. 2384–2392, Nov. 2009.

    [29] H. Zhu, P. Zhao, and T. Shibata, "Directional-edge-based object tracking employing on-line learning and regeneration of multiple candidate locations," in Proc. IEEE ISCAS, Jun. 2010, pp. 2630–2633.

    [30] Y. Suzuki and T. Shibata, "Multiple-clue face detection algorithm using edge-based feature vectors," in Proc. IEEE ICASSP, vol. 5, May 2004, pp. 737–740.

    [31] A. Nakada, T. Shibata, M. Konda, T. Morimoto, and T. Ohmi, "A fully parallel vector-quantization processor for real-time motion-picture compression," IEEE J. Solid-State Circuits, vol. 34, no. 6, pp. 822–830, Jun. 1999.

    [32] M. Yagi and T. Shibata, "An image representation algorithm compatible with neural-associative-processor-based hardware recognition systems," IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1144–1161, Sep. 2003.

    [33] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.

    [34] F.-C. Huang, S.-Y. Huang, J.-W. Ker, and Y.-C. Chen, "High-performance SIFT hardware accelerator for real-time image feature extraction," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 340–351, Mar. 2012.

    [35] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Trans. Patt. Anal. Mach. Intell., vol. 33, no. 8, pp. 1619–1632, Aug. 2011.

    [36] G. Tsagkatakis and A. Savakis, "Online distance metric learning for object tracking," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 12, pp. 1810–1821, Dec. 2011.

    [37] Z. Pylyshyn and R. Storm, "Tracking multiple independent targets: Evidence for a parallel tracking mechanism," Spatial Vision, vol. 3, no. 3, pp. 179–197, 1988.


    Pushe Zhao received the B.Eng. degree in information science and electronic engineering from Zhejiang University, Zhejiang, China, in 2004, and the M.Eng. degree in electronic engineering from the Nanjing Electronic Device Institute, Nanjing, China, in 2007. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering and Information Systems, University of Tokyo, Tokyo, Japan.

    From 2007 to 2009, he was with the Nanjing Electronic Device Institute, involved in the fabrication of silicon power devices. His current research interests include image and video processing, computer vision, and real-time intelligence systems.

    Hongbo Zhu received the B.Eng. degree from the Department of Information Science and Electronic Engineering, Zhejiang University, Zhejiang, China, in 2004, the M.Eng. degree from the Division of Electrical, Electronic, and Information Engineering, Osaka University, Osaka, Japan, in 2007, and the Ph.D. degree from the Department of Electronic Engineering, University of Tokyo, Tokyo, Japan, in 2010.

    From 2010 to 2011, he was with the Department of Embedded Systems Research, Central Research Laboratory, Hitachi, Ltd., Tokyo. In 2011, he joined the VLSI Design and Education Center, University of Tokyo, as an Assistant Professor. His current research interests include complementary metal-oxide-semiconductor vision sensors and intelligent image-processing theories, circuits, and systems.

    He Li received the B.Eng. degree in electrical engineering from the University of Tokyo, Tokyo, Japan, in 2011, where he is currently pursuing the Master's degree with the Graduate School of Information Science and Technology.

    His current research interests include computer vision and distributed processing.

    Tadashi Shibata (M'79) was born in Japan in 1948. He received the B.S. degree in electrical engineering and the M.S. degree in material science from Osaka University, Osaka, Japan, and the Ph.D. degree from the University of Tokyo, Tokyo, Japan, in 1971, 1973, and 1984, respectively.

    From 1974 to 1986, he was with Toshiba Corporation, Tokyo, where he was a VLSI process and device engineer involved in the development of microprocessors, dynamic random access memories, and electrically erasable programmable read-only memories. From 1978 to 1980, he was a Visiting Research Associate with Stanford Electronics Laboratories, Stanford University, Stanford, CA, where he studied laser beam processing of electronic materials, including silicide, polysilicon, and superconducting materials. From 1986 to 1997, he was an Associate Professor with Tohoku University, Sendai, Japan, where he was involved in research on low-temperature processing and ultraclean technologies for very large-scale integration fabrication. Since 1997, he has been a Professor with the Department of Electrical Engineering and Information Systems, University of Tokyo. After the invention of the neuron metal-oxide-semiconductor transistor in 1989, his research interest shifted from devices and materials to circuits and systems. His current research interests include developing human-like intelligent computing systems based on state-of-the-art silicon technology and biologically and psychologically inspired models of the brain.

    Dr. Shibata is a member of the Japan Society of Applied Physics and the Institute of Electronics, Information, and Communication Engineers.


