
Fast Background Subtraction Based on a Multilayer Codebook Model for Moving Object Detection

Jing-Ming Guo, Senior Member, IEEE, Chih-Hsien Hsia, Member, IEEE, Yun-Fu Liu, Student Member, IEEE, Min-Hsiung Shih, Cheng-Hsin Chang, and Jing-Yu Wu

Abstract—Moving object detection is an important and fundamental step for intelligent video surveillance systems because it provides a focus of attention for post-processing. A multilayer codebook-based background subtraction (MCBS) model is proposed for video sequences to detect moving objects. Combining the multilayer block-based strategy and the adaptive feature extraction from blocks of various sizes, the proposed method can remove most of the nonstationary (dynamic) background and significantly increase the processing efficiency. Moreover, the pixel-based classification is adopted for refining the results from the block-based background subtraction, which can further classify pixels as foreground, shadows, and highlights. As a result, the proposed scheme can provide high precision and an efficient processing speed to meet the requirements of real-time moving object detection.

Index Terms—Background subtraction, codebook model, foreground detection, hierarchical structure, shadow removal.

I. Introduction

BACKGROUND subtraction is an essential issue in visual surveillance and can extract moving objects for further analysis. However, a difficult issue in background subtraction is that the background is usually nonstationary, such as a waving tree or changing lights. Moreover, when moving objects are involved in a scene, there might be some shadows cast or changes in the lighting, which could result in incorrect detections. To solve this problem, many previous studies have proposed corresponding pixel classification algorithms to classify the pixels as shadow, highlight, or foreground. Cucchiara et al. [1] proposed a hue-saturation-value color model to handle the shadow; this method defined shadows by the luminance and saturation values and used a predefined parameter for the hue variation. In [2] and [3], a red, green, and blue (RGB) color model was proposed to

Manuscript received September 21, 2012; revised January 8, 2013 and March 26, 2013; accepted March 26, 2013. Date of publication June 17, 2013; date of current version September 28, 2013. This work was supported by the National Science Council, R.O.C., under contract NSC 100-2221-E-011-103-MY3. This paper was recommended by Associate Editor L. Onural.

J.-M. Guo, Y.-F. Liu, M.-H. Shih, C.-H. Chang, and J.-Y. Wu are with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

C.-H. Hsia is with the Department of Electrical Engineering, Chinese Culture University, Taipei 11114, Taiwan (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2013.2269011

remove the shadow; however, one problem with this model was that there were too many parameters in the color model. All of these methods show good performance in managing the shadow issue; however, some disadvantages are apparent, such as the lack of a nonstationary background removal capability. Currently, the mainstream techniques of background subtraction can be roughly separated into three groups: the mixture of Gaussians (MoG), kernel density estimation (KDE), and the codebook (CB). Among these, the MoG attracts the most attention.

Stauffer and Grimson [4] used multiple Gaussian distributions to construct a background model for each pixel, and this method can achieve good performance through a learning procedure that builds statistical models. However, there are some disadvantages; for example, it cannot detect and remove shadows. Martel-Brisson and Zaccarin [5] proposed a pixel-based statistical algorithm for detecting moving shadows of nonuniform objects. However, its high computational complexity results in a longer learning time. Hu and Su [6] proposed a Gaussian distribution-based RGB color model; the cone-shaped color model can classify pixels as shadows and highlights, but the processing time is too long for practical implementation. Xue et al. [7] proposed a phase-based background modeling approach that combines Gabor wavelet transforms to handle illumination changes. Kim et al. [8] proposed a real-time method that uses a CB; this method gathers the background pixel values to construct the background model and compresses the pixel information to increase the processing speed. Wu and Peng [9] proposed a spatial-temporal CB model that includes the concept of a spatial relationship between the pixels and uses the Markov random field (MRF) to address background subtraction. However, the applied MRF leads to a low processing speed. In [10], the Kohonen network concept and self-organizing maps [11] were utilized to construct the background model, which can adapt in a self-organizing manner. Guo et al. [12] adopted the concept of block-based CBs to construct four different background models. However, the fixed-size block-based process loses adaptation flexibility and thus causes more false detections. Barnich and Van Droogenbroeck [13] proposed a unique model that uses a random concept in the color space along with an updating method. This approach provides good detection performance and a high processing speed. However, it is a challenge for a software platform to meet real-time performance at a higher resolution. In [14], a high frame


rate implementation based on the MoG is presented using general-purpose computing on graphics processing units, which can provide very high-speed processing performance for server systems. Compared to the MoG, the proposed complexity-reduced hierarchical method for foreground detection using the CB model can achieve a good classification performance without substantial hardware resources.

In recent years, the resolutions of digital cameras and video recorders have increased over time, e.g., to standard definition (SD); however, the complexity of the former foreground detection methods is still too high to handle high-resolution scenarios. To address this concern, this paper proposes a new multilayer, adaptive block-based background subtraction method and a pixel-based refinement procedure that uses the rather robust mean feature to cater to the CB, to yield high processing efficiency and detection precision (Pr) simultaneously. Herein, all of the parameters in the multilayer codebook-based background subtraction (MCBS) are fixed, and the parameters in this paper are unified to yield general results.

This paper is organized as follows. The feature extraction for blocks of various sizes is first introduced in Section II, in which the multilayer CB construction algorithm is also presented. Section III describes the combination of the proposed multilayer background subtraction and the CB model. Section IV describes the details of the fake foreground removal model (FFRM). Section V demonstrates the extensive experimental results and performance comparisons to prove the reliability of the proposed method. Finally, the conclusions are drawn in Section VI.

II. Multilayer Background Model Construction

With the block-based CB background model, the hierarchical method for foreground detection using the CB model [termed the hierarchical codebook (HCB) [12]] was employed to solve the dynamic background and to improve the processing speed by adopting high-mean and low-mean values of blocks; however, many additional issues were introduced. For example, fixed-size block-based processing that uses an identical threshold could result in false detection in the background subtraction. Fig. 1(a) shows the original frame #366 of the video CAMPUS [18], and Figs. 1(b)–(d) are the results of the HCB using different block sizes. Among these, although a greater block size of 16 × 16 is good for handling the dynamic background issue, some false subtractions (false negatives) can be observed over the vehicle (left) and the pedestrian (right). For the smaller block of size 4 × 4, the foreground is well detected, but the subtracted background is rather poor for the dynamic background, which is why the tradeoff block size 8 × 8 is adopted in the HCB [12]. However, in some sequences, a block of size 8 × 8 is not sufficiently large, which leads to noise, as is the case with a block of size 4 × 4 [Fig. 1(d)]. Conversely, for some other sequences, the block of size 8 × 8 is not sufficiently small, which leads to false subtractions, as is the case with a block of size 16 × 16 [Fig. 1(b)]. To solve these issues, a multilayer block-based background model is proposed with three adaptive block-based layers for coarse detection and a pixel layer (block of size 1 × 1) for further refinement.

Fig. 1. Background subtraction results of the HCB [12] using various block sizes with test sequence CAMPUS. (a) Original image. (b) 16 × 16. (c) 8 × 8. (d) 4 × 4.

Fig. 2. Conceptual flowchart of the proposed scheme.

With this strategy, the reliability of the system against the dynamic background problem is improved, and the integrity of the foreground is well preserved. In the experiments, the proposed algorithm shows better performance in terms of various evaluation indices using the mean value of a block (as defined below) instead of the high and low means used in the HCB.

Fig. 2 shows the conceptual flowchart of the proposed method, in which the right vertical axis denotes the time index (t). The flowchart can be separated into two parts: the first half (1 ≤ t ≤ T), on the top, is for training the background model using four CBs, as introduced below, and the other half (t > T) is for background subtraction using the hierarchical conceptual algorithm. The FFRM, which is adopted for adapting the background, is in the bottom-right corner. Moreover, as shown on the left of this figure, an illumination change procedure is also proposed to overcome changes in the lighting conditions. In this section, the first part (1 ≤ t ≤ T), known as the background construction, is introduced.

A. Feature Extraction

Feature extraction is a very crucial factor in foreground detection because it can have a very large impact on the results. Although the HCB [12] used block-based features and achieved good performance, the adopted features still have room for further improvement. First, each frame F^t = {X^t_{m,n} | 1 ≤ m ≤ I, 1 ≤ n ≤ J} of size I × J of a sequence is separated into multiple nonoverlapping blocks of size M × M (in different time slots, each frame of size I × J is divided into I/M × J/M blocks, and each block is recorded into a CB individually), where X^t_{m,n} = {x^e_{m,n} | e = R, G, B} denotes a color pixel in the RGB color domain, and M denotes the covered region (block size) of a CB. Subsequently, each block is processed independently. In the HCB, the concept of block truncation coding (BTC) was used, which entailed dividing an image into nonoverlapping blocks; each block was simply represented by four regional means, namely the high-top mean, high-bottom mean, low-top mean, and low-bottom mean [thus, in total, 12 (three color channels × four means) feature values], which were used to construct the block-based CBs. However, these four means induce additional computational complexity; moreover, they are easily interfered with by environmental factors such as lighting conditions.

According to the extensive experiments, one mean value of a block, as defined below, is employed in this paper to replace the roles of the former four means in the HCB. As a result, an extremely low computational complexity can be obtained without noticeably degrading the description capability:

μ^{(M,e)} = (1 / (M × M)) Σ_{m=1}^{M} Σ_{n=1}^{M} x^e_{m,n}    (1)

where e = R, G, and B represents the three color channels. Thus, only one mean value is used to describe a block in a specific color channel. In addition, each block can be represented by B^M = {μ^{(M,e)} | e = R, G, B}, where M = 1, 4, 8, and 16; notably, when M = 1, the corresponding B^1 is equivalent to {x^e_{m,n} | e = R, G, B}.
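To make (1) concrete, the short NumPy sketch below computes the per-channel block means for one frame; the function name, array layout, and use of NumPy are our illustrative assumptions, not part of the paper.

    import numpy as np

    def block_means(frame, M):
        # Per-channel means mu^(M,e) of nonoverlapping M x M blocks, cf. (1).
        # frame: H x W x 3 array whose height and width are multiples of M.
        H, W, C = frame.shape
        blocks = frame.reshape(H // M, M, W // M, M, C).astype(np.float64)
        return blocks.mean(axis=(1, 3))  # shape (H/M, W/M, 3)

The same routine serves every block layer (M = 16, 8, 4), while M = 1 degenerates to the pixels themselves.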

B. Background Model Construction

The features of a specific block can be described by B^M during a training period (1 ≤ t ≤ T), and a CB (the background model) for a block can be represented by C^M = {C^M_i | 1 ≤ i ≤ K_{C^M}}, where C^M_i denotes the ith codeword of size M × M in the CB, and K_{C^M} denotes the number of codewords in C^M. Herein, C^M_i = {B^M_i, w_{C^M_i}, time_{C^M_i}} (for the pixel-based case, C^1_i = {p^1_i, w_{C^1_i}, time_{C^1_i}}), where w_{C^M_i} denotes a weight variable and time_{C^M_i} denotes a time variable. Each frame of size I × J is divided into I/M × J/M nonoverlapping blocks of size M × M, and one block-based CB is employed to record each block (each block is processed independently). In addition, each layer is associated with a block of a specific size (1, 4, 8, and 16); thus, a total of (I × J + I/4 × J/4 + I/8 × J/8 + I/16 × J/16) CBs are required for a frame after the background model is constructed.

Fig. 3 illustrates the proposed algorithm of the four-layer background model construction with the updating method, where the block "add codeword" involves the following equations: K_{C^M} = K_{C^M} + 1, C^M_{K_C} = B^M, w_{C^M_{K_C}} = 1/T, and time_{C^M_{K_C}} = t. The block "update codeword" involves three equations: C^M_i = (1 − α)C^M_i + αB^M, w_{C^M_i} = w_{C^M_i} + 1/T, and time_{C^M_i} = t,

Fig. 3. Background model training algorithm.

where T denotes the number of training frames, and α denotes the learning rate that controls the proportion of the current codeword to maintain when including the current block value in the codeword. α is set to 0.05 in this paper; theoretically, a higher learning rate reflects that the current RGB color of successive matches possesses a higher confidence (when α = 1 is set, the original color is replaced completely). The proposed background model construction employs multiple codewords C^M_i to address the characteristics of a block. Given an input block vector B^M, the match function defined below is employed to check the Euclidean distance between the codewords C^M_i in the corresponding CB and the transformed block vector B^M in the RGB color space:

match_function(v_source, v_codeword) = { true, if |d^T d| / 3 < λ_M²; false, otherwise },  where M = 1, 4, 8, 16    (2)

d = v_source − v_codeword    (3)

where λ_4, λ_8, and λ_16 denote the thresholds for the block-based CBs, and λ_1 denotes the threshold for the pixel-based CB (set to 4 and 3 for the block-based CBs and the pixel-based CB, respectively). In setting these thresholds, note that the block-based CBs are established for dynamic background scenarios such as waving trees; thus, a greater threshold is considered to filter out most of the noise, and a smaller threshold is adopted for refining the outputs. In general, when the block-based threshold value is small, more foreground is detected, and a greater pixel-based threshold is applied to refine the foreground. v_source denotes a one-dimensional feature vector from the test sequence, and v_codeword denotes a one-dimensional codeword stored in C^M. This match function is widely used in this paper for background model construction because the RGB color model is employed in this paper, and the two vectors v_source and v_codeword are applied to yield the average distance d across the three color channels. Subsequently, the matching parameter λ_M is adopted to compare the Euclidean distance between the two vectors in the RGB color space. When a codeword C^M_i is matched, C^M_i is updated by the corresponding block vector B^M; the larger the number of block vectors of frames matched to the ith codeword, the higher the importance to which that codeword is boosted (by increasing the weight variable w_{C^M_i}). Conversely, some codewords in a CB are not frequently used to describe the background.

Consequently, a refining procedure is employed (as described below) to filter out those redundant codewords as well as to further reduce the computational complexity. The refining procedure extracts L_{C^M} codewords from the K_{C^M} (L_{C^M} ≤ K_{C^M}) codewords in a CB; the codewords C^M_i in the CB are sorted according to the corresponding weight w_{C^M_i} from high (higher importance) to low (lower importance):

L_{C^M} = argmin_g [ ( Σ_{i=1}^{g} w_{C^M_i} ) > η ],  where g ≤ K_{C^M}    (4)

where w_{C^M_i} denotes the weight of the sorted codeword C^M_i, and η denotes the proportion parameter used to determine what proportion of the codewords should be maintained (η = 0.7). A greater η means that more codewords are retained during the updating procedure, and the completeness of the codebook model can be better maintained; yet, sometimes wrong codewords can be added into the codebook model in this scenario. If there are almost no moving objects during the background model construction, then η has no effect on the codeword number; conversely, if many moving objects are involved, then a greater proportion parameter can lead to recording more wrong codewords. The refined CBs C^M = {C^M_i | 1 ≤ i ≤ L_{C^M}}, M = 1, 4, 8, 16, are then employed for the block-based background subtraction, which is introduced in Section III.
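Read together, Fig. 3, the match function (2)-(3), and the refining step (4) suggest the following per-block training sketch. It uses the stated parameter values (α = 0.05, η = 0.7), but the data layout, names, and loop organization are assumptions rather than the authors' implementation.

    import numpy as np

    ALPHA = 0.05  # learning rate alpha
    ETA = 0.7     # proportion parameter eta in (4)

    def matches(v_source, v_codeword, lam):
        # Match function (2)-(3): mean squared channel distance vs. lambda^2.
        d = np.asarray(v_source, float) - np.asarray(v_codeword, float)
        return (d @ d) / 3.0 < lam ** 2

    def train_codebook(features, lam, T):
        # features: the T training vectors B^M observed at one block position.
        # A codeword is a [mean_vector, weight, last_update_time] triple.
        codebook = []
        for t, B in enumerate(features, start=1):
            for cw in codebook:
                if matches(B, cw[0], lam):  # update codeword
                    cw[0] = (1 - ALPHA) * cw[0] + ALPHA * np.asarray(B, float)
                    cw[1] += 1.0 / T
                    cw[2] = t
                    break
            else:                           # add codeword
                codebook.append([np.asarray(B, float), 1.0 / T, t])
        # Refining (4): keep the fewest top-weight codewords whose summed
        # weight exceeds eta (total weight over a CB is 1 by construction).
        codebook.sort(key=lambda cw: cw[1], reverse=True)
        kept, total = [], 0.0
        for cw in codebook:
            kept.append(cw)
            total += cw[1]
            if total > ETA:
                break
        return kept

For a 16 × 16 layer, for instance, train_codebook would be called once per block position with lam = λ_16 = 4.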

C. Illumination Change Procedure

Typically, lighting conditions change over time, and usually a fixed threshold, such as λ_M in (2), is not sufficient to fully cover these variations for most scenes. To address this concern, an adjustment strategy is expected that adaptively modifies the background model along with the variations in the lighting, to obtain a higher suitability. To do so, the illumination change procedure shown in Fig. 2 is proposed to solve this problem. Fig. 4 illustrates this algorithm, and the variables are defined as follows:

dis = ||ctg(Value) − Gray||, where Value = B^M and Gray = Gray^M, M = 1, 4, 8, 16    (5)

where this form of YCbCr was defined for a standard video capture system and used in the ITU-R BT.601 (formerly CCIR 601) standard for digital image compression.

Fig. 4. Illumination change procedure.

The component Y (= ctg(Value)) is derived from the corresponding RGB space [15] using the following equation:

ctg(Value) = V_R × 0.299 + V_G × 0.587 + V_B × 0.114, where V ∈ R³.    (6)

In the "initialize variable" block, the variables Value and Gray denote the current gray value (of B^M) and the recorded gray value (the previous mean gray value) of the corresponding layer, respectively; the variable countG^M denotes the count of the recorded frames. In this figure, a distance dis that is too large suggests that a very large illumination change has occurred and that the previous Gray might not be able to represent the background. Thus, the incoming Value is adopted to replace the Gray value, to shift the luminance range of the background model (which involves the following equations: Gray = Value and countG^M = 1). At the same time, the current λ_M is also discarded and replaced by the predefined λ_M, which is always the original value for initialization. Compared with the fixed strategy used in (2), this adaptive manner automatically adjusts the model location along with the fluctuation of the lighting; thus, more variations are covered reasonably. Conversely, if dis is sufficiently small, the recorder is updated [including the following equations: countG^M = countG^M + 1 and Gray = (Gray × (countG^M − 1) + Value) / countG^M] to yield a new threshold, as follows:

λ'_M = λ_M × (1 + dis / 255)    (7)

for improving the tolerance of the background model to environmental changes, most importantly to illumination changes.
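A compact reading of Fig. 4 and (5)-(7) is sketched below. ctg is the BT.601 luma of (6); the reset threshold dis_limit is a hypothetical placeholder, since the excerpt specifies the branching in Fig. 4 but not the numeric bound.

    def ctg(v):
        # BT.601 luma of an RGB triple, cf. (6).
        return 0.299 * v[0] + 0.587 * v[1] + 0.114 * v[2]

    def illumination_change(lam_base, gray, count, block_mean, dis_limit=30.0):
        # One step of the illumination change procedure for a layer, where
        # gray and count hold Gray^M and countG^M; block_mean is B^M.
        value = ctg(block_mean)
        dis = abs(value - gray)                          # cf. (5)
        if dis > dis_limit:
            # Large change: re-seed the gray recorder and fall back to the
            # predefined lambda_M.
            return lam_base, value, 1
        gray = (gray * count + value) / (count + 1)      # running mean of Gray^M
        lam_prime = lam_base * (1.0 + dis / 255.0)       # cf. (7)
        return lam_prime, gray, count + 1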

III. Multilayer Background Subtraction

The proposed hierarchical structural CB method reduces the computational complexity of the foreground detection, in which multiple codewords are employed to fully describe an image block. Subsequently, the FFRM is employed to update the background model CBs. To adapt to current situations, the nonbackground information is also used to update the block-based and pixel-based background models. This method provides an independent way to update the background model according to the time that the foreground stays, rather than the former method, which must confirm a color similarity (Sim). Finally, the results are further refined during the pixel-based phase. This phase also provides additional functions that distinguish whether a target belongs to highlight or shadow, which might otherwise confuse the foreground determination.

Fig. 5. Flowchart of the proposed background subtraction. Top block: multilayer block-based background subtraction. Bottom block: pixel classification.

After finishing the background model construction by training on T frames, as shown in Fig. 2, four layers of CBs, the block-based C^16, C^8, and C^4 and the pixel-based C^1, are obtained; these are then utilized for the proposed multilayer background subtraction. Without loss of generality, the background subtraction starts after T, and the test input frames are F^{T+1}, F^{T+2}, .... By observing (4), we know that in the refining procedure, the proposed method retains the top 70% (proportion parameter η = 0.7) of the codewords according to the priority of importance. Consequently, even if moving objects randomly appear during the training, the proposed method can still build the background information robustly, because these objects are associated with unstable codewords, which mostly will not be retained by (4). The main difference between the HCB [12] and the proposed method is that, in total, four layers of various block sizes are employed in this paper, which not only boosts the processing efficiency of the foreground detection but also adaptively solves the dynamic background issue. The pixel-based CBs at the end of the proposed system can also classify the pixels into three types, foreground, shadow, and highlight, as detailed below.

TABLE I. Performance Comparison

A. Hierarchical Background Subtraction Using Multilayer Block-Based CBs

The upper block of Fig. 5 shows the flow of the proposed MCBS. Initially, similar to the construction of the block-based background model, the input frames F^t are divided into multiple nonoverlapping blocks, and the feature extraction introduced in Section II-A is applied in this process. Thus, each block is transformed into a three-dimensional mean-value vector B^M for background subtraction. In the first stage, B^16 is adopted for the first-layer process. Before applying (2) to determine the background, the illumination change procedure formulated in (7) is adopted to adjust the threshold λ_M [defined in (2)] to meet various changes in the lighting, where Value = B^M and Gray = Gray^M in this case. The current block mean value B^16 is used to match the block of size 16 × 16 in the background model C^16 = {C^16_i | 1 ≤ i ≤ L_{C^16}} using the new threshold via (2). If a codeword is matched, then the matched ith codeword C^16_i is updated [which involves the following equations: C^16_i = (1 − α)C^16_i + αB^16, time_{C^16_i} = t, and count = count + 1], the gray recorder value Gray^16 is updated as shown below, and the current block is determined to be background:

countG^M = countG^M + 1;  Gray^M = (Gray^M × (countG^M − 1) + ctg(B^M)) / countG^M,  M = 4, 8, 16    (8)

where ctg(·) is defined in (6). Otherwise (no codeword is matched), the current block is divided into four 8 × 8 blocks, and each block is transformed into a block-based vector B^8, which is adopted for the next block-based layer. Equation (2) is applied to the second layer with the new threshold after the illumination change procedure (7), to match the codewords in the background model C^8 against the current block vector B^8. If they match, then the matched codewords are updated, the current block is determined to be background, and the gray recorder Gray^8 is updated; otherwise, the 8 × 8 block is divided into four 4 × 4 blocks, and the algorithm of the second layer, including the updating phases, is applied likewise. After the three stages are finished, the final phase combines the results yielded from the blocks of the three sizes. In this way, the block-based stage can remove most of the noise and dynamic background; however, it has low Pr. To overcome this problem, the pixel-based stage is adopted to enhance the Pr, which also reduces the FPR, as shown in Table I. The main contribution of the block-based stage is to reduce the redundant foreground detection operations and to reduce the noise in the dynamic background, because the mean is considered in the block-based stage, and the mean is a good feature for protecting against noise. In addition, a larger block size is employed in the early stages while leaving a smaller block size to the later stages, so the processing speed can be further boosted. The result from the block-based stage is then fed into the pixel classification, which is introduced in the following section. In other words, as long as a block has been matched during the block-based stages, the pixel-based phase is not required, for simplification. To prevent the codeword representativeness in the pixel-based CB from decreasing, a pixel-based update procedure is proposed. The variable countG^M in the top block of Fig. 5 denotes the temporary constant that records the update times of C^16, and update_pixel denotes the period constant that controls how often C^16 updates (the updating parameter update_pixel = 3 for the pixel-based CB). The number 3 means that the system updates the pixel information, to ensure that no data are missed, after three successive matches with a block of size 16 × 16. A smaller update_pixel means that the pixels can describe the current background more precisely, yet it also leads to a higher computational complexity. Thus, in every update_pixel time interval, the pixel-based update procedure updates the pixels X^t_{m,n} in the current block B^16.
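Structurally, the coarse-to-fine descent described above amounts to the sketch below, written for a single 16 × 16 block. Here codebooks is assumed to map (M, y, x) sub-block origins to lists of codeword mean vectors, and lam maps M to the adjusted threshold λ'_M; codeword updating, the gray recorders, and the pixel layer are omitted for brevity.

    import numpy as np

    def subtract_block(block16, codebooks, lam):
        # block16: 16 x 16 x 3 array. Returns a boolean mask of the pixels
        # that survive all three block layers and are passed to the pixel layer.
        def is_background(sub, key, M):
            B = sub.reshape(-1, 3).mean(axis=0)      # block-mean feature B^M
            return any(((B - cw) @ (B - cw)) / 3.0 < lam[M] ** 2
                       for cw in codebooks[key])     # match function (2)

        mask = np.zeros((16, 16), dtype=bool)
        if is_background(block16, (16, 0, 0), 16):   # layer 1
            return mask
        for y8 in (0, 8):
            for x8 in (0, 8):
                sub8 = block16[y8:y8 + 8, x8:x8 + 8]
                if is_background(sub8, (8, y8, x8), 8):   # layer 2
                    continue
                for y4 in (0, 4):
                    for x4 in (0, 4):
                        sub4 = sub8[y4:y4 + 4, x4:x4 + 4]
                        if not is_background(sub4, (4, y8 + y4, x8 + x4), 4):
                            # All block layers failed: refine these pixels
                            # with the pixel-based CB.
                            mask[y8 + y4:y8 + y4 + 4, x8 + x4:x8 + x4 + 4] = True
        return mask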

B. Pixel Classification Using Pixel-Based CB

In the bottom block of Fig. 5, the pixel-based CB is used to further classify the property of the pixels, where the variable X^+_{m,n} denotes the pixels in the nonsubtracted area after the block-based background subtraction. After the illumination change procedure [similar to (7)] using the new pixel-based matching threshold λ'_1 in (2), the pixels in this area are matched against the corresponding CB C^1. If they match, then the matched codeword C^1_i is updated, the pixel is classified as background, and the pixel-based gray recorder Gray^1 is updated as in (8). For the unmatched codewords, there might be some fake foregrounds, such as shadows and highlights. To separate these from the true foreground, the former algorithm of Carmona et al. [16] is applied, in which a cone-shaped color model was proposed to improve the CB's color model; this model is more robust and is employed in this paper to classify the pixels.

Fig. 6 shows the color model in the RGB color space, which is constructed from the pixel-based codeword C^1_i = {p^e_i | e = R, G, B}, the ith codeword in the background model CB of length L_{C^1}. To classify the pixels in the RGB color space, a high bound I_MAX and a low bound I_min are calculated as follows:

||C^1_i|| = sqrt( (p^R_i)² + (p^G_i)² + (p^B_i)² )    (9)

I_MAX = β ||C^1_i||    (10)

I_min = γ ||C^1_i||    (11)

where β is greater than 1 and γ is smaller than 1; more specifically, β = 1.25 and γ = 0.7, which set a high bound and a low bound, respectively. In Fig. 6, an angle parameter θ_color = 3 is used to define the region of the cone-shaped color model.

Fig. 6. Color model used in pixel classification.

In a bright environment, the countermeasure is to widen the middle part of the color model by reducing γ (which also yields a smaller I_min) and increasing θ_color to separate the shadows. Conversely, under a darker environment, the highlights are rather apparent, and a cone-shaped color model with a wider top can be obtained; by applying a greater θ_color and β (a greater I_MAX can be yielded), the highlights can be classified correctly. To verify which region the current pixel vector X^t_{m,n} = {x^e_{m,n} | e = R, G, B} belongs to, the projection X_proj of X^t_{m,n} onto C^1_i is calculated first, as follows:

X_proj = ( ⟨X^t_{m,n} · C^1_i⟩ / ||C^1_i|| ) Ĉ^1_i    (12)

where Ĉ^1_i = C^1_i / ||C^1_i|| denotes the unit vector of C^1_i, and the inner product ⟨X^t_{m,n} · C^1_i⟩ is calculated as follows:

⟨X^t_{m,n} · C^1_i⟩ = x^R_{m,n} p^R_i + x^G_{m,n} p^G_i + x^B_{m,n} p^B_i    (13)

so that ||X_proj|| can be calculated as follows:

||X_proj|| = ⟨X^t_{m,n} · C^1_i⟩ / ||C^1_i||.    (14)

Subsequently, the angle between the current pixel X^t_{m,n} and the codeword C^1_i can be calculated as follows:

θ_{X^t_{m,n}, C^1_i} = tan⁻¹( dist_{X^t_{m,n}, C^1_i} / ||X_proj|| )    (15)

where dist_{X^t_{m,n}, C^1_i} denotes the distance between X^t_{m,n} and X_proj, calculated as follows:

dist_{X^t_{m,n}, C^1_i} = sqrt( ||X^t_{m,n}||² − ||X_proj||² )    (16)

where ||X^t_{m,n}|| denotes the L2-norm of X^t_{m,n} and is defined as

||X^t_{m,n}|| = sqrt( (x^R_{m,n})² + (x^G_{m,n})² + (x^B_{m,n})² ).    (17)

If the angle θ_{X^t_{m,n}, C^1_i} is smaller than the angle parameter θ_color, then the current pixel vector belongs to the cone-shaped region. The high bound, shown in green, is defined as the highlight region; the low bound, shown in blue, is defined as the shadow region. The overall color model used in this paper is organized as follows:

classification_pixel(X^t_{m,n}, C^1_i) = Shadow, if θ_{X^t_{m,n}, C^1_i} < θ_color and I_min ≤ ||X_proj|| < ||C^1_i||; Highlight, if θ_{X^t_{m,n}, C^1_i} < θ_color and ||C^1_i|| ≤ ||X_proj|| < I_MAX; Foreground, otherwise.    (18)

With the color model above, the result of the block-based CB model is further refined in the pixel-based phase. In the cone-shaped color model, using the public test sequences [18]–[21], a highlight is defined as a test vector greater than the high bound I_MAX, which represents useless information in the foreground.
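Folding (9)-(18) into one routine gives the sketch below. We assume θ_color is measured in degrees and clamp the radicand of (16) at zero for numerical safety; neither detail is stated explicitly in the excerpt.

    import numpy as np

    BETA, GAMMA, THETA_COLOR = 1.25, 0.7, 3.0   # beta, gamma, theta_color

    def classify_pixel(x, p):
        # x: current RGB pixel X^t_{m,n}; p: pixel-based codeword C^1_i.
        x = np.asarray(x, float)
        p = np.asarray(p, float)
        norm_p = np.linalg.norm(p)                    # (9)
        i_max = BETA * norm_p                         # (10)
        i_min = GAMMA * norm_p                        # (11)
        proj = float(x @ p) / norm_p                  # ||X_proj||, cf. (12)-(14)
        dist = np.sqrt(max(float(x @ x) - proj ** 2, 0.0))   # (16)
        theta = np.degrees(np.arctan2(dist, proj))    # (15)
        if theta < THETA_COLOR and i_min <= proj < norm_p:
            return "shadow"                           # lower cone region, cf. (18)
        if theta < THETA_COLOR and norm_p <= proj < i_max:
            return "highlight"                        # upper cone region, cf. (18)
        return "foreground"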

IV. Fake Foreground Removal Model

Although further refinement of the block-based phase using the four-layer strategy leads to a performance able to address most situations, there are still some specific scenarios to be considered. For example, a moving object becomes a stationary background when it stands still for a length of time during the period of background subtraction. The bottom-right corner of Fig. 2 shows the relationship between the FFRM and the background model: the FFRM updates the background model CBs, while the background model is used for the background subtraction.

To adapt to the current situation, the algorithm of the FFRM, as illustrated in Fig. 7, is used to record the nonbackground information; its construction method is identical to that of the background model. Each time a block is classified as a nonbackground region, the FFRM records the information of this block. The FFRM is constructed and updated within a frame and can be expressed as S^M = {S^M_i | 1 ≤ i ≤ L_{S^M}}, where L_{S^M} denotes the length of the FFRM; each entry includes a vector to record the features, a weight variable w_{S^M_i} to record the number of updates (also regarded as importance), and a time variable time_{S^M_i}. After updating the block-based FFRM, a two-stage FFRM procedure is proposed to adapt to the current environment, as introduced below and illustrated in the bottom parts of Fig. 7.

1) Delete the FFRM codeword: The time variable is used to check the codewords in the FFRM. If a block-based codeword stays in the FFRM for a long time and has not been updated during that period, then it is regarded as a temporary codeword to be deleted. In detail, if the time period between the current time t and the last updated time time_{S^M_i} of the ith temporary codeword is greater than the fake foreground deleting parameter delete_S (= 5), then this redundant codeword is removed from the FFRM. The size of the updated S^M changes accordingly, as follows:

S^M = { S^M_i | t − time_{S^M_i} < delete_S }    (19)

L_{S^M} = dim(S^M).    (20)

Fig. 7. FFRM updating background model for adapting scene.

2) Delete and add the background codeword: Following the previous procedure, if the weight variable w_{S^M_i} of the ith temporary codeword in the FFRM is greater than or equal to the parameter add_{S,B} (= 100), a threshold weight used to determine whether a codeword should be moved from the FFRM to the background model, then the codeword is added to the background model C^M from the FFRM S^M. Herein, the three parameters of the FFRM, delete_S = 5, delete_B = 500, and add_{S,B} = 100, are fixed for various environments. In the future, a further study can be conducted to adaptively adjust these parameters under various circumstances to yield the optimum effects. This procedure means that the temporary codeword has stayed in the FFRM for a long time and is still being updated; thus, the information in this temporary codeword is sufficiently robust to construct the background, as follows:

C^M = { C^M_i | t − time_{C^M_i} < delete_B } ∪ { S^M_i | w_{S^M_i} ≥ add_{S,B} },  where M = 1, 4, 8, 16    (21)

S^M = { S^M_i | w_{S^M_i} < add_{S,B} }    (22)

L_{C^M} = dim(C^M)    (23)

L_{S^M} = dim(S^M).    (24)

Notably, four layers are considered in the block-based FFRM, which are associated with the respective block-based background models; thus, the variable M in Fig. 7 can be 1, 4, 8, and 16 (M = 1 is associated with the pixel-based background model). Similar to Fig. 3, the algorithm for updating the pixel-based CB resembles the block-based updating, and the corresponding algorithm used to construct the pixel-based FFRM is identical to the construction of the block-based model. The only difference is that the input vector changes to the pixel vector, which is classified as foreground by the color model in (18). In summary, the proposed FFRM provides an independent method to update the background model according to the staying time of the foreground, rather than the former method, which must conform to the color similarity (Sim) model. With this updating strategy, a background model with compound properties can address usual backgrounds as well as a simultaneously moving foreground.
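The two-stage maintenance of (19)-(24) reduces to a few list filters per block. The sketch below reuses the [vector, weight, last_update_time] codeword layout assumed earlier; mutating the lists in place is our own choice.

    DELETE_S, DELETE_B, ADD_SB = 5, 500, 100   # fixed FFRM parameters

    def maintain_ffrm(ffrm, background, t):
        # ffrm and background are one block's codeword lists; t is the
        # current time index.
        # Stage 1, (19)-(20): drop temporary codewords not updated recently.
        ffrm[:] = [s for s in ffrm if t - s[2] < DELETE_S]
        # Stage 2, (21)-(24): expire stale background codewords and promote
        # temporary codewords whose weight has reached add_{S,B}.
        promoted = [s for s in ffrm if s[1] >= ADD_SB]
        background[:] = [c for c in background if t - c[2] < DELETE_B] + promoted
        ffrm[:] = [s for s in ffrm if s[1] < ADD_SB]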

V. Experimental Results

In this section, the performance of the proposed method is evaluated with respect to various criteria. Herein, six criteria are employed: the false positive rate (FPR), true positive rate (TPR), Pr, Sim, percentage of wrong classifications (PWC) [17], and F-measure (Fm) [17], as formulated below:

FPR = fp / (fp + tn);  TPR = tp / (tp + fn);  Pr = tp / (tp + fp)

Sim = tp / (tp + fp + fn);  PWC = 100 × (fn + fp) / (tp + fn + fp + tn)

Fm = 2 × Pr × TPR / (Pr + TPR)    (25)

where tp, tn, fp, and fn denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
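The six criteria follow directly from the four counts; a minimal, direct transcription of (25):

    def evaluation_metrics(tp, tn, fp, fn):
        # The six criteria of (25) from raw pixel counts.
        fpr = fp / (fp + tn)
        tpr = tp / (tp + fn)
        pr = tp / (tp + fp)
        sim = tp / (tp + fp + fn)
        pwc = 100.0 * (fn + fp) / (tp + fn + fp + tn)
        fm = 2.0 * pr * tpr / (pr + tpr)
        return {"FPR": fpr, "TPR": tpr, "Pr": pr, "Sim": sim, "PWC": pwc, "Fm": fm}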

In addition, the video sequences of the public databases [18]–[20] used in this paper lack foregrounds at the beginning; the starting points of the sequences without foregrounds are CAMPUS 200, WATERSURFACE 480, MEETINGROOM 1755, INDOORGTTEST1 342, and INTELLIGENT ROOM 82. However, the public database for change detection [21] does not provide training frames; instead, all of the frames before the first ground truth are used. Basically, different video sequences have different numbers of training frames. Although the training frames differ, the background model is continually updated; hence, over a sufficiently long time, the proposed method is quite steady.

A. Comparison of the Extracted Features

The average performance and frames per second (FPS) comparisons between the HCB [12] and the proposed method are presented in Tables I–IV. In addition, the proposed MCBS model can yield an even lower computing cost, as explained below. In the block-based HCB model, the high- and low-mean values of the blocks were employed to solve the dynamic background and to improve the processing speed, while in this paper, the reliability of the system against the dynamic background problem is improved using the mean value only. Table II shows the comparison in terms of FPS under the SD sequence format, which shows that the proposed scheme can still meet the real-time demand.

Fig. 8 shows the block-based results of these two methods with the test sequence WATERSURFACE [19], of size 160 × 128. The first column of Fig. 8 shows the original images of different frames, the second column shows the results of the block-based HCB with a block of size 8 × 8, and the third column shows the results of the proposed method.

TABLE II. FPS Comparison Between HCB and Proposed Method Using SD Images

TABLE III. Average Performance Comparison of HCB and the Proposed Method (Block-Based Results) Using Video Sequence WATERSURFACE

TABLE IV. FPS Comparison of the HCB and the Proposed Method

The rows represent different frames, #401 and #547. Fig. 9(a) shows the statistical curve of the HCB feature values at the same block position, which is associated with the red block in the first row of Fig. 8. Fig. 9(b) shows the statistical curve of the mean values used in this paper for the same block. The horizontal axis denotes the time index (t) of the frames, and the vertical axis represents the values of the extracted features. The two curves are mostly smooth, which proves that the extracted features for the block-based background models are satisfactory for most parts of this sequence (without foreground). Before a foreground (a person, in this sequence) moves into this block at frame #488, curve (a) changes drastically over a long period, yet the curve should be stable before this block is filled with foreground. With the match function in the HCB, this block should be determined as background before the foreground enters; yet, the curve in Fig. 9(a) has very large fluctuations before frame #488, which causes wrong detections. Thus, the BTC feature values used in the HCB are not stable enough to address this issue. Conversely, the curve of the mean feature values used in the proposed method shows good reliability in the block-based background subtraction, as the practical results in the third column of Fig. 8 show. The average performance comparisons between the HCB [12] and the proposed method are presented in Table III; these comparisons involve frames 481–525 of the test sequence WATERSURFACE [18].

Besides reliability, the processing time is also an important issue in background subtraction. Table IV shows the FPS comparisons of the HCB scheme and the proposed method. The HCB's BTC values require multiple calculations in one color channel and thus impede the processing speed. As discussed in this section, the processing speed is another important issue in computer vision. Herein, an SD (a.k.a. 480p) sequence is involved in the performance test in


TABLE V. Change Detection Benchmark Dataset (Best Performance of Each Metric Is Circled)

Fig. 8. Block-based background subtraction results with WATERSURFACE. Column 1: original images. Column 2: block-based results of HCB [12]. Column 3: block-based results of the proposed method.

terms of the processing efficiency, using a test sequence of size 720 × 480 with a total of 347 frames [the test sequences were established at National Taiwan University of Science and Technology (NTUST)]. Table II shows the comparison of the FPS rate between the HCB scheme [12] and the proposed method. As can be seen, the proposed method can meet the real-time requirement under this scenario.

B. Performance Comparison

Fig. 10 shows the results using the test sequence WATERSURFACE [18], which contains 636 frames of size 160 × 128. This sequence involves a nonstationary background, such as a rippling sea surface. Compared with the three former methods, MoG [3], CB [8], and HCB [12], the proposed method and the HCB provide better performance. To further examine the practical performance of the HCB and the proposed method, Table I shows the difference; the proposed method provides a better capacity to remove the dynamic background. Under the block of size 8 × 8 scenario, the HCB [12] has some drawbacks, such as a blocking effect, as shown at the person's feet in Fig. 10(e). In addition, Fig. 11 shows the test sequence CAMPUS [18], which suffers from a serious dynamic background, such as waving trees and a waving flag, in the scene. Again, the proposed method, as shown in Fig. 11(f), provides slightly superior performance to the other, former schemes. Fig. 12 shows the indoor scenario using the video sequence MEETINGROOM [18] to compare the performance of the HCB and the proposed method. The sequence contains 2964 frames of size 160 × 128, and the background is nonstationary; the shutter in the background is waving, which is difficult to address. As shown in Fig. 12(c), the HCB renders more fake foreground. Yet, in Fig. 12(d), the proposed method presents better quality by removing more of the fake foreground.

To provide a more objective evaluation, the average performances over all of the above three videos are organized in Table I, in which the best performance is circled for each metric. Obviously, superiority is obtained with the proposed scheme compared with the former methods. Moreover, another comparison with 20 former techniques using a different dataset [21] is also provided, and the corresponding results are organized in Table V. This dataset involves six different scenarios that use the same parameter values, including baseline, dynamic background, camera jitter, intermittent object motion,


Fig. 9. Block-based feature values at the same block position [associated with row 1 in Fig. 8, size = 8 × 8, position (8,16) to (15,23)]. (a) BTC values of HCB [12]. (b) Proposed mean values.

Fig. 10. Background subtraction results with WATERSURFACE. (a) Original image. (b) Ground truth. (c) MoG [3]. (d) CB [8]. (e) HCB [12]. (f) Proposed method.

shadow, and thermal. In addition, each measure in this table is averaged over all of the videos in the six different cases. According to the results, the proposed method obtains the best performance in terms of the PWC and Fm metrics. Moreover, with respect to the remaining five measures (Re, Sp, FPR, FNR, and Pr), the proposed method can still be considered a good method across various environments because of its balanced performance (Figs. 13 and 14).

Regarding the processing efficiency, the two test sequences WATERSURFACE and CAMPUS are adopted for testing without loss of generality, and the corresponding results are organized in Table IV. Moreover, consider a comparison with ViBe [13] at a higher resolution of 320 × 240: because ViBe simply collects the mean value and a set of samples to establish the background models, a high processing speed of 200 FPS can be achieved [13], while the proposed scheme yields 120 FPS with frames of the same size. Although a relatively lower processing efficiency is obtained by

Fig. 11. Background subtraction results with CAMPUS. (a) Original image (frame #695). (b) Ground truth. (c) MoG [3]. (d) CB [8]. (e) HCB [12]. (f) Proposed method.

Fig. 12. Background subtraction results with MEETINGROOM. (a) Original image (frame #2236). (b) Ground truth. (c) HCB [12]. (d) Proposed method.

Fig. 13. Background subtraction results with INDOORGTTEST1. (a) Original image (frame #365). (b) HCB [12]. (c) Proposed method without light procedure. (d) Proposed method with light procedure.

the proposed method, it is still more than satisfactory in terms of the real-time requirement for prospective practical applications. Pertaining to the memory issue, suppose that the RGB color model is utilized in ViBe [13]; then, the corresponding memory consumption for a frame of size I × J is simply I × J × 3N bytes, where N denotes the number of samples stored in each pixel-based model, and the number 3 denotes the red, green, and blue color channels. Conversely, in the proposed CB algorithm, either a block or a pixel is represented by a CB individually, which is a compressive form of the background of a long-term image sequence. Moreover, each CB is composed of codewords comprising the colors transformed by an innovative color distortion metric (as defined in Section III-B). The required memory varies across different environments (the required number of codewords is not constant), which suggests that the memory consumption is difficult to estimate accurately. Hence, let C denote the length of each CB, and suppose that the four background layers (1, 4, 8, and 16) have CBs of identical length in this estimation. For each frame of size I × J, the memory consumption is

(1/16² + 1/8² + 1/4² + 1) × I × J × 3C

bytes for the proposed method, because each block requires only one mean per color channel (one byte each). Compared with ViBe: because the average length of the proposed CB usage is approximately 10 (C = 10) empirically and the length


Fig. 14. Accuracy value for the sequence INDOORGTTEST1. (a) FPR. (b) TPR. (c) Pr. (d) Sim. (e) F-measure. (f) PWC.

(N) for ViBe's background model is 20 according to its experimental settings [13], the memory consumption of the proposed method is 32.5 bytes/pixel, which is superior to the 60 bytes/pixel required by ViBe. Beyond this superiority in terms of memory consumption, the performance of the proposed method is also better than that of ViBe as well as ViBe+, as shown in Table V, in terms of the PWC and Fm metrics.
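As a check of the quoted figures, reading the layer sum with one byte per color channel per codeword and C = 10 gives

(1 + 1/4² + 1/8² + 1/16²) × 3 × 10 ≈ 1.082 × 30 ≈ 32.5 bytes/pixel

whereas ViBe's I × J × 3N bytes with N = 20 amounts to 3 × 20 = 60 bytes/pixel.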

Intuitively, the block size should adapt to the frame resolution to obtain a higher suitability. However, suppose that a larger block size (>16) is employed; then, two extreme conditions could arise: 1) all of the pixels in that block are passed (to the layer of the pixel level) or 2) all of the pixels in that block are rejected (as foreground/highlight/shadow). In the first scenario, the large block size cannot handle/describe all of the variations of the pixels inside because only one mean is adopted in the proposed method; in the second scenario, the layer structure of the proposed algorithm cannot yield the expected speed-up effect because all of the elements within that block are processed by the pixel-based model, causing a relatively high computational complexity. Consequently, according to the above considerations, we still opt to use the constant block sizes 1, 4, 8, and 16 for the proposed method.


Fig. 15. Result of the pixel classification using INTELLIGENT ROOM. (a) Original image. (b) Result of the pixel classification.

C. Performance With/Without the Illumination Change Detection Procedure


The concept of pixel classification was introduced in Section III-B. In this section, the video sequence INTELLIGENT ROOM [20] is employed for testing, with the results given in Fig. 15: Fig. 15(a) shows the images over a period of time, and the corresponding results are shown in Fig. 15(b); they are not compensated by any post-processing. In the classification, the foreground, shadow, and highlight pixels are colored blue, red, and green, respectively. As can be seen, the pixel-based color model can resist the influence caused by illumination changes, and the foregrounds are classified successfully. The corresponding execution speed is 90.56 FPS. Notably, the proposed method cannot address the camouflage problem. Future work can focus on using multiple cameras or an infrared system to acquire the additional depth information required to solve this issue.

VI. Conclusion

In this paper, a multilayer adaptive block-based strategy was proposed, along with the mean feature adopted from the separated blocks. The proposed method removed most of the background when suffering from a dynamic background and resolved the blocking-effect deficiency of the HCB method. Moreover, the multilayer scheme significantly improved the processing efficiency: given a video of SD resolution, the proposed scheme still provided real-time processing capability. However, because the MCBS employs RGB information for modeling the background, it is difficult to distinguish the foreground from the background when they have similar colors. In future work, depth information will be incorporated into the proposed MCBS model to solve this camouflage issue. In more detail, by obtaining information from static video surveillance cameras and considering spatially registered, time-synchronized color and depth image spaces across a series of frames, we believe that we can better extract foreground objects whose colors are similar to the background. Overall, the proposed method is a good candidate for intelligent object detection.

References

[1] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, "Detecting moving objects, ghosts, and shadows in video streams," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 10, pp. 1337–1342, Oct. 2003.

[2] T. Horprasert, D. Harwood, and L. S. Davis, "A statistical approach for real-time robust background subtraction and shadow detection," in Proc. IEEE Int. Conf. Comput. Vision, vol. 99, Sep. 1999, pp. 1–19.

[3] E. J. Carmona, J. Martinez-Cantos, and J. Mira, "A new video segmentation method of moving objects based on blob-level knowledge," Pattern Recognit. Lett., vol. 29, no. 3, pp. 272–285, Feb. 2008.

[4] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 747–757, Aug. 2000.

[5] N. Martel-Brisson and A. Zaccarin, "Learning and removing cast shadows through a multidistribution approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 7, pp. 1133–1146, Jul. 2007.

[6] J.-S. Hu and T.-M. Su, "Robust background subtraction with shadow and highlight removal for indoor surveillance," EURASIP J. Adv. Signal Process., vol. 2007, no. 1, pp. 1–14, Jan. 2007.

[7] G. Xue, J. Sun, and L. Song, "Background subtraction based on phase and distance transform under sudden illumination change," in Proc. IEEE Int. Conf. Image Process., Sep. 2010, pp. 3465–3468.

[8] K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis, "Real-time foreground-background segmentation using codebook model," Real-Time Imaging, vol. 11, no. 3, pp. 172–185, Jun. 2005.

[9] M. Wu and X. Peng, "Spatio-temporal context for codebook-based dynamic background subtraction," AEU – Int. J. Electron. Commun., vol. 64, no. 8, pp. 739–747, Aug. 2010.

[10] L. Maddalena and A. Petrosino, "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008.

[11] T. Kohonen, Self-Organization and Associative Memory, 2nd ed. Berlin, Germany: Springer-Verlag, 1988.

[12] J.-M. Guo, Y.-F. Liu, C.-H. Hsia, M.-H. Shih, and C.-S. Hsu, "Hierarchical method for foreground detection using codebook model," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 6, pp. 804–815, Jun. 2011.


[13] O. Barnich and M. Van Droogenbroeck, "ViBe: A universal background subtraction algorithm for video sequences," IEEE Trans. Image Process., vol. 20, no. 6, pp. 1709–1724, Jun. 2011.

[14] V. Pham, P. Vo, V. T. Hung, and L. H. Bac, "GPU implementation of extended Gaussian mixture model for background subtraction," in Proc. IEEE Int. Conf. Comput. Commun. Technol., Res., Innov., Vision Future, Nov. 2010, pp. 1–4.

[15] J.-S. Chiang, C.-H. Hsia, H.-W. Peng, C.-H. Lien, and H.-T. Li, "Saturation adjustment method based on human vision with YCbCr color model characteristics and luminance changes," in Proc. IEEE Int. Symp. Intell. Signal Process. Commun. Syst., Nov. 2012, pp. 136–141.

[16] E. J. Carmona, J. Martinez, and J. Mira, "A new video segmentation method of moving objects based on blob-level knowledge," Pattern Recognit. Lett., vol. 29, no. 3, pp. 272–285, Feb. 2008.

[17] N. Goyette, P.-M. Jodoin, F. Porikli, J. Konrad, and P. Ishwar, "changedetection.net: A new change detection benchmark dataset," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. Workshops, Jun. 2012, pp. 1–8.

[18] Statistical Modeling of Complex Background for Foreground Object Detection. [Online]. Available: http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html

[19] Performance Evaluation of Surveillance Systems. [Online]. Available: http://www.research.ibm.com/peoplevision/performanceevaluation.html

[20] Shadow Detection. [Online]. Available: http://cvrr.ucsd.edu/aton/shadow/index.html

[21] A Change Detection Benchmark Dataset. [Online]. Available: http://www.changedetection.net

Jing-Ming Guo (M'06–SM'10) was born in Kaohsiung, Taiwan, on November 19, 1972. He received the B.S.E.E. and M.S.E.E. degrees from National Central University, Taoyuan, Taiwan, in 1995 and 1997, respectively, and the Ph.D. degree from the Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, in 2004.

He is a Professor with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei. His research interests include multimedia signal processing, multimedia security, computer vision, and digital halftoning.

Dr. Guo was invited to be the Technical Program Chair for the IEEE International Symposium on Intelligent Signal Processing and Communication Systems in 2012 and the IEEE International Symposium on Consumer Electronics in 2013. He has been invited to be a Lecturer for the IEEE Signal Processing Society Summer School on Signal and Information Processing in 2012. He has been elected as the Chair of the IEEE Taipei Section GOLD Group in 2012. He has served as a Guest Co-Editor of two special issues for the Journal of the Chinese Institute of Engineers and the Journal of Applied Science and Engineering. He serves on the Editorial Board of the Journal of Engineering and The Scientific World Journal. Currently, he is an Associate Editor of IEEE Signal Processing Letters, IEEE Transactions on Multimedia, Information Sciences, and Signal Processing. He is a Senior Member of the IEEE Signal Processing Society and a fellow of the IET. He received the Outstanding Youth Electrical Engineer Award from the Chinese Institute of Electrical Engineering in 2011, the Outstanding Young Investigator Award from the Institute of System Engineering in 2011, the Best Paper Award from the IEEE International Conference on System Science and Engineering in 2011, the Excellence Teaching Award in 2009, the Research Excellence Award in 2008, the Acer Dragon Thesis Award in 2005, the Outstanding Paper Awards from IPPR, Computer Vision and Graphic Image Processing in 2005 and 2006, and the Outstanding Faculty Award in 2002 and 2003.

Chih-Hsien Hsia (M'10) was born in Taipei City, Taiwan, in 1979. He received the B.S. degree in computer science and information engineering from Taipei Chengshih University of Science and Technology, Taipei, Taiwan, in 2003, and the M.S. degree in electrical engineering and the Ph.D. degree from Tamkang University, New Taipei, Taiwan, in 2005 and 2010, respectively.

He was a Visiting Scholar with Iowa State University, Ames, IA, USA, in 2007. From 2010 to 2013, he was a Post-Doctoral Research Fellow with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, and also an Adjunct Associate Professor with the Department of Electrical Engineering, Tamkang University. He is currently an Associate Professor with the Department of Electrical Engineering, Chinese Culture University, Taipei, Taiwan. His research interests include DSP IC design, image/video processing, multimedia compression system design, multiresolution signal processing, and computer/robot vision processing.

Dr. Hsia is a member of the Phi Tau Phi scholastic honor society. He has served as a Guest Editor of special issues for the Journal of Applied Science and Engineering.

Yun-Fu Liu (S'09) was born in Hualien, Taiwan, on October 30, 1984. He received the M.S.E.E. degree from the Department of Electrical Engineering, Chang Gung University, Taoyuan, Taiwan, in 2009. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan.

He was a Visiting Scholar with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA, in 2012. His research interests include computer vision, machine learning, digital halftoning, steganography, image compression, and enhancement.

Mr. Liu is a Member of the IEEE Signal Processing Society.

Min-Hsiung Shih was born in Kaohsiung, Taiwan, on December 25, 1987. He received the B.S. degree from the Department of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology, Kaohsiung, Taiwan, in 2010. He is currently pursuing the master's degree with the Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan.

His research interests include pattern recognition and intelligent surveillance systems.

Cheng-Hsin Chang was born in Taipei, Taiwan, on September 4, 1990. He received the B.S.E.E. degree from National Taiwan University of Science and Technology, Taipei, Taiwan, in 2012, where he is currently pursuing the master's degree with the Department of Electrical Engineering.

His research interests include video synopsis, object tracking, and intelligent surveillance systems.

Jing-Yu Wu was born in Nantou, Taiwan, on October 29, 1990. She received the B.S. degree from the Department of Electronic Engineering and the B.A. degree from the Department of Applied Foreign Language, National Taiwan University of Science and Technology, Taipei, Taiwan, in 2012, where she is currently working toward the master's degree with the Department of Electrical Engineering.

Her research interests include behavior analysis.

