Electron. Mater. Lett., Vol. 9(S): pp. 33-38 (2013)
Adaptive Background Model for Non-Static Background Subtraction by
Estimation of the Color Change Ratio
Jeisung Lee,1 Minkyu Cheon,1 Chang-Ho Hyun,2 Hyukmin Eum,1 and Mignon Park1,*
1School of Electrical & Electronic Engineering, Yonsei University, Seoul 120-749, Korea
2Division of Electrical Electronic and Control Engineering, Kongju National University, Cheonan, Chungnam 331-717, Korea
(received date: 15 April 2013 / accepted date: 16 July 2013 / published date: 10 October 2013)
DOI: 10.1007/s13391-013-3172-5
*Corresponding author: [email protected]
©KIM and Springer
Background modeling, a preliminary processing step for foreground detection, is a challenging task because of the complexity and variety of background regions and unexpected scenarios such as sudden illumination changes, waving trees, rippling water, etc. In this work, we develop a pixel-based background modeling method that uses a probabilistic approach by means of changing color sequences. This method uses two background models in tandem. The first model uses a static background, which is obtained via a probabilistic approach and is a standard from which the foreground is extracted. The second model uses an adaptive background, which is modeled by the degree of color change. This background functions as an additional standard from which the foreground is extracted and is appropriate for eliminating non-static background elements. These models enable the developed method to automatically adapt to various environments. The algorithm was tested on various video sequences and its performance was evaluated by comparison with other state-of-the-art background subtraction methods.
Keywords: background modeling, foreground extraction, adaptive background, video surveillance
1. INTRODUCTION
Background subtraction is a key task in intelligent video
surveillance systems. Background subtraction methods play
an important role in many machine vision applications, such
as object detection and video surveillance systems (e.g.,
abnormal behavior detection or traffic flow monitoring).
Typically, each pixel is searched and compared systematically
with a predefined object dataset to detect or track an object.
However, searching every pixel requires considerable
computational time; consequently, background subtraction
methods are often used to reduce the region searched,
thereby improving computational performance. The use of
background subtraction methods to separate foreground
objects significantly affects the performance of the overall
vision system; therefore, a good background subtraction
method is critical for efficient overall system performance.
However, many challenges in background modeling remain
to be overcome. Backgrounds are generally non-static, often
including motion from swaying trees and curtains, escalators,
and rippling water surfaces. Illumination changes are also
frequent, occurring either with the passage of time or when the
sun is covered by clouds. The background model should be
able to adapt to such environmental changes. The background
can also be changed by objects moving into and staying in
the background, e.g., a car that has been parked for a long
period of time should be considered as part of the
background.
Pattern analysis is widely used in various fields.[1] Most
background subtraction methods are also based on pattern
analysis. A popular method for non-stationary background
subtraction is the mixture of Gaussians (MOG) method.[2,3]
MOG is used to resolve non-static background problems.
This approach is mathematically sound and can be applied to
various environments. However, because the number of
Gaussian models in each pixel must be selected before
implementation, a fixed number of Gaussian models cannot
always adapt to different environments. The process of
modeling and updating the Gaussian models also takes a
long time. Chiu et al.[4] developed a probabilistic approach
for background subtraction to extract the foreground for
each image environment using the color distribution. This
algorithm is very fast and robust. However, the algorithm
does not consider dynamic background environments and
thus is only effective for static background environments.
Reddy et al. effectively used a Markov random field for background
estimation;[5] however, this method is more appropriate for
indoor environments. Kim et al.[6] developed a codebook
model that quantizes sample background values for each
pixel into codebooks that represent a compressed form of the
background model. The codewords that are absent from the
sequence for extended periods of time are eliminated from
the codebook model, and new images that have been
stationary for some time are quantized into the codebooks.
This algorithm is not especially fast but is very effective for
dynamic backgrounds. Maddalena et al.[7] developed an
approach based on a self-organizing feature map that has
been widely applied in human image processing and
implemented more generally in cognitive science. The
algorithm performs well and at faster speeds than the
codebook scheme, but many of its parameters must be
manually selected for each individual video environment;
Elgammal et al.[8] and Ianasi et al.[9] developed non-parametric
methods to address this drawback. The kernel density
estimation (KDE) method is also a very effective method for
modeling backgrounds. This approach effectively adapts to
environments automatically but places a heavy memory
burden on the system. Varcheie et al.[10] combined a region-
based method using color histograms and texture
information with the Gaussian mixture model. This approach
performs well but is excessively complex. Heikkila et al.[11]
used an adaptive local binary pattern (LBP) to extract
features from an image by comparing neighboring pixel
values to a center pixel. It is difficult to distinguish between
areas with similar textures with this method, and the resulting
segmentation is also limited to a circular area around the
center pixel.
A non-stationary background subtraction method is
developed in this paper. In this method, the degree of color
change is computed in HSV color space rather than in RGB
color space because RGB color space is sensitive to illumination
changes; more robust background subtraction results are
obtained by analyzing the color and illumination in the input
sequences. This method uses two background models. The
first model uses a static background, which is obtained via a
probabilistic approach and is used as a standard to extract the
foreground. The other model uses an adaptive background
that is continuously updated with the current input image if
the degree of color change is below the set thresholds. This
background works as an additional standard to extract the
foreground. The adaptive background model enables suddenly
changing backgrounds to be effectively subtracted. The two
backgrounds are used together to subtract non-stationary
backgrounds in a robust manner. We evaluate the performance
of the developed method by comparison with existing state-
of-the-art methods. We also show that the developed algorithm
is less complex and produces more accurate results than the
other methods.
2. BACKGROUND MODELING AND FOREGROUND EXTRACTION METHOD
2.1 Static Background Model
The developed method uses a distance classifier to classify
the input values of each pixel over Th_t frames. The
stochastically high components of each pixel are then
classified as static background pixels. Let us assume that ft(x,
y, k) is the tth frame in an image sequence, where x and y are
the pixel coordinates and k is the color dimension. Each
pixel in ft(x, y, k) is composed of three components: the hue
(ft(x, y, 1)), the saturation (ft(x, y, 2)), and the illumination
(ft(x, y, 3)). Following pixel classification, the parameters
H(x, y, nxy), S(x, y, nxy) and V(x, y, nxy) are defined as the hue,
the saturation, and the illumination, respectively, of the nxyth
cluster located at (x, y). The parameter nxy records the cluster
number located at (x, y). The parameters C(x, y, nxy) and T(x,
t, nxy) are the pixel number and the most recent time that a
pixel appeared, respectively, at the nxyth cluster located at (x,
y). The first input frame f1(x, y, k) is captured as follows:
$$
\begin{cases}
H(x, y, n_{xy}) = f_1(x, y, 1) \\
S(x, y, n_{xy}) = f_1(x, y, 2) \\
V(x, y, n_{xy}) = f_1(x, y, 3) \\
C(x, y, n_{xy}) = 1 \\
T(x, y, n_{xy}) = 1
\end{cases}
\quad \text{where} \quad
\begin{cases}
x = 0 \sim M-1 \\
y = 0 \sim N-1 \\
n_{xy} = 1
\end{cases}
\tag{1}
$$
Each input frame is M × N pixels in size. The next input
frame ft(x, y, k) is captured and assigned to a cluster using
a distance classifier, where t = t + 1. The color difference
Dt(x, y, nxy) between the input pixel and each cluster located
at (x, y) is calculated as follows:
$$
D_t(x, y, n_{xy}) = \left| f_t(x, y, 1) - H(x, y, n_{xy}) \right|
 + \left| f_t(x, y, 2) - S(x, y, n_{xy}) \right|
 + \left| f_t(x, y, 3) - V(x, y, n_{xy}) \right|, \quad \forall n_{xy}
\tag{2}
$$
Next, the closest cluster that satisfies the distance threshold is found:
$$
D_t(x, y, m) \le Th\_D, \quad \text{where } m \subset n_{xy}
\tag{3}
$$
The value of Th_D affects the number of clusters during
the background modeling. The number of clusters and the
computational complexity decrease with increasing values
of Th_D, but large values of Th_D also greatly distort the
colors of the background pixels. In this paper, we set Th_D = 15.
To increase the speed of the algorithm, we relocated the
most recently updated cluster. If a matched mth cluster
exists, the input pixel belongs to the mth cluster, and the
parameters C(x, y, m) and T(x, y, m) are updated as follows:
$$
\begin{cases}
C(x, y, m) = C(x, y, m) + 1 \\
T(x, y, m) = t
\end{cases}
\tag{4}
$$
Otherwise, the input pixel is assigned to a new cluster
located at (x, y). The new color cluster is defined by:
$$
n_{xy} = n_{xy} + 1 \quad \text{and} \quad
\begin{cases}
H(x, y, n_{xy}) = f_t(x, y, 1) \\
S(x, y, n_{xy}) = f_t(x, y, 2) \\
V(x, y, n_{xy}) = f_t(x, y, 3) \\
C(x, y, n_{xy}) = 1 \\
T(x, y, n_{xy}) = 1
\end{cases}
\tag{5}
$$
In every frame, the clusters that are not matched with input
pixels for longer than Th_t/2 are assumed to be non-
background information and are deleted. The static background
is updated after a time period Th_t. This study sets Th_t =
120. A static background model can be formulated by
finding the cluster with the maximum number of pixels after
Th_t frames:
$$
C(x, y, m_c) = \max_{\forall n_{xy}} C(x, y, n_{xy}), \quad \text{where } m_c \subset n_{xy}
\tag{6}
$$
The cluster with the maximum number of pixels has a high
probability of being background. Thus, the static background
can be obtained as follows:
$$
\begin{cases}
SBack(x, y, 1) = H(x, y, m_c) \\
SBack(x, y, 2) = S(x, y, m_c) \\
SBack(x, y, 3) = V(x, y, m_c)
\end{cases}
\tag{7}
$$
The static background model is denoted by SBack(x, y, k),
which consists of the hue (SBack(x, y, 1)), the saturation
(SBack(x, y, 2)), and the illumination (SBack(x, y, 3)). The
static background model is updated every Th_t frames in the
same way.
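A minimal sketch of the per-pixel bookkeeping behind Eqs. (4)-(7), continuing the match_cluster example above, is given below. It is only illustrative; in particular, we set the last-seen time T of a newly created cluster to the current frame t so that new clusters are not pruned immediately, whereas Eq. (5) writes T = 1.

```python
TH_T = 120  # static background update period used in the paper

def update_clusters(pixel_hsv, clusters, t, th_d=15, th_t=TH_T):
    """Assign the pixel to a matching cluster (Eq. 4) or create a new
    cluster (Eq. 5), then prune clusters unseen for more than th_t/2 frames."""
    m = match_cluster(pixel_hsv, clusters, th_d)
    if m is not None:
        clusters[m]["C"] += 1      # Eq. (4): increment the pixel count
        clusters[m]["T"] = t       # record the most recent matching time
    else:
        # Eq. (5): create a new cluster; T = t here is our assumption
        clusters.append({"H": pixel_hsv[0], "S": pixel_hsv[1],
                         "V": pixel_hsv[2], "C": 1, "T": t})
    # delete clusters not matched for longer than Th_t / 2 frames
    clusters[:] = [c for c in clusters if t - c["T"] <= th_t / 2]

def static_background(clusters):
    """Eqs. (6)-(7): the cluster with the largest count becomes SBack."""
    best = max(clusters, key=lambda c: c["C"])
    return (best["H"], best["S"], best["V"])
```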
2.2 Adaptive Background Model
The developed algorithm uses the average change in the
color sequence to extract the foreground. First, the adaptive
background model is determined by calculating; the degree
of color change fdistt(x, y, k) from the color difference
between the current and prior images. If the color difference
continuously increases or decreases (i.e., keeps the same sign),
then the total difference value becomes the sum of the previous
and current color change values.
$$
\begin{aligned}
& ftmp = f_t(x, y, k) - f_{t-1}(x, y, k) \\
& \text{if } \bigl( fdist_{t-1}(x, y, k) > 0 \text{ and } ftmp > 0 \bigr)
 \text{ or } \bigl( fdist_{t-1}(x, y, k) < 0 \text{ and } ftmp < 0 \bigr) \\
& \qquad \text{then } fdist_t(x, y, k) = ftmp + fdist_{t-1}(x, y, k) \\
& \qquad \text{else } fdist_t(x, y, k) = ftmp
\end{aligned}
\tag{8}
$$
where k = 1, 2, 3.
We use the color difference to obtain the average color
change, AvrDt(x, y, k):
$$
AvrD_t(x, y, k) = (1 - \alpha) \times AvrD_{t-1}(x, y, k) + \alpha \times fdist_t(x, y, k), \quad \forall k
\tag{9}
$$
where α is the learning rate, which is set to 0.01 in this study.
The parameter AvrDt(x, y, k) is initialized by the first color
difference fdist1 (x, y, k) and is continuously updated by the
process above. The adaptive background ABack(x, y, k) is
initialized by the first frame, f1(x, y, k), in the video sequence
and is updated after extracting the foreground from every
frame. The parameter AvrDt(x, y, k) is used to update the
adaptive background model, as detailed in section 2.3.
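As a rough illustration of Eqs. (8) and (9), the accumulated color change and its running average can be computed for a whole frame with NumPy as follows. The array names, shapes, and the vectorized formulation are our own assumptions; the paper describes the computation per pixel.

```python
import numpy as np

ALPHA = 0.01  # learning rate used in the paper

def update_color_change(frame, prev_frame, fdist_prev, avr_d_prev, alpha=ALPHA):
    """Illustrative, vectorized form of Eqs. (8)-(9); all arrays are
    (M, N, 3) HSV arrays."""
    ftmp = frame.astype(np.float32) - prev_frame.astype(np.float32)
    # Eq. (8): accumulate while the change keeps the same sign, else reset
    same_sign = ((fdist_prev > 0) & (ftmp > 0)) | ((fdist_prev < 0) & (ftmp < 0))
    fdist = np.where(same_sign, ftmp + fdist_prev, ftmp)
    # Eq. (9): exponential running average of the color change
    avr_d = (1.0 - alpha) * avr_d_prev + alpha * fdist
    return fdist, avr_d
```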
2.3 Foreground Extraction and Background Updates
The static background and the adaptive background are
initialized using the initial input frame. After obtaining the
initial background models, the background is updated using
the background modeling processes. The foreground extraction
procedure consists of two steps. First, the developed method
is used to subtract the input video sequence ft(x, y, k) from the
static background SBack(x, y, k). If the difference is larger
than Th_D for each pixel, the input pixel is classified as
temporary foreground.
$$
\text{if } \left( \sum_{k=1}^{3} \left| f_t(x, y, k) - SBack(x, y, k) \right| \right) > Th\_D
\;\; \text{then } Fg(x, y) = 1
\;\; \text{else } Fg(x, y) = 0,\; ABack(x, y, k) = f_t(x, y, k) \;\; \forall k
\tag{10}
$$
The value of Fg(x, y) is used to classify whether a pixel of
the input frame at (x, y) is a foreground pixel. If Fg(x, y) is 0,
then the pixel is classified as background. If the input values
are classified as background, then the adaptive background
is updated with the current input values. If Fg(x, y) is 1, then
the pixel of the input frame at (x, y) is classified as
foreground. In the developed method, the input pixel is
compared with ABack(x, y, k) to obtain a more robust result.
$$
Sg = \sum_{k=1}^{3} \frac{\left| f_t(x, y, k) - ABack(x, y, k) \right|}{AvrD_t(x, y, k)}
\tag{11}
$$
$$
\text{if } (Sg < Th\_R) \text{ and } (Fg(x, y) = 1)
\;\; \text{then } Fg(x, y) = 0,\; ABack(x, y, k) = f_t(x, y, k) \;\; \forall k
\tag{12}
$$
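The two-step test of Eqs. (10)-(12) can be sketched in the same vectorized style. The absolute value and the small epsilon in the denominator of the Sg term are our additions to keep the sketch numerically safe; the paper does not state how near-zero values of AvrD are handled.

```python
import numpy as np

TH_D, TH_R = 15.0, 4.5  # thresholds used in the paper

def extract_foreground(frame, s_back, a_back, avr_d, th_d=TH_D, th_r=TH_R, eps=1e-3):
    """Illustrative form of Eqs. (10)-(12); frame, s_back, a_back, and
    avr_d are (M, N, 3) HSV arrays. Returns the foreground mask and the
    updated adaptive background."""
    frame = frame.astype(np.float32)
    # Eq. (10): temporary foreground from the static background
    fg = np.sum(np.abs(frame - s_back), axis=2) > th_d
    # Eq. (11): ratio of the color change to the average color change
    sg = np.sum(np.abs(frame - a_back) / (np.abs(avr_d) + eps), axis=2)
    # Eq. (12): reclassify as background where the ratio is small
    fg &= ~(sg < th_r)
    # pixels finally classified as background update the adaptive background
    a_back = np.where(fg[..., None], a_back, frame)
    return fg, a_back
```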
The threshold Th_R is obtained experimentally. If the
input values are classified as background, then the adaptive
background is updated with the current input values. Figure
1 shows an example of an input sequence and the result of
using an adaptive background model. In the Classification
Result, the blue line represents Sg and the black line
represents Th_R, a threshold for extracting the foreground.
Figure 2 illustrates how the adaptive background is used to
extract foreground. In Fig. 2(a), the saturation and the
illumination change but the hue remains relatively flat; in
this case, the foreground can be extracted by determining the
change in the saturation and the illumination. Figure 2(b)
shows a scenario in which the hue changes but the
illumination and the saturation remain relatively unchanged;
in this case, the foreground can be extracted by detecting the change in the hue. These processes
are automatically executed using the developed method.
Figure 3 shows the performance of the developed method
for various Th_R settings. We used three performance
measures to evaluate the developed method: the recall, the
precision, and the F-measure. The recall is defined as the
ratio of the number of correctly detected foreground pixels to
the number of true foreground pixels; this measure shows the
rate at which the algorithm classifies true foreground pixels
as foreground pixels. The precision is defined as the ratio of
the number of correctly detected foreground pixels to the total
number of pixels designated as foreground; this measure indicates
how many of the pixels classified as foreground pixels are true
foreground pixels. There is usually a trade-off between the
recall and the precision; optimizing the recall usually
deteriorates the precision and vice versa. Because of this
trade-off, we used the F1 measure,[12] which considers the
precision and the recall simultaneously, to evaluate the
overall performance.
The F1 measure is expressed as:
$$
F_1(r, p) = \frac{2pr}{p + r}
\tag{13}
$$
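For reference, the three measures can be computed from a detected foreground mask and a ground-truth mask as in the short helper below (our own illustration).

```python
import numpy as np

def f1_measure(detected, ground_truth):
    """Recall, precision, and the F1 measure of Eq. (13) for binary masks."""
    detected = detected.astype(bool)
    ground_truth = ground_truth.astype(bool)
    tp = np.count_nonzero(detected & ground_truth)        # correctly detected foreground
    recall = tp / max(np.count_nonzero(ground_truth), 1)  # TP / true foreground
    precision = tp / max(np.count_nonzero(detected), 1)   # TP / detected foreground
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)  # Eq. (13)
```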
The F1 measure is maximized when the recall and the
precision are both high. Figure 3 shows that when Th_R is
approximately 4.5, the recall and precision values are equally
high and F1 is maximized. We set Th_R to 4.5 in this study.

Fig. 1. An example of an input sequence and a result from using the adaptive background model; the blue line represents the hue, the green line represents the saturation, and the red line represents the sequence illumination. In the Classification Result, the blue line represents the sum of the ratio of the difference between the input frame and the adaptive background to the average color change, and Th_R is a threshold for foreground extraction.

Fig. 2. A sequence illustrating the use of the adaptive background for extracting foreground: (a) the saturation and illumination change but the hue remains relatively flat, and (b) the hue changes, but the illumination and saturation remain relatively unchanged.

Fig. 3. Performance of the developed method for various Th_R settings.
Even if comparing the input sequence with the static
background for a pixel results in the pixel being classified as
foreground, the pixel can be reclassified as background by
the aforementioned process that compares the pixel with the
adaptive background. The backgrounds are updated after
extracting the foreground. If the comparison between an input
pixel and the static background at (x, y) classifies the pixel as
foreground, a new cluster is created for the input pixel, or the
count of the cluster to which the input pixel belongs is increased.
Every Th_t frames, each static background pixel is updated to
the cluster with the maximum count. Each adaptive
background pixel is updated to the current input pixel if the
pixel is classified as background. Otherwise, the adaptive
background remains unchanged.
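A possible per-frame driver combining the sketches above is shown below. The ordering of the updates within a frame and the ModelState container are our own assumptions, and the periodic Th_t rebuild of the static background model is omitted for brevity.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ModelState:
    s_back: np.ndarray      # static background SBack, (M, N, 3) HSV
    a_back: np.ndarray      # adaptive background ABack, (M, N, 3) HSV
    avr_d: np.ndarray       # average color change AvrD, (M, N, 3)
    fdist: np.ndarray       # accumulated color change fdist, (M, N, 3)
    prev_frame: np.ndarray  # previous HSV frame

def process_frame(frame_hsv, state):
    """One illustrative per-frame step using the earlier sketches."""
    # accumulate the color change statistics (Eqs. 8-9)
    state.fdist, state.avr_d = update_color_change(
        frame_hsv, state.prev_frame, state.fdist, state.avr_d)
    # two-step foreground test against both backgrounds (Eqs. 10-12)
    fg, state.a_back = extract_foreground(
        frame_hsv, state.s_back, state.a_back, state.avr_d)
    state.prev_frame = frame_hsv
    return fg
```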
3. EXPERIMENTAL RESULTS
The performance of the developed method was evaluated
using video sequences from the Li and Wallflower datasets
(the Li dataset is available at http://perception.i2r.a-star.
edu.sg/bk_model/bk_index.html, and the Wallflower dataset
is available at http://research.microsoft.com/en-us/um/people/
jckrumm/WallFlower/TestImages.htm). These two datasets
are well-known and are often used to test background
subtraction algorithms. We compared the results obtained
using the developed method with other state-of-the-art
methods to evaluate the performance of the developed
method. Figure 4 shows the recall, precision and F1 results.
In terms of the recall results, the developed method outperformed
the other methods, indicating that the developed method
detected a larger fraction of the true foreground pixels.
In terms of the precision results, the performance
of the developed method was not clearly superior to the other
methods and exhibited a relatively average performance.
However, the developed method outperformed the other
methods in terms of the F1 measure. The recall and the
precision measure are important performance indicators, but
the F1 measure is the most important measure because it
incorporates both the precision and recall results and
considers their relative strengths simultaneously.
To compare the developed method with existing state-of-
the-art methods, we tested the other methods using the
parameters presented in the respective studies, or we found
appropriate parameters by repeating the test to obtain the
best results.

Fig. 4. The recall, precision, and F1 results obtained using the developed method and other state-of-the-art methods; the AVG columns represent the average values of the results in all of the datasets.

Fig. 5. Background subtraction results: the first column shows the original images in each video sequence; the second column shows the ground truth of the original images; the third column shows the result using the developed method; and the fourth column onwards show the images resulting from use of other methods.

If default parameters were presented in a study or
if the algorithms were available on the internet, we used the
default parameters in the test. The Gaussian mixture model
was implemented using OpenCV in its default mode.
ViBe software[13] was also used to test the ViBe algorithm in
the default mode. Barnich et al.[13] also showed that the best
results for the ViBe method were generally obtained using
the default mode. The codebook algorithm was tested using
a program found on the internet.[6] We repeated the test many
times and varied the parameters to obtain the best results.
Figure 5 shows the results of the developed method and the
existing state-of-the-art methods. We thus confirmed that the
developed method yielded better results than the other
methods shown in the figure. The performance of the
methods was compared by implementing the developed
method using C programming language on a 2.53 GHz CPU
with 2 GB of RAM. The classification speed of the developed
method was approximately 93.56 frames per second for a
frame size of 160 × 128 pixels. This result shows that the developed
method can be used for real-time processing.
4. CONCLUSIONS
We developed a new robust background subtraction
method in this study. This method used two background
models, a static background model and an adaptive background
model. The temporary foreground was first extracted from
the static background. The adaptive background was then
used to delete the dynamic background components, such as
waving trees and rippling waters, which were detected using
the degree of color change. The two backgrounds were used
together to subtract non-stationary backgrounds in a robust
manner. The first frame was used as the initial background;
thus, the backgrounds required periodic updating before
stabilization. In our experiment, the background model
stabilized after approximately 120 frames. The recall, the
precision, and the F-measure were used as performance
measures of the developed method and showed that the
developed method generally produced better results than
existing methods for most of the sequences. In our
experiments, the developed method outperformed existing
methods and was shown to be applicable to real-time processing.
ACKNOWLEDGEMENTS
This research was supported by the Basic Science Research
Program of the National Research Foundation of Korea
(NRF), which is funded by the Ministry of Education,
Science and Technology (2012-0007462).
REFERENCES
1. D.-R. Jung, J. Kim, C. Nahm, H. Choi, S. Nam, and B.
Park, Electron. Mater. Lett. 7, 185 (2011).
2. C. Stauffer and W. E. L. Grimson, in Proc. IEEE Conf. on
Computer Vision and Pattern Recognition, p. 246, Santa Barbara,
CA, USA (1998).
3. P. KaewTraKulPong and R. Bowden, In Proc. 2nd
European Workshop on Adv. Video-Based Surveillance
Systems (AVBS01) (2001).
4. C. C. Chiu, M. Y. Ku, and L. W. Liang, IEEE Trans. Circ.
Syst. Vid. 20, 518 (2010).
5. V. Reddy, C. Sanderson, and B. C. Lovell, Eurasip J.
Image Vide. [DOI:10.1155/2011/164956] (2011).
6. K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis,
Real-Time Imaging 11, 172 (2005).
7. L. Maddalena and A. Petrosino, IEEE Trans. on Image
Process 17, 1168 (2008).
8. A. Elgammal, R. Duraiswami, D. Harwood, and L. S.
Davis, Proc. of IEEE, 90, 1151 (2002).
9. C. Ianasi, V. Gui, C. Toma, and D. Pescaru, Facta Universitatis 18, 127 (2005).
10. P. D. Z. Varcheie, M. Sills-Lavoie, and G. A. Bilodeau, Sensors-Basel 10, 1041 (2010).
11. M. Heikkila and M. Pietikainen, IEEE Trans. Pattern Anal.
28, 657 (2006).
12. Y. Yang, Inf. Retr. 1, 69 (1999).
13. O. Barnich and M. Van Droogenbroeck, IEEE Trans. on
Image Process 20, 1709 (2011).