Electron. Mater. Lett., Vol. 9(S): pp. 33-38 (2013)
Adaptive Background Model for Non-Static Background Subtraction by
Estimation of the Color Change Ratio
Jeisung Lee,1 Minkyu Cheon,1 Chang-Ho Hyun,2 Hyukmin Eum,1 and Mignon Park1,*
1School of Electrical & Electronic Engineering, Yonsei University, Seoul 120-749, Korea
2Division of Electrical Electronic and Control Engineering, Kongju National University, Cheonan, Chungnam 331-717, Korea
(received date: 15 April 2013 / accepted date: 16 July 2013 / published date: 10 October 2013)
DOI: 10.1007/s13391-013-3172-5
*Corresponding author: [email protected]
©KIM and Springer
Background modeling, a preliminary processing step for foreground detection, is a challenging task because of the complexity and variety of background regions and unexpected scenarios such as sudden illumination changes, waving trees, rippling water, etc. In this work, we develop a pixel-based background modeling method that uses a probabilistic approach by means of changing color sequences. This method uses two background models in tandem. The first model uses a static background, which is obtained via a probabilistic approach and is a standard from which the foreground is extracted. The second model uses an adaptive background, which is modeled by the degree of color change. This background functions as an additional standard from which the foreground is extracted and is appropriate for eliminating non-static background elements. These models enable the developed method to automatically adapt to various environments. The algorithm was tested on various video sequences and its performance was evaluated by comparison with other state-of-the-art background subtraction methods.
Keywords: background modeling, foreground extraction, adaptive background, video surveillance
1. INTRODUCTION
Background subtraction is a key task in intelligent video
surveillance systems. Background subtraction methods play
an important role in many machine vision applications, such
as object detection and video surveillance systems (e.g.,
abnormal behavior detection or traffic flow monitoring).
Typically, each pixel is searched and compared systematically
with a predefined object dataset to detect or track an object.
However, searching every pixel requires considerable
computational time; consequently, background subtraction
methods are often used to reduce the region searched,
thereby improving computational performance. The use of
background subtraction methods to separate foreground
objects significantly affects the performance of the overall
vision system; therefore, a good background subtraction
method is critical for efficient overall system performance.
However, many challenges in background modeling remain
to be overcome. Backgrounds are generally non-static, often
including motion from swaying trees and curtains, escalators,
and rippling water surfaces. Illumination changes are also
frequent, occurring either with the passage of time or when the
sun is covered by clouds. The background model should be
able to adapt to such environmental changes. The background
can also be changed by objects moving into and staying in
the background, e.g., a car that has been parked for a long
period of time should be considered as part of the
background.
Pattern analysis is widely used in various fields.[1] Most
background subtraction methods are also based on pattern
analysis. A popular method for non-stationary background
subtraction is the mixture of Gaussians (MOG) method.[2,3]
MOG is used to resolve non-static background problems.
This approach is mathematically sound and can be applied to
various environments. However, because the number of
Gaussian models in each pixel must be selected before
implementation, a fixed number of Gaussian models cannot
always adapt to different environments. The process of
modeling and updating the Gaussian models also takes a
long time. Chiu et al.[4] developed a probabilistic approach
for background subtraction to extract the foreground for
each image environment using the color distribution. This
algorithm is very fast and robust. However, the algorithm
does not consider dynamic background environments and
thus is only effective for static background environments.
Reddy et al. effectively used a Markov random field for background
estimation;[5] however, this method is more appropriate for
indoor environments. Kim et al.[6] developed a codebook
model that quantizes sample background values for each
pixel into codebooks that represent a compressed form of the
background model. The codewords that are absent from the
sequence for extended periods of time are eliminated from
the codebook model, and new images that have been
stationary for some time are quantized into the codebooks.
This algorithm is not especially fast but is very effective for
dynamic backgrounds. Maddalena et al.[7] developed an
approach based on a self-organizing feature map that has
been widely applied in human image processing and
implemented more generally in cognitive science. The
algorithm performs well and at faster speeds than the
codebook scheme, but many of its parameters must be
manually selected for each individual video environment;
Elgammal et al.[8] and Ianasi et al.[9] developed non-parametric
methods to address this drawback. The kernel density
estimation (KDE) method is also a very effective method for
modeling backgrounds. This approach effectively adapts to
environments automatically but places a heavy memory
burden on the system. Varcheie et al.[10] combined a region-
based method using color histograms and texture
information with the Gaussian mixture model. This approach
performs well but is excessively complex. Heikkila et al.[11]
used an adaptive local binary pattern (LBP) to extract
features from an image by comparing neighboring pixel
values to a center pixel. It is difficult to distinguish between
areas with similar textures with this method, and the resulting
segmentation is also limited to a circular area around the
center pixel.
A non-stationary background subtraction method is
developed in this paper. In this method, the degree of color
change is computed in HSV color space rather than in RGB
color space because RGB color space is sensitive to illumination
changes; more robust background subtraction results are
obtained by analyzing the color and illumination in the input
sequences. This method uses two background models. The
first model uses a static background, which is obtained via a
probabilistic approach and is used as a standard to extract the
foreground. The other model uses an adaptive background
that is continuously updated with the current input image if
the degree of color change is below the set thresholds. This
background works as an additional standard to extract the
foreground. The adaptive background model enables suddenly
changing backgrounds to be effectively subtracted. The two
backgrounds are used together to subtract non-stationary
backgrounds in a robust manner. We evaluate the performance
of the developed method by comparison with existing state-
of-the-art methods. We also show that the developed algorithm
is less complex and produces more accurate results than the
other methods.
2. BACKGROUND MODELING AND FOREGROUND EXTRACTION METHOD
2.1 Static Background Model
The developed method uses a distance classifier to classify
the input values of each pixel over Th_t frames. The
stochastically high components of each pixel are then
classified as static background pixels. Let us assume that ft(x,
y, k) is the tth frame in an image sequence, where x and y are
the pixel coordinates and k is the color dimension. Each
pixel in ft(x, y, k) is composed of three components: the hue
(ft(x, y, 1)), the saturation (ft(x, y, 2)), and the illumination
(ft(x, y, 3)). Following pixel classification, the parameters
H(x, y, nxy), S(x, y, nxy) and V(x, y, nxy) are defined as the hue,
the saturation, and the illumination, respectively, of the nxyth
cluster located at (x, y). The parameter nxy records the cluster
number located at (x, y). The parameters C(x, y, nxy) and T(x,
t, nxy) are the pixel number and the most recent time that a
pixel appeared, respectively, at the nxyth cluster located at (x,
y). The first input frame f1(x, y, k) is captured as follows:
$$
\begin{cases}
H(x, y, n_{xy}) = f_1(x, y, 1) \\
S(x, y, n_{xy}) = f_1(x, y, 2) \\
V(x, y, n_{xy}) = f_1(x, y, 3) \\
C(x, y, n_{xy}) = 1 \\
T(x, y, n_{xy}) = 1
\end{cases}
\quad \text{where} \quad
\begin{cases}
x = 0 \sim M-1 \\
y = 0 \sim N-1 \\
n_{xy} = 1
\end{cases}
\tag{1}
$$
Each input frame is M × N pixels in size. The next input
frame ft(x, y, k) is captured and assigned to a cluster using
a distance classifier, where t = t + 1. The color difference
Dt(x, y, nxy) between the input pixel and each cluster located
at (x, y) is calculated as follows:
$$
D_t(x, y, n_{xy}) = \left| f_t(x, y, 1) - H(x, y, n_{xy}) \right|
 + \left| f_t(x, y, 2) - S(x, y, n_{xy}) \right|
 + \left| f_t(x, y, 3) - V(x, y, n_{xy}) \right|, \quad \forall n_{xy}
\tag{2}
$$
Next, the closest cluster that satisfies the distance threshold is found:
$$
D_t(x, y, m) \le Th\_D, \quad \text{where } m \subset n_{xy}
\tag{3}
$$
The value of Th_D affects the number of clusters during
the background modeling. The number of clusters and the
computational complexity decrease with increasing values
of Th_D, but large values of Th_D also greatly distort the
colors of the background pixels. In this paper, we set Th_D = 15.
To increase the speed of the algorithm, we relocated the
most recently updated cluster. If a matched mth cluster
exists, the input pixel belongs to the mth cluster, and the
parameters C(x, y, m) and T(x, y, m) are updated as follows:
$$
\begin{cases}
C(x, y, m) = C(x, y, m) + 1 \\
T(x, y, m) = t
\end{cases}
\tag{4}
$$
Otherwise, the input pixel is assigned to a new cluster
located at (x, y). The new color cluster is defined by:
$$
n_{xy} = n_{xy} + 1 \quad \text{and} \quad
\begin{cases}
H(x, y, n_{xy}) = f_t(x, y, 1) \\
S(x, y, n_{xy}) = f_t(x, y, 2) \\
V(x, y, n_{xy}) = f_t(x, y, 3) \\
C(x, y, n_{xy}) = 1 \\
T(x, y, n_{xy}) = 1
\end{cases}
\tag{5}
$$
In every frame, the clusters that are not matched with input
pixels for longer than Th_t/2 are assumed to be non-
background information and are deleted. The static background
is updated after a time period Th_t. This study sets Th_t =
120. A static background model can be formulated by
finding the cluster with the maximum number of pixels after
Th_t frames:
$$
C(x, y, m_c) = \max_{\forall n_{xy}} C(x, y, n_{xy}), \quad \text{where } m_c \subset n_{xy}
\tag{6}
$$
The cluster with the maximum number of pixels has a high
probability of being background. Thus, the static background
can be obtained as follows:
$$
\begin{cases}
SBack(x, y, 1) = H(x, y, m_c) \\
SBack(x, y, 2) = S(x, y, m_c) \\
SBack(x, y, 3) = V(x, y, m_c)
\end{cases}
\tag{7}
$$
The static background model is denoted by SBack(x, y, k),
which consists of the hue (SBack(x, y, 1)), the saturation
(SBack(x, y, 2)), and the illumination (SBack(x, y, 3)). The
static background model is updated every Th_t frames in the
same way.
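A minimal sketch of the per-pixel bookkeeping behind Eqs. (4)-(7), continuing the match_cluster example above, is given below. It is only illustrative; in particular, we set the last-seen time T of a newly created cluster to the current frame t so that new clusters are not pruned immediately, whereas Eq. (5) writes T = 1.

```python
TH_T = 120  # static background update period used in the paper

def update_clusters(pixel_hsv, clusters, t, th_d=15, th_t=TH_T):
    """Assign the pixel to a matching cluster (Eq. 4) or create a new
    cluster (Eq. 5), then prune clusters unseen for more than th_t/2 frames."""
    m = match_cluster(pixel_hsv, clusters, th_d)
    if m is not None:
        clusters[m]["C"] += 1      # Eq. (4): increment the pixel count
        clusters[m]["T"] = t       # record the most recent matching time
    else:
        # Eq. (5): create a new cluster; T = t here is our assumption
        clusters.append({"H": pixel_hsv[0], "S": pixel_hsv[1],
                         "V": pixel_hsv[2], "C": 1, "T": t})
    # delete clusters not matched for longer than Th_t / 2 frames
    clusters[:] = [c for c in clusters if t - c["T"] <= th_t / 2]

def static_background(clusters):
    """Eqs. (6)-(7): the cluster with the largest count becomes SBack."""
    best = max(clusters, key=lambda c: c["C"])
    return (best["H"], best["S"], best["V"])
```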
2.2 Adaptive Background Model
The developed algorithm uses the average change in the
color sequence to extract the foreground. First, the adaptive
background model is determined by calculating; the degree
of color change fdistt(x, y, k) from the color difference
between the current and prior images. If the color difference
continuously increases or decreases (i.e., keeps the same sign),
then the total difference value becomes the sum of the previous
and current color change values.
$$
\begin{aligned}
& ftmp = f_t(x, y, k) - f_{t-1}(x, y, k) \\
& \text{if } \bigl( fdist_{t-1}(x, y, k) > 0 \text{ and } ftmp > 0 \bigr)
 \text{ or } \bigl( fdist_{t-1}(x, y, k) < 0 \text{ and } ftmp < 0 \bigr) \\
& \qquad \text{then } fdist_t(x, y, k) = ftmp + fdist_{t-1}(x, y, k) \\
& \qquad \text{else } fdist_t(x, y, k) = ftmp
\end{aligned}
\tag{8}
$$
where k = 1, 2, 3.
We use the color difference to obtain the average color
change, AvrDt(x, y, k):
$$
AvrD_t(x, y, k) = (1 - \alpha) \times AvrD_{t-1}(x, y, k) + \alpha \times fdist_t(x, y, k), \quad \forall k
\tag{9}
$$
where α is the learning rate, which is set to 0.01 in this study.
The parameter AvrDt(x, y, k) is initialized by the first color
difference fdist1 (x, y, k) and is continuously updated by the
process above. The adaptive background ABack(x, y, k) is
initialized by the first frame, f1(x, y, k), in the video sequence
and is updated after extracting the foreground from every
frame. The parameter AvrDt(x, y, k) is used to update the
adaptive background model, as detailed in section 2.3.
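As a rough illustration of Eqs. (8) and (9), the accumulated color change and its running average can be computed for a whole frame with NumPy as follows. The array names, shapes, and the vectorized formulation are our own assumptions; the paper describes the computation per pixel.

```python
import numpy as np

ALPHA = 0.01  # learning rate used in the paper

def update_color_change(frame, prev_frame, fdist_prev, avr_d_prev, alpha=ALPHA):
    """Illustrative, vectorized form of Eqs. (8)-(9); all arrays are
    (M, N, 3) HSV arrays."""
    ftmp = frame.astype(np.float32) - prev_frame.astype(np.float32)
    # Eq. (8): accumulate while the change keeps the same sign, else reset
    same_sign = ((fdist_prev > 0) & (ftmp > 0)) | ((fdist_prev < 0) & (ftmp < 0))
    fdist = np.where(same_sign, ftmp + fdist_prev, ftmp)
    # Eq. (9): exponential running average of the color change
    avr_d = (1.0 - alpha) * avr_d_prev + alpha * fdist
    return fdist, avr_d
```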
2.3 Foreground Extraction and Background Updates
The static background and the adaptive background are
initialized using the initial input frame. After obtaining the
initial background models, the background is updated using
the background modeling processes. The foreground extraction
procedure consists of two steps. First, the developed method
is used to subtract the input video sequence ft(x, y, k) from the
static background SBack(x, y, k). If the difference is larger
than Th_D for each pixel, the input pixel is classified as
temporary foreground.
$$
\text{if } \left( \sum_{k=1}^{3} \left| f_t(x, y, k) - SBack(x, y, k) \right| \right) > Th\_D
\;\; \text{then } Fg(x, y) = 1
\;\; \text{else } Fg(x, y) = 0,\; ABack(x, y, k) = f_t(x, y, k) \;\; \forall k
\tag{10}
$$
The value of Fg(x, y) is used to classify whether a pixel of
the input frame at (x, y) is a foreground pixel. If Fg(x, y) is 0,
then the pixel is classified as background. If the input values
are classified as background, then the adaptive background
is updated with the current input values. If Fg(x, y) is 1, then
the pixel of the input frame at (x, y) is classified as
foreground. In the developed method, the input pixel is
compared with ABack(x, y, k) to obtain a more robust result.
$$
Sg = \sum_{k=1}^{3} \frac{\left| f_t(x, y, k) - ABack(x, y, k) \right|}{AvrD_t(x, y, k)}
\tag{11}
$$
$$
\text{if } (Sg < Th\_R) \text{ and } (Fg(x, y) = 1)
\;\; \text{then } Fg(x, y) = 0,\; ABack(x, y, k) = f_t(x, y, k) \;\; \forall k
\tag{12}
$$
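The two-step test of Eqs. (10)-(12) can be sketched in the same vectorized style. The absolute value and the small epsilon in the denominator of the Sg term are our additions to keep the sketch numerically safe; the paper does not state how near-zero values of AvrD are handled.

```python
import numpy as np

TH_D, TH_R = 15.0, 4.5  # thresholds used in the paper

def extract_foreground(frame, s_back, a_back, avr_d, th_d=TH_D, th_r=TH_R, eps=1e-3):
    """Illustrative form of Eqs. (10)-(12); frame, s_back, a_back, and
    avr_d are (M, N, 3) HSV arrays. Returns the foreground mask and the
    updated adaptive background."""
    frame = frame.astype(np.float32)
    # Eq. (10): temporary foreground from the static background
    fg = np.sum(np.abs(frame - s_back), axis=2) > th_d
    # Eq. (11): ratio of the color change to the average color change
    sg = np.sum(np.abs(frame - a_back) / (np.abs(avr_d) + eps), axis=2)
    # Eq. (12): reclassify as background where the ratio is small
    fg &= ~(sg < th_r)
    # pixels finally classified as background update the adaptive background
    a_back = np.where(fg[..., None], a_back, frame)
    return fg, a_back
```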
The threshold Th_R is obtained experimentally. If the
input values are classified as background, then the adaptive
background is updated with the current input values. Figure
1 shows an example of an input sequence and the result of
using an adaptive background model. In the Classification
Result, the blue line represents Sg and the black line
represents Th_R, a threshold for extracting the foreground.
Figure 2 illustrates how the adaptive background is used to
extract foreground. In Fig. 2(a), the saturation and the
illumination change but the hue remains relatively flat; in
this case, the foreground can be extracted by determining the
change in the saturation and the illumination. Figure 2(b)
shows a scenario in which the hue changes but the
illumination and the saturation remain relatively unchanged;
in this case, the foreground can be extracted by detecting the change in the hue. These processes
are automatically executed using the developed method.
Figure 3 shows the performance of the developed method
for various Th_R settings. We used three performance
measures to evaluate the developed method: the recall, the
precision, and the F-measure. The recall is defined as the
ratio of the number of correctly detected foreground pixels to
the number of true foreground pixels; this measure shows the
rate at which the algorithm classifies true foreground pixels
as foreground pixels. The precision is defined as the ratio of
the number of correctly detected foreground pixels to the total
number of pixels designated as foreground; this measure indicates
how many of the pixels classified as foreground pixels are true
foreground pixels. There is usually a trade-off between the
recall and the precision; optimizing the recall usually
deteriorates the precision and vice versa. Because of this
trade-off, we used the F1 measure,[12] which considers the
precision and the recall simultaneously, to evaluate the
overall performance.
The F1 measure is expressed as:
$$
F_1(r, p) = \frac{2pr}{p + r}
\tag{13}
$$
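For reference, the three measures can be computed from a detected foreground mask and a ground-truth mask as in the short helper below (our own illustration).

```python
import numpy as np

def f1_measure(detected, ground_truth):
    """Recall, precision, and the F1 measure of Eq. (13) for binary masks."""
    detected = detected.astype(bool)
    ground_truth = ground_truth.astype(bool)
    tp = np.count_nonzero(detected & ground_truth)        # correctly detected foreground
    recall = tp / max(np.count_nonzero(ground_truth), 1)  # TP / true foreground
    precision = tp / max(np.count_nonzero(detected), 1)   # TP / detected foreground
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)  # Eq. (13)
```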
The F1 measure is maximized when the recall and the
precision are both high. Figure 3 shows that when Th_R is
approximately 4.5, the recall and precision values are equally
high and F1 is maximized. We set Th_R to 4.5 in this study.

Fig. 1. An example of an input sequence and a result from using the adaptive background model; the blue line represents the hue, the green line represents the saturation, and the red line represents the sequence illumination. In the Classification Result, the blue line represents the sum of the ratio of the difference between the input frame and the adaptive background to the average color change, and Th_R is a threshold for foreground extraction.

Fig. 2. A sequence illustrating the use of the adaptive background for extracting foreground: (a) the saturation and illumination change but the hue remains relatively flat, and (b) the hue changes, but the illumination and saturation remain relatively unchanged.

Fig. 3. Performance of the developed method for various Th_R settings.
Even if comparing the input sequence with the static
background for a pixel results in the pixel being classified as
foreground, the pixel can be reclassified as background by
the aforementioned process that compares the pixel with the
adaptive background. The backgrounds are updated after
extracting the foreground. If the comparison between an input
pixel and the static background at (x, y) classifies the pixel as
foreground, a new cluster is created for the input pixel, or the
count of the cluster to which the input pixel belongs is increased.
Every Th_t frames, each static background pixel is updated to
the cluster with the maximum count. Each adaptive
background pixel is updated to the current input pixel if the
pixel is classified as background. Otherwise, the adaptive
background remains unchanged.
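A possible per-frame driver combining the sketches above is shown below. The ordering of the updates within a frame and the ModelState container are our own assumptions, and the periodic Th_t rebuild of the static background model is omitted for brevity.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ModelState:
    s_back: np.ndarray      # static background SBack, (M, N, 3) HSV
    a_back: np.ndarray      # adaptive background ABack, (M, N, 3) HSV
    avr_d: np.ndarray       # average color change AvrD, (M, N, 3)
    fdist: np.ndarray       # accumulated color change fdist, (M, N, 3)
    prev_frame: np.ndarray  # previous HSV frame

def process_frame(frame_hsv, state):
    """One illustrative per-frame step using the earlier sketches."""
    # accumulate the color change statistics (Eqs. 8-9)
    state.fdist, state.avr_d = update_color_change(
        frame_hsv, state.prev_frame, state.fdist, state.avr_d)
    # two-step foreground test against both backgrounds (Eqs. 10-12)
    fg, state.a_back = extract_foreground(
        frame_hsv, state.s_back, state.a_back, state.avr_d)
    state.prev_frame = frame_hsv
    return fg
```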
3. EXPERIMENTAL RESULTS
The performance of the developed method was evaluated
using video sequences from the Li and Wallflower datasets
(the Li dataset is available at http://perception.i2r.a-star.
edu.sg/bk_model/bk_index.html, and the Wallflower dataset
is available at http://research.microsoft.com/en-us/um/people/
jckrumm/WallFlower/TestImages.htm). These two datasets
are well-known and are often used to test background
subtraction algorithms. We compared the results obtained
using the developed method with other state-of-the-art
methods to evaluate the performance of the developed
method. Figure 4 shows the recall, precision and F1 results.
In terms of the recall results, the developed method outperformed
the other methods, indicating that the developed method
detected a larger fraction of the true foreground pixels.
In terms of the precision results, the performance
of the developed method was not clearly superior to the other
methods and exhibited a relatively average performance.
However, the developed method outperformed the other
methods in terms of the F1 measure. The recall and the
precision measure are important performance indicators, but
the F1 measure is the most important measure because it
incorporates both the precision and recall results and
considers their relative strengths simultaneously.
To compare the developed method with existing state-of-
the-art methods, we tested the other methods using the
parameters presented in the respective studies, or we found
appropriate parameters by repeating the test to obtain the
best results.

Fig. 4. The recall, precision, and F1 results obtained using the developed method and other state-of-the-art methods; the AVG columns represent the average values of the results in all of the datasets.

Fig. 5. Background subtraction results: the first column shows the original images in each video sequence; the second column shows the ground truth of the original images; the third column shows the result using the developed method; and the fourth column onwards show the images resulting from use of other methods.

If default parameters were presented in a study or
if the algorithms were available on the internet, we used the
default parameters in the test. The Gaussian mixture model
was implemented using OpenCV in its default mode.
ViBe software[13] was also used to test the ViBe algorithm in
the default mode. Barnich et al.[13] also showed that the best
results for the ViBe method were generally obtained using
the default mode. The codebook algorithm was tested using
a program found on the internet.[6] We repeated the test many
times and varied the parameters to obtain the best results.
Figure 5 shows the results of the developed method and the
existing state-of-the-art methods. We thus confirmed that the
developed method yielded better results than the other
methods shown in the figure. The performance of the
methods was compared by implementing the developed
method using C programming language on a 2.53 GHz CPU
with 2 GB of RAM. The classification speed of the developed
method was approximately 93.56 frames per second for a
frame size of 160 × 128 pixels. This result shows that the developed
method can be used for real-time processing.
4. CONCLUSIONS
We developed a new robust background subtraction
method in this study. This method used two background
models, a static background model and an adaptive background
model. The temporary foreground was first extracted from
the static background. The adaptive background was then
used to delete the dynamic background components, such as
waving trees and rippling waters, which were detected using
the degree of color change. The two backgrounds were used
together to subtract non-stationary backgrounds in a robust
manner. The first frame was used as the initial background;
thus, the backgrounds required periodic updating before
stabilization. In our experiment, the background model
stabilized after approximately 120 frames. The recall, the
precision, and the F-measure were used as performance
measures of the developed method and showed that the
developed method generally produced better results than
existing methods for most of the sequences. In our
experiments, the developed method outperformed existing
methods and was shown to be applicable to real-time processing.
ACKNOWLEDGEMENTS
This research was supported by the Basic Science Research
Program of the National Research Foundation of Korea
(NRF), which is funded by the Ministry of Education,
Science and Technology (2012-0007462).
REFERENCES
1. D.-R. Jung, J. Kim, C. Nahm, H. Choi, S. Nam, and B.
Park, Electron. Mater. Lett. 7, 185 (2011).
2. C. Stauffer and W. E. L. Grimson, in Proc. IEEE Conf. on
Computer Vision and Pattern Recognition, p. 246, Santa Barbara,
CA, USA (1998).
3. P. KaewTraKulPong and R. Bowden, In Proc. 2nd
European Workshop on Adv. Video-Based Surveillance
Systems (AVBS01) (2001).
4. C. C. Chiu, M. Y. Ku, and L. W. Liang, IEEE Trans. Circ.
Syst. Vid. 20, 518 (2010).
5. V. Reddy, C. Sanderson, and B. C. Lovell, Eurasip J.
Image Vide. [DOI:10.1155/2011/164956] (2011).
6. K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis,
Real-Time Imaging 11, 172 (2005).
7. L. Maddalena and A. Petrosino, IEEE Trans. on Image
Process 17, 1168 (2008).
8. A. Elgammal, R. Duraiswami, D. Harwood, and L. S.
Davis, Proc. of IEEE, 90, 1151 (2002).
9. C. Ianasi, V. Gui, C. Toma, and D. Pescaru, Facta Universitatis 18, 127 (2005).
10. P. D. Z. Varcheie, M. Sills-Lavoie, and G. A. Bilodeau, Sensors-Basel 10, 1041 (2010).
11. M. Heikkila and M. Pietikainen, IEEE Trans. Pattern Anal.
28, 657 (2006).
12. Y. Yang, Inf. Retr. 1, 69 (1999).
13. O. Barnich and M. Van Droogenbroeck, IEEE Trans. on
Image Process 20, 1709 (2011).