ROBUST COLOR EDGE DETECTION THROUGH TENSOR VOTING
Rodrigo Moreno1, Miguel Angel Garcia2, Domenec Puig1, Carme Julia1∗
1 Rovira i Virgili University, Intelligent Robotics and Computer Vision Group, Dept. of Computer Science and Mathematics, Av. Països Catalans 26, 43007 Tarragona, Spain
2 Autonomous University of Madrid, Dept. of Informatics Engineering, Cra. Colmenar Viejo Km 15, 28049 Madrid, Spain
ABSTRACT
This paper presents a new method for color edge detection
based on the tensor voting framework, a robust perceptual
grouping technique used to extract salient information from
noisy data. The tensor voting framework is adapted to en-
code color information via tensors in order to propagate them
into a neighborhood through a voting process specifically de-
signed for color edge detection by taking into account percep-
tual color differences, region uniformity and edginess accord-
ing to a set of intuitive perceptual criteria. Perceptual color
differences are estimated by means of an optimized version
of the CIEDE2000 formula, while uniformity and edginess
are estimated by means of saliency maps obtained from the
tensor voting process. Experiments show that the proposed
algorithm is more robust and has a similar performance in
precision when compared with the state-of-the-art.
Index Terms— Image edge analysis, tensor voting,
CIELAB, CIEDE2000.
1. INTRODUCTION
The performance of many computer vision applications di-
rectly depends on the effectiveness of a previous edge de-
tection process. The final goal of edge detection is to find
“meaningful discontinuities” in a digital image. Although
many edge detectors have been proven effective (e.g. [1], [2]),
their performance decreases for noisy images.
This paper proposes a new edge detector that has a sim-
ilar performance to the state-of-the-art methods for noiseless
images and, in addition, a better one for noisy images. The
proposed detector is based on an adaptation to edge detection
(Section 2) of the tensor voting framework (TVF) [3]. First,
an encoding process specifically designed to encode color,
uniformity and edginess into tensors is introduced (Section
2.1). Second, a voting process specifically tailored to the edge
∗This research has been partially supported by the Spanish Ministry of
Science and Technology under project DPI2007-66556-C03-03, by the Com-
missioner for Universities and Research of the Department of Innovation,
Universities and Companies of Catalonia’s Government and by the European
Social Fund.
detection problem is also presented (Sections 2.2 and 2.3). Al-
though every color channel is processed independently, possi-
ble correlations between channels are also taken into account
by the proposed method. A comparison of the proposed de-
tector with state-of-the-art methods is shown in Section 3.
2. TENSOR VOTING FRAMEWORK FOR COLOR EDGE DETECTION
The input of the proposed method is the set of pixels of a color
image. Thus, positional and color information is available for
every input pixel. Positional information is used to determine
the neighborhood of every pixel, while color information is
used to define the tensors in the encoding step. The next sub-
sections describe the details of the proposed edge detector.
2.1. Encoding of Color Information
Before applying the proposed method, color is converted to
the CIELAB space. Every CIELAB channel is then normal-
ized in the range [0, π/2]. In the first step of the method, the
information of every pixel is encoded through three second or-
der 2D tensors, one for each normalized CIELAB color chan-
nel. These tensors are represented by 2×2 symmetric positive
semidefinite matrices that can be graphically represented by
2D ellipses. There are two extreme cases for the proposed
tensors: stick tensors, which are stick-shaped ellipses with a
single eigenvalue, λ1, different from zero, and ball tensors,
which are circumference-shaped ellipses whose λ1 and λ2
eigenvalues are equal to each other. Three perceptual mea-
sures are encoded in the tensors associated with every input
pixel, namely: the most likely normalized noiseless color at
the pixel (in the specific channel), a metric of local unifor-
mity (how edgeless its neighborhood is), and an estimation
of edginess (how likely finding edges or texture at the pixel’s
location is). The most likely normalized noiseless color is en-
coded by the angle α between the x axis, which represents the
lowest possible color value in the corresponding channel, and
the eigenvector corresponding to the largest eigenvalue. For
example, in channel L, a tensor with α = 0 encodes black,
whereas a tensor with α = π/2 encodes white. In addition,
local uniformity and edginess are encoded by means of the
978-1-4244-5654-3/09/$26.00 ©2009 IEEE, ICIP 2009
Fig. 1. Encoding process for channel L. Color, uniformity and edginess are encoded by means of α and the normalized saliencies s1 = (λ1 − λ2)/λ1 and s2 = λ2/λ1, respectively.
normalized saliencies s1 = (λ1 − λ2)/λ1 and s2 = λ2/λ1,
respectively. Thus, a pixel located at a completely uniform
region is represented by means of three stick tensors, one for
each color channel. In contrast, a pixel located at an ideal
edge is represented by means of three ball tensors, one for
every color channel. Figure 1 shows the graphical interpreta-
tion of a tensor for channel L.
Before applying the voting process, it is necessary to ini-
tialize the tensors associated with every pixel. The most
likely noiseless colors can be initialized with the colors of
the input pixels encoded by means of the angle α between
the x axis and the principal eigenvector, as described be-
fore. However, since metrics of uniformity and edginess
are usually unavailable at the beginning of the voting pro-
cess, normalized saliency s1 is initialized to one and nor-
malized saliency s2 is initialized to zero. These initializa-
tions allow the method to estimate more appropriate values
of the normalized saliencies for the next stages, as described
in the next subsection. Hence, the initial color information
of a pixel is encoded through three stick tensors oriented
along the directions that represent that color in the normalized CIELAB channels: Tc(p) = t̂c(p) t̂c(p)^T, where Tc(p) is the tensor of the c-th color channel (L, a and b) at pixel p, t̂c(p) = [cos(Cc(p)), sin(Cc(p))]^T, and Cc(p) is the normalized value of the c-th color channel at p.
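As an illustration, the encoding of a single channel value into a stick tensor can be sketched as follows (a minimal NumPy sketch; the function name and the explicit [lo, hi] channel range are our assumptions, not from the paper):

```python
import numpy as np

def encode_channel(value, lo, hi):
    """Encode a CIELAB channel value as a 2D stick tensor.

    The value is first normalized to an angle alpha in [0, pi/2]
    (0 = lowest channel value, pi/2 = highest), then tensorized as
    T = t t^T with t = [cos(alpha), sin(alpha)]^T.
    """
    alpha = (value - lo) / (hi - lo) * (np.pi / 2.0)
    t = np.array([np.cos(alpha), np.sin(alpha)])
    return np.outer(t, t)  # 2x2 symmetric positive semidefinite

# Example for channel L, assuming it ranges over [0, 100] in CIELAB:
T_black = encode_channel(0.0, 0.0, 100.0)    # stick tensor along the x axis
T_white = encode_channel(100.0, 0.0, 100.0)  # stick tensor along the y axis
```

Both initial tensors are stick tensors (λ1 = 1, λ2 = 0), which is consistent with the initialization s1 = 1 and s2 = 0 described above.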
2.2. Voting Process
The voting process requires three measurements for every pair
of pixels p and q: the perceptual color difference, ΔEpq; the
joint uniformity measurement, Uc(p, q), used to determine if
both pixels belong to the same region; and the likelihood of a
pixel being impulse noise, ηc(p). ΔEpq is calculated through
CIEDE2000 [4], while Uc(p, q) = s1c(p) s1c(q), and ηc(p) =
s2c(p) − μŝ2c(p) if p is located at a local maximum and zero
otherwise, where μŝ2c(p) represents the mean of s2c over the
neighborhood of p.
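For concreteness, the saliencies and the joint uniformity measurement can be computed from the tensors as follows (a NumPy sketch; function names are ours):

```python
import numpy as np

def saliencies(T):
    """Normalized saliencies of a 2x2 second-order tensor.

    s1 = (lambda1 - lambda2) / lambda1 measures local uniformity;
    s2 = lambda2 / lambda1 measures edginess (lambda1 >= lambda2 >= 0).
    """
    l2, l1 = np.linalg.eigvalsh(T)  # eigenvalues in ascending order
    return (l1 - l2) / l1, l2 / l1

def joint_uniformity(Tp, Tq):
    """U_c(p, q) = s1_c(p) * s1_c(q): are p and q in the same region?"""
    return saliencies(Tp)[0] * saliencies(Tq)[0]

# A stick tensor is perfectly uniform; a ball tensor is an ideal edge.
stick = np.array([[1.0, 0.0], [0.0, 0.0]])
ball = np.eye(2)
```

With these definitions, a stick tensor yields (s1, s2) = (1, 0) and a ball tensor yields (0, 1), matching the two extreme cases of Section 2.1.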
In the second step of the method, the tensors associated
with every pixel are propagated to their neighbors through
a convolution-like process. This step is independently ap-
plied to the tensors of every channel (L, a and b). The voting
process is carried out by means of specially designed tenso-
rial functions referred to as propagation functions, which take
into account not only the information encoded in the tensors
but also the local relations between neighbors. Two propagation functions are proposed for edge detection: a stick and a ball propagation function. The stick propagation function is used to propagate the most likely noiseless color of a
pixel, while the ball propagation function is used to increase
edginess where required. The application of the first func-
tion leads to stick votes, while the application of the second
function produces ball votes. Stick votes are used to elimi-
nate noise and increase the edginess where the color of the
voter and the voted pixels are different. Ball votes are used
to increase the relevance of the most important edges. The
voting process described in [3] cannot directly be applied to
edge detection, since a pixel cannot appropriately propagate
its information to its neighbors without taking into account
the local relations between that pixel and its neighbors.
A stick vote can be seen as a stick-shaped tensor, STc(p), with a strength modulated by three scalar factors. The pro-
posed stick propagation function, Sc(p, q), which allows a
pixel p to cast a stick vote to a neighboring pixel q for channel
c is given by:
Sc(p, q) = GS(p, q) η̄c(p) SV′c(p, q) STc(p), (1)
with STc(p), GS(p, q), η̄c(p) and SV′c(p, q) being defined
as follows. First, the tensor STc(p) encodes the most likely
normalized noiseless color at p. Thus, STc(p) is defined as
the tensorized eigenvector corresponding to the largest eigenvalue of the voter pixel, that is, STc(p) = ê1c(p) ê1c(p)^T, where ê1c(p) is the eigenvector with the largest eigenvalue of
the tensor associated with channel c at p. Second, the three
scalar factors in (1), each ranging between zero and one, are
defined as follows. The first factor, GS(p, q), models the in-
fluence of the distance between p and q in the vote strength.
Thus, GS(p, q) = Gσs(||p − q||), where Gσs(·) is a de-
caying Gaussian function with zero mean and a user-defined
standard deviation σs. The second factor, η̄c(p), defined as η̄c(p) = 1 − ηc(p), is introduced in order to prevent a pixel
p previously classified as impulse noise from propagating its
information. The third factor, SV ′c, takes into account the in-
fluence of the perceptual color difference, the uniformity and
the noisiness of the voted pixel. This factor is given by:
SV′c(p, q) = η̄c(q) SVc(p, q) + ηc(q), (2)
where SVc(p, q) = [Gσd(ΔEpq) + Uc(p, q)]/2 and η̄c(q) = 1 − ηc(q). SVc(p, q) allows a pixel p to cast a stronger stick vote to q either if both pixels belong to the same uniform re-
gion, or if the perceptual color difference between them is
small. That behavior is achieved by means of the factors
Uc(p, q) and the decaying Gaussian function on ΔEpq with
a user-defined standard deviation σd. A normalizing factor of two is used in order to make SVc(p, q) vary from zero to
one. The term ηc(q) in (2) makes noisy voted pixels, q, adopt the color of their voting neighbors, p, disregarding local uniformity measurements and perceptual color differences between p and q. The term η̄c(q) in (2) makes SV′c vary from zero to one. The effect of η̄c(q) and ηc(q) on the strength of the stick vote received at a noiseless pixel q is null.
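The stick propagation function (1)-(2) can be sketched as follows, assuming ΔEpq, Uc(p, q) and the η values have already been computed (a NumPy sketch; the function names and signatures are ours):

```python
import numpy as np

def gaussian(x, sigma):
    """Decaying Gaussian with zero mean, used for GS and G_sigma_d."""
    return np.exp(-x ** 2 / (2.0 * sigma ** 2))

def stick_vote(p, q, Tp, dE, U, eta_p, eta_q, sigma_s, sigma_d):
    """Stick vote cast by pixel p on pixel q for one channel.

    p, q: pixel coordinates; Tp: voter's tensor; dE: CIEDE2000
    difference; U: joint uniformity U_c(p, q); eta_p, eta_q:
    impulse-noise likelihoods of the voter and voted pixel.
    """
    # ST_c(p): tensorized principal eigenvector of the voter.
    w, v = np.linalg.eigh(Tp)
    e1 = v[:, -1]                  # eigenvector of the largest eigenvalue
    ST = np.outer(e1, e1)
    GS = gaussian(np.linalg.norm(np.asarray(p) - np.asarray(q)), sigma_s)
    SV = (gaussian(dE, sigma_d) + U) / 2.0   # same region or similar color
    SVp = (1.0 - eta_q) * SV + eta_q         # noisy voted pixels adopt color
    return GS * (1.0 - eta_p) * SVp * ST

# Two noiseless pixels with identical color in a uniform region:
vote = stick_vote((0, 0), (1, 0), np.array([[1.0, 0.0], [0.0, 0.0]]),
                  dE=0.0, U=1.0, eta_p=0.0, eta_q=0.0,
                  sigma_s=1.3, sigma_d=2.5)
```

In this case the vote is the voter's own stick tensor attenuated only by the spatial Gaussian, as expected for neighbors that fully agree.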
In turn, a ball vote can be seen as a ball-shaped tensor, BT(p), with a strength controlled by the scalar factors GS(p, q), η̄c(p) and BVc(p, q), each varying between zero
and one. The ball propagation function, Bc(p, q), which al-
lows a pixel p to cast a ball vote to a neighboring pixel q for
channel c is given by:
Bc(p, q) = GS(p, q) η̄c(p) BVc(p, q) BT(p), (3)
with BT(p), GS(p, q), η̄c(p) and BVc(p, q) being defined as
follows. First, the ball tensor, represented by the identity ma-
trix, I, is the only possible tensor for BT(p), since it is the
only tensor that complies with the two main design restric-
tions: a ball vote must be equivalent to casting stick votes for
all possible colors using the hypothesis that all of them are
equally likely, and the normalized s1 saliency must be zero when only ball votes are received at a pixel. Second, GS(p, q) and η̄c(p) are the same as the factors introduced in (1) for the
stick propagation function. They are included for similar rea-
sons to those given in the definition of the stick propagation
function. Finally, the scalar factor BV c(p, q) is given by:
BVc(p, q) = [Ḡσd(ΔEpq) + Ūc(p, q) + Ḡσd(ΔEcpq)] / 3, (4)
where Ḡσd(·) = 1 − Gσd(·) and Ūc(p, q) = 1 − Uc(p, q).
BVc(p, q) models the fact that a pixel p must reinforce the
edginess at the voted pixel q either if there is a big percep-
tual color difference between p and q, or if p and q are not
in a uniform region. This behavior is modeled by means of
Ḡσd(ΔEpq) and Ūc(p, q). The additional term Ḡσd(ΔEcpq) is
introduced in order to increase the edginess of pixels in which
the only noisy channel is c, where ΔEcpq denotes the per-
ceptual color difference only measured in the specific color
channel c. The normalizing factor of three in (4) allows the
ball propagation function to cast ball votes with a strength
between zero and one.
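The ball propagation function (3)-(4) can likewise be sketched as follows (a NumPy sketch; names and signatures are ours, and the per-channel difference dE_c is assumed precomputed):

```python
import numpy as np

def gaussian(x, sigma):
    """Decaying Gaussian with zero mean."""
    return np.exp(-x ** 2 / (2.0 * sigma ** 2))

def ball_vote(p, q, dE, dE_c, U, eta_p, sigma_s, sigma_d):
    """Ball vote cast by pixel p on pixel q for one channel.

    dE: full CIEDE2000 difference; dE_c: difference measured in
    channel c alone; U: joint uniformity; eta_p: impulse-noise
    likelihood of the voter p.
    """
    GS = gaussian(np.linalg.norm(np.asarray(p) - np.asarray(q)), sigma_s)
    # Complemented terms grow with color difference and non-uniformity.
    BV = ((1.0 - gaussian(dE, sigma_d)) + (1.0 - U)
          + (1.0 - gaussian(dE_c, sigma_d))) / 3.0
    return GS * (1.0 - eta_p) * BV * np.eye(2)  # BT(p) is the identity

# Identical colors in a fully uniform region produce a null ball vote:
v0 = ball_vote((0, 0), (1, 0), dE=0.0, dE_c=0.0, U=1.0, eta_p=0.0,
               sigma_s=1.3, sigma_d=2.5)
```

This reflects the design intent: edginess is only reinforced where colors differ or uniformity breaks down.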
The proposed voting process at every pixel is carried out
by adding all the tensors propagated towards it from its neigh-
bors by applying the above propagation functions. Thus,
the total vote received at a pixel q for each color channel
c, TVc(q), is given by: TVc(q) = ∑p∈neigh(q) [Sc(p, q) + Bc(p, q)]. The voting process is applied twice. The first ap-
plication is used to obtain an initial estimation of the normal-
ized s1 and s2 saliencies, as they are necessary to calculate
Uc(p, q) and ηc(p). For this first estimation, only perceptual
color differences and spatial distances are taken into account.
At the second application, the tensors at every pixel are ini-
tialized with the tensors obtained after the first application.
After this initialization, (1) and (3) can be applied in their full
definition, since all necessary data are available.
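The accumulation step can be sketched as follows (names are ours; the two propagation functions are passed in as callables, e.g. versions of the functions sketched earlier with their factors bound in):

```python
import numpy as np

def total_vote(q, neighbors, stick_vote, ball_vote):
    """Accumulate all stick and ball votes received at pixel q
    for one channel: TV_c(q) = sum over p of S_c(p, q) + B_c(p, q)."""
    TV = np.zeros((2, 2))
    for p in neighbors:
        TV += stick_vote(p, q) + ball_vote(p, q)
    return TV

# Toy example: two noiseless identical neighbors, each casting a
# half-strength stick vote and a null ball vote.
stick = lambda p, q: np.array([[0.5, 0.0], [0.0, 0.0]])
ball = lambda p, q: np.zeros((2, 2))
TV = total_vote((0, 0), [(1, 0), (0, 1)], stick, ball)
```

Running this once with color-only factors and a second time with the estimated saliencies reproduces the two-pass scheme described above.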
After applying the voting process described above, it is
necessary to obtain the eigenvectors and eigenvalues of TVL(p), TVa(p) and TVb(p) at every pixel p in order to analyze its
local perceptual information. The voting results can be in-
terpreted as follows: uniformity increases with the normal-
ized s1 saliency and edginess increases as the normalized
s2 saliency becomes greater than the normalized s1 saliency.
Hence, the map of normalized s2 saliencies can be used di-
rectly as an edginess map. Standard post-processing steps
such as non-maximum suppression, hysteresis or thresholding
can then be applied to the normalized s2 saliency map in or-
der to obtain binary edge maps. The results can be improved
by reducing the noise in the image. This denoising step can be
achieved by replacing the pixel’s color by the most likely nor-
malized noiseless color encoded in its tensors. The method
can then be applied to the denoised images iteratively, which
improves the final performance of the edge detector.
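Decoding an accumulated tensor into uniformity, edginess and a denoised channel value can be sketched as follows (a NumPy sketch; the function name is ours):

```python
import numpy as np

def interpret_tensor(TV):
    """Decode a total-vote tensor into (s1, s2, alpha).

    s1: uniformity; s2: edginess (used directly as the edginess map);
    alpha: angle of the principal eigenvector in [0, pi/2], i.e. the
    most likely normalized noiseless channel value, usable for denoising.
    """
    w, v = np.linalg.eigh(TV)   # eigenvalues in ascending order
    l2, l1 = w
    e1 = v[:, -1]               # eigenvector of the largest eigenvalue
    alpha = np.arctan2(abs(e1[1]), abs(e1[0]))
    return (l1 - l2) / l1, l2 / l1, alpha

# A mostly-uniform pixel: dominant eigenvalue along the x axis.
s1, s2, alpha = interpret_tensor(np.array([[2.0, 0.0], [0.0, 0.5]]))
```

The s2 values over all pixels form the edginess map to which non-maximum suppression, hysteresis or thresholding is then applied.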
2.3. Parameters of the CIEDE2000 formula
The CIEDE2000 formula [4], which estimates the percep-
tual color difference between two pixels p and q, ΔEpq, has
three parameters, kL, kC and kH , to weight the differences in
CIELAB luminance, chroma and hue respectively. They can
be adjusted to make the CIEDE2000 formula more suitable
for every specific application by taking into account factors
such as noise or background luminance, since those factors
were not explicitly taken into account in the definition of the
formula. These parameters must be greater than or equal to
one. Based on the formulation given in [5], the following
equations for these parameters are proposed:
kL = FBL FηL, kC = FBC FηC, kH = FBh Fηh, (5)
where FBm are factors that take into account the influence of
the background color on the calculation of color differences
for the color component m (L, C and h) and Fηm are factors
that take into account the influence of noise on the calculation
of color differences in component m. On the one hand, big
color differences in chromatic channels become less percep-
tually visible as background luminance decreases. Thus, the
influence of the background on the CIEDE2000 formula can
be modeled by FBL = 1 and FBC = FBh = 1 + 3 (1 − YB),
where YB is the mean background luminance. On the other
hand, big color differences become less perceptually visible
as noise increases. The influence of noise on CIEDE2000 can
be modeled by means of Fηm = MAD(I)m − MAD(G)m, where I is the image, G is a Gaussian blurred version of I, and MAD(·)m is the median absolute difference (MAD) calculated on component m. Fηm is set to 1 in noiseless regions.
3. RESULTS
Fifteen outdoor images from the Berkeley segmentation data
set [6] and their corresponding ground truths have been used
                         Image 1                       Image 2
                 LGC      Compass   TVED       LGC      Compass   TVED
PSNR (dB)        16.74    16.05     21.13      16.93    20.20     22.28
FOMO, FOMN       0.45, 0.38  0.45, 0.43  0.45, 0.43  0.45, 0.43  0.44, 0.40  0.46, 0.44
Fig. 2. First row: original image and the edginess maps generated by the LGC, Compass and TVED methods respectively for two different
images. Second row: noisy version of the same images and their corresponding edginess maps (LGC, Compass and TVED). PSNR and FOM
for the original (FOMO) and the noisy image (FOMN) are indicated below the images.
in the experiments. The methods proposed by Maire et al. [2],
referred to as the LGC method, and by Ruzon and Tomasi
[1], referred to as the Compass method, have been used in
the comparisons, since they are representative of the state-of-
the-art in edge detection. The default parameters of the LGC
method have been used. The Compass algorithm has been ap-
plied with σ = 2, since the best overall performance of this
algorithm has been attained with this standard deviation. Five
iterations of the proposed method, referred to as TVED, have
been run with parameters σs = 1.3 and σd = 2.5. Gaus-
sian noise with a standard deviation of 30 has been added to
the images for the robustness analysis in order to simulate
very noisy scenarios. Performance has been evaluated by us-
ing two metrics: the Pratt’s Figure of Merit (FOM) [7] in or-
der to measure precision, and the Peak Signal to Noise Ratio
(PSNR) in order to measure robustness by comparing differ-
ences between two edginess maps: those generated for both
the noiseless and the noisy version of the same image.
Figure 2 shows the edginess maps detected for two of
the tested images1. It can be seen that LGC generates fewer edges than the other methods, misses some important edges, and its edge strength is reduced for the noisy images. The Compass
operator generates too many edges and the number of edges
increases with noise. TVED has a better behavior, since it
only detects the most important edges and is less influenced
by noise. The PSNR confirms that TVED is the most robust
detector, whereas the FOM indicates that the three methods
have a similar performance in precision, with TVED being
slightly better.
1All the images are available at http://deim.urv.cat/˜rivi/tved.html
4. CONCLUDING REMARKS
A new method for edge detection based on an adaptation
of the TVF has been proposed. An optimized version of
CIEDE2000 has been used to measure perceptual color differ-
ences in non-controlled environments by modifying its orig-
inal parameters. Experimental results show that the use of
a specific voting process makes the TVF a powerful tool for
edge detection. PSNR and FOM have been used to compare
the performance of the TVED against two of the most rep-
resentative state-of-the-art edge detectors. TVED has been
found to be more robust and slightly more precise than the
other algorithms.
5. REFERENCES
[1] M. Ruzon and C. Tomasi, “Edge, junction, and corner detection
using color distributions,” IEEE Trans. PAMI, vol. 23, no. 11,
pp. 1281–1295, 2001.
[2] M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik, “Using con-
tours to detect and localize junctions in natural images,” in Proc. CVPR, 2008, pp. 1–8.
[3] G. Medioni, M. S. Lee, and C. K. Tang, A Computational Framework for Feature Extraction and Segmentation, Elsevier Science, 2000.
[4] M. R. Luo, G. Cui, and B. Rigg, “The development of the CIE
2000 colour-difference formula: CIEDE2000,” Color Res. and Application, vol. 26, no. 5, pp. 340–350, 2001.
[5] C-H Chou and K-C Liu, “A fidelity metric for assessing visual
quality of color images,” in Proc. ICCCN, 2007, pp. 1154–1159.
[6] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of hu-
man segmented natural images and its application to evaluating
segmentation algorithms and measuring ecological statistics,” in
Proc. ICCV, 2001, pp. II:416–423.
[7] W. K. Pratt, Digital Image Processing: PIKS Scientific Inside,
Wiley-Interscience, fourth edition, 2007.