Date post: 21-May-2020

Visual Comparison of Images Using Multiple Kernel Learning for Ranking

Amr Sharaf¹, Mohamed E. Hussein²,¹, Mohamed A. Ismail¹

¹ Alexandria University, El-Shatby, Alexandria 21526, Egypt

² Egypt-Japan University of Science and Technology (E-JUST), New Borg Al-Arab, Alexandria 21934, Egypt

Abstract

Ranking is a central problem in many applications, such as web search, recommendation systems, and visual comparison of images. In this paper, the multiple kernel learning framework is generalized to the learning to rank problem. This approach extends existing learning to rank algorithms by considering multiple kernel learning and consequently improves their effectiveness. The proposed approach provides the convenience of fusing different features for describing the underlying data. As an application of our approach, the problem of visual image comparison is studied. Several visual features are used for describing the images, and multiple kernel learning is adopted to find an optimal feature fusion. Experimental results on three challenging datasets show that our approach outperforms the state of the art and is significantly more efficient in runtime.

Proposed Approach (RankMKL)

Given two images, the task is to learn which image exhibits a particular visual attribute more strongly than the other. Our approach works on a per-attribute basis, so a separate model is learned for each visual attribute. Figure 1 outlines our approach. The first step is to extract a set of features from each image. Several feature sets are selected to capture different visual cues in the image. To capture image texture, we extract Local Binary Patterns (LBP) [3] and compute the responses of a set of Gabor filters. To capture the shape and appearance of the images, GIST [4] and HoG [1] descriptors are used. Finally, a color histogram is computed in the LAB color space to capture color information.
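As a rough illustration of the texture cue, the following sketch computes a basic 8-neighbor LBP histogram over a small grayscale grid. It is a simplification: the descriptor in [3] is multiresolution and rotation invariant, whereas this version uses a single fixed 3x3 neighborhood. The function name `lbp_histogram`, the neighbor ordering, and the 256-bin layout are assumptions made here for illustration, not details taken from the paper.

```python
def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 3x3 LBP codes.

    `image` is a list of equal-length rows of grayscale intensities.
    Border pixels are skipped because they lack a full neighborhood.
    """
    # Clockwise neighbor offsets starting at the top-left pixel (an assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    # Normalize so that images of different sizes are comparable.
    return [h / total for h in hist] if total else hist
```

A histogram of this kind would be one feature vector per image; the other feature sets (Gabor, GIST, HoG, color) would each yield their own vector in the same way.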

The second step is to fuse the different feature sets and learn the ranking model. For this task, a separate kernel function is computed for each set of features (i.e., we compute five different kernels). The computed kernels serve as base kernels for our multiple kernel learning module. Using the multiple kernel learning algorithm, we learn the optimal weights for a linear combination of the base kernels together with the optimal parameters of the ranking model.
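The fusion step can be sketched as follows: one Gram matrix per feature set, then a weighted sum with non-negative weights. The RBF kernel, the `gamma` value, the toy feature vectors, and the fixed weights are all assumptions for illustration; the text does not state which base kernel functions are used, and in the actual method the weights are learned rather than set by hand.

```python
import math


def rbf_kernel_matrix(features, gamma=1.0):
    """Gram matrix of an RBF kernel over one feature set (kernel choice is an assumption)."""
    n = len(features)
    K = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            sq_dist = sum((p - q) ** 2 for p, q in zip(features[a], features[b]))
            K[a][b] = math.exp(-gamma * sq_dist)
    return K


def combine_kernels(base_kernels, weights):
    """Weighted sum K_d = sum_i d_i * K_i with non-negative weights d_i."""
    if len(base_kernels) != len(weights):
        raise ValueError("need one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(base_kernels[0])
    return [[sum(w * K[a][b] for w, K in zip(weights, base_kernels))
             for b in range(n)]
            for a in range(n)]


# Toy data: two feature sets (stand-ins for, e.g., LBP and color histograms)
# describing the same three images.
feature_sets = [
    [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
]
base_kernels = [rbf_kernel_matrix(f, gamma=0.5) for f in feature_sets]
K_d = combine_kernels(base_kernels, weights=[0.7, 0.3])
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, `K_d` can be handed to any kernel-based ranking solver in place of a single kernel matrix.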

Instead of using a single kernel matrix $K$ for learning the ranking model, an optimal combination of several base kernels is learned, and the combined kernel matrix $K_d$ is used for training the ranking model, where $k_d(x_i, x_j) = \phi_d(x_i)^T \phi_d(x_j)$ represents the dot product in feature space $\phi$ and is parametrized by $d$ such that:

$$k_d(x_i, x_j) = f_d\left(\{k_i(x_i, x_j)\}_{i=1}^{t}\right), \tag{1}$$

where $t$ is the number of base kernels, $d \in \mathbb{R}^t$ is the vector of kernel weights to be learned, and the combination function $f_d$ can be a linear or a nonlinear function for combining the base kernels. Our goal is to learn the optimal values of $d$ together with the optimal values of the Lagrange multipliers $\alpha$ representing the learned ranking model. Accordingly, the standard rankSVM [2] objective function is updated as follows:

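$$\begin{aligned}
\underset{\alpha}{\text{maximize}} \quad & \mathbf{1}^T \alpha - \tfrac{1}{2}\, \alpha^T Q_d\, \alpha + r(d) \\
\text{subject to} \quad & 0 \le \alpha_{i,j} \le C, \ \forall (i, j) \in P, \\
& d \ge 0,
\end{aligned} \tag{2}$$

$$Q_{d,(i,j),(u,v)} = k_d(x_i, x_u) + k_d(x_j, x_v) - k_d(x_i, x_v) - k_d(x_j, x_u), \tag{3}$$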
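Eq. (3) is the Gram matrix of the pairwise difference vectors: each entry equals the inner product of $\phi_d(x_i) - \phi_d(x_j)$ with $\phi_d(x_u) - \phi_d(x_v)$. The sketch below builds this matrix from a precomputed kernel matrix and a list of preference pairs, then checks it against explicit difference vectors for a linear kernel, where the feature map is known. The function name `preference_gram_matrix` and the toy data are hypothetical.

```python
def preference_gram_matrix(K, pairs):
    """Build Q from Eq. (3) for a kernel matrix K and preference pairs.

    K[a][b] holds k_d(x_a, x_b); each pair (i, j) means x_i is preferred over x_j.
    """
    Q = [[0.0] * len(pairs) for _ in pairs]
    for row, (i, j) in enumerate(pairs):
        for col, (u, v) in enumerate(pairs):
            Q[row][col] = K[i][u] + K[j][v] - K[i][v] - K[j][u]
    return Q


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


# Sanity check with a linear kernel, where phi(x) = x is known explicitly.
X = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
K_linear = [[dot(a, b) for b in X] for a in X]
pairs = [(0, 1), (1, 2)]  # x_0 preferred over x_1, and x_1 over x_2
Q = preference_gram_matrix(K_linear, pairs)

# Difference vectors x_i - x_j, one per preference pair.
diffs = [[p - q for p, q in zip(X[i], X[j])] for i, j in pairs]
Q_from_diffs = [[dot(a, b) for b in diffs] for a in diffs]
```

Here `Q` and `Q_from_diffs` agree entry by entry, which is why only kernel evaluations, never explicit feature maps, are needed to train the ranking model.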

where both the regularizer $r$ and the kernel $k_d$ can be any general differentiable functions of $d$ with continuous derivatives, and $P$ represents the set of preference pairs such that $P = \{(i, j) \mid x_i \succ x_j\}$. In our approach, five base kernels are used, one for each of the five feature sets (LBP, HoG, Gabor, GIST, and Color). The kernel function $k_d$ is selected as a linear combination of the five base kernels, $k_d(x_u, x_v) = \sum_{i=1}^{5} d_i k_i(x_u, x_v)$, and an L2 regularization function is used for $r(d)$. Eq. (2) is solved by gradient descent, using the same algorithm as in [5].

[Figure 1 diagram: Feature Extraction (GIST, LBP, HoG, Color, Gabor), base kernels 1 to 5, Multiple Kernel Learning, RankMKL. Example query: Which image has a stronger 'smiling' attribute?]

Figure 1: Illustration of our proposed approach. Given two images, the task is to detect which image has a stronger visual attribute than the other. Different features are extracted, and Multiple Kernel Learning (MKL) is used for fusing the kernels from each feature set. RankMKL is used for ranking the images.
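To make the weight update concrete, here is a minimal sketch of one projected gradient step on $d$ with the multipliers $\alpha$ held fixed. It rests on several assumptions not stated in the text: the regularizer is taken to be $r(d) = \frac{\lambda}{2}\lVert d \rVert^2$, the step size is fixed, and $d$ is minimized in an outer loop around the maximization over $\alpha$, which is how we understand the algorithm in [5]. With a linear combination, $Q_d = \sum_k d_k Q_k$, so the partial derivative of the objective in Eq. (2) with respect to $d_k$ is $\lambda d_k - \frac{1}{2}\alpha^T Q_k \alpha$. All function names are hypothetical, and the inner solver that produces $\alpha$ is not shown.

```python
def quad_form(Q, alpha):
    """Return alpha^T Q alpha for a square matrix Q given as nested lists."""
    n = len(alpha)
    return sum(alpha[a] * Q[a][b] * alpha[b] for a in range(n) for b in range(n))


def weight_gradient(d, base_Qs, alpha, lam=1.0):
    """Gradient of the objective with respect to d at fixed alpha.

    Assumes k_d = sum_k d_k k_k (so Q_d = sum_k d_k Q_k) and the
    hypothetical regularizer r(d) = (lam / 2) * ||d||^2.
    """
    return [lam * d_k - 0.5 * quad_form(Q_k, alpha)
            for d_k, Q_k in zip(d, base_Qs)]


def projected_gradient_step(d, base_Qs, alpha, step=0.1, lam=1.0):
    """One descent step on d, clipped at zero to keep d >= 0."""
    grad = weight_gradient(d, base_Qs, alpha, lam)
    return [max(0.0, d_k - step * g_k) for d_k, g_k in zip(d, grad)]


# Toy example: two base kernels, two preference pairs, multipliers held fixed.
base_Qs = [
    [[2.0, 3.0], [3.0, 5.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
alpha = [1.0, 0.5]
d = [0.5, 0.5]
d_next = projected_gradient_step(d, base_Qs, alpha, step=0.1, lam=1.0)
```

In a full solver this step would alternate with re-solving for $\alpha$ under the updated kernel, typically with a line search in place of the fixed step used here.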

Conclusion

In this paper, the standard multiple kernel learning formulation is extended to the learning to rank problem. The effectiveness of the proposed approach is demonstrated on the visual image comparison task. Although MKL has been used extensively for object recognition and image categorization, this is the first time it has been used for image comparison. Through extensive experiments, the advantage of our approach is demonstrated in terms of both accuracy and runtime efficiency. Future work includes exploring more applications of multiple kernel learning for ranking, such as web search and recommendation systems.

[1] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 886–893. IEEE, 2005.

[2] Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142. ACM, 2002.

[3] Timo Ojala, Matti Pietikäinen, and Topi Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.

[4] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.

[5] Manik Varma and Bodla Rakesh Babu. More generality in efficient multiple kernel learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1065–1072. ACM, 2009.
