
Spline Regression Hashing for Fast Image Search

Yang Liu, Fei Wu, Yi Yang, Yueting Zhuang, and Alexander G. Hauptmann

Abstract— Techniques for fast image retrieval over large databases have attracted considerable attention due to the rapid growth of web images. One promising way to accelerate image search is to use hashing technologies, which represent images by compact binary codewords. In this way, the similarity between images can be efficiently measured in terms of the Hamming distance between their corresponding binary codes. Although plenty of methods for generating hash codes have been proposed in recent years, there are still two key points that need to be improved: 1) how to precisely preserve the similarity structure of the original data and 2) how to obtain the hash codes of previously unseen data. In this paper, we propose our spline regression hashing method, in which both the local and global data similarity structures are exploited. To better capture the local manifold structure, we introduce splines developed in Sobolev space to find the local data mapping function. Furthermore, our framework simultaneously learns the hash codes of the training data and the hash function for the unseen data, which solves the out-of-sample problem. Extensive experiments conducted on real image datasets consisting of over one million images show that our proposed method outperforms the state-of-the-art techniques.

Index Terms— Hashing, image retrieval, spline regression.

I. INTRODUCTION

WITH the advance of the Internet, many photo sharing websites, such as Flickr and Picasa, have become popular, with numerous photos uploaded every day. As the size of image data continues to grow, there is an increasing need to search for similar images in very large databases. However, traditional Content-Based Image Retrieval (CBIR) methods, such as [1]–[3], are infeasible for searching over large-scale databases due to the high dimensionality of the visual feature space. For example, given one query image, exhaustively searching its nearest neighbors over all images in the database has a linear time complexity, which is not scalable when the number of images is very large. Instead of performing exhaustive search through linear scan, faster indexing techniques with sub-linear or even constant time complexity are highly desirable for large-scale CBIR applications.

Manuscript received October 20, 2011; revised May 7, 2012; accepted June 12, 2012. Date of publication July 10, 2012; date of current version September 13, 2012. The work of Y. Liu, F. Wu, and Y. Zhuang was supported in part by the 973 Program under Grant 2012CB316400, the National Natural Science Foundation of China under Grant 60833006 and Grant 90920303, and the National HeGaoJi Key Project under Grant 2010ZX01042-002-003. The work of Y. Yang and A. Hauptmann was supported in part by the National Science Foundation under Grant IIS-0917072. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Theo Gevers.

Y. Liu, F. Wu, and Y. Zhuang are with the College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]; [email protected]; [email protected]).

Y. Yang and A. G. Hauptmann are with the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2012.2207394

Over the past decades, plenty of techniques have been proposed to accelerate nearest neighbors search. Tree-based methods, e.g., binary search trees [4], can be carried out efficiently with a pre-built space-partitioning index structure. However, due to the space limitation, these methods are not suitable for indexing multimedia data with high-dimensional feature spaces. On the other hand, if the retrieval result is not required to exactly match the nearest neighbors, we can develop approximate similarity search techniques based on hashing methods. The idea is that similar data should be mapped into similar binary codewords within a small Hamming distance. By using the codewords as indexing keys, similarity search in Hamming space often has sub-linear time complexity and is feasible for large-scale datasets.
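To make the Hamming-distance comparison concrete, the following minimal sketch (our own illustration, not code from the paper) packs {−1, +1} codewords into integers and compares them with a bitwise XOR and popcount; the 8-bit toy codewords are assumed values.

    import numpy as np

    def pack_bits(code):
        # Pack a {-1, +1} codeword into one Python integer (bit string).
        bits = (np.asarray(code) > 0).astype(int)
        return int("".join(map(str, bits)), 2)

    def hamming_distance(a, b):
        # XOR the packed codes and count the differing bits.
        return bin(a ^ b).count("1")

    code_a = pack_bits([1, -1, 1, 1, -1, -1, 1, -1])   # toy 8-bit codewords
    code_b = pack_bits([1, -1, 1, -1, -1, -1, 1, 1])
    print(hamming_distance(code_a, code_b))            # -> 2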

Locality Sensitive Hashing (LSH) [5] and its variations [6]–[9] are the representative hashing methods that preserve similarity by random projection and thresholding. Following LSH, several hashing techniques have been proposed recently, including Locality Preserving Indexing [10], Spectral Hashing [11], Semantic Hashing [12] with Restricted Boltzmann Machines [13], Self-Taught Hashing [14], Multiple Feature Hashing [15], Sparse Spectral Hashing [16], and Hypergraph Spectral Hashing [17], etc. All of these methods attempt to find better data-aware hash functions through machine learning techniques. Although these recently proposed techniques have been successfully applied to different retrieval tasks, there are still several shortcomings that need to be further addressed.

Firstly, preserving the similarity structure of the original data points is a key issue in hashing [11]. Different methods emphasize different aspects of the data similarity structure. For example, Locality Sensitive Hashing [18] aims to find locality sensitive hash functions which are data independent, while Spectral Hashing [11] aims to preserve the global similarity structure of the training data points. Nevertheless, the local similarity structure of the data is not explicitly utilized in the above approaches. As demonstrated in [10], [15], [19] and [14], compared with the global structure, the local structure of data is more meaningful for certain retrieval applications and has more discriminating power under the manifold assumption of the data distribution. The performance of the learned hash function can be further improved by utilizing methods that are more capable of capturing the local geometry.

Secondly, Locality Preserving Indexing [10] and Self-Taught Hashing [14] are designed to preserve the local structure of data. However, as demonstrated in [20], emphasizing the local structure only may induce over-fitting, which will consequently degrade the performance. To address this problem, a possible solution is to impose a proper regularization term into the objective function. In this paper, we propose our Spline Regression Hashing approach, in which the global structure of the data is also considered in the objective function.

Thirdly, it is also critical for hash functions to handle novel data points that are unseen during the training phase, which is usually referred to as the out-of-sample problem. In Spectral Hashing [11], the solution of the out-of-sample extension relies on a restrictive assumption that the data are uniformly distributed in a hyper-rectangle, which often does not hold. In Self-Taught Hashing [14], the out-of-sample problem is solved via a two-stage approach. Unfortunately, separately optimizing the hash codes of the training data and the hash function for novel data may incur more noise and degrade the performance. Furthermore, the two-stage approach is also time consuming when training the hash functions.

In this paper, we propose a novel hashing method, namely Spline Regression Hashing, in which splines are developed to accurately capture the local data structure. Moreover, both the local and global similarity structures are utilized in our approach. It is worthwhile to highlight the following aspects of our method.

1) To better exploit the local structure of data, we use the spline developed in Sobolev space [21] to accurately preserve the local similarity structure in our hash function. By composing the polynomials and Green's functions into the local spline, the data points can be smoothly and accurately mapped into their corresponding local hash codes according to their local manifold structure. Moreover, compared with Gaussian function based affinity measures, the nonlinear mapping developed by the spline has no parameters to be tuned, which makes our method more stable.

2) Unlike Self-Taught Hashing and Locality Preserving Indexing, which only focus on the local similarity structure of data, the global structure of data is also considered and optimized in our approach. By utilizing both the local and global structure of the data, over-fitting can be decreased and our method thus becomes more robust.

3) In our method, the hash codes of the training data and the hash function for out-of-sample data are simultaneously learned in the regression framework, which is significantly different from other existing approaches. This strategy not only simplifies the process of learning the hash function but also makes the learned hash function more effective, as the consistency of the hash codes and the hash function is explicitly considered in the objective function.

To evaluate the effectiveness of our proposed method, we conducted experiments on four real-world image datasets (including a dataset with one million images). The experimental results show that the proposed approach outperforms the state-of-the-art techniques.

The rest of this paper is organized as follows. In Section II, we review the related work. Then, in Section III, we present our approach in detail. The experimental results and analysis are shown in Section IV. Finally, in Section V, we conclude this paper and discuss the future work.

II. RELATED WORK

There has been extensive research towards fast similarity search with hashing techniques. In this section, we briefly review some of the representative related methods.

A. Locality Sensitive Hashing

The key idea of Locality Sensitive Hashing (LSH) [18] is to map the data points using several hash functions to ensure that points close to each other will have a high probability of being mapped into the same buckets. Then, approximate nearest neighbors search can be performed by finding items in the buckets with the same hash code as the query.

The LSH scheme relies on hash functions that satisfy the locality sensitive hashing property. According to [22], considering a family H of hash functions (with an associated probability distribution), each hash function h ∈ H should satisfy

$$\Pr\left(h(x_i) = h(x_j)\right) = \mathrm{sim}(x_i, x_j) \qquad (1)$$

where Pr corresponds to the probability, x_i and x_j are items to be hashed, and sim(x_i, x_j) ∈ [0, 1] is the similarity function defined on the collection of objects. Different similarity metrics can be applied here to measure the similarity for different applications.

To generate the l-bit hash codes, LSH randomly selects l hash functions from H with thresholds to build the hash function for the query data.
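One widely used instantiation of such a hash family is sign-random-projection (random hyperplane) hashing. The sketch below is our own minimal illustration of that variant, not the specific LSH family used in the paper's experiments; the feature dimensionality, number of bits, and random seed are assumed toy values.

    import numpy as np

    def train_random_hyperplane_lsh(dim, n_bits, seed=0):
        # Draw n_bits random hyperplanes; each hyperplane defines one hash bit.
        rng = np.random.default_rng(seed)
        return rng.standard_normal((n_bits, dim))

    def lsh_hash(W, x):
        # l-bit code: the sign of the projection onto each random hyperplane.
        return (W @ x > 0).astype(np.uint8)

    W = train_random_hyperplane_lsh(dim=960, n_bits=32)
    x = np.random.rand(960)
    print(lsh_hash(W, x))   # 32-bit binary code for one GIST-like feature vector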

As a special case of LSH, min-Hash uses the Jaccard similarity coefficient to measure the similarity of two sets and has been successfully applied to image retrieval with the bag-of-words representation [7], [23]. Compared with these methods, our approach can generate more compact binary hash codes and is more flexible in dealing with different feature types.

B. Restricted Boltzmann Machines

In [12], stacked Restricted Boltzmann Machines (RBMs) [13] were utilized to obtain compact binary codes. By training the deep graphical model with multiple latent layers, the values of the latent variables in the deepest layer, with a small number of binary variables, are supposed to be a good representation of each datum.

The RBMs consist of two critical stages: unsupervised pre-training and supervised fine-tuning. In the pre-training stage, two layers are trained at a time, and the output of the first layer is used as the input to the second layer. The process is repeated layer by layer in a greedy way until all the layers have been trained. After that, the parameters of all layers are fine-tuned using back propagation according to different supervised constraints. In this way, the generated binary codes can well capture the structure of the training data. However, training Restricted Boltzmann Machines is extremely time consuming, and sufficient training data are required for fine-tuning.



Fig. 1. Hash functions are learned from the training images by considering both the local and global data structure. Then the learned hash functions are applied to all of the images in the database to obtain their corresponding binary codewords. For query images, we can calculate their binary codes using the hash functions, and find similar images by comparing the Hamming distance between the query image and the images in the database.

C. Spectral Hashing

Spectral Hashing (SpH) [11] formalized the criterion of a good hash code as the following objective function:

$$\begin{aligned}
\text{minimize:} \quad & \sum_{i,j} W_{i,j}\,\|y_i - y_j\|^2 \\
\text{s.t.:} \quad & y_i \in \{-1, 1\}^l \\
& \sum_i y_i = 0 \\
& \frac{1}{n}\sum_i y_i y_i^T = I
\end{aligned} \qquad (2)$$

where {y_i}_{i=1}^n is the list of codewords with length l for the n training data points and W_{n×n} is the affinity matrix which reflects the global similarity structure of the training data. After spectral relaxation, the codewords of the training data can be solved via generalized eigen decomposition with thresholds.

By utilizing recent results on the convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of the manifold, SpH can efficiently calculate the hash codes of novel data points. However, the out-of-sample extension of SpH relies on the assumption that the data are uniformly distributed in a hyper-rectangle, which often cannot be met.

Intuitively, given an image, only a few visual features are enough to represent this image. Motivated by the recent advances in compressed sensing, sparsity-based feature selection approaches have been developed in computer vision and multimedia retrieval. The basic idea of sparsity-based feature selection is to impose a (structural) sparse penalty to select discriminative features [24]; for example, structural grouping sparsity was used in [25] to induce a (structural) sparse selection model and identify subgroups of features.

A sparsity-based spectral hashing approach, called Sparse Spectral Hashing (SSH), was proposed in [16]. The basic idea in SSH is to introduce sparse principal component analysis (Sparse PCA) and Boosting Similarity Sensitive Hashing (Boosting SSC) into traditional spectral hashing for both effective and data-aware binary coding for real data. The out-of-sample problem is also a challenge in SSH for data with arbitrary distributions.

D. Self-Taught Hashing

Self-Taught Hashing (STH) [14] divided the semantic hashing process into two distinct stages. In the unsupervised learning stage, the optimal l-bit hash codes of the training data are obtained by solving the LapEig problem and thresholding at the median value. However, different from SpH, which aims to preserve the global similarity of all data pairs, the unsupervised learning stage of STH focuses on the local similarity structure of the data, which is thought to be more meaningful for retrieval applications. In the supervised learning stage, by taking the binary codes obtained from the first stage as labels, STH trains l classifiers on the training data via Support Vector Machines to predict the l-bit hash codes of query data points which are unseen before.
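As a rough illustration of that second stage (our own sketch, using scikit-learn's LinearSVC as a stand-in for the SVMs of [14]; the stage-1 codes here are random placeholders rather than LapEig outputs), one classifier is trained per hash bit:

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_sth_classifiers(X_train, Y_bits):
        # STH stage 2 as described above: one linear SVM per hash bit.
        return [LinearSVC().fit(X_train, Y_bits[:, b]) for b in range(Y_bits.shape[1])]

    def sth_hash(classifiers, x):
        # Predict the l-bit code of an unseen point with the l trained SVMs.
        return np.array([clf.predict(x.reshape(1, -1))[0] for clf in classifiers])

    X_train = np.random.rand(100, 32)
    Y_bits = np.where(np.random.rand(100, 8) > 0.5, 1, -1)   # placeholder stage-1 codes
    clfs = train_sth_classifiers(X_train, Y_bits)
    print(sth_hash(clfs, np.random.rand(32)))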

However, only considering the local similarity of data may incur over-fitting when training the hash functions. Moreover, training the Support Vector Machines is very time consuming compared to other methods.

III. SPLINE REGRESSION HASHING

In this section, we will present the proposed Spline Regression Hashing (SRH) approach. The main framework of our approach is given in Figure 1.


A. Notations

Let X = [x_1, x_2, ..., x_n] ∈ R^{n×m} represent the collection of training data, where n is the number of images and m is the dimensionality of their feature space. The hash function aims to map each datum x_i into Hamming space to obtain its compact representation. Assuming that the desired code length is l bits, the hash function corresponding to each bit of the hash codes can be denoted by h_i(·), where i = 1, 2, ..., l. Then we can further define the complete hash function H(·) = [h_1(·), h_2(·), ..., h_l(·)]. Furthermore, we use Y = [y_1, y_2, ..., y_n] ∈ {−1, 1}^{n×l} to denote the obtained binary codewords of the images in the training set.

B. Objective Function

As discussed in [11], a "good" hash function should satisfy the following requirements.

1) Similarity preserving, which means similar data should be mapped into similar codewords within a small Hamming distance;

2) Extendibility, which means the flexibility to calculate the hash codes of novel inputs (out-of-sample data);

3) Efficiency, which means a small number of code bits are used to encode the whole dataset.

Since the third requirement can be satisfied by letting each bit have a 50% chance of being 1 or −1, we temporarily ignore it and discuss the first two requirements, which are more important. As for the similarity preserving criterion, while local structure is indeed important for inferring a better hash function, focusing only on local information may induce over-fitting when learning the hash function for novel data points that are unseen during the training phase. To this end, our approach utilizes both the local and global similarity structure of the training data to find the compact codewords which are able to better preserve the original similarity. Moreover, in order to obtain the hash function for novel data, as already discussed, our approach starts with an idea which simultaneously learns the hash codes of the training data and the hash function for unseen data in a regression model. More specifically, given training data points {x_i}_{i=1}^n, their corresponding binary hash codes {y_i}_{i=1}^n and the global hash function H can be obtained by the following optimization function:

$$\begin{aligned}
\operatorname*{arg\,min}_{H,\,Y,\,J} \quad & \sum_{i=1}^{n} J(x_i) + \lambda\left(\sum_{i=1}^{n} \ell\left(H(x_i), y_i\right) + \gamma\,\Omega(H)\right) \\
\text{s.t.:} \quad & y_i \in \{-1, 1\}^l,\quad H(\cdot) \in \{-1, 1\}^l,\quad Y^T Y = I
\end{aligned} \qquad (3)$$

where J(·) is the regularized loss function at each data point based on its local structure and ℓ(·) is the prediction loss function of the regression model. Furthermore, the regularization function Ω(H) and its corresponding control parameter γ are used to measure the complexity of the hash function H. λ represents the trade-off between the prediction loss from the local structure and the prediction loss from the global structure. Besides, y_i ∈ {−1, 1}^l and H(·) ∈ {−1, 1}^l in Eq. (3) enforce the hash codes of H and Y to be binary, while the constraint Y^T Y = I guarantees that the bits are uncorrelated.

We first discuss the local loss function J(·) in our objective function. Similar to [20], [26], [27], given each datum x_i ∈ X, to exploit its local similarity structure, we add its k−1 nearest neighbors as well as x_i itself into a local clique denoted as N_i = {x_i, x_{i_1}, x_{i_2}, ..., x_{i_{k−1}}}. Furthermore, we assume that there is a local hash function H_i that can well predict the hash codes of the data points in N_i. Given a datum x_p ∈ N_i, we can map it into its corresponding binary code y_p by the local hash function H_i.

In this way, we can formulate the local loss function J(·) by

$$J(x_i) = \sum_{x_p \in N_i} \ell\left(H_i(x_p), y_p\right) + \gamma\,\Omega_l(H_i) \qquad (4)$$

where ℓ(·) is the prediction loss function and the regularization term Ω_l measures the smoothness of H_i. By summing the local loss over all data points together, we can rewrite the objective function Eq. (3) as:

$$\sum_{i=1}^{n} J(x_i) + \lambda\left(\sum_{i=1}^{n} \ell\left(H(x_i), y_i\right) + \gamma\,\Omega(H)\right) \qquad (5)$$

$$= \sum_{i=1}^{n}\left(\sum_{x_p \in N_i} \ell\left(H_i(x_p), y_p\right) + \gamma\,\Omega_l(H_i)\right) + \lambda\left(\sum_{i=1}^{n} \ell\left(H(x_i), y_i\right) + \gamma\,\Omega(H)\right) \qquad (6)$$

With different hypotheses, H_i can be generated in different forms. Similar to [26], [27], if the structure of the local manifold is considered to be linear, H_i can be defined as a linear regression model. In addition, nonlinear mappings such as the Gaussian kernel function can also be used to improve the performance of linear methods. However, research in [28] indicated that the above representations of the local manifold may lack the ability to accurately capture the local geometry. Moreover, the performance of a linear function or Gaussian kernel function is significantly affected by the setting of the parameters. To address these issues, by developing mappings in Sobolev space [21], spline functions can be smooth, nonlinear, and therefore able to interpolate the scattered data points with arbitrarily controllable accuracy.

Inspired by the above good characteristics of splines, we employ splines in our hashing framework by finding continuous local hash functions in Sobolev space. According to [29], provided that the penalty term Ω_l(H_i) is defined as a semi-norm, the minimizer H_i in Eq. (4) will be given by:

$$H_i(x) = \sum_{j=1}^{d} \beta_{i,j}\, p_j(x) + \sum_{j=1}^{k} \alpha_{i,j}\, G_{i,j}(x) \qquad (7)$$

where d = (m + s − 1)!/(m!(s − 1)!) and {p_j(x)}_{j=1}^d are a set of primitive polynomials which span the polynomial space with degree less than s. α_i = {α_{i,1}, α_{i,2}, ..., α_{i,k}} and β_i = {β_{i,1}, β_{i,2}, ..., β_{i,d}} are the coefficients with respect to point x_i. G_{i,j} is a Green's function [29], defined by:

$$G_{i,j}(x) = \begin{cases} \|x - x_{i_j}\|^{2s-m}\,\log\left(\|x - x_{i_j}\|\right), & \text{if } m \text{ is even;} \\ \|x - x_{i_j}\|^{2s-m}, & \text{if } m \text{ is odd.} \end{cases} \qquad (8)$$

In this way, the local hash function H_i can better fit the local structure near the scattered points, as the data points can be locally warped by the Green's function G, which is part of the local hash function H_i.
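A direct transcription of Eq. (8) in code (our own sketch; the spline order s and the toy vectors are assumed values) might look as follows.

    import numpy as np

    def green_function(x, x_ij, s, m):
        # Green's function of Eq. (8) for feature dimension m and spline order s.
        r = np.linalg.norm(x - x_ij)
        if r == 0.0:
            return 0.0                       # value at coincident points (limit is 0 when 2s > m)
        if m % 2 == 0:                       # m even
            return r ** (2 * s - m) * np.log(r)
        return r ** (2 * s - m)              # m odd

    x  = np.array([0.2, 0.4, 0.1])
    x0 = np.array([0.3, 0.1, 0.5])
    print(green_function(x, x0, s=2, m=3))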

As has been shown, the global distribution of images is usually nonlinear [30] (especially for large-scale image datasets). To better exploit the global structural information, it is helpful to employ the kernel trick to achieve a nonlinear global mapping, which is more general and flexible. More specifically, we can write the global hash function as:

$$H(x) = \phi(W)^T \phi(x) + b \qquad (9)$$

where φ is the kernel function that maps x from the Euclidean space R^m to a Hilbert space H, φ(W) is the projection matrix from H to the Euclidean space R^l, and b is a bias term. In this way, the hash function H can map each datum through a nonlinear projection and then binarize it into an l-dimensional binary codeword.

By employing the Frobenius distance as the loss function ℓ, the objective function can be rewritten as:

$$\begin{aligned}
\operatorname*{arg\,min}_{\phi(W),\,b,\,H_i,\,Y} \quad & \sum_{i=1}^{n}\left(\sum_{x_p \in N_i} \|H_i(x_p) - y_p\|_F^2 + \gamma\,\Omega_i(H_i)\right) + \lambda\left(\sum_{i=1}^{n} \|H(x_i) - y_i\|_F^2 + \gamma\,\|\phi(W)\|_F^2\right) \\
\text{s.t.:} \quad & y_i, y_p \in \{-1, 1\}^l,\quad H_i(\cdot), H(\cdot) \in \{-1, 1\}^l,\quad Y^T Y = I
\end{aligned} \qquad (10)$$

where the definitions of H and H_i are given in Eq. (9) and Eq. (7), respectively.

C. Optimization

As the objective function in Eq. (10) is NP-hard, following [11], we relax the constraints Y ∈ {−1, 1} and H_i(·), H(·) ∈ {−1, 1} to make the problem computationally tractable. The relaxed optimization function becomes:

$$\begin{aligned}
\operatorname*{arg\,min}_{\phi(W),\,b,\,H_i,\,Y} \quad & \sum_{i=1}^{n}\left(\sum_{x_p \in N_i} \|H_i(x_p) - y_p\|_F^2 + \gamma\,\Omega_i(H_i)\right) + \lambda\left(\sum_{i=1}^{n} \|\phi(W)^T \phi(x_i) + b - y_i\|_F^2 + \gamma\,\|\phi(W)\|_F^2\right) \\
\text{s.t.} \quad & Y^T Y = I
\end{aligned} \qquad (11)$$

We first discuss how to solve for the parameters of H_i that minimize the objective function Eq. (11). As discussed, the local hash function given by the spline can be formulated as

$$H_i(x) = \sum_{j=1}^{d} \beta_{i,j}\, p_j(x) + \sum_{j=1}^{k} \alpha_{i,j}\, G_{i,j}(x) \qquad (12)$$

Then, for each datum x_i, by substituting its k nearest data points in N_i into Eq. (12), we obtain k equations:

$$y_{i_p} = \sum_{j=1}^{d} \beta_{i,j}\, p_j(x_{i_p}) + \sum_{j=1}^{k} \alpha_{i,j}\, G_{i,j}(x_{i_p}) \qquad (13)$$

where p = 1, 2, ..., k. Moreover, according to the definition of conditional positive semidefiniteness of functions in [28], [31], we add boundary conditions to ensure that the reconstruction error of Ω_i(H_i) is zero at infinity:

$$\sum_{j=1}^{k} \alpha_{i,j}\, p_q(x_{i_j}) = 0, \qquad q = 1, 2, \ldots, d \qquad (14)$$

By combining Eq. (13) and Eq. (14), we have in total k + d variables and k + d equations. The coefficients α_i and β_i can be solved via

$$A \cdot \begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix} = \begin{pmatrix} Y_i^T \\ 0 \end{pmatrix} \qquad (15)$$

where Y_i = [y_i, y_{i_1}, y_{i_2}, ..., y_{i_{k−1}}] corresponds to the hash codes of the data points in N_i generated by the local hash function H_i, and

$$A = \begin{pmatrix} K_l & P \\ P^T & 0 \end{pmatrix}.$$

In matrix A, K_l is a k × k symmetric matrix with elements K_{l,pq} = G_{p,q}(‖x_{i_p} − x_{i_q}‖), and P is a k × d matrix with elements P_{pq} = p_q(x_{i_p}). Considering the reconstruction error J(x_i), following [32], we have

$$J(x_i) \approx \sum_{j=1}^{k}\left(y_{i_j} - H_i(x_{i_j})\right)^2 + \eta\,\alpha_i K_i \alpha_i^T \qquad (16)$$

In Eq. (16), since only a small number of nearest neighbors are selected for each data point, if the regularization parameter η is small enough, the constraint of the first term can be well satisfied. So we can treat $\sum_{j=1}^{k}(y_{i_j} - H_i(x_{i_j}))^2$ as zero in this case. Therefore, by omitting the scalar η, we have

$$J(x_i) \propto \alpha^T K_i\,\alpha \qquad (17)$$

Denoting by M_i the upper-left k × k submatrix of the matrix A^{−1}, it can be further demonstrated that

$$J(x_i) \approx \eta\, Y_i^T M_i Y_i \qquad (18)$$

holds.

Since there are n local hash functions with respect to the n local cliques, we now consider how to combine the hash codes generated by the different local hash functions. As each local hash code matrix Y_i = [y_i, y_{i_1}, ..., y_{i_{k−1}}] is a submatrix of the global hash code matrix Y = [y_1, y_2, ..., y_n], we can find a column selection matrix S_i ∈ R^{n×k} to map the global hash code matrix into the local hash code matrix.

More specifically, denoting by S_i(r, c) the element in the r-th row and c-th column, if the column selection matrix S_i satisfies

$$S_i(r, c) = \begin{cases} 1, & \text{if } r = i_c, \\ 0, & \text{otherwise,} \end{cases} \qquad (19)$$

then we will have Y_i = Y S_i. In this way, the global code matrix Y can be mapped into n local code matrices by the n column selection matrices. Then the combined local loss term becomes

$$\sum_{i=1}^{n} J(x_i) = \eta \sum_{i=1}^{n} Y_i^T M_i Y_i = \eta\, S^T Y^T M Y S \qquad (20)$$

where S = [S_1, ..., S_n] and M = diag(M_1, ..., M_n). So for each data point, the local hash codes generated by the different local hash functions are combined into one matrix to find the overall optimized hash codes. The process of mapping a single point into binary codewords is illustrated in Figure 2.

Defining L_l = S^T M S, the objective function now becomes:

$$\begin{aligned}
\operatorname*{arg\,min}_{\phi(W),\,b,\,Y} \quad & \eta\, Y^T L_l Y + \lambda\left(\sum_{i=1}^{n} \|\phi(W)^T \phi(x_i) + b - y_i\|_F^2 + \gamma\,\|\phi(W)\|_F^2\right) \\
\text{s.t.:} \quad & Y^T Y = I
\end{aligned} \qquad (21)$$
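The following sketch (our own illustration, not the authors' implementation) assembles L_l for a toy dataset: for each clique it builds the block system A of Eq. (15) with a linear polynomial basis {1, x}, extracts M_i as the upper-left k × k block of A^{-1}, and scatters M_i back onto the clique's indices, which realizes the S/M construction of Eq. (20) up to the paper's transpose conventions. The spline order s = 2, the clique size, and the low-dimensional toy X are assumptions; this sketch does not reproduce how the paper handles high-dimensional features via its d parameter.

    import numpy as np

    def local_spline_laplacian(X, k=5, s=2):
        # Assemble L_l from the per-clique spline matrices M_i (Eqs. (15)-(20)).
        n, m = X.shape
        d = 1 + m                                   # linear polynomial basis {1, x}
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        L_l = np.zeros((n, n))
        for i in range(n):
            idx = np.argsort(dists[i])[:k]          # x_i plus its k-1 nearest neighbors
            Xi = X[idx]
            R = np.linalg.norm(Xi[:, None, :] - Xi[None, :, :], axis=2)
            if m % 2 == 0:                          # Green's function of Eq. (8)
                with np.errstate(divide="ignore", invalid="ignore"):
                    K = np.where(R > 0, R ** (2 * s - m) * np.log(R), 0.0)
            else:
                K = R ** (2 * s - m)
            P = np.hstack([np.ones((k, 1)), Xi])    # k x d polynomial matrix
            A = np.block([[K, P], [P.T, np.zeros((d, d))]])
            M_i = np.linalg.inv(A)[:k, :k]          # upper-left k x k block of A^{-1}
            L_l[np.ix_(idx, idx)] += M_i            # scatter back onto the clique indices
        return L_l

    X = np.random.rand(20, 3)
    print(local_spline_laplacian(X, k=5).shape)     # (20, 20)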

Now we try to find b and φ(W) that minimize the objective function in Eq. (21). Setting the derivative of Eq. (21) with respect to b to zero, we get:

$$\mathbf{1}_n^T\left(\phi(X)^T \phi(W) + \mathbf{1}_n b^T - Y\right) = 0 \qquad (22)$$

and b can be solved as:

$$b = \frac{1}{n}\left(Y^T \mathbf{1}_n - \phi(W)^T \phi(X)\,\mathbf{1}_n\right) \qquad (23)$$

Setting the derivative of Eq. (21) with respect to φ(W) to zero, we have:

$$\phi(X)\left(\phi(X)^T \phi(W) + \mathbf{1}_n b^T - Y\right) + \gamma\,\phi(W) = 0 \qquad (24)$$

Then we substitute Eq. (23) into Eq. (24) and get:

$$\phi(X)\phi(X)^T \phi(W) + \frac{1}{n}\,\phi(X)\mathbf{1}_n\left(\mathbf{1}_n^T Y - \mathbf{1}_n^T \phi(X)^T \phi(W)\right) - \phi(X)\,Y + \gamma\,\phi(W) = 0$$

$$\Rightarrow\; \phi(W) = \left(\phi(X)\, H_n\, \phi(X)^T + \gamma I\right)^{-1} \phi(X)\, H_n\, Y \qquad (25)$$

where $H_n = I - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$ is the global centering matrix. Using L to denote the objective function in Eq. (21), and employing the property that for any matrix M, ‖M‖_F^2 = tr(M^T M) holds, L can be rewritten as:

$$L = \eta\, Y^T L_l Y + \lambda\left(\operatorname{tr}\!\left[\left(\phi(X)^T \phi(W) + \mathbf{1}_n b^T - Y\right)^T\left(\phi(X)^T \phi(W) + \mathbf{1}_n b^T - Y\right)\right] + \gamma\,\operatorname{tr}\!\left[\phi(W)^T \phi(W)\right]\right) \qquad (26)$$

Letting Q = (φ(X) H_n φ(X)^T + γI)^{−1}, L can be rewritten as:

$$\begin{aligned}
L &= \eta\, Y^T L_l Y + \lambda\left(\operatorname{tr}\!\left[Y^T\left(H_n\phi(X)^T Q\,\phi(X) H_n - H_n\right)^T\left(H_n\phi(X)^T Q\,\phi(X) H_n - H_n\right) Y\right] + \gamma\,\operatorname{tr}\!\left[\phi(W)^T \phi(W)\right]\right) \\
&= \eta\, Y^T L_l Y + \lambda\,\operatorname{tr}\!\left[Y^T\left(H_n - H_n\phi(X)^T Q\,\phi(X) H_n\right) Y\right]
\end{aligned} \qquad (27)$$

Substituting Q = (φ(X) H_n φ(X)^T + γI)^{−1} back into Eq. (27), we have

$$\begin{aligned}
L &= \eta\, Y^T L_l Y + \lambda\, Y^T\left(H_n - H_n\phi(X)^T\left(\phi(X) H_n \phi(X)^T + \gamma I\right)^{-1}\phi(X) H_n\right) Y \\
&= \eta\, Y^T L_l Y + \lambda\gamma\, Y^T\left(H_n\left(H_n\phi(X)^T \phi(X) H_n + \gamma I\right)^{-1} H_n\right) Y
\end{aligned} \qquad (28)$$

where φ(X)^T φ(X) can be calculated by a kernel function K(x_i, x_j) = φ(x_i)^T φ(x_j), and we can use a variety of kernel functions with different properties. In this way, we get

$$L = \eta\, Y^T L_l Y + \lambda\gamma\, Y^T H_n\left(H_n K H_n + \gamma I\right)^{-1} H_n\, Y \qquad (29)$$

where K is the kernel matrix with elements K_{ij} = K(x_i, x_j). Moreover, we can define the matrix L_g = H_n(H_n K H_n + γI)^{−1} H_n, and our objective function can be rewritten as:

$$\begin{aligned}
\operatorname*{arg\,min}_{Y} \quad & Y^T\left(L_l + \mu L_g\right) Y \\
\text{s.t.:} \quad & Y^T Y = I
\end{aligned} \qquad (30)$$

where μ = λγ/η.

It can be demonstrated that both L_l and L_g are symmetric positive semidefinite matrices, so Eq. (30) corresponds to an eigenvalue problem, and Y can be obtained from the l eigenvectors of L_l + μL_g with the smallest eigenvalues (the trivial solution with zero eigenvalue is removed).
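Given L_l and the kernel-based L_g, the relaxed codes then follow from a single symmetric eigendecomposition. This sketch is our own illustration mirroring steps 9–11 of Algorithm 1; the RBF kernel, its bandwidth, and the values of μ and γ are assumptions, and the identity matrix below merely stands in for an L_l computed elsewhere.

    import numpy as np

    def relaxed_codes(X, L_l, n_bits, mu=1e-6, gamma=1.0, bandwidth=1.0):
        # Solve Eq. (30): eigenvectors of L_l + mu * L_g with the smallest eigenvalues.
        n = X.shape[0]
        sq = np.square(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
        K = np.exp(-sq / (2 * bandwidth ** 2))          # RBF kernel matrix
        H = np.eye(n) - np.ones((n, n)) / n             # centering matrix H_n
        L_g = H @ np.linalg.inv(H @ K @ H + gamma * np.eye(n)) @ H
        vals, vecs = np.linalg.eigh(L_l + mu * L_g)     # ascending eigenvalues
        return vecs[:, 1:n_bits + 1]                    # drop the trivial eigenvector

    X = np.random.rand(20, 3)
    Y = relaxed_codes(X, np.eye(20), n_bits=8)          # identity stands in for L_l here
    print(Y.shape)                                      # (20, 8)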

To obtain the hash function for out-of-sample data, since we already have φ(W) and b given by

$$\begin{aligned}
\phi(W) &= \left(\phi(X) H_n \phi(X)^T + \gamma I\right)^{-1}\phi(X) H_n Y = \phi(X) H_n\left(H_n\phi(X)^T \phi(X) H_n + \gamma I\right)^{-1} Y \\
b &= \frac{1}{n}\, Y^T \mathbf{1}_n - \frac{1}{n}\,\phi(W)^T \phi(X)\,\mathbf{1}_n
\end{aligned} \qquad (31)$$

by substituting Eq. (31) into H(x) = φ(W)^T φ(x) + b, we have:

$$H(x_q) = Y^T\left(H_n\phi(X)^T \phi(X) H_n + \gamma I\right)^{-1} H_n\phi(X)^T \phi(x_q) + \frac{1}{n}\, Y^T\mathbf{1}_n - \frac{1}{n}\, Y^T\left(H_n\phi(X)^T \phi(X) H_n + \gamma I\right)^{-1} H_n\phi(X)^T \phi(X)\,\mathbf{1}_n \qquad (32)$$

where x_q is a novel datum outside the training set and X is the matrix that contains all training data. Denoting by K_{x_q} the kernel matrix calculated between x_q and X, we have:

$$H(x_q) = Y^T\left(H_n K H_n + \gamma I\right)^{-1} H_n K_{x_q} + \frac{1}{n}\, Y^T\mathbf{1}_n - \frac{1}{n}\, Y^T\left(H_n K H_n + \gamma I\right)^{-1} H_n K\,\mathbf{1}_n \qquad (33)$$

Fig. 2. A single data point x_q is mapped by its L associated local hash functions H_1, H_2, ..., H_L corresponding to the L kNN graphs it belongs to. Then, by exploiting the L local hash codes together with the global structure of all the data points, the final hash code y_q is generated by the global hash function H.

Recall that we relaxed the constraints Y, H(x) ∈ {−1, 1} during the optimization process. Given a novel data point x_q, to make its hash code binary, similar to [11] and [31], the most straightforward way is to set thresholds to binarize the relaxed hash code Y. Moreover, to make the binary codes more efficient, we use the median value of each dimension in Y as the threshold. Denoting the threshold vector by t, in which each entry contains the median of the corresponding column in Y, we obtain the hash function as:

$$H(x_q) = \operatorname{sgn}\!\left[Y^T\left(H_n K H_n + \gamma I\right)^{-1} H_n K_{x_q} + \frac{1}{n}\, Y^T\mathbf{1}_n - \frac{1}{n}\, Y^T\left(H_n K H_n + \gamma I\right)^{-1} H_n K\,\mathbf{1}_n - t\right] \qquad (34)$$

where sgn is the sign function which makes the values binary.

The whole approach for learning the hash function H is summarized in Algorithm 1.
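A literal reading of Eq. (34) gives the following query-time hashing routine. This is our own sketch rather than the released implementation; the toy kernel matrix, query, and γ are assumed values, and the same kernel used to build K is assumed for the query column k_q.

    import numpy as np

    def hash_query(Y, K, k_q, gamma, t):
        # Eq. (34): binary code of a query from the relaxed training codes Y.
        # Y: n x l relaxed codes, K: n x n training kernel matrix,
        # k_q: n-vector of kernel values between the training points and the query,
        # t: l-vector of per-bit median thresholds.
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        B = Y.T @ np.linalg.inv(H @ K @ H + gamma * np.eye(n)) @ H
        h = B @ k_q + Y.T @ np.ones(n) / n - B @ K @ np.ones(n) / n - t
        return np.sign(h)

    n, l = 20, 8
    Y = np.random.randn(n, l)                            # relaxed codes (placeholder)
    z = np.random.rand(n)
    K = np.exp(-np.square(z[:, None] - z[None, :]))      # toy RBF kernel on scalar points
    k_q = K[:, 0]                                        # pretend the query is training point 0
    t = np.median(Y, axis=0)
    print(hash_query(Y, K, k_q, gamma=1.0, t=t))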

IV. EXPERIMENTS

We compared our proposed Spline Regression Hashing approach with four state-of-the-art hashing methods, LSH, RBM, STH, and SpH, on four real image datasets.

A. Datasets and Experiment Setup

To evaluate the effectiveness of our proposed SRH method, we performed experiments on four real image datasets. The datasets and their corresponding features we selected are also widely adopted in the evaluation of other hashing-based methods. The details of the datasets are listed as follows:

1) Caltech-101 [33]. The Caltech-101 dataset is a standard benchmark for object recognition, which contains 10,213 images belonging to 101 object categories.

2) CIFAR-10 [34]. The CIFAR-10 dataset consists of 60,000 32 × 32 tiny color images in 10 classes, with 6,000 images per class.

3) Photo Tourism Patch [35]. The Photo Tourism Patch dataset includes 100,000 32 × 32 patches extracted from Trevi pictures in the Photo Tourism dataset, and is widely used as a benchmark for evaluating hashing-based retrieval methods.

4) ANN-GIST 1M [36]. The ANN-GIST 1M dataset is a large-scale dataset for evaluating the quality of approximate nearest neighbors search. In this dataset, 1 million images are collected into the base set and 1,000 images are used as the testing set.

Only the ANN-GIST 1M dataset provides images along with extracted features, i.e., 960-dimensional GIST features [37]. We needed to extract the features ourselves for the other three datasets. For the Caltech-101 dataset, because it is used for object recognition, we extracted local SIFT descriptors [38] from the images and then clustered them into 1000-dimensional bag-of-words codewords. Both CIFAR-10 and Photo Tourism Patch are tiny-image datasets. We therefore used 960-dimensional GIST features on these two datasets, which characterize the global pattern of the images.

In our experiments, all images were split into a base set and a testing set for further evaluation. For the CIFAR-10 dataset, we used the default split [34], in which 50,000 images form the base set and the other 10,000 images form the testing set. For the ANN-GIST 1M dataset, we also used the split provided in [36], with 1M images in the base set and 1,000 images in the testing set. For the Caltech-101 and Photo Tourism datasets, 90% of the images were randomly assigned to the base set and the other 10% to the testing set.

B. Algorithms and Parameter Setting

Algorithm 1 Spline Regression Hashing

1: Input: Training images X = [x_1, x_2, ..., x_n] ∈ R^{n×m}.
2: For each image x_i in X:
3:   Construct the local clique N_i by adding x_i and its k − 1 nearest neighbors.
4:   Construct the matrix K_i using the Green's function G_i defined on N_i according to Eq. (8).
5:   Construct the matrix A = [K_i, P; P^T, 0].
6:   Construct the matrix M_i as the upper-left k × k submatrix of A^{−1}.
7: End
8: L_l ← S^T M S, where S = [S_1, ..., S_n], M = diag(M_1, ..., M_n), and S_i is defined in Eq. (19).
9: Construct the kernel matrix K, where K(i, j) = φ(x_i)^T φ(x_j).
10: L_g ← H_n(H_n K H_n + γI)^{−1} H_n, where H_n = I − (1/n) 1_n 1_n^T.
11: Obtain Y by solving for the l eigenvectors of L_l + μL_g with the smallest eigenvalues.
12: Obtain the final hash function H(x) according to Eq. (34).
13: Output: Hash codes Y of the training data and the hash function H.

We first trained the hash functions using our SRH approach on the training set. For the Caltech-101 and CIFAR-10 datasets, 30% of the images in the base set were treated as the training set, while for the Photo Tourism Patch and ANN-GIST 1M datasets, as the base sets are large, we randomly sampled 10,000 images from the base sets as training sets. During the training phase, we used the cosine kernel for the Caltech-101 dataset and the radial basis function (RBF) kernel for the other three datasets to compute the global kernel matrix K. Both parameters μ and λ were set to 10^−6, while k and d were set to 10 and 5, respectively.

After the binary codewords for both the base and testing sets were generated, we used the codeword of each image in the testing set as a query to retrieve images in the training set within a specified Hamming distance varying from 2 to 64. We first drew Precision-Recall [39] curves to measure the performance, which are defined as follows:

$$\text{recall} = \frac{\text{number of retrieved relevant images}}{\text{total number of all relevant images}} \qquad (35)$$

$$\text{precision} = \frac{\text{number of relevant retrieved images}}{\text{total number of retrieved images}} \qquad (36)$$

The F1 score [39] was then obtained by calculating the harmonic mean of precision and recall:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \qquad (37)$$
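For completeness, the metrics of Eqs. (35)–(37) can be computed directly from the retrieved and relevant sets; the sketch below is our own and uses toy id sets.

    def precision_recall_f1(retrieved, relevant):
        # Eqs. (35)-(37) from the sets of retrieved and truly relevant image ids.
        retrieved, relevant = set(retrieved), set(relevant)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
        return precision, recall, f1

    print(precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5]))
    # -> (0.5, 0.666..., 0.571...)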

We compared the performance of Spline Regression Hashing (SRH) with four state-of-the-art methods listed below:

1) Locality Sensitive Hashing (LSH) [6] and min-Hash [7].
2) Restricted Boltzmann Machines (RBM) [12].
3) Spectral Hashing (SpH) [11].
4) Self-Taught Hashing (STH) [14].

TABLE I
COMPARISON OF THE F1 SCORE OF SRH AND MIN-HASH AT DIFFERENT CODE LENGTHS

Nearest neighbors as ground truth:
  Code length   40       80       160      320
  SRH           0.2588   0.3560   0.4329   0.5440
  min-Hash      0.0202   0.1054   0.1314   0.1816

Class information as ground truth:
  Code length   40       80       160      320
  SRH           0.1663   0.1812   0.1866   0.2010
  min-Hash      0.0755   0.1245   0.1464   0.1823

C. Experimental Results

To determine whether the retrieved images are relevant to the query, we followed the convention of previous hashing papers and adopted the nearest neighbors of the query images in the original vector space as ground truth. Figure 3 shows the precision-recall curves of the different hashing methods for all datasets. In this experiment, each feature vector was compressed into 128-bit hash codes, and the Hamming radius (the maximum Hamming distance between any query image and retrieved image) was varied from 2 to 48. The experimental results indicate that our method outperforms the other competitive methods on all datasets.

Besides, the Caltech-101 and CIFAR-10 datasets also provide the class (or category) information of each image. We then considered the images in the same class (or category) as ground truth for evaluating relevant images. We used the same experimental setting as before, and the results reported in Figure 4 show that our method also yields the best performance when considering class information as ground truth. The experiments on the two evaluation methodologies also show that our method is suitable and flexible for different aims of the retrieval task.

We also compared the F1 score on the Caltech-101 dataset with min-Hash [7], [23] and report the results in Table I. We can see that our algorithm outperforms min-Hash in terms of F1 score. When using the binary codes generated by our SRH algorithm, we can employ bitwise XOR operations to search for k-nearest neighbors. Unlike SRH, the output of each min-hash function is an integer value between 1 and 1000. Thus, another advantage of our method is that the binary codes generated by our algorithm are more compact and image retrieval is much more efficient.

There are five important parameters to be tuned in our method, namely k, d, λ, μ, and l. Taking the CIFAR-10 dataset as an example, we test different parameter settings in our SRH method to see the performance variation.

Firstly, we set the parameters λ and μ at different values ranging from 10^−6 to 10^6 and computed the F1 score for the different parameter settings. Figure 5 shows the F1 score of different parameter settings w.r.t. λ and μ. We can see that the performance of our method is sensitive to the setting of λ and μ. Generally speaking, in order to obtain a better performance, λ and μ should be set to small values less than 10^−3.



Fig. 3. Comparison of precision-recall curves on four datasets using nearest neighbors as ground truth. (a) Caltech-101. (b) CIFAR-10. (c) Photo Tourism Patch. (d) ANN-GIST 1M.


Fig. 4. Comparison of precision-recall curves on two datasets using class information as ground truth. (a) Caltech-101. (b) CIFAR-10.

Furthermore, we set the parameter k, which corresponds to the number of local nearest neighbors, at values 5, 10, 15, 20, 25, and the local dimensionality parameter d at 2, 5, 10, 15, 20 (d < k). Fixing μ = 10^−6 and λ = 10^−6, we show the performance variations for different k and d in Figure 6. The figure shows that the experimental results are not very sensitive to k and d when k is set to a small value.

Fig. 5. Comparison of F1 score at different μ and λ settings on the CIFAR-10 dataset.

Fig. 6. Comparison of F1 score at different k and d settings on the CIFAR-10 dataset.

We obtained an empirical setting of the parameters for the CIFAR-10 dataset from these experiments. Yet, the performance is also data dependent. For different datasets, we can use cross validation to determine the best parameter settings.

In addition, we evaluated the effectiveness of hashing at different code lengths l on the CIFAR-10 dataset. For each hashing method, we selected the Hamming radius at which the best F1 score is obtained. Figure 7 shows the results with the code length varying from 32 to 128. From the figure we can see that as the code length increases, the performance of SRH grows faster than that of the other methods, and it yields the best performance with 128-bit codes.


Fig. 7. Comparison of F1 score at different code lengths on the CIFAR-10 dataset.

Finally, in Figure 8, we show a retrieval example on the CIFAR-10 dataset. It can be seen that the retrieval results of our method are closer to the ground-truth results than those of the other methods.

The superior performance of our SRH method once again confirms its effectiveness. As already discussed, the effectiveness of our method comes from three aspects. The exploitation of the local spline function better captures the local structure of the data. The combination of local and global similarity structures reduces over-fitting and makes the learned hash codes more robust. Finally, the out-of-sample extension strategy in SRH, which simultaneously optimizes the hash codes and the hash function, is more effective than that of traditional approaches.

D. Computational Complexity Analysis

Now we discuss the computational complexity. The main computation in SRH includes three steps: 1) finding the k nearest neighbors for each data point and constructing the global kernel matrix K; 2) calculating the matrices M_i defined in Eq. (18), with a time complexity of O((k + d)^3); and 3) computing the optimal Y in Eq. (30) and H(x) in Eq. (34), with a time complexity of O(n^3). Moreover, for large-scale training sets, approximation techniques can be further used to speed up the kernel algorithms.

Once the hash functions have been trained, computing the hash codes for novel inputs has a linear time complexity, which is very fast. Table II shows the results with 128-bit codes on the four datasets. From the table we can see that nearest neighbors search in Hamming space is very efficient. Even on the dataset with one million images, the retrieval result can be obtained within one second. All of our experiments were conducted on a Linux workstation with an Intel 2.67 GHz Xeon CPU and 32 GB RAM using Matlab 7.10.


Fig. 8. Examples of retrieval results on the CIFAR-10 dataset. We select images from the testing image set as queries, and use the 25 nearest neighbor images in the base image set as the ground truth.

TABLE II
SEARCH TIME PER IMAGE ON DIFFERENT DATASETS WITH 128-BIT CODES

  Dataset                Base size    Search time (seconds)
  Caltech-101            8,223        0.0084
  CIFAR-10               60,000       0.0469
  Photo Tourism Patch    96,064       0.0966
  ANN-GIST 1M            1,000,000    0.9823

V. CONCLUSION

In this paper, we proposed a novel framework for learning efficient hash codes for fast image search. The proposed method combines both the local and global similarities and exploits splines to precisely preserve the local manifold structure of the data. Moreover, the out-of-sample problem was well solved in our method by simultaneously learning the hash codes of the training data and the hash function for unseen data. Extensive experiments on four real image datasets showed the superior performance of our proposed method over existing state-of-the-art techniques.

It has been demonstrated that data labels can greatly facilitate the learning of hash functions, especially for images whose semantic similarity is usually given in terms of their labels or tags. In the future, we intend to incorporate label data into our method and develop a supervised or semi-supervised hashing framework.

REFERENCES

[1] Y. Rui, T. S. Huang, and S. F. Chang, "Image retrieval: Current techniques, promising directions and open issues," J. Visual Commun. Image Represent., vol. 10, no. 1, pp. 39–62, 1999.

[2] Y. Gao, M. Wang, Z.-J. Zha, Q. Tian, Q. Dai, and N. Zhang, "Less is more: Efficient 3-D object retrieval with query view selection," IEEE Trans. Multimedia, vol. 13, no. 5, pp. 1007–1018, Oct. 2011.

[3] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, and Y. Pan, "A multimedia retrieval framework based on semi-supervised ranking and relevance feedback," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 4, pp. 723–742, Apr. 2012.

[4] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Commun. ACM, vol. 18, no. 9, pp. 509–517, 1975.

[5] A. Gionis, P. Indyk, and R. Motwani, "Similarity search in high dimensions via hashing," in Proc. 25th Int. Conf. Very Large Data Bases, 1999, pp. 518–529.

[6] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, "Locality-sensitive hashing scheme based on p-stable distributions," in Proc. Symp. Comput. Geometry, 2004, pp. 253–262.

[7] O. Chum, J. Philbin, and A. Zisserman, "Near duplicate image detection: Min-hash and TF-IDF weighting," in Proc. British Mach. Vis. Conf., 2008, pp. 1–10.

[8] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proc. Int. Conf. Comput. Vis., 2009, pp. 2130–2137.

[9] Y. Tao, K. Yi, C. Sheng, and P. Kalnis, "Quality and efficiency in high dimensional nearest neighbor search," in Proc. Int. Conf. Manage. Data, 2009, pp. 563–576.

[10] X. He, D. Cai, H. Liu, and W.-Y. Ma, "Locality preserving indexing for document representation," in Proc. SIGIR, 2004, pp. 96–103.

[11] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in Proc. Neural Inf. Process. Syst., 2008, pp. 1753–1760.

[12] R. Salakhutdinov and G. E. Hinton, "Semantic hashing," Int. J. Approx. Reason., vol. 50, no. 7, pp. 969–978, 2009.

[13] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006.

[14] D. Zhang, J. Wang, D. Cai, and J. Lu, "Self-taught hashing for fast similarity search," in Proc. SIGIR, 2010, pp. 18–25.

[15] J. Song, Y. Yang, Z. Huang, H. Shen, and R. Hong, "Multiple feature hashing for real-time large scale near-duplicate video retrieval," in Proc. ACM Multimedia, 2011, pp. 423–432.

[16] J. Shao, F. Wu, C. Ouyang, and X. Zhang, "Sparse spectral hashing," Pattern Recognit. Lett., vol. 33, no. 3, pp. 271–277, 2012.

[17] Y. Zhuang, Y. Liu, F. Wu, Y. Zhang, and J. Shao, "Hypergraph spectral hashing for similarity search of social image," in Proc. ACM Multimedia, 2011, pp. 1457–1460.

[18] S. Berchtold, B. Ertl, D. A. Keim, H. P. Kriegel, and T. Seidl, "Fast nearest neighbor search in high-dimensional space," in Proc. Int. Conf. Data Eng., 1998, pp. 209–218.

[19] Y. Yang, Y. Zhuang, F. Wu, and Y. Pan, "Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval," IEEE Trans. Multimedia, vol. 10, no. 3, pp. 437–446, Apr. 2008.

[20] Y. Yang, F. Nie, S. Xiang, Y. Zhuang, and W. Wang, "Local and global regressive mapping for manifold learning with out-of-sample extrapolation," in Proc. Nat. Conf. Artif. Intell., 2010, pp. 649–654.

[21] R. Adams, Sobolev Spaces. New York: Academic, 1975.

[22] M. Charikar, "Similarity estimation techniques from rounding algorithms," in Proc. STOC, 2002, pp. 380–388.

[23] Y. Zhang, Z. Jia, and T. Chen, "Image retrieval with geometry-preserving visual phrases," in Proc. Conf. Comput. Vis. Pattern Recognit., 2011, pp. 809–816.

[24] F. Wu, Y. Han, X. Liu, J. Shao, Y. Zhuang, and Z. Zhang, "The heterogeneous feature selection with structural sparsity for multimedia annotation and hashing: A survey," Int. J. Multimedia Inf. Retrieval, vol. 1, no. 1, pp. 1–13, 2012.

[25] F. Wu, Y. Han, Q. Tian, and Y. Zhuang, "Multi-label boosting for image annotation by structural grouping sparsity," in Proc. ACM Multimedia, 2010, pp. 15–24.

[26] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.

[27] Z. Zhang and H. Zha, "Principal manifolds and nonlinear dimensionality reduction via tangent space alignment," SIAM J. Sci. Comput., vol. 26, no. 1, pp. 313–338, 2004.

[28] S. Xiang, F. Nie, and C. Zhang, "Semi-supervised classification via local spline regression," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 11, pp. 2039–2053, Nov. 2010.

[29] J. Duchon, "Splines minimizing rotation-invariant semi-norms in Sobolev spaces," in Constructive Theory of Functions of Several Variables. New York: Springer-Verlag, 1977.

[30] Y. Yang, D. Xu, F. Nie, S. Yan, and Y. Zhuang, "Image clustering using local discriminant models and global integration," IEEE Trans. Image Process., vol. 19, no. 10, pp. 2761–2773, Oct. 2010.

[31] J. Yoon, "Spectral approximation orders of radial basis function interpolation on the Sobolev space," SIAM J. Math. Anal., vol. 33, no. 4, pp. 946–958, 2001.

[32] G. Wahba, Spline Models for Observational Data. Philadelphia, PA: SIAM, 1990.

[33] F. F. Li, R. Fergus, and P. Perona, "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories," Comput. Vis. Image Understand., vol. 106, no. 1, pp. 59–70, 2007.

[34] A. Krizhevsky, "Learning multiple layers of features from tiny images," Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, 2009.

[35] S. A. J. Winder and M. Brown, "Learning local image descriptors," in Proc. Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1–8.

[36] H. Jégou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 1, pp. 117–128, Jan. 2011.

[37] C. Siagian and L. Itti, "Rapid biologically-inspired scene classification using features shared with visual attention," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 300–312, Feb. 2007.

[38] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.

[39] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge, U.K.: Cambridge Univ. Press, 2008.

Yang Liu received the B.Sc. degree from the Software College, Zhejiang University, Hangzhou, China, in 2008, where he is currently pursuing the Ph.D. degree with the College of Computer Science and Technology. His current research interests include multimedia retrieval, data mining, and social network analysis.

Fei Wu received the B.Sc. degree from Lanzhou University, Lanzhou, China, in 1996, the M.Sc. degree from the University of Macau, Macau, China, in 1999, and the Ph.D. degree from Zhejiang University, Hangzhou, China, in 2002, all in computer science. He is currently a Full Professor with the College of Computer Science and Technology, Zhejiang University. He was a Visiting Scholar with Prof. B. Yu's Group, University of California, Berkeley, from 2009 to 2010. His current research interests include multimedia retrieval, sparse representation, and machine learning.

Yi Yang received the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2010. He is currently a Post-Doctoral Research Fellow with the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. He was a Post-Doctoral Research Fellow with the University of Queensland, Australia, from 2010 to 2011, before joining Carnegie Mellon University. His current research interests include machine learning and its applications to multimedia content analysis and computer vision, particularly multimedia indexing.

Yueting Zhuang received the B.Sc., M.Sc., and Ph.D. degrees in computer science from Zhejiang University, Hangzhou, China, in 1986, 1989, and 1998, respectively. He is currently a Full Professor and the Dean of the College of Computer Science and Technology, Zhejiang University. He was a Visiting Scholar with Prof. Thomas Huang's Group, University of Illinois at Urbana-Champaign, Urbana, from 1997 to 1998. His current research interests include artificial intelligence, multimedia retrieval, computer animation, and digital libraries.

Alexander G. Hauptmann received the B.A. and M.A. degrees in psychology from Johns Hopkins University, Baltimore, MD, the Diplom degree in computer science from the Technische Universität Berlin, Berlin, Germany, in 1984, and the Ph.D. degree in computer science from Carnegie Mellon University (CMU), Pittsburgh, PA, in 1991. He is currently with the Faculty of the Department of Computer Science and the Language Technologies Institute, CMU. From 1984 to 1994, he was with the Informedia project for digital video analysis and retrieval, and led the development and evaluation of news-on-demand applications, where he was involved in research on speech and machine translation. His current research interests include man-machine communication, natural language processing, speech understanding and synthesis, video analysis, and machine learning.

