Query Refinement


Query Refinement for Multimedia Similarity Retrieval in MARS*

Kriengkrai Porkaew, Department of Computer Science, University of Illinois at Urbana-Champaign
Kaushik Chakrabarti, Department of Computer Science, University of Illinois at Urbana-Champaign
Sharad Mehrotra, Information and Computer Science Department, University of California at Irvine

1 Introduction

Advances in image processing, database management, and information retrieval have made content-based multimedia retrieval an important area of research. Typical content-based retrieval systems allow users to specify queries by providing examples of objects similar to the ones they wish to retrieve. Due to the subjective nature of retrieval, it is unlikely that the answers to the starting query will satisfy the user's information need. Rather, among the answers retrieved, the user may find one or more objects that are closer to what she has in mind than the original examples.

In the Multimedia Analysis and Retrieval System (MARS), we have explored query refinement techniques that modify the query based on the user's relevance feedback on the retrieved objects. Query refinement in MARS consists of query reweighting (QR) and query modification (QM) techniques. QR learns the user's notion of similarity between objects and adjusts the weights of different components of the query; it has been studied in [4, 5]. QM, on the other hand, uses the feedback information to change the query representation to better suit the user's information need. In [2, 5], the query point movement (QPM) approach to QM is explored, in which a query is represented by a single point in each feature space. At each iteration, the query point is moved to the centroid of the points marked relevant by the user. In this paper, we study a different approach to QM based on query expansion (QEX) which, at each iteration, uses a clustering technique to identify a set of one or more objects to be added to the query representation. We study efficient query processing techniques to implement the QEX approach, as well as efficient techniques to execute refined queries for both the QEX and QPM models. Our experimental results show that query expansion significantly outperforms query point movement, both in terms of retrieval effectiveness and execution cost, for all visual features used in MARS.

*This work was supported by NSF awards IIS-9734300 and CDA-9624396, and in part by the Army Research Laboratory under Cooperative Agreement No. DAAL01-96-2-0003.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM Multimedia '99 10/99 Orlando, FL, USA © 1999 ACM 1-58113-151-8/99/0010...$5.00

2 Content-Based Multimedia Retrieval in MARS

A multimedia object is represented as a collection of extracted features that describe its perceptual properties, such as color, texture, and shape. Each feature can be viewed as a multidimensional space, and the feature vector of an image is a point in the multidimensional space of that feature. A metric distance between points is used to define the dissimilarity between the corresponding feature vectors.

A query is also represented as a collection of features. A user may use multiple objects as a query; in this case, each feature of the query consists of multiple instances, one per query object, each instance being that object's feature vector. Each component of a query is associated with a weight indicating its importance relative to the other components of the query. Similarity between the query and an object is computed as follows. Let $Q$ be a query node and $F_1, \ldots, F_n$ (the feature nodes) be the children of $Q$, with weights $w_i$. Let $R_{i1}, \ldots, R_{in_i}$ (the object nodes) be the children of feature node $F_i$, with weights $w_{ij}$. Let $Sim_{ij}$ be the similarity of an object $O$ to object node $R_{ij}$ based on the $i$th feature. The similarity $Sim_i$ of $O$ to $Q$ with respect to feature $F_i$ is defined as $Sim_i = \sum_{j=1}^{n_i} w_{ij} Sim_{ij}$, where $\sum_{j=1}^{n_i} w_{ij} = 1$, and the overall similarity $Sim$ of $O$ to $Q$ is defined as $Sim = \sum_{i=1}^{n} w_i Sim_i$, where $\sum_{i=1}^{n} w_i = 1$.

3 Query Refinement in MARS

When a user submits an initial query, the system returns a ranked list of answers in decreasing order of similarity to the query; only the top few answers are returned. The user then marks the answers she considers relevant to the query and submits her feedback to the system.
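
The ranking step just described, together with the two-level weighted similarity of Section 2, can be sketched in a few lines of Python. This is an illustrative sketch, not MARS code: the dictionary layout of the query and the per-instance similarity `inverse_distance_sim` (1 / (1 + Euclidean distance)) are assumptions, since the paper does not fix either here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical query layout: feature name -> (w_i, [R_i1, ..., R_in_i], [w_i1, ..., w_in_i])
Query = Dict[str, Tuple[float, List[Vector], List[float]]]
Similarity = Callable[[Vector, Vector], float]


def inverse_distance_sim(a: Vector, b: Vector) -> float:
    """Assumed per-instance similarity Sim_ij: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def feature_similarity(vec: Vector, instances: List[Vector], weights: List[float],
                       sim: Similarity = inverse_distance_sim) -> float:
    """Sim_i = sum_j w_ij * Sim_ij, where the w_ij sum to 1."""
    assert math.isclose(sum(weights), 1.0), "object-node weights must sum to 1"
    return sum(w * sim(vec, r) for w, r in zip(weights, instances))


def overall_similarity(obj: Dict[str, Vector], query: Query,
                       sim: Similarity = inverse_distance_sim) -> float:
    """Sim = sum_i w_i * Sim_i, where the feature weights w_i sum to 1."""
    assert math.isclose(sum(w for w, _, _ in query.values()), 1.0)
    return sum(w_i * feature_similarity(obj[f], inst, w_ij, sim)
               for f, (w_i, inst, w_ij) in query.items())


def ranked_answers(db: Dict[str, Dict[str, Vector]], query: Query, k: int) -> List[str]:
    """Ids of the k most similar objects, in decreasing order of similarity."""
    return sorted(db, key=lambda oid: overall_similarity(db[oid], query), reverse=True)[:k]


# Two features; the texture component of the query has two instances (two example objects).
query: Query = {
    "color": (0.5, [[0.0, 0.0]], [1.0]),
    "texture": (0.5, [[1.0, 1.0], [3.0, 3.0]], [0.5, 0.5]),
}
db = {
    "a": {"color": [0.0, 0.0], "texture": [1.0, 1.0]},
    "b": {"color": [5.0, 5.0], "texture": [3.0, 3.0]},
    "c": {"color": [1.0, 0.0], "texture": [2.0, 2.0]},
}
print(ranked_answers(db, query, k=2))  # ['a', 'c']
```

Because both levels of weights are normalized, Sim is a convex combination of the Sim_ij and therefore stays in [0, 1] whenever each Sim_ij does.
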
The query refinement models exploit the multiple levels of relevance input by the user to improve the query in subsequent iterations. MARS refines the query in two ways, query reweighting (QR) and query modification (QM), as described in Section 1, and the two refinement mechanisms are combined seamlessly. In this paper, we concentrate only on query modification; the techniques developed can be easily integrated with the query reweighting techniques discussed in [4, 5]. MARS supports the two query modification approaches described below.

Query Point Movement (QPM): QPM allows only a single object per feature as the query. When the user uses multiple examples to construct the query, their centroid is used as the single-point query. Similarly, at each iteration of relevance feedback, when a user marks several objects as relevant, the weighted centroid of the feature values of the relevant objects in each feature space is used as the refined query. The weights are obtained from the levels of relevance provided by the user. Let $E_1, \ldots, E_i, \ldots, E_n$ denote the $n$ objects marked relevant by the user; for simplicity, we assume that the $E_i$ also denote the corresponding feature vectors. Let $w_1, \ldots, w_i, \ldots, w_n$ be the corresponding levels of relevance, and let $E_i[j]$ denote the value of point $E_i$ along the $j$th dimension of the feature space, $1 \le j \le m$, $m$ being the dimensionality of the feature space. The weighted centroid $C$ is defined as:

$$C[j] = \frac{\sum_{i=1}^{n} w_i E_i[j]}{\sum_{i=1}^{n} w_i}$$

Figure 1: Query Refinement Approaches, (a) Query Point Movement and (b) Query Expansion: the figure shows iso-similarity contours in a given feature space.
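
The weighted centroid C used by QPM is a per-dimension weighted average of the relevant feature vectors. A minimal Python sketch, assuming the relevance levels are non-negative numbers with a positive sum (the paper does not say how the levels are encoded); the function name `weighted_centroid` is illustrative:

```python
from typing import List, Sequence


def weighted_centroid(points: List[Sequence[float]], relevance: List[float]) -> List[float]:
    """C[j] = sum_i w_i * E_i[j] / sum_i w_i for every dimension j (QPM's refined query point)."""
    if not points or len(points) != len(relevance):
        raise ValueError("need exactly one relevance level per relevant point")
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance levels must have a positive sum")
    m = len(points[0])  # dimensionality of the feature space
    return [sum(w * p[j] for p, w in zip(points, relevance)) / total for j in range(m)]


# Three relevant points; the first has relevance level 2, the others level 1.
print(weighted_centroid([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]], [2.0, 1.0, 1.0]))  # [1.0, 1.0]
```

With equal relevance levels this reduces to the plain centroid that QPM uses when the initial query has several examples.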

Besides changing the location of the query point in the feature space corresponding to each feature $F_i$, QPM adjusts the distance function used to compute the distance of the query from the objects in $F_i$'s feature space. This is achieved by associating weights with each dimension of the feature space of $F_i$. The weight assigned to a dimension is inversely proportional to the standard deviation of the feature values of the relevant objects along that dimension. Intuitively, among the relevant objects, the higher the variance along a dimension, the lower the significance of that dimension [5]. Figure 1(a) shows how QPM refines the query; the contours represent equidistant ranges from the new query.
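
The dimension reweighting can be sketched as follows. The paper only states that each weight is inversely proportional to the standard deviation of the relevant values; the weighted Euclidean form of the distance, the normalization of the weights to sum to 1, and the small `eps` guard against zero deviation are assumptions of this sketch.

```python
import math
import statistics
from typing import List, Sequence

Point = Sequence[float]


def dimension_weights(relevant: List[Point], eps: float = 1e-6) -> List[float]:
    """Weight of dimension j is proportional to 1 / sigma_j over the relevant points."""
    m = len(relevant[0])
    # eps (an assumption) avoids division by zero when all relevant points agree on a dimension.
    raw = [1.0 / (statistics.pstdev([p[j] for p in relevant]) + eps) for j in range(m)]
    total = sum(raw)
    return [r / total for r in raw]  # normalized to sum to 1 (our choice)


def weighted_euclidean(q: Point, x: Point, weights: Sequence[float]) -> float:
    """Distance from query point q to x with per-dimension weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, q, x)))


# The relevant points spread 4x more along dimension 1 than along dimension 0.
w = dimension_weights([[0.0, 0.0], [2.0, 8.0]])
print([round(v, 3) for v in w])  # [0.8, 0.2]
```

Under these weights, a unit step along the tightly clustered dimension 0 costs about twice as much distance as a unit step along dimension 1, which is the intended effect of the reweighting.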

Query Expansion (QEX): QEX allows multiple objects per feature as a query; such queries are referred to as multipoint queries. When the user marks several points as relevant, a small number of good representative points are selected to construct the multipoint query. For this purpose, we cluster the set of relevant points and choose the centroids of the clusters as the representatives. The details of the clustering algorithm used can be found in [3]. Note that in constructing the query, objects deemed relevant during previous iterations are also incorporated into the clusters. Implicitly, relevant points get added while non-relevant ones get dropped as we move from one iteration to the next.
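
The representative-selection step can be sketched with plain k-means, used here only as a stand-in because the paper defers its actual clustering algorithm to [3]. The seeding rule (the first k points), the iteration cap, and the function name `representatives` are assumptions; the returned weights are proportional to cluster sizes, as the paper specifies for multipoint queries.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    return [sum(coords) / len(points) for coords in zip(*points)]


def representatives(relevant: List[Point], k: int,
                    max_iters: int = 20) -> Tuple[List[List[float]], List[float]]:
    """Cluster the relevant points with k-means; return (centroids, weights).

    Each weight is the fraction of relevant points in that centroid's cluster.
    """
    if not relevant:
        return [], []
    k = max(1, min(k, len(relevant)))
    centroids = [list(p) for p in relevant[:k]]  # deterministic seeding (assumption)
    clusters: List[List[Point]] = []
    for _ in range(max_iters):
        # Assignment step: each relevant point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in relevant:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        updated = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    kept = [(c, len(members)) for c, members in zip(centroids, clusters) if members]
    return [c for c, _ in kept], [n / len(relevant) for _, n in kept]


# Two visual groups of relevant points; centroids end near (0, 0.5) and (10.33, 10.33).
reps, weights = representatives([[0, 0], [0, 1], [10, 10], [10, 11], [11, 10]], k=2)
print(weights)  # [0.4, 0.6]
```

The choice of k matters less than it might seem: the paper reports that a small number of well-chosen representatives approximates the entire relevant set well.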

We do not directly use the non-relevant answers to guide the search away from non-relevant answers, since our experience shows no significant improvement from doing so. Intuitively, the reason is that a feature-based representation does not fully capture the visual perception of a user. For example, two objects may have similar color histograms but may be visually very different from the user's perspective. Therefore, directing a search away from non-relevant points in a feature space may actually cause the refined query to move away from the optimal representation in that feature space.

The distance of an object from a multipoint query is defined as the weighted combination of the individual distances from the representatives in the query, where the weight associated with a representative is proportional to the number of relevant objects in its cluster. Figure 1(b) shows the distance function for multipoint queries. The dashed lines are contours representing equidistant ranges for each of the representatives, while the solid lines are contours representing equidistant ranges from the multipoint query as a whole.

Comparison: In each iteration of relevance feedback, QPM moves the query point and reweights each dimension of the feature space to reduce the distances between the relevant points. This changes the distance function only in a limited fashion: although the weights are modified, the form of the function used to compute the distance is not changed. QEX, on the other hand, does not change the distance function of each query point individually, but by adding relevant points and removing irrelevant ones, it implicitly changes the distance function of the multipoint query as a whole, as shown in Figure 1(b). Furthermore, while QEX captures local clusters among relevant points, QPM ignores these clusters and treats all relevant points equivalently. Our experiments show that these differences have a significant impact on retrieval effectiveness.
In terms of execution cost, QEX may appear to be more expensive, since it involves evaluating queries consisting of multiple objects. In Section 4, we discuss evaluation techniques for such queries. With these techniques, QEX is more efficient than QPM in terms of execution
cost as well.

4 Query Evaluation Techniques

Multimedia Feature Indexing: We index each individual feature space using a multidimensional index structure, referred to as the Feature Index or F-Index. Similarity queries then correspond to k-nearest neighbor (k-NN) searches on the F-Index. The goal of the F-Index is to organize the feature vectors on disk so as to minimize the average number of disk accesses required to execute similarity queries on that feature. In addition to the I/O cost, the F-Index also reduces the average CPU cost of a query, since fewer node accesses imply fewer distance computations. Even though the mechanism described below can be used in combination with any F-Index, we choose the hybrid tree [1], which is particularly suited as the F-Index.

The similarity query is executed using the following k-NN algorithm. The algorithm starts at the root node and accesses the index nodes in increasing order of their distances from the query. The distances are computed by calling back the distance function provided by the application. A priority queue is used to implement the ordered traversal over the index structure. At each step, the node with the minimum distance from the query is popped from the queue and its children are explored: the distance between the query and each child is computed, and each child is pushed into the queue along with its distance from the query. This algorithm guarantees that the minimum number of nodes is visited. To use this algorithm, we need to define the distance function between the query and an index node; the following paragraphs define the distance between a multipoint query and an index node.

Multiple Point Queries: The query expansion model requires support for similarity queries consisting of multiple query points. We refer to such queries as multipoint queries, defined as follows.

Definition 1 (Multipoint Query) A multipoint query $M = (n, \mathcal{P}, W, \mathcal{D})$ consists of a set of points $\mathcal{P} = \{P_1, \ldots, P_n\}$, a set of weights $W = \{w_1, \ldots, w_n\}$, and a distance function $\mathcal{D}$ that, given two points, returns the distance between them. $n$ is called the size of the multipoint query. We assume that the weights are normalized, i.e., $\sum_{i=1}^{n} w_i = 1$. The distance of $M$ from a point $P$ is defined as $D(M, P) = \sum_{i=1}^{n} w_i \mathcal{D}(P_i, P)$.

Note that the multipoint query is a generalization of the single-point query, i.e., the latter is a special case of the former with size equal to 1.

Definition 2 (Similarity Multipoint Query) A similarity multipoint query $Q$ on the feature database $DB$ is defined by the association $Q = (M, k)$, where $M$ is the multipoint query and $k$ is the number of objects to be retrieved. $Q$ returns the set $NN_Q^k \subseteq DB$ of $k$ objects such that $\forall o \in NN_Q^k, \forall o' \in DB - NN_Q^k: D(M, o) \le D(M, o')$.
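
Definitions 1 and 2 translate directly into code. The sketch below evaluates a similarity multipoint query by a linear scan, which satisfies Definition 2 by construction but ignores the F-Index; Euclidean distance as the default point distance and the names `MultipointQuery` and `similarity_multipoint_query` are assumptions for illustration.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Point = Sequence[float]


@dataclass
class MultipointQuery:
    points: List[Point]   # P = {P_1, ..., P_n}; n = len(points) is the query's size
    weights: List[float]  # W = {w_1, ..., w_n}, assumed normalized to sum to 1
    dist: Callable[[Point, Point], float] = math.dist  # point-to-point distance (assumed Euclidean)

    def distance(self, p: Point) -> float:
        """D(M, P) = sum_i w_i * dist(P_i, P)."""
        return sum(w * self.dist(q, p) for w, q in zip(self.weights, self.points))


def similarity_multipoint_query(m: MultipointQuery, db: List[Point], k: int) -> List[Point]:
    """Return the k objects o of db with the smallest D(M, o) (Definition 2), by linear scan."""
    return heapq.nsmallest(k, db, key=m.distance)


m = MultipointQuery(points=[[0.0, 0.0], [10.0, 0.0]], weights=[0.75, 0.25])
db = [[5.0, 0.0], [0.0, 0.0], [20.0, 0.0], [5.0, 5.0]]
print(similarity_multipoint_query(m, db, k=2))  # [[0.0, 0.0], [5.0, 0.0]]
```

With a single point and weight 1.0, the same code performs an ordinary k-NN query, matching the remark that the single-point query is the special case of size 1.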

The distance between a multipoint query and an index node of an F-Index is defined as follows.

Definition 3 (MINDIST) Given the bounding box $R_N = (L, H)$ of a node $N$, where $L = (l_1, l_2, \ldots, l_m)$ and $H = (h_1, h_2, \ldots, h_m)$ are the two endpoints of the major diagonal of $R_N$, with $l_i \le h_i$ for $1 \le i \le m$, the nearest point $NP(P_i, N)$ in $R_N$ (including its surface) to each point $P_i$ in the multipoint query $M$ is defined as:

$$NP(P_i, N)[j] = \begin{cases} l_j & \text{if } P_i[j] < l_j \\ h_j & \text{if } P_i[j] > h_j \\ P_i[j] & \text{otherwise} \end{cases}$$

where $P[j]$ denotes the value of point $P$ along the $j$th dimension of the feature space, $1 \le j \le m$. $MINDIST(M, N)$ is then defined as:

$$MINDIST(M, N) = \sum_{i=1}^{n} w_i \mathcal{D}(P_i, NP(P_i, N))$$
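
MINDIST and the best-first k-NN search described earlier in this section combine as in the sketch below. It is a toy in-memory version, not the hybrid tree: the `Node` layout, the Euclidean point distance, and the function names are assumptions. For Euclidean distance, each term D(P_i, NP(P_i, N)) is at most D(P_i, o) for any object o inside N's box, so MINDIST(M, N) never exceeds D(M, o); that lower-bound property is what makes it safe to stop once k objects have been popped.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple, Union

Point = Sequence[float]


@dataclass
class Node:
    low: List[float]   # L = (l_1, ..., l_m), one corner of the bounding box
    high: List[float]  # H = (h_1, ..., h_m), the opposite corner
    children: List[Union["Node", Point]] = field(default_factory=list)  # sub-nodes or data points


def nearest_point(p: Point, node: Node) -> List[float]:
    """NP(P_i, N): clamp each coordinate of P_i into [l_j, h_j]."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(p, node.low, node.high)]


def mindist(points: List[Point], weights: List[float], node: Node) -> float:
    """MINDIST(M, N) = sum_i w_i * D(P_i, NP(P_i, N)), with Euclidean D."""
    return sum(w * math.dist(p, nearest_point(p, node)) for p, w in zip(points, weights))


def multipoint_distance(points: List[Point], weights: List[float], o: Point) -> float:
    """D(M, o) = sum_i w_i * D(P_i, o)."""
    return sum(w * math.dist(p, o) for p, w in zip(points, weights))


def knn(root: Node, points: List[Point], weights: List[float], k: int) -> List[Tuple[Point, float]]:
    """Best-first k-NN: always expand the queue entry with the smallest (lower-bound) distance."""
    tie = itertools.count()  # tie-breaker so heapq never compares Nodes or points
    queue = [(mindist(points, weights, root), next(tie), root)]
    result: List[Tuple[Point, float]] = []
    while queue and len(result) < k:
        d, _, item = heapq.heappop(queue)
        if isinstance(item, Node):
            for child in item.children:
                cd = (mindist(points, weights, child) if isinstance(child, Node)
                      else multipoint_distance(points, weights, child))
                heapq.heappush(queue, (cd, next(tie), child))
        else:
            result.append((item, d))  # objects are popped in increasing D(M, o)
    return result


leaf_a = Node([0.0, 0.0], [5.0, 5.0], [[1.0, 1.0], [4.0, 4.0]])
leaf_b = Node([15.0, 15.0], [20.0, 20.0], [[16.0, 16.0], [19.0, 19.0]])
root = Node([0.0, 0.0], [20.0, 20.0], [leaf_a, leaf_b])
hits = knn(root, points=[[0.0, 0.0], [20.0, 20.0]], weights=[0.8, 0.2], k=2)
print([p for p, _ in hits])  # [[1.0, 1.0], [4.0, 4.0]]
```

In this example leaf_b is never expanded: its MINDIST (about 16.97) exceeds the distance of the second hit (about 9.05), so the search ends before reaching it.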

For further details, including the proof of correctness of the k-NN algorithm for multipoint queries, we refer readers to [3].

Processing of Refined Queries: A naive approach to processing a refined query is to treat it like a fresh query and execute it from scratch: in each iteration of refinement, the refined query is passed to the F-Index, which invokes the k-NN algorithm and returns the desired number of answers to the application. This approach is wasteful, since the same nodes may be accessed repeatedly by the k-NN algorithm iteration after iteration, leading to a large number of unnecessary disk accesses and hence poor performance. Our goal is to eliminate any repeated disk access during the evaluation of the refined query. To achieve this goal, the MARS query processor uses the old priority queue to construct a new priority queue based on the distance function of the refined query. Due to space limitations, we refer readers to [3] for further details of the approach.

5 Empirical Evaluation

For our experiments, we use the Corel Image Features dataset available online at http://kdd.ics.uci.edu. This collection contains features extracted from around 70,000 photo images. In our experiments, we use the color histogram and co-occurrence texture features. For further details of the experimental setting, we refer readers to [3]. The experimental results shown below are averaged over a hundred queries.

Our experiments show that retrieval performance in terms of precision and recall improves from one iteration to the next for both QPM and QEX. Figure 2 shows the precision-recall graph for QEX. Figure 3 compares the
final recall of each approach after retrieving 100 points in each iteration. Both QPM and QEX start with a single example as the query, so they produce the same retrieved set, and hence the same recall, for the initial query. Figure 3 shows that for QEX the recall increases with each iteration and significantly exceeds that of QPM.

For QEX, we studied how the number of query points added (i.e., the number of clusters produced) affects the retrieval effectiveness of the approach. Our experiments show that a small number of well-chosen representatives provides a good approximation of the entire relevant set.

Figure 4 compares the evaluation technique proposed for refined queries with the naive approach. For both QPM and QEX, the proposed technique outperforms the naive approach by several orders of magnitude. In addition to retrieval effectiveness, QEX also outperforms QPM in terms of execution cost.

Figure 2: Precision-Recall Graph for Query Expansion

Figure 3: Comparison of retrieval effectiveness between the two approaches

Figure 4: Execution cost of refined queries

6 Conclusions

In this paper, we studied a query expansion (QEX) approach to query refinement in multimedia databases. In QEX, at each iteration of feedback, multiple well-chosen relevant points are added to the query representation in each feature space. This is in contrast to the previously proposed query point movement (QPM) approach, in which the query point in each feature space is replaced by a weighted centroid of the relevant points. We developed efficient techniques to evaluate, over feature indices, the multipoint queries that arise in the QEX approach. We also developed techniques to progressively evaluate refined queries efficiently by exploiting the work done in previous iterations, for both the QEX and QPM approaches. Our experiments over a large image collection show that: (1) the query processing techniques developed significantly improve the performance of both QEX and QPM, and (2) the QEX approach outperforms the QPM approach both in retrieval effectiveness (precision and recall) and in the cost of query evaluation. The latter is somewhat counterintuitive, given that QEX results in multiple queries per feature space per iteration, whereas QPM results in only a single query per feature space per iteration. Due to space limitations, many details of the techniques developed are omitted from this paper; we refer interested readers to [3] for details.

References

[1] K. Chakrabarti and S. Mehrotra. The hybrid tree: An index structure for high dimensional feature spaces. In ICDE, 1999.
[2] Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: Querying databases through multiple examples. In VLDB, 1998.
[3] K. Porkaew, K. Chakrabarti, and S. Mehrotra. Query refinement for multimedia similarity retrieval in MARS. Technical Report TR-MARS-99-05, Univ. of California at Irvine, 1999.
[4] K. Porkaew, S. Mehrotra, and M. Ortega. Query reformulation for content based multimedia retrieval in MARS. In IEEE ICMCS, 1999.
[5] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra. Relevance feedback: A power tool for interactive content-based image retrieval. IEEE Trans. on Circuits and Systems for Video Technology, September 1998.
