web science Presentation on the topic - Query Recommendation

Date post: 31-Jan-2016

Muhammad Nuruddin, ITIS M. Sc. Student

Leibniz Universität Hannover, Winter Semester 2012/13, Student ID (Matrikelnummer): 2961230

Papers:

1. Query Suggestions in the Absence of Query Logs

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

Query Suggestion/Recommendation

Assisting users by providing a list of suggested queries has proven to be effective.

Paper 1. Query Suggestions in the Absence of Query Logs

1. Query Suggestions in the Absence of Query Logs

Background:
- Most existing query suggestion work is based on query logs.
- Log-based suggestion suits systems with a large user base, many interactions and a long usage history.
- It is not suitable for systems with a smaller user base or without a large log.
- It is also not suitable for query suggestion in newly deployed systems.
- Examples of such systems: desktop search, personal email search.

1. Query Suggestions in the Absence of Query Logs

How can queries be suggested when there are too few users and the query log is insufficient?

The authors propose a document-centric probabilistic mechanism.

Phrases present in the documents are suggested as queries.

Phrases indexed from the document corpus are suggested to complete the user's partial query.

1. Query Suggestions in the Absence of Query Logs

Steps:

1. Phrase extraction
- N-gram phrases of order 1, 2 and 3 are extracted from the document corpus.
- Example: “president of Germany”, “president of”, “of Germany”, “president”, “Germany”.

2. Query suggestion
- Following a probabilistic model.
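A minimal Python sketch of step 1, assuming a simple lowercase word tokenizer (a hypothetical choice, not necessarily the paper's): it counts every 1-, 2- and 3-gram in the corpus. Unlike the slide's example, it also keeps stopword unigrams such as “of”; a real system may filter those.

```python
import re
from collections import Counter

def extract_phrases(documents, max_order=3):
    """Count all word n-grams of order 1..max_order across the documents."""
    counts = Counter()
    for doc in documents:
        # Assumed tokenizer: lowercased runs of word characters and apostrophes.
        tokens = re.findall(r"[\w']+", doc.lower())
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts

phrases = extract_phrases(["The president of Germany met the president of France."])
print(phrases["president of"])  # 2
```

The counts can later serve as phrase frequencies when the candidate phrases are scored.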

1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (1/9): Probabilistic Model for Query Suggestion

Suppose a user has typed an incomplete query $Q$. The query can be decomposed as follows:

$Q = Q_c + Q_t$

where $Q_c$ denotes the completed portion of the query and $Q_t$ denotes the last word of $Q$, which the user is still typing. Example: for “Einsteins Rel…..”, $Q_c$ = Einsteins and $Q_t$ = Rel.
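Splitting the partial query into the completed part Qc and the last word Qt (the notation used on the following slides) can be sketched in Python as below. The function name is hypothetical, and treating a trailing space as "the last word is finished, so Qt is empty" is an assumption the slides do not cover. The trailing dots on the slide only indicate unfinished typing, so they are not part of the input.

```python
def split_partial_query(query):
    """Split a partial query Q into (Qc, Qt).

    Qc is the completed portion; Qt is the last word, which is still being typed.
    Assumption: if Q ends with whitespace, the last word is finished and Qt is empty.
    """
    words = query.split()
    if not words or query[-1].isspace():
        return " ".join(words), ""
    return " ".join(words[:-1]), words[-1]

print(split_partial_query("Einsteins Rel"))  # ('Einsteins', 'Rel')
```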

1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (2/9): Probabilistic Model for Query Suggestion

$p_i$ = phrase $i$ (an n-gram) from the document corpus, obtained in step 1 (phrase extraction from documents); $P$ is the set of all such phrases.

Using Bayes' theorem, we calculate $P(p_i \mid Q)$, the probability (suitability) of $p_i$ as a suggested completion of the query $Q$. We can then recommend the $m$ phrases of $P$ with the highest values of $P(p_i \mid Q)$.

1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (3/9): Probabilistic Model for Query Suggestion

They derived the following probability equation:

$P(p_i \mid Q) \propto P(Q_c \mid p_i) \cdot P(p_i \mid Q_t)$

P(pi|Qt) = probability that the user intends phrase pi, given that they have already typed Qt (phrase selection probability)

P(Qc|pi) = correlation between phrase pi and the already typed complete part (Qc) of the query

Example: Albert Einsteins Rel….. = Qc + Qt, with Qc = Albert Einsteins and Qt = Rel
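A sketch of ranking candidate phrases by the product of the two factors defined above, P(Qc|pi) · P(pi|Qt), which is proportional to P(pi|Q). The two estimators are passed in as functions; the toy estimators in the usage example (a prefix match for P(pi|Qt), a fixed 0.9/0.1 word-overlap score for P(Qc|pi)) are hypothetical placeholders, not the paper's estimates.

```python
import heapq

def rank_phrases(phrases, qc, qt, p_qc_given_phrase, p_phrase_given_qt, m=5):
    """Return up to m phrases with the highest P(Qc|pi) * P(pi|Qt),
    i.e. the highest P(pi|Q) up to a constant factor."""
    scored = []
    for phrase in phrases:
        score = p_qc_given_phrase(qc, phrase) * p_phrase_given_qt(phrase, qt)
        if score > 0:
            scored.append((score, phrase))
    return [phrase for _, phrase in heapq.nlargest(m, scored)]

# Hypothetical toy estimators, for illustration only.
def toy_p_phrase_given_qt(phrase, qt):
    # 1 if some word of the phrase can complete the prefix Qt, else 0.
    return 1.0 if any(w.startswith(qt.lower()) for w in phrase.lower().split()) else 0.0

def toy_p_qc_given_phrase(qc, phrase):
    # Higher if all completed words of Qc appear in the phrase; 0.1 acts as smoothing.
    words = phrase.lower().split()
    return 0.9 if all(w in words for w in qc.lower().split()) else 0.1

P = ["Bill Gates", "Indian Gate", "Gateway", "Bill Gates life",
     "Bill Gates Foundation", "India Gate Rice"]
top = rank_phrases(P, "Bill", "Gate", toy_p_qc_given_phrase, toy_p_phrase_given_qt, m=3)
```

With these toy estimators, every phrase in P completes Qt = Gate, but only the three “Bill Gates …” phrases also match Qc = Bill, so they form the top 3.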

1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (4/9): Probabilistic Model for Query Suggestion

Example:

P(pi|Qt) = probability that the user intends phrase pi, given that they have already typed Qt (phrase selection probability)

Bill Gate….. = Qc + Qt

Qt = Gate

P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice” …}

1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (5/9): Probabilistic Model for Query Suggestion

Example:

P(Qc|pi) = correlation between phrase pi and the already typed complete part (Qc) of the query

Bill Gate….. = Qc + Qt

Qc = Bill

P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice” …}

1. Query Suggestions in the Absence of Query Logs

P(pi|Qt) = probability that the user intends phrase pi, given that they have already typed Qt (phrase selection probability)

C = {c1, c2, …, cm}: the set of m possible words that Qt can be completed to

Bill Gate….. = Qc + Qt

Qt = Gate

C = {“Gates”, “Gate”, “Gateway”, … cm}

P = {“Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … pn}

P(ci|Qt) ∝ freq(ci): words used more often in the corpus have a higher probability of being useful for query recommendation.

Without IDF, some rare but relevant words would be suppressed.
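A sketch of P(ci|Qt) ∝ freq(ci), assuming the candidate set C is every corpus word that starts with the typed prefix Qt. The word counts are made up for illustration, the IDF correction mentioned above is not applied, and how these word probabilities feed into P(pi|Qt) is not shown in this transcript.

```python
def completion_probabilities(qt, word_freq):
    """Estimate P(c | Qt) for each word c that completes the prefix Qt,
    proportional to the word's corpus frequency freq(c)."""
    prefix = qt.lower()
    candidates = {w: f for w, f in word_freq.items() if w.startswith(prefix)}
    total = sum(candidates.values())
    if total == 0:
        return {}
    return {w: f / total for w, f in candidates.items()}

# Hypothetical corpus word frequencies (lowercased).
freq = {"gates": 6, "gate": 3, "gateway": 1, "bill": 7}
probs = completion_probabilities("Gate", freq)
# "bill" does not complete "gate", so only the three gate* words receive probability mass.
```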

1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (9/9): Probabilistic Model for Query Suggestion

Example:

Bill Gate….. = Qc + Qt

Qc = Bill

P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice” …}

1. Query Suggestions in the Absence of Query Logs

[Figure: the partial query “Bill Gate…..” (Qc = Bill, Qt = Gate), the candidate phrases pi from P = {“Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … pn}, and the document corpus they are extracted from]

End of Discussion on

1. Query Suggestions in the Absence of Query Logs

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

• In this paper, the authors propose a query recommendation method for log-based systems.

• The proposed system has two components:
1. Query concept building (concept mining)
- Clustering the search logs.
2. Recommending queries from the concepts
- A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

A good query recommender system should have five properties:
1. Relevancy: Recommended queries should be semantically relevant to the user's search query.
2. Redundancy-free: The recommendation should not contain redundant queries that repeat similar search intents.
3. Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.
4. Ranking: Highly relevant queries should be ranked ahead of less relevant ones in the recommendation list.
5. Efficiency: Query recommendation provides online help, so recommendation algorithms should achieve fast response times.

The authors claim that DQR is the first system to address all five requirements.

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

[Figure: A click-through bipartite graph linking queries to clicked URLs]
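One way to hold a click-through bipartite graph in memory, sketched in Python under the assumption that the log is a list of (user, query, clicked URL) records (a hypothetical format): queries and URLs are the two node sets, and each edge keeps the set of distinct users behind it, which is what the user-frequency count Nu(q, d) used in concept mining needs.

```python
from collections import defaultdict

def build_click_graph(log):
    """Build a query-URL click-through bipartite graph.

    log: iterable of (user, query, clicked_url) records (hypothetical format).
    Returns graph[query][url] = set of distinct users who issued the query
    and clicked the URL, so len(graph[q][d]) is Nu(q, d).
    """
    graph = defaultdict(lambda: defaultdict(set))
    for user, query, url in log:
        graph[query][url].add(user)
    return graph

log = [
    ("u1", "yahoo", "www.yahoo.com"),
    ("u2", "yahoo", "www.yahoo.com"),
    ("u1", "yahoo", "www.yahoo.com"),   # repeated click by the same user
    ("u3", "yahoo", "mail.yahoo.com"),
    ("u3", "yahoo mail", "mail.yahoo.com"),
]
graph = build_click_graph(log)
```

Storing user sets rather than raw click counts means a user who clicks the same result repeatedly is counted once.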

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

• The number of queries in Q is huge.

• There are 10 million queries in the AOL dataset.

• Even picking, say, m = 10 recommended queries from Q involves a huge search space.
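A quick back-of-the-envelope check of that claim, assuming |Q| = 10^7 as stated on the slide and counting unordered sets of m = 10 distinct queries:

```python
import math

num_queries = 10_000_000   # |Q|, the slide's figure for the AOL dataset
m = 10                     # number of recommended queries

# Number of distinct unordered recommendation sets of size m drawn from Q.
num_sets = math.comb(num_queries, m)
print(f"about 10^{math.log10(num_sets):.1f} candidate sets")  # prints: about 10^63.4 candidate sets
```

Exhaustively scoring roughly 10^63 candidate sets is infeasible, which is why a heuristic selection is needed.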

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

1. Query concept building (Concept Mining) 1/3

1.1) Concept mining:
- Similar queries are grouped to form a query concept.
- For this grouping, each query qi is represented by a |D|-dimensional vector, with one dimension dj per clicked URL.
- The value of qi on dimension dj is its user-frequency-inverse-query-frequency (UF-IQF) score; these scores are normalized, and the similarity of queries qi and qj is computed from the normalized vectors.

Nu(qi, dj) = number of unique users who issued qi and clicked URL dj

Nq(dj) = number of queries that led to a click on URL dj
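A sketch of these query vectors in Python. The exact formulas are not preserved in this transcript, so the following are assumptions consistent with the names: UF-IQF(qi, dj) = Nu(qi, dj) · log(|Q| / Nq(dj)) with |Q| the number of distinct queries, L2 normalization for the normalized weight, and cosine similarity (the dot product of normalized vectors) for the similarity of qi and qj. The Nu counts are invented.

```python
import math
from collections import defaultdict

def uf_iqf_vectors(nu):
    """nu maps (query, url) -> Nu(q, d). Returns L2-normalized sparse vectors
    {query: {url: weight}} with the assumed weight Nu(q, d) * log(|Q| / Nq(d))."""
    num_queries = len({q for q, _ in nu})   # |Q|
    nq = defaultdict(int)                   # Nq(d): distinct queries that led to a click on d
    for _, url in nu:
        nq[url] += 1
    vectors = defaultdict(dict)
    for (query, url), users in nu.items():
        vectors[query][url] = users * math.log(num_queries / nq[url])
    for vec in vectors.values():
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            for url in vec:
                vec[url] /= norm
    return dict(vectors)

def similarity(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(w * v.get(url, 0.0) for url, w in u.items())

# Hypothetical Nu counts.
nu = {("yahoo", "www.yahoo.com"): 5, ("yahoo", "mail.yahoo.com"): 2,
      ("yahoo mail", "mail.yahoo.com"): 4, ("weather", "weather.com"): 3}
vecs = uf_iqf_vectors(nu)
```

The IQF factor shrinks the weight of URLs clicked for many different queries, so a shared but generic URL contributes less to similarity than a specific one.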

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

1. Query concept building (Concept Mining) 2/3

1.1) Concept mining:
- k-means clustering is not suitable: the algorithm did not terminate within two days.
- Instead, a one-pass algorithm is proposed. It is very efficient but highly sensitive to the order in which queries are processed.

Example (compactness: average pairwise distance in a cluster < 0.5):
- Order q1, q2, q3 gives the clustering {{q1, q2}, {q3}}
- Order q2, q3, q1 gives the clustering {{q1}, {q2, q3}}
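A sketch of order-sensitive one-pass clustering in Python that reproduces the slide's example. Assumptions: compactness is the average pairwise distance within a cluster (threshold 0.5, as on the slide), each query joins the existing cluster whose resulting average distance is smallest among those that stay below the threshold, and the pairwise distances (0.4, 0.4, 0.9) are invented for illustration. The paper's actual diameter measure and cluster-selection rule may differ.

```python
from itertools import combinations

def avg_pairwise_distance(cluster, dist):
    """Average distance over all pairs in a cluster (0.0 for a singleton)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def one_pass_cluster(queries, dist, threshold=0.5):
    """Scan queries once; add each to the existing cluster that stays most compact
    (average pairwise distance < threshold), otherwise start a new cluster."""
    clusters = []
    for q in queries:
        best, best_d = None, None
        for cluster in clusters:
            d = avg_pairwise_distance(cluster + [q], dist)
            if d < threshold and (best_d is None or d < best_d):
                best, best_d = cluster, d
        if best is None:
            clusters.append([q])
        else:
            best.append(q)
    return clusters

# Hypothetical pairwise distances: q1-q2 and q2-q3 are close, q1-q3 is far.
D = {frozenset({"q1", "q2"}): 0.4, frozenset({"q2", "q3"}): 0.4, frozenset({"q1", "q3"}): 0.9}

def dist(a, b):
    return D[frozenset({a, b})]

print(one_pass_cluster(["q1", "q2", "q3"], dist))  # [['q1', 'q2'], ['q3']]
print(one_pass_cluster(["q2", "q3", "q1"], dist))  # [['q2', 'q3'], ['q1']]
```

In both orders, a three-query cluster would have average distance (0.4 + 0.9 + 0.4) / 3 ≈ 0.57, which is not below 0.5, so the last query starts its own cluster: the outcome depends only on which pair arrives first.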

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

1. Query concept building (Concept Mining) 3/3

1.1) Concept mining:

Diameter measure L(C) of cluster C

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

2. Recommending queries from the concepts
- A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.
- A heuristic algorithm is applied to find a set Yc of m query concepts that maximizes the objective probability. To construct Yc incrementally, they apply a greedy strategy.
- In the greedy approach, one concept is added at a time until Yc contains m concepts. At each step, the algorithm picks the concept that maximizes the probability increment.
- The quantities involved are the input query, the query concept it belongs to, and the set Yc of m query concepts.
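A sketch of the greedy strategy in Python. The DQR objective itself is not preserved in this transcript, so the code uses a stand-in: a hypothetical intent-coverage probability, sum over intents t of P(t|q) · (1 - product over c in Yc of (1 - P(t|c))), i.e. the chance that at least one selected concept satisfies the user's intent. The intents, concepts and probabilities below are invented for the input query “yahoo”.

```python
def coverage(selected, intent_weights, concept_intents):
    """Stand-in objective: probability that at least one selected concept
    satisfies the user's intent, summed over intents weighted by P(t | q)."""
    total = 0.0
    for intent, weight in intent_weights.items():
        miss = 1.0
        for c in selected:
            miss *= 1.0 - concept_intents[c].get(intent, 0.0)
        total += weight * (1.0 - miss)
    return total

def greedy_select(concepts, m, intent_weights, concept_intents):
    """Add one concept at a time, always the one with the largest objective increment."""
    selected = []
    for _ in range(min(m, len(concepts))):
        base = coverage(selected, intent_weights, concept_intents)
        remaining = [c for c in concepts if c not in selected]
        best = max(remaining,
                   key=lambda c: coverage(selected + [c], intent_weights, concept_intents) - base)
        selected.append(best)
    return selected

intent_weights = {"search engine": 0.6, "mail": 0.3, "finance": 0.1}   # hypothetical P(t | "yahoo")
concept_intents = {                                                  # hypothetical P(t | c)
    "yahoo search": {"search engine": 0.9},
    "yahoo search engine": {"search engine": 0.85},
    "yahoo mail": {"mail": 0.9},
    "yahoo finance": {"finance": 0.9},
}
chosen = greedy_select(list(concept_intents), 3, intent_weights, concept_intents)
print(chosen)  # ['yahoo search', 'yahoo mail', 'yahoo finance']
```

The near-duplicate “yahoo search engine” is skipped: once “yahoo search” is chosen, its increment drops to about 0.05, below that of “yahoo mail” (0.27) and “yahoo finance” (0.09).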

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

2. Recommending queries from the concepts
- A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

Selecting a representative query for each concept
- By popularity vote from the log.
- For a concept C, the representative query is the query in C issued by the largest number of distinct users.
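The popularity vote can be sketched in Python as follows, assuming the log is available as (user, query) records (a hypothetical format). Tie-breaking is not specified on the slide; here ties go to the query listed first in the concept.

```python
from collections import defaultdict

def representative_query(concept, log):
    """Return the query in `concept` issued by the most distinct users.

    concept: list of queries in one query concept.
    log: iterable of (user, query) records (hypothetical format).
    """
    members = set(concept)
    users = defaultdict(set)
    for user, query in log:
        if query in members:
            users[query].add(user)
    # max() keeps the first of several equal maxima, so ties go to the earlier query.
    return max(concept, key=lambda q: len(users[q]))

log = [("u1", "yahoo mail"), ("u2", "yahoo mail"),
       ("u3", "ymail"), ("u3", "ymail"), ("u3", "ymail")]
rep = representative_query(["ymail", "yahoo mail", "yahoo email"], log)
```

Here “ymail” appears most often in the log, but only one user issued it, so “yahoo mail” (two distinct users) is chosen.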

2. DQR: A Probabilistic Approach to Diversified Query Recommendation

Comparison of results from different approaches:

- SR = Similarity-based Ranking. Finds similar past queries in the log; ignores redundancy.
- MMR = Maximal Marginal Relevance. Considers relevancy and diversity*; ignores redundancy.
- CACB = Context-Aware Concept-Based method. Builds query concepts based on search sessions; ignores diversity*.
- DQR-ND = DQR with No Diversity. Same as DQR, but ignores diversity.
- DQR-OPC = DQR with One-Pass Clustering. Same as DQR, but uses only one pass for clustering.
- DQR = Diversified Query Recommendation.

*Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.

Top 10 queries recommended by the 6 methods for the input query “yahoo”
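For reference, the MMR baseline (Carbonell and Goldstein) is commonly formulated as greedy re-ranking that picks, at each step, the candidate maximizing λ · relevance - (1 - λ) · (maximum similarity to the items already chosen). A minimal Python sketch; λ = 0.5 and the relevance and similarity values are hypothetical, not the settings used in the paper's comparison.

```python
def mmr(candidates, relevance, similarity, k, lam=0.5):
    """Maximal Marginal Relevance: greedily pick k items, trading off relevance
    against the maximum similarity to items already selected."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance = {"yahoo search": 0.9, "yahoo search engine": 0.85, "yahoo mail": 0.6}

def sim(a, b):
    # Hypothetical: the two "search" queries are near-duplicates, everything else is dissimilar.
    return 0.95 if {a, b} == {"yahoo search", "yahoo search engine"} else 0.1

print(mmr(list(relevance), relevance, sim, k=2))  # ['yahoo search', 'yahoo mail']
```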

End of the Presentation

Thank you very much for your attention!

