Ranking objects based on relationships

Computing Top-K over Aggregation

SIGMOD 2006

Kaushik Chakrabarti et al.

Outline

• Motivation

• The framework and problem definition

• Proposed solution

• Discussions

• Experiments

Warm-up discussion

• We basically know how web search engines work.
– Web crawlers collect web-page information, which is then indexed and ranked.

• How do we define search in a relational database?
– Free-style search vs. SQL + predicates?
– What is the expected outcome?
– How do we rank results?

Motivation

• Searching over a relational database
– information is scattered across different relations

Motivation

• Full-text search and aggregation are already supported by RDBMSs
– What else do we need in order to search well?

Related work

• Information retrieval (full-text search)
• Research on text databases
• Exploring databases via foreign key-primary key relationships
– DBXplorer (ICDE 2002)
– BANKS (ICDE 2002)
– DISCOVER (VLDB 2002)

• What is related work missing?
– Target objects do not contain the keywords
– No scoring function for query results
– Aggregates are not used to put together search results for multiple keywords

Contributions

• Introduce an interesting problem domain

• Define “Object Finder” (OF) queries

• Propose scoring functions

• Propose a solution to process OF queries
– Returns the top-K ranked results
– Efficient early-termination property

System Overview

Scoring functions

• Scoring matrices and row/column marginals

Scoring semantics

• All query keywords present in each document
– can be too restrictive

• All query keywords present in the set of related documents
– cannot use MIN as the row-marginal scoring function

• Pseudo-document approach
– enlarged search space
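
The difference between the first two semantics can be illustrated with a small sketch. It assumes (the slides do not spell this out) that a scoring matrix for one target object has one row per related document and one column per query keyword, with each cell holding that document's score for that keyword. The names `row_marginal`, `column_marginal`, `combine`, and `aggregate` are hypothetical, chosen only for this illustration.

```python
def row_marginal(matrix, combine, aggregate):
    """Combine each row (one document) across keywords, then aggregate over documents."""
    return aggregate([combine(row) for row in matrix])


def column_marginal(matrix, combine, aggregate):
    """Aggregate each column (one keyword) over documents, then combine across keywords."""
    columns = list(zip(*matrix))
    return combine([aggregate(col) for col in columns])


# Assumed toy matrix for a single target object:
# rows = related documents, columns = query keywords.
matrix = [
    [0.8, 0.0],  # document 1 mentions only keyword 1
    [0.0, 0.6],  # document 2 mentions only keyword 2
]

# MIN inside each row demands every keyword in every single document,
# so this object scores zero even though its documents jointly cover both keywords.
row_score = row_marginal(matrix, combine=min, aggregate=sum)

# Aggregating per keyword first lets the set of documents cover the query together.
col_score = column_marginal(matrix, combine=min, aggregate=sum)

print(row_score, col_score)
```

Under these assumptions, MIN as a row-marginal combiner zeroes out an object whose keywords are spread over several documents, which is why the "set of related documents" semantics needs a different scoring class.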

Problem definition

• Object finder problem:

Process OF query as Top-K query

• A top-K query incorporates ranking. Results are totally ordered if we process a strong top-K query

• A good algorithm can use early termination to avoid processing results that are not in the top-K

Top-K query processing

• General framework: Supporting Ad-hoc Ranking Aggregates, SIGMOD 2006 (presented in May)

SELECT ga_1, ..., ga_n, F       ---- groups
FROM R1, ..., Rh                ---- source relations
WHERE c1 AND ... AND cl         ---- join conditions
GROUP BY ga_1, ..., ga_n        ---- group definition
ORDER BY F                      ---- ordering function, an aggregate
LIMIT k                         ---- top-k setting

Top-K query processing

• For an OF query, it is:

select TOId, TOValue, score(TOId)
from TargetTable T, R, L1, ..., LN
where R.TOId = T.TOId
and R.DocId = Li.DocId (i = 1..N)
group by TOId, TOValue
order by score(TOId)
limit k
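
As a concrete instance of the query template above, here is a minimal runnable sketch using Python's built-in `sqlite3` module. The toy rows, the `DocScore` column, the choice of N = 2 keyword lists, and the use of `SUM` as the score aggregate are all assumptions made for illustration; the slides leave `score(TOId)` abstract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TargetTable (TOId INTEGER PRIMARY KEY, TOValue TEXT);
CREATE TABLE R  (TOId INTEGER, DocId INTEGER);   -- target object <-> document
CREATE TABLE L1 (DocId INTEGER, DocScore REAL);  -- documents matching keyword 1
CREATE TABLE L2 (DocId INTEGER, DocScore REAL);  -- documents matching keyword 2

INSERT INTO TargetTable VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO R  VALUES (1, 10), (1, 11), (2, 11), (2, 12), (3, 13);
INSERT INTO L1 VALUES (10, 4), (11, 2), (12, 1);
INSERT INTO L2 VALUES (10, 3), (11, 5), (12, 1);
""")

# Assumed scoring: sum, over an object's related documents, of the
# per-keyword document scores. The joins follow the slide's conditions.
query = """
SELECT T.TOId, T.TOValue, SUM(L1.DocScore + L2.DocScore) AS score
FROM TargetTable T
JOIN R  ON R.TOId  = T.TOId
JOIN L1 ON R.DocId = L1.DocId
JOIN L2 ON R.DocId = L2.DocId
GROUP BY T.TOId, T.TOValue
ORDER BY score DESC
LIMIT ?
"""

top_k = conn.execute(query, (2,)).fetchall()
for row in top_k:
    print(row)
```

Note that a plain SQL plan like this has to join and aggregate every group before it can apply the limit, which is the cost that early termination tries to avoid.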

My work is done (please try to recall my last talk)

Algorithm: Generate-Prune

Phase I: Compute top-K candidates

Algorithm Overview

Algorithm

• Phase II: Compute exact top-K
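
The two-phase idea can be sketched as follows. This is not the paper's algorithm, whose details are not given in these slides; it is a generic bound-based illustration that assumes each keyword contributes one list of (object, score) pairs sorted by descending non-negative score, and that an object's total score is the sum of its per-list scores. The function names are hypothetical.

```python
import heapq
from collections import defaultdict


def generate_candidates(lists, k):
    """Phase I: scan the lists in parallel (sorted access only) and stop early.

    Returns (candidates, depth). The candidate set is guaranteed to contain
    a valid top-k under the assumptions stated above.
    """
    n = len(lists)
    seen = defaultdict(dict)  # object -> {list index: score seen so far}
    max_depth = max((len(lst) for lst in lists), default=0)
    depth = 0
    while depth < max_depth:
        last = []  # score at the current scan position of each list
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen[obj][i] = score
                last.append(score)
            else:
                last.append(0)  # exhausted list: nothing more to contribute
        depth += 1

        lower = {o: sum(parts.values()) for o, parts in seen.items()}
        if len(lower) < k:
            continue
        kth_lower = heapq.nlargest(k, lower.values())[-1]

        # No unseen object can score more than sum(last); once that bound
        # drops to the k-th best lower bound, scanning can stop.
        if sum(last) <= kth_lower:
            upper = {
                o: lower[o] + sum(last[i] for i in range(n) if i not in seen[o])
                for o in seen
            }
            return {o for o in seen if upper[o] >= kth_lower}, depth
    return set(seen), depth


def exact_top_k(lists, candidates, k):
    """Phase II: compute exact scores for the candidates only, then rank them."""
    index = [dict(lst) for lst in lists]  # random access by object
    exact = {o: sum(ix.get(o, 0) for ix in index) for o in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


keyword_lists = [
    [("a", 9), ("b", 8), ("c", 2), ("d", 1)],
    [("b", 9), ("a", 7), ("d", 3), ("c", 1)],
]
candidates, depth = generate_candidates(keyword_lists, k=1)
result = exact_top_k(keyword_lists, candidates, k=1)
print(candidates, depth, result)
```

In this toy input the scan stops after two of the four positions, so objects `c` and `d` are never scored exactly.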

Discussions

• In this work
– Choice of aggregation function
– Ranking functions in general
– What do you think of this work?

• Not limited to this work
– Impact of a more complicated schema
– Impact of query selectivity

Experiment Results

• Faster than SQL

• Faster than Generate-Only

• Robust to # of keywords and selections

• Intuitive Results

Experiments

• Faster than SQL

Experiments

• Faster than Generate-Only

Experiments

• Robust to # of keywords and selections

Thank you

Questions to discuss?

