
Ranking objects based on relationships

Computing Top-K over Aggregation

SIGMOD 2006

Kaushik Chakrabarti et al.

Outline

• Motivation

• The framework and problem definition

• Proposed solution

• Discussions

• Experiments

Heating up discussion

• We basically know how web search engines work.

– Web crawlers collect web-page information; the engine then indexes and ranks the pages.

• How do we define searching in a relational database?

– Free-style search vs. SQL + predicates?

– What’s the expected outcome?

– How do we rank results?

Motivation

• Searching over a relational database

– information scattered in different relations

Motivation

• Full-text search and aggregation are already supported by the RDBMS

– What else do we need in order to perform good searching?

Related work

• Information retrieval (full-text search)

• Research in text databases

• Exploring the database via foreign key-primary key links

– DBXplorer (ICDE 2002)

– BANKS (ICDE 2002)

– DISCOVER (VLDB 2002)

• What related work is missing

– Target objects don’t contain keywords

– Lack of a scoring function for query results

– No use of aggregates to put together search results for multiple keywords

Contributions

• Introduce an interesting problem domain

• Define “Object Finder” (OF) queries

• Propose scoring functions

• Propose a solution to process OF queries

– Return top-K ranked results

– Efficient early-termination property

System Overview

Scoring functions

• Scoring matrices and their row and column marginals

Scoring semantics

• All query keywords present in each document

– Can be too restrictive

• All query keywords present in the set of related documents

– Cannot use MIN as row-marginal scoring

• Pseudo-document approach

– Enlarged search space
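
To make the difference between the two marginals concrete, here is a minimal Python sketch. It assumes a layout the slide does not spell out: one scoring matrix per target object, with one row per related document and one column per query keyword. The function names, the example scores, and the choice of MIN for combining keywords and SUM for aggregating documents are illustrative assumptions, not the paper's definitions.

```python
from typing import Callable, List, Sequence

# Assumed layout: rows are related documents, columns are query keywords.
Matrix = List[List[float]]
Reducer = Callable[[Sequence[float]], float]


def row_marginal(matrix: Matrix, combine: Reducer, aggregate: Reducer) -> float:
    """Combine keyword scores inside each document, then aggregate over documents."""
    return aggregate([combine(row) for row in matrix])


def column_marginal(matrix: Matrix, combine: Reducer, aggregate: Reducer) -> float:
    """Aggregate each keyword over all documents, then combine over keywords."""
    columns = list(zip(*matrix))
    return combine([aggregate(col) for col in columns])


# Hypothetical target object: document 0 matches only keyword 0,
# document 1 matches only keyword 1.
scores: Matrix = [
    [0.5, 0.0],
    [0.0, 0.25],
]

per_document = row_marginal(scores, combine=min, aggregate=sum)
across_documents = column_marginal(scores, combine=min, aggregate=sum)
```

With MIN applied per row, the target scores zero because no single document contains both keywords, which is the "too restrictive" case. Applying the aggregate per column first lets the two documents jointly cover the query.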

Problem definition

• Object finder problem:

Process OF query as Top-K query

• A top-K query incorporates ranking. Results are totally ordered if we process a strong top-K query

• A good algorithm can utilize early termination to avoid processing of results that are not in top-K

Top-K query processing

• General framework: Supporting Ad-hoc Ranking Aggregates, SIGMOD 2006 (presented in May)

SELECT ga_1, ..., ga_n, F ---- groups

FROM R1, ..., Rh ---- source relations

WHERE c1 AND ... AND cl ---- join conditions

GROUP BY ga_1, ..., ga_n ---- group definition

ORDER BY F ---- ordering function, an aggregate

LIMIT k ---- top-k setting

Top-K query processing

• For an OF query, it is:

select TOId, TOValue, score(TOId)

from TargetTable T, R, L1,...,LN

where R.TOId = T.TOId

and R.DocId = Li.DocId (i=1..N)

group by TOId, TOValue

order by score(TOId)

limit k
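
The query above can be tried end to end with a small in-memory database. The sketch below uses Python's built-in sqlite3 module with N = 2 keyword lists. The column name DocScore, the sample rows, and the concrete scoring expression SUM(L1.DocScore + L2.DocScore) are assumptions made for illustration; the slide leaves score(TOId) abstract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TargetTable (TOId INTEGER PRIMARY KEY, TOValue TEXT);
    CREATE TABLE R  (TOId INTEGER, DocId INTEGER);   -- target/document relationship
    CREATE TABLE L1 (DocId INTEGER, DocScore REAL);  -- documents matching keyword 1
    CREATE TABLE L2 (DocId INTEGER, DocScore REAL);  -- documents matching keyword 2

    INSERT INTO TargetTable VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO R  VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO L1 VALUES (10, 0.5), (11, 0.25), (12, 1.0);
    INSERT INTO L2 VALUES (10, 0.5), (11, 0.25), (12, 1.0);
    """
)

# Assumed scoring: add the keyword scores of each document, then sum per target.
of_query = """
    SELECT T.TOId, T.TOValue, SUM(L1.DocScore + L2.DocScore) AS total_score
    FROM TargetTable T, R, L1, L2
    WHERE R.TOId = T.TOId
      AND R.DocId = L1.DocId
      AND R.DocId = L2.DocId
    GROUP BY T.TOId, T.TOValue
    ORDER BY total_score DESC
    LIMIT ?
"""

top_k = conn.execute(of_query, (2,)).fetchall()
```

Note that the engine has to join and group every matching row before it can apply the limit, which is the behaviour an early-terminating algorithm tries to avoid.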

My work is done (please try to recall my last talk)

Algorithm : Generate-Prune

Phase I: Compute top-K candidates

Algorithm Overview

Algorithm

• Phase II: Compute exact top-K
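
The slides do not reproduce the bounds used by the algorithm, so the following is only a generic two-phase sketch in the same spirit, not the paper's method. It assumes a single ranked list of non-negative document scores, a SUM aggregate, and that the number of documents related to each target is known in advance. All names (generate_prune, doc_scores, related) are hypothetical.

```python
from typing import Dict, List, Tuple


def generate_prune(
    doc_scores: Dict[int, float],   # DocId -> non-negative match score
    related: Dict[int, List[str]],  # DocId -> related target objects
    k: int,
) -> Tuple[List[Tuple[str, float]], int]:
    # Documents per target; assumed to be available from the relationship table.
    docs_of: Dict[str, List[int]] = {}
    for doc_id, targets in related.items():
        for t in targets:
            docs_of.setdefault(t, []).append(doc_id)
    max_degree = max(len(docs) for docs in docs_of.values())

    # Phase I: scan documents in descending score order and keep partial sums.
    ranked = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
    seen_sum: Dict[str, float] = {}
    seen_cnt: Dict[str, int] = {}
    docs_read = 0
    frontier = 0.0                   # score of the last document read
    kth_lower = float("-inf")        # k-th largest partial sum so far
    for doc_id, score in ranked:
        docs_read += 1
        frontier = score
        for t in related.get(doc_id, []):
            seen_sum[t] = seen_sum.get(t, 0.0) + score
            seen_cnt[t] = seen_cnt.get(t, 0) + 1
        if len(seen_sum) >= k:
            kth_lower = sorted(seen_sum.values(), reverse=True)[k - 1]
            # An unseen target has at most max_degree documents, each <= frontier.
            if max_degree * frontier <= kth_lower:
                break                # early termination

    # Prune seen targets whose upper bound cannot reach the k-th lower bound.
    candidates = [
        t for t in seen_sum
        if seen_sum[t] + (len(docs_of[t]) - seen_cnt[t]) * frontier >= kth_lower
    ]

    # Phase II: compute exact scores for the surviving candidates only.
    exact = {t: sum(doc_scores.get(d, 0.0) for d in docs_of[t]) for t in candidates}
    top = sorted(exact.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, docs_read


doc_scores = {1: 8.0, 2: 4.0, 3: 1.0, 4: 0.5, 5: 0.5}
related = {1: ["a"], 2: ["a", "b"], 3: ["c"], 4: ["c"], 5: ["b"]}
result = generate_prune(doc_scores, related, k=1)
```

In this toy run the scan stops after two of the five documents, because no other target can still overtake "a". Exact scores are then computed only for the candidates that survive pruning.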

Discussions

• In this work

– Choice of aggregation function

– Ranking function in general

– What do you think of this work?

• Not limited to this work

– Impact of a more complicated schema

– Impact of query selectivity

Experiment Results

• Faster than SQL

• Faster than Generate-Only

• Robust to # of keywords and selections

• Intuitive Results

Experiments

• Faster than SQL

Experiments

• Faster than Generate-Only

Experiments

• Robust to # of keywords and selections

Thank you

Questions to discuss?
