Ranking objects based on relationships
Computing Top-K over Aggregation
SIGMOD 2006
Kaushik Chakrabarti et al.
Outline
• Motivation
• The framework and problem definition
• Proposed solution
• Discussions
• Experiments
Warm-up discussion
• We basically know how web search engines work:
– Web crawlers collect web-page information, which is then indexed and ranked.
• How do we define searching in a relational database?
– Free-style search vs. SQL + predicates?
– What is the expected outcome?
– How do we rank results?
Motivation
• Searching over a relational database
– Information is scattered across different relations
Motivation
• Full-text search and aggregation are already supported by RDBMSs
– What else do we need in order to search well?
Related work
• Information retrieval (full-text search)
• Research on text databases
• Exploring the database via foreign-key/primary-key links
– DBXplorer (ICDE 2002)
– BANKS (ICDE 2002)
– DISCOVER (VLDB 2002)
• What is related work missing?
– Target objects don't contain the keywords
– No scoring function for query results
– Aggregates are not used to combine search results for multiple keywords
Contributions
• Introduce an interesting problem domain
• Define “Object Finder” (OF) queries
• Propose scoring functions
• Propose a solution to process OF queries
– Return the top K ranked results
– Efficient early-termination property
System Overview
Scoring functions
• Score matrices and row-marginal / column-marginal scoring
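Here is a minimal Python sketch of the two marginal orders. It assumes (this is our reading, not a definition given on the slides) that each target object has a score matrix. The rows are its related documents, the columns are the query keywords, and each cell holds that document's relevance score for that keyword. `row_marginal` and `column_marginal` are hypothetical helper names. The combining functions are passed in, so MIN, MAX or SUM can be tried.

```python
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]  # rows = related documents, columns = query keywords
Agg = Callable[[Sequence[float]], float]


def row_marginal(m: Matrix, combine_keywords: Agg, combine_docs: Agg) -> float:
    """Combine each row (document) across keywords, then combine the results across documents."""
    return combine_docs([combine_keywords(row) for row in m])


def column_marginal(m: Matrix, combine_docs: Agg, combine_keywords: Agg) -> float:
    """Combine each column (keyword) across documents, then combine the results across keywords."""
    columns = list(zip(*m))
    return combine_keywords([combine_docs(col) for col in columns])


# Hypothetical target object with three related documents and two keywords.
scores = [
    [0.75, 0.0],   # document 1 matches only keyword 1
    [0.0, 0.5],    # document 2 matches only keyword 2
    [0.25, 0.25],  # document 3 matches both
]

print(row_marginal(scores, min, sum))     # 0.25  (MIN per document, SUM over documents)
print(column_marginal(scores, max, sum))  # 1.25  (MAX per keyword, SUM over keywords)
```

The same matrix scores 0.25 under the row-marginal order but 1.25 under the column-marginal order, because only document 3 matches both keywords on its own.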
Scoring semantics
• All query keywords present in each document
– Can be too restrictive
• All query keywords present in the set of related documents
– Cannot use MIN as the row-marginal scoring function
• Pseudo-document approach
– Enlarged search space
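The first two semantics can be contrasted using boolean (0/1) matches. In this sketch the object names, term sets and helper functions are hypothetical, and set membership stands in for real full-text scores. The sketch also shows why MIN cannot serve as the per-document (row-marginal) combiner under the set-of-documents semantics: an object whose keywords are split across documents would score 0.

```python
# Hypothetical data: each target object maps to the term sets of its related documents.
related_docs = {
    "author_a": [{"top-k", "aggregation"}, {"ranking"}],
    "author_b": [{"top-k"}, {"aggregation"}],
    "author_c": [{"top-k"}],
}
query = ["top-k", "aggregation"]


def all_keywords_in_one_doc(docs, keywords):
    """Semantics 1: at least one related document contains every keyword."""
    return any(all(k in doc for k in keywords) for doc in docs)


def all_keywords_in_doc_set(docs, keywords):
    """Semantics 2: every keyword appears in at least one related document."""
    return all(any(k in doc for doc in docs) for k in keywords)


for obj, docs in related_docs.items():
    print(obj, all_keywords_in_one_doc(docs, query), all_keywords_in_doc_set(docs, query))
# author_a True True
# author_b False True
# author_c False False

# 0/1 score matrix for author_b: rows = documents, columns = keywords.
m = [[1.0 if k in doc else 0.0 for k in query] for doc in related_docs["author_b"]]
row_min_then_max = max(min(row) for row in m)        # 0.0: no single document has both keywords
col_max_then_min = min(max(col) for col in zip(*m))  # 1.0: each keyword is covered by some document
```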
Problem definition
• Object finder problem:
Process OF query as Top-K query
• A top-K query incorporates ranking; if we process a strong top-K query, the results are totally ordered
• A good algorithm can use early termination to avoid processing results that cannot be in the top K
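Early termination can be illustrated with Fagin's Threshold Algorithm (TA). TA is a classic top-K technique, used here purely for illustration; it is not the method proposed in this work. The sketch makes three assumptions:
- Aggregation is SUM.
- Each list is sorted by score in descending order.
- Every object appears in every list, so its full score can be fetched by random access.

`threshold_top_k` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # (object id, score), sorted by score descending


def threshold_top_k(lists: List[Ranked], k: int) -> Tuple[List[Tuple[float, str]], int]:
    """Top-k objects under SUM aggregation, plus the sorted-access depth actually used.

    Assumes every object appears in every list, so all lists have the same length.
    """
    random_access: List[Dict[str, float]] = [dict(lst) for lst in lists]
    exact: Dict[str, float] = {}  # full scores of every object seen under sorted access
    for depth in range(1, len(lists[0]) + 1):
        row = depth - 1
        for lst in lists:
            obj = lst[row][0]
            if obj not in exact:
                # Random access: fetch the object's score from every list.
                exact[obj] = sum(table[obj] for table in random_access)
        # No unseen object can score more than the sum of the scores at this depth.
        threshold = sum(lst[row][1] for lst in lists)
        top = heapq.nlargest(k, ((score, obj) for obj, score in exact.items()))
        if len(top) == k and top[-1][0] >= threshold:
            return top, depth  # early termination: the remaining entries are never read
    return heapq.nlargest(k, ((score, obj) for obj, score in exact.items())), len(lists[0])


lists = [
    [("a", 8.0), ("b", 7.0), ("c", 2.0), ("d", 1.0)],
    [("b", 8.0), ("a", 6.0), ("d", 3.0), ("c", 1.0)],
]
print(threshold_top_k(lists, k=2))  # ([(15.0, 'b'), (14.0, 'a')], 2)
```

Here the top 2 are known after reading only two of the four entries in each list. At that depth no unseen object can score more than 7 + 6 = 13, which is below the second-best score of 14.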
Top-K query processing
• General framework: "Supporting Ad-hoc Ranking Aggregates", SIGMOD 2006 (presented in May)
SELECT ga_1, ..., ga_n, F        -- groups
FROM R_1, ..., R_h               -- source relations
WHERE c_1 AND ... AND c_l        -- join conditions
GROUP BY ga_1, ..., ga_n         -- group definition
ORDER BY F                       -- ordering function, an aggregate
LIMIT k                          -- top-k setting
Top-K query processing
• For an OF query, it is:
SELECT TOId, TOValue, score(TOId)
FROM TargetTable T, R, L1, ..., LN
WHERE R.TOId = T.TOId
AND R.DocId = Li.DocId (i = 1..N)
GROUP BY TOId, TOValue
ORDER BY score(TOId) DESC
LIMIT k
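The OF query can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions the slides do not state:
- There are two keywords, and the table contents are hypothetical.
- `score(TOId)` is the MAX score per keyword over the related documents, followed by a SUM over keywords.
- LEFT JOINs are used, so a document may match only some of the keywords.

The query aggregates every group before LIMIT is applied. It is a plain-SQL evaluation with no early termination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TargetTable (TOId INTEGER PRIMARY KEY, TOValue TEXT);
    CREATE TABLE R (TOId INTEGER, DocId INTEGER);            -- target object <-> document relationships
    CREATE TABLE L1 (DocId INTEGER PRIMARY KEY, score REAL); -- full-text results for keyword 1
    CREATE TABLE L2 (DocId INTEGER PRIMARY KEY, score REAL); -- full-text results for keyword 2
    """
)
conn.executemany("INSERT INTO TargetTable VALUES (?, ?)", [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 10), (1, 11), (2, 12), (3, 13)])
conn.executemany("INSERT INTO L1 VALUES (?, ?)", [(10, 0.5), (12, 0.75), (13, 0.25)])
conn.executemany("INSERT INTO L2 VALUES (?, ?)", [(11, 0.5)])

# score(TOId): MAX per keyword over the related documents, then SUM over the keywords.
# LEFT JOINs (an assumption) keep documents that match only some of the keywords.
top_k = conn.execute(
    """
    SELECT T.TOId, T.TOValue,
           COALESCE(MAX(L1.score), 0) + COALESCE(MAX(L2.score), 0) AS of_score
    FROM TargetTable T
    JOIN R ON R.TOId = T.TOId
    LEFT JOIN L1 ON L1.DocId = R.DocId
    LEFT JOIN L2 ON L2.DocId = R.DocId
    GROUP BY T.TOId, T.TOValue
    ORDER BY of_score DESC
    LIMIT ?
    """,
    (2,),
).fetchall()
print(top_k)  # [(1, 'Alice', 1.0), (2, 'Bob', 0.75)]
```

Alice ranks first even though no single related document matches both keywords. The per-keyword MAX picks up document 10 for keyword 1 and document 11 for keyword 2.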
My work is done (please try to recall my last talk)
Algorithm: Generate-Prune
Phase I: Compute top-K candidates
Algorithm Overview
Algorithm
• Phase II: Compute exact top-K
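The two phases can be sketched in Python. This is our own bound-based reconstruction for illustration, not the paper's code. It assumes:
- `score(t)` is the SUM over keywords of the MAX score among `t`'s related documents, which is one column-marginal choice.
- Each keyword list is sorted by score in descending order, and scores are nonnegative.
- `doc_to_objects`, a hypothetical map, stands in for relation R.

Phase I reads the lists round-robin and keeps lower and upper bounds for each object. It stops once no unseen object can enter the top K. Phase II then computes exact scores for the surviving candidates only.

```python
import heapq
from typing import Dict, List, Optional, Tuple

KeywordList = List[Tuple[int, float]]  # (doc id, score), sorted by score descending


def generate_candidates(keyword_lists: List[KeywordList],
                        doc_to_objects: Dict[int, List[str]], k: int) -> List[str]:
    """Phase I: read keyword lists round-robin until no unseen object can reach the top k."""
    m = len(keyword_lists)
    pos = [0] * m
    # Per object, the best score seen for each keyword; the first hit is the max
    # because each list is sorted, so None simply means "not seen yet".
    seen: Dict[str, List[Optional[float]]] = {}
    while True:
        progressed = False
        for i, lst in enumerate(keyword_lists):
            if pos[i] < len(lst):
                doc, score = lst[pos[i]]
                pos[i] += 1
                progressed = True
                for obj in doc_to_objects.get(doc, []):
                    row = seen.setdefault(obj, [None] * m)
                    if row[i] is None:
                        row[i] = score
        # Upper bound on any still-unread score in list i (0 once the list is exhausted).
        thresholds = [lst[p][1] if p < len(lst) else 0.0 for lst, p in zip(keyword_lists, pos)]
        lower = {o: sum(v for v in row if v is not None) for o, row in seen.items()}
        upper = {o: sum(t if v is None else v for v, t in zip(row, thresholds))
                 for o, row in seen.items()}
        if len(lower) >= k:
            kth = sorted(lower.values(), reverse=True)[k - 1]
            if sum(thresholds) < kth or not progressed:
                return [o for o in seen if upper[o] >= kth]
        elif not progressed:
            return list(seen)  # fewer than k objects exist at all


def prune(candidates: List[str], keyword_lists: List[KeywordList],
          doc_to_objects: Dict[int, List[str]], k: int) -> List[Tuple[float, str]]:
    """Phase II: compute exact scores for the candidates only and keep the best k."""
    lookup = [dict(lst) for lst in keyword_lists]
    docs_of: Dict[str, List[int]] = {}
    for doc, objs in doc_to_objects.items():
        for obj in objs:
            docs_of.setdefault(obj, []).append(doc)
    scored = [(sum(max((table.get(d, 0.0) for d in docs_of[o]), default=0.0) for table in lookup), o)
              for o in candidates]
    return heapq.nlargest(k, scored)


keyword_lists = [
    [(1, 0.75), (2, 0.5), (3, 0.25), (6, 0.125)],  # keyword 1
    [(4, 0.625), (1, 0.5), (5, 0.25)],             # keyword 2
]
doc_to_objects = {1: ["x"], 2: ["y"], 3: ["z"], 4: ["y"], 5: ["z"], 6: ["w"]}

candidates = generate_candidates(keyword_lists, doc_to_objects, k=1)
print(candidates)                                             # ['x']
print(prune(candidates, keyword_lists, doc_to_objects, k=1))  # [(1.25, 'x')]
```

With k = 1, Phase I stops after two rounds and documents 3, 5 and 6 are never read. Object y is pruned because its upper bound (1.125) is below x's lower bound (1.25).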
Discussions
• In this work
– Choice of aggregation function
– Ranking function in general
– What do you think of this work?
• Beyond this work
– Impact of a more complicated schema
– Impact of query selectivity
Experiment Results
• Faster than SQL
• Faster than Generate-Only
• Robust to # of keywords and selections
• Intuitive Results
Experiments
• Faster than SQL
Experiments
• Faster than Generate-Only
Experiments
• Robust to # of keywords and selections
Thank you
Questions to discuss?