Dp idp exploredb

Post on 11-Aug-2014

Ranking skyline points with an IR-style weighting scheme

George Valkanas¹, Apostolos N. Papadopoulos², Dimitrios Gunopulos¹

Skyline Ranking à la IR

¹ University of Athens, Greece
² Aristotle University of Thessaloniki, Greece

1st ExploreDB Workshop
Athens, Greece
28th March, 2014

Skyline Problem Introduction

• Dataset $D = (p_1, p_2, \dots, p_n)$ in d-dimensional space
• Preferences for each dimension: min, max
• p dominates q iff $p_i \le q_i \;\; \forall i = 1, \dots, d$ and $\exists j : p_j < q_j$ (stated for min preferences)
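The dominance test above can be sketched in a few lines of Python. This is a minimal illustration, assuming a min preference on every dimension; the function names and the sample data are hypothetical and do not come from the slides, and the quadratic scan is the naive approach rather than an optimized skyline algorithm.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q, assuming smaller is better in every dimension."""
    no_worse = all(pi <= qi for pi, qi in zip(p, q))
    strictly_better = any(pi < qi for pi, qi in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative 2-d data, e.g. (price, distance) pairs.
hotels = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 8)]
print(skyline(hotels))
```

Here (4, 4) and (5, 8) drop out because (3, 3) is no worse in both dimensions and strictly better in at least one.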

Usefulness of Skyline

• Multi-Objective optimization

• Exploratory Search

• Improve Recommendations

• Data summarization technique

• Building block for defining competitiveness

Skyline Cardinality Explosion

$O((\ln n)^{d-1})$

• Skyline becomes too large to inspect manually

Solving the Cardinality Problem

• Select subset of size k
– Coverage-based
– Contour representation
– Diversification

• Ranking
– Top-k Dominating
– Subspace dominance

Skyline + IR: Intuition

• Dominated points are not equally important
• Scheme similar to TF-IDF

Skyline + IR: How ?

• 2 Factors
– DP (~ tf)
– IDP (~ idf)

• DP-IDP
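A brute-force way to combine the two factors is to score each skyline point by summing a weight over the points it dominates. The sketch below is only an analogy with tf-idf: the transcript does not reproduce the dp and idp formulas, so the constant `dp`, the smoothed logarithm used for `idp`, their product, and all names are placeholder assumptions, not the authors' definitions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q, assuming smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def brute_force_scores(sky: List[Point], dominated: List[Point]) -> Dict[Point, float]:
    """Score every skyline point by summing placeholder weights over its dominated points."""
    # How many skyline points dominate each point (plays the role of document frequency).
    n_dom = {p: sum(dominates(s, p) for s in sky) for p in dominated}
    scores: Dict[Point, float] = {}
    for s in sky:
        total = 0.0
        for p in dominated:
            if dominates(s, p):
                dp = 1.0  # PLACEHOLDER: the real dp weight is not given in the transcript
                idp = math.log(1 + len(sky) / n_dom[p])  # PLACEHOLDER idf-like weight
                total += dp * idp
        scores[s] = total
    return scores


sky = [(1, 9), (3, 3), (9, 1)]
dominated = [(4, 4), (5, 8), (2, 10), (10, 10)]
scores = brute_force_scores(sky, dominated)
for s in sorted(scores, key=scores.get, reverse=True):
    print(s, round(scores[s], 3))
```

With these placeholder weights, a point dominated by every skyline point, such as (10, 10), contributes less than a point dominated by only one, which is the intuition that dominated points are not equally important.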

Ranking the Skyline

• Baseline:
– For each skyline point sp: iterate over its dominated points, and SUM
– Slow
– Unnecessary computations

• Alternative? Bound the score
– Lower
– Upper

• Prune skyline points
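The bound-and-prune idea can be illustrated with a generic top-k filter: a candidate whose upper bound is below the k-th largest lower bound can never reach the top k. This is a sketch of that general principle only; the function name, the dictionary layout, and the numbers are hypothetical, and how the slides' own lower and upper bounds are derived is not shown here.

```python
import heapq
from typing import Dict, List, Tuple


def prune(bounds: Dict[str, Tuple[float, float]], k: int) -> List[str]:
    """Return the candidates that can still end up in the top-k.

    bounds maps a candidate name to its (lower, upper) score bounds.
    """
    lowers = [lo for lo, _ in bounds.values()]
    if len(lowers) < k:
        return sorted(bounds)  # fewer than k candidates: nothing can be pruned
    threshold = heapq.nlargest(k, lowers)[-1]  # k-th largest lower bound
    return sorted(name for name, (_, up) in bounds.items() if up >= threshold)


# Hypothetical (lower, upper) bounds for four skyline points.
bounds = {"A": (5.0, 9.0), "B": (2.0, 4.0), "C": (6.0, 7.0), "D": (1.0, 6.5)}
print(prune(bounds, k=2))
```

In this example the second-largest lower bound is 5.0, so B (upper bound 4.0) is discarded without computing its exact score, while D stays because its upper bound still exceeds the threshold.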

A Simpler Representation

• More comprehensive for bounds

Bounding the Score

• Q1: What is the score for B ?
• A1: Depends on the assignment of the remaining edges

• Q2: What is the maximum score for B ?
• A2: Assign appropriately the remaining edges

• Q3: What is the appropriate way?
• A3:
– Same layer → Higher score (dp)
– Minimum overlap → Higher score (idp)

• No overlap → Loose bounds
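The "same layer" rule refers to skyline layers. The transcript does not define them, so the sketch below assumes the usual peeling definition: layer 1 is the skyline of the data, layer 2 is the skyline of what remains once layer 1 is removed, and so on. Function names and data are hypothetical, and a min preference is assumed in every dimension.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q, assuming smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_layers(points: List[Point]) -> List[List[Point]]:
    """Peel the data into layers by repeatedly extracting the skyline of the rest."""
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining:
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


pts = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 8), (10, 10)]
for depth, layer in enumerate(skyline_layers(pts), start=1):
    print(depth, layer)
```

Each pass removes at least one point, so the loop terminates; the number of passes equals the number of layers.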

The SkyIR Algorithm

• Priority can be:
– Round Robin (RRB)
– Pending points (PND)
– Upper Bound (UBS)

Experimental Setup

• Datasets

• Algorithms
– Baseline
– SkyIR

• Bounds: Loose (LS), Collaborative (CB)
• 3 Priority schemes: RRB, PND, UBS

Total Runtime – IND distr

k=5, d=3

CB-UBS is 4x faster than the Baseline

Total Runtime – ANT distr

• Interesting fact: ANT is easier than IND (fewer layers to extract)

Total Runtime – Forest Cover

Memory Consumption

CB, k=5

PND is the best memory-wise

Conclusions

• IR-style ranking for skyline
– Formal framework
– Bounds for efficient computation

• SkyIR algorithm
– Experimental evaluation

• Future Work
– Speed up / Scale up
– Improve bounds (lower, upper)
– Approximation technique(s)

Thank you!

Questions?

Acknowledgements: Heraclitus II fellowship, THALIS – GeomComp, THALIS – DISFER, ARISTEIA – MMD, FP7 INSIGHT