Creating Competitive Products
Qian Wan[1], Raymond Chi-Wing Wong[1], Ihab F. Ilyas[2], M. Tamer Ozsu[2], Yu Peng[1]
[1] Hong Kong University of Science and Technology [2] University of Waterloo
Presented and prepared by Qian Wan
Creating Competitive Products | VLDB '09 2
Outline
• Background
  – Skyline, Related Work
• Motivation
  – Example, Problem Definition
• Algorithm
  – Framework, Grouping, Pruning
• Experiments
  – Synthetic, Real data
  – 6 factors, 4 measurements
• Conclusions
Skyline
• Definition
  – The skyline contains the points that are not dominated by any other point
• Hotel searching problem
  – Distance to beach vs. Price
  – Dominance
  – Skyline
[Figure: two scatter plots with axes Dist and Price. The first shows hotels H1 to H9; the second shows H1 and H2.]
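The dominance test and the skyline filter named on this slide can be sketched in a few lines of Python. This is only an illustration: the coordinates are hypothetical (the values in the slide's plot are not recoverable), the names `dominates` and `skyline` are chosen here, and the sketch assumes that a smaller value is better on both axes.

```python
def dominates(a, b):
    """Return True if a dominates b.

    Assumption: smaller is better on every attribute. a dominates b when it
    is no worse on every attribute and strictly better on at least one.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Keep the points that no other point dominates (simple quadratic scan)."""
    return {
        name: p
        for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    }


# Hypothetical (distance-to-beach, price) pairs, not the values from the slide.
hotels = {
    "A": (100, 300),
    "B": (200, 200),
    "C": (300, 250),  # farther and more expensive than B, so B dominates it
    "D": (400, 100),
}

sky = skyline(hotels)
print(sorted(sky))
```

A quadratic scan is enough for a handful of points; the algorithms cited under Related Work exist precisely because this does not scale.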
Related Work
• Skyline Queries in DBMS [S. Borzsonyi, 2001]
• Single-Table Skyline Queries
  – Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005]
• Multi-Table Skyline Queries
  – Natural Join [W. Jin, 2007] [D. Sun, 2008]
  – Our Work
    • Join different source tables via a “Cartesian product”-like procedure.
A Travel Agency’s Database

Existing Vacation Packages (TE)
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Source Tables (T1, T2)
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100

Newly Created Vacation Packages (TQ)
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q24(F4,H6)  2            200                3            210

1. Direct attributes
2. Indirect attributes
3. One indirect attribute characteristic, e.g. Travel Agency (Price), PC Manufacturer (Price)

Skyline tuples
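How a new package is assembled from the source tables can be sketched as follows. The helper name `make_package` is hypothetical. The sketch assumes Price is the sum of Flight-cost and Hotel-cost, which agrees with the rows shown on the slide (for example 120 + 100 = 220 for Q1).

```python
from itertools import product

# Source tables as shown on the slide.
flights = {"F1": (0, 120), "F2": (1, 100)}  # (no-of-stops, flight-cost)
hotels = {                                   # (distance-to-beach, hotel-class, hotel-cost)
    "H1": (100, 3, 100),
    "H2": (200, 2, 90),
    "H3": (400, 1, 80),
}


def make_package(flight, hotel):
    """Combine one flight and one hotel into a package tuple."""
    stops, flight_cost = flight
    distance, hotel_class, hotel_cost = hotel
    # Direct attributes are copied from a source table unchanged.
    # The indirect attribute (Price) is derived from both source tuples.
    return (stops, distance, hotel_class, flight_cost + hotel_cost)


# Cartesian-product-like combination of the two source tables.
packages = {
    (f, h): make_package(flights[f], hotels[h])
    for f, h in product(flights, hotels)
}

for key, pkg in packages.items():
    print(key, pkg)
```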
Finding Competitive Products
• Given a set of source tables T1, T2, ..., Tk
• Market packages TE
• New packages TQ
• Then, a tuple q in TQ is said to be a competitive product if q is in the skyline with respect to TE ∪ TQ
Naïve Solution
Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q24(F4,H6)  2            200                3            210

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

1. Intra-dominance checking
2. Inter-dominance checking

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
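The naïve solution can be sketched as: build every combination, then keep a new package only if nothing in the new or existing packages dominates it. This is a minimal sketch, not the authors' implementation. It assumes smaller is better on every attribute (the reading under which P4 and H6 are the dominated tuples on later slides) and that Price is the sum of the two costs.

```python
from itertools import product


def dominates(a, b):
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}
hotels = {
    "H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
    "H4": (150, 2, 150), "H5": (170, 2, 140), "H6": (200, 3, 120),
}
existing = {
    "P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300),
}

# All 4 x 6 = 24 combinations: (stops, distance, class, flight-cost + hotel-cost).
new = {
    (f, h): (flights[f][0], hotels[h][0], hotels[h][1], flights[f][1] + hotels[h][2])
    for f, h in product(flights, hotels)
}


def competitive(new_packages, existing_packages):
    """Intra- and inter-dominance checking in one pass over all packages."""
    everything = list(new_packages.values()) + list(existing_packages.values())
    return [
        key for key, q in new_packages.items()
        if not any(dominates(p, q) for p in everything)
    ]


result = competitive(new, existing)
print(result)
```

On this data the sketch keeps the five combinations listed under Competitive Products: (F1,H1), (F1,H2), (F1,H3), (F2,H1) and (F3,H1).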
Algorithm Overview
• Intra-dominance checking
  – To find the skyline in the source tables
• Inter-dominance checking
  – Skyline in existing market packages
  – R*-tree index on existing market packages
  – Full pruning
  – Partial pruning
• Post-processing
Intra-dominance Checking
Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Skyline Tuples of Source Tables (T1', T2')
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages (conceptual, TQ')
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                3            200

1. No intra-dominance checking (one indirect attribute)
2. No competitive products are missed

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
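The intra-dominance step can be sketched as: take the skyline of each source table first, and only combine the surviving tuples. The function names are hypothetical. The sketch assumes smaller is better on every attribute and a single indirect attribute (Price as a sum). Under those assumptions, none of the combined packages dominates another on this data.

```python
from itertools import product


def dominates(a, b):
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(table):
    """Tuples of the table that no other tuple of the same table dominates."""
    return {
        key: t for key, t in table.items()
        if not any(dominates(u, t) for u in table.values())
    }


flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}
hotels = {
    "H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
    "H4": (150, 2, 150), "H5": (170, 2, 140), "H6": (200, 3, 120),
}

sky_flights = skyline(flights)  # F4 is dropped: F3 has the same stops and costs less
sky_hotels = skyline(hotels)    # H6 is dropped: H1 is closer, same class, cheaper

# Combine only skyline tuples: 3 x 5 = 15 candidates instead of 4 x 6 = 24.
candidates = {
    (f, h): (sky_flights[f][0], sky_hotels[h][0], sky_hotels[h][1],
             sky_flights[f][1] + sky_hotels[h][2])
    for f, h in product(sky_flights, sky_hotels)
}

# With one indirect attribute, no candidate should dominate another candidate.
intra_dominated = [
    key for key, q in candidates.items()
    if any(dominates(p, q) for p in candidates.values())
]
print(len(candidates), intra_dominated)
```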
Inter-dominance Checking
Existing Vacation Packages (TE)
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Skyline in Existing Vacation Packages (TE')
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

• No competitive products are missed
• An R*-tree speeds up the inter-dominance checking

[Figure: spatial index over TE' with nodes R0 to R5; inter-dominance checking is carried out as a range query.]
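The inter-dominance check can be sketched as: reduce the existing packages to their skyline, then ask whether any remaining package dominates a candidate. In the sketch below a linear scan stands in for the R*-tree range query, which is a simplification made here and not the indexed method from the slide. It assumes smaller is better on every attribute. The candidate (F2,H2) is computed from the source tables (stops 1, distance 200, class 2, price 100 + 90).

```python
def dominates(a, b):
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(table):
    return {
        key: t for key, t in table.items()
        if not any(dominates(u, t) for u in table.values())
    }


existing = {
    "P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300),
}
sky_existing = skyline(existing)  # P4 is dominated by P2 and drops out


def is_dominated(q, packages):
    """Linear-scan stand-in for the range query.

    A package p can dominate q only if it lies in the box between the origin
    and q, so an index would restrict the search to that box.
    """
    return any(dominates(p, q) for p in packages.values())


candidates = {
    "Q1": (0, 100, 3, 220),
    "Q7": (1, 100, 3, 200),
    "Q13": (2, 100, 3, 180),
    "F2,H2": (1, 200, 2, 190),  # dominated by P2
}
survivors = [name for name, q in candidates.items() if not is_dominated(q, sky_existing)]
print(survivors)
```

Checking against the skyline of the existing packages gives the same answer as checking against all of them, because any package that P4 dominates is also dominated by P2.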
Full Pruning

Existing Vacation Packages (TE')
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables (T1', T2')
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages (conceptual, TQ')
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                3            200

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180

[Figure: groups A1, A2, B1, B2; C1 = {A1, B1}, C4 = {A2, B2}; Full Pruning.]
Full Pruning

Existing Vacation Packages (TE')
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Groups and Best Representatives (TQ')
Groups  Best Representative
C1      B1
C2      B2
…       …
Ci      Bi
…       …
Cj      Bj
…       …
Ck      Bk

Example group
Package    No-of-stops  Distance-to-beach  Hotel-class  Price
Q(F2,H4)   1            150                4            250
Q’(F2,H5)  1            170                4            240

Best Representative
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
Min      1            150                4            240

• Quality of the best representative (tightness of each group): clustering, e.g. k-means
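The full-pruning test can be sketched with the group shown on the slide. The best representative is taken as the attribute-wise minimum of the group, matching the Min row. If an existing package dominates that representative, it dominates every member, so the whole group can be discarded. The function name is hypothetical and the sketch assumes smaller is better on every attribute.

```python
def dominates(a, b):
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def best_representative(group):
    """Attribute-wise minimum: a (possibly virtual) tuple no member can beat."""
    return tuple(min(values) for values in zip(*group))


# Skyline of the existing packages (TE') from the slide.
sky_existing = {
    "P1": (0, 130, 2, 250),
    "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150),
}

# The two group members shown on the slide.
group = [
    (1, 150, 4, 250),  # Q(F2,H4)
    (1, 170, 4, 240),  # Q'(F2,H5)
]

rep = best_representative(group)
fully_pruned = any(dominates(p, rep) for p in sky_existing.values())
print(rep, fully_pruned)
```

Here P2 dominates the representative (1, 150, 4, 240), so both members are pruned with a single check. The tighter the group, the closer the representative is to its members and the less often a prunable group is missed, which is why the grouping quality matters.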
Partial Pruning
• Full pruning prunes all members in the group
• Partial pruning prunes some members in the group
• Direct attributes do not change
• Estimate the best possible value for indirect attributes
• Use tuples in TE’ to conduct a range query in each source table
• Eliminate dominated combinations, if
  – They are dominated on all direct attributes
  – They are dominated on all indirect attributes according to their best estimation
• Partial pruning is used when full pruning cannot be applied
Partial Pruning

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                3            200

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180

[Figure: groups A1, B1; C1 = {A1, B1}; Full Pruning.]
Meta Transformation
Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Selected package
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P2       1            140                2            170

P2 on the flight attributes
Package  No-of-stops  Price
P2       1            170

P2 on the hotel attributes
Package  Distance-to-beach  Hotel-class  Price
P2       140                2            170

Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
Min    400                1            80

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
Min     1            100

Meta-Hotel
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            200
H2     200                2            190
H3     400                1            180

Meta-Flight
Flight  No-of-stops  Flight-cost
F1      0            200
F2      1            180

• No inter-dominance checking for {F2} × {H2}

[Figure labels: A1, B1]
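The numbers on this slide can be reproduced with a short sketch. Each hotel's cost is raised by the cheapest flight cost (100) and each flight's cost by the cheapest hotel cost (80), giving the best price any combination containing that tuple could reach. P2 is then compared with each meta table on the matching attributes. This is one reading of the slide, not the authors' code; the variable names are hypothetical and smaller is assumed to be better on every attribute.

```python
def dominates(a, b):
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


flights = {"F1": (0, 120), "F2": (1, 100)}  # (no-of-stops, flight-cost)
hotels = {                                   # (distance-to-beach, hotel-class, hotel-cost)
    "H1": (100, 3, 100),
    "H2": (200, 2, 90),
    "H3": (400, 1, 80),
}

min_flight_cost = min(cost for _, cost in flights.values())    # 100
min_hotel_cost = min(cost for _, _, cost in hotels.values())   # 80

# Meta tables: the cost column becomes the best achievable package price.
meta_flights = {k: (stops, cost + min_hotel_cost) for k, (stops, cost) in flights.items()}
meta_hotels = {k: (dist, cls, cost + min_flight_cost) for k, (dist, cls, cost) in hotels.items()}

# Existing package P2 = (stops, distance, class, price), viewed per source table.
p2 = (1, 140, 2, 170)
p2_flight_view = (p2[0], p2[3])
p2_hotel_view = (p2[1], p2[2], p2[3])

dominated_flights = [k for k, f in meta_flights.items() if dominates(p2_flight_view, f)]
dominated_hotels = [k for k, h in meta_hotels.items() if dominates(p2_hotel_view, h)]

# Combinations of a dominated flight with a dominated hotel need no further check.
skippable = [(f, h) for f in dominated_flights for h in dominated_hotels]
print(skippable)
```

The check is safe because the real price of a combination is never below either meta price, so a package that beats both meta tuples also beats the combination itself.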
Post-processing
• More than one indirect attribute
  – Calculation
• Previous algorithm, followed by intra-dominance checking
  – Any existing skyline algorithm
  – Post-processing cost depends on the size of the set of competitive products
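Why a final skyline pass is needed with more than one indirect attribute can be seen in a small sketch. The two source tables and their (cost, duration) attributes are hypothetical, both attributes are assumed to be sums over the source tuples, and smaller is assumed to be better. Every source tuple is in the skyline of its own table, yet some combinations dominate others.

```python
from itertools import product


def dominates(a, b):
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical source tables with two indirect attributes: (cost, duration).
source_a = {"a1": (0, 4), "a2": (1, 1), "a3": (4, 0)}
source_b = {"b1": (0, 4), "b2": (1, 1), "b3": (4, 0)}

# Both indirect attributes are computed as sums over the combined tuples.
combined = {
    (a, b): (source_a[a][0] + source_b[b][0], source_a[a][1] + source_b[b][1])
    for a, b in product(source_a, source_b)
}

# Post-processing: an ordinary skyline pass over the combined tuples.
final = {
    key: v for key, v in combined.items()
    if not any(dominates(u, v) for u in combined.values())
}
print(sorted(set(final.values())))
```

The combinations (a1,b3) and (a3,b1) both evaluate to (4, 4) and are dominated by (a2,b2) at (2, 2), so they are removed only in this last step.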
Experiments
• Pentium IV 2.4GHz PC with 4GB memory, Linux platform, C++
• Synthetic anti-correlated datasets
• Real datasets, Travel Agency A and Travel Agency B
  – A: 296 packages, 1014 hotels and 4394 flights
  – B: 149 packages, 995 hotels and 866 flights
• Implementation
  – Algorithm for Creating Competitive Products (ACCP)
  – Baseline algorithm
  – Naïve algorithm

Algorithm  Skyline in tables  R* Tree  Full & Partial Pruning
ACCP       Yes                Yes      Yes
Baseline   Yes                Yes      No
Naïve      No                 No       No
Synthetic Datasets

Parameters                                     Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           2
Size of existing packages                      5M
Size of each source table                      100k

• Schema is similar to our example
• Anti-correlated
• 6 factors
• Measurement
  – Execution time
  – Pruning power
  – Ratio of competitive products out of all combinations
  – Memory usage
Experiments
From 100k to 500k
Full pruning & partial pruning
TQ, TQ’, TR SKY
Pruning power slightly increases

Parameters                                     Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           6
Size of existing packages                      5M
Size of each source table                      100k
Experiments
From 2.5M to 10M
Parameters                                     Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           6
Size of existing packages                      5M
Size of each source table                      100k
More competitive Slightly decreases
Experiments
Travel Agency A Package Generation Set
1. A: 296 packages, 1014 hotels and 4394 flights. B: 149 packages, 995 hotels and 866 flights
2. Source tables from B, and packages from A
3. Vary discount from 0 to 0.50
4. Efficiency: ACCP (44.74s) and Baseline (84.47s)
5. |SKY|/|TQ|
6. |DOM|/|TE|
[Figure labels: DOM, SKY]
Conclusions
• Creating Competitive Products
  – Example
  – Problem Definition
• Algorithms
  – Framework
  – Intra-dominance checking
  – Inter-dominance checking
  – Post-processing
• Experiments
  – Synthetic anti-correlated datasets
  – Real datasets
THANK YOU!
Q&A