
Addressing Diverse User Preferences in SQL-Query-Result Navigation SIGMOD ‘07 Zhiyuan Chen Tao Li...

Addressing Diverse User Preferences in SQL-Query-Result Navigation SIGMOD ‘07 Zhiyuan Chen Tao Li University of Maryland, Baltimore County Florida International University
Transcript
Page 1: Addressing Diverse User Preferences in SQL-Query-Result Navigation SIGMOD ‘07 Zhiyuan Chen Tao Li University of Maryland, Baltimore County Florida International.

Addressing Diverse User Preferences in SQL-Query-Result Navigation

SIGMOD ‘07

Zhiyuan Chen, University of Maryland, Baltimore County
Tao Li, Florida International University


Recap…

• Exploratory queries on database systems are becoming common

• These queries return a large set of results, most of them irrelevant to the user

• Categorization (and ranking) help users locate relevant records

• A user typically does not expend any effort in specifying his/her preferences


Motivation

• Previous work assumed that all users have the same preferences.

• Not true in most scenarios

• Ignoring user preferences leads to construction of sub-optimal navigation trees


Motivation (cont…)

• Key challenges:

1. How to summarize diverse user preferences from the behaviors of all the users in the system?

2. How to decide the subset of preferences associated with a specific user?


System Architecture

[Figure: system architecture diagram with the components Query History, Cluster Generation, Clusters over Data, Query, Query Execution, Results, and Navigation Tree Construction]


System Architecture (cont…)

• Pre-processing step:
– Analyze query history and generate a set of (non-overlapping) clusters over data.
– Each cluster corresponds to one type of user preference
– Each cluster has an associated probability of users being interested in that cluster

• Assumption: Individual preferences can be represented as a subset of these clusters


System Architecture (cont…)

• Generation of the navigation tree
– Occurs when a specific user asks a query
– Intersect the set of clusters generated in the pre-processing step with the answers of the given query
– Construct a navigation tree over the intersected clusters on the fly


Terminology and Definitions

• Query History (H) is of the form {(Q1, U1, F1), …, (Qk, Uk, Fk)}, in chronological order, where
– Qi is a query
– Ui is a user session ID
– Fi is the weight associated with the query

• Each query Qi is of the form Cond(A1) ∧ Cond(A2) ∧ … ∧ Cond(An)

• Each Cond(Ai) contains only range or equality conditions
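
As a rough illustration of these definitions, the sketch below models a query as a conjunction of range or equality conditions and a history entry as a (query, session ID, weight) triple. The names `Cond`, `HistoryEntry`, and `matches`, as well as the sample attributes `price` and `bedrooms`, are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cond:
    """Range condition low <= record[attr] <= high (equality when low == high)."""
    attr: str
    low: float
    high: float

    def holds(self, record: dict) -> bool:
        return self.low <= record[self.attr] <= self.high


@dataclass(frozen=True)
class HistoryEntry:
    query: tuple      # Q_i: a conjunction of Cond objects
    session_id: str   # U_i
    weight: float     # F_i


def matches(query, record) -> bool:
    # A record answers the query only if every condition holds.
    return all(cond.holds(record) for cond in query)


q = (Cond("price", 100_000, 200_000), Cond("bedrooms", 3, 3))
history = [HistoryEntry(q, "session-1", 1.0)]
```

Restricting every condition to a closed interval keeps equality and range predicates in one representation.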


Terminology and Definitions (cont…)

• Data (D) is partitioned into a disjoint set of clusters C = {C1, C2, …, Cq}

• Each Ci has an associated probability Pi

• The Pi associated with a cluster denotes the probability that the users are interested in that cluster


Definition : Navigation Tree

• Navigation Tree T(V, E, L)

• Satisfying the following conditions:
– Each node v has a label label(v) denoting a Boolean condition.
– v contains records that satisfy all conditions on its ancestors including itself
– Conditions associated with children of a non-leaf node v are on the same attribute (called the split attribute)
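
One possible way to represent such a tree is sketched below. The `Node` class, the `records_at` helper, and the `price` attribute are illustrative assumptions, not part of the slides; the sketch only shows how a node's records follow from the conditions along its root-to-node path.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str                         # readable form of label(v)
    cond: Callable[[dict], bool]       # the Boolean condition itself
    split_attr: Optional[str] = None   # attribute shared by all children
    children: List["Node"] = field(default_factory=list)


def records_at(path: List[Node], records: List[dict]) -> List[dict]:
    """Records held by the last node of a root-to-node path: those that
    satisfy the condition of every node on the path."""
    return [r for r in records if all(n.cond(r) for n in path)]


root = Node("all", lambda r: True, split_attr="price")
cheap = Node("price <= 200", lambda r: r["price"] <= 200)
costly = Node("price > 200", lambda r: r["price"] > 200)
root.children = [cheap, costly]

data = [{"price": 150}, {"price": 300}]
```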


Clusters over Data

• Two records ri and rj are indistinguishable if they always appear in the same set of queries

• Define a binary relation R
– (ri, rj) ∈ R iff the above condition is satisfied

• R is reflexive, symmetric and transitive

=> R is an equivalence relation and partitions D into equivalence classes (clusters) {C1, …, Cq}


Clusters over Data (Example)

[Figure: records r1 through r13, grouped by the query results they belong to]

D = {r1, …, r13}

Q1 = {r1, …, r10}; Q2 = {r1, …, r9, r11}; Q3 = {r12}
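
The equivalence classes for this example can be computed by grouping records on the exact set of queries that return them. The following is a minimal sketch under that reading; the function name `indistinguishable_clusters` and the use of integers 1 to 13 for r1 to r13 are assumptions made for illustration.

```python
from collections import defaultdict


def indistinguishable_clusters(records, queries):
    """Group records by the exact set of queries whose results contain them."""
    groups = defaultdict(list)
    for r in records:
        signature = frozenset(
            name for name, result in queries.items() if r in result
        )
        groups[signature].append(r)
    return dict(groups)


D = list(range(1, 14))  # r1 .. r13
queries = {
    "Q1": set(range(1, 11)),         # r1 .. r10
    "Q2": set(range(1, 10)) | {11},  # r1 .. r9 and r11
    "Q3": {12},
}

clusters = indistinguishable_clusters(D, queries)
for signature, members in clusters.items():
    print(sorted(signature), members)
```

For the example data this yields five clusters: {r1, …, r9}, {r10}, {r11}, {r12}, and {r13}, the last one containing the record that no query returns.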


Clusters over Data (Heuristics)

• Problem: Too many clusters!

• Apply heuristics to decrease the number of clusters:
– Prune unimportant queries
  • Remove queries with empty answers
  • Retain the most specific query in a given session
– Merge similar queries


Clusters over Data (Merge Similar Queries)

• Algorithm:
1. Compute result DQi for each query Qi
2. Compute clusters CQi for each query Qi
3. Repeat until no more merging is possible:
   1. Compute the distance between each pair of queries
   2. Merge two clusters CQi and CQj that have a distance less than B

Distance: $d(Q_i, Q_j) = 1 - \frac{|D_{Q_i} \cap D_{Q_j}|}{|D_{Q_i} \cup D_{Q_j}|}$


Merge Similar Queries (Example)

• Let B = 0.2

• d(CQ1, CQ2) = 1 – 9/11 = 0.18, d(CQ1, CQ3) = 1, d(CQ2, CQ3) = 1

• Merge CQ1 and CQ2

• Results in 2 query clusters: CQ1 = {r1, …, r11}, CQ2 = {r12}
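
The merging loop can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes the distance is one minus the ratio of shared to combined records (which reproduces 1 – 9/11 in the example), that a merged cluster's result is the union of its members' results, and that the closest pair below the threshold is merged first. The function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    # Assumes non-empty results; queries with empty answers are pruned earlier.
    return 1 - len(a & b) / len(a | b)


def merge_similar(results, B):
    """results maps a query name to its set of records.
    Returns a list of (query names, merged record set) pairs."""
    clusters = [({name}, set(res)) for name, res in results.items()]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = jaccard_distance(clusters[i][1], clusters[j][1])
            if d < B and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters  # no pair is closer than B
        _, i, j = best
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)


results = {
    "Q1": set(range(1, 11)),
    "Q2": set(range(1, 10)) | {11},
    "Q3": {12},
}
merged = merge_similar(results, B=0.2)
```

With B = 0.2 the loop merges Q1 and Q2 into one cluster covering r1 through r11 and leaves Q3 on its own.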

[Figure: records r1 through r13 after merging the query clusters]


Merge Similar Queries (Complexity Results)

• $O(|H||D| + |H|^3 t_d)$
– $t_d$ is the time to compute distance

• Can be improved by
– Sampling
– Pre-computation of distances
  • $O(|H||D| + |H|^2 t_d + |H|^2 \log|H|)$
– Min-wise hashing
  • $O(|H||D| + |H|^2 k + |H|^2 \log|H|)$
  • k is the hash signature size
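
To show why min-wise hashing reduces the per-pair cost to O(k), the sketch below estimates the distance between two result sets from signatures of size k. The slides do not say which hash family is used; salted SHA-256 is an assumption chosen here only because it gives reproducible values, and the function names are hypothetical.

```python
import hashlib


def _hash(seed, item):
    # One of k hash functions, selected by the seed used as a salt.
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def signature(items, k):
    """Min-wise signature: the minimum hash value under each of k functions."""
    return [min(_hash(seed, x) for x in items) for seed in range(k)]


def estimated_distance(sig_a, sig_b):
    # The fraction of agreeing positions estimates |A ∩ B| / |A ∪ B|.
    agree = sum(a == b for a, b in zip(sig_a, sig_b))
    return 1 - agree / len(sig_a)


Q1 = set(range(1, 11))
Q2 = set(range(1, 10)) | {11}
k = 256
estimate = estimated_distance(signature(Q1, k), signature(Q2, k))
exact = 1 - len(Q1 & Q2) / len(Q1 | Q2)
print(f"estimate={estimate:.3f} exact={exact:.3f}")
```

Signatures are computed once per query, so each pairwise comparison touches k values instead of the full result sets; the estimate gets tighter as k grows.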


Generate Clusters

• Query clusters QC1, …, QCk are generated after query pruning and merging

• For each record ri
– Generate a set Si of query clusters such that one of the queries in the cluster returns ri
– Group the records by Si and assign a class label Ci to each group
– Compute Pi: the sum of frequencies of the queries in Si divided by the sum of frequencies of all queries in H (history)

Example: P1 = 2/3, P2 = 1/3 and P3 = 0
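
The example probabilities can be reproduced with a short sketch. It assumes that each of the three queries has frequency 1, so the merged query cluster built from Q1 and Q2 carries frequency 2, and it uses hypothetical names (`generate_clusters`, `QC1`, `QC2`).

```python
from collections import defaultdict
from fractions import Fraction


def generate_clusters(records, query_clusters):
    """query_clusters maps a name to (records returned, summed frequency).
    Returns a list of (cluster members, probability) pairs."""
    total = sum(freq for _, freq in query_clusters.values())
    groups = defaultdict(list)
    for r in records:
        # S: the query clusters in which some query returns r.
        S = frozenset(
            name for name, (res, _) in query_clusters.items() if r in res
        )
        groups[S].append(r)
    out = []
    for S, members in groups.items():
        p = Fraction(sum(query_clusters[name][1] for name in S), total)
        out.append((members, p))
    return out


query_clusters = {
    "QC1": (set(range(1, 12)), 2),  # merged Q1 and Q2, frequency 1 + 1
    "QC2": ({12}, 1),               # Q3
}
clusters = generate_clusters(range(1, 14), query_clusters)
for members, p in clusters:
    print(members, p)
```

This gives three clusters, {r1, …, r11}, {r12}, and {r13}, with probabilities 2/3, 1/3, and 0.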


Navigation Tree Construction

• Given D, Q and C, find a tree T(V, E, L) such that
– T contains all records of Q
– There does not exist T' with NCost(T', C) < NCost(T, C)

NCost(T,C):


Navigation Tree Construction

• Optimal-tree construction problem is NP-hard

• Observation: The navigation tree is very similar to a decision tree.

• So, any decision tree construction algorithm can be used…

• Decision tree algorithms compute information gain to measure how well an attribute classifies data.

• Here, the criterion is to minimize navigation cost


Navigation Tree Construction (Decision Tree Construction)

• Precondition: Each record has a class label assigned in the clustering step

• Algorithm:
1. Create a root R and assign all records to it
2. If all the records have the same class label, stop
3. Select the attribute that maximizes the reduction in (global) navigation cost (the counterpart of information gain) to expand the tree for the next level


Navigation Tree Construction (Splitting Criteria)

• Navigation Cost includes
– Cost of visiting leaf nodes: the results
– Cost of visiting intermediate nodes: category labels


Splitting Criteria (Example)

Split on A1:
– A1 <= v1: (C1, C2, C1, C2)
– A1 > v1: (C1, C2, C1, C2)

Split on A2:
– A2 <= v2: (C1, C2, C2, C2)
– A2 > v2: (C1, C1, C1, C2)

P(C1) = P(C2) = 0.5

Navigation Cost = 2 + 4 + 2 + 4

Which split is better?


Splitting Criteria: Cost of Visiting Leaf Nodes

• Let t be the node to be split

• Let N(t) be the number of records in t

• Let Pi be the probability that users are interested in cluster Ci

• The gain (reduction in navigation cost) when t is split into t1 and t2 is:


Splitting Criteria: Cost of Visiting Intermediate Nodes

• Observation:
– Given a perfect tree T with N records and k classes, where each class Ci in T has Ni records, the entropy

$E(T) = -\sum_{i=1}^{k} \frac{N_i}{N} \log \frac{N_i}{N}$

approximates the average length of root-to-leaf paths for all records in T


Splitting Criteria: Cost of Visiting Intermediate Nodes

[Figure: a tree rooted at t with subtrees for the classes (C1, …, C1) through (Ck, …, Ck); heights are labeled log N and log Ni]


Splitting Criteria: Combining the Two Costs

Gain when a node t is split into t1 and t2

Information gain due to a split:

$IGain(t, t_1, t_2) = E(t) - \frac{N_1}{N} E(t_1) - \frac{N_2}{N} E(t_2)$
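
The information-gain term can be checked on the two candidate splits from the earlier splitting example. This sketch assumes E is the Shannon entropy of the class labels with base-2 logarithms, and it covers only the information-gain part, not the combined gain that also accounts for leaf-visiting cost. Function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def igain(t, t1, t2):
    """IGain(t, t1, t2) = E(t) - N1/N * E(t1) - N2/N * E(t2)."""
    n = len(t)
    return entropy(t) - len(t1) / n * entropy(t1) - len(t2) / n * entropy(t2)


# Children produced by splitting on A1 and on A2.
a1 = (["C1", "C2", "C1", "C2"], ["C1", "C2", "C1", "C2"])
a2 = (["C1", "C2", "C2", "C2"], ["C1", "C1", "C1", "C2"])
parent = a1[0] + a1[1]  # four C1 and four C2 records in both cases

gain_a1 = igain(parent, *a1)
gain_a2 = igain(parent, *a2)
print(gain_a1, gain_a2)
```

Under these assumptions the split on A1 has zero information gain because both children keep the parent's 50/50 class mix, while the split on A2 gains about 0.19 bits, so A2 separates the classes better.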

