[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation

Date post: 14-Apr-2017
Upload: yasanka-sameera-horawalavithana
Transcript
Page 1: [ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation

An Efficient incremental indexing mechanism for extracting Top-k representative queries over continuous data streams

Y.S. Horawalavithana, D.N. Ranasinghe

Adaptive and Reflective Middleware (ARM), ACM/IFIP/USENIX Middleware

Vancouver, BC, Canada, December 08, 2015

University of Colombo School of Computing, Sri Lanka

Page 2

Overview

• Motivation
• Adaptive Diversification
• Incremental Top-k
• Evaluation
• Conclusion
• Future work

Page 3

Page 4

Diversity: Top-k representative set

[Figure: Top-k without diversity (drawback) vs. representative Top-k with diversity (what we want)]

Goal: a method to retrieve the Top-k publications from the matching publications

1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work

Page 5

Minimum independent-dominating set

[Figure: publication space with publications $p_1, \ldots, p_5$ and radius $\alpha$, and the corresponding graph model with vertices $v_1, \ldots, v_5$. Four candidate vertex sets are shown: three labelled "Independent, dominating" and one labelled "Dominating, not independent".]

Neighborhood: $N(p_i) = \{\, p_j \mid p_j \neq p_i,\ d(p_i, p_j) \le \alpha \,\}$
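Both properties can be checked mechanically. The sketch below assumes that publications are sets of categorical `attribute=value` tokens and that $d$ is Jaccard distance (neither is stated on this slide); `neighborhood`, `is_independent` and `is_dominating` are hypothetical helper names, and the sample data is invented for illustration.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def neighborhood(pubs: dict, pid: str, alpha: float) -> set:
    """N(p_i): every other publication within distance alpha of pid."""
    return {
        q for q in pubs
        if q != pid and jaccard_distance(pubs[pid], pubs[q]) <= alpha
    }


def is_independent(pubs: dict, chosen: set, alpha: float) -> bool:
    # No two chosen publications may be neighbors of each other.
    return all(not (neighborhood(pubs, p, alpha) & chosen) for p in chosen)


def is_dominating(pubs: dict, chosen: set, alpha: float) -> bool:
    # Every publication is chosen or has at least one chosen neighbor.
    return all(p in chosen or bool(neighborhood(pubs, p, alpha) & chosen)
               for p in pubs)


# Hypothetical sample: p1 and p2 are similar, p3 is unrelated.
pubs = {
    "p1": frozenset({"brand=A", "color=red"}),
    "p2": frozenset({"brand=A", "color=red", "size=M"}),
    "p3": frozenset({"brand=B", "color=blue"}),
}
alpha = 0.5
for chosen in ({"p2", "p3"}, {"p1", "p2", "p3"}):
    print(sorted(chosen), is_independent(pubs, chosen, alpha),
          is_dominating(pubs, chosen, alpha))
```

With α = 0.5, {p2, p3} is both independent and dominating, while selecting all three publications is dominating but not independent, because p1 and p2 are neighbors.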

Page 6

NAÏVE Greedy

$$\arg\max_{p_i} \; r(p_i)^{2} \sum_{p_j \in N(p_i)} r(p_j) \times d(p_i, p_j)$$

where $r(\cdot)$ is a publication's relevancy score.

Page 7

Handling streaming publications

[Figure: publication space with $p_1, \ldots, p_5$ and radius $\alpha$, and its graph model $v_1, \ldots, v_5$; a new publication $p_6$ arrives and adds vertex $v_6$ to the graph.]

Continuity requirements

1. Durability: an item selected as diverse in one window may remain in the diverse set of a later window, provided it has not expired and no other valid item in that window outcompetes it.

2. Order: the publication stream follows chronological order. We avoid later selecting an item j as diverse when an item i that is not older than j has already been selected.
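The two continuity requirements can be phrased as simple predicates. This is a sketch under stated assumptions: each item carries an integer arrival position `ts`, an item stays valid for `window` steps, and `Item`, `carry_over` and `order_allows` are hypothetical names. The competition between valid items is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    pid: str
    ts: int  # hypothetical: arrival position in the chronologically ordered stream


def is_expired(item: Item, now: int, window: int) -> bool:
    # Assumption: an item stays valid for `window` steps after it arrives.
    return now - item.ts >= window


def carry_over(selected: list, now: int, window: int) -> list:
    # Durability: previously selected items remain candidates until they expire.
    # Whether another valid item outcompetes them is left to the selection step.
    return [i for i in selected if not is_expired(i, now, window)]


def order_allows(candidate: Item, selected: list) -> bool:
    # Order: never select j once an item i that is not older than j
    # (i.ts >= j.ts) has already been selected.
    return all(i.ts < candidate.ts for i in selected)


a, b = Item("a", ts=1), Item("b", ts=3)
print(carry_over([a, b], now=5, window=3))    # "a" has expired, "b" survives
print(order_allows(Item("c", ts=2), [b]))     # False: b is newer than c
print(order_allows(Item("d", ts=4), [a, b]))  # True
```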

Page 8

Adaptive Diversification

Matching publication stream: $P_1, P_2, P_3, P_4, \ldots, P_j, P_{j+1}, \ldots$

[Figure: the $i$-th and $(i+1)$-th windows over the matching publication stream, with diverse sets $S_i^*$ and $S_{i+1}^*$]

Requirements: Independence, Dominance, Durability, Order

Straightforward solution: apply the naïve greedy method at each instance.

Proposal: an incremental index mechanism that avoids the curse of re-calculating neighborhoods.

Page 9

Locality Sensitive Hashing (LSH): simple idea

If two points are close together, then after a “projection” operation these two points will remain close together.

Page 10

LSH in Adaptive Diversification: Publications as categorical data

Page 11

LSH in Adaptive Diversification: Characteristic Matrix
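One common way to turn categorical publications into a characteristic matrix is to encode each attribute as an `attribute=value` token: rows are the distinct tokens, columns are the publications, and a cell is 1 when the publication contains the token. This is a minimal sketch assuming that token encoding; the attribute names and sample values are hypothetical.

```python
def to_tokens(publication: dict) -> frozenset:
    # Encode each categorical attribute as one "attribute=value" token.
    return frozenset(f"{key}={value}" for key, value in publication.items())


def characteristic_matrix(publications: list) -> tuple:
    """Return (rows, matrix): matrix[r][c] == 1 iff publication c has token rows[r]."""
    token_sets = [to_tokens(p) for p in publications]
    rows = sorted(set().union(*token_sets))
    matrix = [[1 if tok in tokens else 0 for tokens in token_sets] for tok in rows]
    return rows, matrix


# Hypothetical publications with two categorical attributes each.
rows, matrix = characteristic_matrix([
    {"brand": "A", "color": "red"},
    {"brand": "B", "color": "red"},
])
for tok, bits in zip(rows, matrix):
    print(tok, bits)
```

Here the shared `color=red` row has a 1 in both columns, while each `brand` row has a 1 only for its own publication.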

Page 12

LSH in Adaptive Diversification: Minhashing

No publications any more! A signature represents each publication.

Technique: randomly permute the rows of the characteristic matrix m times. For each permutation, take the number of the first row, in the permuted order, in which the publication's column has a 1.

[Figure: first permutation of the rows of the characteristic matrix]

Advantage: reduces the dimensions to a small minhash signature.
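The permutation-based technique can be sketched directly. This is an illustration, not the authors' code: `minhash_signature` and `estimated_similarity` are hypothetical names, permutations come from a seeded `random.Random`, and an all-zero column gets `None`.

```python
import random


def minhash_signature(matrix: list, m: int, seed: int = 0) -> list:
    """Signature matrix (m rows x one column per publication).

    For each of m random row permutations, sig[k][c] is the position (in the
    permuted order) of the first row where column c has a 1, or None if the
    column has no 1 at all.
    """
    rng = random.Random(seed)
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    signature = []
    for _ in range(m):
        order = list(range(n_rows))
        rng.shuffle(order)  # order[pos] is the original row placed at position pos
        signature.append([
            next((pos for pos, row in enumerate(order) if matrix[row][c] == 1), None)
            for c in range(n_cols)
        ])
    return signature


def estimated_similarity(signature: list, c1: int, c2: int) -> float:
    # Fraction of permutations on which the two columns share a minhash value.
    return sum(row[c1] == row[c2] for row in signature) / len(signature)


# Columns 0 and 1 are identical; column 2 shares only the last row with them.
matrix = [[1, 1, 0],
          [0, 0, 1],
          [1, 1, 1]]
sig = minhash_signature(matrix, m=8, seed=3)
print(estimated_similarity(sig, 0, 1), estimated_similarity(sig, 0, 2))
```

Identical columns always receive the same minhash; for other pairs, the fraction of agreeing signature rows estimates their Jaccard similarity (1/3 for columns 0 and 2 here), and the estimate tightens as m grows.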

Page 13

LSH in Adaptive Diversification: Signature Matrix

Fast minhashing: select m random hash functions to model the effect of m random permutations. This is mathematically proved only when the number of rows is a prime.
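A sketch of fast minhashing, assuming the common hash family $h(x) = (a x + b) \bmod p$ with $p$ prime and $1 \le a < p$. For such $p$, each $h$ is a permutation of $0, \ldots, p-1$, which is why a prime row count matters. Here the row count is padded up to the next prime (equivalent to adding all-zero rows, which cannot change any minhash). The coefficients are random and the helper names are hypothetical.

```python
import random


def next_prime(n: int) -> int:
    """Smallest prime >= n (trial division is enough for a sketch)."""
    def is_prime(k: int) -> bool:
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    while not is_prime(n):
        n += 1
    return n


def fast_minhash(matrix: list, m: int, seed: int = 0) -> list:
    """Simulate m row permutations with h(x) = (a*x + b) mod p, p prime.

    With p prime and 1 <= a < p, h permutes 0..p-1, so padding the rows up to
    p (i.e. adding all-zero rows) leaves the minhash values unaffected.
    All-zero columns keep the value float("inf").
    """
    rng = random.Random(seed)
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    p = next_prime(max(n_rows, 2))
    coeffs = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(m)]
    sig = [[float("inf")] * n_cols for _ in range(m)]
    # One pass over the rows instead of materializing m permutations.
    for r, row in enumerate(matrix):
        hashed = [(a * r + b) % p for a, b in coeffs]
        for c, bit in enumerate(row):
            if bit == 1:
                for k in range(m):
                    sig[k][c] = min(sig[k][c], hashed[k])
    return sig


matrix = [[1, 1, 0],
          [0, 0, 1],
          [1, 1, 1]]
print(fast_minhash(matrix, m=4, seed=1))
```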

Page 14

LSH in Adaptive Diversification: LSH Buckets

Take r-sized signature vectors from the m-sized minhash signature.

Map them into L hash tables, each with an arbitrary number b of buckets.
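A sketch of the bucketing step, assuming the common banding layout: the m-row signature is cut into L bands of r consecutive rows (so m ≥ L·r), and each band is hashed into one of b buckets of its own table. The band layout, Python's built-in `hash`, and the name `lsh_buckets` are assumptions for illustration.

```python
def lsh_buckets(signature: list, r: int, L: int, b: int) -> list:
    """Return L hash tables; tables[t][bucket] lists the columns hashed there.

    Assumes the m-row signature is cut into L bands of r consecutive rows
    (so m >= L * r) and each band is hashed into one of b buckets.
    """
    m = len(signature)
    if L * r > m:
        raise ValueError("need m >= L * r signature rows")
    n_cols = len(signature[0]) if signature else 0
    tables = [{} for _ in range(L)]
    for t in range(L):
        band = signature[t * r:(t + 1) * r]
        for c in range(n_cols):
            key = tuple(row[c] for row in band)  # r-sized signature vector
            bucket = hash((t, key)) % b
            tables[t].setdefault(bucket, []).append(c)
    return tables


# Hypothetical 4-row signature for three publications; columns 0 and 1 agree.
signature = [[1, 1, 5],
             [2, 2, 7],
             [3, 3, 0],
             [4, 4, 9]]
for t, table in enumerate(lsh_buckets(signature, r=2, L=2, b=7)):
    print(t, table)
```

Columns with identical r-sized vectors always share a bucket in that table; different vectors can still collide when b is small, which the finite bucket count trades for memory.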

Page 15

LSH in Adaptive Diversification: Batch-wise Top-k computation

Bucket “winner”: the publication with the highest relevancy score in the bucket.

The winner is dominant: it represents its bucket neighborhood.

Top-k: the “winners” with the most votes; the k winners are independent.

[Figure: $i$-th window over the publication stream $P_A, P_B, P_C, P_D, P_E, P_F, P_G, P_H, \ldots$]
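A sketch of the batch-wise step over one window. The assumptions are ours: each bucket of each table casts one vote for its winner, the k winners with the most votes are returned, and ties are broken by relevance and then by id. The table contents and scores are invented.

```python
from collections import Counter


def batch_top_k(tables: list, relevance: dict, k: int) -> list:
    """tables[t][bucket] -> publication ids; returns the k most-voted winners."""
    votes = Counter()
    for table in tables:
        for members in table.values():
            # Bucket "winner": the member with the highest relevancy score
            # (ties broken by id so the result is deterministic).
            winner = max(members, key=lambda p: (relevance[p], p))
            votes[winner] += 1  # each bucket casts one vote for its winner
    # Most votes first; ties broken by relevance, then by id.
    ranked = sorted(votes, key=lambda p: (-votes[p], -relevance[p], p))
    return ranked[:k]


# Hypothetical window: three hash tables over publications a, b, c.
tables = [
    {0: ["a", "b"], 1: ["c"]},
    {0: ["a"], 1: ["b", "c"]},
    {0: ["a", "c"], 1: ["b"]},
]
relevance = {"a": 0.9, "b": 0.5, "c": 0.7}
print(batch_top_k(tables, relevance, k=2))  # ['a', 'c']
```

Here a wins three buckets, c two and b one, so the Top-2 is a and c.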

Page 16

LSH in Dynamic Diversification: Incremental Top-k computation

1. New publication $i$: update the $i$-th characteristic vector (characteristic matrix).
2. Generate the $i$-th minhash as its signature (signature matrix).
3. Map the signature into the L hash tables.
4. Update the “winner” of each bucket the signature maps into.
5. Vote for the Top-k candidates.

Page 17

LSH in Dynamic Diversification: When new publication F arrives…

Only buckets will vote, following the continuity requirements (Durability, Order).

[Figure: publication stream $P_A, P_B, P_C, P_D, P_E, P_F, P_G, P_H, \ldots$ with the $i$-th and $(i+1)$-th windows]

Page 18

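LSH in Adaptive Diversification: Analysis

For two vectors x, y: $\Pr[h(x) = h(y)] = \mathrm{sim}(x, y) = s$, where $h$ is a minhash function and $\mathrm{sim}$ is the Jaccard similarity

For publications x and y at a particular hash table (r-sized signature vectors):

x and y map into the same bucket: $s^{r}$

x and y do not map into the same bucket: $1 - s^{r}$

At L hash tables, x and y do not map into the same bucket: $(1 - s^{r})^{L}$

So x and y share a bucket in at least one hash table with probability $1 - (1 - s^{r})^{L}$

True near neighbors are unlikely to be unlucky in all the projections.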
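The effect of r and L can be checked numerically with the standard banding probability $1 - (1 - s^r)^L$. This sketch ignores accidental collisions caused by a finite bucket count b; the parameter values r = 5 and L = 20 are hypothetical.

```python
def collision_probability(s: float, r: int, L: int) -> float:
    """P[x and y share a bucket in at least one of L tables] = 1 - (1 - s**r)**L.

    s is the Jaccard similarity of x and y; accidental collisions between
    different r-sized signature vectors (finite bucket count b) are ignored.
    """
    return 1.0 - (1.0 - s ** r) ** L


for s in (0.3, 0.8):
    print(s, round(collision_probability(s, r=5, L=20), 4))
```

With these settings a pair at similarity 0.8 collides somewhere with probability above 0.999, while a pair at 0.3 does so only about 5% of the time. Raising r sharpens the cut-off, and raising L lifts the curve back up for true near neighbors.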

Page 19

Evaluation: Dataset

Publication stream: Amazon online marketplace data available from 17th to 19th November 2014

Subscriptions: Zipfian, with normalized preferences

$f(k; s, N) = \dfrac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}$, where

N - number of elements in distribution,

k - rank of element,

s - value of exponent
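Zipfian subscriptions over the parameters N, k and s can be sketched with the standard Zipf law, in which rank k has probability proportional to $1/k^s$, normalized over the N elements. The generator below is a hypothetical illustration: how ranks were mapped to subscriptions and how preferences were normalized is not specified here.

```python
import random


def zipf_pmf(k: int, s: float, N: int) -> float:
    """Zipf probability of the element with rank k (1..N) and exponent s."""
    norm = sum(1.0 / n ** s for n in range(1, N + 1))
    return (1.0 / k ** s) / norm


def sample_ranks(count: int, s: float, N: int, seed: int = 0) -> list:
    # Hypothetical subscription generator: draw ranks 1..N with Zipf weights.
    rng = random.Random(seed)
    weights = [1.0 / k ** s for k in range(1, N + 1)]
    return rng.choices(range(1, N + 1), weights=weights, k=count)


print(round(zipf_pmf(1, s=1.0, N=2), 4))  # 0.6667
print(sample_ranks(5, s=1.2, N=50, seed=7))
```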

Page 20

Terminology: ILSH, BLSH and NAÏVE

[Figure: publication stream $P_1, P_2, \ldots, P_8, \ldots$, with BLSH or NAÏVE applied to each window and ILSH applied incrementally over the stream]

Page 21

Accuracy: ILSH vs. NAÏVE

Probability of ILSH producing the optimal diverse set of results, under Jaccard similarity threshold (s)

Page 22

Performance & Efficiency: ILSH vs. BLSH vs. NAÏVE

log(Top-k matching time) vs. number of publications, with D = 500

Page 23

Conclusions

Locality Sensitive Hashing (LSH) indexing method
• Produces a diverse set of results with 70% average accuracy relative to the naïve method
• Reduces the matching time significantly compared to the NAÏVE method
• Is further refined by its incremental version for handling streaming publications, which avoids the curse of re-computing neighborhoods

Top-k to restrict the delivery to the top publications
• Given a window size and delivery method, the model can produce the best diverse set of personalized results to represent the set of all matching publications at a given instance

Page 24

Future work

• Explore other suitable use cases for the proposed model and develop prototype applications, e.g.:
  - a personalized newspaper for every Facebook user
  - adaptive resource scheduling in large-scale distributed systems
• Exploit the overlap among the diversified results of users who have similar interests
• Develop an LSH-based index for a multi-threaded, distributed environment

Page 25

Q&A

THANK YOU!

