Page 1: DEWS: A Decentralized Engine for Web Searchmfbari/files/c16s.pdf · 2019-11-14 · Resolving Web Query Pattern Inverted index nodes codewords Plexus Routing DMP, n-gram Bloom-filter

DEWS: A Decentralized Engine for Web Search

Presented by

Prof. Raouf Boutaba

Page 2

Web Search: Today

• Contemporary Web Search:

– Logically centralized

– Company controlled

• Problems

– Censorship

– Biased ranking

– Privacy

Page 3

Web Search: Decentralization

• Using P2P networks (e.g., YaCy, Faroo)

– Search overhead

– Churn

• DEWS:

– P2P network between web servers, not end hosts

– Both decentralized and stable

Page 4

Challenges

• Indexing the voluminous Web

• Resolving Web queries

• Ranking search results

• Incremental retrieval

DEWS addresses the first three challenges

Page 5

Conceptual Overview

[Figure: conceptual overview. Web servers (WS) with their hosted contents form a DHT; other labels in the diagram: Crawl, Search portal.]

Web Server (WS) DHT:

• Pros:

– Very stable

– 1 or 2 hop lookup via link cache

• Cons:

– Additional overhead on WS

• Content index

• Links to other WS

Page 6

Plexus DHT

• Why Plexus [1]?

– Efficient routing with dynamic load-balancing

– Supports approximate matching

• How Plexus works:

– Generates a bit-pattern from advertisement/query keywords

– Decodes this pattern to codewords using a Linear Binary Code (LBC)

– Routes using the generator matrix of the LBC

• Modification to Plexus routing

– DEWS aggregates routing messages and packs multiple queries in one message

[1] R. Ahmed and R. Boutaba. Plexus: A Scalable Peer-to-peer Protocol Enabling Efficient Subset Search. In IEEE/ACM Transactions on Networking (TON). IEEE Press, Vol. 17(1), pp. 130-143, February 2009.
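
The "pattern to codeword" step above can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: it uses the small [7,4] Hamming code as a stand-in for whatever linear binary code Plexus actually uses, decodes by brute-force nearest-codeword search, and does not attempt the routing step. The names `G`, `encode`, and `nearest_codeword` are hypothetical.

```python
from itertools import product

# Generator matrix of the [7,4] Hamming code in systematic form.
# Assumption: an illustrative stand-in, not the code used by Plexus.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(message):
    """Multiply a 4-bit message by G over GF(2) to get a 7-bit codeword."""
    return tuple(
        sum(m * g for m, g in zip(message, column)) % 2
        for column in zip(*G)
    )


# All 16 codewords, one per 4-bit message.
CODEWORDS = [encode(m) for m in product((0, 1), repeat=4)]


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_codeword(pattern):
    """Brute-force decoding: the codeword closest to a 7-bit pattern."""
    return min(CODEWORDS, key=lambda c: hamming_distance(c, pattern))


# A pattern one bit away from a codeword decodes back to that codeword.
print(nearest_codeword((1, 0, 0, 0, 1, 1, 1)))  # (1, 0, 0, 0, 1, 1, 0)
```

Brute-force search is only workable for a code this small; it is used here to keep the idea visible, not as a suggestion for a real deployment.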

Page 7

Indexing Mechanism

[Figure: indexing pipeline for a website, with two paths.]

• Base URL → hash → codeword → Plexus routing → website index node (used for decentralized PageRank)

• Keywords → DMP, n-gram → Bloom filter → pattern → list decoding → codewords → Plexus routing → inverted index nodes (used for keyword relevance)
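
As a rough illustration of the keyword path (n-gram, then Bloom filter, then pattern), the sketch below turns a list of keywords into a fixed-length bit pattern. Everything specific here is an assumption made for the example: the DMP step is skipped, and the trigram size, the 24-bit pattern length, the two hash functions, and the SHA-256 based hashing are arbitrary choices, not values taken from DEWS or Plexus.

```python
import hashlib


def ngrams(word, n=3):
    """Split a lowercased word into overlapping n-grams, padded with '_'."""
    padded = f"{'_' * (n - 1)}{word.lower()}{'_' * (n - 1)}"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def bloom_pattern(keywords, num_bits=24, num_hashes=2, n=3):
    """Set bits in a Bloom-filter style pattern for every n-gram of every keyword.

    Assumption: num_bits, num_hashes, and the hash construction are
    illustrative only.
    """
    bits = [0] * num_bits
    for keyword in keywords:
        for gram in ngrams(keyword, n):
            for seed in range(num_hashes):
                digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
                bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits


print(ngrams("web"))  # ['__w', '_we', 'web', 'eb_', 'b__']
advertised = bloom_pattern(["web", "search"])
queried = bloom_pattern(["web"])
# Every bit set by the query is also set by the advertisement.
print(all(a >= q for a, q in zip(advertised, queried)))  # True
```

The last check shows why such patterns suit subset-style matching: a query built from a subset of the advertised keywords can only set bits that the advertisement also sets.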

Page 8

Decentralized PageRank

[Figure: two layers. The hyperlink structure (URLs/websites such as ui, vi1, vi2 joined by hyperlinks, plus other nodes in the graph) is mapped through a hash-map onto the Plexus overlay, where web servers (index nodes) labeled (ui), (vi1), (vi2) are joined by overlay links and soft-links.]
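
To make the idea of computing PageRank by exchanging messages along links concrete, here is a minimal round-based sketch. It is the textbook PageRank update with a damping factor of 0.85, written so that each URL "sends" its rank share to the URLs it links to; it is not the update rule used by DEWS, which the slide does not spell out. It also assumes every URL has at least one outgoing link and that every link target appears as a key of `out_links`.

```python
def pagerank_rounds(out_links, damping=0.85, rounds=50):
    """Round-based PageRank: each URL sends rank/out_degree to its targets.

    Assumptions: standard PageRank with uniform teleportation, no dangling
    URLs, and all link targets present as keys of out_links.
    """
    urls = list(out_links)
    n = len(urls)
    rank = {u: 1.0 / n for u in urls}
    for _ in range(rounds):
        inbox = {u: 0.0 for u in urls}
        # "Message passing": u contributes an equal share to each target.
        for u, targets in out_links.items():
            for v in targets:
                inbox[v] += rank[u] / len(targets)
        rank = {u: (1 - damping) / n + damping * inbox[u] for u in urls}
    return rank


links = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = pagerank_rounds(links)
print(ranks["a"] > ranks["b"] > ranks["c"])  # True
```

In a decentralized setting the inner loop would correspond to messages sent between the index nodes responsible for `u` and `v`; here it is simulated in a single process.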

Page 9

Distributed Inverted Index

[Figure: distributed inverted index on the overlay, with hash-map and soft-link labels. Records shown in the diagram:]

• At node (ui): ui, {vi1, (vi1)}, {<ki1, ri1>, ..., <kig, rig>}, with links to (vi1), (vi2), ..., (vit)

• At nodes (ki1 rep), (ki2 rep), ..., (kig rep): <kij, ui, rij, (ui)>

Page 10

Resolving Web Query

[Figure: each query keyword (Keyword-1, Keyword-2, ...) follows the same path: DMP, n-gram → Bloom filter → pattern → list decoding → codewords → Plexus routing → inverted index nodes.]

Terms annotated in the ranking formula (ql is a query keyword, ui is a URL):

• 1 if ql is in ui; 0 otherwise

• PageRank weight of ui

• Relevance of ui to ql

Page 11

Evaluation

• Simulation Setup

– Web Track dataset from LETOR 3.0

• ~ 1 million webpages and ~11 million hyperlinks

– WS network size: up to 100,000 nodes

• Measurements

– Routing performance: scalability & overheads

– Ranking performance: accuracy & convergence rate

– Search performance : flexibility & accuracy

• Here we present two important results

Page 12

Routing Performance

Advertisement Scalability

[Figure: advertisement scalability plots for indexing ui and indexing kij rep; legend: Original Plexus, Modified Plexus in DEWS.]

Observations:

• Advertisement hops do not increase significantly with network size

• URL advertisement requires more hops than keyword advertisement

• Route aggregation in DEWS significantly reduces advertisement overhead
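
One way to picture route aggregation is to batch all targets that share a next hop into a single message instead of sending one message per target. The sketch below is a guess at the general idea only; the function name `aggregate_by_next_hop`, the string codewords, and the toy `next_hop` rule are all hypothetical and are not taken from DEWS or Plexus.

```python
from collections import defaultdict


def aggregate_by_next_hop(targets, next_hop):
    """Group target codewords by the neighbor they would be forwarded to.

    Returns {neighbor: [targets...]}, i.e. one batched message per neighbor.
    Assumption: next_hop is any function mapping a target to a neighbor id.
    """
    batches = defaultdict(list)
    for target in targets:
        batches[next_hop(target)].append(target)
    return dict(batches)


# Toy routing rule (hypothetical): forward on the first bit of the target.
targets = ["0001", "0010", "1000", "1111"]
batches = aggregate_by_next_hop(targets, lambda t: "A" if t[0] == "0" else "B")
print(batches)  # {'A': ['0001', '0010'], 'B': ['1000', '1111']}
```

In this toy case four individual messages collapse into two batched ones, which is the kind of saving the observation on advertisement overhead refers to.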

Page 13

Ranking Accuracy

[Figure: example rankings σ1 and σ2 of the same items, with σ1(3)=3 and σ2(3)=1.]

Observations:

• Spearman’s footrule distance decays rapidly with simulation time, which indicates fast convergence of our distributed ranking algorithm

• Variation in Top-20 and Top-100 elements is not high, so DEWS is close to centralized ranking
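
For reference, Spearman's footrule distance between two rankings σ1 and σ2 of the same items is the sum over all items i of |σ1(i) − σ2(i)|. The sketch below computes it for rankings given as item-to-position dictionaries. Only σ1(3)=3 and σ2(3)=1 come from the slide; the positions of items 1 and 2 are made up to complete the example.

```python
def spearman_footrule(rank_a, rank_b):
    """Sum of absolute position differences between two rankings.

    Both arguments map each item to its position and must cover the
    same set of items.
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same items")
    return sum(abs(rank_a[item] - rank_b[item]) for item in rank_a)


# Item 3 is ranked 3rd by sigma1 and 1st by sigma2 (as on the slide);
# the positions of items 1 and 2 are hypothetical.
sigma1 = {1: 1, 2: 2, 3: 3}
sigma2 = {1: 2, 2: 3, 3: 1}
print(spearman_footrule(sigma1, sigma2))  # 4
```

A distance of 0 means the two rankings agree exactly, so a distance that shrinks over simulation time indicates the distributed ranking approaching the reference ranking.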

Page 14

Summary

• DEWS is a self-indexing architecture for the Web

– provides censorship resistance

– delivers unbiased ranking of search results

– makes it hard to track users’ search history

• Future Research:

– Support for incremental retrieval in DEWS

• Can be achieved by gradually increasing decoding radius in Plexus routing.

– Develop a working prototype of DEWS and deploy it on the Web
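
The incremental-retrieval idea listed under future research (gradually increasing the decoding radius) can be sketched as visiting codewords in rings of growing Hamming distance from the query pattern, so that closer matches are returned first. As a toy assumption, the sketch uses the [7,4] Hamming code and brute-force enumeration; the names `G`, `CODEWORDS`, and `incremental_codewords` are hypothetical, and real list decoding in Plexus would work differently.

```python
from itertools import product

# Assumption: [7,4] Hamming code as a small stand-in linear binary code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

# Every codeword is a GF(2) combination of the rows of G.
CODEWORDS = [
    tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
    for msg in product((0, 1), repeat=4)
]


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def incremental_codewords(pattern, codewords, max_radius):
    """Yield (radius, codeword) pairs, nearest ring first.

    A caller can stop consuming the generator as soon as it has enough
    results, which is what makes the retrieval incremental.
    """
    for radius in range(max_radius + 1):
        for codeword in codewords:
            if hamming_distance(codeword, pattern) == radius:
                yield radius, codeword


query = (0, 0, 0, 0, 0, 0, 0)
radii = [r for r, _ in incremental_codewords(query, CODEWORDS, 3)]
print(radii)  # [0, 3, 3, 3, 3, 3, 3, 3]
```

Because the function is a generator, widening the search costs nothing until the caller actually asks for more results.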

Page 15

Questions?

