Date post: 02-Apr-2015
Efficient Interactive Fuzzy Keyword Search Shengyue Ji 1 , Guoliang Li 2 , Chen Li 1 , Jianhua Feng 2 1 University of California, Irvine 2 Tsinghua University
Transcript
Efficient Interactive Fuzzy Keyword Search

Shengyue Ji¹, Guoliang Li², Chen Li¹, Jianhua Feng²

¹ University of California, Irvine; ² Tsinghua University

UC Irvine & Tsinghua | Efficient Interactive Fuzzy Keyword Search | Shengyue Ji, Guoliang Li, Chen Li, Jianhua Feng

Traditional Keyword Search

Too many results!

No result!

Complicated and still no result!

Interactive Fuzzy Keyword Search

Features:
  Interactive: data exploration
  Fuzzy: error tolerant
  Multiple keywords: search on-the-fly

Fundamentals

Data
  R: a set of records
  W: a set of distinct words

Query
  Q = {p1, p2, …, pl}: a set of prefixes
  δ: edit-distance threshold

Query result
  RQ: a set of records such that each record has all query prefixes or their similar forms (conjunctive)
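The query-result definition can be written as one set expression. This is a sketch under assumptions not spelled out on the slide: ed denotes edit distance, w' ⪯ w means that w' is a prefix of w, and a "similar form" of a query prefix p is taken to be a prefix w' of some word w in the record with ed(p, w') ≤ δ.

```latex
R_Q = \{\, r \in R \;:\; \forall p \in Q \;\; \exists w \in r \;\; \exists w' \preceq w \;\; \mathrm{ed}(p, w') \le \delta \,\}
```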

Contributions / Outline

Step 1
  Incremental fuzzy prefix matching

Step 2
  Multi-prefix intersection methods
  Cache-based prefix intersection

Observation

W = {exam, example, exemplar, exempt, sample}

δ = 2

Q’ = exampl

Prefix    Distance
exam      2
examp     1
exampl    0
example   1
exemp     2
exempt    2
exempl    1
exempla   2
sampl     2

Q = example (Q’ extended by the letter e)

Prefix    Distance   Derived from (Q’ table)   Operation
examp     2          examp (1)                 delete e
exampl    1          exampl (0)                delete e
example   0          exampl (0)                match e
exempl    2          exempl (1)                delete e
exempla   2          exempl (1)                substitute e with a
sample    2          sampl (2)                 match e
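The two tables can be checked by brute force. The sketch below is not the incremental method of the talk: it enumerates every prefix of every word in W and keeps those within δ of the query, using the textbook dynamic-programming edit distance. The function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def similar_prefixes(words, query, delta):
    """Map each prefix of a word in `words` within `delta` of `query` to its distance."""
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    result = {}
    for p in prefixes:
        d = edit_distance(query, p)
        if d <= delta:
            result[p] = d
    return result


W = ["exam", "example", "exemplar", "exempt", "sample"]
for q in ("exampl", "example"):
    print(q, sorted(similar_prefixes(W, q, 2).items()))
```

Every keystroke repeats the full scan here, which is the cost the incremental computation is meant to avoid.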

Trie Indexing

Computing the set of active nodes ΦQ
  Initialization
  Incremental step

[Figure: trie over W = {exam, example, exemplar, exempt, sample}; each word ends at a $ leaf, and the active nodes for Q = example are labeled with their distances.]

Active nodes for Q = example

Prefix    Distance
examp     2
exampl    1
example   0
exempl    2
exempla   2
sample    2

Initialization

Q = ε

Initialize Φε with all nodes within depth δ of the root (here δ = 2).

[Figure: the trie with the initial active nodes labeled by distance.]

Active nodes for Q = ε

Prefix    Distance
ε         0
e         1
ex        2
s         1
sa        2
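A minimal sketch of the initialization, assuming a plain dictionary-based trie. The TrieNode class and the function names are illustrative, not the authors' code. A breadth-first walk collects every node whose depth is at most δ, and the depth is its edit distance to the empty query.

```python
from collections import deque


class TrieNode:
    def __init__(self, prefix=""):
        self.prefix = prefix    # string spelled by the path from the root
        self.children = {}      # letter -> TrieNode
        self.is_word = False    # True where a word of W ends (a $ leaf on the slide)


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode(node.prefix + ch)
            node = node.children[ch]
        node.is_word = True
    return root


def initial_active_nodes(root, delta):
    """Active nodes for the empty query: each node at depth d <= delta, with distance d."""
    active = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        active[node.prefix] = depth
        if depth < delta:
            for child in node.children.values():
                queue.append((child, depth + 1))
    return active


W = ["exam", "example", "exemplar", "exempt", "sample"]
phi_empty = initial_active_nodes(build_trie(W), delta=2)
print(phi_empty)  # the empty string stands for ε
```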

Incremental Computation: Algorithm

Incremental computation from ΦQ’ to ΦQ, where Q extends Q’ by one letter

add(ΦQ, <n, d>) has an effect only if ΦQ contains no active node with the same n and a smaller d

FOR EACH <n, d> FROM ΦQ’
  Deletion
    add(ΦQ, <n, d+1>)
  Substitution
    FOR EACH n’ FROM non-matching children of n
      add(ΦQ, <n’, d+1>)
  Match
    add(ΦQ, <m, d>)   (m is the matching child of n)
  Insertion
    FOR EACH m’ FROM descendants of m
      add(ΦQ, <m’, d+x>)   (x is the distance from m’ to m)

Algorithm Details
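The four steps can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the trie is a dictionary from each prefix string to its children, add() also drops entries whose distance exceeds δ, and all function names are made up for the example.

```python
def build_children(words):
    """Trie as a dict: node (prefix string) -> {letter: child prefix}."""
    children = {"": {}}
    for word in words:
        for i in range(len(word)):
            parent, child = word[:i], word[:i + 1]
            children.setdefault(parent, {})[word[i]] = child
            children.setdefault(child, {})
    return children


def add(phi, node, dist, delta):
    """Store <node, dist> only if dist <= delta and no smaller distance is stored."""
    if dist <= delta and dist < phi.get(node, delta + 1):
        phi[node] = dist


def initial_active_nodes(children, delta):
    """Active nodes for the empty query: all nodes within depth delta."""
    phi, frontier = {}, [""]
    for depth in range(delta + 1):
        next_frontier = []
        for node in frontier:
            phi[node] = depth
            next_frontier.extend(children[node].values())
        frontier = next_frontier
    return phi


def extend(children, phi_prev, ch, delta):
    """Active nodes for Q = Q' + ch, computed from the active nodes of Q'."""
    phi = {}
    for n, d in phi_prev.items():
        add(phi, n, d + 1, delta)                  # deletion of ch
        for label, child in children[n].items():
            if label != ch:
                add(phi, child, d + 1, delta)      # substitution
        m = children[n].get(ch)
        if m is not None:
            add(phi, m, d, delta)                  # match
            frontier, x = list(children[m].values()), 1
            while frontier and d + x <= delta:     # insertions below m
                for desc in frontier:
                    add(phi, desc, d + x, delta)
                frontier = [c for f in frontier for c in children[f].values()]
                x += 1
    return phi


def active_nodes(children, query, delta):
    phi = initial_active_nodes(children, delta)
    for ch in query:
        phi = extend(children, phi, ch, delta)
    return phi


W = ["exam", "example", "exemplar", "exempt", "sample"]
trie = build_children(W)
print(active_nodes(trie, "e", 2))
print(active_nodes(trie, "example", 2))
```

Only the active nodes of the previous query are visited at each keystroke, so the work per letter depends on the size of ΦQ’ and the subtrees within δ of its matches rather than on the whole trie.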

Incremental Computation: Example

Q = e

Active nodes for Q = ε

Prefix    Distance
ε         0
e         1
ex        2
s         1
sa        2

Candidate active nodes for Q = e (# Op is the number of edit operations; Base is the active node for Q = ε that the candidate is derived from)

Prefix    # Op    Base    Op
ε         1       ε       del e
s         1       ε       sub e/s
e         0       ε       mat e
ex        1       ε       ins x
exa       2       ε       ins xa
exe       2       ε       ins xe
e         2       e       del e
ex        2       e       sub e/x
ex        3       ex      del e
exa       3       ex      sub e/a
exe       2       ex      mat e
s         2       s       del e
sa        2       s       sub e/a
sa        3       sa      del e

Active nodes for Q = e

Prefix    Distance
ε         1
e         0
ex        1
exa       2
exe       2
s         1
sa        2

[Figure: the trie with the active nodes for Q = ε and for Q = e labeled by distance.]

Incremental Computation: Discussion

Insertions
  Needed after matches
  Not needed after deletions and substitutions
    deletions and insertions do not co-occur in adjacent positions
    adjacent substitutions and insertions are interchangeable

Correctness and Completeness
  Can be proved by reducing from/to edit-distance computation
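The connection to edit-distance computation can be sketched with the standard recurrence, which is assumed here rather than taken from the slides. Write Q = Q’c for the query extended by letter c, and na for the trie node reached from its parent n by letter a; [a ≠ c] is 1 when the letters differ and 0 otherwise. The three terms line up with the deletion, match/substitution, and insertion steps.

```latex
\mathrm{ed}(Q'c,\; na) = \min
\begin{cases}
\mathrm{ed}(Q',\; na) + 1 & \text{(delete } c\text{)} \\
\mathrm{ed}(Q',\; n) + [a \neq c] & \text{(match if } a = c\text{, otherwise substitute)} \\
\mathrm{ed}(Q'c,\; n) + 1 & \text{(insert } a\text{)}
\end{cases}
```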

Outline

Step 1
  Incremental fuzzy prefix matching

Step 2
  Multi-prefix intersection methods
  Cache-based prefix intersection

Multi-Prefix Intersection

Q = vldb li

Multi-prefix intersection
  To return records such that each record has all query keywords as prefixes (or their similar forms)

ID    Record
1     Li data…
2     data…
3     data Lin…
4     Lu Lin Luis…
5     Liu…
6     VLDB Lin data…
7     VLDB…
8     Li VLDB…

Result

ID    Record
6     VLDB Lin data…
8     Li VLDB…

Multi-Prefix Intersection: Method 1

Q = vldb li

[Figure: trie over the keywords of the eight records, with an inverted list of record IDs at each $ leaf.]

Keyword    Inverted list
data       1 2 3 6
li         1 8
lin        3 4 6
liu        5
lu         4
luis       4
vldb       6 7 8

Prefix    Union of inverted lists
li        1 3 4 5 6 8
vldb      6 7 8

Intersection: 6 8

Space cost
  Inverted index

Time cost
  Union + intersection
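Method 1 can be sketched as follows, for exact prefixes only. The record contents are limited to the keywords visible on the slide (the rest is elided there), the function names are illustrative, and the union scans all keywords instead of walking the trie subtree of the prefix.

```python
from collections import defaultdict

# Visible keywords of the eight example records; the slide elides the rest.
RECORDS = {
    1: ["li", "data"],
    2: ["data"],
    3: ["data", "lin"],
    4: ["lu", "lin", "luis"],
    5: ["liu"],
    6: ["vldb", "lin", "data"],
    7: ["vldb"],
    8: ["li", "vldb"],
}


def build_inverted_index(records):
    """keyword -> set of IDs of the records that contain it."""
    index = defaultdict(set)
    for rid, words in records.items():
        for w in words:
            index[w].add(rid)
    return index


def prefix_union(index, prefix):
    """Union of the inverted lists of all keywords that start with `prefix`."""
    ids = set()
    for keyword, posting in index.items():
        if keyword.startswith(prefix):
            ids |= posting
    return ids


def search_method1(index, query):
    """Intersect the per-prefix unions (no fuzzy matching in this sketch)."""
    unions = [prefix_union(index, p) for p in query.lower().split()]
    return sorted(set.intersection(*unions)) if unions else []


index = build_inverted_index(RECORDS)
print(sorted(prefix_union(index, "li")))   # [1, 3, 4, 5, 6, 8]
print(search_method1(index, "vldb li"))    # [6, 8]
```

The union for a short prefix such as li can be far larger than the final answer, which is the time cost the slide attributes to this method.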

Multi-Prefix Intersection: Method 2

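Q = vldb li

[Figure: trie over the keywords; each $ leaf carries a keyword ID and an inverted list, and each trie node carries the range of keyword IDs in its subtree.]

Keyword    ID    Inverted list
data       1     1 2 3 6
li         2     1 8
lin        3     3 4 6
liu        4     5
lu         5     4
luis       6     4
vldb       7     6 7 8

Trie node (prefix)    Keyword-ID range
(root)                [1, 7]
d, da, dat, data      [1, 1]
l                     [2, 6]
li                    [2, 4]
lin                   [3, 3]
liu                   [4, 4]
lu                    [5, 6]
lui, luis             [6, 6]
v, vl, vld, vldb      [7, 7]

ID    Record             Forward list
1     Li data…           1 2
2     data…              1
3     data Lin…          1 3
4     Lu Lin Luis…       3 5 6
5     Liu…               4
6     VLDB Lin data…     1 3 7
7     VLDB…              7
8     Li VLDB…           2 7

Query processing for Q = vldb li
  Read each record on the inverted list of vldb: 6 7 8
  Verify/probe its forward list for a keyword ID in the range of li, [2, 4]

ID    Record             Forward list    Keyword ID in [2, 4]
6     VLDB Lin data…     1 3 7           3
8     Li VLDB…           2 7             2

Space cost
  Inverted + forward index

Time cost
  Probing forward lists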
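A sketch of Method 2 under the same assumptions as the slide's example: keywords are numbered in sorted order, so every prefix covers a contiguous range of keyword IDs, and each record keeps a sorted forward list of its keyword IDs. The names are illustrative, the records contain only the keywords visible on the slide, and the first query prefix supplies the candidates (the slide starts from vldb).

```python
from bisect import bisect_left

# Visible keywords of the eight example records; the slide elides the rest.
RECORDS = {
    1: ["li", "data"], 2: ["data"], 3: ["data", "lin"], 4: ["lu", "lin", "luis"],
    5: ["liu"], 6: ["vldb", "lin", "data"], 7: ["vldb"], 8: ["li", "vldb"],
}

# Sorted distinct keywords get IDs 1..n.
KEYWORDS = sorted({w for words in RECORDS.values() for w in words})
KEYWORD_ID = {w: i for i, w in enumerate(KEYWORDS, start=1)}

# Forward list: record ID -> sorted keyword IDs.
FORWARD = {rid: sorted(KEYWORD_ID[w] for w in words) for rid, words in RECORDS.items()}

# Inverted list: keyword ID -> record IDs (ascending, since RECORDS is in ID order).
INVERTED = {}
for rid, ids in FORWARD.items():
    for kid in ids:
        INVERTED.setdefault(kid, []).append(rid)


def keyword_range(prefix):
    """(lo, hi) range of IDs of the keywords starting with `prefix`, or None."""
    lo = bisect_left(KEYWORDS, prefix)
    hi = lo
    while hi < len(KEYWORDS) and KEYWORDS[hi].startswith(prefix):
        hi += 1
    return (lo + 1, hi) if hi > lo else None


def has_keyword_in_range(forward_list, lo, hi):
    """Binary-search probe: does the sorted list contain an ID in [lo, hi]?"""
    i = bisect_left(forward_list, lo)
    return i < len(forward_list) and forward_list[i] <= hi


def search_method2(query):
    ranges = [keyword_range(p) for p in query.lower().split()]
    if not ranges or None in ranges:
        return []
    (lo, hi), rest = ranges[0], ranges[1:]
    # Read the inverted lists of the first prefix, then probe the forward lists.
    candidates = sorted({rid for kid in range(lo, hi + 1) for rid in INVERTED.get(kid, [])})
    return [rid for rid in candidates
            if all(has_keyword_in_range(FORWARD[rid], l, h) for l, h in rest)]


print(keyword_range("li"))        # (2, 4)
print(search_method2("vldb li"))  # [6, 8]
```

Each probe is a binary search on one forward list, so no union over the inverted lists of li, lin, and liu is needed. The price is the extra space for the forward index.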

Experimental Results

Computing similar prefixes

Experimental Results

Multi-prefix intersection

Experimental Results

Overall scalability

Thank You!

Questions?

TASTIER: Efficient Auto-Completion, Type-Ahead Search
http://tastier.ics.uci.edu/
