Date post: 20-Dec-2015
Partitioning Search-Engine Returned Citations for Proper-Noun Queries

Reema Al-Kamha

Supported by NSF

The Problem

Search engines return too many citations
  Example: “Bonnie Lake”
  Google returns around 800 citations

Citations are ranked best first
  Many refer to the same object
  Can we partition by same object?

Proper-Noun Queries
  Discard citations not of the right kind
  Partition the rest by same object
  Retain the best-first ranking

“Bonnie Lake” Query to Google

The Interface

“Bonnie Lake” Query Result

Classification
  Group 1: those of the chosen kind
  Group 2: those not of the chosen kind

Partition
  Three facets: Attributes, Links, Page Similarity
  Sub-facets for each facet
  Confidence matrix for each sub-facet
  (Weighted) mean for each facet

Final Confidence Matrix
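
The combination step can be sketched as an entry-by-entry weighted mean over confidence matrices. This is a minimal sketch, not the author's implementation: the function name `weighted_mean_matrix` is hypothetical, and equal weights are an assumption because the slides do not give the weights. The two sample values, .89 and 1, are the link and page-similarity confidences for citations 1 and 4 in the matrices later in this deck.

```python
def weighted_mean_matrix(matrices, weights=None):
    """Combine equally sized confidence matrices entry by entry.

    `weights` defaults to equal weighting, which is an assumption;
    the slides only say "(Weighted) mean".
    """
    if weights is None:
        weights = [1.0] * len(matrices)
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(n)]
        for i in range(n)
    ]


# Two-citation excerpt: confidences for citations 1 and 4 only.
links = [[1.0, 0.89], [0.89, 1.0]]
page_similarity = [[1.0, 1.0], [1.0, 1.0]]

final = weighted_mean_matrix([links, page_similarity])
print(final[0][1])  # mean of .89 and 1
```

With equal weights the combined confidence for the pair is the plain average of the two facet values, which is close to the .95 shown for citations 1 and 4 in the final matrix.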

Solution

Attributes

Attribute(s) (One-to-One): latitude and longitude

Single Attribute (Functional Determination): province with a lake’s name

Multiple Attributes (Functional Determination): campground name and highway with a lake’s name

Attributes (Nonfunctional Determination): country with a lake’s name

Distinguishing Attribute: state for a lake

Links

Returned citations that link together

Returned citations that have a common URL prefix: same host, same file name, and same URL.

Example of same host:

http://www.cs.byu.edu/info/dwembley.html

http://www.cs.byu.edu/info/directory.php

Example of same file name:

http://sunsite.unc.edu/javafaq/oldnews.html

http://helios.oit.unc.edu/javafaq/oldnews.html
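
The host and file checks above can be sketched with Python's standard `urllib.parse`. The helper names `same_host`, `same_file`, and `same_url` are hypothetical, and treating "same file name" as "same URL path" is an assumption; the slides do not define the comparison precisely.

```python
from urllib.parse import urlparse


def same_host(url_a, url_b):
    """True when both URLs share a host (case-insensitive)."""
    return urlparse(url_a).netloc.lower() == urlparse(url_b).netloc.lower()


def same_file(url_a, url_b):
    """True when both URLs share a path (assumed meaning of 'same file name')."""
    return urlparse(url_a).path == urlparse(url_b).path


def same_url(url_a, url_b):
    """True when both host and path match."""
    return same_host(url_a, url_b) and same_file(url_a, url_b)


# The example URLs from the slides.
host_pair = ("http://www.cs.byu.edu/info/dwembley.html",
             "http://www.cs.byu.edu/info/directory.php")
file_pair = ("http://sunsite.unc.edu/javafaq/oldnews.html",
             "http://helios.oit.unc.edu/javafaq/oldnews.html")

print(same_host(*host_pair), same_file(*host_pair))
print(same_host(*file_pair), same_file(*file_pair))
```

The first pair matches on host only and the second on file only, mirroring the two examples.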

     1    2    3    4    5    6    7    8
1    1  .50  .50  .89  .50  .50  .50  .50
2         1  .50  .50  .50  .50  .50  .50
3              1  .50  .50  .50  .50  .50
4                   1  .50  .50  .50  .50
5                        1  .50  .50  .50
6                             1  .50  .50
7                                  1  .50
8                                       1

Confidence Matrix for Returned Citations that Link Together

[Figure: link between citations 1 and 4]

Page Similarity

Similarity between each pair of returned citations

Similarity between the two citation-referenced documents

     1    2    3    4    5    6    7    8
1    1    0    0    1    0    0    0    0
2  .00    1  .22  .00  .36  .01  .00  .41
3  .00  .00    1  .00  .99  .00  .00  .00
4    1    0    0    1    0    0    0    0
5    0  .00  .99    0    1  .00  .00  .00
6  .33  .00  .29  .00  .22    1  .00  .56
7  .00  .00  .01  .00  .01  .00    1  .99
8  .00  .00  .00  .00  .99  .00  .00    1

Confidence Matrix for Similarity between two Citation-Referenced Documents

     1    2    3    4    5    6    7    8
1    1  .00  .00    1    0  .17  .00  .00
2         1  .11  .00  .18  .01  .00  .21
3              1  .00 1.00  .15  .01  .00
4                   1    0  .00  .00  .00
5                        1  .11  .01  .50
6                             1  .00  .08
7                                  1  .50
8                                       1

Modified Confidence Matrix for Similarity between two Citation-Referenced Documents
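
The slides do not say how the modified matrix is derived from the directional one. One possible reading, offered only as an assumption, is that the two directional scores of each pair are averaged. This agrees with many of the entries shown (for example, .22 and .00 for citations 2 and 3 give .11), though not with all of them. The function name `symmetrize` is hypothetical.

```python
def symmetrize(matrix):
    """Average each entry with its mirror entry.

    Assumed rule, not confirmed by the slides: the result is symmetric,
    so only the upper triangle needs to be kept.
    """
    n = len(matrix)
    return [
        [(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
        for i in range(n)
    ]


# Two-citation excerpt: directional scores for citations 2 and 3.
directional = [[1.0, 0.22],
               [0.00, 1.0]]

modified = symmetrize(directional)
print(modified[0][1])  # average of .22 and .00
```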

Final Matrix

     1    2    3    4    5    6    7    8
1    1  .25  .25  .95  .25  .34  .25  .25
2         1  .30  .25  .34  .26  .25  .36
3              1  .25  .74  .36  .26  .25
4                   1  .25  .25  .25  .25
5                        1  .30  .26  .50
6                             1  .25  .29
7                                  1  .50
8                                       1

Citation pairs grouped together: (1,4), (3,5), (5,8), (7,8)

Resulting partition: {1,4}, {2}, {3,5,7,8}, {6}
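
The step from the final matrix to the partition can be sketched as thresholding followed by merging connected citations. This is a sketch under stated assumptions: the cutoff of .50 is not given in the slides (it is chosen because it selects exactly the four pairs shown), and the union-find merging and the name `partition` are illustrative rather than the author's method. The confidence values are copied from the final matrix.

```python
THRESHOLD = 0.50  # assumed cutoff; not stated in the slides

# Upper triangle of the final matrix: row i holds columns i+1 .. 8.
FINAL_ROWS = {
    1: [.25, .25, .95, .25, .34, .25, .25],
    2: [.30, .25, .34, .26, .25, .36],
    3: [.25, .74, .36, .26, .25],
    4: [.25, .25, .25, .25],
    5: [.30, .26, .50],
    6: [.25, .29],
    7: [.50],
}


def partition(rows, n, threshold):
    """Group citations 1..n by merging every pair at or above threshold."""
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, values in rows.items():
        for offset, confidence in enumerate(values, start=1):
            if confidence >= threshold:
                parent[find(i + offset)] = find(i)

    groups = {}
    for citation in range(1, n + 1):
        groups.setdefault(find(citation), []).append(citation)
    return sorted(groups.values())


groups = partition(FINAL_ROWS, 8, THRESHOLD)
print(groups)  # [[1, 4], [2], [3, 5, 7, 8], [6]]
```

Merging is transitive, which is why citations 3 and 7 end up together even though their own confidence is only .26: they are connected through 5 and 8.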

“Bonnie Lake”—Results

Measurements

Classification (Percent correctly classified)

Number of Partitions (Precision and Recall)

Each Partition (Precision and Recall)

Current Implementation Status

Interface

Google connection

Citations retrieval

Page retrieval

Contribution

Solve one type of object-identity problem

Provide an additional tool for search engine queries

