Date posted: 20-Dec-2015
Partitioning Search-Engine Returned Citations for Proper-Noun Queries
Reema Al-Kamha
Supported by NSF
The Problem
Search engines return too many citations
Example: “Bonnie Lake”
Google returns around 800 citations
Citations are ranked best first
Many refer to the same object
Can we partition by same object?
Proper Noun Queries
Discard citations not of the right kind
Partition the rest by same object
Retain the best-first ranking
Classification
Group 1: those of the chosen kind
Group 2: those not of the chosen kind
Partition
Three facets: Attributes, Links, Page Similarity
Sub-facets for each facet
Confidence matrix for each sub-facet
(Weighted) mean for each facet
Final confidence matrix
Solution
Attributes
Attribute(s) (One-to-One): latitude and longitude
Single Attribute (Functional Determination): province with a lake’s name
Multiple Attributes (Functional Determination): campground name and highway with a lake’s name
Attributes (Nonfunctional Determination): country with a lake’s name
Distinguishing Attribute: state for a lake
Links
Returned citations that link together
Returned citations that have a common URL prefix:
same host, same file name, and same URL.
Example of same host:
http://www.cs.byu.edu/info/dwembley.html
http://www.cs.byu.edu/info/directory.php
Example of same file name:
http://sunsite.unc.edu/javafaq/oldnews.html
http://helios.oit.unc.edu/javafaq/oldnews.html
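The host and file-name comparisons above can be sketched with Python's standard URL parser. This is only an illustration under stated assumptions: the function names `same_host` and `same_file` are hypothetical, and the slides do not say how the original system normalizes URLs or turns a match into a confidence value.

```python
import posixpath
from urllib.parse import urlparse


def same_host(url_a: str, url_b: str) -> bool:
    """True if both URLs point at the same host (hypothetical helper)."""
    return urlparse(url_a).hostname == urlparse(url_b).hostname


def same_file(url_a: str, url_b: str) -> bool:
    """True if both URLs end in the same non-empty file name (hypothetical helper)."""
    name_a = posixpath.basename(urlparse(url_a).path)
    name_b = posixpath.basename(urlparse(url_b).path)
    return name_a != "" and name_a == name_b


# The two example pairs from the slides.
host_pair = ("http://www.cs.byu.edu/info/dwembley.html",
             "http://www.cs.byu.edu/info/directory.php")
file_pair = ("http://sunsite.unc.edu/javafaq/oldnews.html",
             "http://helios.oit.unc.edu/javafaq/oldnews.html")

print(same_host(*host_pair), same_file(*host_pair))
print(same_host(*file_pair), same_file(*file_pair))
```

The first pair shares a host but not a file name; the second shares a file name but not a host.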
Confidence Matrix for Returned Citations that Link Together
      1     2     3     4     5     6     7     8
1     1   .50   .50   .89   .50   .50   .50   .50
2           1   .50   .50   .50   .50   .50   .50
3                 1   .50   .50   .50   .50   .50
4                       1   .50   .50   .50   .50
5                             1   .50   .50   .50
6                                   1   .50   .50
7                                         1   .50
8                                               1
Citations that link together: 1 and 4
Page Similarity
Similarity between each pair of returned citations
Similarity between each pair of citation-referenced documents
Confidence Matrix for Similarity between two Citation-Referenced Documents
      1     2     3     4     5     6     7     8
1     1     0     0     1     0     0     0     0
2   .00     1   .22   .00   .36   .01   .00   .41
3   .00   .00     1   .00   .99   .00   .00   .00
4     1     0     0     1     0     0     0     0
5     0   .00   .99     0     1   .00   .00   .00
6   .33   .00   .29   .00   .22     1   .00   .56
7   .00   .00   .01   .00   .01   .00     1   .99
8   .00   .00   .00   .00   .99   .00   .00     1
Modified Confidence Matrix for Similarity between two Citation-Referenced Documents
      1     2     3     4     5     6     7     8
1     1   .00   .00     1     0   .17   .00   .00
2           1   .11   .00   .18   .01   .00   .21
3                 1   .00  1.00   .15   .01   .00
4                       1     0   .00   .00   .00
5                             1   .11   .01   .50
6                                   1   .00   .08
7                                         1   .50
8                                               1
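The slides do not state how the directed similarity matrix becomes the modified, upper-triangular one. One plausible reading, offered here only as an assumption, is that each pair's two directed values are averaged: for citations 1 and 6 this gives (0 + .33) / 2 = .165, close to the .17 shown. Most entries agree with this reading, but the pairs (3,5) and (6,8) do not, so the sketch below should not be taken as the author's method. The name `symmetrize` is hypothetical.

```python
# Directed similarity values copied from the slide (row = from, column = to).
DIRECTED = [
    [1,   0,   0,   1,   0,   0,   0,   0],
    [.00, 1,   .22, .00, .36, .01, .00, .41],
    [.00, .00, 1,   .00, .99, .00, .00, .00],
    [1,   0,   0,   1,   0,   0,   0,   0],
    [0,   .00, .99, 0,   1,   .00, .00, .00],
    [.33, .00, .29, .00, .22, 1,   .00, .56],
    [.00, .00, .01, .00, .01, .00, 1,   .99],
    [.00, .00, .00, .00, .99, .00, .00, 1],
]


def symmetrize(directed):
    """Average entry (i, j) with entry (j, i); an assumed rule, not the source's."""
    n = len(directed)
    return [[(directed[i][j] + directed[j][i]) / 2 for j in range(n)]
            for i in range(n)]


symmetric = symmetrize(DIRECTED)
# Citations are numbered from 1 on the slides, so pair (1, 6) is index [0][5].
print(symmetric[0][5], symmetric[1][2], symmetric[0][3])
```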
Final Matrix
      1     2     3     4     5     6     7     8
1     1   .25   .25   .95   .25   .34   .25   .25
2           1   .30   .25   .34   .26   .25   .36
3                 1   .25   .74   .36   .26   .25
4                       1   .25   .25   .25   .25
5                             1   .30   .26   .50
6                                   1   .25   .29
7                                         1   .50
8                                               1
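The outline describes the final matrix as a (weighted) mean over the facet matrices. A minimal sketch of that combination step follows, restricted to citations 1, 4, and 6 to keep it short. The equal weights are an assumption; they happen to be consistent with these three entries of the final matrix (for example, (.89 + 1) / 2 = .945 against the .95 shown), but the slides do not give the weights actually used. The function name `combine` is hypothetical.

```python
def combine(matrices, weights):
    """Element-wise weighted mean of equally sized square matrices."""
    total = sum(weights)
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
             for j in range(n)]
            for i in range(n)]


# Sub-matrices for citations 1, 4, 6, taken from the link matrix and the
# modified similarity matrix (filled in symmetrically).
links = [
    [1,   .89, .50],
    [.89, 1,   .50],
    [.50, .50, 1],
]
similarity = [
    [1,   1,   .17],
    [1,   1,   .00],
    [.17, .00, 1],
]

# Equal weights are an assumption made for this illustration.
final = combine([links, similarity], weights=[1, 1])
print(final[0][1], final[0][2], final[1][2])
```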
Grouped pairs: 1,4  3,5  5,8  7,8
Resulting partition: {1,4} {2} {3,5,7,8} {6}
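The step from the final matrix to the groups can be sketched as a transitive closure over confident pairs. The 0.5 threshold is an assumption: the slides do not state it, but it is the cutoff at which the final matrix yields exactly the pairs 1,4, 3,5, 5,8, and 7,8. The union-find structure and the name `partition` are illustrative choices, not the author's stated implementation.

```python
# Upper triangle of the final matrix; row i holds columns i..8.
FINAL_UPPER = [
    [1, .25, .25, .95, .25, .34, .25, .25],
    [1, .30, .25, .34, .26, .25, .36],
    [1, .25, .74, .36, .26, .25],
    [1, .25, .25, .25, .25],
    [1, .30, .26, .50],
    [1, .25, .29],
    [1, .50],
    [1],
]


def partition(upper, threshold=0.5):
    """Merge citations whose pairwise confidence reaches the (assumed) threshold."""
    n = len(upper)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, row in enumerate(upper):
        for offset, confidence in enumerate(row):
            j = i + offset
            if j > i and confidence >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k + 1)  # report 1-based numbers
    return sorted(groups.values())


print(partition(FINAL_UPPER))  # [[1, 4], [2], [3, 5, 7, 8], [6]]
```

Sorting the groups by their lowest-numbered member keeps the best-first order of the original ranking, since citation numbers follow the search engine's ranks.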
Measurements
Classification (percent correctly classified)
Number of Partitions (precision and recall)
Each Partition (precision and recall)
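For the per-partition measurement, the usual set-based definitions of precision and recall can be sketched as below. The slides do not spell out the exact formulas used, so this is the standard reading rather than a confirmed one, and the reference group in the example is hypothetical data invented for illustration.

```python
def precision_recall(predicted, reference):
    """Standard set precision and recall of one predicted group against a reference group."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Predicted group from the example partition; the reference group is hypothetical.
precision, recall = precision_recall({3, 5, 7, 8}, {3, 5, 8})
print(precision, recall)  # 0.75 1.0
```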