Improving Search Engines using
Online Communities
Anatoliy Gruzd
Research Forum, Graduate School of Library and Information Science
University of Illinois, Urbana-Champaign, IL
March 14, 2007
It takes an [Internet] village …
Anatoliy Gruzd Community-created metadata
Agenda
1. Common search problems
2. Online bookmarking - http://del.icio.us
3. Pilot Study
4. Future work
Common search problems
The main drawback of all modern search engines is that they force
the user to guess words that might appear in all relevant documents
and at the same time will not appear in NON-relevant documents.
1. A relevant page will not be retrieved if it does not contain the keywords that the user chose for searching.
2. Even if the user’s search keywords are found inside a web page, the page is not necessarily relevant to the user.
Query #1: weight loss
[Figure: Architecture of a typical search engine. The user’s query ("weight loss") is matched against the text of each web page ("weight loss ???") to produce the results.]
Query #1: weight loss
• http://www.paleofood.com/
Recipes are: grain-free, bean-free, potato-free, dairy-free, and sugar-free.
Query #2: assignment about "human brain" for homeschooling
This is an instructor’s blog for a Human Development class at The Evergreen
State College. The page was retrieved because of two unrelated postings titled
“Homeschoolers use selective socialization” and
“Part Of Human Brain Functions Like A Digital Computer”.
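The two queries above can be reproduced with a minimal sketch of keyword matching. This is not how any particular search engine works; it assumes the simplest possible rule (a page matches only if it contains every query term), and the function names `tokenize` and `keyword_match` are made up for illustration. The page texts are the snippets quoted on the slides, and Query #2 is reduced to its quoted phrase "human brain" to keep the example small.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_match(query, page_text):
    """Return True only if every query term occurs in the page text."""
    page_terms = set(tokenize(page_text))
    return all(term in page_terms for term in tokenize(query))

# Problem 1: a relevant page that never uses the query words.
paleofood_text = ("Recipes are: grain-free, bean-free, potato-free, "
                  "dairy-free, and sugar-free.")
missed = keyword_match("weight loss", paleofood_text)

# Problem 2: a page that contains the query words in unrelated postings.
blog_text = ("Homeschoolers use selective socialization. "
             "Part Of Human Brain Functions Like A Digital Computer.")
spurious = keyword_match("human brain", blog_text)

print(missed)    # False: the relevant page is not retrieved
print(spurious)  # True: the non-relevant page is retrieved
```

The first result illustrates problem 1 (a relevant page is missed) and the second illustrates problem 2 (a page matches on words alone).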
Online bookmarking - http://del.icio.us
Common Tags for http://www.paleofood.com/
• ethnic • evolutionary eating • food • allergies • german • naturopathic • primitivism • weight loss
[Figure: The same search architecture with tags added. Labels: User’s Query ("weight loss"), Web page ("weight loss ???"), Tags, Matching, Results.]
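As a rough sketch of the idea in the figure, the same query can be matched against a page's community tags instead of its text. The tag list is the one shown for http://www.paleofood.com/; the matching rule (every query term must appear among the tag words) and the name `tag_match` are assumptions made for illustration only.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tag_match(query, tags):
    """Return True if every query term occurs among the words of the page's tags."""
    tag_terms = {term for tag in tags for term in tokenize(tag)}
    return all(term in tag_terms for term in tokenize(query))

# Common tags listed on the slide for http://www.paleofood.com/
paleofood_tags = ["ethnic", "evolutionary eating", "food", "allergies",
                  "german", "naturopathic", "primitivism", "weight loss"]

print(tag_match("weight loss", paleofood_tags))  # True
```

Under this rule the page is found through its tags even though the quoted page text never mentions "weight loss".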
Pilot Study
[Figure: System A matches the user’s query against the web page to produce Results A; System B matches the same query against the tags to produce Results B.]
• Search engine – Indri, a cooperative effort between the University of Massachusetts and Carnegie Mellon University
• Search queries – roughly 20-30 real user questions found on the Internet
• Pilot dataset – 454 health-related web pages
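The pilot design (one set of queries, two indexes) can be sketched as a small harness. This sketch does not use Indri and makes no claim about its API or ranking model; it substitutes a toy ranker that counts shared terms. The names `rank` and `run_pilot`, the page ids, and the toy data are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rank(query, documents):
    """Rank document ids by how many distinct query terms each document contains."""
    query_terms = set(tokenize(query))
    scores = {doc_id: len(query_terms & set(tokenize(text)))
              for doc_id, text in documents.items()}
    # Keep only documents with at least one matching term, best first.
    hits = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda doc_id: (-scores[doc_id], doc_id))

def run_pilot(queries, page_text, page_tags):
    """Run each query against page text (System A) and against tags (System B)."""
    tag_text = {doc_id: " ".join(tags) for doc_id, tags in page_tags.items()}
    return {q: {"A": rank(q, page_text), "B": rank(q, tag_text)} for q in queries}

# Toy data (hypothetical, not the pilot dataset).
page_text = {
    "page1": "Recipes are: grain-free, bean-free, potato-free, dairy-free, and sugar-free.",
    "page2": "Tips for weight training at home.",
}
page_tags = {
    "page1": ["food", "allergies", "weight loss"],
    "page2": ["fitness", "exercise"],
}

results = run_pilot(["weight loss"], page_text, page_tags)
print(results["weight loss"]["A"])  # ['page2']
print(results["weight loss"]["B"])  # ['page1']
```

Keeping the ranker identical for both systems means any difference between Results A and Results B comes from what is indexed (page text or tags), which is the comparison the pilot is after.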
Pilot dataset: 454 health-related web pages
“The Open Directory Project is the largest, most comprehensive human-edited directory of the Web.”
http://dmoz.org
Started with ~64,000 URLs (from Top/Health/Conditions_and_Diseases)
-> only 544 are bookmarked by del.icio.us users
-> only 454 were accessible at the time of my experiment
Pages per category, as listed on the slide:
115 /Neurological_Disorders
101 /Cancer
54 /Immune_Disorders/Immune_Deficiency
53 /Endocrine_Disorders
35 /Cardiovascular_Disorders
26 /Respiratory_Disorde
23 /Digestive_Disorders
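The filtering steps above amount to a simple funnel. The short sketch below only does the arithmetic on the reported counts; the starting figure is approximate (~64,000), so the first percentage is approximate as well. The helper name `retention` is made up for illustration.

```python
# Counts as reported on the slide; the first figure is approximate (~64,000).
stages = [
    ("URLs under Top/Health/Conditions_and_Diseases", 64_000),
    ("bookmarked by del.icio.us users", 544),
    ("accessible at the time of the experiment", 454),
]

def retention(stages):
    """Return (label, count, share of the previous stage) for each later stage."""
    return [(label, count, count / prev_count)
            for (_, prev_count), (label, count) in zip(stages, stages[1:])]

for label, count, share in retention(stages):
    print(f"{count:>6} {label} ({share:.2%} of the previous stage)")
```

In other words, fewer than 1 in 100 of the directory's URLs in this branch had been bookmarked, and roughly five out of six of the bookmarked pages were still reachable.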
Noise in Tags
• toread • todo • interesting • imported • safari_export • system:unfiled • .imported
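The slides do not say how such tags were handled. One simple option, sketched here as an assumption, is a stoplist built from the examples above and applied before indexing. The names `NOISE_TAGS` and `drop_noise` are hypothetical.

```python
# Stoplist taken from the noise examples on the slide (an illustrative list,
# not a complete one).
NOISE_TAGS = {"toread", "todo", "interesting", "imported",
              "safari_export", "system:unfiled", ".imported"}

def drop_noise(tags, noise=NOISE_TAGS):
    """Return the tags that are not on the noise list, compared case-insensitively."""
    return [tag for tag in tags if tag.strip().lower() not in noise]

print(drop_noise(["food", "toread", "weight loss", "system:unfiled", "ToDo"]))
# ['food', 'weight loss']
```

A fixed list like this only catches known offenders; tags that are personal rather than descriptive would need a different signal, such as how few users apply them.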
Compound tags
• generalhealth • computersoftware • cancerpatients-supportgroups • highbloodpressure • whoiwanttosharewith
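One possible way to make compound tags searchable, offered as a sketch and not as the method used in the study, is to split them against a word list with dynamic programming. The vocabulary `VOCAB` below is a tiny hypothetical stand-in for a real dictionary, and `split_compound` is a made-up name. Hyphenated tags such as cancerpatients-supportgroups would first need splitting on the hyphen.

```python
def split_compound(tag, vocabulary):
    """Split a run-together tag into vocabulary words, or return None if impossible.

    best[i] holds a segmentation of tag[:i] using the fewest words found so far.
    """
    best = [None] * (len(tag) + 1)
    best[0] = []
    for end in range(1, len(tag) + 1):
        for start in range(end):
            word = tag[start:end]
            if best[start] is not None and word in vocabulary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(tag)]

# Hypothetical mini-vocabulary covering the examples on the slide.
VOCAB = {"general", "health", "computer", "software", "high", "blood",
         "pressure", "who", "i", "want", "to", "share", "with"}

print(split_compound("highbloodpressure", VOCAB))
# ['high', 'blood', 'pressure']
print(split_compound("whoiwanttosharewith", VOCAB))
# ['who', 'i', 'want', 'to', 'share', 'with']
```

Preferring the segmentation with the fewest words is a common heuristic for avoiding splits into many tiny fragments. It is a design choice, not a guarantee of the intended reading.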
Keywords-based (System A, Results A):
1. (---) /term "assignment"
2. (---) /term "brain [center]"
3. (+++) Neuroscience For Kids - Explore the nervous system

Tags-based (System B, Results B):
1. (+++) Neuroscience For Kids - Explore the nervous system
2. (+++)
3. (+++)

Common tags: anatomy, psychology, biology, cognitive, education, reference, medical, human, homeschool
Future work
• Use a larger dataset
• Compare results across different subject domains and genres
• Explore ways to combine tags and keywords, and determine whether doing so improves the quality of results (if at all)
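As one illustration of what combining tags and keywords could look like, the sketch below blends a keyword score and a tag score with a single weight. This is an assumption about a possible design, not a result or a method from the pilot; the names `overlap`, `combined_score`, and `tag_weight` are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def overlap(query, text):
    """Fraction of distinct query terms that occur in the text (0.0 to 1.0)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(text))) / len(query_terms)

def combined_score(query, page_text, tags, tag_weight=0.5):
    """Linear blend of keyword and tag evidence; tag_weight=0 ignores tags entirely."""
    keyword_score = overlap(query, page_text)
    tag_score = overlap(query, " ".join(tags))
    return (1 - tag_weight) * keyword_score + tag_weight * tag_score

text = "Recipes are: grain-free, bean-free, potato-free, dairy-free, and sugar-free."
tags = ["food", "allergies", "weight loss"]

print(combined_score("weight loss", text, tags, tag_weight=0.0))  # 0.0
print(combined_score("weight loss", text, tags, tag_weight=0.5))  # 0.5
```

Sweeping `tag_weight` between 0 and 1 on a set of judged queries would be one way to check whether the combination helps at all.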