BT Visit, Aug 18, 2005
Marti Hearst: UC Berkeley SIMS
The Problem:
How to help people navigate and organize the world’s information?
The SIMS Solution
Focus on METADATA
• System Support for Structured Search: Cheshire
• Search User Interfaces: Flamenco
• Community-based Metadata Creation: MMM
• Content Analysis for Metadata Creation: Mamba
Example: Search and Navigation of Large Collections
• Image Collections
• E-Government Sites
• Shopping Sites
• Digital Libraries
Example: the University of California Library Catalog
What do we want done differently?
• Organization of results
• Hints of where to go next
• Flexible ways to move around
• … How to structure the information?
How to Structure Information for Search and Browsing?
• Hierarchy is too rigid
• KL-One is too complex
• Hierarchical faceted metadata:
– A useful middle ground
What are facets?
• Sets of categories, each of which describes a different aspect of the objects in the collection.
• Each of these can be hierarchical.
• (Not necessarily mutually exclusive or exhaustive, but often that is a goal.)
Example facets: Time/Date, Topic, Role, GeoRegion
Facet example: Recipes
• Course: Main Course
• Cooking Method: Stir-fry
• Cuisine: Thai
• Ingredient: Red Bell Pepper, Curry, Chicken
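One way to hold faceted metadata in code is to give each item a list of values per facet, storing each value as a path so that a facet can be hierarchical. The sketch below is a minimal illustration under our own assumptions: the `Item` class, the `matches` helper, the recipe title, and the "Asian" parent category above Thai are all hypothetical and do not come from Flamenco.

```python
from dataclasses import dataclass, field

# A facet value is a path from the facet's root down its hierarchy.
# Hypothetical: ("Asian", "Thai") assumes an "Asian" parent above "Thai".
FacetPath = tuple[str, ...]


@dataclass
class Item:
    title: str
    # facet name -> list of paths; an item may carry several values per facet
    facets: dict[str, list[FacetPath]] = field(default_factory=dict)

    def matches(self, facet: str, prefix: FacetPath) -> bool:
        """True if any value of `facet` lies at or below `prefix` in the hierarchy."""
        return any(path[: len(prefix)] == prefix for path in self.facets.get(facet, []))


recipe = Item(
    title="Thai chicken stir-fry",  # hypothetical title for the slide's example
    facets={
        "Course": [("Main Course",)],
        "Cooking Method": [("Stir-fry",)],
        "Cuisine": [("Asian", "Thai")],
        "Ingredient": [("Red Bell Pepper",), ("Curry",), ("Chicken",)],
    },
)

print(recipe.matches("Cuisine", ("Asian",)))       # True: Thai sits under Asian
print(recipe.matches("Ingredient", ("Chicken",)))  # True
print(recipe.matches("Course", ("Dessert",)))      # False
```

Storing paths rather than bare labels lets a query on a broad category (Asian) also match items tagged only with a narrower one (Thai).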
How to Put Facets into an Interface?
Some Challenges:
• Users don’t like new search interfaces.
• How to show lots of information without overwhelming or confusing?
A Solution (The Flamenco Project)
• Use proper HCI methods.
• Organize search results according to the faceted metadata so that navigation looks consistent throughout
– Easy to see where to go next and where you've been
– Avoids empty result sets
– Integrates seamlessly with keyword search
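The central navigation idea can be sketched in a few lines: compute counts for each facet value over the current result set, and offer only values that actually occur. This is a hypothetical toy, not Flamenco's code. The facets are flat, the four-item collection is invented, and the helper names `search` and `refinements` are assumptions. It shows why this design avoids empty result sets and how a keyword query combines with facet selections.

```python
from collections import Counter

# Hypothetical toy collection: each item has free text plus one value per facet.
ITEMS = [
    {"text": "thai chicken curry", "Cuisine": "Thai", "Course": "Main Course"},
    {"text": "thai mango sticky rice", "Cuisine": "Thai", "Course": "Dessert"},
    {"text": "chicken stir-fry", "Cuisine": "Chinese", "Course": "Main Course"},
    {"text": "apple pie", "Cuisine": "American", "Course": "Dessert"},
]
FACETS = ["Cuisine", "Course"]


def search(items, keyword=None, selected=None):
    """Keep items whose text contains `keyword` and that match every selected facet value."""
    selected = selected or {}
    return [
        item
        for item in items
        if (keyword is None or keyword in item["text"].split())
        and all(item[facet] == value for facet, value in selected.items())
    ]


def refinements(results):
    """For each facet, count how many current results carry each value.

    Only values that occur in `results` appear, so choosing any of them
    narrows the results without ever producing an empty set."""
    return {facet: Counter(item[facet] for item in results) for facet in FACETS}


hits = search(ITEMS, keyword="chicken")      # keyword search first
menu = refinements(hits)                      # facet menu with counts
print(dict(menu["Cuisine"]))                  # {'Thai': 1, 'Chinese': 1}
print(dict(menu["Course"]))                   # {'Main Course': 2}; Dessert is not offered
narrowed = search(ITEMS, keyword="chicken", selected={"Cuisine": "Thai"})
print([item["text"] for item in narrowed])    # ['thai chicken curry']
```

Because the facet menu is recomputed from the current hits after every step, keyword search and facet browsing share one code path.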
Usability Studies
• Usability studies done on 3 collections:
– Recipes: 13,000 items
– Architecture Images: 40,000 items
– Fine Arts Images: 35,000 items
• Conclusions:
– Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
– Very positive results, in contrast with studies on earlier iterations.
Post-Test Comparison (responses per interface)

Which interface preferable for:             Baseline   Faceted
  Find images of roses                            15        16
  Find all works from a given period               2        30
  Find pictures by 2 artists in same media         1        29

Overall assessment:                         Baseline   Faceted
  More useful for your tasks                       4        28
  Easiest to use                                   8        23
  Most flexible                                    6        24
  More likely to result in dead ends              28         3
  Helped you learn more                            1        31
  Overall preference                               2        29
Cheshire: System Support for Metadata-based Search
• Cheshire is an XML/SGML Information Retrieval system using probabilistic relevance ranking
• Cheshire3 includes Grid-based data storage and processing support, permitting very large-scale databases and high efficiency while still producing effective relevance-ranked results
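The slides do not say which probabilistic ranking formula Cheshire uses. As a stand-in, the sketch below implements Okapi BM25, a widely used ranking function from the probabilistic retrieval model. It is illustrative only and is not claimed to be Cheshire's implementation. The `k1` and `b` defaults are common choices, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            # Non-negative IDF variant.
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            # Term-frequency saturation (k1) and length normalization (b).
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


docs = [
    "archives hub finding aids".split(),
    "text mining centre".split(),
    "archives of text archives".split(),
]
scores = bm25_scores(["archives"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]: two occurrences outrank one; no match scores 0
```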
Cheshire
• The system is currently in production use for many JISC-funded national information services and projects in the UK, including:
– The Archives Hub
– MerseyLibraries
– Resource Discovery Network (RDN)
– National Centre for Text Mining (NaCTeM)
Mamba: Creating Classifications from Data
• Most approaches are associational
– AKA clustering, LSA, LDA, etc.
– This leads to poor results when applied to text
• To derive facets, we need a different angle
– We have a simple approach based on WordNet
Our Approach
• Leverage the structure of WordNet
• Pipeline:
– Select terms from the documents
– Get hypernym paths for those terms from WordNet
– Build tree
– Compress tree
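The build and compress steps can be sketched as follows, under simplifying assumptions. The hypernym paths are hand-written placeholders; with NLTK, `wn.synset(...).hypernym_paths()` returns root-first paths of this kind. Term selection is skipped. The compression rule used here, which removes any category with exactly one child and promotes that child in its place, is an assumption, and the project's actual rules may differ.

```python
# Hypothetical, hand-written hypernym paths (root first); real ones would come from WordNet.
PATHS = {
    "salmon": ["entity", "organism", "animal", "fish", "salmon"],
    "trout": ["entity", "organism", "animal", "fish", "trout"],
    "oak": ["entity", "organism", "plant", "tree", "oak"],
}


def build_tree(paths):
    """Merge root-first paths into a nested dict, so shared ancestors become shared nodes."""
    tree = {}
    for path in paths.values():
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree


def compress(tree):
    """Remove categories with exactly one child, promoting the child in their place."""
    out = {}
    for label, children in tree.items():
        children = compress(children)
        while len(children) == 1:
            # A category with a single child does not help users choose: skip it.
            ((label, children),) = children.items()
        out[label] = children
    return out


print(compress(build_tree(PATHS)))
# {'organism': {'fish': {'salmon': {}, 'trout': {}}, 'oak': {}}}
```

Collapsing single-child chains keeps the categories that actually split the collection (fish vs. oak) and drops links such as "entity" or "plant > tree" that add depth without adding choices.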
A New Opportunity
• Tagging, folksonomies
– (Flickr, del.icio.us)
– People are creating facets in a decentralized manner
– They are assigning multiple facets to items
– This is done on a massive scale
– This leads naturally to meaningful associations
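A simple way to see how tagging leads to associations is to count how often two tags are applied to the same item. The sketch below uses hypothetical tag data and raw co-occurrence counts. Real systems often normalize these counts (for example with Jaccard similarity) so that very common tags do not dominate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagging data: item -> set of user-assigned tags.
TAGS = {
    "photo1": {"beach", "sunset", "california"},
    "photo2": {"beach", "surf", "california"},
    "photo3": {"sunset", "mountains"},
    "photo4": {"beach", "california", "sunset"},
}


def tag_cooccurrence(tagged_items):
    """Count how many items carry each unordered pair of tags."""
    pairs = Counter()
    for tags in tagged_items.values():
        # Sorting makes ("beach", "sunset") and ("sunset", "beach") the same key.
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs


pairs = tag_cooccurrence(TAGS)
print(pairs.most_common(1))  # [(('beach', 'california'), 3)]
```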
Recap
• Organizing and Navigating Information is a huge IT opportunity
• Several research projects at SIMS tackle this with a special perspective: METADATA
– System support for efficient search over structured information
– User interfaces using hierarchical faceted metadata
– Community-based metadata creation
– Automated analysis algorithms for metadata creation
Thank you!