Page 1

Search@SIMS: A metadata-based approach

Marti Hearst, Associate Professor

BT Visit, August 18, 2005

Page 2

BT Visit, Aug 18, 2005 | Marti Hearst: UC Berkeley SIMS

The Problem:

How to help people navigate and organize the world’s information?

Page 3

The SIMS Solution

Focus on METADATA

• System Support for Structured Search: Cheshire
• Search User Interfaces: Flamenco
• Community-based Metadata Creation: MMM
• Content Analysis for Metadata Creation: Mamba

Page 4

Example: Search and Navigation of Large Collections

• Image Collections
• E-Government Sites
• Shopping Sites
• Digital Libraries

Example: the University of California Library Catalog

Page 8

What do we want done differently?

• Organization of results
• Hints of where to go next
• Flexible ways to move around
• … How to structure the information?

Page 9

How to Structure Information for Search and Browsing?

• Hierarchy is too rigid

• KL-One is too complex

• Hierarchical faceted metadata:
  – A useful middle ground

Page 10

What are facets?

• Sets of categories, each of which describes a different aspect of the objects in the collection.
• Each of these can be hierarchical.
• (Not necessarily mutually exclusive or exhaustive, but often that is a goal.)

Example facets: Time/Date, Topic, Role, GeoRegion
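The facet properties listed above can be illustrated with a small data structure. The Python sketch below is a minimal illustration, not Flamenco's actual schema: it assumes each item maps a facet name to one or more root-to-leaf category paths, so an item can sit in several categories of the same facet (not mutually exclusive) or have no value for a facet at all (not exhaustive). The names `Path`, `Item`, `in_category`, and the example painting are hypothetical.

```python
from typing import Dict, List, Tuple

# A category path from a facet's root down to a more specific category,
# e.g. ("Europe", "France", "Paris") in a hypothetical GeoRegion facet.
Path = Tuple[str, ...]

# An item maps each facet name to zero or more paths in that facet's hierarchy.
Item = Dict[str, List[Path]]


def in_category(item: Item, facet: str, category: Path) -> bool:
    """True if any of the item's paths in `facet` is at or below `category`."""
    return any(path[: len(category)] == category for path in item.get(facet, []))


# Hypothetical example item; Topic has two values, Role has none.
painting: Item = {
    "Time/Date": [("19th century", "1880s")],
    "GeoRegion": [("Europe", "France", "Paris")],
    "Topic": [("Nature", "Flowers", "Roses"), ("People", "Women")],
}

print(in_category(painting, "GeoRegion", ("Europe",)))        # True
print(in_category(painting, "Topic", ("Nature", "Flowers")))  # True
print(in_category(painting, "Role", ("Artist",)))             # False: no Role assigned
```

Because membership is a prefix test on the path, selecting a broad category such as "Europe" also matches items filed under its subcategories.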

Page 11

Facet example: Recipes

• Course: Main Course
• Cooking Method: Stir-fry
• Cuisine: Thai
• Ingredient: Red Bell Pepper, Curry, Chicken
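To make the recipe example concrete, here is a hedged Python sketch that encodes the slide's recipe as faceted metadata and selects items by ANDing one chosen value per facet. The second and third recipes and the `select` helper are hypothetical, added only for contrast.

```python
from typing import Dict, List, Set

Recipe = Dict[str, Set[str]]  # facet name -> values assigned to the recipe

recipes: List[Recipe] = [
    # The recipe from the slide.
    {"Course": {"Main Course"}, "Cooking Method": {"Stir-fry"},
     "Cuisine": {"Thai"}, "Ingredient": {"Red Bell Pepper", "Curry", "Chicken"}},
    # Two hypothetical recipes added for contrast.
    {"Course": {"Main Course"}, "Cooking Method": {"Roast"},
     "Cuisine": {"French"}, "Ingredient": {"Chicken"}},
    {"Course": {"Appetizer"}, "Cooking Method": {"Stir-fry"},
     "Cuisine": {"Chinese"}, "Ingredient": {"Red Bell Pepper"}},
]


def select(items: List[Recipe], chosen: Dict[str, str]) -> List[Recipe]:
    """Keep items that carry every chosen value (AND across facets)."""
    return [r for r in items
            if all(value in r.get(facet, set()) for facet, value in chosen.items())]


print(len(select(recipes, {"Ingredient": "Chicken"})))                                # 2
print(len(select(recipes, {"Ingredient": "Chicken", "Cooking Method": "Stir-fry"})))  # 1
```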

Page 12

How to Put It into an Interface? Some Challenges:

• Users don’t like new search interfaces.

• How to show lots of information without overwhelming or confusing?

Page 13

A Solution (The Flamenco Project)

• Use proper HCI methods.

• Organize search results according to the faceted metadata so navigation looks similar throughout

– Easy to see where to go next and where you’ve been

– Avoids empty result sets

– Integrates seamlessly with keyword search
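One way to get the "avoids empty result sets" and keyword-integration properties listed above is to compute, after every step, how many current results fall under each facet value and to offer only values with a non-zero count. The Python sketch below illustrates that idea under stated assumptions: the `Item` class, the substring keyword match, and the sample images are hypothetical and are not taken from Flamenco's implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    title: str
    facets: Dict[str, Set[str]] = field(default_factory=dict)


def current_results(items: List[Item], keyword: str,
                    selected: Dict[str, str]) -> List[Item]:
    """Keyword match on the title (case-insensitive substring; an empty
    keyword matches everything), ANDed with the selected facet values."""
    kw = keyword.lower()
    return [it for it in items
            if kw in it.title.lower()
            and all(v in it.facets.get(f, set()) for f, v in selected.items())]


def refinement_counts(results: List[Item]) -> Dict[str, Counter]:
    """Count current results under each facet value. Only values that occur
    in `results` are listed, so any offered link yields a non-empty set."""
    counts: Dict[str, Counter] = {}
    for it in results:
        for facet, values in it.facets.items():
            counts.setdefault(facet, Counter()).update(values)
    return counts


# Hypothetical image records.
items = [
    Item("Roses in a vase", {"Media": {"Oil"}, "Period": {"19th century"}}),
    Item("Wild roses", {"Media": {"Watercolor"}, "Period": {"19th century"}}),
    Item("Harbor at dusk", {"Media": {"Oil"}, "Period": {"20th century"}}),
]

hits = current_results(items, "roses", {})
counts = refinement_counts(hits)
print(dict(counts["Period"]))  # {'19th century': 2}; "20th century" is not offered
print([it.title for it in current_results(items, "roses", {"Media": "Oil"})])  # ['Roses in a vase']
```

The keyword query simply becomes one more constraint on the result set, so facet links and keyword search can be combined in any order.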

Page 14

Art History Images Collection

Page 28

Usability Studies

• Usability studies done on 3 collections:
  – Recipes: 13,000 items
  – Architecture Images: 40,000 items
  – Fine Arts Images: 35,000 items

• Conclusions:
  – Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
  – Very positive results, in contrast with studies on earlier iterations

Page 29

Post-Test Comparison

Overall Assessment                          Faceted   Baseline
More useful for your tasks                       28          4
Easiest to use                                   23          8
Most flexible                                    24          6
More likely to result in dead ends                3         28
Helped you learn more                            31          1
Overall preference                               29          2

Which Interface Preferable For:             Faceted   Baseline
Find images of roses                             16         15
Find all works from a given period               30          2
Find pictures by 2 artists in same media         29          1

Page 30

Cheshire: System Support for Metadata-based Search

• Cheshire is an XML/SGML information retrieval system that uses probabilistic relevance ranking

• Cheshire3 adds Grid-based data storage and processing support, permitting very large-scale databases and high efficiency while still producing effective relevance-ranked results
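The slides do not give Cheshire's ranking formula. As a generic illustration of probabilistic relevance ranking over structured records, the Python sketch below swaps in Okapi BM25, a well-known model from the probabilistic family, and applies it to one XML element of each record. The record contents, the choice of the `title` element, and the whitespace tokenizer are assumptions; this is not Cheshire's code or scoring function.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical XML records; only the <title> element is searched here.
records = [
    "<record><title>Archives of the Mersey ports</title></record>",
    "<record><title>Text mining for biology</title></record>",
    "<record><title>Mining history archives</title></record>",
]
titles = [ET.fromstring(r).findtext("title", default="").lower().split() for r in records]
scores = bm25_scores(["mining"], titles)
ranking = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 1, 0]: the shorter title containing "mining" ranks first
```

Searching a single element rather than the whole record is what "structured" search adds here; a real system would index many elements separately.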

Page 31

Cheshire

• The system is currently in production use for many JISC-funded national information services and projects in the UK, including:
  – The Archives Hub
  – MerseyLibraries
  – Resource Discovery Network (RDN)
  – National Centre for Text Mining (NaCTeM)

Page 32

Mamba: Creating Classifications from Data

• Most approaches are associational
  – AKA clustering, LSA, LDA, etc.
  – This leads to poor results when applied to text

• To derive facets, we need a different angle
  – We have a simple approach based on WordNet

Page 33

Example: Recipes (3500 docs)

Page 34

Stoica & Hearst ’04: WordNet-based

Page 37

Our Approach

• Leverage the structure of WordNet

Documents → Select terms → Get hypernym paths (WordNet) → Build tree → Compress tree
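The pipeline above can be sketched in Python as follows. This is a toy version under explicit assumptions: a small hand-written `HYPERNYMS` table stands in for real WordNet lookups, term selection is a simple document-frequency threshold, and "compress tree" is simplified to collapsing any node with exactly one child. The published Stoica & Hearst method may use different selection and compression rules; all names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical hypernym paths (most general first) standing in for WordNet
# lookups; real WordNet paths are longer and differ in detail.
HYPERNYMS: Dict[str, List[str]] = {
    "chicken": ["food", "meat", "poultry", "chicken"],
    "beef": ["food", "meat", "beef"],
    "pepper": ["food", "produce", "vegetable", "pepper"],
    "onion": ["food", "produce", "vegetable", "onion"],
    "curry": ["food", "dish", "curry"],
}

Tree = Dict[str, "Tree"]  # label -> subtree; a leaf has an empty dict


def select_terms(docs: List[List[str]], min_df: int = 2) -> List[str]:
    """Keep terms that have a hypernym path and occur in >= min_df documents."""
    df = Counter(t for d in docs for t in set(d) if t in HYPERNYMS)
    return sorted(t for t, c in df.items() if c >= min_df)


def build_tree(terms: List[str]) -> Tree:
    """Merge the hypernym paths of the selected terms into one tree."""
    root: Tree = {}
    for term in terms:
        node = root
        for label in HYPERNYMS[term]:
            node = node.setdefault(label, {})
    return root


def compress(tree: Tree) -> Tree:
    """Simplified compression: a node with exactly one child is replaced
    by that child, removing uninformative intermediate levels."""
    out: Tree = {}
    for label, children in tree.items():
        sub = compress(children)
        if len(sub) == 1:
            child_label, grandchildren = next(iter(sub.items()))
            out.setdefault(child_label, {}).update(grandchildren)
        else:
            out.setdefault(label, {}).update(sub)
    return out


docs = [
    ["thai", "chicken", "curry", "pepper"],
    ["chicken", "onion", "pepper"],
    ["beef", "onion", "curry"],
    ["beef", "stew"],
]
tree = compress(build_tree(select_terms(docs)))
print(tree)
```

In this toy run, single-child levels such as "poultry", "dish", and "produce" disappear, leaving a shallower hierarchy under "food".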

Page 38

A New Opportunity

• Tagging, folksonomies
  – (Flickr, del.icio.us)
  – People are creating facets in a decentralized manner
  – They are assigning multiple facets to items
  – This is done on a massive scale
  – This leads naturally to meaningful associations
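The claim that large-scale tagging leads to meaningful associations can be illustrated with tag co-occurrence counting, one simple technique among many (an assumption here, not necessarily what MMM or the sites named above use). The photo IDs, tags, and helper names below are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cooccurrence(tagged: Dict[str, Set[str]]) -> Counter:
    """Count how many items carry each unordered pair of tags."""
    pairs: Counter = Counter()
    for tags in tagged.values():
        for a, b in combinations(sorted(tags), 2):  # sorted: (a, b) with a < b
            pairs[(a, b)] += 1
    return pairs


def related(tag: str, pairs: Counter, top: int = 3) -> List[Tuple[str, int]]:
    """Tags most often assigned together with `tag`."""
    scores: Counter = Counter()
    for (a, b), n in pairs.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return scores.most_common(top)


# Hypothetical user-assigned tags on four photos.
tagged = {
    "photo1": {"beach", "sunset", "ocean"},
    "photo2": {"sunset", "ocean", "california"},
    "photo3": {"beach", "ocean"},
    "photo4": {"city", "night"},
}
pairs = cooccurrence(tagged)
print(related("ocean", pairs))
```

With many independent taggers, frequently co-assigned pairs such as "ocean" and "beach" surface without anyone designing the association in advance.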

Page 40

Recap

• Organizing and navigating information is a huge IT opportunity

• Several research projects at SIMS tackle this with a special perspective: METADATA
  – System support for efficient search over structured information
  – User interfaces using hierarchical faceted metadata
  – Community-based metadata creation
  – Automated analysis algorithms for metadata creation

Thank you!

