SIMS 296a-3: Aids for Source Selection Carol Butler Fall ‘98.

Date post: 20-Dec-2015
Transcript
Page 1: SIMS 296a-3: Aids for Source Selection Carol Butler Fall ‘98.

SIMS 296a-3: Aids for Source Selection

Carol Butler

Fall ‘98

Page 2:

Outline

• IA Interfaces
• Design Principles
• Aids for Source Selection
  • SavvySearch
  • HITS
  • Kohonen maps
• Implications for New Research

Page 3:

IA Interface should help User:

• Express information needs and/or formulate queries.
• Select among available sources.
• Understand search results.

From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

Page 4:

IA Interface should allow User to:

• Reassess goals and adjust search strategy.
• Follow trails with unanticipated results.
• Monitor the progress of a search strategy.
• Use output of one action as input to the next.

From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

Page 5:

Role of Visualization:

• Communicate more rapidly and effectively.
• Techniques
  • icons and color highlighting
  • brushing and linking
  • panning and zooming
  • focus-plus-context
  • animation
• Interactivity

From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

Page 6:

“Visualization of inherently abstract information is more difficult, and visualization of textually represented information is especially challenging.”

From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

Page 7:

Starting Points for Search

• Lists of sources (Lexis-Nexis)
• Overviews
  • Clusters
  • Category Hierarchies/Subject Codes
  • Co-citation Links
• Examples
• Automatic source selection

Page 8:

Last Week’s Readings

• Overviews via Category Hierarchies
  • HIBROWSE (Pollitt 97)
  • Cat-A-Cone (Hearst 97)

Page 9:

Today’s Readings

• Automatic Source Selection
  • SavvySearch (Howe & Dreilinger 97)
• Overviews via co-citation hyperlinks
  • HITS (Kleinberg et al. 97)
• Overviews via clusters
  • Kohonen maps (Chen et al. 97)

Page 10:

SavvySearch

• Addresses problems with meta-search engines.
  • reduce burden on user … but
  • may waste computational and Web resources
• Carefully selects search engines likely to return useful results.

Page 11:

Options provided by interface

• Sources and types of information.
• Treatment of query terms.
• Display of results.
• Interface language.
• View interface.

Page 12:

Query Processing

• Reasoning about available resources
  • modify concurrency (number of search engines queried in parallel)
  • network load estimates (lookup table, time)
  • local CPU load (UNIX uptime command)
• Ranking search engines
  • learned associations between search engines and query terms (stored in a meta-index)
  • recent data on performance
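
The resource reasoning on this slide can be illustrated with a small sketch. It assumes a 1-minute CPU load average (the figure UNIX uptime reports) and a network load estimate scaled to the range 0 to 1. The function names, the scaling rule, and the default of five engines are hypothetical choices for illustration, not details taken from SavvySearch.

```python
import os


def current_cpu_load():
    """Return the 1-minute load average, the same figure `uptime` prints (Unix only)."""
    return os.getloadavg()[0]


def choose_concurrency(cpu_load, network_load, max_engines=5):
    """Pick how many search engines to query in parallel.

    cpu_load: 1-minute load average of the local machine.
    network_load: assumed estimate in [0, 1], e.g. read from a time-of-day table.
    The penalty rule below is a hypothetical illustration.
    """
    # A busier machine or network means fewer parallel queries,
    # but at least one engine is always queried.
    cpu_penalty = min(int(cpu_load), max_engines - 1)
    net_penalty = int(round(network_load * 2))
    return max(1, max_engines - cpu_penalty - net_penalty)
```

On a Unix host the live value could be supplied as choose_concurrency(current_cpu_load(), 0.5).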

Page 13:

Meta-Index

• No Results
  • search engine failed to return links
  • reduces confidence that this engine is appropriate for particular query
  • effectiveness values are reduced
• Visits
  • number of links explored by user
  • indicates user found some links to be interesting and increases confidence
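
A minimal sketch of how the two signals above might update a meta-index and drive engine ranking. The class name, the additive update rule, and the penalty and reward sizes are assumptions made for illustration; the slide only states the direction of each adjustment.

```python
from collections import defaultdict


class MetaIndex:
    """Toy meta-index: an effectiveness score per (query term, search engine)."""

    def __init__(self):
        # effectiveness[term][engine] -> learned score, starting at 0.0
        self.effectiveness = defaultdict(lambda: defaultdict(float))

    def record_no_results(self, terms, engine, penalty=1.0):
        # "No Results": the engine returned no links, so lower its scores.
        for term in terms:
            self.effectiveness[term][engine] -= penalty

    def record_visits(self, terms, engine, visits, reward=1.0):
        # "Visits": the user followed some links, so raise the scores.
        for term in terms:
            self.effectiveness[term][engine] += reward * visits

    def rank(self, terms, engines):
        # Order engines by their summed score over the query terms, best first.
        def score(engine):
            return sum(self.effectiveness[term][engine] for term in terms)

        return sorted(engines, key=score, reverse=True)
```

An engine with no recorded history keeps a score of 0.0, so it ranks between engines that pleased users and engines that returned nothing.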

Page 14:

Future Development

• Meta-search will need to be personalized and embedded in other systems.
• Experimental version divides search into categories, with separate sets of rules for creating a search plan.
  • Web Indexes
  • Web Directories
  • Usenet News
  • Software
  • People
  • Reference
  • Entertainment
  • Technical Reports

Page 15:

Hyperlink-Induced Topic Search (HITS)

• System for locating authoritative web sources
• Two premises:
  • Implicit annotation provided by creators of hyperlinks contains sufficient information to infer a notion of “authority.”
  • Sufficiently broad topics contain embedded communities of hyperlinked pages.

Page 16:

HITS

• Two types of pages
  • Authorities
    • highly referenced pages on the topic
  • Hubs
    • pages that “point” to many of the authorities
• Mutually reinforcing relationships
• Starts from a user-supplied query

Page 17:

HITS method

• Base set of pages returned by search engine
• Add pages that point to, or are pointed to by, any page in base set
• Assign each page a hub weight h(p) and authority weight a(p) (initialize to 1)
• For each page:
  • Replace a(p) by the sum of the h()’s of all pages pointing to it
  • Replace h(p) by the sum of the a()’s of all pages pointed to by it
• Repeat
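
The update loop above can be sketched in a few lines of Python. The link graph is assumed to be already expanded from the base set and passed in as a dictionary; the function names and the fixed iteration count are illustrative choices. One step is added that the slide does not list: the weights are normalized after each round, as in the standard formulation of the algorithm, so that they do not grow without bound.

```python
from math import sqrt


def normalize(weights):
    """Scale weights so their squares sum to 1 (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in weights.values())) or 1.0
    return {page: v / norm for page, v in weights.items()}


def hits(links, iterations=20):
    """Run the hub/authority iteration.

    links: dict mapping each page to the pages it points to.
    Returns (hub, auth) dicts of normalized weights.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Initialize every hub weight h(p) and authority weight a(p) to 1.
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # a(p) <- sum of h(q) over all pages q pointing to p
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]
        # h(p) <- sum of a(q) over all pages q that p points to
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth
```

For example, with links = {"h1": ["a1", "a2"], "h2": ["a1"]}, page a1 ends up with a higher authority weight than a2, and h1 with a higher hub weight than h2.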

Page 18:

HITS results

• Broad topics tend to have robust structure
  • astrophysics
  • Michael Jordan
• Generalizes topics not sufficiently broad
  • Dennis Ritchie
• Density of linkage on a topic influences authority/hub structure
  • English literature vs. German literature
• Web-centric topics
  • cryptography
• Commercialization
  • tennis

Page 19:

Future Development

• Study temporal evolution of communities on the Web.
• Combining text and the structure of hyperlinks.
  • text within <href>
  • text near hyperlink
• CLEVER project at IBM Almaden Research Center

Page 20:

Automatically Generated Concept Space (Kohonen map and ET-Space Thesaurus)

• IR users need:
  • Working knowledge of the system where the information is stored
    • how to navigate
    • how info is categorized or organized
  • Knowledge of the subject of interest
    • particularly the vocabulary of the subject domain

Page 21:

Browsing vs. Searching

• Browsing
  • users rely on mental models
  • embedded digression problem
• Searching
  • content-based
  • two basic approaches
    • keyword search
    • combined keyword search and categorization
  • vocabulary differences problem

Page 22:

User Aids for Browsing

• Directories
  • categories limited in granularity
  • categories limited in timeliness
  • creating categories is manual, slow, and cumbersome
• Kohonen self-organizing map (SOM)
  • generates clusters of important concepts
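
To make the SOM bullet concrete, here is a minimal sketch of self-organizing map training on a small rectangular grid. It is a generic textbook version written with the standard library only, not the implementation used by Chen et al.; the function names, the linear decay schedules, and the Gaussian neighborhood are assumptions. Input vectors (for instance, term-weight vectors for documents) are assumed to have components in the range 0 to 1.

```python
import math
import random


def best_matching_unit(grid, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        grid,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(grid[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)  # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
        for vector in vectors:
            bmu = best_matching_unit(grid, vector)
            for node, weights in grid.items():
                # Nodes near the winner on the grid are pulled toward the input.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-d2 / (2.0 * radius * radius))
                for i in range(dim):
                    weights[i] += lr * influence * (vector[i] - weights[i])
    return grid
```

After training, mapping each document to its best matching unit groups similar documents on neighboring cells, which is what gives the map its cluster regions.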

Page 23:

Concept “Landscapes”

[Figure: concept landscape map with regions labeled Pharmacology, Anatomy, Legal, Disease, Hospitals]

Built using Kohonen Feature Maps (Xia Lin, H.C. Chen). Slide by Marti Hearst.

Page 24:

User Aids for Searching

• Query expansion
• Relevance feedback
• Multidimensional scaling
  • metric similarity modeling
  • latent semantic indexing
• Thesauri use
  • incorporating existing thesauri
  • automatic thesaurus generation

Page 25:

Automatic Thesaurus Generation

• Statistical co-occurrence
• Cluster analysis further groups terms
• Chen et al.
  • document collection
  • automatic indexing
  • co-occurrence analysis
  • associative retrieval
• ET-Space Webpage
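
The co-occurrence step can be sketched as follows. This is a simplified illustration, not the weighting used by Chen et al.: documents are assumed to be already indexed into term lists, and the association from term A to term B is taken to be the fraction of documents containing A that also contain B. The function name and the top_k parameter are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_thesaurus(docs, top_k=3):
    """Build {term: [most strongly associated terms]} from indexed documents.

    docs: list of term lists, one per document (output of automatic indexing).
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for doc in docs:
        terms = set(doc)
        doc_freq.update(terms)
        pair_freq.update(permutations(sorted(terms), 2))

    thesaurus = {}
    for term in doc_freq:
        scored = [
            (pair_freq[(term, other)] / doc_freq[term], other)
            for other in doc_freq
            if other != term and pair_freq[(term, other)]
        ]
        # Strongest association first; ties broken alphabetically.
        scored.sort(key=lambda item: (-item[0], item[1]))
        thesaurus[term] = [other for _, other in scored[:top_k]]
    return thesaurus
```

The resulting lists are the kind of suggested terms a searcher could use to refine a query; the weight is asymmetric, so a rare term may point strongly to a common one without the reverse holding.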

Page 26:

Experiment with Yahoo

• Browsing tested with Kohonen SOM
  • subjects who started with Yahoo were less successful in repeating the task with the SOM than vice versa
  • useful more for broad exploring than for searching
• Searching tested with AGT
  • suggested terms came from web pages
  • most useful in further refining an initially too broad search

Page 27:

Future Development

• Effects of different information sources
  • cohesion
  • consistent with user’s mental model
• User Interface design
  • flexibility
  • spelling errors and typos
  • pan-zoom
  • help screens or instructions (or more intuitive design, or both)

Page 28:

Review and Discussion

• Overviews
  • Category Labels
    • when docs stored “inside” categories, users cannot create queries based on combinations of categories
    • display of hierarchies takes up large amounts of screen space
    • tightly coupled with queries?
• Other starting points

Page 29:

Overviews in the User Interface

• Unsupervised Groupings
  • Clustering
  • Kohonen Feature Maps
• Supervised Categories
  • Yahoo!
  • Superbook
  • HiBrowse
  • Cat-a-Cone
• Combinations
  • DynaCat
  • SONIA

Page 30:

Category Labels (from Hearst slide)

• Advantages:
  • Interpretable
  • Capture summary information
  • Describe multiple facets of content
  • Domain dependent, and so descriptive
• Disadvantages
  • Do not scale well (for organizing documents)
  • Domain dependent, so costly to acquire
  • May mis-match users’ interests

Page 31:

Other Starting Points Approaches

• Co-citation Links
• Examples, Guided Tours

Page 32:

Review and Discussion (cont.)

• Interface Design
  • Visualization
    • textual vs. 2D spatial representation
  • Search Strategies
    • integration with non-search parts of process (reading, annotating, analysis)
• Evaluation Methodology

