INFO624 -- Week 9
Effective Information Retrieval
Dr. Xia Lin
Assistant Professor
College of Information Science and Technology
Drexel University
Effective Information Retrieval
• System’s perspectives
  • Fast indexing and retrieval algorithms
    • Inverted indexing, tree structures, hash tables
  • Semantic indexing and mapping
    • Subject indexing
    • Latent semantic indexing
  • Intelligent information retrieval
    • Knowledge representation
    • Logical inferences
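To make "inverted indexing" concrete, here is a minimal sketch. It assumes whitespace tokenization and Boolean AND matching, neither of which the slides specify; the function names and the three sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all_terms(index, query):
    """Return ids of documents that contain every query term (Boolean AND)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical mini collection, for illustration only.
docs = {
    1: "fast indexing and retrieval",
    2: "latent semantic indexing",
    3: "retrieval with hash tables",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "indexing retrieval"))
```

The lookup touches only the posting lists of the query terms instead of scanning every document, which is what makes this structure fast.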
Effective Information Retrieval
• User’s perspectives
  • Iteration
  • Relevance feedback
  • Use of user profiles
  • Graphical display of search results
  • Browsing/interactive searching
• We can’t change the user. We should make the system adapt to the user’s needs.
Iteration
• Most searches need to be done iteratively
• From the user’s point of view
  • The first query often does not retrieve what the user wants
  • The user needs to see the output of previous queries to construct the next query
  • The user often needs to reconstruct his/her information needs after reading or browsing the search results.
Iteration – User’s strategies
• Modify queries repeatedly based on some goals
• Starting with high precision
  • Use a specific query first
  • Broaden queries to include more relevant documents
  • "pearl growing"
• Starting with high recall
  • Use a very broad query
  • Improve precision gradually
  • "onion peeling"
• Starting with known items
  • Find documents similar to the known items
• Browsing/interactive searching
Iteration – System’s strategies
• If the system can “learn” from the user’s activities, it can likely retrieve better results to meet the user’s needs.
  • Relevance feedback
  • User profiles
• The system should provide better output representations to help the user
  • Browse
  • Conduct interactive searches
Relevance Feedback
• Feedback: The user provides information that the system can use to modify its next search or next display
• Relevance feedback: users let the system know
  • What documents are relevant to their information needs
  • What concepts or terms are related to their information needs
  • What weights they would like the system to put on each relevant document/term
Relevance Feedback – System’s Strategy
• The system should invite the user to select relevant documents/terms from the retrieved results before the second retrieval is conducted
• The system should use information from the user’s feedback to conduct the next search.
Design IR Systems with relevance feedback
• Collect relevance feedback through
  • Binary vs. scales
  • Positive and negative feedback
• Apply relevance feedback to
  • Query
  • Profile
  • Document
  • Retrieval algorithm
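One classic way to apply binary positive and negative feedback to the query is the Rocchio formula. The slides do not name a specific method, so the sketch below is an illustration only: the function name, the weights alpha, beta and gamma, and the sample term vectors are all assumptions.

```python
from collections import Counter


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query (term -> weight) from binary relevance feedback.

    query, and each document in relevant/nonrelevant, is a dict term -> weight.
    """
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Move the query toward the average of the documents marked relevant.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    # Move the query away from the average of the documents marked nonrelevant.
    for doc in nonrelevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(nonrelevant)
    # Terms that end up with a non-positive weight are dropped.
    return {term: w for term, w in new_query.items() if w > 0}


# Hypothetical feedback round: one document marked relevant, one nonrelevant.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}]
nonrelevant = [{"jaguar": 1.0, "car": 1.0}]
result = rocchio(query, relevant, nonrelevant)
print(result)
```

The second search then runs with the reweighted query: "cat" has been added from the relevant document, while "car" appears only in the nonrelevant one and is left out.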
User Profiles
• User profiles
  • Information about the user’s information needs that an IR system can use to modify its search process.
• Simple user profiles
  • A list of terms that the user selects to represent his/her information needs
  • A list of terms with weights
• Extended user profiles
  • More complex term structures
  • Information use patterns
  • Levels of interest
  • User’s background information
  • User’s browsing behaviors
    • What pages the user has visited last week, last month, …
    • From which page to which page …
Use of User Profiles
• Selective Dissemination of Information (SDI)
  • The system regularly runs the search to get any new information that matches the user’s profiles.
  • The user can set up several profiles
    • Once they are set up, the queries are always the same.
  • The user can set the frequency of the update searches.
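The SDI cycle can be sketched as a stored profile that is re-run against the collection, delivering only items not delivered before. This is a minimal illustration, not a description of any particular SDI service: the `Profile` class, the "match any profile term" rule, and the sample documents are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    terms: set  # the fixed query terms of this profile
    seen: set = field(default_factory=set)  # ids already delivered to the user


def run_sdi(profile, collection):
    """Return ids of documents that match the profile and were not delivered before."""
    hits = []
    for doc_id, text in collection.items():
        if doc_id in profile.seen:
            continue
        # Assumed matching rule: the document shares at least one term with the profile.
        if profile.terms & set(text.lower().split()):
            hits.append(doc_id)
    profile.seen.update(hits)
    return sorted(hits)


# Hypothetical collection that grows between two scheduled runs.
collection = {1: "visual retrieval interfaces"}
profile = Profile("ir", {"retrieval"})
first_run = run_sdi(profile, collection)

collection[2] = "new retrieval model"
collection[3] = "cooking recipes"
second_run = run_sdi(profile, collection)
print(first_run, second_run)
```

The profile terms never change between runs, so the query stays the same each time; only the set of already delivered documents grows.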
SDI
• Advantages of SDI
  • Automatic retrieval of new information for the user
  • Set up a profile once, use the profile for retrieval many times.
  • The user can change the profiles or the search frequency as needed.
• Disadvantages of SDI
  • The query based on the profile is static
  • Timing problems
    • Information in need is information indeed.
    • Something I am very interested in may not come at the time I want to read it.
Use profiles during the search
• Modify the query
  • When the user sends a query, the system automatically adds some terms to the query from the user’s profiles.
  • When the user sends a query, the system checks whether the query terms are in the user’s profile. If they are, it increases the weights of those terms.
• Organize the search results
  • When the user sends a query, the system uses the profile information to organize the search results (such as clustering or ranking)
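The two query-modification strategies above can be sketched as follows, assuming the profile is a simple list of terms with weights. The function name, the `boost` factor, the `max_added` limit, and the sample profile are hypothetical choices, not values given in the slides.

```python
def apply_profile(query_terms, profile, boost=2.0, max_added=2):
    """Return term -> weight for a query after applying a weighted user profile."""
    weights = {term: 1.0 for term in query_terms}
    # Increase the weight of query terms that also appear in the profile.
    for term in weights:
        if term in profile:
            weights[term] *= boost
    # Add the highest-weighted profile terms that are not already in the query.
    extras = sorted(
        (term for term in profile if term not in weights),
        key=lambda term: profile[term],
        reverse=True,
    )
    for term in extras[:max_added]:
        weights[term] = profile[term]
    return weights


# Hypothetical profile: terms with interest weights.
profile = {"visualization": 0.9, "retrieval": 0.8, "music": 0.1}
expanded = apply_profile(["retrieval", "interface"], profile, max_added=1)
print(expanded)
```

Here "retrieval" is boosted because it is in the profile, "visualization" is added as the strongest profile term missing from the query, and "interface" keeps its default weight.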
Browsing
• Browsing is an act of human information seeking
  • a mental process of identifying and choosing information
  • a dynamic process that varies in time and depends on intermediate results
  • a part of the process of decision making, problem solving, etc.
Browsing for Information Retrieval
• A kind of searching process in which the initial search criteria or goals are only partly defined
  • general-purpose web browsing
• An art of not knowing what one wants until one finds it
  • visual recognition
  • content recognition
Browsing for Information Retrieval
• A learning activity that emphasizes structures and interactive process
  • exploratory
  • movements based on feedback
• A process of finding and navigating in an unknown or unfamiliar information space
  • becoming aware of new contents
  • finding unexpected results
Search or Browse?
Would you like to search using a search engine, or would you like to browse from page to page (or through a hierarchy)?
What does it depend on?
Factors of browsing
• Purposes
  • Fact retrieval
  • Concept formation or interpretation
  • Current awareness
• Tasks
  • Well-defined tasks
  • Ill-defined tasks
  • Number of items to browse
Factors of browsing
• Individual characteristics
  • Motivation
  • Experience and knowledge
  • Cognitive styles
• Context
  • Subject disciplines
  • Organizational schemes
  • Nature of text/information
• Medium
  • Does the system support browsing?
IR Systems that support browsing
• Good navigation tools
  • Easy to move from one item to another
    • Links
    • Good structures
    • Fast access
  • Easy to backtrack
    • Correct any errors
    • Make new selections
IR Systems that support browsing
• Good displays
  • Easy to read
  • Meaningful orders of retrieval results
  • Graphical presentation
• Meaningful content organization
  • Contextual hierarchical structures
  • Grouping of related items
  • Contextual landmarks
“why just browse when you can fly?”
• HotSauce is an innovative 3D fly-through interface for navigating information spaces.
• It was developed, largely as a one-man effort, by Ramanathan V. Guha while at Apple Research in the mid-1990s.
• HotSauce was a specific 3D spatialization of the Meta Content Framework (MCF) also developed by Guha.
HotSauce
Why Surf alone?
• What if you had an assistant always looking ahead for you [when browsing the web]….
  • The assistant could warn you if the page was irrelevant, could alert you if that link or some other link merited your attention.
  • The assistant could save you time and frustration.
CACM,44(8), p.71, 2001
Information Agents
• Software that applies user profiles, dynamically and intelligently, to search tasks
  • Search distributed, possibly heterogeneous information resources on the user’s behalf.
  • Gather and integrate search results by some Artificial Intelligence techniques
  • Accept the user’s feedback and use the feedback to modify the user profiles and search strategies
Architecting Browsable Websites
• Design site structures
  • Metaphor exploration
    • Organizational metaphors
    • Functional metaphors
    • Visual metaphors
  • Define navigation
    • Global navigation
    • Local navigation
  • Design document
Interactive Systems
“When an interactive system is well-designed, the interface almost disappears, enabling users to concentrate on their work, exploration, or pleasure.”
– Ben Shneiderman
Design Principles
• Offer informative feedback
  • Relationships between query and documents retrieved
  • Relationships among retrieved documents
  • Relationships between metadata and documents
• Reduce working memory load
  • Keep track of choices made during the search process
  • Allow the user to return to temporarily abandoned strategies or jump from one strategy to another
  • Retain information and context across search sessions.
• Provide alternative interfaces for novice and expert users.
  • Simplicity vs. power
Output Presentation for Search Engines
• Two major issues
  • What information to present?
  • How to organize the output items?
• Information in the output display
  • Traditional databases
    • Document reference numbers (unique number)
    • Citations (author, title, source)
    • Document surrogate (citation plus abstract and/or indexing terms)
    • Full text
  • On the web
    • Title, URL
    • First few sentences/related sentences/summaries
    • Dates / page sizes
    • Degree of relevance
    • Special links
      • “find similar one”
    • Types of links
    • Related categories
• What other information might you wish to have in the retrieval output?
  • Citations (or links from this document)?
  • Critique or evaluation?
  • Access information (how many times it was accessed in the last 6 months)?
  • Links to this document?
  • Author contact information?
  • Why documents were retrieved?
Output organization
• Linear
  • a list of documents
  • listed by
    • best match
    • alphabetical order
    • dates
    • order of selected fields (authors, titles, web sites)
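A linear display is the same result list sorted by different keys. The sketch below assumes each result is a record with `title`, `author`, `date`, and `score` fields; the field names, the "newest first" rule for dates, and the three sample records are made up for illustration.

```python
from datetime import date

# Hypothetical result records, for illustration only.
results = [
    {"title": "Browsing", "author": "Chen", "date": date(2001, 3, 1), "score": 0.40},
    {"title": "Agents", "author": "Baker", "date": date(1999, 6, 15), "score": 0.90},
    {"title": "Clustering", "author": "Adams", "date": date(2003, 1, 10), "score": 0.65},
]


def order_results(results, by="best match"):
    """Return a new list of results in the requested linear order."""
    if by == "best match":
        # Highest relevance score first.
        return sorted(results, key=lambda r: r["score"], reverse=True)
    if by == "date":
        # Newest first (an assumed convention).
        return sorted(results, key=lambda r: r["date"], reverse=True)
    if by in ("title", "author"):
        # Alphabetical order on the selected field, ignoring case.
        return sorted(results, key=lambda r: r[by].lower())
    raise ValueError(f"unknown ordering: {by}")


for ordering in ("best match", "date", "title", "author"):
    print(ordering, [r["title"] for r in order_results(results, ordering)])
```

Each ordering is a single sort over one field, which is why linear lists are easy to generate; none of them says anything about how the documents relate to each other.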
• Linear display
  • Practical and most popular
    • easy to generate
    • users know how to use it
  • Does not show relationships among documents!
    • Document relationships are more complex than a linear one
• Hierarchical display
  • Separate data into different levels or branches
  • Branches can be expanded/collapsed.
  • Show more data in less space
  • Show the organization of the data
• Graphical displays
  • Show more complex relationships
  • Use location, colors, dimensions, etc. to represent documents, terms or concepts.
  • Provide more interactive functions
What is IV?
• The use of computer-supported, interactive, visual representations of abstract data
  • to assist navigation in large information spaces
  • to reveal complex information structures
  • to amplify cognition
System-centered View
User-centered
IV and IR
• Both need to process a large amount of information
• Both are tools to assist the cognitive process of finding, learning, and understanding information.
• Both face the challenge of “uncertainty”
  • Not an “exact science”
• Both are subject to human interpretation.
VIRI -- Visual Information Retrieval Interfaces
• 2-dimensional graphical display
  • Use graphical objects (icons, dots, etc.) to represent documents
  • Use geographical relationships to indicate document relationships
  • Use colors to group/differentiate documents
  • Use animation to assist interaction
Concept Visualization
• AltaVista LiveTopic
• HiBrowse Interface
• SemioMap
• Hyperbolic Trees
• Visual Thesaurus
• Visual Concept Explorer
Alta Vista’s LiveTopic
ConceptSpace
HiBrowse Interface
SemioMap
Inxight.com
Topic Maps
• Highwire: http://www.highwire.org
Visual Thesaurus
Visual Concept Explorer
Concept Mapping
MedLine Search
IBM – Visualization Space
• This information system understands the user.
• It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.
Visual Search Engines
• TheBrain
• Mooter
• Kartoo
• MapStan
• Grokker
• TouchGraph
• Starrynight
• NewsLink
WebBrain: http://www.webbrain.com/
Mooter: http://www.mooter.com/
Kartoo: http://www.kartoo.com/
MapStan: http://search.mapstan.com/
Grokker: http://www.groxis.com/service/grok/g_products.html
TouchGraph: http://www.touchgraph.com/
Starrynight from RHIZOME
Galaxy of News (Rennison 95)
Map of Information Scientists
Author Mapping
AuthorLink
NewsLink
• Integrate and cross mapping
  • Mapping on topics; displaying by people;
  • Mapping on people; displaying by organization;
  • Etc.
• NewsLink: http://project.cis.drexel.edu/lexislink/
Discussion
• Information Visualization
  • What works and what does not?
VIRI
• Advantages
  • More representational power
    • show more information in a limited screen space
    • many different ways to group documents
    • can put both keywords and documents in the same 2-dimensional space
  • Provide good overview
  • Provide more interaction
VIRI
• Disadvantages
  • Difficult to generate
  • Not always easy to understand
  • May not be specific enough
  • Hard to use
Evaluation of IR Systems
• Using Recall & Precision
  • Conduct query searches
    • Try many different queries
    • Results may depend on sampling queries.
  • Compare results of Precision & Recall
    • Recall & Precision need to be considered together.
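For reference, the standard set-based definitions are precision = relevant retrieved / retrieved and recall = relevant retrieved / relevant. A minimal sketch follows; the function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 5 relevant documents in a 20-document collection.
relevant = {2, 4, 5, 6, 7}

narrow = precision_recall([1, 2, 3, 4], relevant)
everything = precision_recall(range(1, 21), relevant)
print(narrow, everything)
```

Retrieving the whole collection drives recall to 1.0 while precision falls to 0.25, which is why the two measures have to be read together.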
How to calculate Recall
• Determine recall for the whole collection
  • Take a random sample to estimate
  • Use a broad query to select a sample collection for the estimation
  • Use “seed” documents
• Use relative recall
  • Use two or more expert searches as the base.
  • Use one system as the base to estimate recall on other systems
• Use a small test collection
  • Use experts to judge relevance of every document.
  • Prepare special collections.
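Relative recall replaces the unknown total number of relevant documents with a pool: the union of the relevant documents found by the system under test and by the base searches. The sketch below assumes that pooling definition; the function name and the sample id sets are hypothetical.

```python
def relative_recall(system_relevant, base_relevant_sets):
    """Relevant documents found by the system, divided by the pooled relevant documents."""
    found = set(system_relevant)
    pool = set(found)
    for other in base_relevant_sets:
        pool |= set(other)
    return len(found) / len(pool) if pool else 0.0


# Hypothetical example: the system found 3 relevant documents,
# and two expert searches found others.
system = {1, 2, 3}
expert_searches = [{2, 3, 4, 5}, {1, 6}]
print(relative_recall(system, expert_searches))
```

The result is only as good as the pool: relevant documents that no search found are invisible to this measure, so relative recall tends to overstate true recall.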
Functionalities
• Precision and Recall are particularly useful for evaluating searching/indexing algorithms and system features.
  • Compare P & R with and without fuzzy search.
  • Compare P & R with different types of indexing options
  • Compare P & R across systems with the same features
• Precision and Recall are query-oriented, not system-oriented.
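Because precision and recall are computed per query, comparing a feature such as fuzzy search means averaging over a set of test queries. The sketch below uses a simple mean over queries; the averaging choice, the function names, and the two made-up runs ("exact" and "fuzzy") are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(runs, judgments):
    """Average P and R over queries.

    runs: query -> retrieved ids; judgments: query -> relevant ids.
    """
    pairs = [precision_recall(runs[q], judgments[q]) for q in judgments]
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical relevance judgments and two runs over the same queries.
judgments = {"q1": {1, 2}, "q2": {3, 4}}
exact = {"q1": [1], "q2": [3]}
fuzzy = {"q1": [1, 2, 9, 10], "q2": [3, 4]}

print("exact:", mean_precision_recall(exact, judgments))
print("fuzzy:", mean_precision_recall(fuzzy, judgments))
```

In this made-up case the fuzzy run gains recall and loses some precision; a different sample of queries could shift both averages.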
Evaluation without P & R
• The emphasis should be on the user and the interaction.
  • Be specific on data collection
    • How are data collected and indexed?
    • Is there quality control for the data collection?
  • Be creative on the test questions and methods
    • Not just questionnaires
  • Be selective on subject groups
Quality Evaluation
• Data quality
  • Coverage of database
    • It will not be found if it is not in the database.
  • Completeness and accuracy of data
• Indexing methods and indexing quality
  • It will not be found if it is not indexed.
  • Indexing types
  • Currency of indexing (Is it updated often?)
  • Indexing sizes
Interface Consideration
• User-friendly interface
  • How long does it take for a user to learn advanced features?
  • How well can the user explore or interact with the query output?
  • How easy is it to customize output displays?
User Satisfaction
• User satisfaction
  • The final test is the user!
  • User satisfaction is more important than precision and recall
• Measuring user satisfaction
  • Survey
  • Use statistics
  • User experiments
User Experiments
• Observe and collect data on
  • System behaviors
  • User search behaviors
  • User-system interaction
• Interpret experiment results
  • for system comparisons
  • for understanding users’ information-seeking behaviors
  • for developing new retrieval systems/interfaces