Fundamentals Of Search

1

The Fundamentals of Enterprise Search

www.searchtools.com/slides/kmw09/fundamentals-of-search.html

KMWorld 2009

Avi Rappoport, Search Tools Consulting

www.searchtools.com


2

What’s In This Workshop

• Overview of enterprise search, in context
• Search engine processes
  – Robot spiders, database access
  – Indexing
  – Security
  – Query parsing, retrieval, and relevance ranking
  – Usable search interfaces
  – Maintenance and Analytics
• Methods for choosing a good search engine

Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com

3

About SearchTools

• Avi Rappoport is a librarian (MLIS from Berkeley)
  – Software developer and product manager
  – User interface designer
  – Long-time search consultant
• Editor & Publisher, www.searchtools.com
• Search Tools Consulting
  – Search needs analysis and recommendations
  – Enterprise search evaluation
  – Outsourced search administration


4

Defining Enterprise Search

• Large-scale web site search
  – Corporate sites
  – Institutional sites
  – Online stores
• Intranet search
  – Crossing departmental lines
  – Opening data silos
• Extranets
• Portal search


5

Similarities to Webwide Search

• Robot crawlers
• HTML over HTTP
• Scaling to millions of items
• Distributed processing
• Full-text indexing of content
• Simple query language
• Relevance ranking of results
  – TF-IDF (term frequency × inverse document frequency)
• Familiar results list


6

Differences from Web Search

• Limited scope
  – A site, set of sites, extranet, or intranet
• Few meaningful hyperlinks
  – PageRank and link analysis are less useful
• Security and access control issues
• Content in databases, CMSs, etc.
• More control
  – Index update scheduling
  – Some content is very valuable, other content is not
• No search spam


7

Text Search vs. Database Search

• Indexes multiple content sources
  – Database fields, files, web pages, feeds...
• Simple search commands instead of SQL
• Flexible indexing and retrieval
• Relevance ranking (this is a major issue)
• Does not compete for database resources
  – Easy to scale separately from the DBMS
• New features: spellcheck, autocomplete, facets
• Works in the real world, from eBay to Google


8

Search and Information Architecture

• Information Architecture
  – The art and science of organizing information for access and use
• IA work enriches search
  – Creates order and systems
  – Provides standard vocabulary
  – Removes ROT (redundant, obsolete, trivial)
• Search supplements IA
  – Supports user vocabularies
  – Changes dynamically with new content


9

Search and Taxonomy

• Taxonomy creates categories
  – Labels and metadata
  – Improves quality of search results
  – Additional metadata extremely valuable
• Search crosses categories
  – Bypasses ambiguous topic labels
  – Useful for novices
  – Supports user vocabulary
  – Dynamic updates for new topics


10

Search & Knowledge Management

• KM is: “The process through which organizations generate value from their intellectual and knowledge-based assets.” (CIO Magazine)
  – Organizes information, processes and people
  – Offers collaboration and archiving tools
  – Attempts to regularize implicit knowledge
• Search mostly matches words


11

Two Main Types of Search

• Known-item search
  – Short queries
  – “Good-enough” answers
• Exploratory search
  – Research: finding unknowns
  – Scientific, legal, medical, business, sales
  – Conceptual overviews
  – Completeness: all possible relevant items
    • Law enforcement
    • Medicine


12

Search as an Iceberg

• All people see are the search box and results list
• Invisible functionality
  – Indexes
  – Query processing
  – Retrieval
  – Relevance ranking
• Search is a mystery
  – But it’s just software


13

Elements of Search Engines

• Automated tools to collect content
• Specialized storage for quick retrieval
• Query processing and expansion
• Retrieval (matching the query to index content)
• Relevance ranking
• Search results interfaces
• Analytics, metrics and maintenance


14

Choosing Content To Index

• Information sites
  – Consider indexing every single page
  – Use search indexing as a discovery mechanism
• Online stores, catalogs
  – Product information: cost, color, size, materials
  – Other: return policies, CEO’s name, jobs listing
• Intranets
  – Intranet portal and core servers
  – May need archive servers and search
• Multimedia: images, audio, video
  – Metadata at least


15

(Near) Real Time Indexing

• Twitter has changed expectations
  – Even in intranets
• Index must support partial updates
  – Search engines are reaching their limits at scale
  – Distribute indexing and indexes
• Trigger index updates (push vs. pull)
  – Continuous feed
  – Send a web service message
  – Database trigger
  – Update watched URLs with new links


16

Indexing and Security

• Search can undermine “security by obscurity”
  – One link can expose a whole set of documents
• Work with your security team
  – List areas that contain sensitive content
  – Define words that trigger further analysis
  – Create a process for removing sensitive data
• Indexing encrypted content
  – Search engine uses an SSL client for indexing
  – Encrypt search results before returning them
  – Physical security on search servers


17

Search and Access Control

• Authentication and authorization in indexing
  – “Basic authentication”: user name and password
  – NT Security integration
  – ACLs and single sign-on
• Conform to security rules during indexing
  – Keep access control info as part of the document store
• Showing results: who can see what?
  – Access to the search engine itself
  – Collection-level access control
  – Locked results as a teaser for subscription
  – Hit-level access control
    • Check before displaying results


18

Indexing: Sources of Content

• Web sites
• Intranets
• Extranets
• Blogs
• Wikis
• Mailing list archives & email public folders
• File systems & shared servers
  – NFS, SMB, AFP, GFS, FTP, WebDAV
• Content Management Systems
• Databases
• Legacy programs in silos


19

Indexing: Robot Spiders

• Start with the base URL for all hosts
• For each page, repeat:
  – Read text into internal format
  – Save document in cache
  – Save words into index
  – Extract all links and check the rules
  – If they are new URLs, add them to the list
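The spider loop above can be sketched in Python. To stay self-contained, this sketch assumes a hypothetical in-memory `SITE` dictionary in place of HTTP fetches, and a simple same-host check (`allowed`, also hypothetical) standing in for robots.txt and the other crawl rules.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical site standing in for HTTP fetches: URL -> (page text, raw href values).
SITE = {
    "http://example.com/": ("Welcome to the intranet", ["/about", "/news", "http://other.org/"]),
    "http://example.com/about": ("About our search tools", ["/"]),
    "http://example.com/news": ("Search news archive", ["/about", "/news?page=2"]),
    "http://example.com/news?page=2": ("Older search news", []),
}

def allowed(url, base_host):
    # Stand-in for the crawl rules (robots.txt, include/exclude patterns): stay on one host.
    return urlparse(url).netloc == base_host

def crawl(base_url):
    base_host = urlparse(base_url).netloc
    queue, seen = deque([base_url]), {base_url}
    cache, index = {}, {}
    while queue:
        url = queue.popleft()
        if url not in SITE:                       # broken link; a real robot would log it
            continue
        text, links = SITE[url]                   # read text into internal format
        cache[url] = text                         # save document in cache
        for word in text.lower().split():         # save words into index
            index.setdefault(word, set()).add(url)
        for href in links:                        # extract all links and check the rules
            absolute = urljoin(url, href)
            if allowed(absolute, base_host) and absolute not in seen:
                seen.add(absolute)                # new URL: add it to the list
                queue.append(absolute)
    return cache, index

cache, index = crawl("http://example.com/")
```

The breadth-first queue is one reasonable choice; it also makes it easy for a real robot to cap crawl depth.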


20

Robot Indexing Spider


21

Common Problems With Robots

• Pages that are not linked from anywhere
• Spider disallowed by robots.txt or robots meta tag
• URLs with ? and & (all robots should handle these now)
• JavaScript, forms, and interactive dynamic links
  – Some robots can handle some of these
• Session IDs that change
• Duplicate detection
  – Multiple views of the same data (Lotus, wikis)
  – Symbolic links & bad redirects
  – Multiple copies of files or directories


22

Indexing: Other Data Sources

• RSS feeds: nice clean text
• File servers: SMB, file:/// etc.
• Content / Document Management Systems
• Email archives
• Databases via ODBC, JDBC, Oracle API
  – Full-text content
  – Metadata: library catalog records, yellow pages
• External sources using APIs (application programming interfaces)
  – News feeds (Reuters, AP)
  – Twitter


23

Indexing: Text Files

• Plain text is easy
• RTF export format: text is easy to find
• HTML: semi-structured text
  – Content is between tags and in attributes
  – Text generated by JavaScript is hard to extract
  – Bad HTML, especially missing </ close tags
• XML files (structured)
  – Many tags are document-level
  – Content is between tags and in attributes
  – Complex tag hierarchy
    • TEI (Text Encoding Initiative) & Semantic Web
    • XQuery and XPath tools


24

Indexing: Binary File Formats

• PDF
  – Scanned, may not have any text
  – Bad PDF generators break words at columns
  – “Shadow” text effect duplicates letters
• SWF and Flash: API may not load dynamic text
• Office documents
  – Word processing files (may have hidden text from revisions)
  – Spreadsheets (hard to know what to grab)
  – Presentations
  – Note: the new .docx, .xlsx, .pptx formats are really sets of XML files
• CAD and project files
• Metadata (properties, Adobe XMP)

25

Indexing: Tokenizing

• Lowercase all characters (aka ‘folding’)
• Tokenizing makes words searchable
  – Break on punctuation and spaces
  – Recognize special words: C++ @ [TS]
  – Typography issues: the ligature ﬆ is really “st”
  – HTML escaped text: m&ouml;chten = möchten
  – Special cases for structured strings
    • Numbers, prices, dates
• N-grams: an alternate approach
  – Break into short text patterns
  – Takes a lot of index space
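A minimal tokenizer sketch following the steps above: unescape HTML entities, fold case, keep a few special words whole, and split everything else on spaces and punctuation, plus a character n-gram helper for the alternate approach. The `SPECIAL_WORDS` list, the regex, and the `_` padding are assumptions for illustration, not any particular engine's rules.

```python
import html
import re

SPECIAL_WORDS = {"c++", "c#", "at&t"}   # hypothetical list of tokens that must stay whole
TOKEN_RE = re.compile(r"\w+")           # runs of word characters; punctuation and spaces break tokens

def tokenize(text):
    text = html.unescape(text).lower()  # "m&ouml;chten" -> "möchten", then fold case
    tokens = []
    for chunk in text.split():
        stripped = chunk.strip(".,;:!?()\"'")
        if stripped in SPECIAL_WORDS:
            tokens.append(stripped)     # keep special words such as c++ intact
        else:
            tokens.extend(TOKEN_RE.findall(chunk))
    return tokens

def char_ngrams(word, n=3):
    # N-gram alternative: index overlapping short patterns instead of whole words.
    padded = f"_{word}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(tokenize("Wir m&ouml;chten C++ lernen, bitte!"))
print(char_ngrams("search"))
```

The n-gram output shows why that approach takes a lot of index space: one six-letter word becomes six index entries.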


26

Indexing: Character Set Issues

• The world has many character sets (aka scripts, alphabets)
  – English has a simple alphabet: 26 letters, 10 digits
  – Other Roman-alphabet languages: extended (ç, î, ß)
  – Non-Roman, one byte per character in legacy encodings: Cyrillic, Arabic, Hebrew
  – Asian, two bytes per character in legacy encodings: Chinese, Japanese, Korean
• Identifying character sets
  – Unicode characters
  – Older usage: language “code pages”
  – HTTP header or <META http-equiv>
  – Statistical detection techniques


27

Indexing: Language Issues

• Text search works across languages
  – Simple pattern-matching, query to index
• Language-specific indexing improves search
  – Tokenizing using appropriate rules
    • Compound nouns (kindergarten)
  – Language rules for stemming
    • Singular version of thés is thé
• Language detection
  – Trusted tags
  – Bilingual dictionaries
  – Statistical matches, n-grams
• Documents may have mixed languages…

28

Indexing: Multimedia

• Images, photos, drawings, sound, scores, video
• External metadata
  – File name
  – Link text, surrounding words
• Internal metadata
  – ID3 tags for music
  – EXIF and other digital photo information
  – Subtitles (sometimes)
• Content
  – OCR to extract graphic text and closed captions
  – Audio: speech-to-text conversion, still buggy
• Use human judgment, not just automated systems


29

Inverted Index Diagram

• Inverted indexes work well
  – Lots of IR research shows this
  – Better than a DBMS
• Alphabetical list of tokens
• Tokens not in paragraph order, thus “inverted”
• Each token has the ID of its source


30

Richer Index Structures

• Store word position (for phrase matching)
• Enclosing tag or field
  – Document metadata
  – Database field names
  – Image (which attribute)
  – Named anchor text
  – Text markup tags (TEI, Semantic Web)
  – Extracted entities
    • Personal names, companies, geo locations, dates
• Anchor text from incoming links
  – Can be very descriptive
  – Add to index as if part of the target document


31

Example Inverted Index Structure

• For each word
  – Document ID
  – Position
  – Tag name
• For each document
  – ID
  – Title
  – URL
  – Description
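The structure above can be sketched with plain dictionaries: a postings list per word holding (document ID, position, tag name), plus a document table with ID, title, URL and description for the results list. The sample documents and the `fields` layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents: display data plus the tagged text fields to index.
docs = {
    1: {"title": "Search basics", "url": "/basics", "description": "Intro",
        "fields": {"title": "Search basics", "body": "an inverted index maps words to documents"}},
    2: {"title": "Index tuning", "url": "/tuning", "description": "Advanced",
        "fields": {"title": "Index tuning", "body": "tuning the search index"}},
}

def build_index(docs):
    postings = defaultdict(list)   # word -> [(doc_id, position, tag_name), ...]
    store = {}                     # doc_id -> title, URL, description
    for doc_id, doc in docs.items():
        store[doc_id] = {k: doc[k] for k in ("title", "url", "description")}
        position = 0               # positions run across fields, so phrases can be matched
        for tag, text in doc["fields"].items():
            for word in text.lower().split():
                postings[word].append((doc_id, position, tag))
                position += 1
    return postings, store

postings, store = build_index(docs)
print(postings["index"])
```

Storing the tag name lets a ranking step weight a title match above a body match.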


32

Indexing: Stopwords

• Stopwords: very common terms
  – Linguistic (a an the as he she it you new)
  – Ubiquitous (names, copyright, click here)
• Consequences of excluding stopwords:
  – Reduces the size of index files
  – Improves recall, finds more matching documents
  – Fails some queries
    • As You Like It, IT copyright policy
  – Problems matching phrases: “New York University”
• Solutions vary:
  – Index everything, pay the price in index size
  – CommonGrams: n-grams of frequent phrases
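A small sketch of the trade-off above. `STOPWORDS` uses the linguistic examples from this slide plus a few assumed extras (of, will, be), and `common_grams` is a simplified version of the CommonGrams idea, not Lucene's actual filter: it keeps every token and also emits a joined bigram whenever either word of an adjacent pair is a stopword.

```python
# Slide examples plus a few assumed extras.
STOPWORDS = {"a", "an", "the", "as", "he", "she", "it", "you", "new", "of", "will", "be"}

def drop_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def common_grams(tokens):
    # Simplified CommonGrams: keep all tokens, and add a bigram for any adjacent
    # pair containing a stopword, so phrases made of common words stay searchable.
    out = list(tokens)
    for left, right in zip(tokens, tokens[1:]):
        if left in STOPWORDS or right in STOPWORDS:
            out.append(f"{left}_{right}")
    return out

query = "as you like it".split()
print(drop_stopwords(query))   # only "like" survives
print(common_grams(query))
```

The bigrams cost index space only for pairs that involve common words, which is why this sits between "drop stopwords" and "index everything".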


33

Stopwords Problems: Example

• Searching wordpress.com for whatever will be
  – Finds all matches for whatever (stopwords ignored)
  – Useless results ranking
  – No matches for will be
  – One ad gets it right
• External search finds over 3,000 pages on the site with the phrase


34

Indexing: Stemming

• Singular query should find plural words & vice versa
  – Shoe <=> shoes, cans <=> can, geese <=> goose
  – Statistical and probabilistic truncation rules
  – Linguistic rules
• Lemmatization: stemming based on part of speech
• Stemming before indexing
  – Improves recall: finds all forms of a word
  – Reduces index size
• Consequences of extreme stemming
  – Short query problems
  – Search for Ran shouldn’t match Run, Lola, Run
• Other options
  – Index everything (makes indexes larger and queries slower)
  – New idea: CommonGrams (n-grams of frequent phrases)
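A deliberately naive stemming sketch: an exception table for irregular plurals plus two suffix-stripping rules. It is not the Porter stemmer or any production algorithm; the `IRREGULAR` table and the length thresholds are assumptions chosen to reproduce the slide’s examples.

```python
IRREGULAR = {"geese": "goose", "mice": "mouse", "children": "child"}  # hypothetical exception table

def naive_stem(word):
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"      # policies -> policy
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]            # shoes -> shoe, cans -> can
    return word                     # ran and run stay distinct

for w in ["Shoes", "cans", "geese", "policies", "class", "ran", "news"]:
    print(w, "->", naive_stem(w))
```

Even this tiny rule set over-stems: `news` becomes `new`, which is also a stopword on the previous slide.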


35

Indexing: Document Store

• Minimum
  – ID (key for the inverted index)
  – Unique location (URL / file path / record ID)
• Richer document store
  – Implicit metadata: filename, size, location
  – Explicit metadata
    • Title, date, keywords, author
    • Taxonomy labels, classification, user tagging
  – Language, character set
  – Access control settings
• Full text of the document
  – For snippets and caching


36

Indexing: Dealing with Duplicates

• Detecting duplicate documents
  – Exact match is fairly easy: checksums
  – Document similarity check: harder but worth it
• Choosing the primary copy
  – Most recent (if reliable)
  – Rules based on path or metadata
  – New web search “canonical” tag
• What to do with duplicates
  – Remove from the index: saves space
  – Hide in results unless requested
    • That’s the Google way
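Both detection methods above can be sketched briefly. The exact-match check hashes normalized text with SHA-256; the similarity check uses Jaccard overlap of word 3-shingles, which is one common technique and an assumption here, not necessarily what any given engine does.

```python
import hashlib
import re

def checksum(text):
    # Normalize whitespace and case so trivial formatting changes still count as exact duplicates.
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def similarity(a, b):
    # Jaccard similarity of word 3-shingles: 1.0 is identical, 0.0 shares nothing.
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

doc = "the quick brown fox jumps over the lazy dog"
print(checksum("Hello  World") == checksum("hello world\n"))
print(similarity(doc, doc + " today"))
```

A near-duplicate threshold (for example, treat anything above 0.8 as the same document) would have to be tuned against real content.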


37

Indexing: Document Dates

• HTTP servers lie about dates
  – Frequent wrong settings: 1969, 2040
  – Dynamic pages send the current timestamp
• File systems lie about dates
• Applications lie about dates
• Indexers do the best they can
  – Metadata (date tag, property, DC.date tag)
  – Extract from page content
  – Checksum to see if the file has changed since the last index
  – Consider an external metadata repository


38

Search Process Flow


39

Where the Queries Come From

• User-entered text in search fields
• Search navigation: moving around in the results list
• Previous searches
  – May just be repeated clicks on a URL
  – Save Search feature
  – Simplistic alerts
• Facet click to add a metadata filter
  – May re-issue the search with additional terms
  – May be navigational, with no text query
• Scripts or automated queries
  – Dynamic links (find all pictures by this artist)
  – Geographic information systems


40

Query Processing Steps

1. Try to recognize the character set and language
2. Tokenize the text by language rules
   – Break at spaces and punctuation
   – Same algorithm as the index tokenizer
3. Check for operators
   – Internet query operators: + - "quotes"
   – Boolean operators: AND OR NOT & | !
   – Others: NEAR, (parentheses)
4. Check for field names, zones, other filters
   – Example: title:lunch location=94703
5. Handle the rare natural language question
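Steps 3 and 4 can be sketched with one regular expression that recognizes quoted phrases, +required and -excluded terms, and `field:value` or `field=value` filters. The output dictionary layout is a hypothetical design; Boolean keywords, NEAR and parentheses are left out to keep the sketch short.

```python
import re

# Quoted phrase (optionally signed) | field:value or field=value | plain term with optional +/- prefix.
QUERY_RE = re.compile(r'([+-]?)"([^"]*)"|(\w+)[:=](\S+)|([+-]?)(\S+)')

def parse_query(query):
    parsed = {"required": [], "excluded": [], "optional": [], "phrases": [], "filters": {}}
    for m in QUERY_RE.finditer(query):
        phrase_sign, phrase, field, value, sign, term = m.groups()
        if phrase is not None:
            target = "excluded" if phrase_sign == "-" else "phrases"
            parsed[target].append(phrase.lower())
        elif field is not None:
            parsed["filters"][field.lower()] = value     # e.g. title:lunch
        elif sign == "+":
            parsed["required"].append(term.lower())
        elif sign == "-":
            parsed["excluded"].append(term.lower())
        else:
            parsed["optional"].append(term.lower())
    return parsed

print(parse_query('title:lunch "free pizza" +menu -dinner location=94703 cafe'))
```

Terms here are only lowercased; following step 2, a real parser would pass them through the same tokenizer used at index time.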


41

Query Expansion

• Stemming
  – Dependent on index stemming choices
  – Good for finding singular/plural forms
• Word similarity searching: increases recall
  – Fuzzy matching
  – Phonetic, Soundex, sound-alike
  – May overwhelm exact matches
• Synonym expansion, should be site-specific
  – bus => coach, ATM => Air Tasking Message
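A sketch of query expansion combining a site-specific synonym table with fuzzy matching against words that actually occur in the index. The `SYNONYMS` table is hypothetical, and `difflib.get_close_matches` from the Python standard library stands in for whatever similarity measure an engine uses; a high `fuzzy_cutoff` keeps sound-alikes from overwhelming exact matches.

```python
import difflib

# Hypothetical site-specific synonym table (would come from the site's vocabulary and logs).
SYNONYMS = {"bus": ["coach"], "atm": ["air tasking message"]}

def expand_query(terms, vocabulary, fuzzy_cutoff=0.8):
    expanded = []
    for term in terms:
        variants = [term] + SYNONYMS.get(term, [])
        # Fuzzy matching: add close spellings, but only ones present in the index vocabulary.
        for word in difflib.get_close_matches(term, vocabulary, n=3, cutoff=fuzzy_cutoff):
            if word not in variants:
                variants.append(word)
        expanded.append(variants)
    return expanded

vocab = ["bus", "buses", "coach", "schedule", "schedules"]
print(expand_query(["bus", "schedul"], vocab))
```

Each inner list is meant to be OR-ed together, while the outer list keeps the user’s original terms separate.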


42

Search: Retrieval, Recall & Precision

• Retrieval
  – Finding the documents matching a particular query
• Recall
  – Finding every relevant document
• Precision
  – Finding only relevant documents
• Balance more recall vs. better precision
  – Use search logs and user studies to guide choices
• Use precision as part of relevance ranking
  – Top results should be more exact matches
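The two measures above are simple ratios once someone has judged which documents are relevant. This sketch uses hypothetical document IDs and judgments; `precision_at_k` reflects the last point by scoring only the top of the ranked list.

```python
def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0   # share of results that are relevant
    recall = hits / len(relevant) if relevant else 0.0        # share of relevant items that were found
    return precision, recall

def precision_at_k(ranked, relevant, k=10):
    """Precision of the top k results only, the part of the list users actually read."""
    relevant = set(relevant)
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / len(top) if top else 0.0

# Hypothetical judgments: the engine returned 4 items; 3 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(p, r)
```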


43

One-Word Text Retrieval

• Fast binary search in inverted index
  – Check index updates on disk or in memory
  – If there are distributed indexes, merge results
• Store the related document information in a list
  – Document ID
  – Term frequency in document
  – Term positions in the document
  – Note: the document list is not yet sorted
• Frequent searches may be cached
  – “Short head” vs. “long tail”
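A sketch of one-word retrieval as described above: binary search (`bisect`) in each segment’s sorted term list, results merged across segments into an unsorted list of (document ID, term frequency, positions), with `functools.lru_cache` standing in for a query cache. The `SEGMENTS` layout is a hypothetical simplification of an on-disk index plus an in-memory update segment.

```python
import bisect
from functools import lru_cache

# Hypothetical index segments: (sorted term list, parallel postings {doc_id: [positions]}).
SEGMENTS = [
    (["apple", "banana", "search"], [{1: [0, 7]}, {2: [3]}, {1: [2], 3: [0]}]),   # main index
    (["search", "zebra"], [{4: [5, 9]}, {4: [1]}]),                               # recent updates
]

def lookup(segment, term):
    terms, postings = segment
    i = bisect.bisect_left(terms, term)        # binary search in the sorted term list
    if i < len(terms) and terms[i] == term:
        return postings[i]
    return {}

@lru_cache(maxsize=1024)                        # frequent ("short head") queries served from cache
def retrieve(term):
    results = []
    for segment in SEGMENTS:                    # merge results from every segment
        for doc_id, positions in lookup(segment, term).items():
            results.append((doc_id, len(positions), tuple(positions)))
    return tuple(results)                       # (doc ID, term frequency, positions), not yet ranked

print(retrieve("search"))
```

A real cache must be invalidated when the index is updated; `lru_cache` knows nothing about index changes.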


44

Multi-Word Text Retrieval

• Relationship between words defines results
  – Boolean AND, + operator, find all default
    • Only documents which contain all terms
  – Boolean OR operator, find any default
    • All documents with any term
  – Boolean NOT, - operator
    • All documents with the first term but not the next term
  – Phrase operators, quotes
    • Only documents with the words as a phrase
  – Also check for zones or field filters
  – Parentheses: use for order of processing
• Merge resulting lists
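The operators above map naturally onto set operations over postings, with positions needed only for phrases. The small positional `INDEX` is hypothetical.

```python
# Hypothetical positional index: term -> {doc_id: [positions]}.
INDEX = {
    "new":        {1: [0], 2: [4], 3: [1]},
    "york":       {1: [1], 3: [5]},
    "university": {1: [2], 2: [0]},
}

def docs_for(term):
    return set(INDEX.get(term, {}))

def search_and(*terms):
    return set.intersection(*(docs_for(t) for t in terms))   # documents with all terms

def search_or(*terms):
    return set.union(*(docs_for(t) for t in terms))          # documents with any term

def search_not(include, exclude):
    return docs_for(include) - docs_for(exclude)             # first term but not the next

def search_phrase(*terms):
    # Each term must appear at the position right after the previous term.
    hits = set()
    for doc in search_and(*terms):
        for start in INDEX[terms[0]][doc]:
            if all(start + i in INDEX[t][doc] for i, t in enumerate(terms)):
                hits.add(doc)
                break
    return hits

print(search_and("new", "york"), search_phrase("new", "york"))
```

Document 3 contains both words but not as a phrase, which is exactly the difference between AND and quotes.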

45

Relevance Ranking Algorithms

• Relevance
  – The likelihood that an item will fill an information need
  – Based on documents in the retrieval list
• Most common algorithm: TF-IDF (term frequency × inverse document frequency)
  – How often is the query word in the document?
  – In how many indexed documents does the word appear? (rarer words weigh more)
• Other relevance algorithms
  – Vectors and document-query similarity
  – Linguistic analysis and Natural Language Processing
  – Statistical and Bayesian analysis
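A sketch of TF-IDF ranking using one common textbook variant: term frequency as count divided by document length, and IDF as the natural log of (number of documents / documents containing the term). Real engines use many different weightings and normalizations; the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Rank docs (id -> list of tokens) by summed TF-IDF of the query terms."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)            # how often the word is in this document
            idf = math.log(n_docs / doc_freq[term])    # rare words across the index weigh more
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {
    "a": "search engine search tips".split(),
    "b": "engine repair manual".split(),
    "c": "garden tips".split(),
}
print(tf_idf_scores(["search", "engine"], docs))
```

Document "a" ranks first because it contains the rarer word "search" twice; a word present in every document gets an IDF of zero and adds nothing.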


46

Relevance Heuristics

• Phrase matches for multiple query terms
  – Logs show most multi-word searches are phrases
• Query terms found in special sections
  – Title
  – Metadata
  – Top of document
• All terms matched in document
  – Even when not relevant, it’s transparent
  – Old systems gave excess weight to single rare terms


47

More on Relevance

• Relevance is task-specific
  – Results can never please all of the people
  – More like berry-picking than like hunting
• Link analysis (PageRank) not very useful
  – Intranet and site links tend to be navigational
• Situation-specific adjustments
  – Some areas more likely to be valuable
  – Current content
  – Local content


48

Federated Search and Relevance

• Send query to multiple search engines
  – May require special syntax
  – Response time often a factor
  – Receive results in relevance order for each
• Display results, two options
  – Separate sections for each search engine
  – Merged single relevance-ranked list
• Works if all search indexes are similar
• Problems where the sources are very different
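Raw scores from different engines are usually not comparable, which is why merged lists have problems when sources differ. One simple approach, assumed here rather than taken from any product, is min-max normalization of each engine’s scores to 0..1 before merging, keeping the best score when two engines return the same item.

```python
def normalize(results):
    """Rescale one engine's (doc_id, score) list to the range 0..1."""
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc_id, 1.0) for doc_id, _ in results]   # all scores equal
    return [(doc_id, (score - lo) / (hi - lo)) for doc_id, score in results]

def merge_federated(result_lists):
    merged = {}
    for results in result_lists:
        if not results:                 # engine timed out or found nothing
            continue
        for doc_id, score in normalize(results):
            # The same item from two engines keeps its best normalized score.
            merged[doc_id] = max(score, merged.get(doc_id, 0.0))
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

engine_a = [("a1", 12.0), ("a2", 6.0), ("a3", 0.0)]
engine_b = [("b1", 0.8), ("b2", 0.4), ("a2", 0.0)]
print(merge_federated([engine_a, engine_b, []]))
```

Normalization hides the scale difference but not a quality difference: the top item from a weak source still lands at 1.0.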


49

Retrieval: Access Control

• Limit access to search itself
  – User enters password or other credentials
  – Search only accepts queries when authenticated
• Collection-level access control
  – Query filter only retrieves items from allowed groups
• Hit-level access control
  – Real-time check for user access on documents
  – Start with the most relevant documents
  – Repeat until there are ten (may be slow)
  – Display top results, include an estimate of how many more
  – Show a helpful message if the user can’t see any
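The hit-level loop above can be sketched as: walk the ranked list from the top, check each document in real time, and stop once a page is full. `can_view` is a hypothetical callback standing in for whatever ACL check the repository provides; the unchecked remainder gives an upper bound for the “how many more” estimate.

```python
def visible_results(ranked_doc_ids, user, can_view, page_size=10):
    shown, checked = [], 0
    for doc_id in ranked_doc_ids:              # start with the most relevant documents
        checked += 1
        if can_view(user, doc_id):             # real-time access check (may be slow)
            shown.append(doc_id)
            if len(shown) == page_size:        # stop once the page is full
                break
    remaining = len(ranked_doc_ids) - checked  # not yet checked: at most this many more
    return shown, remaining

# Hypothetical ACL: this user may only see even-numbered documents.
ranked = list(range(1, 31))
shown, more = visible_results(ranked, "pat", lambda user, doc_id: doc_id % 2 == 0)
print(shown, "up to", more, "more")
```

An empty `shown` list is the case where the interface should show a helpful message instead of a bare “no results”.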


50

Search User Experience

• Limit user interface complexity
  – Show the scope of the information covered
  – Expose query expansion and contraction
  – Use familiar UI elements
• User experience goes beyond interface
  – Index coverage
  – Query syntax
  – Retrieval quality and speed
  – Relevance ranking (first ten are vital)


51

Search Forms Interface

• Balance simplicity with functionality
• Put a search field in the navigation bar
  – Location should be consistent
  – Longer is better: short fields lead to short queries
• Simple search forms: limit options
  – Zone or section
  – Dates


52

Search Field Auto-Complete

• Dropdown menu of matching words
• Based on search logs
• Smallish list, 7-10
  – Most popular
• Simple sort
  – Alphabetic
  – Price or size
  – Complete range (preferably lowest to highest)
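A sketch of log-based auto-complete: count normalized queries from the search logs, then offer the most popular ones that start with what the user has typed, alphabetically within equal counts. The function names and sample log are hypothetical, and the linear scan would be replaced by a trie or sorted index at scale.

```python
from collections import Counter

def build_suggester(logged_queries):
    counts = Counter(q.strip().lower() for q in logged_queries if q.strip())

    def suggest(prefix, limit=8):
        prefix = prefix.strip().lower()
        matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
        matches.sort(key=lambda item: (-item[1], item[0]))   # most popular, then alphabetic
        return [q for q, _ in matches[:limit]]

    return suggest

# Hypothetical search log entries.
suggest = build_suggester(["vacation policy", "Vacation Policy ", "vacation request",
                           "van pool", "vacation policy", "payroll"])
print(suggest("VA"))
```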


53

Other Search Interfaces

• Heavily researched
• Natural language
  – Must keep typing
  – Defining a question is quite hard
• Interactive search
  – Guided interviews
  – But users want immediate results
• Avatars do not improve interaction


54

Simple vs. Advanced Search UI

• Most searches are simple
  – Short: one to three words
  – Fewer than 10% use any operators at all (maybe 1%)
  – Even experts prefer simple search
    • Will use advanced tools if simple doesn’t work
• Default to simple search, link to advanced search
  – Those are your power users: librarians, techies
  – Expose all possible options
  – Don’t spend huge resources on advanced UI
• Exploratory search is different


55

Advanced Search Fits Sometimes

• eBay
  – High motivation
  – Complex search requirements
  – Frequent use
• UX testing still required


56

Search Results: Page Elements

• Site context
  – General page layout, navigation links
  – Colors and design elements
• Results header
  – A search field, with the current search terms
  – Retrieval information: how many hits
• Results list in relevance order
  – Each result item with at least a linked title
• Facets: dynamic links for filtering results
• Results footer


57

Search Results: Good Example

• Full but readable
• White space
• Content blocks
• Site look-and-feel
• Navigation
• Familiar search results elements


58

Search Results: Not-So-Good Example

The site page has navigation and colors: search results pages should too

59

Search Results: Visualization

• Fascinating to look at, great demos
  – Star charts
  – Topographical displays
  – Interactive fly-throughs
  – Hyperbolic trees
• Require significant resources to run
• Good for exploratory & comprehensive research
  – Finding unexpected synergies
• Simple search is much cheaper for casual users


60

Search Results: Header Elements

• Search field, with the current query
  – Users often edit it to be more or less restrictive
• Number of results found
• A few search options
  – Match Any Word / All Words / Exact Phrase
  – Filter by date option (if trustworthy)
  – Search zones
• Results navigation
• Best Bets
• Spelling suggestions


61

Search Results: Hits and Pages

• Show the number of items matched
  – Be accurate
  – Do not give estimates for small numbers
    • (Google and SharePoint are bad this way)
• Pagination: results list navigation
  – Helps user calibrate content
  – Important for exploratory search
  – Follow web search conventions, for example:
    < previous 1 2 3 4 ... 26 next >
  – Be accurate
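A sketch that produces the conventional pagination row shown above from an exact hit count: always show the first and last page, a small window around the current page, and “...” for gaps. The function name, `per_page` and `window` defaults are assumptions.

```python
import math

def page_links(total_hits, current, per_page=10, window=2):
    """Pagination labels, e.g. page_links(255, 2) gives
    ['< previous', '1', '2', '3', '4', '...', '26', 'next >']."""
    last = max(1, math.ceil(total_hits / per_page))
    pages = {1, last} | set(range(max(1, current - window), min(last, current + window) + 1))
    labels = ["< previous"] if current > 1 else []
    prev = 0
    for page in sorted(pages):
        if page - prev > 1:
            labels.append("...")                # gap in the page sequence
        labels.append(str(page))
        prev = page
    if current < last:
        labels.append("next >")
    return labels

print(" ".join(page_links(255, 2)))
```

Because the last page number is computed from the real hit count, the row stays accurate rather than estimated.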


62

Results Headers: Examples


63

Search Results: “Best Bets”

aka Search Suggestions, QuickLinks, KeyMatch, Recommendations

• Special-case links for problem queries
  – Internal topic landing pages
  – External sites when appropriate
  – New and better query to search
• Only implement for very frequent queries
  – Discover problems from users, log analysis
  – “Short head”: a few very popular query terms
  – Allocate resources to keep them current
• Good search results are higher priority


64

Best Bets Example

• Best Bets are very clear

• Would not come first in normal search results


65

Search Results: List Sorting

• List of links to items matching the query
• Sorted by matching terms
  – Impossible to be relevant to every query
  – Variety of sources when possible
  – Transparency: why these items in this order
• Other sort orders: make very visible
  – By author’s last name
  – By date
  – By price


66

Search Results: Not Enough Variety


67

Search Results: Weird Sort

Sorted by: “Degrees away”

Labels too subtle:
• Hidden in header
• Degree icon should be on the left side


68

Result Items: Elements

• Information foraging: show hints about items
• Title of document, or name of product
• Location: URL, file path, database ID
  – May need to rewrite to user-accessible URLs
  – Hide location if it’s not meaningful
• Distinguishing data
  – Metadata: picture, product code, author name
• Show match terms in context (snippets)
  – Text before and after query term matches
  – Highlight the matches
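A snippet sketch: find the first query-term match, take some text before and after it, HTML-escape it, and wrap the matches in `<b>` tags for highlighting. The function name, the `context` width in characters, and the choice of `<b>` markup are assumptions; a real snippet builder would also trim to word boundaries and prefer passages matching several terms.

```python
import html
import re

def snippet(text, terms, context=30):
    """Text around the first query-term match, HTML-escaped, with matches in <b> tags."""
    if not terms:
        return html.escape(text[:2 * context])
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return html.escape(text[:2 * context])      # no match: fall back to the opening text
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    # re.split with one capturing group puts the matched terms at the odd indexes.
    parts = pattern.split(text[start:end])
    body = "".join(f"<b>{html.escape(p)}</b>" if i % 2 else html.escape(p)
                   for i, p in enumerate(parts))
    return ("…" if start > 0 else "") + body + ("…" if end < len(text) else "")

print(snippet("Our vacation policy explains how to request vacation days.", ["vacation"], context=10))
```

Escaping each piece separately keeps document text such as `Q&A` from breaking the results page HTML.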


69

Results Items: Not Enough Content


70

Results Items: Too Much Content


71

Results Items: Just Right


72

Results Items: Additional Data

• Date (if reliable)
• Size and file type
  – Avoid surprising launches of Acrobat or other applications
• Metadata
  – Author, department, brand, product...
• Access status: password required?
• Topics and subject headings
  – Taxonomy categories
  – Keywords and concept tags
  – User tags, folksonomy


73

Results Items: Rich Items Example


74

Results: Dynamic Clustering

• Uses search results text to infer topics
  – Groups by similarity in titles and results text
• Particularly good for portals and intranets
  – Unstructured, uncontrolled text
  – Dynamic, no preprocessing needed
• Can supplement categorization and taxonomies


75

Results: Clustering Example


76

Commerce and Catalog Results

• Picture or graphic if possible
• Important attributes
  – Price
  – Color
  – Size
  – Compatibility
  – Availability
• “Buy” button
  – Simplify process, save time


77

Online Store Results Example


78

Multimedia Search

• Image, audio, and video files
  – Audio and visual similarity search still theory
• Show context in results
  – Match terms from transcript or OCR
  – Text around image
  – Thumbnails or keyframes


79

Multimedia Results Example


80

Results: Faceted Metadata

• Better than forms for structured text data
  – Exposes attributes as part of search results
  – Leverages metadata
    • Topic names, taxonomy
    • Mundane stuff: color, date, size, author...
• Choices specifically relating to search results
  – Dynamically generated from metadata
  – Preview numbers give users confidence in clicking
• Supported by extensive usability testing
• Used on a majority of large e-commerce sites
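A sketch of how facets are generated dynamically from the current results’ metadata: count each field’s values across the result set to get the preview numbers, and apply a clicked value as a filter. The result records and field names are hypothetical.

```python
from collections import Counter

def facet_counts(results, fields):
    """Count metadata values across the current result set, for links like 'red (2)'."""
    return {field: Counter(item[field] for item in results if field in item).most_common()
            for field in fields}

def apply_filter(results, field, value):
    return [item for item in results if item.get(field) == value]

# Hypothetical search results with metadata.
results = [
    {"id": 1, "color": "red", "size": "M"},
    {"id": 2, "color": "blue", "size": "M"},
    {"id": 3, "color": "red"},
]
print(facet_counts(results, ["color", "size"]))
red = apply_filter(results, "color", "red")
print(facet_counts(red, ["color", "size"]))     # counts update after the facet click
```

Because the counts come from the results themselves, a facet can never lead to an empty page.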


81

Why Faceted Search is Better Than Forms


82

Faceted Metadata: Commerce Example


83

Faceted Metadata: Library Catalog


84

No Matches Queries: Causes

• Misspellings and typing errors
• Scope problem: nothing for that topic
• Vocabulary differences
  – Users may be less precise, or use competitors’ terms
  – Marketers may dominate content
• Restrictive search settings
  – Default may only match exact phrase or all words
  – Access control may disallow user
• Software/hardware/network failures


85

No Matches Queries: Responses

• Track queries with no matches in logs
• Use sessions, surveys & testing to find user intent
• Design the no-matches page carefully
  – Explain what is and isn’t on the site
  – Provide useful navigation links
• Add search engine help
  – Synonyms
  – Best Bets
  – Spelling
• Add terms to text
• Add content, topic pages


86

No Matches Queries: Spelling Issues

• Detect and address common problems
  – Spelling errors
  – Typos
  – Queries without spaces between words
• Use a site-specific dictionary
  – Easy to build from the search index
  – Never suggests any words not on the site
• Users are familiar with “Did you mean...?”
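A “Did you mean” sketch using a site-specific dictionary: the vocabulary comes from the search index, so suggestions only use words on the site. `difflib.get_close_matches` stands in for a real spelling algorithm, and the split check handles queries typed without a space. The cutoff and sample vocabulary are assumptions.

```python
import difflib

def did_you_mean(query, vocabulary, cutoff=0.75):
    """Suggest a correction built only from words that exist in the site index."""
    vocab = set(vocabulary)
    suggestion = []
    for word in query.lower().split():
        if word in vocab:
            suggestion.append(word)
            continue
        close = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        if close:
            suggestion.append(close[0])           # misspelling or typo
            continue
        # Query typed without a space: try splitting into two indexed words.
        split = next((f"{word[:i]} {word[i:]}" for i in range(1, len(word))
                      if word[:i] in vocab and word[i:] in vocab), None)
        suggestion.append(split or word)
    result = " ".join(suggestion)
    return result if result != query.lower() else None

vocab = ["employee", "benefits", "handbook", "hand", "book", "payroll"]
print(did_you_mean("employe benifits", vocab))
```

Returning `None` when nothing changed lets the results page skip the suggestion line entirely.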


87

Good Example of No-Matches Page


88

Empty Searches

• Users click or press “enter” with an empty search box
• Test for this special case
  – Should not find all items in the index
• Interaction options:
  – Do nothing
  – Go to a simple search page
  – Show an error dialog


89

Search Engine Maintenance

• Index maintenance
  – Obsolete content removal
  – Check for new content
  – Track technical problems (bad links, servers down)
• Search quality
  – Re-run test suite
  – Compare with original results
  – Add new test queries
• Track user feedback, surveys
  – Use metrics and log analysis to catch trends


90

Metrics for Search Engines

• Server uptime
• Errors: how often and how serious
• Index
  – Size on disk and in memory
  – Number of entries
  – Number and type of indexing errors
• Search traffic
  – Queries per minute (60 qpm is common)
  – Average clicks on results items per query
  – Average next-page views per query
  – Number and percent of no-match queries


91

Search Log Analysis

• Most frequent query terms
  – Short head: a few very popular terms
  – Long tail of unique queries
  – Lots of junk: URLs, spam, gibberish
• Frequent query terms not matched: fix somehow
• More esoteric analysis: needs a lot of data
  – Frequent query terms with low click-through
  – Frequent query terms with high “next page” clicks
• Raw logs
  – Import into a database for ad-hoc reports
  – Session analysis can be enlightening
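A log-analysis sketch covering the basic reports above: the short head of frequent queries, frequent no-match queries, the no-match rate, and queries that found results nobody clicked. It assumes a hypothetical log already parsed into (query, hit count, clicks) tuples; real log formats vary by engine.

```python
from collections import Counter

def analyze_log(entries, top_n=3):
    """entries: (query, hit_count, clicks) tuples from a parsed search log."""
    rows = [(q.strip().lower(), hits, clicks) for q, hits, clicks in entries if q.strip()]
    queries = Counter(q for q, _, _ in rows)
    no_match = Counter(q for q, hits, _ in rows if hits == 0)
    no_click = Counter(q for q, hits, clicks in rows if hits > 0 and clicks == 0)
    return {
        "top_queries": queries.most_common(top_n),                 # the short head
        "no_match": no_match.most_common(),                        # fix with synonyms, content, spelling
        "no_match_rate": sum(no_match.values()) / len(rows) if rows else 0.0,
        "no_click": no_click.most_common(),                        # results found, nothing clicked
    }

log = [("benefits", 120, 1), ("Benefits", 120, 0), ("vacation policy", 8, 1),
       ("vacaton", 0, 0), ("", 0, 0), ("benefits", 120, 1), ("parking", 0, 0),
       ("vacaton", 0, 0)]
report = analyze_log(log, top_n=2)
print(report)
```

Empty queries are dropped before counting; they are worth tracking separately as the empty-search case.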


92

Choosing a Search Engine

• Find specific information needs
• Analyze content
  – Sources and formats
  – Rough number of pages / records / items
• Define platform, API, language requirements
• Buy (or use open source), don’t build
  – User surveys show problems with home-grown search
• Choose & compare likely candidates
  – Gathering, indexing, retrieval, relevance features
  – Scaling
  – Administration tools
  – Continuing development, support, user groups
  – Price


93

Information Needs Analysis

• What works already?
  – Don’t fix what’s not broken
• Where is the real pain?
  – Difficult search syntax
  – Data silos
  – New content not findable
• What requires more complex tools?
  – Exploratory search
  – Scientific & academic research
  – Business intelligence and data mining
  – Comprehensive legal discovery


94

Content Inventory

• Work with Information Architects
  – Use existing taxonomies and catalogs
• Learn what you have
  – Simple static HTML pages
  – Other formats: PDF, Office documents (which version)
  – CMS, document management, publishing systems
  – Databases and legacy systems
  – Multimedia audio and video files
• Identify more and less valuable data
  – Some content should be in archives


95

Search Engine Deployment Types

• Software
  – Controlled by local IT
  – Flexible installation
  – Open source: several high-quality packages
• Search Appliances
  – Server hardware/software combinations
  – Require very little technical attention
  – Check development and backup server pricing
• Remote Search Services (SaaS)
  – Index using robot spiders or remote access
  – Query goes to service, results go back to user
  – Low network, hosting, IT load


96

Scaling Search to Millions & Billions

• What are the largest installations for each?
  – Talk to them before committing
• Cache frequent queries
• Add query servers, automated load balancing
• Indexing at scale
  – Indexing on dedicated servers
  – Deal with new calls for near-real-time indexing
  – Distribute multiple clones of indexes
  – Segment indexes, parallel lookups, merge results


97

Testing Search Indexing

• Choose 3-4 good candidates
• Index as much content as possible
  – Watch the robot, track errors
  – Try to index tricky data sources
  – Compare coverage among them
• Test index scaling
  – Make a really big index based on expected use
  – Speed of add / update / delete
  – Responsiveness during a big update


98

Evaluating Search Results

• Create a query test suite
  – Use existing search logs if possible
  – Short, long, unusual, common (check cache)
  – Simple and complex queries
  – Spelling, typing and vocabulary errors
  – Many matches, few matches, no matches
• Perform searches against the test engines
  – Save results pages as HTML for later checking
• Analyze differences among them
  – Retrieval (and indexing): what’s found?
  – Relevance: are the top results good ones?


99

Search: Not a Black Box

• Simple search solves many enterprise problems
  – Dynamic access to local content
  – Familiar interface, expectations
  – User vocabulary
• Understand the real information needs
  – Index the right stuff
  – Work with content providers and IAs
• Link to specialty research engines
• Learn from users over time, make it better


Date post:	28-Jan-2015
Category:	Technology
Upload:	search-tools-consulting