Web Content Development Dr. Komlodi Classes 20-21: Search systems.

Date post: 02-Jan-2016
Upload: ralf-todd

Web Content Development

Dr. Komlodi

Classes 20-21: Search systems

Web Searching

• Search within your site:
  – Full site or subsites
  – www.jhu.edu, www.umbc.edu

• Web search:
  – Search indexes of web pages
  – www.google.com

• Metasearch:
  – Searching across multiple search engines
  – clusty.com, www.dogpile.com, www.myriadsearch.com

• Web Search Engine Watch: http://searchenginewatch.com/

Does your site need a search?

• Pp. 145-148
  1. Sufficient content
  2. Sufficient resources
  3. Time and know-how to optimize system
  4. Better alternatives?
  5. Will users bother with it?
  6. Too much information to browse
  7. Fragmented site
  8. Learning tool
  9. User expectations
  10. Dynamism

• Post bullets on Blackboard discussion board

Why should an IA worry about search?

• You know the users

• Many decisions should be user-centered and not technology-centered

• It has an interface

How does search work?

©2004 Google Source: http://www.google.com/technology/pigeonrank.html

How does search work?

Diagram components:
• Searchers
• Search Interface / Queries
• Documents
• Indexing (manual or automatic)
• Indexes
• Search Engine
• Matching
• Results

How does web search work?

Diagram components:
• Searchers
• Search Interface / Queries
• Documents = Web sites & pages
• Indexing = Automatic: spiders & robots crawl websites and index pages according to their own rules. As a result, they build large databases containing the indexes.
• Indexes
• Search Engine
• Matching = queries to search engine indexes
• Results
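
The crawl-and-index step can be sketched roughly as follows. This is only an illustration under stated assumptions: the PAGES dictionary is a made-up, in-memory stand-in for the web (a real spider fetches pages over HTTP and follows its own rules), and the names LinkAndTextParser and crawl_and_index are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch pages over HTTP.
PAGES = {
    "/home": '<a href="/courses">Courses</a> <a href="/about">About</a> Welcome students',
    "/courses": '<a href="/home">Home</a> Search systems course',
    "/about": "About the department",
    "/orphan": "No page links here",
}


class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and visible words from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl_and_index(start_url, pages):
    """Breadth-first crawl from start_url; returns word -> set of URLs."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # broken link: nothing to index
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any buffered text
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl_and_index("/home", PAGES)
```

Note that "/orphan" never reaches the index because no crawled page links to it, which is one reason linking structure matters to search engines.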

What to search on…

• All the content?
• Determining search zones
• Site search:
  – Subsite
  – Type of document
• Web search:
  – Multimedia and heterogeneous
• Full-text or metadata
• Types of indexes

Indexing by

• Navigation vs. destination pages

• Audience or Reading level

• Topic

• Date of update

• Author

• Title

• User task

What would the index look like?

Full Text Indexing

• Take out frequent words from documents

• List the rest of the words from each document

• May add frequency numbers to each word

• Search the lists of words
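
The four steps above can be sketched in a few lines. This is a minimal illustration, not how any particular engine does it: the stop-word list, the sample documents, and the function names are all assumptions made for the example.

```python
from collections import Counter

# Assumed stop-word list for illustration; real systems use longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def build_index(documents):
    """Return word -> {doc_id: frequency}, skipping stop words."""
    index = {}
    for doc_id, text in documents.items():
        counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
        for word, freq in counts.items():
            index.setdefault(word, {})[doc_id] = freq
    return index


def search(index, word):
    """Look up one word; returns doc ids sorted by frequency, highest first."""
    postings = index.get(word.lower(), {})
    return sorted(postings, key=lambda d: (-postings[d], d))


# Made-up documents for the example.
docs = {
    "d1": "The design of search systems for the web",
    "d2": "Search engines index the web; search is everywhere",
    "d3": "An introduction to information architecture",
}
full_text_index = build_index(docs)
```

Here search(full_text_index, "search") lists d2 before d1, because the word occurs twice in d2 and once in d1.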

What would the index look like?

Indexing Languages

• An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.

• An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.

Web Search Engine Indexes

• The larger a web search engine’s index is, the more web pages it can return and the more types of queries it can accommodate

• However, quantity is just one measure of performance

• How to compare:

• http://www.google.com/help/indexsize.html

• Try this!

Search Interface

• Shneiderman, Byrd, Croft, Clarifying Search, D-Lib Magazine, 1997

• Formulation:
  – Sources
  – Fields
  – What to search for
  – Variants

• Action
• Review of results
• Refinement
• Let’s see Google’s advanced search

Query Format

• Boolean:
  – Good for advanced users
  – Precise, and clear why you got results back
  – Need to understand syntax

• “Natural Language”:
  – Good for difficult questions when you can’t think of terms
  – Or for novice users
  – Difficult to know why certain results come back
  – Black box

• Relevance Feedback:
  – User selects relevant items from results
  – Search engine considers these in reformulating the query

• Similarity Retrieval:
  – Similar to relevance feedback
  – “I want more like this”
  – Both are good if you don’t know exactly what you are looking for
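
One simple way to picture relevance feedback: the user marks some results as relevant, and the engine adds the most frequent new words from those results to the query. The sketch below assumes exactly that (plain term counting); real engines use more refined weighting, and the function name, stop-word list, and sample texts are hypothetical.

```python
from collections import Counter

# Assumed stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with"}


def expand_query(query_terms, relevant_docs, extra_terms=2):
    """Add the most frequent new words from user-selected documents to the query."""
    counts = Counter()
    for text in relevant_docs:
        for word in text.lower().split():
            if word not in STOP_WORDS and word not in query_terms:
                counts[word] += 1
    # Sort by frequency (highest first), then alphabetically for a stable result.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return list(query_terms) + ranked[:extra_terms]


# Results the user marked as relevant (made-up text).
selected = [
    "jaguar habitat in the rainforest",
    "rainforest animals and jaguar habitat loss",
]
new_query = expand_query(["jaguar"], selected)
```

The reformulated query now carries "habitat" and "rainforest", steering the next search away from other senses of the original word.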

Boolean

Natural Language

Relevance Feedback

Source: http://nayana.ece.ucsb.edu/imsearch/imsearch.html Accessed January 2007.

Similarity Retrieval

Other Query Building Tools

• Citation networks:
  – What does this page/paper cite/link to?
  – What cites/links to this page/paper?
  – What other papers/pages cite/link to the same papers/pages?
  – http://portal.acm.org/dl.cfm

• Spell checkers in queries
• Phonetic tools
• Stemming tools
• Controlled vocabularies
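
A rough sketch of three of these tools, using only the Python standard library. Everything here is an assumption for illustration: INDEX_TERMS and PREFERRED are made-up data, and naive_stem is a deliberately crude stand-in for a real stemming algorithm.

```python
import difflib

# Hypothetical vocabulary taken from a site's index.
INDEX_TERMS = ["search", "index", "metadata", "navigation", "taxonomy"]

# Hypothetical controlled vocabulary: variant phrase -> preferred term.
PREFERRED = {"high blood pressure": "hypertension"}


def suggest_spelling(word, vocabulary=INDEX_TERMS):
    """Return the closest known term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else None


def naive_stem(word):
    """Crude suffix stripping; real stemmers have many more rules."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preferred_term(phrase):
    """Map a variant phrase to the controlled-vocabulary term, if one exists."""
    return PREFERRED.get(phrase.lower(), phrase)
```

For example, suggest_spelling("serach") proposes "search", and naive_stem maps "searching" and "searches" to the same stem so both match the same index entry.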

Matching

• Boolean – AND, OR, NOT

• Probabilistic

• Vector model (calculate weights of words)

• Natural Language – process the query as well, match lists
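
Two of these matching approaches can be sketched on a toy collection. Boolean matching maps directly onto set operations over an inverted index. For the vector model, the example assumes a simplified tf-idf weighting (term frequency times inverse document frequency), which is one common choice and not necessarily what a given engine uses. The documents and names are made up.

```python
import math

# Toy collection; the documents are made up for illustration.
DOCS = {
    "d1": "web search engines index web pages",
    "d2": "site search helps users find pages",
    "d3": "library catalog records",
}

# Inverted index: word -> set of document ids.
INDEX = {}
for doc_id, text in DOCS.items():
    for word in text.split():
        INDEX.setdefault(word, set()).add(doc_id)


def docs_with(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return INDEX.get(word, set())


# Boolean matching as set operations.
both = docs_with("search") & docs_with("pages")    # search AND pages
either = docs_with("web") | docs_with("catalog")   # web OR catalog
without = docs_with("search") - docs_with("web")   # search NOT web


def tf_idf_score(query, doc_id):
    """Weighted score: sum over query terms of tf * idf."""
    words = DOCS[doc_id].split()
    score = 0.0
    for term in query.split():
        df = len(docs_with(term))  # number of documents containing the term
        if df == 0:
            continue
        tf = words.count(term)     # occurrences of the term in this document
        idf = math.log(len(DOCS) / df)
        score += tf * idf
    return score


ranked = sorted(DOCS, key=lambda d: tf_idf_score("web search", d), reverse=True)
```

Boolean matching only says whether a document matches; the weighted score also orders the matches, so d1 (which uses the rarer word "web" twice) ranks above d2.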

Results Presentation

• How many?

• How much information about each item?

• What can users do with each item?

• Presenting results by categories

Evaluation of Search Engines

Your book is wrong on page 159!!!

Recall = relevant retrieved documents / all relevant documents in collection

Precision = relevant retrieved documents / all retrieved documents
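
A small worked example of the two measures, with made-up numbers: suppose the collection holds 10 relevant documents and the engine returns 8 documents, 6 of which are relevant. The function name and document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant retrieved documents
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: doc1..doc10 are relevant; the engine returns doc5..doc12.
relevant = {f"doc{i}" for i in range(1, 11)}
retrieved = {f"doc{i}" for i in range(5, 13)}
precision, recall = precision_recall(retrieved, relevant)
```

Here precision is 6/8 = 0.75 and recall is 6/10 = 0.6: the two measures share a numerator and differ only in the denominator.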

Copyright Dr. David Grossman, Source: http://ir.iit.edu/~dagr/cs529/files/handouts/01Introduction-6per.PDF

Within-Site Search Bloopers 1

1. Baffling search controls. Search options require knowledge of computer or industry-insider concepts.

2. Dueling search controls. Competing search boxes on page, with no guidance.

3. Hits look alike. List of found items cannot be easily distinguished by scanning.

4. Duplicate hits. List of found items contains duplicates.

5. Search myopia: Missing relevant items. Items that should be found are not.

http://www.web-bloopers.com/

Within-Site Search Bloopers 2

6. Needle in a haystack: Piles of irrelevant hits. Many items don’t match search criteria.

7. Hits sorted uselessly. Sort-order of found items doesn’t support user tasks.

8. Crazy search behavior. Modifying search criteria yields unexpected results.

9. Search-terms not shown. Not showing what search terms produced these results.

10. Number of hits not revealed. Not showing how many items were found.

http://www.web-bloopers.com/

Search User Interface Design Recommendations 1

• Put a simple, reasonably long search field on every page of the site. (Nielsen: min. 27 characters long)

• Use simple words to explain the process: remove all jargon and technical terms, and make sure that any icons have labels.

• Avoid inventing a new interface, which will confuse users: take the best of the formats of the large public search engines

• Make the search forms and results pages fit into the overall design of the web site: they should use the same colors, fonts and so on.

http://www.searchtools.com/info/user-interface.html

Search User Interface Design Recommendations 2

• Include site names and navigation links in results pages, so users can see the context and structure of the site.

• Set up a special page to be displayed when the search does not find any matches in the index

• Avoid surprises: clarify all automated search features, such as stemming, phonetic matching, thesaurus lookups and stopwords

http://www.searchtools.com/info/user-interface.html

How Search Should Work (PWU Ch5)

• Follow the standards of the large search engines:
  – Search box (min. 27 characters) and a button in the top right corner of the page
  – Search box on every page
  – Linear results in order of relevance

• Users expect search to be a keyword search and not other types of searches (by types of clothing, size, season, etc.)

• Advanced search should be a secondary option or omitted

• Scoped search is useful if your site has distinct sections

• Do not default the search to a scope

Search Engine Results Pages (PWU Ch5)

• Copy the design of major search engines
• List results in relevance order but no need to show measure of relevance
• If appropriate, allow users to re-sort results
• Each result should start with a clickable headline
• Follow headline by 2-3-line summary
• Include a search box with the user’s query in it to make query reformulation easier

Design of No-Matches Pages

• Site Context and Navigation
  – Instead of a bare page saying that the search failed, show the standard site layout, including background colors, logos, text and link colors, and navigation links.
  – If you have a site map or Yahoo-style directory for your site, include it in the no-matches page; otherwise you may want a statement of the site scope. That provides a positive way to help people understand what is available, and browse if they choose.

• Search Again Field
  – Make sure there is a Search field, so people can try a different search. Don't make them click a link or otherwise take an extra step to search again.

• Suggested Wording
  – Include some text that explains why the search might have failed, and what people can do next. This list is carefully worded to be positive and helpful, rather than blaming the user for the search failure. For example: Your search returned no results. Try broadening your search (from heart attack to heart disease) or adding additional terms (from high blood pressure to high blood pressure or hypertension).

http://www.searchtools.com/guide/nomatches.html

Search UI Design Exercise

• Work in pairs

• Select an imaginary website

• Design on paper:
  – A homepage with a search box
  – A search results page
  – A no-hit page

Search Engine Optimization

How do search engines find you?

• Search engine optimization:
  – Changing your site to improve the site’s ranking in search results

• Search engine submission:
  – Submitting your site to search engines to make sure the engines know about it

• Search engine marketing/promotion: the process of submitting (free or paid) and search engine optimization

• http://blog.searchenginewatch.com/090402-110851 (From 2:12)

Search Engine Submission

• Yahoo’s human-compiled directory listings (http://help.yahoo.com/l/us/yahoo/ysm/ds/index.html):
  – Crawlers look at those pages
  – Free for normal review
  – $299 for expedited review and commercial listing (no guarantee of listing)

• Google:
  – Free but not guaranteed (http://www.google.com/addurl/?continue=/addurl)
  – Or use AdWords for payment (http://www.google.com/ads/)

• Yahoo ads submission:
  – Yahoo sponsored search (http://searchmarketing.yahoo.com/arp/sponsoredsearch_ss.php?o=US1806&cmp=SYC&ctv=&s=Y&s2=S&b=25)
  – Pay by the number of clicks

Search Engine Optimization

• Linguistic SEO:
  – Research what words users use for your content:
    • Search engine logs, user testing, support calls, discussion forums
  – Use those words to describe your content on your pages and in the metadata

• Architectural SEO:
  – Make sure your important content is text
  – Make sure your linking structure leads search engine indexing crawlers to important content

• Reputation SEO:
  – Make sure other sites link to you

Search Engine Optimization

• Study your guideline
• Create a few bullet points to describe your guideline and post them on the discussion board
• Sources:
  – http://searchenginewatch.com/webmasters/article.php/2168021
  – http://searchenginewatch.com/webmasters/article.php/2167931

