User Interface Design LBSC 708A/CMSC 838L Douglas W. Oard Session 5, October 9, 2001.



Agenda

• Muddiest points and questions

• Query formulation

• Selection

• Examination

• Document delivery

• Inference networks (4)

• Bayes Theorem

• Probability computation

• Relationship to vector space model

• What if a query term is in no document?

• The mud example

Muddiest Points

Problematic Assumptions

• Term independence (9)

• Binary relevance (3)

• Relevance rather than utility (2)

• Prior probability

• Relationship between terms and concepts

Retrieval System Model

[Diagram: the user moves through Source Selection, Query Formulation, Search, Selection (over a ranked list), Examination, and Document Delivery; on the IR system side, Acquisition builds the Collection and Indexing builds the Index that Search consults.]

Query Formulation

[Diagram: detail of the Query Formulation stage; the user formulates a query, which Search runs against the Index.]

Some Facts About Queries

• IR research shows that long queries are better
– Early TREC queries filled a page

• Search engine logs show mostly short queries
– Averaging just over 2 words per query
– Very few use “advanced” query interfaces
– Almost nobody reads “help” screens

• Why don’t people do what’s good for them?
– And what can we do about it?

A Conceptual Framework

• Four perspectives on “information needs”
– Visceral
• What you really want to know
– Conscious
• What you recognize that you want to know
– Formalized (e.g., TREC topics)
• How you articulate what you want to know
– Compromised (e.g., TREC queries)
• How you express what you want to know to a system

Compromising Information Needs

• Direct translation from a conscious info need
– End users rarely formalize their needs first

• Constrained by perceived system capabilities
– Vocabulary
• Guessed, found in earlier searches, or from a thesaurus
– Structure
• Length, operators for combining terms, syntax

• Users learn systems and topics by exploring

Things That Help

• Encourage longer queries
– Provide a large text entry area

• Provide examples
– e.g., for good pizza, type +Chicago +“deep dish”
– Examples related to the last query are best

• Offer lists of related terms
– Terms with high IDF in the last retrieved set
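The related-terms idea can be sketched in a few lines: rank the terms that occur in the last retrieved set by their IDF in the whole collection. The function name, the data shapes, and the log-based IDF formula here are illustrative assumptions, not the slide's specification.

```python
import math

def suggest_related_terms(retrieved_docs, collection_size, doc_freq, k=5):
    """Suggest terms from the last retrieved set that are rare in the
    collection overall (high IDF). `doc_freq` maps term -> number of
    collection documents containing it."""
    # Gather every term that occurs in the retrieved set.
    seen = set()
    for doc in retrieved_docs:
        seen.update(doc.lower().split())
    # Score each term by IDF over the whole collection; rare terms rank high.
    scored = [(t, math.log(collection_size / doc_freq[t]))
              for t in seen if doc_freq.get(t)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [t for t, _ in scored[:k]]
```

A term like “deep” that appears in only 2 of 100 documents would outrank a common term like “pizza” that appears in 50 of them.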

Things That Hurt

• Obscure ranking methods
– Unpredictable effects of adding or deleting terms
• Only single-term queries avoid this problem

• Counterintuitive statistics
– “clis”: AltaVista says 3,882 docs match the query
– “clis library”: 27,025 docs match the query!
• Every document with either term was counted

What About Boolean Queries?

• Present a set of text boxes
– OR the terms in each box
– AND the boxes together

• Allow graphical query depictions
– Several techniques have been tried

[Example: two text boxes, one containing “poetry” and one containing “Milton Shakespeare”, retrieving documents about poetry by either Milton or Shakespeare.]
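The text-box convention (OR within a box, AND across boxes) reduces to a one-line predicate. This sketch, with a hypothetical `matches` helper, is an illustration rather than anything from the slide:

```python
def matches(doc_terms, boxes):
    """A document matches when, for every box, at least one of that
    box's terms appears in the document: OR within a box, AND across
    boxes."""
    return all(any(t in doc_terms for t in box) for box in boxes)

# One box holding "poetry", another holding "milton" and "shakespeare".
boxes = [{"poetry"}, {"milton", "shakespeare"}]
```

A document containing “poetry” and “milton” matches; one containing only “poetry”, or only the author names, does not.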

Alternate Query Modalities

• Spoken queries
– Used for telephone and hands-free applications
– Reasonable performance with limited vocabularies
• But some error correction method must be included

• Handwritten queries
– Palm Pilot graffiti, touch-screens, …
– Fairly effective if some form of shorthand is used
• Ordinary handwriting often has too much ambiguity

Browsing Retrieved Sets

• Uses the detection stage’s output
– Unranked sets, ranked lists, document clusters

• Two goals
– Identify documents for some form of delivery
– Enrich the query in some way

• Two stages
– Select promising documents
– Examine those documents individually

Indicative vs. Informative

• Terms often applied to document abstracts
– Indicative abstracts support selection
• They describe the contents of a document
– Informative abstracts support understanding
• They summarize the contents of a document

• Applies to any information presentation
– Presented for indicative or informative purposes

Browsing Goals

• Identify documents for some form of delivery
– An indicative purpose

• Query enrichment
– Relevance feedback (indicative)
• User designates “more like this” documents
• System adds terms from those documents to the query
– Manual reformulation (informative)
• Better approximation of the visceral information need
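The relevance-feedback step (“add terms from those documents to the query”) can be sketched with a simple frequency count. Real systems weight the added terms (Rocchio-style, for example); this unweighted version, with hypothetical names and data shapes, just conveys the idea:

```python
from collections import Counter

def enrich_query(query_terms, liked_docs, n_new=3):
    """Append the most frequent new terms from user-designated
    'more like this' documents to the query."""
    counts = Counter()
    for doc in liked_docs:
        # Count only terms not already in the query.
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [t for t, _ in counts.most_common(n_new)]
```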

Selection

[Diagram: detail of the Selection stage; Search produces a ranked list, the user makes selections from it, and chosen documents pass to Examination. Indexing builds the Index from the Docs.]

[Example result list:

About 7381 documents match your query.

1. MAHEC Videoconference Systems. Major Category. Product Type. Product. Network System. Multipoint Conference Server (MCS) PictureTel Prism - 8 port. - size 5K - 6-Jun-97 - English

2. VIDEOCONFERENCING PRODUCTS. Aethra offers a complete product line of multimedia and videocommunications products to meet all the applications needs of... - size 4K - 1-Jul-97 - English]

A Selection Interface Taxonomy

• One-dimensional lists
– Content: title, source, date, summary, ratings, ...
– Order: retrieval status value, date, alphabetic, ...
– Size: scrolling, specified number, RSV threshold

• Two-dimensional displays
– Construction: clustering, starfields, projection
– Navigation: jump, pan, zoom

• Three-dimensional displays
– Contour maps, fishtank VR, immersive VR

Cluster Map

Cluster Formation

• Based on inter-document similarity
– Computed using the cosine measure, for example

• Heuristic methods can be fairly efficient
– Pick any document as the first cluster “seed”
– Add the most similar document to each cluster
• Adding the same document will join two clusters
– Check to see if each cluster should be split
• Does it contain two or more fairly coherent groups?

• Lots of variations on this have been tried
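A minimal sketch of the seed-and-grow heuristic, using the cosine measure the slide mentions. Comparing candidates only against the seed and stopping at a fixed similarity threshold are simplifying assumptions; the cluster-joining and splitting steps are omitted:

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def grow_cluster(seed, docs, threshold=0.5):
    """Start from a seed document and repeatedly pull in the most
    similar remaining document while its similarity to the seed
    stays above the threshold."""
    cluster = [seed]
    rest = [d for d in docs if d is not seed]
    while rest:
        best = max(rest, key=lambda d: cosine(seed, d))
        if cosine(seed, best) < threshold:
            break
        rest.remove(best)
        cluster.append(best)
    return cluster
```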

Starfield

Constructing Starfield Displays

• Two attributes determine the position
– Can be dynamically selected from a list

• Numeric position attributes work best
– Date, length, rating, …

• Other attributes can affect the display
– Displayed as color, size, shape, orientation, …
– Interactively specified using “dynamic queries”

• Each point can represent a cluster

Projection

• Depict many numeric attributes in 2 dimensions
– While preserving important spatial relationships

• Typically based on the vector space model
– Which has about 100,000 numeric attributes!

• Approximates multidimensional scaling
– Heuristic approaches are reasonably fast

• Often visualized as a starfield
– But the dimensions lack any particular meaning
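One cheap stand-in for heuristic multidimensional scaling is pivot-based placement (a FastMap-style assumption, not necessarily what the slide's systems used): position each document by its cosine similarity to two chosen pivot documents, collapsing the roughly 100,000 term dimensions to two axes that, as the slide notes, carry no particular meaning.

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def project_2d(docs, pivot_x, pivot_y):
    """Place each document at (similarity to pivot_x, similarity to
    pivot_y); similar documents land near each other."""
    return [(cosine(d, pivot_x), cosine(d, pivot_y)) for d in docs]
```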

Contour Map Displays

• Display cluster density as terrain elevation
– Fit a smooth opaque surface to the data

• Visualize in three dimensions
– Project to 2-D and allow manipulation
– Use stereo glasses to create a virtual “fishtank”
– Create an immersive virtual reality experience
• Head-mounted stereo monitors and head tracking
• “Cave” with wall projection and body tracking

Examination

[Diagram: detail of the Examination stage; selected documents pass from Selection through Examination to Delivery.]

[Example document text, from the Aethra product page:

Aethra offers a complete product line of multimedia and videocommunications products to meet all the applications needs of users. The standard product line is augmented by a bespoke service to solve customer specific functional requirements.

Standard Videoconferencing Product Line

Vega 384 and Vega 128, the improved Aethra Set-top systems, can be connected to any TV monitor for high quality videoconferencing up to 384 Kbps. A compact and lightweight device, VEGA is very easy to use and can be quickly installed in any office and work environment.

Voyager, is the first Videoconference briefcase designed for journalist, reporters and people on-the-go. It combines high quality video-communication (up to 384 Kbps) with the necessary reliability in a small and light briefcase.]

Full-Text Examination Interfaces

• Most use scroll and/or jump navigation
– Some experiments with zooming

• Long documents need special features
– “Best passage” function helps users get started
• Overlapping 300-word passages work well
– “Next search term” function facilitates browsing

• Integrated functions for relevance feedback
– Passage selection, query term weighting, …
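The “best passage” function can be sketched as a sliding window. The 300-word window size comes from the slide; the half-window overlap and the simple count-of-query-terms score are assumptions:

```python
def best_passage(words, query_terms, size=300, stride=150):
    """Return the window of `size` words containing the most query
    terms, sliding by `stride` so consecutive windows overlap."""
    best_start, best_score = 0, -1
    for start in range(0, max(1, len(words) - size + 1), stride):
        window = words[start:start + size]
        score = sum(1 for w in window if w in query_terms)
        if score > best_score:
            best_start, best_score = start, score
    return words[best_start:best_start + size]
```

For a document shorter than the window, the whole document is returned.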

Extraction-Based Summarization

• Robust technique, though it can produce disfluent summaries

• Four broad types:
– Single-document vs. multi-document
– Term-oriented vs. sentence-oriented

• Combination of evidence for selection:
– Salience: similarity to the query
– Selectivity: IDF or chi-squared
– Emphasis: title, first sentence

• For multi-document, suppress duplication
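The three evidence sources can be combined with a simple linear score. The equal weighting, the mean-IDF form of selectivity, and the first-sentence emphasis bonus are illustrative assumptions:

```python
import math

def score_sentence(sentence, position, query_terms, doc_freq, n_docs):
    """Score one sentence for extraction by combining:
    salience    = fraction of its terms that appear in the query
    selectivity = mean IDF of its terms over the collection
    emphasis    = bonus if it is the first sentence."""
    terms = sentence.lower().split()
    if not terms:
        return 0.0
    salience = sum(1 for t in terms if t in query_terms) / len(terms)
    selectivity = sum(math.log(n_docs / doc_freq.get(t, 1))
                      for t in terms) / len(terms)
    emphasis = 1.0 if position == 0 else 0.0
    return salience + selectivity + emphasis
```

A query-relevant first sentence with rare terms outscores a later sentence of common words.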

Generated Summaries

• Fluent summaries for a specific domain

• Define a knowledge structure for the domain
– Frames are commonly used

• Analysis: process documents to fill the structure
– Studied separately as “information extraction”

• Compression: select which facts to retain

• Generation: create fluent summaries
– Templates for initial candidates
– Use a language model to select among alternatives

Things That Help

• Show the query in the selection interface
– It provides context for the display

• Explain what the system has done
– It is hard to control a tool you don’t understand
• Highlight search terms, for example

• Complement what the system has done
– Users add value by doing things the system can’t
– Expose the information users need to judge utility

Delivery

[Diagram: detail of the Delivery stage; documents pass from Examination to Delivery and on to the user.]

Delivery Modalities

• On-screen viewing
– Good for hypertext, multimedia, cut-and-paste, …

• Printing
– Better resolution, portability, annotations, …

• Fax-on-demand
– Really just another way to get to a printer

• Synthesized speech
– Useful for telephone and hands-free applications

Two Minute Paper

• When examining documents in the selection and examination interfaces, which type of information need (visceral, conscious, formalized, or compromised) guides the user’s decisions? Please justify your answer.

• What was the muddiest point in today’s lecture?