Models of Attention in Computing and Communication: From Principles to Applications


One of the main results of 20th century cognitive psychology is that, despite the overall impressive abilities of people to sense, remember, and reason about the world, our cognitive abilities are extremely limited in well-characterized ways. In particular, psychologists have found that people grapple with scarce attentional resources and limited working memory. Such limitations become salient when people are challenged with remembering more than a handful of new ideas or items in the short term, recognizing important targets against a background pattern of items [4], or interleaving multiple tasks [5].

By Eric Horvitz, Carl Kadie, Tim Paek, and David Hovel

52 March 2003/Vol. 46, No. 3 COMMUNICATIONS OF THE ACM

Creating computing and communication systems that sense and reason about human attention by fusing together information from multiple streams.

These results indicate that we cannot help but inspect the world via a limited spotlight of attention. As such, we often generate clues implicitly and explicitly about what we are selectively attending to and how deeply we are focusing. Given constraints on attentional resources, it is no surprise that communication among people relies deeply on attentional signals. Psychologists and linguists studying communication have recognized that signaling and detecting attentional states lie at the heart of the fast-paced and fluid interactions that people have with one another when collaborating or communicating [1, 6]. Attentional cues are central in decisions about when to initiate or to make an effective contribution to a conversation or project. Beyond knowing when to speak or listen in a conversation, attention is critical in detecting that a conversation is progressing. More generally, detecting or inferring attention is an essential component of the overall process of grounding: converging in a shared manner on a mutual understanding of a communication [1].

The findings about our limited attentional resources, and about how we rely on attentional signals in collaborating, have significant implications for how we design computational systems and interfaces. Over the last five years, our team at Microsoft Research has explored, within the Attentional User Interface project, opportunities for enhancing computing and communication systems by treating human attention as a central construct and organizing principle. We consider attention a rare commodity, and a critical currency, in reasoning about the information awareness versus disruption of users [11]. We have also pursued the use of attentional cues as an important source of rich signals about goals, intentions, and topics of interest [9]. We seek to build systems that sense, and share with users, natural signals about attention to support conversations and other forms of fluid mixed-initiative collaborations with computers. Moving to considerations of computational efficiency, an assessment of a user’s current and future attention can be employed to triage computational resources. Investigations in this realm include selective allocation of resources in rendering graphics by relying on models or on direct observations of visual attention, and in guiding precomputation and prefetching [10] with forecasts of future attention. Finally, although there is a rich history of prior work on attention from cognitive psychology, we have found there is much we do not yet understand. Thus, beyond pooling results from prior psychological studies, we need to continue to perform user studies that adapt or extend prior results on attention and memory from cognitive psychology to real-world computing and communication applications [2, 3].

We shall first describe several principles and methodologies at the heart of research on integrating models of attention into human-computer interaction and communications. Then, we shall review representative efforts illustrating how we can harness these principles in attention-sensitive messaging and mixed-initiative interaction applications.

Models of Attention and Decision Making Under Uncertainty

How might we access and use information about a user’s attention? To be sure, subtle clues about attention are often available, and a number of these clues can be taken as direct signals about the attentional status of users. For example, sensing patterns of simple gestures such as the touching and lifting of a device in different settings can relay important information about attention that can be exploited in a number of exciting ways [7]. Moving to higher-precision sensing, several researchers have pursued the use of gaze-tracking systems, and have used signals about the focus of visual attention in a variety of applications. As gaze sensors grow in reliability and decrease in cost, we are seeing the evolution of devices that recognize when and how they are interrogated by the spotlight of visual attention.

Nonetheless, we may often be uncertain about a user’s attentional focus and workload in light of observations, and about the value of alternate actions in different contexts. Thus, we turn to models that can be harnessed to reason about a user’s attention and about the ideal attention-sensitive actions to take under uncertainty. Such models and reasoning can unleash new functionalities and user experiences.

We have constructed by hand, and learned from data, Bayesian models viewed as performing the task of an automated “attentional Sherlock Holmes,” working to reveal current or future attention under uncertainty from an ongoing stream of clues. Bayesian attentional models take as inputs sensors that provide streams of evidence about attention, and provide a means for computing probability distributions over a user’s attention and intentions.

Perceptual sensors include microphones listening for ambient acoustical information or utterances, cameras supporting visual analysis of a user’s gaze or pose, accelerometers that detect patterns of motion of devices, and location sensing via GPS and analysis of wireless signals. However, more traditional sources of events can also offer valuable clues. These sources include a user’s online calendar and considerations of the day of week and time of day. Another rich stream of evidence can be harvested by monitoring a user’s interactions with software and devices. Finally, background information about the history of a user’s interests and prior patterns of activities and attention can provide valuable sources of information about attention.

To build probabilistic attentional models able to fuse evidence from multiple sensors, we leverage the results of accelerated research over the last 15 years on representations for reasoning and decision making under uncertainty. Such work has led to inferential methods and representations including Bayesian networks and influence diagrams, which are graphical models that extend probabilistic inference to considerations of actions under uncertainty. Algorithms have been developed that enable us to compute probability distributions over outcomes and expected utilities of actions from these graphical representations.

Figure 1a displays a high-level influence diagram representing sensor fusion and decision making in the context of a user’s attention under uncertainty. As portrayed in the figure, a set of variables (oval nodes) representing sensed evidence influence a random variable representing a user’s attentional status which, in turn, influences the expected value of alternate actions or configurations. We introduce intermediate cost and benefit variables in the pedagogical model as it can be useful to deliberate about the value and costs associated with different outcomes. Decisions (rectangular node) about ideal computer actions take into consideration the costs and benefits, given uncertainty about a user’s attention. In the end, the expected utility (diamond-shaped node) is influenced by the action and the costs and benefits.
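The flow that Figure 1a describes, from sensed evidence to a belief about attention to an expected-utility choice of action, can be sketched in a few lines. This is only an illustration under stated assumptions: the two states, the two evidence variables, every probability and utility, and the naive Bayes independence simplification are hypothetical, not values or structure taken from the authors' models.

```python
# Hypothetical states, evidence variables, and numbers, for illustration only.
STATES = ["focused", "available"]
PRIOR = {"focused": 0.5, "available": 0.5}

# P(evidence present | state). Evidence variables are assumed conditionally
# independent given the attentional state (a naive Bayes simplification).
LIKELIHOOD = {
    "typing": {"focused": 0.8, "available": 0.3},
    "voice_detected": {"focused": 0.1, "available": 0.4},
}

# Assumed net utility (benefit minus cost) of each action in each state.
UTILITY = {
    "alert_now": {"focused": -4.0, "available": 2.0},
    "defer": {"focused": 0.0, "available": -0.5},
}


def posterior(observed):
    """Return P(state | observed evidence) using Bayes' rule."""
    scores = {}
    for state in STATES:
        p = PRIOR[state]
        for name, present in observed.items():
            like = LIKELIHOOD[name][state]
            p *= like if present else (1.0 - like)
        scores[state] = p
    total = sum(scores.values())
    return {state: p / total for state, p in scores.items()}


def best_action(belief):
    """Return the action with the highest expected utility, plus all values."""
    expected = {
        action: sum(belief[s] * utils[s] for s in STATES)
        for action, utils in UTILITY.items()
    }
    return max(expected, key=expected.get), expected


belief = posterior({"typing": True, "voice_detected": False})
action, expected = best_action(belief)
print(belief, action, expected)
```

With these made-up numbers the belief leans toward the focused state, so deferring scores higher than alerting immediately. Richer models would replace the independence assumption with explicit dependencies among the variables.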

We extend such a high-level, pedagogical view by constructing richer models that contain additional intermediate variables and key interdependencies among the variables. Also, as both devices and people are immersed in time, we move beyond pointwise considerations of the states of variables, to build higher-fidelity temporal attentional models that represent changing observations and beliefs with the flow of time. We have employed dynamic Bayesian networks and Hidden Markov Models for representing and reasoning about states of attention and location over time.

Figure 1b displays two adjacent time slices of a temporal attentional model. This model, developed for an application that selectively filters messages and communications to users, provides a probability distribution over a user’s workload and task. In this case, the status of attention includes approximately 15 discrete states.
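To make the idea of reasoning across adjacent time slices concrete, here is a minimal sketch of one filtering update in a two-state Hidden Markov Model. It is not the model of Figure 1b: the state names, the single observation stream, and all transition and emission probabilities are hypothetical, and the real model has many more variables and roughly 15 attentional states.

```python
STATES = ["high_focus", "low_focus"]

# Assumed P(state at t+1 | state at t): attention tends to persist.
TRANSITION = {
    "high_focus": {"high_focus": 0.9, "low_focus": 0.1},
    "low_focus": {"high_focus": 0.2, "low_focus": 0.8},
}

# Assumed P(observation | state) for one discretized evidence stream.
EMISSION = {
    "high_focus": {"single_app": 0.7, "app_switching": 0.3},
    "low_focus": {"single_app": 0.2, "app_switching": 0.8},
}


def filter_step(belief, observation):
    """One forward-algorithm update: predict with the transition model,
    weight by the observation likelihood, then renormalize."""
    predicted = {
        nxt: sum(belief[cur] * TRANSITION[cur][nxt] for cur in STATES)
        for nxt in STATES
    }
    unnormalized = {s: predicted[s] * EMISSION[s][observation] for s in STATES}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}


belief = {"high_focus": 0.5, "low_focus": 0.5}
for obs in ["single_app", "single_app", "app_switching"]:
    belief = filter_step(belief, obs)
    print(obs, belief)
```

Each call carries the previous belief forward in time before folding in the new observation, which is what lets the belief change smoothly rather than jumping with every reading.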

Figure 1. (a) High-level decision model considering a user’s attentional focus and workload as a random variable, influenced by the observed states of several sensors. (b) A temporal Bayesian attentional model, highlighting key dependencies (dashed arcs) between variables in adjacent time slices.

[Figure 1 node labels. (a): Perceptual Sensors; Device Interaction; Calendar; Day, Time; Attentional Focus & Workload; Value of Actions; Cost of Actions; Computer’s Actions; Expected Utility. (b), at times tn and tn+1: Visual Pose; App in Focus; App Usage Pattern; Inspection Interval; Attentional Focus; Appt Status; Location; Acoustical Signal; User, Device Activity; Calendar; Day, Time.]

Economic Models of Attention and Information

As we can all attest from personal experiences, computers and communication systems today have little awareness of the value and costs of relaying messages, alerts, and calls to users. Research on the Notification Platform project has centered on formulating economic principles of attention-sensitive notification, and on implementing a cross-device alerting system based on these principles. A descendant of the Notification Platform named Bestcom applies similar principles to interpersonal communications [12]. We focus here on the Notification Platform.

The Notification Platform system modulates the flow of messages from multiple sources to devices by performing ongoing decision analyses. These analyses balance the expected value of information with the attention-sensitive costs of disruption. As highlighted in Figure 2a, the system serves as an attention-savvy layer between incoming messages and a user, taking as inputs sensors that provide information about a user’s attention, location, and overall situation.

The design of the Notification Platform was informed by several earlier prototypes exploiting context-sensing for identifying a user’s workload, including the Priorities system [11, 12]. Priorities employs classifiers that predict the urgency of incoming email. The classifiers are trained with sample messages, either obtained via explicit training or by automatically drafting data sets by observing a user’s interaction with an email browser. Studies have demonstrated the system performs remarkably well at classifying the urgency of messages (see, for example, the receiver-operator characteristic curve described in [11]). Beyond classifying the urgency of messages, Priorities also observes a user’s patterns of presence at a desktop computer based on time of day, and infers the time until a user will review unread messages. The system computes an expected cost of delayed review for each incoming message. This cost is considered, along with a cost of interruption based on activity sensing and calendar information, in automated decisions about if and how to alert and transmit information to a user about email, tasks, and appointment reminders in mobile and desktop settings.

The Notification Platform uses a decision-analytic model for cross-device alerting and relay of information from multiple sources. The analyses consider a user’s attention and location under uncertainty, as well as the fidelity and relevance of potential communication channels. We developed a distributed architecture that executes over multiple devices. Figure 2b displays a schematized view of the architecture of the Notification Platform. Standard interfaces and metadata schemas allow users to subscribe different sources of information and devices to a Notification Manager. At the heart of the Notification Manager is a Bayesian attention model and decision analysis that accesses clues about attention and location from sensors via a module we refer to as a Context Server.
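The expected cost of delayed review that Priorities computes for each message can be illustrated with a small sketch. The text does not give the functional form, so the one below is an assumption: each urgency class has a hypothetical cost that grows linearly with delay, and the classifier output and expected delay are supplied as plain numbers. The function names are invented for this example.

```python
def expected_cost_of_delay(p_urgency, cost_per_minute, expected_delay_min):
    """Expected cost of deferring review of one message.

    p_urgency: P(urgency class | message), e.g. from an email classifier.
    cost_per_minute: assumed linear cost rate for each urgency class.
    expected_delay_min: expected minutes until the user next reviews mail.
    """
    rate = sum(p_urgency[c] * cost_per_minute[c] for c in p_urgency)
    return expected_delay_min * rate


def should_alert(p_urgency, cost_per_minute, expected_delay_min, interruption_cost):
    """Alert only when waiting is expected to cost more than interrupting."""
    delay_cost = expected_cost_of_delay(p_urgency, cost_per_minute, expected_delay_min)
    return delay_cost > interruption_cost


# Hypothetical classifier output and cost rates.
p = {"urgent": 0.25, "normal": 0.75}
rate = {"urgent": 0.4, "normal": 0.0}
cost = expected_cost_of_delay(p, rate, expected_delay_min=30)
print(cost, should_alert(p, rate, 30, interruption_cost=2.0))
```

The comparison in `should_alert` mirrors the trade-off described above: a message is worth an interruption only when the expected cost of leaving it unread exceeds the cost of disrupting the user's current activity.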

The Context Server accesses several states and streams of evidence, including a user’s appointments from Microsoft Outlook, events about device presence and activity, an analysis of ambient acoustics in the room, and a visual analysis of pose using a Bayesian head-tracking system. Key abstractions from the evidence, such as “voice trace detected,” “task completion occurred within five seconds,” “single application focus,” “head-tracked, looking away from display,” and “meeting away from office, ending in 10 minutes,” are posted to a volatile store called the Context Whiteboard, which is continually updated by incoming evidence. The Context Whiteboard is contacted for updated information every few seconds by the Bayesian attentional model in the Notification Manager.

The Notification Manager’s decision analysis weighs the expected costs and benefits of alerting a user about messages coming into the system’s Universal Inbox. In computing the costs of disruption, the decision model considers the probability distribution over a user’s attentional state and location in several places in its analysis, including the cost of disruption associated with different alerts for each device, the availability of different devices, and the likelihood that the information will reach the user when alerted in a specific manner on a device.

Figure 2. (a) Conceptual overview of the Notification Platform, a cross-device messaging system that balances the costs of disruption with the value of information from multiple message sources. The system employs a probabilistic model of attention and executes ongoing decision analyses about ideal alerting, fidelity, and routing. (b) Constellation of components of Notification Platform, depicting the subscription architecture. Subscribed sources and devices communicate with the Notification Manager via a set of standard interfaces. Sensor findings from multiple devices are considered in deliberations about information value, attention, and the best channel and alerting modality.

[Figure 2 labels. Sources: Email, IM, Voice, News, Financial, File Scout, Background Query, Error Messages. Endpoints: Desktop (office), Desktop (home), PDA, Cell phone, Content journal, Voicemail. Sensors: Device interaction, Visual analyses, Calendar, Location (802.11/GPS), Proximity detectors, Ambient acoustics, Accelerometers. Core: Notification Manager (decision model, attention model; best device and time, modality and intensity), Context Server, Context Whiteboard, notification schema, device schema, notification preferences.]

The ongoing expected-utility analysis is performed in accordance with a user’s preferences, stored in a profile. These include assertions about the cost of disruption for each alert modality, conditioned on the user being in different attentional states. As an example, for the case of a desktop computer, the system makes available a set of display alternatives as the product of different visual displays of the alert (for example, thumbnail, full-display alert) and several auditory cues (for example, no auditory cue, soft chime, louder herald). The placement of the alert with regard to the current focus of visual attention or interaction is also considered.

Figure 3a captures in a graphical manner the deliberation of the Notification Platform about incoming messages. The system computes the expected value of receiving an alert as the difference between the value of alerting the user now and the value that will be obtained when the information is viewed later. Given probability distributions over a user’s attention and location inferred from its sensors, Notification Platform iterates over all alerting and display modalities for each device with an expected-utility analysis to decide if, when, and how to alert a user. As represented with the metaphor of a narrowing funnel in Figure 3a, the system considers, for each device and modality, the loss in fidelity of information transmitted. In addition, the system considers the likelihood that an alert will be received, given inferred probability distributions over the attention and location of the user. This reliability of transmission is represented metaphorically in the figure as the chance that a message will make it through a slot in a spinning disk. In the end, the attention-sensitive costs of disruption are subtracted from estimates of the value of alerting, yielding a net value of alerting a user for each channel and alerting modality. The channel and modality with the highest expected value is selected.
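The deliberation just described can be sketched as a loop over candidate channels. The sketch assumes, for simplicity, that fidelity and transmission reliability scale the value of alerting multiplicatively, and that the disruption cost is an expectation over the inferred attentional states. The `Channel` class, its field names, and all numbers are hypothetical; the text does not specify how the Notification Platform combines these terms.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    device: str
    modality: str
    fidelity: float    # assumed fraction of information value preserved, 0..1
    p_received: float  # assumed chance the alert reaches the user, 0..1
    disruption: dict   # assumed cost of this alert in each attentional state


def net_value(channel, value_now, value_later, belief):
    """Transmitted value of alerting now minus expected disruption cost."""
    gain = (value_now - value_later) * channel.fidelity * channel.p_received
    cost = sum(belief[s] * channel.disruption[s] for s in belief)
    return gain - cost


def choose_channel(channels, value_now, value_later, belief):
    """Return (best channel, net value), or (None, 0.0) if no alert pays off."""
    best = max(channels, key=lambda c: net_value(c, value_now, value_later, belief))
    value = net_value(best, value_now, value_later, belief)
    return (best, value) if value > 0 else (None, 0.0)


belief = {"high_focus": 0.75, "conversation": 0.25}
channels = [
    Channel("desktop", "full alert with herald", 1.0, 0.5,
            {"high_focus": 4.0, "conversation": 8.0}),
    Channel("desktop", "thumbnail", 0.5, 0.5,
            {"high_focus": 0.5, "conversation": 1.0}),
]
best, value = choose_channel(channels, value_now=10.0, value_later=2.0, belief=belief)
print(best.modality if best else None, value)
```

With these illustrative numbers the loud, full-display alert ends up with a negative net value, while the quieter thumbnail remains worthwhile even though it conveys less of the information.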

Figure 3b displays several aspects of the behind-the-scenes functioning of the Notification Platform. A context palette displays current findings drawn from sensor sources. Several views onto components of the decision analysis are displayed, including inference about the time-varying attention of the user. At the current time, the user is inferred to be most likely in a state named “high-focus solo activity,” which has competed recently with “low-focus solo activity,” “conversation in office,” and other less likely states. The Universal Inbox displays messages from several sources, including email, instant messaging, breaking news, and stock prices. Messages have also been received from DocWatch, an agent subscribed to by the user that identifies documents of interest for the user. Each message is annotated with the best device and alerting policy, and the associated net expected dollar value of relaying the messages with that channel and mode is indicated. As portrayed in the inbox, it is worthwhile passing two instant messages on to the user. Other alerts are “in the red,” as the cost of disruption dominates the net value of information. In this case, the ideal alerting mode and channel for an instant message is determined to be a visual notification in a large format coupled with an audio herald at the user’s desktop system.

Figure 3. (a) Graphical depiction of the Notification Manager’s analyses. Attention-sensitive costs of disruption and the value of information are considered, along with losses based in decreased fidelity (narrowing funnel) and transmission reliability (spinning slotted disk) associated with the use of each alerting modality of all subscribed devices. (b) View of a portion of the Notification Platform’s real-time reasoning. Information from multiple sensors is posted to the Context Whiteboard and fused to infer the user’s attentional status and location. Multiple notifications are sorted by net expected value and the best channel and alerting modality with the highest expected utility is selected.

[Figure 3a labels: Information value (now, later); Device i, Modality j; Fidelity; Transmission reliability; Transmitted value; Inferred attention; Display and alerting configuration; Disruption; Attentional cost; Net value.]

Ongoing research on the Notification Platform project includes the refinement of preference assessment tools to ease the task of encoding preferences. Currently, users can adjust sliders to change a set of predefined defaults on costs of interruptions. Another key area of work centers on using machine learning for building probabilistic models of attention, location, and cost of disruption from data. Results from ongoing machine-learning efforts by our team have been applied to refine the Notification Platform [12].

As highlighted in Figure 4, we have also been working to make small devices aware of the attentional status and location of users [7]. Such devices can transmit local sensor information to inform a central Notification Manager, perform entirely local notification management and related services based on the observations, or combine central and local deliberation about notification. In the latter case, the central Notification Manager makes general decisions about routing, and relies on the endpoint device to perform precision targeting of the timing and alerting modality, based on local sensing and reasoning. As an example, with the use of a method we refer to as “bounded deferral,” a local device commits to relaying a message that it has received before a message-specific deadline is reached; the device does its best to find a good time for interruption within the allotted period. Research on smart endpoints includes the challenges of embedding and leveraging multiple perceptual sensors on small devices, including GPS, 802.11 signal strength, accelerometers, infrared proximity detectors, and touch sensors. Part of this work has explored opportunities for developing devices, such as cell phones, that behave with more insight about their disruptiveness by considering the situation at hand, including states derived from coarse models of attention [8].
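Bounded deferral lends itself to a short sketch. The version below works over discrete time steps and takes the device's judgment of interruptibility as a precomputed list of booleans; both choices are simplifying assumptions, and the function name and signature are invented for this example rather than drawn from the system described.

```python
def bounded_deferral(arrival, deadline, interruptible):
    """Return the time step at which a deferred message is delivered.

    arrival: step at which the message reaches the device.
    deadline: message-specific step by which it must be relayed.
    interruptible: list of booleans, one per step, True when local sensing
        suggests a good moment to interrupt (hypothetical signal).
    """
    for t in range(arrival, deadline):
        if t < len(interruptible) and interruptible[t]:
            return t  # a good moment arrived before the deadline
    return deadline  # the commitment: never hold a message past its deadline


busy_then_free = [False, False, False, True, False]
print(bounded_deferral(0, 10, busy_then_free))  # waits for the free moment
print(bounded_deferral(0, 2, busy_then_free))   # deadline arrives first
```

The deadline bounds the worst-case delay, so a sender or central manager can rely on delivery by a known time while the endpoint still has room to pick a less disruptive moment.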

Additionally, we are continuing to pursue psychological studies of disruption. Formal studies of the costs of disruption began with the early work of Ovsiankina and Zeigarnik nearly 75 years ago. The rich body of work in this realm includes studies on memory, problem solving, and overall task efficiency in the face of disruptions. More recent work includes efforts by our team [2, 3] and other groups to probe the influence of notifications of various types and saliencies on the efficiency and satisfaction with performing a variety of computer tasks. The psychological studies and results complement the mathematical models; the economic models provide a principled, flexible foundation that can integrate findings about the costs uncovered by user studies of disruption via the setting of parameters considered in expected-utility decision making.

Attention, Initiative, and Interaction

In another area of investigation within the Attentional User Interface project, we have studied the use of models of attention to enhance the robustness and fluidity of human-computer collaboration. Some of this work focuses on the recognition of attentional cues as coordinative signals in mixed-initiative interaction with computing devices. In mixed-initiative interaction, both users and computers take turns in contributing to a project or an understanding [9]. The turn taking of conversational dialogue is a prototypical example of mixed-initiative interaction. Psychologists have found that people engaged in conversations rely on attentional cues to signal when a contribution is going to be offered or has been accepted [1]. We have sought to endow computers with an analogous ability to recognize and emit signals to guide the nature and timing of contributions and clarifications in support of mixed-initiative interaction.

DeepListener and Quartet represent efforts in mixed-initiative interaction to incorporate attention in spoken language systems. Both systems tackle what we have referred to as the “speech-target problem”: When a computer with an open microphone and speech recognizer detects an utterance, how is it to recognize that it is being addressed when there are other people or listening devices in a room? DeepListener and Quartet explicitly address this challenge with probabilistic models that infer the likelihood that they are the target of speech.

Figure 4. Sensing PDA, outfitted with multiple perceptual sensors, including proximity, motion, and touch sensors. In the background, accelerometer signals are displayed showing the motion fingerprint of a user walking while looking at the device.

DeepListener reasons about a user’s attention and intentions to guide clarification dialogue in a spoken command and control setting. The system considers its uncertainty about whether it is the target of speech, what it has heard, and the likelihood of different intentions. DeepListener continues to make expected-utility decisions about taking actions in the world, or about how it should approach users, if necessary, to clarify their intentions before taking such world actions. These decisions take into consideration the utilities of alternate dialogue actions and the stakes of the world actions.
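A decision of this kind, weighing whether to act, to ask for clarification, or to stay silent, can be sketched as a comparison of expected utilities. This is a toy illustration, not DeepListener's model: the three actions, the way the two probabilities are combined, and every utility in `STAKES` are assumptions made for the example.

```python
def choose_dialog_action(p_target, p_intent, stakes):
    """Pick among acting, clarifying, and ignoring by expected utility.

    p_target: probability the system is the target of the utterance.
    p_intent: probability the top interpretation is correct, given that
        the system was addressed.
    stakes: assumed utilities for the outcomes of each action.
    """
    p_right = p_target * p_intent
    expected = {
        "act": p_right * stakes["act_right"]
               + (1 - p_right) * stakes["act_wrong"],
        "clarify": p_target * stakes["ask_when_addressed"]
                   + (1 - p_target) * stakes["ask_when_not_addressed"],
        "ignore": p_target * stakes["ignore_when_addressed"],
    }
    return max(expected, key=expected.get), expected


# Hypothetical stakes: acting wrongly is costly, asking is mildly annoying.
STAKES = {
    "act_right": 10.0,
    "act_wrong": -20.0,
    "ask_when_addressed": 4.0,
    "ask_when_not_addressed": -2.0,
    "ignore_when_addressed": -6.0,
}

for p_target, p_intent in [(1.0, 1.0), (0.5, 0.5), (0.0, 0.5)]:
    print(p_target, p_intent, choose_dialog_action(p_target, p_intent, STAKES)[0])
```

Raising the penalty for a wrong action shifts more cases toward clarification, which matches the point above that the stakes of world actions shape the dialogue.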

DeepListener shares its attention and availability by gracefully changing the colors and intensities of an attentional lens that glows on its control panel, or via gestures and rendered “thoughts” of an animated agent. These affordances provide cues that assist with conversational turn taking.

Figure 5a displays a situation where DeepListener has detected an utterance first directed elsewhere in a noisy environment. After analyzing a new utterance a bit later, the system engages the user in a clarification dialogue, and then invokes a desired action.

Quartet operates with a continuous speech recognition system, and incorporates a richer model of attention under uncertainty. It examines keyboard events, an analysis of the content and the coherence of natural language parsing, and a visual pose analysis to ascertain the attentional status of the user and system with regard to the establishment, maintenance, and disruption of attention between the user and system. Since Quartet couples speech recognition to a natural language parser, the system can also use the grammatical parse to reason about whether an utterance was misrecognized, or properly recognized but intended for someone else. Figure 5b shows Quartet listening to a user talking about the system rather than speaking to the system. In this case, Quartet is being used as an assistant to control, via voice commands, the navigation of slides displayed in a presentation. Requests directed to Quartet about navigation among slides arise intermittently during the more dominant stream of ongoing utterances associated with the presentation. In this example, the user is talking about the computer, and, based on a fusion of the user’s language and visual pose, Quartet infers the user is likely speaking to someone else.

Our ongoing research on mixed-initiative and spoken-language systems is focusing on several challenges, including the use of sensed or inferred attention to provide clues about a user’s intentions, the content and context at hand, and the nature and ideal timing of the appropriate contributions. This work includes using sensed or inferred attention to inform speech recognition systems about the specific microcontexts being addressed with utterances. Such narrowing of the spotlight of analysis can be useful for enhancing recognition as it can enable spoken dialogue systems to swap in the appropriate language models and semantics, and adjust the scope of possible actions. Also, robust solutions to the speech-target problem promise to significantly influence the overall sociology of human-computer interaction, by allowing users to interact with multiple devices and people in their proximity with speech and gestures in a manner similar to the way people interact with one another.

Figure 5. (a) DeepListener’s deliberation about the target of speech and ideal clarification dialog. The system first makes an expected utility decision to share in a subtle manner its thoughts about the possibility that it is the target of an utterance. Given additional recognitions, it goes ahead to seek clarification, and finally executes an action for the user. (b) Quartet in action. Quartet’s partial recognition is displayed at the top of the display. The system’s belief about the attentional status of the user, with regards to initiating, maintaining, or breaking out of conversational dialogue, is represented as a dynamically changing probability distribution.

In another realm of innovation, computers with an ability to track and to understand attentional patterns among people engaged in conversations can provide new kinds of services and facilities. For example, methods for identifying visual attention among participants in a conversation can be used to automate the control of cinematography, and to capture, organize, and understand a group meeting or videoconference. Thus, beyond enhancing human-computer interaction, sensing and reasoning about attention promises to enhance the way we communicate and collaborate with one another.

Conclusion

We have described efforts to endow computing and communication systems with the ability to sense and reason about human attention. After reviewing some background on the nature and importance of attention in cognition and discourse, we discussed methods for inferring attention from multiple streams of information, and for leveraging these inferences in decision making under uncertainty. Then, we presented illustrative applications of the use of attentional models in cross-device, multichannel messaging and in mixed-initiative interaction. Research on the use of models of attention in computing and communication is still in its youth. We expect that continuing refinement of methods for recognizing, reasoning, and communicating about attention will change in a qualitative manner the way we perceive and work with computing systems and devices.

Contributors on the constellation of efforts on the Attentional User Interface project include Johnson Apacible, Ed Cutrell, Mary Czerwinski, Susan Dumais, Ken Hinckley, David Hovel, Andy Jacobs, Carl Kadie, Paul Koch, John Krumm, Nuria Oliver, Tim Paek, John Platt, Daniel Robbins, Chaitanya Sareen, Joe Tullio, Maarten van Dantzich, and Andy Wilson.

References

1. Clark, H.H. and Schaefer, E.F. Collaborating on contributions to conversations. Language and Cognitive Processes 2/1 (1987), 19–41.

2. Cutrell, E., Czerwinski, M., and Horvitz, E. Notification, disruption, and memory: Effects of messaging interruptions on memory and performance. In Proceedings of Interact 2001, IFIP Conference on Human-Computer Interaction, Tokyo, Japan, July 2001.

3. Czerwinski, M., Cutrell, E., and Horvitz, E. Instant messaging: Effects of relevance and time. People and Computers XIV: Proceedings of HCI 2000. S. Turner, P. Turner, eds. British Computer Society (2000), 71–76.

4. Eriksen, C.W. and Yeh, Y. Allocation of attention in the visual field. J. Experimental Psychology: Human Perception and Performance 11 (1985), 583–597.

5. Gillie, T. and Broadbent, D. What makes interruptions disruptive? A study of length, similarity and complexity. Psychological Research 50 (1989), 243–250.

6. Grosz, B.J. and Sidner, C.L. Plans for discourse. Intentions in Communication. MIT Press, Cambridge, MA (1990), 417–444.

7. Hinckley, K., Pierce, J., Sinclair, M., and Horvitz, E. Sensing techniques for mobile interaction. In Proceedings of the ACM UIST 2000 Symposium on User Interface Software and Technology. (San Diego, CA, Nov. 2000). ACM Press, NY, 91–100.

8. Hinckley, K. and Horvitz, E. Toward more sensitive mobile phones. In Proceedings of the ACM UIST 2001 Symposium on User Interface Software and Technology. ACM Press, NY, 191–192.

9. Horvitz, E. Principles of mixed-initiative user interfaces. In Proceedings of the ACM SIGCHI 1999. (Pittsburgh, PA, May 1999). ACM Press, NY, 159–166.

10. Horvitz, E. Principles and applications of continual computation. Artificial Intelligence J. 126 (2001). Elsevier Science, New York, NY, 159–196.

11. Horvitz, E., Jacobs, A., and Hovel, D. Attention-sensitive alerting. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. (Stockholm, Sweden, July 1999). Morgan Kaufmann, San Francisco, CA, 305–313.

12. Horvitz, E., Koch, P., Kadie, C.M., and Jacobs, A. Coordinate: Probabilistic forecasting of presence and availability. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence. (Edmonton, Canada, July 2002), 224–233.

THE COMPLETE SET OF REFERENCES FOR THIS ARTICLE CAN BE ACCESSED AT RESEARCH.MICROSOFT.COM/~HORVITZ/CACM-ATTENTION.HTM

Eric Horvitz ([email protected]) is a senior researcher and group manager of the Adaptive Systems & Interaction Group at Microsoft Research, Redmond, WA.

Carl Kadie ([email protected]) is a research software design engineer at Microsoft Research, Redmond, WA.

Tim Paek ([email protected]) is a researcher in the Adaptive Systems & Interaction Group at Microsoft Research, Redmond, WA.

David Hovel ([email protected]) is a software developer at Microsoft Research, Redmond, WA.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

© 2003 ACM 0002-0782/03/0300 $5.00
