On Grounding Human Communication with Human-Computer Interaction Designs
Hao-Chuan Wang (王浩全)
Department of Computer Science, Institute of Information Systems and Applications, National Tsing Hua University
http://www.cs.nthu.edu.tw/~haochuan
May 26, 2014 @ Department of Communication and Technology, National Chiao Tung University
Wang
A Quick Overview of Human-Computer Interaction (HCI)
The two “senses” of Human-Computer Interaction: From interface …
“Interaction” in the sense of computers listening and responding to people’s input
… to problem solving and value creation in the real world
“Interaction” in the sense of designing technologies based on user needs, goals, constraints, and characteristics. UCD: User-Centered Design.
• Identifying and fixing usability problems
• Technology-supported education
• Persuasive (behavior change) computing
HCI: Studying Existing and Possible Relationships between Computers and People
ACM SIGCHI Curricula 1996
30 Years of the HCI Community
ACM SIGCHI: 9 Turing Award Winners / 188 ACM Fellows
http://dl.acm.org/sig.cfm?id=SP923
What’s Changing in HCI Today?
The big picture is still there, but:
• More emphasis is placed on use contexts and applications.
• Computers take many forms and do all sorts of things.
• Computing is not necessarily done by silicon-chip computers.
- Input and output are versatile: not necessarily “keyboard and mouse” or “text, speech, or graphics”.
- Collaborative and social: not necessarily “one human, one computer”.
Computer-Mediated Communication (CMC)
What’s the longest distance in the world? 世界上最遠的距離是什麼?
Supporting Human Communication
Communication in the sense of data transmission across physical distance is not that hard today.
• Wired and wireless computer networking, the Internet, etc.
Communication in the sense of understanding each other, or crossing the “psychological distance” between people, remains hard.
• Difficulties in expressing or understanding thoughts
• Barriers between generations, genders, professions, languages, and cultures
Supporting human communication continues to be a challenging yet worthwhile topic in HCI.
Ultimate Goal? Mind-Connecting!
Lost in Technologies
However, technology development does not always approach the goal effectively. For example:
Video conferencing
• Bandwidth-demanding; video lag disrupts conversation
• Adoption is not guaranteed: privacy and other social concerns
Machine translation
• Quality concerns
• Even a non-fluent second language can beat MT (cf. Yamashita & Ishida, 2006).
Observation
Designs of CMC can work better when features and constraints of human communication are investigated and considered.
Ex. An awareness indicator that makes “typing” visible in instant messaging.
Basic research stays relevant! What are the features of successful and unsuccessful communication? What is the nature of “understanding”?
Grounding Communication
How Would You Describe…
Where you live in Hsinchu?
Where you lived when you were in the U.S.?
My Answer
Where you live in Hsinchu? Near 清大後門 (the back gate of National Tsing Hua University).
Where you lived when you were in the U.S.? In Ithaca, a college town in the middle of New York State, if you know where it is. It’s where Cornell University is located.
Do you see the general difference? Why?
The amount of knowledge that we share.
Common Ground
Knowledge, beliefs, and attitudes that we share, know that we share, and know that we know that we share influence how we use language to communicate.
Grounding: the interactive process by which communicators exchange evidence of their understanding to arrive at the state of common ground.
Herbert Clark
Evidence of Common Ground
Physical co-presence (being co-located)
• “Close that door”
Shared community membership
• “Let’s meet at 小七” (a local nickname for 7-Eleven)
Linguistic co-presence (access to the same utterances)
“What’s this?”
Grounding is a Collaborative Process
The Role of Media: Affordances
An influential HCI-rooted concept, which roughly means the “action-permitting properties” of objects that people see
• A chair affords sitting
• A doorknob affords door-opening
• A virtual keyboard affords typing (but is this trivial?)
Don Norman
Affordances of Communication Media
Technology Changes Grounding
Affordances of media constrain how people may interact with one another.
• E.g., without visibility, it is impossible to use head-nodding as a technique for grounding.
People may learn to adapt their grounding behaviors (this happens, e.g., emoticons in IM),
or we can design new CMC tools with useful properties to support grounding and communication.
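Using Motion-Sensing Devices to Study Gesture Use in Computer-Mediated Communication
Using computers as a medium for person-to-person communication is now commonplace, and we urgently need to understand how computer-mediated communication differs from face-to-face communication and how it affects interpersonal communication. Previous research has focused mostly on how media properties affect trust and language use. Non-verbal communicative behaviors, such as the use of gestures, have received limited attention, in part because there has been no fast and effective way to measure subtle gestures. This paper proposes an applied technique that uses a motion-sensing device, Microsoft Kinect, to capture subtle changes in body movement while people communicate. By analyzing the speed of body movement and extracting multiple features, we experimentally compare how three media, face-to-face, video, and audio communication, affect gesture behavior, including the amount of gesture use and the similarity between the two communicators' behaviors. This use of motion-sensing devices as a behavioral science measurement instrument can serve to quickly evaluate how newly designed online communication interfaces affect communicative behavior, and to support the development and investigation of communication theory. In terms of design, the proposed data collection and analysis methods may also serve as a basis for future computer-mediated communication tools.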
Microsoft Research Asia UR Project: FY13-RES-OPP-027
Wang, H-C., & Lai, C-T. (accepted). Kinect-taped Communication: Using Motion Sensing to Study Gesture Use and Similarity in Face-to-Face and Computer-Mediated Brainstorming. ACM Conference on Human Factors in Computing Systems (CHI) 2014. Full paper. [Acceptance rate: 22.8%]
Kinect-taped Communication: Using Motion Sensing to Study Gesture Use and Similarity in Face-to-Face and Computer-Mediated Brainstorming
Hao-Chuan Wang, Chien-Tung Lai National Tsing Hua University, Taiwan
How do media influence communication?
Computer-mediated communication (CMC) tools are prevalent, but are they all equal?
• Ex. Video vs. Audio
Media properties influence aspects of communication differently.
• Task performance, grounding, styles, similarity of language patterns, social processes and outcomes, etc.
[cf. Bos et al., 2002; Setlock et al., 2004; Scissors et al., 2008; Wang et al., 2009]
The (missing) non-verbal aspect in CMC research
Communication can be more than speaking. Both verbal and non-verbal channels are active during conversations.
• Facial expression
• Gesture
[cf. Goldin-Meadow, 1999; Giles & Coupland, 1991]
Studying gesture use in communication
Current methods:
• Videotaping with manual coding
• Giving specific instructions to participants (e.g., to gesture or not)
• Using confederates, etc.
Problems to solve:
• High cost and labor-intensiveness
• Resolution of manual analysis: hard to recognize and reliably label small movements
• Scalability: hard to study arbitrary communication in the wild
“Kinect-taping” method
As with videotaping, we use motion sensing devices, such as Microsoft Kinect, to record hand and body movements during conversations.
• Detailed, easier-to-process representations
• Behavioral science instrument (“microscope”) to study non-verbal communication in ad hoc groups
• Low cost if automatic measures are satisfactory
Re-appropriating motion sensors in HCI: Sensing-aided user research for future designs
From sensors as design elements to sensors as research instruments to help future designs. [cf. Mark et al., 2014]
Figure 1. A sample study setting that compares (a) F2F to (b) video-mediated communication by using Kinect as a behavioral science instrument.
A media comparison study
Investigate how people use gestures during face-to-face and computer-mediated brainstorming.
Compare three communication media:
• Face-to-Face
• Video
• Audio
Hypotheses
H1. Visibility increases gesture use [cf. Clark & Brennan, 1991]
Proportion of gesture: Face-to-Face > Video > Audio
H2. Visibility increases accommodation [cf. Giles & Coupland, 1991]
Similarity between group members’ gestures: Face-to-Face > Video > Audio
Also explore how gesture use, level of understanding, and ideation productivity correlate.
Experimental design
36 individuals, 18 two-person groups
Kinect-taped group brainstorming sessions
Three trials (15 min each) in counterbalanced order: Face-to-Face, Video, Audio
Data analysis: amount and similarity of gestures, level of understanding, productivity
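One way to picture the counterbalanced order is the sketch below. It assumes full counterbalancing, that is, all six orderings of the three media are used equally often across the 18 groups. The slides do not say which scheme was used, so the function name `counterbalanced_orders` and the round-robin assignment are illustrative assumptions only.

```python
from itertools import permutations

# The three media conditions named on the slide.
CONDITIONS = ("Face-to-Face", "Video", "Audio")


def counterbalanced_orders(n_groups, conditions=CONDITIONS):
    """Assign each group one ordering of the conditions, round-robin.

    Assumption: full counterbalancing over all 3! = 6 orderings.
    The actual assignment used in the study is not given in the slides.
    """
    orders = list(permutations(conditions))
    return [orders[g % len(orders)] for g in range(n_groups)]


schedule = counterbalanced_orders(18)
for group_id, order in enumerate(schedule, start=1):
    print(group_id, " -> ".join(order))
```

With 18 groups and six possible orderings, each ordering is used three times, so every medium appears equally often in the first, second, and third trial.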
How to quantify gestures?
How many gestures are there in a 15 min talk?
[Figure: motion trace labeled “moving” and “not moving”; the same trace gives two unit motions with speed threshold 0 and three unit motions with speed threshold 2.]
Choose the thresholds
[Figure: speed axis (m/s) with regions labeled “Too few signals”, “Almost everything”, and “Data points of interest”.]
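The thresholding idea can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a unit motion is a contiguous run of samples whose speed is strictly above the threshold, and that the proportion of gesture use is the share of samples inside such runs. The function names and the sample speeds are made up for the example.

```python
def segment_unit_motions(speeds, threshold):
    """Return (start, end) index pairs of contiguous runs with speed > threshold.

    `end` is exclusive. Treating each run as one "unit motion" is an
    assumption made for illustration.
    """
    segments = []
    start = None
    for i, speed in enumerate(speeds):
        moving = speed > threshold
        if moving and start is None:
            start = i                      # a new run begins
        elif not moving and start is not None:
            segments.append((start, i))    # the current run ends
            start = None
    if start is not None:                  # run continues to the last sample
        segments.append((start, len(speeds)))
    return segments


def proportion_moving(speeds, threshold):
    """Share of samples that fall inside a unit motion."""
    if not speeds:
        return 0.0
    moving = sum(end - start for start, end in segment_unit_motions(speeds, threshold))
    return moving / len(speeds)


# Hypothetical speed samples (m/s), chosen so the count changes with the threshold.
speeds = [0.0, 3.0, 1.0, 3.0, 0.0, 0.0, 4.0, 0.0]
print(segment_unit_motions(speeds, 0))   # two unit motions
print(segment_unit_motions(speeds, 2))   # three unit motions
```

As in the slide's illustration, raising the threshold splits one long movement into shorter ones, which is why the choice of threshold matters for counting gestures.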
How to measure similarity between unit motions?
Feature extraction and representation
Unit motions are represented as feature vectors.
• Time length, path length, displacement, velocity, speed, angular movement, etc.
• Features extracted for both hands and both elbows.
73 features extracted for each unit motion.
Similarity between unit motions: cosine value between the two vectors.
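The cosine measure between two feature vectors can be written in a few lines. The sketch below is generic: the example vectors are short made-up stand-ins for the 73-dimensional vectors, and returning 0.0 for an all-zero vector is a convention chosen here, not something the slides specify.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_u * norm_v)


# Hypothetical 3-feature vectors (e.g., time length, path length, displacement).
motion_a = [1.0, 2.0, 3.0]
motion_b = [2.0, 4.0, 6.0]   # same direction as motion_a, larger magnitude
motion_c = [3.0, 0.0, -1.0]  # orthogonal to motion_a
print(cosine_similarity(motion_a, motion_b))
print(cosine_similarity(motion_a, motion_c))
```

Cosine similarity ignores overall magnitude, so two motions with the same feature profile but different scale score close to 1. Whether the features were normalized beforehand is not stated in the slides.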
Validating the similarity metric
[Figure: randomly select motion queries from the Kinect-taped motion database, retrieve similar and dissimilar motions, and compare the machine ranking (1, 2, 3) with the human ranking (1, 2, 3).]
Contingency analysis (counts)
                Human Rank
Machine Rank    R1    R2    R3
R1              29     2     5
R2               7    27     2
R3               0     7    29
χ² = 107.97, p < .001
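The reported statistic can be re-checked from the counts in the table. The slides name only a contingency analysis, so the choice of test here is an assumption: the sketch computes both Pearson's chi-square and the likelihood-ratio chi-square (G statistic) for the 3×3 table. With these counts the likelihood-ratio version comes out at about 107.97, which agrees with the value on the slide, while Pearson's version is about 103.83.

```python
import math


def chi_square_stats(table):
    """Return (pearson_chi2, likelihood_ratio_g, degrees_of_freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2 * observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, g, dof


# Rows: machine rank R1..R3; columns: human rank R1..R3 (counts from the slide).
counts = [
    [29, 2, 5],
    [7, 27, 2],
    [0, 7, 29],
]
pearson, g, dof = chi_square_stats(counts)
print(round(pearson, 2), round(g, 2), dof)
```

Most of the mass sits on the diagonal (29, 27, 29 out of 36 per row), which is what agreement between the machine ranking and the human ranking looks like in this kind of table.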
Key Results
H1: Amount of gesture use
H2: Similarity between group members
Associations
• Amount of gesture and understanding
• Amount of gesture and ideation productivity
• Gesture similarity and ideation productivity
Visibility on proportion of gesture use
[Figure: bar chart of proportion of gesture use (%) for Face-to-face, Video, and Audio.]
H1 not supported. Media did not influence the percentage of gesture. People gesture as much in Audio as in F2F and Video.
Association between self-gesture and level of understanding
[Figure: model-predicted understanding and model-predicted number of ideas plotted against the proportion of an individual’s own gesture use (%), for Audio, F2F, and Video.]
Non-communicative function of gesture.
Understanding correlates with self-gesture but not partner-gesture.
Stronger correlation with reduced or no visibility.
Similarity between group members
[Figure: bar chart of between-participant gestural similarity for Face-to-face, Video, and Audio.]
H2 supported. Similarity: F2F > Video > Audio. People gesture more similarly when they can see each other.
Summary and implications
Media comparison study
• Visibility influences similarity but not amount of gesture.
• Only self-gesture correlates with understanding.
• Gesture doesn’t seem to convey much meaning to the partner. Seeing the partner is not crucial to understanding.
Kinect-taping method
• Motion sensing for studying non-verbal behaviors in CMC.
• Study communication of ad hoc groups in the wild.
• Distributed deployment study of CMC tools.
• Cross-lingual and cross-cultural communication.
Summary and implications (cont.)
• The value of video may be relatively limited to the social and collaborative aspects (similarity, etc.).
• Feedback that promotes self-gesturing may help understanding.
Effects of Interface Interactivity on Collecting Language Data to Power Dialogue Agents
Hao-Chuan Wang, Tau-Heng Yeo, Hsin-Hui Lee, Ai-Ju Huang
National Tsing Hua University, Hsinchu, Taiwan
Jia-Jang Tu, Sen-Chia Chang
Industrial Technology Research Institute, Hsinchu, Taiwan
Spoken Dialogue Systems
“What’s the top-grossing movie in 2012?”
“Let me see... The Avengers.”
“The top-grossing movie in 2012 is The Avengers”
Young, S., Keiser, S. & Gašić, M. Spoken Dialogue Management using Partially Observable Markov Decision Processes
ChiCHI 2014 | Effects of Interface Interactivity on Collecting Language Data to Power Dialogue Agents
Language Generation Task
How to collect more natural language responses?
Some Existing Methods
• One-on-one interviews to get the responses from people
- Manual data collection.
- Expensive.
• Using surveys with specific instructions, “Imagine that you’re answering people’s questions …”
- Less expensive.
- Non-interactive, “imagined interaction”.
Idea: Using an Interactive Chat Bot to Elicit Natural Responses
Anthropomorphic features:
✓ Greet workers
✓ Simulate human typing delays
✓ Wait for response
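As a rough illustration of the "simulate human typing delays" feature, the sketch below waits for a time proportional to the message length before sending. Everything in it is an assumption made for the example: the function names, the typing rate of 7 characters per second, and the 0.5 to 4 second bounds do not come from the slides and say nothing about how the actual chat bot was built.

```python
import time


def typing_delay(message, chars_per_second=7.0, min_delay=0.5, max_delay=4.0):
    """Seconds to wait before showing `message`, clamped to [min_delay, max_delay].

    All default values are illustrative assumptions.
    """
    raw = len(message) / chars_per_second
    return max(min_delay, min(max_delay, raw))


def send_with_delay(message, send, sleep=time.sleep):
    """Pause as if typing, then hand the message to the `send` callback."""
    sleep(typing_delay(message))
    send(message)


# Example with a stand-in sender that prints, and no real waiting.
send_with_delay("Hi! Thanks for helping out today.", print, sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the pause easy to disable in tests or to replace with an asynchronous timer in a web interface.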
Static Interface
Crowdsourcing Answer Generation
Evaluation
Compare interactive and static interfaces
Crowdsourcing to select quality responses
Evaluate the results with end users
Stage 1: Creation
Stage 2: Aggregation
Evaluation Stage
Platforms: PTT (a BBS system and online community in Taiwan) and MTurk
Multilingual Crowdsourcing Study
Chinese and English versions of ads and task instructions are prepared for crowdsourcing.
Stage 1: Answer Creation
• 223 workers (122 from MTurk, 101 from PTT)
Stage 2: Answer Aggregation
• 222 workers
Evaluation
• 165 workers (98 from MTurk, 67 from PTT)
Key Results
Interactive vs. Static Interface
• 73.6% of comments show a preference for working with the interactive chat bot.
• Increasing the satisfaction of workers (Kittur, A., et al. 2013)
“Chat is much fun and more likely to make me think, while questionnaire is more standardized, like an exam.”
“the chat interface is much better. it recognizes the text entered in real time and responds accordingly with artificial intelligence and recognition. very nice”
MTurk vs. PTT: Language
• The two platforms are highly language-specific.
• Cultural differences (Nisbett, R., 2003; Hall, E. T., 1977).
[Figure: bar chart of the number of answers in English and in Chinese (axis 0 to 120) for Chinese and English recruitment ads on PTT and on MTurk.]
Evaluation: Ultimate User Experience
[Figure: enjoyability ratings (axis 2.5 to 3.5) of answers collected with interactivity vs. with a questionnaire, for Chinese and English.]
Conclusion
• Presented an interactive chat bot-based interface for crowdsourcing language generation tasks for building natural dialogue agents.
• Interactivity led to higher worker satisfaction and better perceived enjoyability among Chinese-speaking users.
• Also identified the language specificity of crowdsourcing platforms, which helps to inform crowdsourcing practices.
Thank you for listening.
Acknowledgement: This study is partially supported by Project D352B24310 and conducted at ITRI under the sponsorship of the Ministry of Economic Affairs, Taiwan.
Contact: Hao-Chuan Wang [email protected], Ai-Ju (Ivy) Huang [email protected]
Key Messages
Supporting human communication continues to be an important topic in HCI, for both research and design practice.
• Focus on how to shorten the “psychological distance” between people. “Mind-connecting”!
Basic and applied behavioral, cognitive, and social sciences help us understand the features of successful and unsuccessful communication.
• Insight: we should focus on CMC affordances as much as on technicality.
Interdisciplinary work can benefit both sides: social and behavioral sciences help technology design, and vice versa.
Ultimate Goal? Mind-Connecting!
NTHU Collaborative and Social Computing Lab (CSC Lab), National Tsing Hua University
Acknowledgement for support from:
• Ministry of Science and Technology, Taiwan
• Google Inc.
• Microsoft Research Asia
• Industrial Technology Research Institute (ITRI)
• Delta Corp (Delta Electronics)
• National Science Foundation, USA