Challenging 5 Common Assumptions about
Videoconferencing
Milton Chen
Computer Systems Lab
Stanford University
Presented at Internet2 Advanced Applications Track 10/28/2002
Copyright 2002 Milton Chen
The Stanford Video Auditorium
desktop interface
15’ x 5’ video wall
Video Auditorium publicity/users
Intel president Paul Otellini’s Intel Developer Forum keynote
Invited demo to NASA headquarters for Paul G. Pastorek
CANARIE, Canada
CUDI, Mexico
Comdex, Brazil
IBM Almaden Lab
Manhattan College
Hopkins Marine Station
Stanford Medical School
Stanford Learning Lab
Stanford Center for Design Research
Berkeley Bioengineering Lab
Universidade Federal do Rio Grande do Sul, Brazil
Outline
Common assumptions
– Technology
  1. High-fidelity AV requires dedicated hardware
  2. Difficult to install and use
– Human factors
  3. Life size displays are ideal
  4. Floor control requires interactive frame rate
  5. Eye contact is difficult
Beyond MCU and H323
– Peer-to-peer
– Stanford’s Port Bootstrap Protocol
– Personal directory
An evaluation of distance learning at Stanford
Why videoconferencing is not ubiquitous
1. High-fidelity low-latency AV requires
dedicated hardware
$700 Pentium 4 computer outperforms $7000 systems
Your PC outperforms all dedicated systems
Comparison of videoconferencing solutions
System                     Max number of links   Max video resolution   BW required at 352x288 15fps
NetMeeting                 1                     352x288                200 Kbps
WIDE DVTS                  1                     720x480                3000 Kbps
Vbrick                     1                     720x480                2000 Kbps
Polycom, Sony, …           4                     352x288                200 Kbps
AccessGrid, VRVS           many                  720x480                400 Kbps
Stanford Video Auditorium  16 to more than 100   720x480                100 Kbps
* CU-SeeMe, iVisit, and Yahoo Messenger have unacceptable latency
demo
* TrueSpeech 8.5
* MPEG-4
* Encrypted, AES (Rijndael), streaming
* Simultaneous AV recording
* Perceptual streaming adapts to network conditions
A scalable AV streaming architecture
audio: capture → compress → send → receive → decompress → render
video: capture → compress → send → receive → decompress → render
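The stage names above suggest that audio and video each pass through the same six steps, independently of one another. A minimal Python sketch of that idea follows. Everything in it is a placeholder: zlib stands in for the real codecs (the slides name TrueSpeech and MPEG-4), an in-memory queue stands in for the network, and the function names are hypothetical. It is not the Video Auditorium implementation.

```python
import queue
import zlib


def capture(n_frames, frame_bytes=64):
    """Placeholder capture stage: yields synthetic frames of constant bytes."""
    for i in range(n_frames):
        yield bytes([i % 256]) * frame_bytes


def compress(frame):
    """Stand-in codec (assumption): zlib instead of a real audio/video codec."""
    return zlib.compress(frame)


def decompress(packet):
    """Inverse of the stand-in codec."""
    return zlib.decompress(packet)


def run_stream(n_frames):
    """Push frames through capture, compress, send, receive, decompress, render."""
    network = queue.Queue()  # hypothetical stand-in for the network link
    for frame in capture(n_frames):
        network.put(compress(frame))          # send
    rendered = []
    while not network.empty():
        packet = network.get()                # receive
        rendered.append(decompress(packet))   # "render": here we just collect frames
    return rendered


# Audio and video run as two independent streams with the same stage sequence.
audio_frames = run_stream(3)
video_frames = run_stream(5)
print(len(audio_frames), len(video_frames))
```

Keeping each stream's stages separate is one plausible reading of "scalable": adding a participant adds another independent pipeline rather than changing a shared one.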
Beyond MCU and H323
MCU vs. peer-to-peer
– Scalability
– Ease of deployment
H323 vs. Stanford’s Port-Bootstrap Protocol
– Firewall
– Ease of deployment
Personal directory
2. Videoconferencing systems are difficult to install and use
One click operation
To use the Video Auditorium
– “Nothing” to install
– One click on the HTML speed dial
<OBJECT CLASSID="CLSID:E80F7B8F-7906-4A89-B59E-B19871F474A9"
        CODEBASE="runtime/VA_Start.ocx#Version=-1,-1,-1,-1">
  <PARAM NAME="addr" VALUE="stanford -client_only">
</OBJECT>
Makes conferencing as simple as surfing the web
3. Life size displays are ideal
Each video should be between 6° and 14° wide
[Figure: smile recognition time (msec) vs. video size (deg of visual angle)]
* 12 people sat 10’ from the display. Subjectively, people reported 6° as minimum and 14° as ideal. Life size is 12°.
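To relate these angles to physical sizes: a width w seen from distance d subtends a visual angle θ where w = 2·d·tan(θ/2). The Python sketch below applies that formula at the 10-foot viewing distance used in the study. It assumes a flat display viewed head-on, and the function name is illustrative.

```python
import math


def width_for_visual_angle(angle_deg, distance):
    """Width (in the same unit as distance) that subtends angle_deg at that distance."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)


viewing_distance_in = 10 * 12  # 10 feet expressed in inches
for angle in (6, 12, 14):
    width = width_for_visual_angle(angle, viewing_distance_in)
    print(f"{angle} deg -> {width:.1f} in")
```

At 10 feet, the 6° to 14° range works out to roughly 13 to 29 inches of display width per video.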
Balance between size and head movements
[Figure: preference (0% to 100%) by class size (9 students, 36 students) for the immersive (64°) and large (27°) displays; bar labels: 9°, 14°, 7°, 4°]
* 12 people viewed 9 and 36 students on a large display and on an immersive display. The immersive display requires head movements to see all the students.
4. Effective floor control requires
interactive frame rate
Minimum required frame rate
Interactive 10 fps
Tolerable 5 fps
– [Tang and Isaacs ’93]
Lip synchronization 5 fps
– [Watson and Sasse ’96]
Content understanding 5 fps
– [Ghinea and Thomas ’98]
Sign language recognition 1 fps
– [Johnson and Caird ’96]
Gesture Detection Algorithm
[Figure: visualization of algorithm; panels: input image, frame difference, after erosion]
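The slide names only two steps, a frame difference followed by erosion. The Python sketch below shows one way those two steps could be combined to decide whether a gesture occurred. The threshold, the 3x3 erosion neighborhood, the minimum pixel count, and all function names are assumptions made for illustration and are not taken from the talk.

```python
def frame_difference(prev, curr, threshold=30):
    """Binary mask: 1 where a pixel changed by more than threshold (assumed value)."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def erode(mask):
    """3x3 binary erosion: a pixel survives only if its whole neighborhood is set.

    Border pixels are cleared. Isolated changes such as sensor noise disappear,
    while larger moving regions keep their interior.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def gesture_detected(prev, curr, threshold=30, min_pixels=1):
    """True if at least min_pixels changed pixels remain after erosion."""
    eroded = erode(frame_difference(prev, curr, threshold))
    return sum(map(sum, eroded)) >= min_pixels


blank = [[0] * 6 for _ in range(6)]

noise = [row[:] for row in blank]
noise[2][2] = 255                      # one isolated changed pixel

move = [row[:] for row in blank]
for y in range(1, 4):
    for x in range(1, 4):
        move[y][x] = 200               # a 3x3 block of change

print(gesture_detected(blank, noise))  # False: erosion removes the lone pixel
print(gesture_detected(blank, move))   # True: the block's center survives
```

A sender built this way would transmit a new frame only when `gesture_detected` returns True, which is the sense in which the frame rate becomes gesture-sensitive.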
Requires 10% of full motion bandwidth
[Figure: frame size (kbits) vs. time (frame number), two panels: full-motion (10 fps) and gesture-sensitive (0.2 fps)]
* MPEG-4 encoded at 320x240
Gesture sensitive allows dynamic discussion
[Figure: speaker changes per minute for full motion (15 fps), gesture sensitive (~0.2 fps), and low update (0.2 fps)]
* 8 groups of 4 people during a discussion
5. Eye contact is difficult
Eye contact fires up our brain
[Kampe et al. ’01]
Eye contact is difficult
Looking into the camera
Attempting eye contact
Solutions to eye contact
Half-silvered mirror [Rosenthal ’47]
MAJIC [Okada, et al. ’94]
ClearBoard [Ishii, et al. ’92]
GazeMaster [Gemmell, et al. ’00]
A simple solution
Hydra [Sellen, Buxton, and Arnott ’92]
Eye contact sensitivity is high
Spatial perception task
As good as Snellen acuity [Gibson and Pick ’63]
[Figure: eye contact (%) vs. angle (deg), from -8.5 to 8.5; stdev = 2.8°; diagram labels: looker, observer, 2 m]
* 6 observers judged 1 looker
Sensitivity is symmetric
Cline ’67
Kruger and Huckstedt ’69
Anstis, et al. ’69
Stokes ’69
Ellgring ’70
PicturePhone: camera above display
Hydra: camera below display
Methodology
* Two rooms can be linked in a videoconferencing session
Observers watch videos of looker and judge eye contact
large display with camera at the center
Record lookers gazing at different targets
Sensitivity is asymmetric
* 16 observers judged recorded videos of 1 looker
An anatomical explanation
[Figure panels: looking at you, looking sideways, looking up, looking down, eye closing]
Illustrations from The Artist’s Complete Guide to Facial Expression [Faigin ’90]
Sensitivity is less in conversation
[Figure: eye contact (%) vs. visual angle (deg, down), 0 to 15; curves: recorded, conversation]
* 16 observers judged videos of 1 looker
Sensitivity is less in video
[Figure: eye contact (%) vs. visual angle (deg, down), 0 to 15; curves: face-to-face, video]
* 16 observers judged 1 looker in conversation
We are biased to perceive contact
[Figure: eye contact (%) vs. angle, from Snellen acuity to conferencing acuity; curves: sideways and up; down; down and video; down, video, and conversation]
Maximum camera to eyes distance
* Assuming a sensitivity of 7°
device      minimum viewing distance   camera to rendered eyes distance
Palm held   1’                         1.5”
Desktop     2’                         3”
Wall size   8’                         12”
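The values listed above are consistent with simple trigonometry: at viewing distance d, a camera offset of d·tan(7°) from the rendered eyes corresponds to a 7° gaze error. A short Python sketch of that calculation follows. It assumes the offset is measured in the plane of the display, and the function name is illustrative.

```python
import math


def max_camera_offset(viewing_distance_in, sensitivity_deg=7.0):
    """Largest camera-to-rendered-eyes distance (inches) within the gaze tolerance."""
    return viewing_distance_in * math.tan(math.radians(sensitivity_deg))


for device, feet in (("Palm held", 1), ("Desktop", 2), ("Wall size", 8)):
    print(f"{device}: {max_camera_offset(feet * 12):.1f} in")
```

This gives about 1.5, 2.9, and 11.8 inches, in line with the rounded figures of 1.5”, 3”, and 12”.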
Eye contact in the Video Auditorium
Why videoconferencing is essential to distance learning:
An evaluation of distance learning at Stanford
Distance learning at Stanford
Remote students can call in during class
Instructor cannot see the remote students
a 1969 classroom
a 2002 operator console
a 2002 lecture viewer
Students like distance learning
[Figure: attitude toward distance learning (0% to 100%) for students, TAs, and faculty; categories: enjoy, does not matter, dislike, other]
* 120 students, 15 TAs, and 41 faculty
Learning is less effective
[Figure: learning outcome (0% to 100%) for students, TAs, and faculty; categories: increase greatly, increase somewhat, does not change, decrease somewhat, decrease greatly]
* 120 students, 15 TAs, and 41 faculty
F2F interaction is important
[Figure: importance of f2f interaction (0% to 100%) for students, TAs, and faculty; categories: extremely, very, moderately, somewhat, not]
F2F is important for lecturing and crucial for discussions
No interaction with remote students
Classroom observation of 4 CS classes
– Instructor on average asked 9 questions per session
– Local students on average asked or made 3 questions or comments per session
– Remote students spoke once in 6 months
Value of video beyond audio
Cues only transmitted by the visual channel
– Negative feedback, …
Emotional bond
– Establishing and maintaining relationships
Can you imagine it?
– A new face, …
A proposal
The world’s largest video wall: link all Internet2 members for Spring 03
Developed technology
– One Mouse
– AV stream migration
Bandwidth: 2 x 300 x (100 Kbps + 10 Kbps) ≈ 60 Mbps
Cost: 10 P4 laptops + 10 portable projectors ≈ $30K
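The bandwidth estimate can be multiplied out directly. The Python sketch below does so under stated assumptions: 300 is read as the number of linked sites, 100 Kbps and 10 Kbps as the per-site video and audio rates, and the factor of 2 as sending plus receiving. These interpretations and the parameter names are guesses, since the slide does not label the terms.

```python
def wall_bandwidth_mbps(sites=300, video_kbps=100, audio_kbps=10, directions=2):
    """Aggregate bandwidth in Mbps for the proposed wall (all parameters assumed)."""
    return directions * sites * (video_kbps + audio_kbps) / 1000


print(wall_bandwidth_mbps())  # 66.0
```

The exact product is 66 Mbps, which the slide rounds to roughly 60 Mbps.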
A prediction
A plane that does not fly is not a plane
First flight, Wrights 1903
A videophone that limits communication is not a videophone
• poor audio fidelity
• poor video fidelity
• excessive latency
• no eye contact
• poor lip synchronization
Why all videoconferencing products have failed
Threshold of quality for the 2nd revolution
1st Revolution: Possible               2nd Revolution: Practical
first mobile phone, 1924               first handheld phone, 1973
first videoconferencing system, 1927
Conclusion
Common assumptions
1. High-fidelity AV requires dedicated hardware → higher on a PC
2. Difficult to install/use → one click
3. Life size displays are ideal → 6° to 14°
4. Floor control requires at least 10 fps → 0.2 fps avg
5. Eye contact is difficult → 7° down
Videoconferencing is essential to distance learning
An MCU-less and H323-less future
You already have a one-click high-fidelity multiparty
videoconferencing system
We are at the dawn of a videoconferencing revolution that will fuel the demand for a 1000X increase in available bandwidth
Acknowledgement
– NASA
– Intel
– Sony
– Interval Research
– Wallenberg Global Learning Network
– Department of Defense
Future work
– Gold release for Feb 2003
– SDK
– The Wall