2005.03.31 SLIDE 1 IS146 – SPRING 2005
Case Study: Cameraphones
Prof. Marc Davis, Prof. Peter Lyman, and danah boyd
UC Berkeley SIMS
Tuesday and Thursday 2:00 pm – 3:30 pm
Spring 2005
http://www.sims.berkeley.edu/academics/courses/is146/s05/
IS146:
Foundations of New Media
Lecture Overview
• Review of Last Time – Understanding Visual Media
• Today – Case Study: Cameraphone
• Preview of Next Time – Databases
What Are Comics?
• “Juxtaposed pictorial and other images in deliberate sequence, intended to convey information and/or to produce an aesthetic response in the viewer.” (p. 9)
• How do comics differ from
– Photographs?
– Movies?
– Hieroglyphics?
– Emoticons?
Old Comics: Mixtec Codex Nuttall
Scott McCloud’s “Big Triangle”
McCloud found that “The Big Triangle,” as it came to be known, was an interesting tool for thinking about comics art...
Vertices of the triangle: Reality, Language, and the Picture Plane
Cartoons and Viewer Identification
Closure: From Parts To The Whole
Closure: Bridging Time and Space
Closure in Comics
Types of Closure
• Moment-To-Moment
• Action-To-Action
• Subject-To-Subject
• Scene-To-Scene
• Aspect-To-Aspect
• Non-Sequitur
Questions for Today
• How do we interpret images and sequences of images?
• How do we read different visual representations of the world (especially different levels of realism and abstraction) differently?
• How does what is left out affect how we understand images and sequences of images?
Questions for Today
• What are some of the differences between how text and images function in comics?
• What would be lost/gained in moving between images and text?
Questions for Today
• How could we represent images and sequences of images in order to make them programmable?
• What could computation do to affect how we produce, manipulate, reuse, and understand images and sequences of images?
Lecture Overview
• Review of Last Time – Understanding Visual Media
• Today – Case Study: Cameraphone
• Preview of Next Time – Databases
What is the Problem?
• Today people cannot easily find, edit, share, and reuse digital visual media
• Computers don’t understand visual media content
– Digital visual media are opaque and data rich
– We lack structured representations
• Without metadata, manipulating digital visual media will remain like word-processing with bitmaps
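The “word-processing with bitmaps” point can be made concrete with a toy sketch: without a structured representation, a photo is just opaque bytes and cannot be searched by content. Everything below (field names, schema, data) is invented for illustration and is not the MMM system’s actual representation.

```python
# A photo as an opaque signal: just bytes. Nothing here supports
# a query like "find photos of the Campanile".
opaque_photo = bytes([127, 4, 255, 0])  # stand-in for raw pixel data

# The same photo with a structured metadata record attached
# (hypothetical schema, for illustration only).
annotated_photo = {
    "pixels": opaque_photo,
    "metadata": {
        "location": "Campanile, UC Berkeley",
        "time": "2005-03-31T14:00",
        "subject": ["Campanile"],
    },
}

def search(photos, subject):
    """Find photos whose metadata lists the given subject."""
    return [p for p in photos if subject in p["metadata"]["subject"]]

print(len(search([annotated_photo], "Campanile")))  # 1 match
```

With metadata, content queries become ordinary structured lookups; on the raw bytes alone, no such query is possible.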
Signal-to-Symbol Problems
• Semantic Gap – the gap between low-level signal analysis and high-level semantic descriptions
– “Vertical off-white rectangular blob on blue background” does not equal “Campanile at UC Berkeley”
Signal-to-Symbol Problems
• Sensory Gap – the gap between how an object appears and what it is
– Different images of the same object can appear dissimilar
– Images of different objects can appear similar
Computer Vision and Context
• You go out drinking with your friends
• You get drunk
• Really drunk
• You get hit over the head and pass out
• You are flown to a city in a country you’ve never been to, with a language you don’t understand and an alphabet you can’t read
• You wake up face down in a gutter with a terrible hangover
• You have no idea where you are or how you got there
• This is what it’s like to be most computer vision systems—they have no context
• Context is what enables us to understand what we see
How We Got Here: Disabling Assumptions
1. Contextual (spatial, temporal, social, etc.) metadata about the capture and use of media are not available
• Therefore all analysis of media content must be focused on the media signal alone
2. Media capture and media analysis are separated in time and space
• Therefore removed from their context of creation and the users who created them
3. Multimedia content analysis must not involve humans
• Therefore missing out on the possibility of “human-in-the-loop” approaches to algorithm design and network effects of the activities of groups of users
Where To Go: Enabling Assumptions
1. Leverage contextual, sensory-rich metadata (spatial, temporal, social, etc.) about the capture and use of media content
2. Integrate media capture and analysis at the point of capture and throughout the media lifecycle
3. Design systems that incorporate human beings as interactive functional components and aggregate and analyze user behavior
Traditional vs. Metadata-Centric Production Chain
[Diagram: Pre-Production → Production → Post-Production → Distribution, with metadata spanning the entire metadata-centric chain]
Moore’s Law for Cameras
• $400: Kodak DC40 (2000) → Kodak DX4900 (2002)
• $40: Nintendo GameBoy Camera (2000) → SiPix StyleCam Blink (2002)
Capture+Processing+Interaction+Network
Camera Phones as Platform
• Media capture (images, video, audio)
• Programmable processing using open standard operating systems, programming languages, and APIs
• Wireless networking
• Personal information management functions
• Rich user interaction modalities
• Time, location, and user contextual metadata
Camera Phones as Platform
• In the first half of 2003, more camera phones were sold worldwide than digital cameras
• By 2008, the average camera phone is predicted to have 5 megapixel resolution
• Last month Samsung introduced 7 megapixel camera phones with optical zoom and photo flash
• There are more cell phone users in China than people in the United States (300 million)
• For 90% of the world their “computer” is their cell phone
Campanile Inspiration
Mobile Media Metadata Idea
• Leverage the spatio-temporal context and social community of media capture in mobile devices
– Gather all automatically available information at the point of capture (time, spatial location, phone user, etc.)
– Use metadata similarity and media analysis algorithms to find similar media that have been annotated before
– Take advantage of this previously annotated media to make educated guesses about the content of the newly captured media
– Interact in a simple and intuitive way with the phone user to confirm and augment system-supplied metadata for captured media
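The four steps above can be sketched in miniature. This is a toy illustration, not the MMM algorithms: the data, the facet names, and the crude facet-counting similarity measure are all invented.

```python
# Previously captured photos, already annotated with content labels.
annotated = [
    {"cell_id": "A1", "hour": 14, "user": "ana", "label": "Campanile"},
    {"cell_id": "A1", "hour": 15, "user": "bo",  "label": "Campanile"},
    {"cell_id": "B7", "hour": 20, "user": "ana", "label": "Golden Gate Bridge"},
]

def similarity(a, b):
    """Count matching context facets (a crude stand-in for metadata similarity)."""
    return sum(a[k] == b[k] for k in ("cell_id", "user")) + (abs(a["hour"] - b["hour"]) <= 1)

def guess_label(new_photo_context):
    """Guess content by reusing the annotation of the most context-similar prior photo."""
    best = max(annotated, key=lambda p: similarity(new_photo_context, p))
    return best["label"]

# Step 1: gather context automatically at the point of capture.
ctx = {"cell_id": "A1", "hour": 14, "user": "bo"}
# Steps 2-3: find similar annotated media and reuse its annotation as an educated guess.
print(guess_label(ctx))  # Campanile
# Step 4 (not shown): the phone user confirms or corrects the guess.
```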
Campanile Scenario
1. Image Capture
2. Gathering of Contextual Metadata
– Gathered data: location data, time, date, username
3. Metadata (and Media) Similarity Processing
– Processing results: Location City: Berkeley (100%), Day of Week: Saturday (100%), Location Name: Campanile (62%), Setting: Outside (82%)
4. User Verification
– Verified information: Location Name: Campanile (100%), Setting: Outside (100%)
5. Metadata and Media Sharing and Reuse
From Context to Content
• Context
– When: date and time
– Where: CellID refined to semantic place
– Who: cellphone user
– What: activity as product of when, where, and who
• Content
– When was the photo taken?
– Where is the subject of the photo?
– Who is in the photo?
– What are the people doing?
– What objects are in the photo?
Space – Time – Social Space
[Diagram: spatial, temporal, and social dimensions]
What is “Location”?
Camera Location vs. Subject Location
• Camera Location = Golden Gate Bridge
• Subject Location = Golden Gate Bridge
• Camera Location = Albany Marina
• Subject Location = Golden Gate Bridge
Kodak Picture Spot
Location Guesser
• Weighted sum of features
– Most recently “visited” location
– Most “visited” location by me in this CellID around this time
– Most “visited” location by me in this CellID
– Most “visited” location by “others” in this CellID around this time
– Most “visited” location by “others” in this CellID
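The weighted-sum idea can be sketched as follows. The visit log, the weights, and the one-hour notion of “around this time” are all invented for illustration; they are not the values MMM used.

```python
from collections import Counter

# Invented visit history: (location, user, cell_id, hour of day).
visits = [
    ("Campanile", "me",    "A1", 14),
    ("Campanile", "other", "A1", 14),
    ("Cafe",      "me",    "A1", 9),
    ("Cafe",      "other", "A1", 9),
    ("Campanile", "other", "A1", 15),
]

def guess_locations(cell_id, hour, me="me", weights=(3.0, 2.5, 2.0, 1.5, 1.0)):
    """Score candidate locations with a weighted sum over the five features
    from the slide; the weights are made up for illustration."""
    w_recent, w_me_time, w_me, w_others_time, w_others = weights
    scores = Counter()
    # Feature 1: the most recently visited location (last entry in the log).
    scores[visits[-1][0]] += w_recent
    for loc, user, cid, h in visits:
        if cid != cell_id:
            continue
        near_time = abs(h - hour) <= 1  # "around this time" = within an hour
        if user == me:
            scores[loc] += w_me + (w_me_time if near_time else 0)
        else:
            scores[loc] += w_others + (w_others_time if near_time else 0)
    # Return candidate locations ranked best-first.
    return [loc for loc, _ in scores.most_common()]

print(guess_locations("A1", 14))  # ['Campanile', 'Cafe']
```

Ranking candidates rather than returning a single answer is what makes the “correct within the first N guesses” performance figures on the next slide meaningful.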
Location Guesser Performance
• Exempting the occasions on which a user first enters a new location into the system, MMM guessed the correct location of the subject of the photo (out of an average of 36.8 possible locations):
– 100% of the time within the first four guesses
– 96% of the time within the first three guesses
– 88% of the time within the first two guesses
– 69% of the time as the first guess
MMM1: Context to Content
• When – Network Time Server
• Where – CellID
• Who – Cellphone ID
• What – Faceted Annotation
From MMM-1 To MMM-2
• MMM-1 asked – “What did I just take a picture of?”
• MMM-2 adds – “Whom do I want to share this picture with?”
[Diagram: Context, Content, Community]
Sharing Metadata
• From contextual metadata to sharing
– A parent takes a photo of his child on the child’s birthday
– Whom does he share it with?
• From sharing to content metadata
– A birdwatcher takes a photo in a bird sanctuary and sends it to her birdwatching group
– What is the photo of?
MMM2: Context to Sharing
• When – Network Time Server
• Where – CellID, GPS, Bluetooth
• Who – Cellphone ID, Bluetooth, Sharing History
• What – Faceted Annotation, Captions
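A share guesser in the spirit of MMM2 might rank candidate recipients by past sharing history, boosted when the current capture context matches the context of earlier shares. Everything below (names, weights, data, the single-facet context) is invented for illustration, not the MMM2 algorithm.

```python
from collections import Counter

# Invented sharing history: (recipient, cell_id where the shared photo was taken).
history = [
    ("mom",   "home"),
    ("mom",   "home"),
    ("mom",   "home"),
    ("carla", "campus"),
    ("carla", "campus"),
    ("carla", "home"),
]

def suggest_recipients(cell_id, history=history, context_boost=2.0):
    """Rank recipients by how often they were shared with overall,
    weighting shares that happened in the same context more heavily."""
    scores = Counter()
    for recipient, cid in history:
        scores[recipient] += context_boost if cid == cell_id else 1.0
    return [r for r, _ in scores.most_common()]

print(suggest_recipients("campus"))  # ['carla', 'mom']
print(suggest_recipients("home"))    # ['mom', 'carla']
```

The same history yields different suggestions in different contexts, which is the point: sharing is inferred from context, not just from a static buddy list.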
MMM2: Context to Sharing
MMM2 Interfaces: Phone
MMM2 Interfaces: Web
MMM2 Image Map
More Captures and Uploads
STATS                              MMM1    MMM2    DIFF
Users                                38      40      5%
Days                                 63      39    -38%
Raw totals:
  Personal photos uploaded          155    1478    854%
  Total photos uploaded             535    1678    214%
  Photos not uploaded               108      52    -52%
Average per user per day:
  Personal photos uploaded         0.06    0.95   1363%
  Total photos uploaded            0.22    1.08    381%
  Photos not uploaded              0.05    0.03    -26%
Upload failure rate              16.8%    3.0%    -82%
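The DIFF column appears to be percent change from MMM1 to MMM2. A quick arithmetic check (reconstructing the computation, not the original spreadsheet) reproduces the reported values:

```python
def pct_change(before, after):
    """Percent change from before to after, rounded to the nearest integer."""
    return round((after - before) / before * 100)

print(pct_change(155, 1478))  # 854  (personal photos uploaded)
print(pct_change(535, 1678))  # 214  (total photos uploaded)
print(pct_change(63, 39))     # -38  (days)
```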
Reasons For 13.6 Times Increase
• Better image quality
– VGA vs. 1 megapixel image resolution
– Night mode for low light
– Digital zoom
• Familiarity of the user population with cameraphones
– 12 prior cameraphone users this year vs. 1 last year
• The availability of only 1 rather than 2 camera applications in MMM2 vs. MMM1
• Automatic background upload of photos to the web photo management application
• Automatic support for sharing on the cameraphone and on the web
More Sharing With Suggestions
[Chart: weekly counts of photos uploaded, shared, and received from 11/2 to 12/7, on a 0–160 scale]
More Sharing With Suggestions

MMM2 USER BEHAVIOR                     BEFORE SHARE GUESSER   AFTER SHARE GUESSER   DIFF
Total photos uploaded                                   688                   990   144%
Total personal photos uploaded                          688                   790   115%
Total photos shared                                     249                   791   318%
Total personal photos shared                            249                   591   237%
Percentage of photos shared                             36%                   80%   221%
Percentage of personal photos shared                    36%                   75%   207%
Sharing Graph
Scaling Up Photo Sharing
[Chart: number of sources, from 1 to 100M, vs. photos per source, up to 100K]
MMM3: Context, Content, Sharing
• When – Network Time Server, Calendar Events
• Where – CellID, GPS, Bluetooth
• Who – Cellphone ID, Bluetooth, Sharing History
• What – Faceted Annotations, Captions, Weather Service, Image Analysis
MMM3 Research Questions
• MMM1 – Context → Content
• MMM2 – Context → Community
• MMM3 – Community → Context, Community → Content, Content → Context, Content → Community
[Diagram: Context, Content, Community]
Social Uses of Personal Photos
• Looking not just at what people do with digital imaging technology, but why they do it
• Goals
– Identify social uses of photography to predict resistances and affordances of next generation mobile media devices and applications
• Methods
– Situated video interviews
– Review of online photo sites
– Sociotechnological prototyping (magic thing, technology probes)
From What to Why to What
Preliminary Findings
• Social uses of personal photos
– Creating and maintaining social relationships
– Constructing personal and group memory
– Self-presentation
– Self-expression
– Functional: self and others
• Media and resistance
– Materiality
– Orality
– Storytelling
Photo Examples of Social Uses
Summary
• Cameraphones are a paradigm-changing device for multimedia computing
• Context-aware mobile media metadata will solve many problems in media asset management
• MMM1 – Content can be inferred from context
• MMM2 – Sharing can be inferred from context
Alex Jaffe on Cameraphone Uses
• Many of the users of cell phone cameras in this paper felt compelled to chronicle very "normal" aspects of their daily life, either to share with others or for personal memories. Do you think the ability to constantly record one's life satisfies an existing desire, or is the technology fulfilling a need it itself inspires in people? Regardless, can you think of examples where technology is used to do something not because there is a need, but simply because it becomes possible?
Alex Jaffe on Cameraphone Uses
• Respondents indicated that one of their favorite features unique to MMM(2) was their ability to send pictures to people immediately after they were taken. This created a sense of immediacy and "being there" in the viewer. How is communicating in this way reminiscent of orality, albeit in visual form? Might this be an important part of secondary orality in times to come?
Magen Farrar on Context-To-Content
• “Context-to-content” inferencing promises to solve the problems of the sensory and semantic gaps in multimedia information systems... By using the spatio-temporal-social context of image capture, we are able to infer that different images taken in the vicinity of the Campanile are very likely of the Campanile at UC Berkeley and know that they are not of, for example, the Washington Monument... So, how is the system of “context to content” inferencing changing to allow deciphering, or specifics, between similar content within the same context?
Magen Farrar on Context-To-Content
• Sharing metadata is exceptionally useful in inferring media content from context, but can potentially violate one's privacy. Other than the opt-in/opt-out mechanisms in the system, what other steps are being thought of to assure the preservation of privacy while sharing information in the Mobile Media Metadata system?
Lecture Overview
• Review of Last Time – Understanding Visual Media
• Today – Case Study: Cameraphone
• Preview of Next Time – Databases
Readings for Next Week
• Tuesday (Guest Lecture by Dr. Frank Nack)
– Lev Manovich. Database as a Symbolic Form. 1999, p. 1-16. http://www.manovich.net/DOCS/database.rtf
• Discussion Questions
– Dorian Peters
– Joshia Chang
• Thursday (Guest Lecture by Prof. Yehuda Kalay)
– Steve Harrison and Paul Dourish. Re-Place-ing Space: The Roles of Place and Space in Collaborative Systems. In: Proceedings of ACM Conference on CSCW. New York: ACM Press, 1996, p. 67-76.
• Discussion Questions
– Vlad Kaplun
– Annie Chiu