
Sensible Visual Search

Shih-Fu Chang

Digital Video and Multimedia Lab

Columbia University

www.ee.columbia.edu/dvmm
June 2008

(Joint Work with Eric Zavesky and Lyndon Kennedy)


User Expectation for Web Search

“…type in a few words at most, then expect the engine to bring back the perfect results. More than 95 percent of us never use the advanced search features most engines include, …”

– The Search, J. Battelle, 2003


Keyword search is still the primary search method

Straightforward extension to visual search

Keyword-based Visual Search Paradigm


Web Image Search. Text Query: “Manhattan Cruise” on Google Image Search

What is in the results? Why are these images returned? How can users choose better search terms?


Minor Changes in Keywords, Big Difference
Text Query: “Cruise around Manhattan”

When metadata are unavailable: Automatic Image Classification

Audio-visual features
Geo, social features
SVM or graph models
Context fusion

. . .

Rich semantic description based on content analysis

Statistical models

Semantic Indexes

Semantic index per concept (+/-): Anchor, Snow, Soccer, Building, Outdoor
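As a rough illustration of the statistical models behind these semantic indexes, here is a minimal sketch of training one per-concept detector with an SVM over precomputed visual features; the feature extraction, the concept labels, and the array shapes are stand-ins, not the actual system.

```python
# Minimal sketch: training one concept detector ("snow") over precomputed
# visual features. X and y are random stand-ins for feature vectors and
# binary labels; real features and LSCOM annotations are outside this sketch.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))        # stand-in for visual feature vectors
y = (X[:, 0] > 0).astype(int)          # stand-in for "snow" / "not snow" labels

detector = SVC(kernel="rbf", probability=True)
detector.fit(X, y)

# Score unseen keyframes: the probability of the positive class plays the role
# of the per-concept detector score used later by keyword search.
scores = detector.predict_proba(rng.normal(size=(5, 128)))[:, 1]
print(scores)
```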


A few good detectors for LSCOM concepts: waterfront, bridge, crowd, explosion/fire, US flag, military personnel

Remember: there are many not-so-good detectors.

Keyword Search over Statistical Detector Scores

www.ee.columbia.edu/cuvidsearch

Columbia374: objects, people, locations, scenes, events, etc.

Concepts defined by expert analysts over news video

Query “car crash snow” over TRECVID video using LSCOM concepts

How are keywords mapped to concepts?

What classifiers work? What don’t? How to improve the search terms?

Frustration of Uninformed Users of Keyword Search

Difficult to choose meaningful words/concepts without in-depth knowledge of entire vocabulary

Pains of Uninformed Users
Forced to take “one-shot” searches, iterating queries with a trial-and-error approach...

Challenge: user frustration in visual search

A lot of work on content analytics
Research still needed to address user frustration

Proposal: Sensible Search
Make the search experience more sensible

Help users stay “informed”:
in selecting effective keywords/concepts
in understanding the search results
in manipulating the search criteria rapidly and flexibly
Keep users engaged:
Instant feedback with minimal disruption, as opposed to “trial-and-error”

A prototype CuZero: Zero-Latency Informed Search & Navigation

Informed User: Instant Informed Query Formulation

Informed User for Visual Search: Instant visual concept suggestion
Query-time concept mining

Instant Concept Suggestion

Lexical mapping

Mapping keywords to concept definitions, synonyms, sense context, etc.

LSCOM
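A toy sketch of the lexical mapping step: query keywords are matched against a concept lexicon by name and synonyms. The lexicon below is hypothetical; a real mapping would also use concept definitions and word-sense context (e.g., WordNet) over the full LSCOM vocabulary.

```python
# Toy lexical mapping: match query keywords against concept names/synonyms.
# CONCEPT_LEXICON is a made-up stand-in, not the actual LSCOM definitions.
CONCEPT_LEXICON = {
    "car": {"automobile", "vehicle", "car"},
    "snow": {"snow", "snowfall"},
    "car_crash": {"crash", "collision", "accident", "wreck"},
}

def suggest_concepts(query: str):
    terms = set(query.lower().split())
    hits = {}
    for concept, synonyms in CONCEPT_LEXICON.items():
        overlap = synonyms & terms
        if overlap:
            hits[concept] = sorted(overlap)
    return hits

print(suggest_concepts("car crash snow"))
# {'car': ['car'], 'snow': ['snow'], 'car_crash': ['crash']}
```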

(Figure: query-time concept mining. The text transcripts and images returned for a query are mined for dominant co-occurring concepts, e.g. road, car, person, suits.)

Query-Time Concept Mining

CuZero Real-Time Query Interface (demo)

Instant Concept Suggestion

Auto-complete from speech transcripts

A prototype CuZero: Zero-Latency Informed Search & Navigation

(Zavesky and Chang, MIR2008)

Informed User: Intuitive Exploration of Results

only outdoor / only people

CMU Informedia Concept Filter

linear browser restricts inspection flexibility

Informed User: Rapid Exploration of Results

Media Mill Rotor Browser

Revisit the user struggle…
Car detector

Car crash detector

Snow detector

Query: {car, snow, car_crash}

How did each concept influence the results?

CuZero: Real-Time Multi-Concept Navigation Map

Create a multi-concept gradient map
Direct user control: “nearness” = “more influence”
Instant display for each location, without a new query

Concept anchors on the map: “boat”, “sky”, “water”
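A minimal sketch of the “nearness = more influence” rule above: concept anchors sit at fixed 2D positions, the cursor position is turned into per-concept weights, and the cached detector scores are re-combined locally, so no new server query is issued. The anchor positions and detector scores below are made up for illustration.

```python
# Sketch: weight each concept by inverse distance to the cursor, then re-rank
# the cached images by the weighted sum of their per-concept detector scores.
import numpy as np

anchors = {"boat": np.array([0.0, 0.0]),
           "sky": np.array([1.0, 0.0]),
           "water": np.array([0.5, 1.0])}

# Per-image detector scores for each concept (one value per image).
detector_scores = {"boat": np.array([0.9, 0.1, 0.4]),
                   "sky": np.array([0.2, 0.8, 0.5]),
                   "water": np.array([0.7, 0.3, 0.9])}

def rank_at(cursor):
    """Nearness to a concept anchor means more influence on the ranking."""
    weights = {c: 1.0 / (np.linalg.norm(cursor - p) + 1e-6)
               for c, p in anchors.items()}
    total = sum(weights.values())
    combined = sum((w / total) * detector_scores[c] for c, w in weights.items())
    return np.argsort(-combined)          # image indices, best first

print(rank_at(np.array([0.1, 0.1])))      # near "boat": boat-heavy ranking
```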

Achieve Breadth-Depth Flexibility by Dual-Space Navigation (demo)

Breadth: quick scan of many query permutations
Depth: instant exploration of results with fixed weights; deep exploration of a single permutation

Note: when a concept is deleted, its result list needs to be erased as well; currently this is not done.

Execute query and download ranked concept list
Package results with scores
Transmit to client
Unpackage results at interface
Score images by concept weights; guarantee unique positions
Download images to interface in cached mode

Latency Analysis: Workflow Pipeline

Time to execute is disproportional across stages! (log(time) scale)

Pipelined processing for low latency

Concept formulation (“car”)

concept formulation (“snow”)

• Overlap (concept formulation) with (map rendering)
• Hide rendering latency during user interaction
• Coarse-to-fine concept map planning/rendering
• Speed optimization on-going…
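A small sketch of the pipelining idea, assuming two placeholder stage functions: while the map for the previous keyword is still rendering in a background thread, concept formulation for the next keyword proceeds on the main thread, hiding part of the latency.

```python
# Sketch of overlapping concept formulation with map rendering.
# Both stage functions are placeholders for the real pipeline stages.
import time
from concurrent.futures import ThreadPoolExecutor

def formulate_concepts(keyword):
    time.sleep(0.3)                      # stand-in for lexical + statistical mapping
    return f"concepts({keyword})"

def render_map(concepts):
    time.sleep(0.2)                      # stand-in for layout + image download
    return f"map({concepts})"

keywords = ["car", "snow", "car_crash"]
with ThreadPoolExecutor(max_workers=1) as pool:
    pending_render = None
    for kw in keywords:
        concepts = formulate_concepts(kw)        # runs while previous map renders
        if pending_render is not None:
            print(pending_render.result())       # previous render already overlapped
        pending_render = pool.submit(render_map, concepts)
    print(pending_render.result())
```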

Challenge: user frustration in visual search

Research still needed to address user frustration

Sensible search: (1) query, (2) visualize, + (3) analyze


Help Users Make Sense of Image Trend

• Much re-used content found
• How did it occur?
• What manipulations?
• What distribution path?
• Correlation with perspective change?

Query: “John Kennedy”


Manipulation correlated with Perspective

Raising the Flag on Iwo Jima, Joe Rosenthal, 1945

Anti-Vietnam War, Ronald and Karen Bowen, 1969


Reused Images Over Time


Question for Sensible Search: Insights from Plain Search Results?

Issue a text query
Get top 1000 results from web search engine
Find duplicate images, merge into clusters
Rank clusters (by size? original rank?)
Explore history/trend
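A minimal sketch of the “merge into clusters” step, assuming pairwise near-duplicate decisions over the top results are already available: a simple union-find groups linked images, and clusters are ranked by size. The pair list is illustrative only.

```python
# Sketch: merge near-duplicate pairs into clusters with union-find,
# then rank clusters by size (larger clusters = more iconic images).
from collections import defaultdict

def cluster_duplicates(n_images, duplicate_pairs):
    parent = list(range(n_images))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for a, b in duplicate_pairs:
        parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for i in range(n_images):
        clusters[find(i)].append(i)
    return sorted(clusters.values(), key=len, reverse=True)

pairs = [(0, 3), (3, 7), (2, 5)]            # hypothetical duplicate links
print(cluster_duplicates(8, pairs))         # [[0, 3, 7], [2, 5], [1], [4], [6]]
```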


Duplicate Clusters Reveal Image Provenance

Biggest Clusters Contain Iconic Images

Smallest Clusters Contain Marginal Images


Deeper Analysis of Search Results: Visual Migration Map (VMM)

Duplicate Cluster Visual Migration Map

(Kennedy and Chang, ACM Multimedia 2008)


Visual Migration Map (VMM)

“Most Original” at the root

“Most Divergent” at the leaves

Images Derived through Series of Manipulations

VMM uncovers the history of image manipulation and plausible dissemination paths among content owners and users.


Ground truth VMM is hard to get

• Hypothesis

• Approximation of history is feasible by visual analysis.

• Detect manipulation types between two images

• Derive a large-scale history among a large image set


Basic Image Manipulation Operators

• Each is observable by inspecting the pair

• Each implies direction (one image derived from the other)

• Other possible manipulations: color correction, multiple compression, sharpening, blurring

(Figure: example manipulations of an original image: scaled, cropped, gray, overlay, insertion)


Detecting Near-Duplicates

Duplicate detection is very useful and relatively reliable

Remaining challenges: scalability/speed; video duplicates; object (sub-image) duplicates (TRECVID08)

Graph matching [Zhang & Chang, 2004]
Matching SIFT points [Lowe, 1999]
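A minimal near-duplicate check in the spirit of SIFT point matching [Lowe, 1999], using OpenCV; this is not the graph-matching method of [Zhang & Chang, 2004], and the image paths and ratio threshold are placeholders.

```python
# Sketch: count good SIFT keypoint matches between two images.
# Many surviving matches suggest the images are near-duplicates.
import cv2

def sift_matches(path_a, path_b, ratio=0.75):
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher()
    good = []
    for m, n in matcher.knnMatch(desc_a, desc_b, k=2):
        if m.distance < ratio * n.distance:          # Lowe's ratio test
            good.append((kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt))
    return good

# matches = sift_matches("image_a.jpg", "image_b.jpg")   # paths are placeholders
```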


Scale Detection

• Draw bounding box around matching points in each image

• Compare heights/widths of each box

• Relative difference in box size can be used to normalize scales
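A sketch of that heuristic, reusing the (point_in_A, point_in_B) match pairs from the SIFT sketch above: bound the matched keypoints in each image and compare the box sizes to estimate the scale factor.

```python
# Sketch: estimate relative scale from bounding boxes of matched keypoints.
def scale_ratio(matches):
    """matches: list of ((xa, ya), (xb, yb)) matched point pairs."""
    xs_a, ys_a = zip(*(pa for pa, _ in matches))
    xs_b, ys_b = zip(*(pb for _, pb in matches))
    width_a, height_a = max(xs_a) - min(xs_a), max(ys_a) - min(ys_a)
    width_b, height_b = max(xs_b) - min(xs_b), max(ys_b) - min(ys_b)
    # Relative box size gives the factor needed to normalize image B to A.
    return ((width_a / width_b) + (height_a / height_b)) / 2.0
```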


Color Removal

• Simple case: image stored in single channel file

• Other cases: image is grayscale, but stored in 3-channel file

• Expect little difference between channel values within each pixel
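A sketch of that check: load the file as three channels and measure the per-pixel spread across channels; a near-zero spread suggests the image is effectively grayscale. The threshold is an illustrative guess, not a tuned value.

```python
# Sketch: detect color removal by measuring per-pixel channel spread.
import numpy as np
import cv2

def looks_grayscale(path, max_mean_spread=2.0):
    img = cv2.imread(path, cv2.IMREAD_COLOR).astype(np.float32)  # BGR, 3 channels
    spread = img.max(axis=2) - img.min(axis=2)                   # per-pixel channel spread
    return float(spread.mean()) < max_mean_spread
```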


More Challenging: Overlay Detection?

• Given two images, we can observe that a region is different between the two

• But how do we know which is the original?


Cropping or Insertion?

• Can find differences in image area

• But is the smaller area due to a crop, or is the larger area due to an insertion?

(Figure: cropped version, original, and version with insertion)


Use Context from Many Duplicates

Normalize Scales and Positions

Get average value for each pixel

“Composite” image
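A minimal sketch of the composite step, assuming the duplicates have already been scale- and position-normalized into a common frame from their matched keypoints.

```python
# Sketch: average aligned duplicates pixel-wise to get the "typical" content.
import numpy as np

def composite(aligned_images):
    """aligned_images: list of HxWx3 float arrays already in a common frame."""
    stack = np.stack(aligned_images, axis=0)
    return stack.mean(axis=0)            # per-pixel average = composite image
```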


Cropping Detection w/ Context

• In cropping, we expect the content outside the crop area to be consistent with the composite image

(Figure: Image A/B, Composite A/B, Residue A/B)
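A sketch of the residue computation shared by this and the next two slides: compare an aligned image against the composite and keep the pixels that depart strongly from the typical content. The threshold is illustrative.

```python
# Sketch: residue mask marking where an aligned image departs from the composite.
import numpy as np

def residue(aligned_image, composite_image, threshold=30.0):
    diff = np.abs(aligned_image.astype(np.float32)
                  - composite_image.astype(np.float32)).mean(axis=2)
    return diff > threshold              # boolean mask of atypical pixels
```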


Overlay Detection w/ Context

• Comparing images against composite image reveals portions that differ from typical content

• Image with divergent content may have overlay

(Figure: Image A/B, Composite A/B, Residue A/B)


Insertion Detection w/ Context

• In insertion, we expect the area outside the crop region to be different from the typical content

(Figure: Image A/B, Composite A/B, Residue A/B)


Evaluation: Manipulation Detection

• Context-Free detectors have near-perfect performance

• Context-Dependent detectors still have errors

• Consistency checking can further improve the accuracy

• Are these error-prone results sufficient to build manipulation histories?

(Chart: accuracy of context-free vs. context-dependent detectors)


Inferring Direction from Consistency

Not Plausible


Manipulation Direction from Consistency

Plausible


Derive Manipulation among Multiple Images


Emerging Migration Map

• Individual parent-child relationships give rise to a manipulation history

• Relationships are only plausible (we don’t know for sure)

• Absences of relationships are more concrete (we can be more certain)

• Redundancy: plausible derivations from parents and ancestors of parents


Experiments

• Select 22 iconic images

• Mostly political figures, culled from Google Zeitgeist and TRECVID queries

• Generate manipulation histories:
• through manual annotation
• and through fully-automatic mechanisms


Automatic Visual Migration Map
“Originals” at source nodes
“Manipulated” at sink nodes


Evaluation: Automatic Histories

• High agreement with manually-constructed histories

• Detect edits with Precision of 91% and Recall of 71%

(Figure: manually-constructed vs. automatically-constructed histories, with deleted and inserted edges marked)


Application: Summarizing Changes

• Analyze manipulation history graph structure to extract most-original and most highly-manipulated images
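A small sketch of that graph analysis, assuming plausible parent-to-child manipulation edges have been inferred: source nodes (no parents) are candidate “most original” images and sink nodes (no children) are the most heavily manipulated ones. The edge list is hypothetical.

```python
# Sketch: find most-original (sources) and most-manipulated (sinks) images
# in a manipulation-history graph given as parent -> child edges.
from collections import defaultdict

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]   # hypothetical edges

children, parents = defaultdict(set), defaultdict(set)
for p, c in edges:
    children[p].add(c)
    parents[c].add(p)

nodes = set(children) | set(parents)
sources = [n for n in nodes if not parents[n]]     # "most original"
sinks = [n for n in nodes if not children[n]]      # "most manipulated"
print(sources, sinks)                              # ['A'] ['D']
```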


Application: Finding Perspective

• Survey image type and corresponding perspective across many examples

• Find correlation between high manipulation and negative/critical opinion


Joke Website: “Every time I get stoned, I go and do something stupid!” “Osama Bashed Laden”

http://www.almostaproverb.com/captions2.html

Democratic National Committee Site: “Capture Osama Bin Laden!”

http://www.democrats.org/page/petition/osama

Myspace Profile from Malaysia: “Osama Bin Laden - My Idol of All Time!”

http://www.myspace.com/mamu_potnoi

Daily Excelsior Newspaper: “Further Details of Bin Laden Plot Unearthed: ABC Report.”

http://www.dailyexcelsior.com/00jan31/inter.htm

Application: Finding Perspective


Geographic/Cultural Dispersion


Reverse Profiling


Conclusions
• Advocate Focus on Sensible Visual Search

• Address user frustration in interactive keyword search

• In addition to work on content analytics

• Develop utilities for Informed Users

• Demo: CuZero prototype

• Instant query suggestion

• Rapid multi-concept result navigation


Conclusions
• Explore Deeper Insight: Visual Migration Map

• Explore image reuse patterns to reveal image provenance

• Approximate image manipulation history from visual content alone

• Find “interesting” images at source and sink nodes within the image history

• Strong correlation with viewpoint change

• Useful role in socio-cultural information dissemination (Web 2.0)


References

• CuZero: Eric Zavesky and Shih-Fu Chang, “CuZero: Low-Latency Query Formulation and Result Exploration for Concept-Based Visual Search,” ACM Multimedia Information Retrieval Conference, Oct. 2008, Vancouver, Canada.

• Internet Image Manipulation History: Lyndon Kennedy and Shih-Fu Chang, “Internet Image Archaeology: Automatically Tracing the Manipulation History of Photographs on the Web,” ACM Multimedia Conference, Oct. 2008, Vancouver, Canada.