Motivation

Where is my W-2 Form?

Video-based Tracking

Camera view of the desk

Camera

Overhead video camera

Example

40 minutes, 1024x768 @ 15 fps

System Overview

[Diagram: Input Video → Video Analysis (Vision Engine) → Internal Representation (state of the Desk at times T and T+1); the User sends a Query ("Where is my W-2 form?") through the Query Interface and receives an Answer.]

Vision Problem

[Diagram: the input frames (… …) contain an Event; the state of the Desk before and after the event is represented as a Scene Graph (DAG) over the documents tut-article.pdf, sanders01.pdf, objectspaces.pdf, kidd94.pdf, and lowe04sift.pdf.]

Assumptions

• Simplifying
  – Corresponding electronic copy exists
  – 3 event types: move/entry/exit
  – One document at a time
  – Only topmost document can move
  – No duplicate copies of same document
• Other
  – Desk need not be initially empty

Algorithm Overview

Input Frames … …

• Event Detection: before / after frames
• Event Interpretation: “A document moved from (x1,y1) to (x2,y2)”
• Document Recognition: File1.pdf, File2.pdf, File3.pdf
• Scene Graph Update

Event Detection

[Figure: Frame Difference plotted over time for the input frames (… …); a Threshold on the frame difference selects the Event Frames, and the before and after frames of each event are taken from either side of them.]
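The frame-difference thresholding shown on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: frames are assumed to be small grayscale images stored as nested lists, and `frame_difference`, `detect_events`, and the threshold value are hypothetical names and parameters.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale image as rows of pixel intensities (assumed format)


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def detect_events(frames: List[Frame], threshold: float) -> List[Tuple[int, int]]:
    """Return (before, after) frame indices for each detected event.

    An event is a maximal run of consecutive frame pairs whose difference
    exceeds the threshold. `before` is the last frame preceding the run and
    `after` is the second frame of the first quiet pair following it.
    An event still in progress when the video ends is not reported.
    """
    events = []
    start = None  # index of the "before" frame of the event in progress
    for i in range(1, len(frames)):
        moving = frame_difference(frames[i - 1], frames[i]) > threshold
        if moving and start is None:
            start = i - 1
        elif not moving and start is not None:
            events.append((start, i))
            start = None
    return events
```

Each returned pair names the two frames that the later stages would compare; the mean absolute difference is only one possible change measure.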

Event Interpretation

[Figures: before and after frames]

• Event types: Move, Entry, Exit
• Motion: (x,y,θ)
• Steps
  1. Move vs. Entry/Exit
  2. Entry vs. Exit
• Use SIFT [Lowe 99]
  – Scale Invariant Feature Transform
  – Distinctive feature descriptor
  – Reliable object recognition


Move vs. Entry/Exit

[Figures: before and after frames]

Motion: (x,y,θ)
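One plausible way to recover a motion (x,y,θ) from feature matches is a least-squares rigid fit. The sketch below assumes that SIFT keypoints have already been matched between the before and after frames and are supplied as coordinate lists in matching order. The names `estimate_rigid_motion` and `is_move`, the inlier tolerance, and the minimum match count are hypothetical, and the slides do not state that this is the exact procedure used.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_rigid_motion(before: List[Point], after: List[Point]) -> Tuple[float, float, float]:
    """Least-squares 2D rigid transform mapping `before` points onto `after`.

    Returns (tx, ty, theta) such that after ~= R(theta) * before + (tx, ty).
    """
    n = len(before)
    cbx = sum(p[0] for p in before) / n
    cby = sum(p[1] for p in before) / n
    cax = sum(p[0] for p in after) / n
    cay = sum(p[1] for p in after) / n
    dot = cross = 0.0
    for (bx, by), (ax, ay) in zip(before, after):
        bx, by, ax, ay = bx - cbx, by - cby, ax - cax, ay - cay
        dot += bx * ax + by * ay
        cross += bx * ay - by * ax
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cax - (c * cbx - s * cby)
    ty = cay - (s * cbx + c * cby)
    return tx, ty, theta


def is_move(before: List[Point], after: List[Point],
            min_matches: int = 3, tol: float = 2.0) -> bool:
    """Treat the event as a move if enough matches agree with one rigid motion.

    Assumption for illustration: an entry or exit leaves the document visible
    in only one of the two frames, so few consistent matches are found.
    """
    if len(before) < min_matches:
        return False
    tx, ty, theta = estimate_rigid_motion(before, after)
    c, s = math.cos(theta), math.sin(theta)
    inliers = 0
    for (bx, by), (ax, ay) in zip(before, after):
        px = c * bx - s * by + tx
        py = s * bx + c * by + ty
        if math.hypot(px - ax, py - ay) <= tol:
            inliers += 1
    return inliers >= min_matches
```

A robust estimator such as RANSAC would normally wrap the fit to tolerate wrong matches; it is omitted here to keep the sketch short.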

Entry vs. Exit

[Figures: before and after frames]

Example 1 (entry)

File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf

Example 2 (entry)

File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf

?

[Plot: amount of change over time]

Document Recognition

File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf

• Match against PDF image database
• Performance
  – Can differentiate between ~100 (or more) documents
  – ~200x300 pixels per document for reliable match
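Matching against a database of document images can be sketched as descriptor voting. The example below is a toy illustration, not the system's implementation: descriptors are short tuples standing in for 128-dimensional SIFT vectors, the database is a plain dictionary, and `recognize`, `ratio`, and `min_votes` are hypothetical names. The ratio test follows the idea in Lowe's 2004 paper of comparing the nearest neighbor with the nearest descriptor from a different object.

```python
import math
from typing import Dict, List, Optional, Sequence

Descriptor = Sequence[float]


def distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(query: List[Descriptor],
              database: Dict[str, List[Descriptor]],
              ratio: float = 0.8,
              min_votes: int = 2) -> Optional[str]:
    """Return the database document that best matches the query, or None."""
    votes: Dict[str, int] = {}
    for q in query:
        # Closest descriptor distance from q to each document in the database.
        per_doc = sorted(
            (min(distance(q, d) for d in descs), name)
            for name, descs in database.items() if descs
        )
        if not per_doc:
            continue
        best_dist, best_name = per_doc[0]
        # Ratio test: skip the match if another document is almost as close.
        if len(per_doc) > 1 and best_dist >= ratio * per_doc[1][0]:
            continue
        votes[best_name] = votes.get(best_name, 0) + 1
    if not votes:
        return None
    winner = max(votes, key=votes.get)
    return winner if votes[winner] >= min_votes else None
```

Requiring a minimum number of votes gives a simple way to report "no match" when the visible document has too few distinctive features.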

Scene Graph Update

[Figures: before and after frames, the recovered Motion: (x,y,θ), and the Desk scene graph before and after the update]
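A scene graph update under the stated assumptions (one document at a time, only the topmost document can move) might look like the sketch below. The representation is an assumption made for illustration: every document has the same axis-aligned footprint, rotation is stored but ignored by the overlap test, and an edge from A to B means A was placed on top of B. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Pose:
    x: float      # center x
    y: float      # center y
    theta: float  # rotation in radians (stored, ignored by the overlap test)


class SceneGraph:
    """Documents on a desk with "is on top of" edges, forming a DAG."""

    def __init__(self, width: float, height: float) -> None:
        self.width, self.height = width, height  # shared document footprint
        self.poses: Dict[str, Pose] = {}
        self.below: Dict[str, Set[str]] = {}  # doc -> docs it was placed on

    def _overlaps(self, a: Pose, b: Pose) -> bool:
        return abs(a.x - b.x) < self.width and abs(a.y - b.y) < self.height

    def _place_on_top(self, name: str, pose: Pose) -> None:
        # A newly placed document covers every document it overlaps.
        self.below[name] = {other for other, p in self.poses.items()
                            if other != name and self._overlaps(pose, p)}
        self.poses[name] = pose

    def on_top_of(self, name: str) -> List[str]:
        """Documents that cover `name`."""
        return sorted(d for d, under in self.below.items() if name in under)

    def _require_topmost(self, name: str) -> None:
        if self.on_top_of(name):
            raise ValueError(f"{name} is covered and cannot move")

    def entry(self, name: str, pose: Pose) -> None:
        self._place_on_top(name, pose)

    def exit(self, name: str) -> None:
        self._require_topmost(name)
        del self.poses[name]
        del self.below[name]

    def move(self, name: str, dx: float, dy: float, dtheta: float) -> None:
        self._require_topmost(name)
        old = self.poses.pop(name)
        del self.below[name]
        self._place_on_top(name, Pose(old.x + dx, old.y + dy, old.theta + dtheta))

    def locate(self, name: str) -> Tuple[Pose, List[str]]:
        """Answer a "where is it?" query: the pose and what lies on top."""
        return self.poses[name], self.on_top_of(name)
```

Because edges always point from a more recently placed document to older ones, the graph stays acyclic, and a query such as "where is my W-2 form?" reduces to a `locate` call on the recognized file.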

Photo Sorting Example

Current Directions

• Handle more realistic desktops
• Speed up processing time
• Other useful functionalities
  – Written annotation
  – Version management
  – Bookmark
  – Multi-user queries

For More Information

• Our publications
  – Jiwon Kim, Steven M. Seitz, Maneesh Agrawala. The Office of the Past. IEEE Workshop on Real-time Vision for HCI, 2004.
  – Jiwon Kim, Steven M. Seitz, Maneesh Agrawala. Video-based Document Tracking: Unifying Your Physical and Electronic Desktops. To appear in Proceedings of UIST, 2004.
• Other related work
  – David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.
  – David G. Lowe. Object recognition from local scale-invariant features. International Conference on Computer Vision, 1999.
  – Pierre Wellner. Interacting with paper on the DigitalDesk. Communications of the ACM, 36(7):86-97, 1993.
  – D. Rus and P. deSantis. The self-organizing desk. In Proceedings of International Joint Conference on Artificial Intelligence, 1997.