Multi-Scale Video Cropping
Hazem El-Alfy, David Jacobs and Larry Davis
Department of Computer Science, University of Maryland, College Park
Sep 25th 2007, ACM MM ’07
2
Modern Surveillance Systems
Networks of surveillance cameras.
Control room:
  Fewer monitors than cameras.
  Far fewer operators than monitors.
  Cameras cycle through monitors.
3
Modern Surveillance Systems
Typical control rooms: airports, subways, metropolitan areas, seaports, crowd control.
4
“Future” Control Rooms
“Continuous” display wall versus a fixed set of discrete monitors.
Algorithms to control:
  where to display videos,
  how much area to assign to them,
  how to display them.
Barco Control Room, Vienna, Austria
5
Video Cropping
Munich Airport – Courtesy Siemens, NJ
6
Why Cropping?
Resize video to save bandwidth or to fit the display area.
Crop before resizing to focus operator attention on important areas.
7
Problem Definition
Determine trajectories of cropping windows through the video:
  variable-size window,
  maximize captured saliency,
  smooth trajectory,
  occasional jumps (cuts) between trajectories.
[Figure: cropping-window trajectory through the video volume; axes x, y, t]
8
Problem Definition
Each frame t is covered by variable-size, overlapping windows $W_{i,t}$.
Saliency measure: $S(W_{i,t})$.
Find $\arg\max_Q \sum_t S(W_{i,t})$ over all window sequences Q,
subject to constraints for smooth window motion and size change.
9
Our Approach: Overview
Extract motion energy.
Model video as a graph.
Find trajectories as shortest paths in the graph.
Merge trajectories.
Repeat for other segments of long videos.
[Pipeline diagram blocks: Video Frames, Extract Motion Energy, Motion Frames, Building Graph, Shortest Path + Smoothing, Trajectories, Wiping Frames, Merging Trajectories, Cropped Video]
10
Extracting Motion Energy
Motion energy serves as the saliency measure.
Frame differences are smoothed using morphological operations.
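
The step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not say which morphological operations are used, so the sketch assumes a binary threshold on the frame difference followed by a 3x3 morphological opening (erosion, then dilation). The function names and the `threshold` parameter are hypothetical.

```python
def frame_difference(prev, curr, threshold=15):
    """Binary motion mask: 1 where the grey-level change exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]


def _morph(mask, op):
    """Apply op (min = erosion, max = dilation) over each 3x3 neighbourhood.

    Neighbourhoods are clipped at the frame border.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(vals)
    return out


def erode(mask):
    return _morph(mask, min)


def dilate(mask):
    return _morph(mask, max)


def motion_energy(prev, curr, threshold=15):
    """Frame difference smoothed by an opening (assumed choice of operation)."""
    return dilate(erode(frame_difference(prev, curr, threshold)))
```

With this choice, an isolated changed pixel (typical of sensor noise) is removed, while a compact moving blob of at least 3x3 pixels survives.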
11
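Modeling Graph
Nodes: cropping windows in each frame.
Add dummy source and target nodes.
Edges: allowable window changes (location and size) between consecutive frames.
[Figure: layered graph. A dummy source node connects with edges of weight w=0 to the windows of the first frame; the windows of the i-th frame form the intermediate layers; the windows of the last frame connect with edges of weight w=0 to a dummy target node.]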
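
The graph structure described on this slide can be sketched as below. The sketch assumes windows are `(x, y, w, h)` tuples sampled on a regular grid and that "allowable" means a bounded shift and a bounded size change per frame; the limits `max_shift` and `max_resize`, the grid `step`, and all function names are hypothetical, since the slides do not give the actual constraints. Edge weights are left out here.

```python
def candidate_windows(frame_w, frame_h, sizes, step):
    """All (x, y, w, h) windows of the given sizes on a regular grid."""
    wins = []
    for (w, h) in sizes:
        for y in range(0, frame_h - h + 1, step):
            for x in range(0, frame_w - w + 1, step):
                wins.append((x, y, w, h))
    return wins


def allowed(a, b, max_shift, max_resize):
    """True if window b is a permissible successor of window a."""
    return (abs(a[0] - b[0]) <= max_shift and abs(a[1] - b[1]) <= max_shift
            and abs(a[2] - b[2]) <= max_resize
            and abs(a[3] - b[3]) <= max_resize)


def build_graph(n_frames, windows, max_shift, max_resize):
    """Adjacency lists over nodes 'src', (t, k) and 'dst'.

    (t, k) is the k-th candidate window in frame t. The dummy source links
    to every window of the first frame, and every window of the last frame
    links to the dummy target.
    """
    graph = {"src": [(0, k) for k in range(len(windows))], "dst": []}
    for t in range(n_frames):
        for k, a in enumerate(windows):
            if t == n_frames - 1:
                graph[(t, k)] = ["dst"]
            else:
                graph[(t, k)] = [(t + 1, j) for j, b in enumerate(windows)
                                 if allowed(a, b, max_shift, max_resize)]
    return graph
```

Any source-to-target path in this graph picks exactly one window per frame, so a path corresponds to one cropping-window trajectory.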
12
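Modeling Graph
Multi-scale energy function for window W:
  $E(W) = S(W)$: always favors large windows.
  $E(W) = S(W)/A(W)$: favors small (dense) windows.
  $E(W) = S(W_{in})/A(W_{in}) - S_{belt}/K$
Edge weight: $w_{ij} = 1 - E_{Norm}(W_j)$
[Figure: window with inner region $W_{in}$ surrounded by a belt $S_{belt}$ made of four parts labeled 1 to 4]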
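
A small worked sketch of the third energy function and the edge weight follows. Several details are assumptions, not taken from the slides: the belt is taken to be the outer band of width `belt` inside the window, A is taken to be the area in pixels, and $E_{Norm}$ is taken to be a min-max normalization to [0, 1]. The function names are hypothetical.

```python
def region_sum(sal, x, y, w, h):
    """Sum of saliency values inside the rectangle (x, y, w, h)."""
    return sum(sum(row[x:x + w]) for row in sal[y:y + h])


def multiscale_energy(sal, win, belt, K):
    """E(W) = S(W_in)/A(W_in) - S_belt/K, with the belt assumed to be the
    outer band of width `belt` inside window W."""
    x, y, w, h = win
    total = region_sum(sal, x, y, w, h)
    inner_w, inner_h = w - 2 * belt, h - 2 * belt
    inner = region_sum(sal, x + belt, y + belt, inner_w, inner_h)
    s_belt = total - inner
    return inner / (inner_w * inner_h) - s_belt / K


def edge_weights(energies):
    """w_ij = 1 - E_norm(W_j), using min-max normalization (an assumption)."""
    lo, hi = min(energies), max(energies)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all energies match
    return [1.0 - (e - lo) / span for e in energies]
```

Under these assumptions, a window that fits tightly around a salient blob scores higher than both a much larger window (lower density) and a window that cuts through the blob (saliency falls in the belt and is penalized).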
13
Modeling Graph
Energy function computed for all windows in all frames.
Efficiently computed using integral images [Viola & Jones ’01]:
  $ii(x, y) = \sum_{x' < x,\; y' < y} i(x', y')$
  $E(W) = ii(x_3) - ii(x_2) - ii(x_4) + ii(x_1)$
[Figure: video frame containing cropping window W with corners $x_1$, $x_2$, $x_3$, $x_4$]
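
The integral-image trick can be illustrated with a short sketch. It follows the strict-inequality convention of the formula above, so the table has one extra row and column of zeros; the function names and the `(x, y, w, h)` window layout are assumptions made for the example.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all pixels with x' < x and y' < y."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, x, y, w, h):
    """Sum inside window (x, y, w, h) from four corner lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(window_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

After one pass over the frame to build the table, the sum over any window costs four lookups regardless of window size, which is what makes evaluating all windows in all frames affordable.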
14
Shortest Path
Dial’s implementation of Dijkstra’s algorithm: linear in the number of graph nodes.
Smoothing: low-pass filter + cubic Hermite interpolation.
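
A minimal sketch of Dial's bucket-based variant of Dijkstra's algorithm is given below. Dial's method requires non-negative integer edge weights bounded by some constant, so the sketch assumes the weights have already been quantized to integers in `0..max_weight`; how the authors quantize them is not stated in the slides. The graph layout (`node -> list of (neighbour, weight)`) and the function name are hypothetical.

```python
def dial_shortest_path(graph, source, target, max_weight):
    """Shortest path with integer edge weights in 0..max_weight.

    Uses max_weight + 1 circular buckets indexed by distance modulo the
    bucket count, instead of a priority queue.
    """
    n_buckets = max_weight + 1
    buckets = [set() for _ in range(n_buckets)]
    dist = {source: 0}
    parent = {}
    buckets[0].add(source)
    pending = 1  # number of nodes currently stored in some bucket
    d = 0
    while pending:
        bucket = buckets[d % n_buckets]
        while bucket:  # zero-weight edges may refill the current bucket
            u = bucket.pop()
            pending -= 1
            for v, w in graph.get(u, ()):
                nd = d + w
                old = dist.get(v)
                if old is None or nd < old:
                    if old is not None:
                        # v is still waiting in its old bucket; move it
                        buckets[old % n_buckets].discard(v)
                        pending -= 1
                    dist[v] = nd
                    parent[v] = u
                    buckets[nd % n_buckets].add(v)
                    pending += 1
        d += 1
    if target not in dist:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return dist[target], path[::-1]
```

Because every node waiting in a bucket has a tentative distance between `d` and `d + max_weight`, the bucket at index `d % n_buckets` only ever holds nodes at distance exactly `d`, so nodes are settled in the same order as in ordinary Dijkstra.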
15
Merging Trajectories
More cropping windows are needed to capture simultaneous activity.
Wipe captured activity from motion frames and repeat the earlier process on the remaining motion.
Merge trajectories: find the shortest path through a graph of trajectories.
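
The wiping step can be sketched as follows, assuming a trajectory is a list with one `(x, y, w, h)` window per frame and that "wiping" means setting the motion energy inside that window to zero. Both are assumptions made for illustration, and the function name is hypothetical.

```python
def wipe_captured(motion_frames, trajectory):
    """Return copies of the motion frames with the energy inside each
    frame's cropping window set to zero; the inputs are left unchanged."""
    wiped = []
    for frame, (x, y, w, h) in zip(motion_frames, trajectory):
        new = [row[:] for row in frame]
        for yy in range(y, y + h):
            for xx in range(x, x + w):
                new[yy][xx] = 0
        wiped.append(new)
    return wiped
```

Running the trajectory search again on the wiped frames then yields a second trajectory that follows activity the first one missed.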
16
Processing Long Videos
Problems:
  Graph gets too big if the video is long.
  Latencies must be short in surveillance systems.
Solution:
  Break long videos into segments with overlap.
  Process each segment, then stitch the results together.
[Figure: video timeline with two marked break points ("break here")]
17
Processing Long Videos
Issues:
  How short can segments be?
  Are there preferable locations to break the video?
  How much overlap is needed for smooth transitions?
We ran many experiments with a fixed-size crop window:
  Shortest paths converge quickly. Segments can be as short as 40 frames.
  Avoid periods of low activity when breaking the video.
  Overlap intervals of 20 frames are sufficient.
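
The segmentation scheme can be sketched as below, using the 40-frame segment length and 20-frame overlap reported above. The sketch places breaks at fixed intervals only; it does not implement the advice to avoid low-activity periods, and the function name and half-open `(start, end)` convention are assumptions.

```python
def split_segments(n_frames, seg_len, overlap):
    """Half-open (start, end) frame ranges of length seg_len, each sharing
    `overlap` frames with its predecessor; the last range may be shorter."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    step = seg_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + seg_len, n_frames)
        segments.append((start, end))
        if end == n_frames:
            break
        start += step
    return segments


print(split_segments(100, 40, 20))
# [(0, 40), (20, 60), (40, 80), (60, 100)]
```

Each segment can be processed as soon as its last frame arrives, which bounds both the graph size and the display latency.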
18
Results
Munich Airport: variable-size single window.
19
Results
Munich Airport: video-in-video display.
20
Results
Traffic at a stop sign on campus (2 windows).
21
Contributions
Variable-size smooth cropping window.
Simultaneous multiple cropping windows.
Relatively short video segments processed instead of the entire video (online).
Empirically shown to give results identical to processing the largest video that can be processed as a whole.