Finding the Needle in the Image Stack: Performance Metrics for Big Data Image Analysis

Kieran Miller and Patricia Morreale, Kean University
On 15 April 2013 at 2:49 p.m. Eastern Standard Time in the United States city of Boston, Massachusetts, two pressure cooker bombs exploded, killing three people and injuring more than 250. The blasts occurred near the finish line of the Boston Marathon, the world's oldest annual marathon. Almost immediately following the explosions, the US Federal Bureau of Investigation (FBI) enlisted the public (spectators, media, and public and private closed-circuit surveillance systems) to help in its investigation. Individuals who had taken pictures or video during the race were encouraged to submit them to the FBI for review. On 18 April 2013, three days later, the FBI released photographs and video showing two suspects, identified as suspect 1 and suspect 2. The suspects were also referred to as "black hat" and "white hat" because of the color of the baseball caps they were wearing in the footage. Although an official account detailing the FBI's operational analysis has yet to be released, computer software probably played an important role in identifying the individuals among the many thousands of public image and video submissions.
How did the FBI comb through likely terabytes of data and close in on a pair of suspects? Did they use software to identify and filter out individuals near the finish line with backpacks of sufficient size to hold the explosives? Once the suspects were identified, how did the FBI use those images to analyze the submissions from the public and locate additional images or video frames that contained the suspects? This article explores these questions to determine whether it is possible to perform similar analysis on a smaller scale. In the interest of transferring our results to other applications, we also look at how visual data, which has become so ubiquitous in all aspects of modern society, can be used and analyzed on a large scale.
State of Image Analysis

Fueled by the growth in camera-enabled cell phones and the commoditization of computer storage such as hard drives and other media, the volume of images and video being consumed and stored is enormous. Cisco, a major networking equipment manufacturer, projects that consumer Internet video traffic will be 69 percent of all consumer Internet traffic in 2017, up from 57 percent in 2012 [1]. This trend is expected to continue in the foreseeable future as the adoption and use of the Internet and mobile devices increases.
Consumers are a significant driver of the growth in image and video use and storage, but they are not alone. Private companies and governments around the world rely on visual media as an investigative, monitoring, and forensic tool to search for patterns or characteristics and to document and identify individuals. This type of image analysis is done both in real time, as the image or video is being captured, and in a postprocessing investigative setting.
Regardless of the method of capture and when the media is processed, images and videos require large amounts of storage space and computing power for analysis. Consider that, with a moderate compression algorithm, an average minute of video recorded on an iPhone at 640 × 480 resolution is approximately 40 Mbytes (a bit rate of roughly 5.3 Mbits per second). In a metropolitan area with a high population density, and presumably a large number of people recording images and video, coupled with public safety and private closed-circuit television (CCTV) monitoring systems, the computing requirements necessary to store and analyze all the images and video captured in one area over a short period of time quickly balloon.
Analyzing image and video data from multiple devices and sources is a daunting task, and it highlights the need for computer software and algorithms that turn bits into meaningful and actionable information for law enforcement. Without them, combing through terabytes of data would be an exercise in futility. The explosives detonated at the Boston Marathon in 2013 provide an excellent case study, albeit a chilling one, of the role that images, video, and analytical software can play in an investigation. By identifying the performance measurements involved in working with and analyzing images on a consumer-grade machine, it is possible to identify the extent to which an individual, using open source software, can identify or track people in images or video frames.
Analysis Environment

To begin a performance analysis of video and image data, performance benchmarks must be established. Factors that affect system and software performance include the operating system and its version, CPU clock rate, number of cores, type of hard drive, and, if applicable, hard drive rotations per minute (RPM). Other factors include the programming language, framework, and language runtime used.
For consistency, a single machine was used for all of the analysis we describe here. Specifically, we used the 64-bit Windows 7 Service Pack 1 operating system with Microsoft's .NET Framework 4.0 runtime and the C# programming language.
Benchmarking Background

Before we analyze performance metrics, we must review two important points.
First, in the programming language and environment used to evaluate performance, it is necessary to identify what represents a unit of time and how elapsed time is calculated. In Microsoft's .NET Framework, the environment used in this research, the smallest unit of measurement is a tick, exposed as a property of the System.DateTime class. One tick represents "one hundred nanoseconds or one ten-millionth of a second" [2]. We measure elapsed ticks with the Stopwatch class defined in the System.Diagnostics namespace [3].
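To make the timing mechanics concrete, the following is a minimal sketch (not the study's original harness) of timing a block of code in C#. One subtlety is worth flagging: Stopwatch.ElapsedTicks counts raw high-resolution counter ticks in units of 1/Stopwatch.Frequency seconds, whereas TimeSpan.Ticks, obtained through Stopwatch.Elapsed, is always in the 100-ns DateTime ticks quoted above.

```csharp
using System;
using System.Diagnostics;

class TickDemo
{
    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();

        // ... code under measurement goes here ...

        sw.Stop();

        // TimeSpan.Ticks is always in 100-ns units, matching the
        // System.DateTime tick definition; Stopwatch.ElapsedTicks is not.
        long hundredNsTicks = sw.Elapsed.Ticks;
        Console.WriteLine("Elapsed: {0} ticks ({1:F4} ms)",
            hundredNsTicks, hundredNsTicks / 10000.0);
    }
}
```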
Second, when conducting performance tests involving images, it is important that the dataset of images be as random as possible. The Internet is ideal for this, in particular the wealth of images available on the Wikimedia.org family of websites. Conveniently, there is a specific URL that will display a random image from one of those sites: http://commons.wikimedia.org/wiki/Special:Random/File.
For the sample images used in this research, we repeatedly downloaded images through this URL for later analysis, until we had retrieved approximately 2,900 images, consuming 2.9 Gbytes of space.
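The article does not reproduce the download code; the sketch below shows one way such a dataset could be collected with the .NET 4.0-era WebClient. Note that the Special:Random/File URL redirects to a file description page rather than to the image itself, so the sketch naively scrapes that page for the full-resolution link; the "fullImageLink" markup is an assumption about Wikimedia's HTML and may break if the site changes.

```csharp
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class DatasetDownloader
{
    const string RandomFileUrl =
        "http://commons.wikimedia.org/wiki/Special:Random/File";

    static void Main()
    {
        Directory.CreateDirectory("images");
        using (var client = new WebClient())
        {
            for (int i = 0; i < 2900; i++)
            {
                // The random-file URL redirects to a file *description* page.
                string html = client.DownloadString(RandomFileUrl);

                // Naive scrape for the full-resolution image link (assumed markup).
                Match m = Regex.Match(html,
                    "class=\"fullImageLink\"[^>]*>\\s*<a href=\"([^\"]+)\"");
                if (!m.Success) continue;

                string imageUrl = m.Groups[1].Value;
                if (imageUrl.StartsWith("//"))       // protocol-relative link
                    imageUrl = "http:" + imageUrl;

                string fileName = Path.Combine("images",
                    i + Path.GetExtension(imageUrl));
                client.DownloadFile(imageUrl, fileName);
            }
        }
    }
}
```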
Benchmark Image Tests

As part of this research, it was important to establish a set of benchmark tests that determine how much time a machine takes to perform simple tasks, as well as the time needed for image processing.
The analysis began by assessing the performance of a basic programming construct: the for-loop. This is particularly relevant to images because, in a simplified manner, an image can be thought of as a two-dimensional array of pixels and a video as a sequence of many images.
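To make that view concrete, the following small sketch walks an image as a two-dimensional array with nested for-loops. The use of System.Drawing's Bitmap is an assumption for illustration; the article does not prescribe an image class.

```csharp
using System.Drawing;

class PixelWalk
{
    static long SumBrightness(Bitmap image)
    {
        long sum = 0;
        // An image as a two-dimensional array: one loop per dimension.
        for (int y = 0; y < image.Height; y++)
        {
            for (int x = 0; x < image.Width; x++)
            {
                Color pixel = image.GetPixel(x, y);
                sum += (pixel.R + pixel.G + pixel.B) / 3; // rough brightness
            }
        }
        // GetPixel is convenient but slow; production code would lock
        // the bitmap's bits and read the pixel buffer directly.
        return sum;
    }
}
```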
A for-loop is concerned with iteration, repeating a section of code a specified number of times. To measure the performance of the for-loop, the methodology used is straightforward: with varying values for the upper bound, how long does it take to run the for-loop to completion? The natural values for the upper bound are increasingly large powers of 2, starting with 2¹ and going up to 2²⁴ (16.78 million). Table 1 summarizes the results. The highest value, 2²⁴, took an average of 47,325.69 ticks (4.7325 milliseconds) over 100 runs to execute the loop.
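The exact harness is not listed in the article; a minimal sketch consistent with the methodology described (100 runs per upper bound, powers of 2 from 2¹ to 2²⁴, reporting average, maximum, and minimum ticks) might look like the following.

```csharp
using System;
using System.Diagnostics;

class ForLoopBenchmark
{
    static void Main()
    {
        const int runs = 100;
        var sw = new Stopwatch();

        for (int power = 1; power <= 24; power++)
        {
            int upperBound = 1 << power;                 // 2^power iterations
            long total = 0, max = long.MinValue, min = long.MaxValue;

            for (int run = 0; run < runs; run++)
            {
                sw.Restart();
                for (int i = 0; i < upperBound; i++)
                {
                    // Empty body: we are timing the loop construct itself.
                }
                sw.Stop();

                long ticks = sw.Elapsed.Ticks;           // 100-ns ticks
                total += ticks;
                if (ticks > max) max = ticks;
                if (ticks < min) min = ticks;
            }

            Console.WriteLine("{0,10} iterations: avg {1,10:F2} max {2,8} min {3,8}",
                upperBound, total / (double)runs, max, min);
        }
    }
}
```

The outsized maximum in Table 1's first row (273 ticks for two iterations) is consistent with just-in-time compilation overhead on the first timed run.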
Having established benchmark measures of the performance of the standard for-loop, we next move on to benchmarks involving images. With a random sample of images retrieved from the Internet as a dataset, we were able to establish a benchmark of how long it takes to load an image into memory. For this test, we selected 1,000 images at random from the sample set, loaded each into memory, and repeated the process 10 times, yielding timing data for 10,000 image loads.

With an average image size of 1.49 Mbytes, it took an average of 167,122.99 ticks, or 16.71 ms, to load each image.
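A sketch of this loading benchmark appears below, under the assumption that System.Drawing's Bitmap is the loading mechanism (the article does not name the image class used) and a hypothetical dataset folder. The sketch touches a property so that lazy decoding does not skew the numbers; note that the operating system's file cache will make repeated loads faster than cold loads.

```csharp
using System;
using System.Diagnostics;
using System.Drawing;
using System.IO;

class ImageLoadBenchmark
{
    static void Main()
    {
        // Hypothetical dataset path; substitute the folder of downloaded images.
        string[] files = Directory.GetFiles(@"C:\data\images");
        var rng = new Random();
        var sw = new Stopwatch();
        long totalTicks = 0;
        const int selections = 1000, repeats = 10;

        for (int i = 0; i < selections; i++)
        {
            string path = files[rng.Next(files.Length)];
            for (int j = 0; j < repeats; j++)
            {
                sw.Restart();
                using (var bmp = new Bitmap(path))
                {
                    // Touch a property to make sure the image is actually decoded.
                    int width = bmp.Width;
                }
                sw.Stop();
                totalTicks += sw.Elapsed.Ticks;
            }
        }

        double avg = totalTicks / (double)(selections * repeats);
        Console.WriteLine("Average: {0:F2} ticks ({1:F2} ms) per load",
            avg, avg / 10000.0);
    }
}
```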
Finding Pedestrians in Images

With benchmarks identified and established for image processing in its most basic form, the objective is to illustrate the extent to which it is possible to identify or track a pedestrian in a known sample set of images.
The mathematics involved with detecting people in an image are complex and beyond the scope of this article, but detection is nevertheless explored here using the open source toolkit Emgu CV, a .NET wrapper for the OpenCV project. OpenCV was developed by Intel in 1999 and first released at the IEEE Conference on Computer Vision and Pattern Recognition in 2002.
To obtain images for analysis, we recorded video of an individual wearing two different colored hats at five different locations on the Kean University campus in Union, New Jersey, at various distances and angles. The data was recorded with a Samsung Galaxy S3 phone. Each section of video was categorized by its location and direction and programmatically split into image frames, which we then analyzed.
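The splitting code is not shown in the article; a minimal sketch using Emgu CV 2.x's Capture class (the API current when this work was done; later versions renamed it VideoCapture) to save roughly one frame per second might look like this. The video file name is a hypothetical placeholder.

```csharp
using System;
using Emgu.CV;
using Emgu.CV.Structure;

class FrameSplitter
{
    static void Main()
    {
        // Hypothetical file name; one video per location and direction.
        using (var capture = new Capture("location1_left_to_right.mp4"))
        {
            double fps = capture.GetCaptureProperty(
                Emgu.CV.CvEnum.CAP_PROP.CV_CAP_PROP_FPS);
            int step = Math.Max(1, (int)Math.Round(fps)); // ~1 frame per second
            int index = 0, saved = 0;

            Image<Bgr, byte> frame;
            while ((frame = capture.QueryFrame()) != null)
            {
                if (index % step == 0)
                {
                    frame.Save(string.Format("frame_{0:D4}.jpg", saved++));
                }
                frame.Dispose();
                index++;
            }
        }
    }
}
```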
OpenCV returns rectangular regions that represent sections of an image that its algorithms determine may contain pedestrians.
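The article does not identify which of OpenCV's detectors was used; the standard choice for pedestrians is the HOG (histogram of oriented gradients) people detector, and under that assumption a minimal Emgu CV 2.x sketch for obtaining and drawing those rectangles looks like the following. DetectMultiScale also has overloads exposing tuning parameters, such as the hit threshold and scale step, that trade detection rate against speed.

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;

class PedestrianDetector
{
    static void Main()
    {
        using (var hog = new HOGDescriptor())
        using (var image = new Image<Bgr, byte>("frame_0001.jpg"))
        {
            // Load the default linear SVM trained for upright pedestrians.
            hog.SetSVMDetector(HOGDescriptor.GetDefaultPeopleDetector());

            // Each rectangle is a region the algorithm believes contains a person.
            Rectangle[] regions = hog.DetectMultiScale(image);

            foreach (Rectangle region in regions)
            {
                image.Draw(region, new Bgr(Color.Red), 2); // red box, 2 px thick
            }
            image.Save("frame_0001_annotated.jpg");
        }
    }
}
```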
Here we show examples of the analysis of one of the video frames from Kean (see Figure 1) as well as a photo of a busy New York City street scene (see Figure 2) to illustrate how this looks in practice. Rectangular red boxes were drawn programmatically on the images after analysis to mark the sections the algorithms located.

Figure 1. Locating a pedestrian in an image. (a) Before and (b) after analysis. A red box is drawn on the image after analysis to illustrate the located sections.

Figure 2. Locating pedestrians on a busy New York City street. (a) Before and (b) after analysis.
Table 2 lists the video samples taken from five locations across Kean's campus with an approximation of distance.

Table 2. Locations of video capture on Kean University's campus.

Location ID | Direction of pedestrian | Building or location | Approx. distance (yards)
1 | Left to right | Outside Vaughn-Eames | 15
1 | Right to left | Outside Vaughn-Eames | 15
2 | Left to right | Between Vaughn-Eames and Wilkins Theatre | 50
3 | Left to right | Front of Wilkins Theatre from Bridge | 10
3 | Right to left | Front of Wilkins Theatre from Bridge | 10
4 | Left to right | Front of Nancy Thompson Library | 5–15
5 | Left to right | Outside Hennings Hall | 5–20
5 | Right to left | Outside Hennings Hall | 5–20
After the video was captured, it was split into frames at approximately one image per second, resulting in a total of 517 frames across the various locations. We analyzed each frame and manually evaluated the results using two primary criteria:

• Was the pedestrian correctly identified in the image?

• Did the identified rectangular region surround the entire person, or was the person cut off?
The overall success rate was modest, with roughly 58 percent of frames correctly identifying the pedestrian, as Figure 3 shows.

Figure 3. Overall success rate for all locations. The pedestrian was successfully identified in 300 of the 517 frames (roughly 58 percent); the remaining 217 frames were missed or incorrectly identified.
Table 1. Time to execute the for-loop for various iteration counts.

Number of iterations | Average ticks | Max ticks | Min ticks
2 | 2.74 | 273.00 | 0.00
4 | 0.06 | 1.00 | 0.00
8 | 0.22 | 1.00 | 0.00
16 | 0.07 | 1.00 | 0.00
32 | 0.24 | 1.00 | 0.00
64 | 0.33 | 1.00 | 0.00
128 | 0.51 | 1.00 | 0.00
256 | 0.86 | 2.00 | 0.00
512 | 1.56 | 2.00 | 1.00
1,024 | 2.95 | 4.00 | 2.00
2,048 | 5.78 | 9.00 | 5.00
4,096 | 11.93 | 34.00 | 11.00
8,192 | 22.73 | 28.00 | 22.00
16,384 | 46.95 | 93.00 | 45.00
32,768 | 91.01 | 145.00 | 90.00
65,536 | 182.51 | 228.00 | 180.00
131,072 | 363.88 | 422.00 | 360.00
262,144 | 742.15 | 829.00 | 721.00
524,288 | 1,456.75 | 1,533.00 | 1,443.00
1,048,576 | 2,923.30 | 3,146.00 | 2,886.00
2,097,152 | 5,941.81 | 6,345.00 | 5,773.00
4,194,304 | 11,839.82 | 12,634.00 | 11,546.00
8,388,608 | 23,618.03 | 24,562.00 | 23,142.00
16,777,216 | 47,325.69 | 49,108.00 | 46,385.00
Figure 4 displays the findings from the tests across each of the locations on campus. The graph shows, for each location and hat color, the relationship between the total number of frames and how successful each was against the two evaluation criteria.

Figure 4. Frame analysis success rate by location. For each location, direction, and hat color, the graph plots the total number of frames, the frames in which the pedestrian was successfully identified, and the frames in which the whole person was surrounded. Location 2 performed exceptionally poorly, likely because the pedestrian was roughly 50 yards from the camera, which resulted in a smaller number of pixels.
Figure 4 clearly shows that the color of the hat and the direction the pedestrian was moving (right to left versus left to right) had little impact on the relative success rates. Location 2 stands out as performing exceptionally poorly, which can be attributed in part to the distance of the pedestrian from the camera, roughly 50 yards. This makes sense because it is presumably more difficult for OpenCV's algorithms to detect the features that identify a set of pixels as representing a human when the person is made up of a smaller number of pixels.
Locations 4 and 5 had the best success rates, likely due to a combination of two factors: short distances and the orientation of the pedestrian relative to the camera. Locations 1 through 3 consisted of an individual walking left to right or right to left, perpendicular to the camera, so that they were viewed from the side. Location 4 consisted of the pedestrian walking toward the camera at an angle of approximately 20 degrees, while location 5 consisted of the pedestrian walking toward and away from the camera at an angle of approximately 60 degrees. As a result, more of the frontal profile of the person was viewable in locations 4 and 5. It is logical that the success rates improve when more of the person's defining characteristics, two arms and legs extending from the body, are visible from the front. These conclusions are speculative, and further testing and analysis of the inner workings of OpenCV's algorithms are necessary to quantify the impact of the viewing angle on its ability to identify a pedestrian.
Applications

A broad range of industries could apply this technology and analysis. Here we review example scenarios in two industries.
Health Care/Hospital Security
Modern hospitals typically employ radio wristbands or some other sort of radio-frequency device to restrict patients to certain areas and prevent them from entering unauthorized areas. This is particularly important in mental health facilities. Although radio devices are effective, imaging software could be used as a secondary level of security when radio devices are lost or stolen. Such a device could be deactivated if it was known to be misplaced, but the system should also function in the event a device is unknowingly lost or stolen. For example, sensitive areas of the hospital could be equipped with cameras and software that, in addition to verifying access through a radio device, could scan the person and look for an ID badge that is required to be prominently displayed.

This assumes that OpenCV would have similar levels of success identifying people wearing hospital attire. Without clearly visible legs, however, the software may be less accurate.
Public Transportation
Public transportation is one of the most practical applications of this type of technology to law enforcement and crime/terrorism investigation and prevention. Software similar to what we discussed here, albeit more sophisticated, could be used to sift through gigabytes of images from CCTV cameras to identify a suspect with known characteristics.

The technology could also be used in a more preventive manner, such as looking for people with backpacks large enough to carry an explosive device. It could also be used to identify unusual or suspicious patterns. For example, if the software could hook into the train signaling
system, it would know when and which trains
arrived and departed on a particular track. If an
individual remains on a platform for a lengthy
period of time after the trains arrived and
departed, the software could flag the suspicious
behavior. This would require the software to be
able to track an individual pedestrian and not
just identify a random person.
Conclusion

At relatively short distances, the results are impressive. Overall, the software was able to identify a pedestrian in 300 of the 517 frames. At location 5, at a distance of between 5 and 15 yards, the success rate was greater than 90 percent. This is comparable to the short-distance image captures of closed-circuit cameras located in and around public spaces in the US as well as elsewhere in the world.
The results that can be obtained using a consumer-grade machine and open source software are promising, which opens up additional avenues for other applications. Desktop image processing could be regularly used by law enforcement, for instance. With the resources available to the federal government, including thousands of industry- and research-level computing machines, the FBI would be able to perform this type of analysis against terabytes of data. MM
References
1. "Cisco Visual Networking Index: Forecast and Methodology, 2012–2017," Cisco Systems, 29 May 2013; www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360_ns827_Networking_Solutions_White_Paper.html.

2. "System.DateTime.Ticks Property," Microsoft Developer Network, .NET Framework 4.0 documentation; http://msdn.microsoft.com/en-us/library/system.datetime.ticks(v=vs.100).aspx.

3. "System.Diagnostics.Stopwatch Class," Microsoft Developer Network, .NET Framework 4.0 documentation; http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch(v=vs.100).aspx.
Kieran Miller is a research student in the Department of Computer Science at Kean University. Contact him at [email protected].

Patricia Morreale is an associate professor in the Department of Computer Science at Kean University. Contact her at [email protected].