Post on 03-Nov-2014
Key Frame Extraction On MPEG by using Threshold Algorithm
CHAPTER 1
INTRODUCTION
1.1 LITERATURE REVIEW
Recent years have witnessed an enormous increase in video data on the
internet. This rapid increase demands efficient techniques for management and
storage of video data. Video summarization is one of the commonly used mechanisms
to build an efficient video archiving system. The video summarization methods
generate summaries of the videos which are the sequences of stationary or moving
images (Money and Agius, 2008). Key frame extraction is a widely used method for
video summarization. The key frames are the characteristic frames of the video which
render limited, but meaningful information about the contents of the video (Li et al.,
2001).
The researchers have attempted to exploit various features for the extraction
of key frames in videos. These features have been utilized in a variety of different
ways. Some of the low level features which are commonly used include color
histogram, frame correlation, motion information and edge histogram etc. (Jiang et al.,
2009). Zhang et al. (1997) used the color histogram difference between the current
frame and the last extracted key frame to draw out key frames from the video. Gunsel
and Tekalp (1998) compared the histogram of current frame with the average color
histograms of the previous frames to compute the discontinuity value.
A thorough survey of existing techniques reveals that the researchers have
used many different visual features for the problem of key frame extraction. In our
project we dealt with the frame difference measures such as color histogram, frame
correlation and edge orientation histogram for the extraction of key frame.
1.2 OVERVIEW OF PROJECT
Efficient key frame extraction enables efficient cataloguing and retrieval in
large video collections. Video is rich in content, which results in a tremendous amount
of data to process. Processing can be made easier by handling only some frames, such as
the key frames of the video. In general, a key frame extraction technique must be fully
automated and must use the contents of the video to generate a summary.
Department. of ECE, MRITS 1
Theoretically, key frames must be extracted using high level features such as
objects, actions and events. However, the key frame extraction based on high level
features is mostly specific to certain applications and usually low level features have
been employed. Some examples of commonly used low level features are
colour histogram, correlation, moments, edges and motion features. These low level
features can then be employed to derive high level features to generate domain
specific applications.
A common methodology is to compare consecutive frames based on some
low level Frame Difference Measures (FDMs) and extract a key frame if this
difference satisfies a certain threshold value. The low level features used in our
project are
(1) Colour histogram
(2) Frame correlation
(3) Edge orientation histogram
The basic block diagram for the elicitation of key frames in sports video based on
multiple frame difference features is shown in Fig. 1.1. It consists of frame
extraction, colour histogram, correlation, edge orientation histogram and threshold
logic modules. The frame extraction module extracts all the frames from the given
input video, and the key frames are identified by the colour histogram, correlation and
edge orientation histogram methods using threshold logic. In our work,
the results from these three methods are compared for a sample video (football) and for
cricket, hockey and football videos.
The colour histogram for the frames is calculated in HSV colour space. HSV
stands for Hue, Saturation and Value. The HSV colour model is based on how colours
appear to a human observer. The colour histogram difference measure between two
frames is calculated from the histograms of these three channels. This measure
lies between -64 and 0.
Frame correlation is measured using Pearson’s distance. Pearson’s distance is
defined as one minus Pearson’s correlation coefficient. Pearson's correlation
coefficient between two variables is defined as the covariance of the two variables
divided by the product of their standard deviations. The Pearson correlation
coefficient lies in [-1, 1], so the Pearson distance lies in [0, 2].
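The definition above can be illustrated with a minimal MATLAB sketch. The two frames here are synthetic random images and the variable names are hypothetical; this is not the project's actual code, only the formula applied to flattened pixel vectors.

```matlab
% Two hypothetical grayscale frames (synthetic data, for illustration only).
rng(0);                                    % fixed seed so the run is repeatable
frameA = rand(120, 160);
frameB = 0.8*frameA + 0.2*rand(120, 160);  % a perturbed copy of frameA

x = frameA(:);                             % flatten pixels into column vectors
y = frameB(:);

% Covariance divided by the product of standard deviations.
% The 1/(N-1) normalisation factors cancel, so centred sums are enough.
xc = x - mean(x);
yc = y - mean(y);
r = (xc' * yc) / (norm(xc) * norm(yc));    % Pearson correlation, in [-1, 1]

dPearson = 1 - r;                          % Pearson distance, in [0, 2]
fprintf('r = %.4f, Pearson distance = %.4f\n', r, dPearson);
```

Identical frames give a distance of 0, uncorrelated frames give a distance near 1, and a frame compared with its negative gives a distance of 2.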
The third measure is the histogram of edge orientation, which is computed with the
Sobel operator. The edges are first computed
using horizontal and vertical Sobel operators, whose outputs are then used to find the gradient and
angle of the edges. The angles are then used to build a histogram of edge orientation. The
range of values for the edge orientation measure is 0 to 82.
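The steps above can be sketched in MATLAB as follows. The test image, the number of bins (8) and the edge-strength threshold (10% of the maximum gradient) are assumptions made for illustration; the report does not state its own values, and this sketch does not reproduce the 0 to 82 scale quoted above.

```matlab
% Hypothetical grayscale frame: a bright square on a dark background.
I = zeros(64, 64);
I(20:44, 20:44) = 1;

sx = [-1 0 1; -2 0 2; -1 0 1];      % horizontal Sobel kernel
sy = sx';                           % vertical Sobel kernel
gx = conv2(I, sx, 'same');          % horizontal gradient component
gy = conv2(I, sy, 'same');          % vertical gradient component

mag   = sqrt(gx.^2 + gy.^2);        % gradient magnitude
theta = atan2(gy, gx);              % edge angle in radians, in [-pi, pi]

nBins  = 8;                         % assumed number of orientation bins
isEdge = mag > 0.1 * max(mag(:));   % assumed threshold: keep strong edges only

% Map each angle to a bin index in 1..nBins.
binIdx = floor((theta(isEdge) + pi) / (2*pi) * nBins) + 1;
binIdx(binIdx > nBins) = nBins;     % theta == pi falls into the last bin

eoh = accumarray(binIdx, 1, [nBins 1]);  % count edge pixels per orientation
eoh = eoh / sum(eoh);                    % normalise so the bins sum to 1
disp(eoh');
```

Two frames can then be compared by taking a distance between their orientation histograms.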
[Figure: block diagram. Input Video, then Extraction of all frames in the video, then Feature Extraction with three branches (Colour histogram, Correlation, Edge orientation histogram), each followed by Threshold logic producing Key frames]
Fig 1.1: Basic Block Diagram
Input Video
Cricket, football and hockey videos are taken as inputs to this work. The video
can be in any format such as .avi, .flv, .mov, .mp4, .mpg or .rm. To process the video, its frames
have to be extracted. The following is a brief explanation of the commonly found video file
formats.
AVI (.avi)
The AVI (Audio Video Interleave) format was developed by Microsoft. The
AVI format is supported by all computers running Windows, and by all the most
popular web browsers.
WMV (.wmv)
The Windows Media format was developed by Microsoft. Windows Media is a
common format on the Internet, but Windows Media movies cannot be played on
non-Windows computers without an extra (free) component installed. Some later
Windows Media movies cannot play at all on non-Windows computers because no
player is available.
MPEG (.mpg/.mpeg)
The MPEG (Moving Picture Experts Group) format is the most popular format
on the Internet. It is cross-platform, and supported by all the most popular web
browsers.
QuickTime (.mov)
The QuickTime format is developed by Apple. QuickTime is a common
format on the Internet, but QuickTime movies cannot be played on a Windows
computer without an extra (free) component installed.
Flash (.flv/.swf)
The Flash (Shockwave) format was developed by Macromedia. The
Shockwave format requires an extra component to play. But this component comes
preinstalled with web browsers like Firefox and Internet Explorer.
3GP(.3gp)
The 3gp format is both an audio and video format that was designed as a
multimedia format for transmitting audio and video files between 3G cell phones and
the internet. It is a 3G streaming video format, mainly used to meet the high
transmission speed of 3G networks and is currently the most common type of mobile
phone video format.
RealMedia (.rm)
RealMedia is a format created by RealNetworks. It contains
both audio and video data and is typically used for streaming media files over the
internet. RealMedia can play on a wide variety of media players for both Mac and
Windows platforms. RealPlayer is the most compatible.
MPEG-4 (.mp4)
MPEG-4 is a newer format for the internet. YouTube recommends
using MP4. YouTube accepts multiple formats and then converts them all to .flv
or .mp4 for distribution. More and more online video publishers are moving to MP4
as the internet sharing format for both Flash players and HTML5.
Advanced Streaming Format (.asf)
ASF is the container format used by Windows Media files and was developed by Microsoft. It is
intended for streaming and is used to support playback from digital media and HTTP
servers, and to support storage devices such as hard disks. Its contents can be compressed using
a variety of video codecs. The most common file types contained within an
ASF file are Windows Media Audio and Windows Media Video.
Frame Extraction
The video taken as input is divided into frames in this section. To do this task
we have used mmreader and extracted frames. The input to mmreader can be any of
the above mentioned formats.
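A minimal sketch of the frame extraction step is shown below. It uses VideoReader, which replaced mmreader in later MATLAB releases, together with hasFrame and readFrame (available from R2014b). The file name is hypothetical, and which formats can actually be read depends on the codecs installed on the machine.

```matlab
% Hypothetical input file; replace with the path of a real video.
videoFile = 'football.avi';

reader = VideoReader(videoFile);        % open the video for reading
frames = {};                            % cell array holding one frame per cell

while hasFrame(reader)
    frames{end+1} = readFrame(reader);  %#ok<SAGROW> H-by-W-by-3 RGB frame
end

fprintf('Extracted %d frames from %s\n', numel(frames), videoFile);
```

For long videos it is usually preferable to process each frame inside the loop instead of storing all frames in memory.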
Feature extraction
The features of the extracted frames can be colour, edge, motion or textual
features. Low level features such as colour histogram, frame correlation and edge
histogram are compared using frame difference measures. The frame
difference values are then calculated for all extracted frames of the different videos.
Key frame extraction
To start the extraction process, the first frame is declared as a key frame. Then the
frame difference is computed between the current frame and the last extracted key
frame. If the frame difference satisfies a certain threshold condition, then the current
frame is selected as key frame. This process is repeated for all frames in the video.
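The procedure above can be sketched as a short MATLAB script. The function name extractKeyFrames, the toy frames and the mean absolute difference used as the frame difference measure are all hypothetical. The threshold condition is assumed to be "difference greater than T", since the report leaves the exact condition to each measure. Local functions at the end of a script require MATLAB R2016b or later.

```matlab
% Toy input: four tiny "frames" with one abrupt change at frame 3.
frames = {zeros(4), zeros(4), ones(4), ones(4)};

% Hypothetical frame difference measure: mean absolute pixel difference.
fdm = @(a, b) mean(abs(double(a(:)) - double(b(:))));

keyIdx = extractKeyFrames(frames, fdm, 0.5);
disp(keyIdx);                         % indices of the selected key frames

function keyIdx = extractKeyFrames(frames, fdm, T)
% Select key frames by comparing each frame with the last extracted key frame.
    keyIdx  = 1;                      % the first frame is declared a key frame
    lastKey = frames{1};
    for k = 2:numel(frames)
        d = fdm(frames{k}, lastKey);  % difference to the last key frame
        if d > T                      % assumed threshold condition
            keyIdx(end+1) = k;        %#ok<AGROW>
            lastKey = frames{k};      % the new key frame becomes the reference
        end
    end
end
```

For the toy input the script selects frames 1 and 3: frame 2 equals the first key frame, frame 3 differs from it by 1, and frame 4 equals the new key frame. Any of the three measures used in this project could be passed in place of fdm.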
1.3 ORGANISATION OF THESIS
In view of the proposed research work, explanation of theoretical aspects used
in this work is presented as per the sequence described below.
Chapter 2 Describes the need for key frames and introduction to frame difference
measures for the extraction of key frames.
Chapter 3 Deals with the basic colour models and different frame difference
measures for the extraction of key frames based on colour histogram.
Chapter 4 Explains different correlation coefficients for the extraction of key
frames.
Chapter 5 Describes fundamentals of edge detection and different edge detection
operators for the extraction of key frames.
Chapter 6 Deals with results and discussions.
Chapter 7 has conclusions and future scope.
CHAPTER 2
VIDEO COMPACTION USING KEY FRAME EXTRACTION
2.1 DEFINITION
A key frame is a frame that represents salient content and distinct
information compared to the previous frame. Key frame extraction is a widely used
method for video summarization; that is, the extracted key frames summarize the
characteristics of the video. Video summarization is a method to generate a succinct
version of a video by eliminating redundant frames. The method for video
summarization is shown in Fig. 2.1. An effective way of generating key frames is
shown in Fig. 2.2.
Fig. 2.1: Scheme For Video Summary
Fig. 2.2: The Basic Framework Of The Key Frame Extraction Algorithm
[Figure labels: Video stream, Frame sequence, Key frame extraction]
2.2 NEED FOR KEY FRAMES
Key frame extraction is an essential part of video analysis and management,
providing a suitable video summarization for video indexing, browsing and retrieval.
Typical video is rich in content and runs at 24 frames per second. Hence a one
hour video contains around 24x60x60 = 86,400 frames. Most of these frames contain
redundant information, and thus key frame extraction is essential. The use of key
frames reduces the amount of data required in video indexing and provides the
framework for dealing with the video content. A basic rule of key frame extraction is
that it is better to extract a frame wrongly than to miss a needed one. Even so, it is necessary to
discard the frames with repetitive or redundant information during the extraction.
To extract valid information from video, process video data efficiently and
reduce the transfer load on the network, more and more attention is being paid to
video processing technology. The amount of data in video processing is significantly
reduced by using video segmentation and key frame extraction. To reduce the transfer
load on the network and the transmission of invalid information, techniques for the transmission, storage and
management of video information are becoming more and more important.
2.3 EXTRACTION OF KEY FRAMES USING FRAME
DIFFERENCE MEASURES (FDMs)
2.3.1 INTRODUCTION TO FDMs
A common methodology for extraction of key frames is to compare
consecutive frames based on some low level Frame Difference Measures (FDMs).
The frame difference is measured, and if this difference exceeds a certain threshold,
the frame is selected as a key frame; otherwise the frame is discarded. Some of the
low level features commonly used for extraction include colour
histogram, frame correlation, motion information and edge histogram.
2.3.2 KEY FRAME EXTRACTION
To start the extraction process, the first frame is declared as a key frame.
Instead of computing one histogram for the entire image, we divide the image shown
in Fig 2.3(a) into a total of Ts sections, each of size m x m, as shown in Fig 2.3(b). This is
to effectively measure the level of difference between the two frames. Then the frame
difference is computed between the current frame and the last extracted key frame.
This frame difference is computed using the colour histogram, correlation or edge
orientation histogram. The obtained frame difference is then compared with a
threshold; if the difference satisfies the threshold condition, the current
frame is selected as a key frame. Repeating the procedure for all
frames extracts the key frames.
Fig 2.3(a): Original image Fig 2.3(b): Division of image in to sections
CHAPTER 3
COLOUR HISTOGRAM DIFFERENCE
3.1 INTRODUCTION
The colour histograms have been commonly used for key frame extraction in
frame difference based techniques. This is because the colour is one of the most
important visual features to describe an image. Colour histograms are easy to compute
and are robust in case of small camera motions. An image histogram is a type
of histogram that acts as a graphical representation of the tonal distribution in
an image. It plots the number of pixels for each tonal value. By looking at the
histogram for a specific image a viewer will be able to judge the entire tonal
distribution at a glance.
The horizontal axis of the graph represents the tonal variations, while the vertical
axis represents the number of pixels in that particular tone. The left side of the
horizontal axis represents the black and dark areas, the middle represents medium
grey and the right hand side represents light and pure white areas. The vertical axis
represents the size of the area that is captured in each one of these zones. Thus, the
histogram for a very dark image will have the majority of its data points on the left
side and centre of the graph. Conversely, the histogram for a very bright image with
few dark areas and/or shadows will have most of its data points on the right side and
centre of the graph. The representation of color histogram is shown in the Fig.3.1.
Fig 3.1: Colour Histogram representation of image
3.2 COLOR IMAGE PROCESSING
3.2.1 Color fundamentals
The colors that humans and some other animals perceive in an object
are determined by the nature of the light reflected from the object. Visible light is
composed of a relatively narrow band of frequencies in the electromagnetic spectrum.
A body that reflects light that is balanced in all visible wavelengths appears
white to the observer. However, a body that favors reflectance in a limited range of
the visible spectrum exhibits some shades of color. For example, green objects
reflect light with wavelengths primarily in the 500-570 nm range while absorbing most
of the energy at other wavelengths.
Characterization of light is central to the science of color. If the light is
achromatic (void of color), its only attribute is its intensity. Achromatic light is what
viewers see on a black and white television set, and it has been an implicit component
of the discussion of image processing thus far. The term gray level refers to a scalar
measure of intensity that ranges from black to grays and finally to white.
Chromatic light spans the electromagnetic spectrum from approximately 400 to
700 nm. Three basic quantities are used to describe the quality of a chromatic light
source: radiance, luminance and brightness. Radiance is the total amount of energy that
flows from the light source and is usually measured in watts (W). Luminance, measured in
lumens (lm), gives a measure of the amount of energy an observer perceives from a light
source. For example, light emitted from a source operating in the far infrared region
of the spectrum could have significant energy (radiance), but an observer would
hardly perceive it. Its luminance would be almost zero. Finally, brightness is a
subjective descriptor that is practically impossible to measure. It embodies the
achromatic notion of intensity and is one of the key factors in describing color
sensation.
3.2.2 Primary colors
Cones are the sensors in the eye responsible for color vision. Cones in the
human eye can be divided into three principal sensing categories corresponding
roughly to red, green and blue. Approximately 65% of all cones are sensitive to red
light, 33% are sensitive to green light and only 2% are sensitive to blue (but the blue
cones are the most sensitive). Due to these absorption characteristics of the human eye,
colors are seen as variable combinations of the so-called primary colors red (R),
green (G) and blue (B). The wavelength values of the three primary colors are:
blue = 435.8 nm, green = 546.1 nm and red = 700 nm.
The primary colors can be added to produce the secondary colors of light:
magenta (red plus blue), cyan (green plus blue) and yellow (red plus green). Mixing the three
primaries, or a secondary with its opposite primary color, in the right intensities produces
white light. Differentiating between the primary colors of light and the primary colors of
pigments or colorants is important. In the latter, a primary color is defined as one that
subtracts or absorbs a primary color of light and reflects or transmits the other two.
Therefore, the primary colors of pigments are magenta, cyan and yellow, and the
secondary colors are red, green and blue. A proper combination of the three pigment
primaries, or a secondary with its opposite primary, produces black.
3.2.3 Hue and saturation
The characteristics generally used to distinguish one color from another are
brightness, hue and saturation. Brightness embodies the achromatic notion of intensity.
Hue represents the dominant color as perceived by an observer. Saturation refers to
the relative purity or the amount of white light mixed with a hue. The pure spectrum colors
are fully saturated. Colors such as pink (red and white) and lavender (violet and white)
are less saturated, with the degree of saturation being inversely proportional to the
amount of white light added. Hue and saturation taken together are called
chromaticity, and therefore a color may be characterized by its brightness and
chromaticity.
3.2.4 Importance of color image processing
The use of color in image processing is motivated by two principal factors:
(1) First, color is a powerful descriptor that often simplifies object identification
and extraction from a scene.
(2) Second, humans can discern thousands of color shades and intensities, compared
to only about two dozen shades of gray. This second factor is particularly
important in manual (i.e., performed by humans) image analysis.
3.3 COLOR MODELS
3.3.1 Introduction to Color models
The purpose of a color model is to facilitate the specification of colors in some
standard, generally accepted way. In essence, a color model is the specification of a
coordinate system and a subspace within that system where each color is represented
by a single point.
Most color models in use today are oriented either towards hardware or
towards applications where color manipulation is a goal. In terms of digital image
processing, the hardware oriented models most commonly used in practice are the
RGB (red, green, blue) model for color monitors and a broad class of color video
cameras; the CMY (cyan, magenta, yellow) and CMYK (cyan, magenta, yellow,
black) models for color printing; and the HSI (hue, saturation, intensity) model,
which corresponds closely with the way humans describe and interpret colors. The
HSI model also has the advantage that it decouples the color and gray-scale
information in an image, making it suitable for many of the gray-scale techniques
developed. There are numerous color models in use today because color
science is a broad field that encompasses many areas of application.
3.3.2 RGB color model
The RGB color model is an additive color model in which red, green and blue
light are added together in various ways to reproduce a broad array of colors. The
name of model comes from the initials of the three additive primary colors red, green
and blue.
The main purpose of the RGB color model is the sensing, representation
and display of images in electronic systems such as televisions and computers, though
it has also been used in conventional photography. Before the electronic age, the RGB
color model already had a solid theory behind it, based on human perception of colors.
Typical RGB input devices are color TV and video cameras, image scanners
and digital cameras. Typical RGB output devices are TV sets of various technologies
(CRT, LCD, plasma, etc.), computer and mobile phone displays, video
projectors, multicolor LED displays and large screens such as the Jumbotron. Color printers,
on the other hand, are not RGB devices but subtractive color devices (typically using the CMYK
color model).
Fig. 3.2: RGB Colour Model
Fig. 3.2 shows the RGB colour model. To form a color with RGB, three colored light
beams (one red, one green and one blue) must be superimposed (for example, by emission
from a black screen or by reflection from a white screen). Each of the three beams is
called a component of that color, and each of them can have an arbitrary intensity,
from fully off to fully on, in the mixture.
3.3.2.1 Representation of RGB
We can represent the RGB model by using a unit cube. Each point in the cube
(or vector from the origin to that point) represents a specific color. This model
is the best for setting the electron guns of a CRT. Note that for complementary
colours the sum of the values equals white light (1,1,1). For example:
Red(1,0,0)+cyan(0,1,1)=white(1,1,1)
Green(0,1,0)+magenta(1,0,1)=white(1,1,1)
Blue(0,0,1)+yellow(1,1,0)=white(1,1,1)
Fig. 3.3 Cartesian coordinates (3D)
MATLAB code for extraction of a particular component:
R = RGB(:,:,1); % extracting red component
G = RGB(:,:,2); % extracting green component
B = RGB(:,:,3); % extracting blue component
3.3.3 HSV color model
The characteristics generally used to distinguish one color from another are
brightness, hue and saturation. Brightness embodies the achromatic notion of intensity.
(1) Hue represents the dominant wavelength of the light wave. Thus, when we
call an object red, orange or yellow, we are specifying its hue.
(2) Saturation refers to the relative purity or the amount of white light mixed
with the hue. The pure spectrum colors are fully saturated.
The HSV (hue, saturation and value) color model is more intuitive than the RGB
color model. The user specifies a color (hue) and then adds white or black. There are
three color parameters: hue, saturation and value. Changing the saturation parameter
corresponds to adding or subtracting white, and changing the value parameter
corresponds to adding or subtracting black. The HSV model is shown in Fig. 3.4.
Fig. 3.4: HSV Color Model
HSV improves on the color cube representation of RGB by arranging colors of
each hue in a radial slice around a simple axis of neutral colors which ranges from
black at the bottom to white at the top. The fully saturated colors of each hue lie in a
circle, a color wheel.
MATLAB code for extraction of a particular component:
H = HSV(:,:,1); % extracting hue component
S = HSV(:,:,2); % extracting saturation component
V = HSV(:,:,3); % extracting value component
Conversion from RGB to HSV:
Let r, g, b ∈ [0, 1] be the red, green and blue coordinates, respectively, of a color in
RGB space.
Let max be the greatest of r, g and b, and min the least.
To find the hue angle h ∈ [0, 360), compute:
$$
h = \begin{cases}
0, & \text{if } \max = \min \\
\left(60 \cdot \dfrac{g - b}{\max - \min} + 360\right) \bmod 360, & \text{if } \max = r \\
60 \cdot \dfrac{b - r}{\max - \min} + 120, & \text{if } \max = g \\
60 \cdot \dfrac{r - g}{\max - \min} + 240, & \text{if } \max = b
\end{cases}
$$
The values for s and v of an HSV color are defined as follows:
$$
s = \begin{cases}
0, & \text{if } \max = 0 \\
\dfrac{\max - \min}{\max} = 1 - \dfrac{\min}{\max}, & \text{otherwise}
\end{cases}
\qquad v = \max
$$
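As a check on the conversion, the following MATLAB sketch evaluates it for one arbitrary colour, with the hue wrapped into [0, 360) and the saturation normalised by max, and compares the result with the built-in rgb2hsv. The sample colour and variable names are illustrative assumptions. Note that rgb2hsv returns hue scaled to [0, 1], so it is multiplied by 360 for the comparison.

```matlab
rgb = [0.2 0.6 0.4];                 % arbitrary sample colour, r g b in [0, 1]
r = rgb(1); g = rgb(2); b = rgb(3);
mx = max(rgb);
mn = min(rgb);

% Hue in degrees, following the piecewise definition.
if mx == mn
    h = 0;
elseif mx == r
    h = mod(60*(g - b)/(mx - mn) + 360, 360);
elseif mx == g
    h = 60*(b - r)/(mx - mn) + 120;
else
    h = 60*(r - g)/(mx - mn) + 240;
end

% Saturation and value.
if mx == 0
    s = 0;
else
    s = (mx - mn)/mx;
end
v = mx;

ref = rgb2hsv(rgb);                  % 1-by-3 colormap input, hue in [0, 1]
fprintf('manual : h = %.2f deg, s = %.4f, v = %.4f\n', h, s, v);
fprintf('rgb2hsv: h = %.2f deg, s = %.4f, v = %.4f\n', 360*ref(1), ref(2), ref(3));
```

For this colour green is the largest component, so the hue works out to 60*(0.4 - 0.2)/0.4 + 120 = 150 degrees.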
3.3.4 CMYK Color model
It is possible to reproduce a large range of the colors seen by humans by combining cyan,
magenta and yellow transparent dyes or inks on a white substrate. These are the
subtractive primary colors. Often a fourth ink, black, is added to improve reproduction of
some dark colors. This is called the “CMY” or “CMYK” colour space. The cyan ink
reflects all but the red light, the yellow ink reflects all but the blue light and the
magenta ink reflects all but the green light. This is because cyan light is an equal
mixture of green and blue, yellow is an equal mixture of red and green, and magenta light is an
equal mixture of red and blue.
Cyan = green + blue, so light reflected from a cyan pigment has no red component,
i.e., the red is absorbed by cyan. Similarly, magenta subtracts green and yellow
subtracts blue. Printers usually use four colors: cyan, yellow, magenta and black. This
is because cyan, yellow and magenta together produce a dark gray rather than a true
black. The conversion between RGB and CMY is easily computed as below:
C=1-R; M=1-G; Y=1-B
R=1-C; G=1-M; B=1-Y
3.3.5 YIQ Color Model
This model was designed to separate chrominance from luminance. This was a
requirement in the early days of color television, when black and white sets were
expected to pick up and display what were originally color pictures. The Y channel
contains the luminance information (sufficient for black and white television sets), while the I
and Q channels (in-phase and quadrature) carry the color information. A color
television set takes these three channels, Y, I and Q, and converts the information back to
R, G and B levels for display on a screen.
3.3.6 HSI color model
In this color model, as in the YIQ model, luminance or intensity (I) is decoupled
from the color information, which is described by a hue channel and a saturation
channel. Hue and saturation correspond closely to the way humans perceive
color, and thus this model is suited for interactive manipulation of color images, where
a change in each variable produces the shift that the operator expects.
3.3.7 L*a*b* Colour Space
The L*a*b* (lightness, red-green and yellow-blue content) system gives
quantitative expression to the Munsell system of colour classification. The L*a*b* colour
space is the best in terms of perceptual similarity. It is not dependent on any particular
device. Colours can be set as they are perceived when operating a repro system. In
the analysis, L*a*b* is divided into 7 L* levels, 5 a* levels and 5 b* levels. The
problem with the L*a*b* colour space is quantization. As can be seen from Fig. 2, on
each edge the quantization should be coarser, because the volume should be the same
for each subspace. In our tests the volume is smaller for values near the edges.
3.4 COLOUR HISTOGRAM DISCRIMINATION
There are several distance formulas for measuring the similarity of colour
histograms. In general, techniques for comparing probability distributions, such as
the Kolmogorov-Smirnov test, are not appropriate for colour histograms. This is
because visual perception determines similarity rather than closeness of the
probability distributions. Essentially, the colour distance formulas arrive at a measure
of similarity between images based on the perception of colour content. Three
distance formulas that have been used for image retrieval are the histogram
Euclidean distance, histogram intersection and histogram quadratic (cross) distance.
3.4.1 Histogram Euclidean distance
Let h and g represent two colour histograms. The Euclidean distance between
the colour histograms h and g can be computed as:
In this distance formula, there is only comparison between the identical bins in
the respective histograms. Two different bins may represent perceptually similar
colours but are not compared crosswise. All bins contribute equally to the distance.
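The expression itself is not reproduced in this copy of the report. A standard form of the squared histogram Euclidean distance, consistent with the description above (only identical bins are compared and all bins contribute equally), is given below. The bin index k and the bin count K are assumed symbols.

```latex
d^{2}(h, g) = \sum_{k=1}^{K} \bigl( h(k) - g(k) \bigr)^{2}
```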
3.4.2 Histogram quadratic (cross) distance
The colour histogram quadratic distance was used by the QBIC system.
The cross distance formula is given by:
The cross distance formula considers the cross-correlation between histogram
bins based on the perceptual similarity of the colours represented by the bins. The
set of all cross-correlation values is represented by a matrix A, which is called the
similarity matrix. For RGB space, the element a(i, j) of the similarity matrix A is given by:
where d_ij is the L2 distance between the colours i and j in RGB space. In the case
that quantization of the colour space is not perceptually uniform, the cross term
contributes to the perceptual distance between colour bins.
For HSV space it is given by:
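The equations of this subsection are not reproduced in this copy of the report. The forms commonly given in the image retrieval literature are listed below as a hedged reconstruction, not necessarily the exact expressions the report used: the quadratic distance, the RGB similarity element, and one HSV similarity element. Here d_max denotes the largest distance between any two colours, and (h_i, s_i, v_i) are the hue, saturation and value of colour i; both are assumed notation.

```latex
% Quadratic (cross) distance between histograms h and g
d(h, g) = (h - g)^{T} A \, (h - g)

% Similarity element for RGB space
a_{ij} = 1 - \frac{d_{ij}}{d_{\max}}, \qquad d_{\max} = \max_{i,j} d_{ij}

% Similarity element for HSV space (one form found in the literature)
a_{ij} = 1 - \frac{1}{\sqrt{5}} \Bigl[ (v_i - v_j)^{2}
       + (s_i \cos h_i - s_j \cos h_j)^{2}
       + (s_i \sin h_i - s_j \sin h_j)^{2} \Bigr]^{1/2}
```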
3.4.3 Histogram intersection distance
The colour histogram intersection was proposed for colour image retrieval. The
intersection of histograms h and g is given by:
where |h| and |g| give the magnitude of each histogram, which is equal to the
number of samples. Colours not present in the user's query image do not contribute to
the intersection distance. This reduces the contribution of background colours. The
sum is normalized by the histogram with the fewest samples.
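The expression is missing from this copy of the report. A form consistent with the description above (a sum over bins, normalized by the histogram with the fewest samples) is given below, with k as an assumed bin index.

```latex
d(h, g) = \frac{\sum_{k} \min\bigl( h(k),\, g(k) \bigr)}{\min\bigl( |h|,\, |g| \bigr)}
```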
3.5 FORMULATION
For computing the FDM, a colour histogram is built in HSV colour space after
performing a quantization step that reduces the number of distinct colours to 64. Instead
of computing one histogram for the entire image, we divide the image into a total of ‘Ts’
sections, each of size m x m. This is to effectively measure the level of difference
between two frames. Each section of one frame is compared with the
corresponding section of the other frame using the histogram intersection mechanism.
The histogram difference HD_{i,j,s} between two corresponding sections ‘s’, with histogram
H_{i,s} of frame i and histogram H_{j,s} of frame j, is defined as:
The histogram difference “HD” between two frames i and j is then calculated by
taking the average of the difference measure over all sections.
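The section-wise comparison can be sketched in MATLAB as below. Since the defining expression for the section difference is not reproduced in this copy, the sketch assumes a normalised histogram intersection per section (1 for identical sections, 0 for sections with no colours in common) and therefore does not reproduce the report's own sign or scale. The 4 x 4 x 4 split of the 64 colours across H, S and V, the function names and the random test frames are also assumptions. Local functions at the end of a script require MATLAB R2016b or later.

```matlab
% Two hypothetical RGB frames with values in [0, 1].
rng(1);
frameA = rand(64, 64, 3);
frameB = rand(64, 64, 3);

fprintf('A vs A: %.4f\n', sectionHistSim(frameA, frameA, 16)); % identical frames
fprintf('A vs B: %.4f\n', sectionHistSim(frameA, frameB, 16));

function sim = sectionHistSim(rgbA, rgbB, m)
% Average normalised histogram intersection over m-by-m sections.
    qa = quantizeHSV(rgbA);
    qb = quantizeHSV(rgbB);
    [rows, cols] = size(qa);
    vals = [];
    for r0 = 1:m:rows - m + 1            % incomplete border sections are skipped
        for c0 = 1:m:cols - m + 1
            sa = qa(r0:r0+m-1, c0:c0+m-1);
            sb = qb(r0:r0+m-1, c0:c0+m-1);
            ha = accumarray(sa(:), 1, [64 1]);   % 64-bin histogram of section
            hb = accumarray(sb(:), 1, [64 1]);
            vals(end+1) = sum(min(ha, hb)) / min(sum(ha), sum(hb)); %#ok<AGROW>
        end
    end
    sim = mean(vals);                    % average over all sections
end

function q = quantizeHSV(rgb)
% Map every pixel to one of 64 colour bins (assumed 4 levels per channel).
    hsv = rgb2hsv(rgb);                  % all channels scaled to [0, 1]
    lv  = min(floor(hsv * 4), 3);        % levels 0..3 for H, S and V
    q   = lv(:,:,1)*16 + lv(:,:,2)*4 + lv(:,:,3) + 1;   % bin index 1..64
end
```

Comparing a frame with itself gives exactly 1, because every section histogram intersects fully with itself.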
CHAPTER 4
CORRELATION DIFFERENCE HISTOGRAM
4.1 INTRODUCTION
Correlation coefficients are a very popular scheme for finding the similarity
between two data sets. Correlation coefficients are invariant to brightness. The
cross correlation is used to determine the degree of similarity between two similar
images, or, with the addition of a linear offset to one of the images, the spatial shift or
spatial correlation between the images. The degree of similarity between the two
images is determined by the correlation coefficient. The correlation coefficient has value
1 if the two images are identical, 0 if they are completely uncorrelated, and -1 if they
are completely anti-correlated.
4.2 TYPES OF CORRELATION COEFFICIENTS
4.2.1. Pearson’s Correlation Coefficient (PCC)
The Pearson’s Correlation Coefficient, r, is widely used in statistical analysis,
pattern recognition, and image processing. Applications in image processing include
comparing two images for image registration purposes, object recognition, and
disparity measurement.
For monochrome digital images, the Pearson’s Correlation Coefficient is described by
$$r = \frac{\sum_i (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_i (x_i - x_m)^2}\,\sqrt{\sum_i (y_i - y_m)^2}}$$
where xi is the intensity of the ith pixel in the first image, yi is the intensity of
the ith pixel in the next image, and xm and ym are the mean intensities of the first
and the next image, respectively.
The correlation coefficient has value 1 if the two images are identical, 0 if they
are completely uncorrelated, and –1 if they are completely anti-correlated, for
example, if one image is the negative of the other. In theory, one would obtain a
value of 1 for r if the object is intact and a value of less than 1 if alteration or
movement has occurred. In practice, distortions in the imaging system, pixel noise,
slight variations in the object’s position relative to the camera, and other factors
produce an r value less than 1, even if the object has not been moved or physically
altered in any manner. For security applications, typical r values for two digital
images of the same scene, one recorded immediately after the other using the same
imaging system and illumination, range from 0.95 to 0.98.
Interpretation of the correlation coefficient (r) is shown in Fig.4.1. The value of
the correlation coefficient ‘r’ ranges from -1 to +1.
Case1: If r = +1, then the correlation between the two variables is said to be perfect
and positive.
Case2: If r = -1, then the correlation between the two variables is said to be perfect
and negative.
Case3: If r = 0, then there exists no correlation between the variables.
Fig.4.1: Coefficient(r) of Determination between x and y frames
One of the obvious advantages of Pearson’s correlation coefficient is that it
condenses the comparison of two two-dimensional images down to a single scalar, r.
The most widely recognized disadvantage is that it is computationally intensive.
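As an illustration, Pearson's r for two monochrome images of equal size can be sketched in a few lines of Python. The function name `pearson_r` is hypothetical, and raising an error for a constant image (where r is undefined) is a choice of this sketch.

```python
import numpy as np

def pearson_r(img_x, img_y):
    """Pearson correlation coefficient between two equally sized images."""
    x = np.asarray(img_x, dtype=float).ravel()
    y = np.asarray(img_y, dtype=float).ravel()
    if x.size != y.size:
        raise ValueError("images must contain the same number of pixels")
    dx = x - x.mean()  # deviations from the mean intensity of image x
    dy = y - y.mean()  # deviations from the mean intensity of image y
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    if denom == 0:
        raise ValueError("r is undefined when an image has constant intensity")
    return float((dx * dy).sum() / denom)
```

Comparing an image with itself gives r = 1, and comparing it with its negative gives r = -1. A brightness or contrast change of the form a·x + b with a > 0 leaves r at 1.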
4.2.2 Point-Biserial
The point-biserial correlation coefficient, referred to as rpb, is a special case of
Pearson in which one variable is quantitative and the other variable is dichotomous
and nominal. The calculations simplify since typically the values 1 (presence) and 0
(absence) are used for the dichotomous variable. This simplification is sometimes
expressed as follows:
$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\,\sqrt{pq}}{\sigma_y}$$
where $\bar{Y}_0$ and $\bar{Y}_1$ are the Y score means for data pairs with an x
score of 0 and 1, respectively, q = 1 - p and p are the proportions of data pairs
with x scores of 0 and 1, respectively, and $\sigma_y$ is the population standard
deviation for the y data. An example usage might be to determine whether one gender
accomplished some task significantly better than the other gender.
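The point-biserial formula can be checked numerically with a short sketch. The data below are made up for illustration, and the function name `point_biserial` is hypothetical. Because the population standard deviation is used, the result coincides with Pearson's r computed on the same pairs.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(x, y):
    """Point-biserial correlation for dichotomous x (0/1) and quantitative y."""
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    if not y0 or not y1:
        raise ValueError("both x = 0 and x = 1 must occur")
    p = len(y1) / len(y)  # proportion of pairs with x = 1
    q = 1 - p             # proportion of pairs with x = 0
    return (mean(y1) - mean(y0)) * sqrt(p * q) / pstdev(y)

# Hypothetical data: three pairs with x = 0 and three with x = 1.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 5, 6]
r_pb = point_biserial(x, y)
```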
4.2.3 Phi Coefficient
If both variables are instead nominal and dichotomous, the Pearson coefficient
simplifies even further. First, we need to introduce contingency tables. A contingency
table is a two-dimensional table containing frequencies by category. For this
situation it will be two by two, since each variable can take on only two values, but
each dimension will exceed two when the associated variable is not dichotomous. In
addition, column and row headings and totals are frequently appended, so that the
contingency table ends up being n + 2 by m + 2, where n and m are the number of values
each variable can take on.
4.2.4 Biserial Correlation Coefficient
Another measure of association, the biserial correlation coefficient, termed rb, is
similar to the point-biserial, but pits quantitative data against ordinal data that
has an underlying continuity but is measured discretely as two values (dichotomous).
An example might be test performance vs. anxiety, where anxiety is designated as
either high or low. Presumably, anxiety can take on any value in between, perhaps
beyond, but it may be difficult to measure. We further assume that anxiety is
normally distributed. The formula is very similar to the point-biserial, yet
different:
$$r_b = \frac{(\bar{Y}_1 - \bar{Y}_0)\,(pq/Y)}{\sigma_y}$$
where $\bar{Y}_0$ and $\bar{Y}_1$ are the Y score means for data pairs with an x
score of 0 and 1, respectively, q = 1 - p and p are the proportions of data pairs
with x scores of 0 and 1, respectively, $\sigma_y$ is the population standard
deviation for the y data, and Y is the height of the standardized normal distribution
at the point z, where P(z'<z)=q and P(z'>z)=p. Since the ratio of the two formulas,
√(pq)/Y, is always greater than 1, the biserial is always greater in magnitude than
the point-biserial.
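A small sketch of the biserial formula follows, using `statistics.NormalDist` from the Python standard library for the normal ordinate Y. The data and the function name `biserial` are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def biserial(x, y):
    """Biserial correlation for dichotomised x (0/1) and quantitative y."""
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    if not y0 or not y1:
        raise ValueError("both x = 0 and x = 1 must occur")
    p = len(y1) / len(y)  # proportion of pairs with x = 1
    q = 1 - p             # proportion of pairs with x = 0
    std_normal = NormalDist()
    z = std_normal.inv_cdf(q)    # point z with P(z' < z) = q
    height = std_normal.pdf(z)   # Y: height of the standard normal at z
    return (mean(y1) - mean(y0)) * (p * q / height) / pstdev(y)

# Hypothetical data: low (0) and high (1) groups of four pairs each.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 3, 2, 5, 4, 6, 3, 7]
r_b = biserial(x, y)
```

For the same data the point-biserial value is smaller by the factor √(pq)/Y, which is about 1.25 when p = q = 0.5.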
4.2.5 Tetrachoric Correlation Coefficient
The tetrachoric correlation coefficient, rtet, is used when both variables are
dichotomous, like the phi, but we also need to be able to assume that both variables
really are continuous and normally distributed. Thus it is applied to ordinal vs.
ordinal data which has this characteristic. Ranks are discrete, so in this manner it
differs from the Spearman. The formula involves the trigonometric cosine function.
The cosine function, in its simplest form, is the ratio of two side lengths in a
right triangle, specifically, the length of the side adjacent to the reference angle
divided by the length of the hypotenuse. The formula is:
$$r_{tet} = \cos\left(\frac{180^\circ}{1 + \sqrt{BC/AD}}\right)$$
where A, B, C and D are the four cell frequencies of the two-by-two contingency
table.
4.2.6 Rank-Biserial Correlation Coefficient
The rank-biserial correlation coefficient, rrb, is used for dichotomous nominal
data vs. rankings (ordinal). The formula is usually expressed as
$$r_{rb} = \frac{2\,(\bar{Y}_1 - \bar{Y}_0)}{n}$$
where n is the number of data pairs, and $\bar{Y}_0$ and $\bar{Y}_1$, again, are the
Y score means for data pairs with an x score of 0 and 1, respectively. These Y scores
are ranks. This formula assumes no tied ranks are present. This may be the same as
the Somers' D statistic, for which an online calculator is available.
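The rank-biserial formula is simple enough to verify by hand. The sketch below uses made-up ranks and a hypothetical function name `rank_biserial`.

```python
from statistics import mean

def rank_biserial(x, ranks):
    """Rank-biserial correlation for dichotomous x (0/1) and untied ranks."""
    y0 = [r for xi, r in zip(x, ranks) if xi == 0]
    y1 = [r for xi, r in zip(x, ranks) if xi == 1]
    if not y0 or not y1:
        raise ValueError("both x = 0 and x = 1 must occur")
    return 2 * (mean(y1) - mean(y0)) / len(ranks)

# Hypothetical data: the x = 1 group holds all of the higher ranks.
x = [0, 0, 0, 1, 1, 1]
ranks = [1, 2, 3, 4, 5, 6]
r_rb = rank_biserial(x, ranks)  # 2 * (5 - 2) / 6
```

When the two groups are perfectly separated by rank, as here, the coefficient reaches 1; swapping the group labels gives -1.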
4.3 FORMULATION
For computing the correlation measure, we divide frames into Ts sections of size
m×m. The correlation values of the sections are then averaged. The correlation is
measured for the three colour channels red, green and blue. The correlation
difference CDi,j,s,c of a colour channel ‘c’ between two corresponding sections ‘s’
of frames i and j is defined as
Where s=1………Ts; c=red, green, blue; fi,c=mean value of channel c for the
frame i; fj,c=mean value of channel c for the frame j.
The correlations of all sections of frames i and j are averaged to obtain the overall
correlation CDi,j,c for a colour channel.
Then, the overall correlation difference measure CDi, j between frames i and j is
obtained by averaging the value of each colour channel.
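The sectioning and averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the per-section measure is taken to be 1 − r, where r is the Pearson correlation of the two sections, so that identical frames give 0 (this mapping is an assumption of the sketch, not a formula taken from the text); flat sections, for which r is undefined, are handled by a fallback chosen here for convenience; and the section size defaults to 8. The function names are hypothetical.

```python
import numpy as np

def section_correlation(a, b):
    """Pearson correlation between two equally sized blocks."""
    da = a - a.mean()
    db = b - b.mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    if denom == 0:
        # Flat block: r is undefined. Assumed fallback for this sketch:
        # 1 for identical blocks, 0 otherwise.
        return 1.0 if np.array_equal(a, b) else 0.0
    return float((da * db).sum() / denom)

def correlation_difference(frame_i, frame_j, m=8):
    """Average of (1 - r) over all m x m sections and the three channels."""
    fi = np.asarray(frame_i, dtype=float)
    fj = np.asarray(frame_j, dtype=float)
    rows, cols = fi.shape[0] // m, fi.shape[1] // m
    per_channel = []
    for c in range(3):  # red, green, blue
        diffs = []
        for r in range(rows):
            for k in range(cols):
                a = fi[r * m:(r + 1) * m, k * m:(k + 1) * m, c]
                b = fj[r * m:(r + 1) * m, k * m:(k + 1) * m, c]
                diffs.append(1.0 - section_correlation(a, b))
        per_channel.append(np.mean(diffs))  # average over sections
    return float(np.mean(per_channel))      # average over channels
```

With this mapping, values above 1 can occur only when sections are anti-correlated, for example when one frame is the negative of the other.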
CHAPTER 5
EDGE ORIENTATION HISTOGRAM
5.1 INTRODUCTION
Edge detection is one of the most commonly used operations in image
analysis, and there are probably more algorithms in the literature for enhancing and
detecting edges than for any other single task. The reason for this is that edges
form the outline of an object. An edge is the boundary between an object and the
background, and indicates the boundary between overlapping objects. This means that
if the edges in an image can be identified accurately, all of the objects can be
located and basic properties such as area, perimeter, and shape can be measured.
Edges define the boundaries between regions in an image, which helps with
segmentation and object recognition. They can show where shadows fall in an image or
any other distinct change in the intensity of an image. Edge detection is a
fundamental operation of low-level image processing, and good edges are necessary for
higher level processing. The problem is that, in general, edge detectors behave very
poorly: the quality of edge detection is highly dependent on lighting conditions, the
presence of objects of similar intensities, the density of edges in the scene, and
noise. The detection of edges is shown in Fig.5.1.
Fig.5.1: Edge detection results
5.2 FUNDAMENTALS OF EDGE DETECTION:
Edge detection refers to the process of identifying and locating sharp
discontinuities in an image. The discontinuities are abrupt changes in pixel intensity
which characterize boundaries of objects in a scene. Classical methods of edge
detection involve convolving the image with an operator (a 2-D filter), which is
constructed to be sensitive to large gradients in the image while returning values of
zero in uniform regions. There are an extremely large number of edge detection
operators available, each designed to be sensitive to certain types of edges. Variables
involved in the selection of an edge detection operator include:
(1) Edge orientation: The geometry of the operator determines a
characteristic direction in which it is most sensitive to edges. Operators
can be optimized to look for horizontal, vertical, or diagonal edges.
(2) Noise environment: Edge detection is difficult in noisy images, since both
the noise and the edges contain high-frequency content. Attempts to
reduce the noise result in blurred and distorted edges. Operators used on
noisy images are typically larger in scope, so they can average enough data
to discount localized noisy pixels. This results in less accurate localization
of the detected edges.
(3) Edge structure: Not all edges involve a step change in intensity. Effects
such as refraction or poor focus can result in objects with boundaries
defined by a gradual change in intensity. The operator needs to be chosen
to be responsive to such a gradual change in those cases. Newer
wavelet-based techniques actually characterize the nature of the transition
for each edge in order to distinguish, for example, edges associated with
hair from edges associated with a face.
5.3 EDGE DETECTION OPERATORS
5.3.1 Prewitt’s operator
The Prewitt operator is similar to the Sobel operator and is used for detecting
vertical and horizontal edges in images. It is used in image processing, particularly
within edge detection algorithms. Technically, it is a discrete differentiation
operator, computing an approximation of the gradient of the image intensity function.
At each point in the image, the result of the Prewitt operator is either the
corresponding gradient vector or the norm of this vector. The Prewitt operator is
based on convolving the image with a small, separable, and integer valued filter in
horizontal and vertical direction and is therefore relatively inexpensive in terms of
computations. On the other hand, the gradient approximation which it produces is
relatively crude, in particular for high frequency variations in the image.
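A minimal NumPy sketch of the Prewitt filters is shown below. The function name `prewitt` and the replicated-border padding are choices of this sketch rather than part of the operator's definition. Each derivative is the sum of three neighbouring pixels on one side minus the sum of three on the other.

```python
import numpy as np

def prewitt(gray):
    """Horizontal and vertical Prewitt responses and the gradient norm."""
    # Pad by one pixel (replicating the border) so the output keeps its size.
    a = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    # Horizontal change: right-hand column minus left-hand column.
    gx = (a[:-2, 2:] + a[1:-1, 2:] + a[2:, 2:]) \
       - (a[:-2, :-2] + a[1:-1, :-2] + a[2:, :-2])
    # Vertical change: row below minus row above (y increases downwards).
    gy = (a[2:, :-2] + a[2:, 1:-1] + a[2:, 2:]) \
       - (a[:-2, :-2] + a[:-2, 1:-1] + a[:-2, 2:])
    return gx, gy, np.hypot(gx, gy)
```

On an image containing a single vertical step edge, only `gx` responds, and it does so in the two columns adjacent to the step.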
5.3.2 Canny Operator
Another approach to edge detection using colour information is simply to
extend a traditional intensity based edge detector into the colour space. This method
seeks to take advantage of the known strengths of the traditional edge detector and
tries to overcome its weaknesses by providing more information in the form of three
colour channels rather than a single intensity channel. As the Canny edge detector is
the current standard for intensity based edge detection, it seemed logical to use this
operator as the basis for colour edge detection.
The algorithm runs in 5 separate steps:
1. Smoothing: Blurring of the image to remove noise.
2. Finding gradients: The edges should be marked where the gradients of the image
have large magnitudes.
3. Non-maximum suppression: Only local maxima should be marked as edges.
4. Double thresholding: Potential edges are determined by thresholding.
5. Edge tracking by hysteresis: Final edges are determined by suppressing all edges
that are not connected to a very certain (strong) edge.
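The last two steps can be sketched in isolation. The code below assumes that a gradient magnitude image is already available (steps 1 to 3 are omitted) and uses 8-connectivity; the function name `hysteresis` and the threshold values in the example are hypothetical.

```python
import numpy as np

def hysteresis(magnitude, low, high):
    """Double thresholding followed by edge tracking by hysteresis."""
    mag = np.asarray(magnitude, dtype=float)
    strong = mag >= high            # certain edges
    weak = (mag >= low) & ~strong   # candidate edges
    edges = strong.copy()
    rows, cols = edges.shape
    while True:
        padded = np.pad(edges, 1, mode="constant", constant_values=False)
        # near[r, c] is True if any of the 8 neighbours is already an edge.
        near = np.zeros_like(edges)
        for dr in (0, 1, 2):
            for dc in (0, 1, 2):
                if (dr, dc) != (1, 1):
                    near |= padded[dr:dr + rows, dc:dc + cols]
        grown = edges | (weak & near)
        if np.array_equal(grown, edges):
            return edges            # no weak pixel was added: finished
        edges = grown

# Hypothetical magnitudes: one strong pixel, a connected weak chain,
# and one isolated weak pixel that should be suppressed.
mag = np.array([[0, 0, 0, 0, 0],
                [9, 5, 5, 0, 5],
                [0, 0, 0, 0, 0]])
result = hysteresis(mag, low=4, high=8)
```

The two weak pixels next to the strong one are kept because they connect to it, while the isolated weak pixel at the end of the row is removed.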
5.3.3 Sobel operator
The Sobel operator is used in image processing, particularly within edge
detection algorithms. Technically, it is a discrete differentiation operator, computing
an approximation of the gradient of the image intensity function. At each point in the
image, the result of the Sobel operator is either the corresponding gradient vector or
the norm of this vector. The Sobel operator is based on convolving the image with a
small, separable, and integer valued filter in horizontal and vertical direction and
is therefore relatively inexpensive in terms of computations. On the other hand, the
gradient approximation that it produces is relatively crude, in particular for high
frequency variations in the image.
Mathematically, the operator uses two 3×3 kernels which are convolved with the
original image to calculate approximations of the derivatives - one for horizontal
changes, and one for vertical. If we define A as the source image, and Gx and Gy are
two images which at each point contain the horizontal and vertical derivative
approximations, the computations are as follows:
$$G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A$$
where * here denotes the 2-dimensional convolution operation.
The x-coordinate is here defined as increasing in the "right"-direction, and the
y-coordinate is defined as increasing in the "down"-direction. At each point in the
image, the resulting gradient approximations can be combined to give the gradient
magnitude, using
$$G = \sqrt{G_x^2 + G_y^2}$$
Using this information, we can also calculate the gradient's direction:
$$\phi = \arctan\left(\frac{G_y}{G_x}\right)$$
Fig 5.2(b) shows the application of the Sobel operator to the original image shown in
Fig.5.2(a).
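A minimal NumPy sketch of the Sobel computation is given below. The function name `sobel` and the replicated-border padding are assumptions of this sketch. The sign convention is chosen so that the responses are positive where intensity increases to the right and downwards, and `arctan2` is used for the direction so that a zero horizontal derivative does not cause a division by zero.

```python
import numpy as np

def sobel(gray):
    """Sobel derivatives, gradient magnitude and direction (radians)."""
    a = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    # Horizontal derivative: weighted right column minus weighted left column.
    gx = (a[:-2, 2:] + 2 * a[1:-1, 2:] + a[2:, 2:]) \
       - (a[:-2, :-2] + 2 * a[1:-1, :-2] + a[2:, :-2])
    # Vertical derivative: weighted row below minus weighted row above.
    gy = (a[2:, :-2] + 2 * a[2:, 1:-1] + a[2:, 2:]) \
       - (a[:-2, :-2] + 2 * a[:-2, 1:-1] + a[:-2, 2:])
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    direction = np.arctan2(gy, gx)
    return gx, gy, magnitude, direction
```

For an image whose lower half is brighter than its upper half, only `gy` responds along the boundary, and the direction there points straight down (π/2 in this coordinate system).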
Fig. 5.2(a): Colour picture of a steam engine
Fig. 5.2(b): sobel operator applied to that image
5.4 FORMULATION
The purpose of edge detection in general is to significantly reduce the amount
of data in an image, while preserving the structural properties to be used for further
image processing. Edges are robust to illumination changes. The edges are first
computed using horizontal and vertical Sobel operators, whose outputs are then used to
find the gradient magnitude and angle of the edges. The angles are then used to build
a histogram of edge orientation. For simplicity, we defined only 72 bins for the
angles. As in the case of colour histograms, we compare histograms of corresponding
sections of the two frames. The edge histogram difference “ED” between two frames i
and j is calculated by taking the average of the difference measure over all
sections. The formula for calculating ED is
$$ED_{i,j} = \frac{1}{T_s}\sum_{s=1}^{T_s} ED_{i,j,s}$$
where EDi,j,s is the difference measure for section s.
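The construction of an edge orientation histogram for a single gray-scale image (or section) can be sketched as follows. Several details are assumptions of this sketch: angles are taken over the full 0° to 360° range with `arctan2`, so that 72 bins correspond to 5° each; pixels with a gradient magnitude at or below a hypothetical cut-off `min_magnitude` are ignored; and borders are handled by replication. The function name is hypothetical, and sectioning and averaging would follow the same pattern as for the colour histogram.

```python
import numpy as np

def edge_orientation_histogram(gray, n_bins=72, min_magnitude=3.0):
    """Histogram of Sobel gradient directions for sufficiently strong edges."""
    a = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    gx = (a[:-2, 2:] + 2 * a[1:-1, 2:] + a[2:, 2:]) \
       - (a[:-2, :-2] + 2 * a[1:-1, :-2] + a[2:, :-2])
    gy = (a[2:, :-2] + 2 * a[2:, 1:-1] + a[2:, 2:]) \
       - (a[:-2, :-2] + 2 * a[:-2, 1:-1] + a[:-2, 2:])
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0  # map to [0, 360)
    keep = magnitude > min_magnitude                # drop weak gradients
    hist, _ = np.histogram(angle[keep], bins=n_bins, range=(0.0, 360.0))
    return hist
```

An image with a single vertical step edge puts all of its edge pixels into the first bin, because every retained gradient points in the 0° direction.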
CHAPTER 6
RESULTS
6.1 FLOW CHART FOR THE EXTRACTION OF KEY FRAMES
[Flow chart: Start → video → frames from input video → loop over frames (n = 0; n = n + 1; 0 < n < total number of frames from the video) → first frame? → key frame database (k = 0; k = k + 1; 1 < k < n) and current frame → correlation difference (CD), colour histogram difference (HD), edge orientation histogram difference (ED) → threshold test → discard frame, or key frame = current frame → last frame? → Stop]
Fig 6.1: Flow chart for the extraction of key frames
6.2 ALGORITHM FOR EXTRACTING KEY FRAMES BASED ON
CORRELATION
The key frame extraction method is composed of the following steps
Step1: All the frames are extracted from the input sports video.
Step2: Consider first frame as a key frame.
Step3: Select the next frame from the extracted frames and divide it into a total of
‘Ts’ sections, each of size m×m (8×8).
Step4: Histogram Creation
Step4.1: Correlation Histogram Creation: The correlation values of the
sections are then averaged. The correlation is measured for the three colour channels
red, green and blue.
Step4.2: The correlation difference CDi,j,s,c of a colour channel ‘c’ between
two corresponding sections ‘s’ of frames i and j is defined as:
Where s = 1…Ts; c = red, green, blue; f = mean value of channel c of the frame.
Step4.3: The correlations of all sections of frames i and j are averaged to
obtain the overall correlation CDi,j,c for a colour channel.
Step4.4: Then, the overall correlation difference measure CDi,j between
frames i and j is obtained by averaging the value of each colour channel.
Step4.5: CDi,j is compared with the threshold value to detect key frames. The
frames with a higher CDi,j than the threshold are treated as key frames.
Step5: To detect key frames based on correlation difference measure in entire video
repeat step3 & step4.
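The steps above can be sketched as a generic thresholding loop. The sketch assumes that each new frame is compared with the most recently stored key frame, which is one possible reading of the flow chart, and it takes the difference measure as a function argument. The names `extract_key_frames`, `difference` and `select_below` are hypothetical.

```python
def extract_key_frames(frames, difference, threshold, select_below=False):
    """Return the indices of key frames chosen by a threshold test.

    difference(key_frame, frame) is any frame difference measure.
    With select_below=False a frame becomes a key frame when the measure
    is above the threshold; with select_below=True when it is below.
    """
    if not frames:
        return []
    key_indices = [0]  # the first frame is always a key frame
    for n in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        value = difference(last_key, frames[n])
        is_key = value < threshold if select_below else value > threshold
        if is_key:
            key_indices.append(n)
    return key_indices

# Toy example: "frames" are plain numbers and the measure is their distance.
toy_frames = [0, 1, 2, 10, 11, 30]
keys = extract_key_frames(toy_frames, lambda a, b: abs(a - b), threshold=5)
```

In the toy example the loop returns the indices 0, 3 and 5. For a measure where lower values mark a key frame, `select_below=True` flips the comparison.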
6.3 FLOW CHART FOR CORRELATION
[Flow chart: current frame and key frame from the database → division of each frame into Ts sections of size (m×m) → correlation differences of corresponding sections of the current frame and previous frame (C1, C2 ... Cs) are calculated → mean of correlation difference values → CD]
Fig 6.2: Flow chart for correlation difference
6.4 ALGORITHM FOR EXTRACTING KEY FRAMES BASED ON
COLOUR DIFFERENCE MEASURE
The key frame extraction method is composed of the following steps
Step1: All the frames are extracted from the input sports video.
Step2: Consider first frame as a key frame.
Step3: Select the next frame from the extracted frames, convert it from RGB to HSV
colour space, and then divide it into a total of ‘Ts’ sections, each of size
m×m (8×8).
Step4: Histogram Creation
Step4.1: Colour Histogram Creation: A three-dimensional colour histogram is
built by subdividing the HSV colour space into 8:2:4 bins, giving 64 bins in total.
Step4.2: The histogram difference HDi,j,s between two corresponding sections
‘s’ of histogram His of frame i and histogram Hjs of frame j is calculated by using the
formula
Step4.3: The histogram difference “HD” between two frames i and j is then
calculated by taking the average of the difference measure over all sections by
the formula
$$HD_{i,j} = \frac{1}{T_s}\sum_{s=1}^{T_s} HD_{i,j,s}$$
Step4.4: HDi,j is compared with the threshold value to detect key frames. The
frames with a lower HDi,j than the threshold are treated as key frames.
Step5: To detect key frames based on the colour difference measure in the entire
video, repeat Step3 and Step4.
6.5 FLOW CHART FOR COLOUR HISTOGRAM
[Flow chart: current frame and key frame from the database → conversion of RGB to HSV → division of each frame into Ts sections of size (m×m) → colour histogram differences of corresponding sections of the current frame and previous frame (ch1, ch2 ... chs) → mean of colour difference values → HD]
Fig 6.3: Flow chart of colour histogram difference
6.6 ALGORITHM FOR EXTRACTING KEY FRAMES BASED ON EDGE DIFFERENCE MEASURE
The key frame extraction method is composed of the following steps
Step1: All the frames are extracted from the input sports video.
Step2: Consider first frame as a key frame.
Step3: Select the next frame from the extracted frames, convert it from RGB to a
gray image, and then divide it into a total of ‘Ts’ sections, each of size m×m (8×8).
Step4: Histogram Creation
Step4.1: Edge Histogram Creation: The edges are first computed using
horizontal and vertical Sobel operators, whose outputs are then used to find the
gradient magnitude and angle of the edges. The gradient’s magnitude is given by
$$G = \sqrt{G_x^2 + G_y^2}$$
The gradient’s direction is given by
$$\phi = \arctan\left(\frac{G_y}{G_x}\right)$$
Step4.2: The angles are computed only for those pixels where the value of the
gradient is above a certain threshold (>3). The angles are then used to build a
histogram of edge orientation. We defined only 82 bins for the angles.
Step4.3: We compare histograms of corresponding sections of the two frames.
The edge histogram difference “ED” between two frames i and j is calculated by
taking the average of the difference measure over all sections.
Step4.4: EDi,j is compared with the threshold value to detect key frames. The
frames with a higher EDi,j than the threshold are treated as key frames.
Step5: To detect key frames based on the edge difference measure in the entire
video, repeat Step3 and Step4.
6.7 FLOW CHART FOR EDGE ORIENTATION HISTOGRAM
[Flow chart: current frame and key frame from the database → RGB to gray conversion → division of each frame into Ts sections of size (m×m) → calculate gradients (Gx & Gy) → evaluate gradient magnitude for all sections → gradient magnitude < 3? (True: eliminate edge; False: evaluate gradient direction ø = arc tan (Gy/Gx)) → edge orientation histogram differences of corresponding sections of the current frame and previous frame (e1, e2 ... es) → mean of edge orientation difference values → ED]
Fig 6.4: Flow chart for edge orientation histogram difference
6.8 COLOUR HISTOGRAM OUTPUT
Fig 6.5 : Reading the frames from the input video
Figure 6.5 shows the frames being read from the video and each frame being compared with the previous frame to find the key frames.
Fig 6.6: Frames extracted from the (sample) football video
Department. of ECE, MRITS 40
Key Frame Extraction On MPEG by using Threshold Algorithm
Fig 6.7: colour histogram difference values for the sample (football) video
Figure 6.7 shows the colour histogram difference values between the current frame and the previous frame. A total of 19 colour histogram difference values are generated from the 20 frames in the football video. The range of colour histogram difference values is -64 to 0. The absolute value of each colour histogram difference is compared with the threshold value to extract key frames based on colour histogram. Frames with a colour histogram difference value greater than the threshold are discarded.
Fig 6.8: output graph of colour histogram
The above Fig 6.8 shows the graph between frames and colour difference value.
Fig 6.9 (a): key frames based on colour histogram for the sample (football) video with the threshold value as 35.
The above figure shows the number of key frames extracted based on colour histogram technique with the threshold value as 35. Total 8 frames are obtained with this threshold value.
Fig 6.9 (b): set of key frames based on colour histogram for the sample (football) video with the threshold value as 35.
With 35 as the threshold value we obtained 8 frames as key frames based on colour histogram.
Fig 6.10 (a): key frames based on colour histogram for the sample (football) video with the threshold value as 45.
The above figure shows the number of key frames extracted based on colour histogram technique with the threshold value as 45. Total 12 frames are obtained with this threshold value.
Fig 6.10(b): set of key frames based on colour histogram for the sample (football) video with the threshold value as 45.
With 45 as the threshold value we obtained 12 frames as key frames based on colour histogram.
Fig 6.11 (a): key frames based on colour histogram for the sample (football) video with the threshold value as 55.
The above figure shows the number of key frames extracted based on colour histogram technique with the threshold value as 55. Total 13 frames are obtained with this threshold value.
Fig 6.11(b): set of key frames based on colour histogram for the sample (football) video with the threshold value as 55.
With 55 as the threshold value we obtained 13 frames as key frames based on colour histogram.
6.9 CORRELATION OUTPUT
Fig 6.12: Correlation difference values for the sample (football) video
Fig 6.12 shows the correlation difference values between the current frame and the previous frame. A total of 19 correlation difference values are generated from the 20 frames in the football video. The range of correlation difference values is 0 to 1. The absolute value of each correlation difference is compared with the threshold value to extract key frames based on correlation. Frames with a correlation difference value less than the threshold are discarded.
Fig 6.13: output graph of correlation
The above figure shows the graph between frames and correlation difference value.
Fig 6.14 (a): key frames based on correlation for the sample (football) video with the threshold value as 0.4.
The above figure shows the number of key frames extracted based on correlation technique with the threshold value as 0.4. Total 4 frames are obtained with this threshold value.
Fig 6.14(b): set of key frames based on correlation for the sample (football) video with the threshold value as 0.4.
With 0.4 as the threshold value we obtained 4 frames as key frames based on correlation.
Fig 6.15 (a): key frames based on correlation for the sample (football) video with the threshold value as 0.6.
The above figure shows the number of key frames extracted based on correlation technique with the threshold value as 0.6. Total 2 frames are obtained with this threshold value.
Fig 6.15(b): set of key frames based on correlation for the sample (football) video with the threshold value as 0.6.
With 0.6 as the threshold value we obtained 2 frames as key frames based on correlation.
Fig 6.16 (a): key frames based on correlation for the sample (football) video with the threshold value as 0.8.
The above figure shows the number of key frames extracted based on correlation technique with the threshold value as 0.8. Only one frame is obtained as a key frame.
Fig 6.16(b): set of key frames based on correlation for the sample (football) video with the threshold value as 0.8.
With 0.8 as the threshold value we obtained 1 frame as the key frame based on correlation.
6.10 EDGE ORIENTATION HISTOGRAM OUTPUT
Fig 6.17: edge orientation histogram difference values for the sample (football) video
Figure 6.17 shows the edge orientation histogram difference values between the current frame and the previous frame. A total of 19 edge orientation histogram difference values are generated from the 20 frames in the football video. The range of edge orientation histogram difference values is 0 to 82. The absolute value of each edge orientation histogram difference is compared with the threshold value to extract key frames based on edge orientation histogram. Frames with an edge orientation difference value less than the threshold are discarded.
Fig 6.18: output graph of edge orientation histogram
The above figure shows the graph between frames and edge orientation difference value.
Fig 6.19 (a): key frames based on edge orientation histogram for the sample (football) video with the threshold value as 40.
The above figure shows the number of key frames extracted based on edge orientation histogram technique with the threshold value as 40. Total 11 frames are obtained with this threshold value.
Fig 6.19(b): set of key frames based on edge orientation histogram for the sample (football) video with the threshold value as 40.
With 40 as the threshold value we obtained 11 frames as key frames based on edge orientation histogram.
Fig 6.20 (a): key frames based on edge orientation histogram for the sample (football) video with the threshold value as 50.
The above figure shows the number of key frames extracted based on edge orientation histogram technique with the threshold value as 50. Total 6 frames are obtained with this threshold value.
Fig 6.20(b): set of key frames based on edge orientation histogram for the sample (football) video with the threshold value as 50.
With 50 as the threshold value we obtained 6 frames as key frames based on edge orientation histogram.
Fig 6.21 (a): key frames based on edge orientation histogram for the sample (football) video with the threshold value as 60.
The above figure shows the number of key frames extracted based on edge orientation histogram technique with the threshold value as 60. Total 4 frames are obtained with this threshold value.
Fig 6.21(b): set of key frames based on edge orientation histogram for the sample (football) video with the threshold value as 60.
With 60 as the threshold value we obtained 4 frames as key frames based on edge orientation histogram.
6.11 OUTPUT
For different sports videos, the number of key frames obtained at different threshold values with the colour histogram, correlation and edge orientation histogram techniques is shown below.
COLOR HISTOGRAM
Type of video       Total no. of frames   Number of key frames for the threshold value
                                          35       45       55
Sample (football)   20                    8        12       13
Cricket             455                   19       84       231
Football            121                   1        1        1
Hockey              476                   1        17       260
Table 6.1: Colour histogram key frames for different threshold values on different videos.
CORRELATION
Type of video       Total no. of frames   Number of key frames for the threshold value
                                          0.4      0.6      0.8
Sample (football)   20                    4        2        1
Cricket             455                   192      72       2
Football            121                   89       27       1
Hockey              476                   292      101      1
Table 6.2: Correlation key frames for different threshold values on different videos.
EDGE ORIENTATION HISTOGRAM
Type of video       Total no. of frames   Number of key frames for the threshold value
                                          40       50       60
Sample (football)   20                    11       6        4
Cricket             455                   106      28       3
Football            121                   22       2        1
Hockey              476                   49       6        1
Table 6.3: Edge orientation histogram key frames for different threshold values on different videos.
                     Colour histogram    Correlation    Edge orientation histogram
Exactly matched      64                  0              0
Partially matched    35                  0.6            50
Mismatch             0                   1              82
Table 6.4: Frame difference measures
Table 6.4 above gives the behaviour of the different frame difference measures.
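The threshold sweep behind the tables above can be illustrated with a minimal sketch. It assumes that each frame is already reduced to a feature vector (for example a histogram), that a frame becomes a key frame when its difference from the last selected key frame exceeds the threshold, and that the first frame is always kept. The function names and the L1 difference are illustrative assumptions and not the exact comparison rule used for each measure in this project.

```python
from typing import Callable, Dict, List, Sequence

Feature = Sequence[float]


def l1_difference(a: Feature, b: Feature) -> float:
    """Sum of absolute bin-wise differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def extract_key_frames(
    features: Sequence[Feature],
    threshold: float,
    difference: Callable[[Feature, Feature], float] = l1_difference,
) -> List[int]:
    """Return indices of frames whose difference from the last key frame exceeds the threshold."""
    if not features:
        return []
    key_frames = [0]  # assumption: the first frame is always a key frame
    for index in range(1, len(features)):
        last_key = features[key_frames[-1]]
        if difference(features[index], last_key) > threshold:
            key_frames.append(index)
    return key_frames


def count_by_threshold(
    features: Sequence[Feature], thresholds: Sequence[float]
) -> Dict[float, int]:
    """Number of key frames obtained for each threshold value."""
    return {t: len(extract_key_frames(features, t)) for t in thresholds}


if __name__ == "__main__":
    # Toy two-bin "histograms" for five frames (hypothetical data).
    toy = [[10, 0], [10, 0], [0, 10], [0, 10], [5, 5]]
    print(count_by_threshold(toy, [5, 15, 25]))
```

Under this rule a larger threshold can only keep the same number of key frames or fewer, which matches the trend in Tables 6.2 and 6.3.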
6.12 PERFORMANCE MEASURES
6.12.1 Accuracy rate:
Accuracy rate is defined as the ratio of the number of matched key frames from the automatic summary to the number of key frames from the user summary.
Accuracy rate = Nmas / Nus
6.12.2 Error rate:
Error rate is defined as the ratio of the number of non-matched key frames from the automatic summary to the number of key frames from the user summary.
Error rate = Nnmas / Nus
where Nmas = number of matched key frames from the automatic summary,
Nnmas = number of non-matched key frames from the automatic summary,
Nus = number of key frames from the user summary.
The value of the Accuracy Rate varies from 0 to 1, with 1 being the best value, where
all frames of the automatic summary match all frames of the user summary. The
value of the Error Rate ranges from 0 to Nas/Nus, where 0 is the best value (Nas is the
number of frames in the automatic summary). The quality of a summary is superior if it
has a high Accuracy Rate and a low Error Rate.
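The two measures can be computed with a short sketch such as the one below. It assumes, for illustration only, that a key frame of the automatic summary counts as matched when the same frame index appears in the user summary; in practice matching may be decided by visual similarity. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def accuracy_rate(n_matched: int, n_user: int) -> float:
    """Accuracy rate = Nmas / Nus."""
    if n_user <= 0:
        raise ValueError("the user summary must contain at least one key frame")
    return n_matched / n_user


def error_rate(n_non_matched: int, n_user: int) -> float:
    """Error rate = Nnmas / Nus."""
    if n_user <= 0:
        raise ValueError("the user summary must contain at least one key frame")
    return n_non_matched / n_user


def evaluate_summary(automatic: Iterable[int], user: Iterable[int]) -> Tuple[float, float]:
    """Return (accuracy rate, error rate) for two summaries given as frame indices."""
    automatic_set, user_set = set(automatic), set(user)
    # Assumption: frames match only when their indices are identical.
    n_matched = len(automatic_set & user_set)
    n_non_matched = len(automatic_set - user_set)
    n_user = len(user_set)
    return accuracy_rate(n_matched, n_user), error_rate(n_non_matched, n_user)


if __name__ == "__main__":
    # Hypothetical summaries: 3 of the 4 automatic key frames appear in the user summary.
    print(evaluate_summary([1, 2, 3, 7], [1, 2, 3, 4, 5]))
```

When no automatic key frame matches, the error rate reaches its upper bound Nas/Nus, for example 3/2 for an automatic summary of three frames compared against a user summary of two.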
                 Color histogram    Correlation    Edge orientation histogram
Accuracy rate    0.8                0.7            1.0
Error rate       0.2                0.3            0.0
Table 6.5: Comparison of accuracy and error rates of a sport (football) video
Table 6.5 above shows the accuracy and error rates for a soccer
video. The accuracy rate and error rate of the colour histogram are 0.8 and 0.2 with a
threshold of 35. Similarly, for correlation and edge orientation histogram the accuracy
rate and error rate values are 0.7 and 0.3, and 1 and 0, for thresholds of 0.2 and 40
respectively. The error of 0.2 for the colour histogram measure arises because in most
sports videos the camera is mostly concentrated on the field. In such situations the
colour histogram difference is almost the same for the frames even though the frame
content changes, so errors occur in extracting the key frames. The error in the
correlation measure is due to the pixel-wise comparison. So, the edge orientation
histogram feature works well for extracting key frames from sports videos.
CHAPTER 7
CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION
Our proposed system is able to extract the key frames from most of the
sports videos. The methods used are computationally simple and dynamically
determine the number of key frames. Experiments on other types of videos, such as
cartoons and documentaries, have shown that the method is adaptive to the video
content. The experimental results show that the frame difference feature based on the
edge orientation histogram has a high accuracy rate and a low error rate.
7.2 FUTURE SCOPE
In our project we extracted key frames by using multiple frame
difference features individually. But in general one frame difference feature alone is
not enough to capture all the visual contents of the image. For instance, color
histograms have been a very popular feature for image representation and
computation of key frames. However, key frame methods that use color histograms as
the frame difference measure (FDM) tend to fail in scenes with illumination changes.
By contrast, in a video of a soccer game, where the camera is mostly focused on the
field, edge orientation is an appropriate feature to capture the camera motion.
This means that for a particular genre of videos, different visual features must
be combined with varying weights, giving more weight to the visual feature (or FDM)
which provides more detail about the visual content of the video. Therefore certain
low level features can be combined to get an effective representation of a frame.
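A minimal sketch of such a weighted combination is given below. It assumes that every frame difference measure has first been normalised to the range 0 to 1, with larger values meaning a larger change between frames; the measure names, weights and function name are hypothetical and would need to be tuned per video genre.

```python
from typing import Dict


def combine_differences(differences: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of normalised frame difference measures."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Assumption: every measure named in `weights` is present in `differences`.
    weighted_sum = sum(weights[name] * differences[name] for name in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical normalised differences between two frames of a sports video.
    differences = {"colour": 0.2, "correlation": 0.5, "edge": 0.9}
    # Edge orientation is given twice the weight of the other measures.
    weights = {"colour": 1.0, "correlation": 1.0, "edge": 2.0}
    print(combine_differences(differences, weights))
```

The combined value could then be compared against a single threshold in the same way as an individual measure.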
REFERENCES
[1] Automatic Video Classification: A Survey of the Literature Darin Brezeale and
Diane J. Cook, Senior Member, IEEE, 2007.
[2] Ciocca G, Schettini R (2006).Innovative Algorithm for Key Frame
Extraction in Video Summarization. J. Real Time Image Process, 1(1): 69-88.
[3] “Classification of sports videos using edge based features and auto associative
neural models”, C.Krishna Mohan, B. Yegnanarayana in Signal, image and video
processing.
[4] Combined Key-frame Extraction and Object-based Video Segmentation, Lijie
Liu, Student Member, IEEE, and Guoliang Fan, Member, IEEE.
[5] Gunsel B, Tekalp AM (1998). Content-based video abstraction. Proceedings of
IEEE International Conference of Image Processing, Chicago, USA, 1998, pp. 128–
132.
[6] International Journal of Computer and Electrical Engineering, Vol. 2, No. 2, April
2010, 1793-8163. Integrating Pixel Cluster Indexing, Histogram Intersection And
Discrete Wavelet Transform Methods For Colour Images Content Based Image
Retrieval System.
[7] Jianxin Wu and James M. Rehg, “Beyond the Euclidean distance: creating effective
visual code books using the histogram intersection kernel”.
[8] J Sklansky, “Image Segmentation and Feature Extraction,” IEEE Trans on
Systems, Man and Cybernetics, vol8, pp237-247, 1978.
[9] Jiang RM, Sadka AH, Crooks D (2009). Advances in Video Summarization and
Skimming. In: Grgic M et al. (eds.) Recent Advances in Multimedia Signal
Processing and Communications, Springer, Berlin, 231: 27-50.
[10] Li Y, Zhang T, Tretter D (2001). An overview of video abstraction techniques.
Tech. Rep., HP-2001-191, HP Laboratory.
[11] Lin Mei and Gred “Kernel biased discriminate analysis using histogram
intersection kernel for content based image”.
[12] Money AG, Agius H (2008). Video summarisation: A conceptual framework and
survey of the state of the art. J. Visual Commun. Image Represent. 19(2): 121-143.
[13] Mundur P, Rao Y, Yesha Y (2006). Key frame-based video summarization using
Delaunay clustering. Int. J. Digital Lib., 6(2): 219-232
[14] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
detection”, in CVPR, volume 1, pages 886-893, 2005.
[15] “Pearson's Correlation Coefficient for Discarding Redundant Information in
Real Time Autonomous Navigation System”, A. Miranda Neto, Member, IEEE, L.
Rittner, Member, IEEE, N. Leite, D. E. Zampieri, R. Lotufo and A. Mendeleck.
[16] Tianming L, Zhang HJ, Qi FH (2003). A novel video key-frame extraction
algorithm based on perceived motion energy model. IEEE Trans. Circuits Syst.
Video Technol., 13(10): 1006-1013.
[17] Y.K. Eugene and R.G. Johnston, “The Ineffectiveness of the Correlation
Coefficient for Image Comparisons”, Technical Report LA-UR-96-2474, Los
Alamos, 1996.
[18] Zhang HJ, Wu J, Zhong D, Smoliar SW (1997). An integrated system for
content-based video retrieval and browsing. Pattern Recognit., 30(4): 643–658.