International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 2, March – April (2013), © IAEME
KEY FRAME EXTRACTION METHODOLOGY FOR VIDEO ANNOTATION
Ms. Khushboo Khurana (1), Dr. M. B. Chandak (2)
(1) M.Tech Scholar, CSE Department, SRCOEM, Nagpur, India
(2) Associate Professor and Head, CSE Department, SRCOEM, Nagpur, India
ABSTRACT
Recent advances in technology have made a tremendous amount of multimedia content
available. As the amount of video content keeps increasing, systems that improve access to
video are needed. This can be achieved by video annotation, which facilitates faster access to
videos. The first step towards video annotation is the extraction of key frames: instead of
analysing all the frames of a video, only the frames that contain its important information are
used for further processing. In this paper, a key frame extraction method that assists the video
annotation process is discussed. The key frames are found by computing the edge difference
between consecutive frames; frames whose difference exceeds a threshold are taken as key frames.
KEYWORDS: Key frame extraction, edge difference, video annotation
1. INTRODUCTION
The world as a living space seems to be shrinking. Are we really shrinking, or have we
found a new horizon to live in? We are certainly expanding by leaps and bounds in a world of
gigabytes and terabytes. Recent advances in technology have made tremendous amounts of
multimedia information available to the general population. In the simplest terms, a video is an
agglomeration of data. With the ever-escalating volume of video, systems for processing these
videos need to be developed. Analyzing videos as small data packets, to reduce human effort,
is the need of the hour.
International Journal of Computer Engineering & Technology (IJCET)
ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online)
Volume 4, Issue 2, March – April (2013), pp. 221-228
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com
Video annotation is a promising and essential step for content-based video search and
retrieval. It refers to attaching metadata to a video for faster and easier access. Extracting key
frames from a video and analyzing only these frames, instead of all the frames in the video, can
greatly improve the performance of such systems. Analysis of these key frames can then help in
forming the annotations for the video.
A key frame is a frame that represents the salient content and information of the video.
The extracted key frames must summarize the characteristics of the video, and the visual
characteristics of the video can be tracked through all the key frames in time sequence. A basic
rule of key frame extraction is that it is better to extract some wrong (redundant) key frames
than too few [1].
In this paper, we propose an algorithm for key frame extraction to facilitate the video
annotation process. The algorithm uses the edge difference between two consecutive frames to
measure the difference between their contents. Our approach is shot-based: in a shot-based
method, the shots of the original video are first detected, and then one or more key frames are
extracted from each shot.
Common methods of shot transition detection are pixel-based comparison, template
matching and histogram-based methods [2-3]. Pixel-based methods are sensitive to the motion
of objects, so they are suitable for detecting transitions caused by camera and object movement;
however, because every pixel is compared, they take more time. Template matching, if used
alone, is prone to false detections. Histogram-based methods lose location information entirely:
for example, two images with similar histograms may have completely different content. We
have therefore used an edge-based method, which takes the content of the frames into account.
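The claim that histogram-based comparison discards location information can be illustrated with a toy example. The sketch below assumes two hypothetical 4×4 grayscale images, one bright on its left half and one bright on its right half. Their intensity histograms are identical, so a histogram difference reports no change, while a location-aware pixel-wise difference reports that every pixel changed. The function names are illustrative only and do not come from the paper.

```python
from collections import Counter
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def histogram(img: Image) -> Counter:
    """Count how often each gray level occurs; pixel positions are discarded."""
    return Counter(v for row in img for v in row)


def histogram_difference(a: Image, b: Image) -> int:
    """Sum of absolute bin-by-bin differences between the two histograms."""
    ha, hb = histogram(a), histogram(b)
    return sum(abs(ha[level] - hb[level]) for level in set(ha) | set(hb))


def pixel_difference(a: Image, b: Image) -> int:
    """Sum of absolute differences at each (row, column) position."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


left_bright = [[255, 255, 0, 0] for _ in range(4)]
right_bright = [[0, 0, 255, 255] for _ in range(4)]

print(histogram_difference(left_bright, right_bright))  # 0: histograms match
print(pixel_difference(left_bright, right_bright))      # 4080: all 16 pixels differ by 255
```

An edge-based comparison behaves like the pixel-wise one here, since the vertical edge sits at a different position in the two images.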
The rest of this paper is organized as follows. Section 2 describes the uses of key frame
extraction. Section 3 presents related work in the field of key frame extraction. Section 4
describes the proposed approach with the help of an algorithm and a flowchart. Section 5
presents the results, and Section 6 concludes the paper.
2. USES OF KEY-FRAME EXTRACTION
• Video transmission: To reduce the load on the network and the transmission of invalid
information, techniques for transmitting, storing and managing video information are becoming
increasingly important [1].
When a video is transmitted, the use of key frames reduces the amount of data required for
video indexing and provides a framework for dealing with the video content [4].
In [5], a key-frame-based online video coding scheme for transmission is proposed. Key frames
are fixed in advance, and each frame can use only the most recently coded and reconstructed
key frame as its reference frame. After coding and packetization, the compressed video packets
are transmitted with differentiated service classes. The source sends the key frame along with
difference values, and the destination reconstructs each picture from the key frame and the
difference values.
• Video summarization: A video summary is a compact representation of a video sequence.
It is useful for various video applications, such as video browsing and retrieval systems. A
video summary can be a preview sequence made up of key frames, i.e. a set of chosen frames
of the video. Although key-frame-based video summarization may lose the spatio-temporal
properties and the audio content of the original video sequence, it is the simplest and most
common method. When temporal order is maintained in selecting the key frames, users can
locate specific video segments of interest by choosing a particular key frame using a browsing
tool. Key frames are also effective in representing the visual content of a video sequence for
retrieval purposes: video indexes may be constructed from the visual features of key frames,
and queries may be directed at key frames using image retrieval techniques [6].
• Video annotation: Video annotation is the extraction of information about a video and the
addition of this information to the video, which helps in browsing, searching, analysis, retrieval,
comparison and categorization. Annotation means attaching data to another piece of data (i.e.
adding metadata to data) [7].
Videos are annotated to speed up access to them. Analyzing every video frame for this purpose
is not worthwhile, so key frames are found and only these are analyzed for annotation.
• Video indexing: Key frames reduce the amount of data required in video indexing and
provide a framework for dealing with the video content.
• If key frames are shown beside a video before it is downloaded over the internet, users can
predict the content of the video and decide whether it is pertinent to their search.
• Other applications, such as creating chapter titles for DVDs and making prints from video.
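The idea, described for [5] above, of sending a key frame together with difference values and reconstructing each picture at the destination can be sketched as follows. This is a simplified illustration, not the coding scheme of [5]: it assumes key frames fixed in advance at every `period`-th frame (a hypothetical parameter), represents frames as flat lists of integer intensities, and omits compression, packetization and differentiated service classes.

```python
from typing import List, Tuple

Frame = List[int]            # a frame as a flat list of pixel intensities
Packet = Tuple[str, Frame]   # ("key", pixels) or ("diff", residuals)


def encode(frames: List[Frame], period: int) -> List[Packet]:
    """Send frames 0, period, 2*period, ... whole as key frames; send every
    other frame as its difference from the latest key frame."""
    packets: List[Packet] = []
    key: Frame = []
    for idx, frame in enumerate(frames):
        if idx % period == 0:
            key = frame
            packets.append(("key", list(frame)))
        else:
            packets.append(("diff", [p - q for p, q in zip(frame, key)]))
    return packets


def decode(packets: List[Packet]) -> List[Frame]:
    """Rebuild every frame from the latest received key frame plus its differences."""
    frames: List[Frame] = []
    key: Frame = []
    for kind, payload in packets:
        if kind == "key":
            key = payload
            frames.append(list(payload))
        else:
            frames.append([q + d for q, d in zip(key, payload)])
    return frames


video = [[10, 10, 10], [11, 10, 9], [12, 10, 8], [50, 50, 50], [51, 50, 49]]
packets = encode(video, period=3)
print(packets[1])                # ('diff', [1, 0, -1])
print(decode(packets) == video)  # True
```

Because every non-key frame refers only to the latest key frame, losing a difference packet affects only that one frame, which is one motivation for fixing key frames in advance.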
3. RELATED WORK
Work in the area of key frame extraction is carried out either in the spatial domain or in
the compressed domain. In [8], key frames are extracted using the histogram difference between
two consecutive frames.
Jin-Woo Jeong, Hyun-Ki Hong and Dong-Ho Lee have proposed an approach in which a
video shot and its corresponding key frame are detected based on the visual similarity between
adjacent video frames. They used the Euclidean distance to measure the visual similarity between
video frames, and the first frame of each shot is selected as its key frame [9].
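The shot-and-key-frame scheme attributed to [9] can be sketched as below. This is an illustration under assumptions: each frame is summarized by a hypothetical feature vector, and a fixed, hypothetical distance `threshold` marks a shot boundary; the features and thresholding actually used in [9] are not described here. A new shot starts whenever the Euclidean distance between adjacent frames exceeds the threshold, and the first frame of each shot is taken as its key frame.

```python
import math
from typing import List, Sequence


def first_frame_key_frames(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of key frames: frame 0 plus every frame that starts a
    new shot, i.e. whose Euclidean distance to the previous frame exceeds
    `threshold`."""
    if not features:
        return []
    key_frames = [0]  # the first frame always opens the first shot
    for k in range(1, len(features)):
        if math.dist(features[k - 1], features[k]) > threshold:
            key_frames.append(k)  # shot boundary: first frame of the new shot
    return key_frames


features = [[0, 0], [0, 1], [10, 10], [10, 11], [0, 0]]
print(first_frame_key_frames(features, threshold=5.0))  # [0, 2, 4]
```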
Janko Calic and Ebroul Izquierdo proposed an algorithm for scene change detection
and key frame extraction [10]. It generates frame difference metrics by analyzing statistics of
macro-block features extracted from MPEG video, and uses temporal segmentation to detect
scene changes.
A more elaborate method is employed in [11], which uses shot boundary detection to
segment the video into shots and the k-means algorithm to determine cluster representatives for
each shot, which are then used as key frames. The MPEG-7 Color Layout Descriptor (CLD) is
used as the feature for computing differences between consecutive frames. Because k-means is
run after shot boundary detection, the overall complexity increases.
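The cluster-representative step described for [11] can be illustrated with a plain k-means over the frames of one shot. The sketch assumes that each frame is a hypothetical feature vector standing in for the MPEG-7 CLD, that the number of clusters `k` per shot is fixed, and that centroids are initialized deterministically from evenly spaced frames; none of these choices is taken from [11]. The key frame of each cluster is the member frame closest to the cluster centroid.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kmeans_key_frames(features: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Cluster one shot's frame features with k-means and return, in temporal
    order, the index of the frame closest to each non-empty cluster's centroid."""
    n, dim = len(features), len(features[0])
    # Deterministic initialization: use k evenly spaced frames as centroids.
    centroids = [list(features[i * n // k]) for i in range(k)]
    clusters: List[List[int]] = []
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, f in enumerate(features):
            nearest = min(range(k), key=lambda c: math.dist(f, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(features[i][d] for i in members) / len(members)
                                for d in range(dim)]
    # Representative of each cluster: the member frame nearest its centroid.
    reps = [min(members, key=lambda i: math.dist(features[i], centroids[c]))
            for c, members in enumerate(clusters) if members]
    return sorted(reps)


shot = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(kmeans_key_frames(shot, k=2))  # [0, 3]
```

The extra clustering pass over every shot is what makes this approach costlier than a single thresholding pass.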
4. THE PROPOSED APPROACH
The first step towards video annotation is the extraction of key frames. The key
frames must include the important frames so that the contents of the video can be described in
later processing stages. After the important frames have been extracted, only the key frame
images, rather than all video frames, are analyzed to produce the annotation. The number of
frames should not be reduced to the extent that important information is no longer covered by
the key frames. Because the key frames are analyzed after the extraction process, the extraction
algorithm should not be very complex or time-consuming.
4.1 ALGORITHM FOR KEY FRAME EXTRACTION FROM VIDEO
Not all frames in a video contain important information: each frame is usually a slight
variation of the previous one. Since it is not meaningful to analyze every frame, we find the
frames that contain important information.
To detect key frames, we use the edge difference to measure how much two consecutive
frames differ. Only when the difference exceeds a threshold is one of the two frames (the latter)
considered a key frame. We chose the edge difference because edges depend on the content of
the frame. The key frame extraction algorithm is described in detail as follows:
Input: Video V, consisting of N frames
Output: Key frames of the input video
Algorithm Key Frame Extraction {
Step 1: For each video frame k = 1 to N-1
{
1. Read frames V_k and V_{k+1}
2. Obtain the gray-level images of V_k and V_{k+1}
G_k = gray image of V_k
G_{k+1} = gray image of V_{k+1}
3. Find the edge difference between G_k and G_{k+1} using the Canny edge detector.
Let diff(k) be their difference:
$diff(k) = \sum_{i}\sum_{j} \left( G_k(i,j) - G_{k+1}(i,j) \right)$
where i, j are the row and column indices
}
Step 2: Compute the mean and standard deviation of diff(1), ..., diff(N-1)
Mean, $M = \frac{1}{N-1}\sum_{k=1}^{N-1} diff(k)$
Standard deviation, $S = \sqrt{\frac{1}{N-1}\sum_{k=1}^{N-1} \left( diff(k) - M \right)^2}$
Step 3: Compute the threshold value
Threshold = M + a × S
where a is a constant
Step 4: Find the key frames
for k = 1 to N-1
{
if diff(k) > Threshold
{
Write frame V_{k+1} as an output key frame
}
}
}
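The algorithm above can be sketched in Python as follows. This is a minimal, self-contained illustration under stated assumptions, not the authors' MATLAB implementation. Frames are given as grayscale matrices (lists of rows); a simple thresholded gradient-magnitude edge map stands in for the Canny detector (in practice a library routine such as OpenCV's `cv2.Canny` would be used); and diff(k) is taken as the sum over all rows and columns of the absolute difference of the two edge maps, so that edges appearing and disappearing do not cancel out. The helper names and the gradient threshold `edge_thresh` are hypothetical. The standard deviation is taken in its population form (`statistics.pstdev`); whether the original used this or the sample form is not stated.

```python
import math
import statistics
from typing import List, Sequence

Gray = List[List[float]]  # grayscale frame G_k as rows of intensities


def edge_map(gray: Gray, edge_thresh: float = 50.0) -> List[List[int]]:
    """Binary edge map from central-difference gradients.
    Simplified stand-in for the Canny detector (assumption)."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):  # border pixels are left as non-edges
        for j in range(1, w - 1):
            gx = gray[i][j + 1] - gray[i][j - 1]
            gy = gray[i + 1][j] - gray[i - 1][j]
            if math.hypot(gx, gy) > edge_thresh:
                edges[i][j] = 1
    return edges


def edge_difference(g_k: Gray, g_next: Gray) -> int:
    """diff(k): sum over rows i and columns j of |E_k(i,j) - E_{k+1}(i,j)|."""
    e_k, e_next = edge_map(g_k), edge_map(g_next)
    return sum(abs(x - y) for row_k, row_n in zip(e_k, e_next)
               for x, y in zip(row_k, row_n))


def extract_key_frames(frames: Sequence[Gray], a: float = 2.0) -> List[int]:
    """Return 0-based indices of the latter frame of each consecutive pair
    whose edge difference exceeds M + a*S."""
    # Step 1: edge differences of all N-1 consecutive frame pairs.
    diffs = [edge_difference(frames[k], frames[k + 1])
             for k in range(len(frames) - 1)]
    # Step 2: mean and (population) standard deviation of the differences.
    mean = statistics.mean(diffs)
    std = statistics.pstdev(diffs)
    # Step 3: threshold.
    threshold = mean + a * std
    # Step 4: the latter frame of each pair above the threshold is a key frame.
    return [k + 1 for k, d in enumerate(diffs) if d > threshold]


blank = [[0.0] * 8 for _ in range(8)]
square = [[200.0 if 2 <= i <= 5 and 2 <= j <= 5 else 0.0 for j in range(8)]
          for i in range(8)]
frames = [blank] * 6 + [square] * 4  # the content changes once, at frame 6
print(extract_key_frames(frames))    # [6]
```

Only one pair (frames 5 and 6) has a nonzero edge difference, so it alone exceeds mean + 2 × standard deviation, and the latter frame of that pair is reported.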
Video V, consisting of N frames in total, is given as input. We first read the 1st and 2nd
frames, convert them to grayscale and find their edge difference using the Canny edge detector;
this difference is stored in diff(1). Next, the 2nd and 3rd frames are read, and the edge difference
of their grayscale images is computed and stored in diff(2). Then the 3rd and 4th frames, the 4th and
5th frames, and so on, are considered, until all N frames of the video have been processed. diff(k)
thus contains the differences between all pairs of consecutive frames of the input video V. Fig. 1
demonstrates how the edge differences are computed; as shown there, the last difference is diff(k)
with k = N-1.
The edge difference obtained with the Canny edge detector is a matrix; hence diff(k) is
calculated by summing its values over all rows and columns to obtain a single difference value:
$diff(k) = \sum_{i}\sum_{j} \left( G_k(i,j) - G_{k+1}(i,j) \right)$
where i, j are the row and column indices.
After the frame differences have been computed, their mean and standard deviation are calculated
(see Step 2 of the algorithm). The threshold is then calculated using the formula:
Threshold = mean + a × standard deviation
where a is a constant. After trying various values, we chose a = 2, as it gave the desired results.
Only differences that exceed the threshold are considered, because such a difference indicates a
significant change in content, so the frame may contain important information. If the difference
between two consecutive frames exceeds the threshold, the latter frame is taken as a key frame. All
the key frame images are stored in a folder.
4.2 FLOWCHART FOR KEY FRAME EXTRACTION FROM VIDEO
The flowchart for key frame extraction from a video is shown in Fig. 2.
Fig. 2. Flowchart for key frame extraction
5. RESULTS
The videos mainly from transport domain consisting of videos with airplane, bus, car or bike
are considered for the input to the system. The videos are downloaded from youtube. Audio part of the
video is not considered. Videos with slight moment of the camera and with no or small amount of
background changes were used. We have implemented the algorithm in Matlab R2012a.
The input video containing an airplane had more than 500 frames; some of these frames are
shown in Fig. 3.
The edge difference between consecutive frames was then found: 4138 between the 1st and 2nd
frames, 3352 between the 2nd and 3rd, 4185 between the 3rd and 4th, 3564 between the 4th and 5th, and so
on. After the edge differences between all consecutive frames had been found, the following values
were computed:
Statistic of diff(k)     Value
Maximum                  5734
Minimum                  162
Median                   2725
Mean                     2822.2
Standard deviation       1357.5
Threshold                5537.1
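As a consistency check of the reported values (our own arithmetic, not part of the original results), substituting the mean and standard deviation into Step 3 with a = 2 reproduces the threshold up to rounding of the displayed figures:

```latex
\begin{aligned}
\text{Threshold} &= M + a \times S \\
                 &= 2822.2 + 2 \times 1357.5 \\
                 &= 2822.2 + 2715.0 = 5537.2 \approx 5537.1
\end{aligned}
```

The discrepancy of 0.1 comes from M and S being shown rounded. Since the maximum difference (5734) exceeds the threshold, at least one frame pair is selected, while at least half of the pairs (those at or below the median of 2725) are not.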
Frames whose edge difference exceeds the threshold value are taken as key frames. Fig. 4 shows the
key frames extracted from the input video whose frames are shown in Fig. 3.
Fig. 5 shows the result of key frame extraction, along with the frame numbers, on an input video
containing a car and humans. This video had a still background, with humans moving in it. Analysis of
these key frames can lead to semantic annotation of the videos; the actions or events in them can also
be analyzed.
Fig. 3. Frames of the input video
Fig. 4. Output key frames for airplane video
Fig. 6 shows the result on a video in which the content changes a lot: many cars are moving on the
road, and the result shows that each car is captured by the key frames.
Fig. 5. Output key frames for car video
Fig. 6. Output key frames for video with a larger amount of content change
6. CONCLUSION AND FUTURE WORK
The key frames are extracted according to the content of the video and how much that content
changes. In the first video, few key frames were extracted because its content changed very little. In
the third video example above, the content changes more, i.e. the video carries more information, so
more frames are extracted as key frames.
Because the key frames are processed for annotation, important information must not be missed. Our
algorithm could be improved by further reducing the number of extracted key frames through a second
pass, in which the key frames extracted in the first pass are given again as input to the algorithm.
This would remove redundant frames, i.e. frames with similar content, but the extra pass would increase
the execution time. Since the frames must be analyzed after key frame extraction for annotation anyway,
some amount of redundancy is acceptable in preference to a longer execution time.
In future, we can design a video annotation system that uses the key frames obtained with the
above algorithm.
REFERENCES
[1] G. Liu and J. Zhao, “Key Frame Extraction from MPEG Video Stream”, Proceedings of the Second Symposium International Computer Science and Computational Technology (ISCSCT ’09), China, 26-28 Dec. 2009, pp. 7-11.
[2] C. F. Lam and M. C. Lee, “Video segmentation using color difference histogram”, Lecture Notes in Computer Science, New York: Springer Press, 1998, pp. 159-174.
[3] A. Hampapur, R. Jain and T. Weymouth, “Production model based digital video segmentation”, Multimedia Tools and Applications, vol. 1, no. 1, 1995, pp. 9-46.
[4] T. Liu, H. Zhang and F. Qi, “A novel video key-frame-extraction algorithm based on perceived motion energy model”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 10, 2003, pp. 1006-1013.
[5] Q. Zhang and G. Liu, “A key-frame-based error resilient coding scheme for video transmission over differentiated services networks”, Proceedings of Packet Video 2007, 12-13 Nov. 2007, pp. 85-90.
[6] P. Mundur, Y. Rao and Y. Yesha, “Keyframe-based Video Summarization using Delaunay Clustering”, International Journal on Digital Libraries, vol. 6, no. 2, April 2006, pp. 219-232.
[7] K. Khurana and M. B. Chandak, “Study of Various Video Annotation Techniques”, International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 1, January 2013.
[8] S. Thakare, “Intelligent Processing and Analysis of Image for Shot Boundary Detection”, International Journal of Engineering Research and Applications, vol. 2, no. 2, Mar-Apr 2012, pp. 366-369.
[9] J. Jeong, H. Hong and D. Lee, “Ontology-based Automatic Video Annotation Technique in Smart TV Environment”, IEEE Transactions on Consumer Electronics, vol. 57, no. 4, November 2011.
[10] J. Calic and E. Izquierdo, “Efficient Key-frame Extraction and Video Analysis”, International Symposium on Information Technology, IEEE, April 2002.
[11] D. Borth, A. Ulges, C. Schulze and T. M. Breuel, “Key Frame Extraction for Video Tagging & Summarization”, 2008.
[12] Reeja S R and N. P. Kavya, “Motion Detection for Video Denoising – The State of Art and The Challenges”, International Journal of Computer Engineering & Technology (IJCET), vol. 3, no. 2, 2012, pp. 518-525, ISSN Print: 0976-6367, ISSN Online: 0976-6375.