Key frame extraction methodology for video annotation

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 2, March – April (2013), pp. 221-228, © IAEME

KEY FRAME EXTRACTION METHODOLOGY FOR VIDEO ANNOTATION

Ms. Khushboo Khurana (1), Dr. M. B. Chandak (2)

(1) M.Tech Scholar, CSE Department, SRCOEM, Nagpur, India
(2) Associate Professor and Head, CSE Department, SRCOEM, Nagpur, India

ABSTRACT

Recent advances in technology have made a tremendous amount of multimedia content
available. As the amount of video content grows, systems that improve access to video are
needed. Annotating videos is one way to provide this, since annotations facilitate faster
access. The first step towards video annotation is the extraction of key frames: instead of
analysing every frame in the video, only the frames that carry its important information are
used for further processing. In this paper, a key frame extraction method that assists the
video annotation process is discussed. Key frames are found by computing the edge difference
between consecutive frames; frames whose difference exceeds a threshold are considered key
frames.

KEYWORDS: Key frame extraction, edge difference, video annotation

1. INTRODUCTION

The world as a physical living space may be shrinking, but a new horizon has opened in the
digital world, which is expanding by leaps and bounds into gigabytes and terabytes. Recent
advances in technology have made tremendous amounts of multimedia information available to
the general population. A video, in the simplest terms, is an agglomeration of data. As the
number of videos keeps growing, systems for processing them need to be developed. Breaking
these videos into small units of data that reduce human effort is the need of the hour.


Video annotation is a promising and essential step for content-based video search and
retrieval. It refers to attaching metadata to a video so that the video can be accessed faster
and more easily. Extracting key frames from the video and analyzing only these frames,
instead of all the frames present in the video, can greatly improve the performance of such
systems. Analysis of these key frames can then help in forming the annotations for the video.

A key frame is a frame that represents the salient content and information of the video. The
extracted key frames must summarize the characteristics of the video, so that the image
characteristics of the video can be tracked through the key frames in time sequence. A basic
rule of key frame extraction is that it is better to extract too many key frames than too
few [1].

In this paper, we propose an algorithm for key frame extraction to facilitate the video
annotation process. The algorithm uses the edge difference between two consecutive frames to
measure the difference between their contents. Our approach is shot-based: in a shot-based
method, the shots of the original video are first detected, and then one or more key frames
are extracted from each shot.

Methods of shot transition detection include pixel-based comparison, template matching and
histogram-based methods [2-3]. Pixel-based methods are sensitive to the motion of objects,
which makes them suitable for detecting transitions caused by camera and object movement;
however, because every pixel is compared, they require more time. Template matching is prone
to false detections when it is used on its own. Histogram-based methods discard location
information entirely; for example, two images with similar histograms may have completely
different content. We have therefore used an edge-based method, which takes the content of
the frames into account.
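The limitation of histogram comparison mentioned above can be seen on a toy example. The sketch below is illustrative only: the two 4x4 binary images and the helper names `histogram` and `pixel_difference` are assumptions made for this example and are not part of the proposed method. Both images contain the same number of dark and bright pixels, so their histograms are identical, yet the bright region lies in a different place.

```python
from collections import Counter

# Two hypothetical 4x4 images (0 = dark, 1 = bright), each with eight bright pixels.
left_half = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
top_half = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def histogram(image):
    """Count how many pixels take each intensity value; positions are discarded."""
    return Counter(pixel for row in image for pixel in row)


def pixel_difference(a, b):
    """Count the positions at which two equally sized images differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


same_histogram = histogram(left_half) == histogram(top_half)
differing_pixels = pixel_difference(left_half, top_half)
print(same_histogram, differing_pixels)
```

A histogram comparison would report no change between these two images, while a location-aware comparison finds that half of the pixels differ.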

The rest of this paper is organized as follows. Section 2 describes the uses of key frame
extraction. Section 3 presents the related work in the field of key frame extraction. In
Section 4, the proposed approach is described with the help of an algorithm and a flowchart.
Section 5 presents the results and, finally, Section 6 concludes the paper.

2. USES OF KEY-FRAME EXTRACTION

• Video transmission: To reduce the load on the network and the transmission of invalid
information, techniques for the transmission, storage and management of video information
are becoming more and more important [1].

When a video is being transmitted, the use of key frames reduces the amount of data required
in video indexing and provides a framework for dealing with the video content [4].

In [5], a key-frame-based online coding scheme for video transmission is proposed. Key frames
are fixed in advance, and each frame can only choose the latest coded and reconstructed key
frame as its reference frame. After coding and packetisation, the compressed video packets
are transmitted with differentiated service classes. The key frame is sent from the source
along with difference values; at the destination, the picture is reconstructed from the key
frame picture and the difference values.

• Video summarization: A video summary is a compact representation of a video sequence. It
is useful for various video applications such as video browsing and retrieval systems. A
video summary can be a preview sequence, which can be a collection of key frames, i.e. a set
of frames chosen from the video. Although key-frame-based video summarization may lose the
spatio-temporal properties and audio content of the original video sequence, it is the
simplest and most common method. When temporal order is maintained in selecting the key
frames, users can locate specific video segments of interest by choosing a particular key
frame using a browsing tool. Key frames are also effective in representing the visual content
of a video sequence for retrieval purposes. Video indexes may be constructed based on visual
features of key frames, and queries may be directed at key frames using image retrieval
techniques [6].

• Video annotation: Video annotation is the extraction of information about a video and the
addition of this information to the video, which helps in browsing, searching, analysis,
retrieval, comparison, and categorization. To annotate is to attach data to some other piece
of data (i.e. to add metadata to data) [7].

A video is annotated to speed up access to it. It is not worthwhile to analyze every video
frame for this purpose, so key frames are found and only these are analyzed for annotation.

• Video indexing: Key frames reduce the amount of data required in video indexing and
provide a framework for dealing with the video content.

• Before downloading a video over the internet, if key frames are shown beside it, users can
predict the content of the video and decide whether it is pertinent to their search.

• Other applications, such as creating chapter titles in DVDs and prints from video.
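The key-frame-plus-difference idea described under video transmission can be illustrated with a minimal sketch. It models only the reconstruction step (key frame plus difference values); the coding, packetisation and service classes of the scheme in [5] are not modelled. The function names `encode` and `decode` and the pixel values are hypothetical.

```python
def encode(frame, key_frame):
    """Difference values sent with the key frame: per-pixel (frame - key frame)."""
    return [[p - k for p, k in zip(frame_row, key_row)]
            for frame_row, key_row in zip(frame, key_frame)]


def decode(key_frame, difference):
    """Rebuild the frame at the destination from the key frame and the differences."""
    return [[k + d for k, d in zip(key_row, diff_row)]
            for key_row, diff_row in zip(key_frame, difference)]


# Hypothetical 2x3 gray-level frames; only two pixels change relative to the key frame.
key_frame = [[10, 10, 10], [10, 50, 10]]
frame = [[10, 12, 10], [10, 50, 9]]

difference = encode(frame, key_frame)
reconstructed = decode(key_frame, difference)
print(difference, reconstructed == frame)
```

Because most difference values are zero when a frame resembles its key frame, the difference is cheaper to represent than the full frame, which is the motivation for sending it instead.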

3. RELATED WORK

Work in the area of key frame extraction is carried out either in the spatial domain or in
the compressed domain. In [8], key frames are extracted using the histogram difference
between two consecutive frames.

Jin-Woo Jeong, Hyun-Ki Hong, and Dong-Ho Lee proposed an approach in which a video shot and
its corresponding key frame are detected based on the visual similarity between adjacent
video frames. They used the Euclidean distance to measure the visual similarity between video
frames. The first frame of each shot is selected as the key frame [9].

Janko Calic and Ebroul Izquierdo proposed an algorithm for scene change detection and key
frame extraction [10]. It generates frame difference metrics by analyzing statistics of the
macro-block features extracted from MPEG videos. Temporal segmentation is used to detect
scene changes.

A more elaborate method is proposed in [11]. It uses shot boundary detection to segment the
video into shots and the k-means algorithm to determine cluster representatives for each
shot, which are used as key frames. The MPEG-7 Color Layout Descriptor (CLD) is used as the
feature for computing differences between consecutive frames. Because k-means is applied
after the shot boundaries have been found, the complexity of the method increases.

4. THE PROPOSED APPROACH

The first step towards video annotation is the extraction of key frames. The key frames must
include the important frames so that they can describe the contents of the video in the later
processing stages. After extraction, only the key frame images are analyzed to produce the
annotation, instead of the contents of all video frames. The number of frames should not be
reduced to such an extent that important information is no longer covered by the key frames.
Because the key frames are analyzed further after extraction, the extraction algorithm itself
should not be very complex or time consuming.




4.1 Algorithm for key frame extraction from video

Not all frames in a video contain important information; each frame is typically a slight
variation of the previous frame. It is not meaningful to analyze all the frames, so we find
those frames which contain important information.

To detect key frames, we use the edge difference to measure the difference between two
consecutive frames. Only when the difference exceeds a threshold is one of the two
consecutive frames considered a key frame. We choose the edge difference because edges are
content dependent. The detailed description of key frame extraction from the video is as
follows:

Input: Video V, consisting of N frames
Output: Key frames for input video

Algorithm Key frame Extraction {
    Step 1: For each video frame k = 1 to (N-1)
    {
        1. Read frames V_k and V_{k+1}
        2. Obtain the gray level images for V_k and V_{k+1}
               G_k     = gray image of V_k
               G_{k+1} = gray image of V_{k+1}
        3. Find the edge difference between G_k and G_{k+1} using the Canny edge detector.
           Let diff(k) be their difference:
               diff(k) = \sum_i \sum_j (G_k - G_{k+1})
           where i, j are the row and column indices
    }
    Step 2: Compute the mean and standard deviation of diff(k), k = 1 to (N-1)
               Mean, M = \frac{1}{N-1} \sum_{k=1}^{N-1} diff(k)
               Standard deviation, S = standard deviation of diff(1), ..., diff(N-1)
    Step 3: Compute the threshold value
               Threshold = M + a x S
           where a is a constant
    Step 4: Find the key frames
           for k = 1 to (N-1)
           {
               if diff(k) > Threshold
               {
                   Write frame V_{k+1} as the output key frame
               }
           }
}
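The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: frames are taken to be gray-level images stored as nested lists, `edge_map` is a simple gradient-threshold stand-in and NOT the Canny detector, the edge difference sums absolute values, and the population standard deviation is used because the text does not state which form was applied. All function names and the `edge_threshold` parameter are hypothetical.

```python
from statistics import mean, pstdev


def edge_map(gray, edge_threshold=50):
    """Stand-in edge detector (NOT Canny): a pixel is an edge when the intensity
    step to its right or lower neighbour exceeds edge_threshold."""
    rows, cols = len(gray), len(gray[0])
    edges = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            right = abs(gray[i][j] - gray[i][j + 1]) if j + 1 < cols else 0
            below = abs(gray[i][j] - gray[i + 1][j]) if i + 1 < rows else 0
            edges[i][j] = 1 if max(right, below) > edge_threshold else 0
    return edges


def edge_difference(gray_a, gray_b):
    """diff(k): sum over rows and columns of the edge-map difference.
    Using absolute differences is an assumption of this sketch."""
    edges_a, edges_b = edge_map(gray_a), edge_map(gray_b)
    return sum(abs(a - b) for ra, rb in zip(edges_a, edges_b) for a, b in zip(ra, rb))


def extract_key_frames(frames, a=2.0):
    """Return the 1-based numbers of the key frames: the later frame of every
    consecutive pair whose edge difference exceeds mean + a * standard deviation."""
    if len(frames) < 2:
        return []
    diffs = [edge_difference(frames[k], frames[k + 1]) for k in range(len(frames) - 1)]
    threshold = mean(diffs) + a * pstdev(diffs)
    # List index k corresponds to the pair (frame k+1, frame k+2) in 1-based numbering.
    return [k + 2 for k, d in enumerate(diffs) if d > threshold]


def blank(size=6):
    """Synthetic empty frame."""
    return [[0] * size for _ in range(size)]


def with_square(size=6):
    """Synthetic frame containing a bright 2x2 square."""
    frame = blank(size)
    for i in range(2, 4):
        for j in range(2, 4):
            frame[i][j] = 255
    return frame


# Ten synthetic frames: the square appears at frame 6 and stays until the end.
frames = [blank() for _ in range(5)] + [with_square() for _ in range(5)]
key_frames = extract_key_frames(frames)
print(key_frames)
```

In this synthetic clip only the pair of frames 5 and 6 differs, so frame 6 is the single frame selected.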




Video V is given as the input; it consists of a total of N frames. We first read the 1st and
2nd frames, convert them to gray scale and find their edge difference using the Canny edge
detector. The difference is stored in diff(1). Next, the 2nd and 3rd frames are read and the
edge difference of their gray scale images is computed; this difference is stored in diff(2).
We then consider the 3rd and 4th frames, the 4th and 5th frames, and so on, until all N
frames of the video have been processed. diff(k) thus contains the differences between all
pairs of consecutive frames of the input video V. Fig.1 demonstrates how the edge differences
are computed. As shown in Fig.1, the last difference is diff(k) with k = N-1.

The Canny edge detector gives a matrix for the difference between frames; hence diff(k) is
calculated by summing the values over all rows and columns to obtain a single difference
value:

    diff(k) = \sum_i \sum_j (G_k - G_{k+1})

where i, j are the row and column indices.
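One detail of this summation is worth illustrating. The text does not state whether signed or absolute values are summed. If the edge maps are binary, an edge that moves contributes +1 at its old position and -1 at its new one, so a signed sum can cancel to zero. The sketch below compares the two variants on two hypothetical 3x4 edge maps; the names `signed_sum` and `absolute_sum` are assumptions made for this example.

```python
# Two hypothetical binary edge maps: a vertical edge moves one column to the right.
edges_k = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
edges_k_plus_1 = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]


def signed_sum(a, b):
    """Sum of (a - b) over all rows i and columns j, as the formula is printed."""
    return sum(pa - pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def absolute_sum(a, b):
    """Sum of |a - b| over all rows and columns (an assumed variant)."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


signed = signed_sum(edges_k, edges_k_plus_1)
absolute = absolute_sum(edges_k, edges_k_plus_1)
print(signed, absolute)
```

Here the signed sum is 0 although every edge pixel has moved, while the absolute sum counts all six changed positions. An implementation that needs diff(k) to grow with the amount of edge change would therefore use the absolute (or squared) difference.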

After the frame differences have been obtained, their mean and standard deviation are
calculated (refer to Step 2 of the algorithm). The threshold is then calculated using the
formula:

    Threshold = mean + a x standard deviation

where a is a constant. After trying various values, we used a = 2, as this value gave the
desired results.
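The effect of the constant a can be illustrated on a small set of made-up edge differences. The values in `diffs` are hypothetical and chosen only for illustration, the function name `key_frame_numbers` is an assumption, and the population standard deviation is used because the text does not specify the form.

```python
from statistics import mean, pstdev


def key_frame_numbers(diffs, a):
    """1-based numbers of the later frame of each pair whose difference
    exceeds mean + a * standard deviation (population form assumed)."""
    threshold = mean(diffs) + a * pstdev(diffs)
    return [index + 2 for index, d in enumerate(diffs) if d > threshold]


# Hypothetical edge differences for a 13-frame clip (12 consecutive pairs).
diffs = [10, 12, 11, 13, 10, 12, 11, 60, 12, 10, 11, 35]

selected = {a: key_frame_numbers(diffs, a) for a in (1, 2, 3)}
for a, frames in selected.items():
    print(f"a = {a}: key frames {frames}")
```

A small a keeps both the large change (frame 9) and the moderate one (frame 13), a = 2 keeps only the large change, and a = 3 selects nothing. A larger a thus yields fewer key frames at the risk of missing content changes.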

Only the differences which exceed the threshold are considered. A difference above the
threshold indicates that the contents have changed significantly and that the frame may
contain important information. If the difference between two consecutive frames exceeds the
threshold, the latter frame is considered the key frame. All the key frame images are stored
in a folder.

4.2 Flowchart for key-frame extraction from video

The flowchart for key frame extraction from a video is shown in Fig.2.


Fig.2. Flowchart for key frame extraction




5. RESULTS

The input videos are mainly from the transport domain and show an airplane, bus, car or
bike. The videos were downloaded from YouTube. The audio part of the videos is not
considered. Videos with slight movement of the camera and with no or only small background
changes were used. We implemented the algorithm in MATLAB R2012a.

The input video containing an airplane had more than 500 frames; some of the frames are
shown in Fig.3.

The edge difference between consecutive frames was computed. The edge difference between the
1st and 2nd frames was 4138, between the 2nd and 3rd frames 3352, between the 3rd and 4th
frames 4185, between the 4th and 5th frames 3564, and so on. After finding the edge
differences between all consecutive frames, the following values were computed:

    Statistic             Value
    Max                   5734
    Min                   162
    Median                2725
    Mean                  2.8222e+03
    Standard deviation    1.3575e+03
    Threshold             5.5371e+03
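The reported threshold can be checked against the formula Threshold = mean + a x standard deviation with a = 2. The short calculation below uses only the values printed above; the variable names are chosen for this example.

```python
# Values as reported for the airplane video (rounded as printed in the results).
mean_diff = 2.8222e+03
std_diff = 1.3575e+03
reported_threshold = 5.5371e+03
a = 2  # constant chosen in Section 4.1

threshold = mean_diff + a * std_diff

# First four edge differences quoted in the text, and the reported maximum.
first_diffs = [4138, 3352, 4185, 3564]
max_diff = 5734

exceeds = [d > threshold for d in first_diffs]
print(round(threshold, 1), exceeds, max_diff > threshold)
```

The recomputed value, 2822.2 + 2 x 1357.5 = 5537.2, agrees with the reported 5537.1 up to the rounding of the printed mean and standard deviation. None of the first four differences exceeds the threshold, so frames 2 to 5 are not key frames, while the maximum difference of 5734 does exceed it.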

Frames whose edge difference exceeds the threshold value are considered key frames. Fig.4
shows the key frames extracted from the input video whose frames are shown in Fig.3.

The result of key frame extraction on an input video containing a car and humans is shown,
along with the frame numbers, in Fig.5. This video had a still background with humans moving
in it. Analysis of these key frames can lead to semantic annotation of the videos. The
actions or events can also be analyzed.

Fig.3. Frames of the input video

Fig.4. Output key frames for airplane video




Fig.6 shows the result for a video in which the change in content is high. In this video many
cars are moving on the road. The result shows that each car is captured by the key frames.

Fig.5. Output key frames for car video

Fig.6. Output key frames for a video with a large amount of content change.




6. CONCLUSION AND FUTURE WORK

The key frames are extracted depending upon the contents of the video and the change in those
contents. As seen in the first video, the number of key frames is small because the change of
content in this video was very small. In the third video example above, the change of
content, or the amount of information in the video, is larger, so a larger number of frames
are extracted as key frames.

As the key frames need to be processed for annotation, important information must not be
missed. Our algorithm can be improved by further reducing the number of key frames extracted.
This can be done by adding one more pass: after the first pass, the extracted key frames are
given as input to the algorithm again. This removes redundant frames, i.e. frames with
similar contents, but the additional pass increases the execution time. As the frames still
need to be analyzed after key frame extraction for the purpose of annotation, some redundancy
can be tolerated in preference to a longer execution time.

In future, we can design a video annotation system that utilizes the key frames obtained
from the above algorithm.

REFERENCES

[1] G. Liu and J. Zhao, "Key Frame Extraction from MPEG Video Stream", Proceedings of the Second Symposium International Computer Science and Computational Technology (ISCSCT '09), China, 26-28 Dec. 2009, pp. 007-011.

[2] C. F. Lam and M. C. Lee, "Video segmentation using color difference histogram", Lecture Notes in Computer Science, New York: Springer Press, pp. 159-174, 1998.

[3] A. Hampapur, R. Jain, and T. Weymouth, "Production model based digital video segmentation", Multimedia Tools and Applications, vol. 1, no. 1, pp. 9-46, 1995.

[4] T. Liu, H. Zhang, and F. Qi, "A novel video key-frame-extraction algorithm based on perceived motion energy model", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 10, pp. 1006-1013, 2003.

[5] Q. Zhang and G. Liu, "A key-frame-based error resilient coding scheme for video transmission over differentiated services networks", Proceedings of Packet Video 2007, 12-13 Nov. 2007, pp. 85-90.

[6] P. Mundur, Y. Rao, and Y. Yesha, "Keyframe-based Video Summarization using Delaunay Clustering", International Journal on Digital Libraries, vol. 6, no. 2, April 2006, pp. 219-232.

[7] K. Khurana and M. B. Chandak, "Study of Various Video Annotation Techniques", International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 1, January 2013.

[8] S. Thakare, "Intelligent Processing and Analysis of Image for shot Boundary Detection", International Journal of Engineering Research and Applications, vol. 2, no. 2, Mar-Apr 2012, pp. 366-369.

[9] J. Jeong, H. Hong, and D. Lee, "Ontology-based Automatic Video Annotation Technique In Smart TV Environment", IEEE Transactions on Consumer Electronics, vol. 57, no. 4, November 2011.

[10] J. Calic and E. Izquierdo, "Efficient Key-frame Extraction And Video Analysis", International Symposium On Information Technology, April 2002, IEEE.

[11] D. Borth, A. Ulges, C. Schulze, and T. M. Breuel, "Key frame Extraction for Video Tagging & Summarization", 2008.

[12] Reeja S R and Dr. N. P Kavya, "Motion Detection for Video Denoising – The State of Art And The Challenges", International Journal of Computer Engineering & Technology (IJCET), vol. 3, no. 2, 2012, pp. 518-525, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
