http://www.iaeme.com/IJMET/index.asp 368 [email protected]
International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 11, November 2017, pp. 368–375, Article ID: IJMET_08_11_041
Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11
ISSN Print: 0976-6340 and ISSN Online: 0976-6359
© IAEME Publication Scopus Indexed
ACTION RECOGNITION IN VIDEO
SURVEILLANCE USING HIPI AND MAP
REDUCING MODEL
Ushapreethi P
School of Information Technology and Engineering, VIT University, Vellore, India
Balajee Jeyakumar
School of Information Technology and Engineering, VIT University, Vellore, India
BalaKrishnan P
School of Computing Sciences and Engineering, VIT University, Vellore, India
ABSTRACT
Action recognition in videos is possible using edge detection techniques such as the Canny edge detection algorithm and the Sobel operator. Surveillance videos are among the most complex data to analyze for valuable results such as face recognition and action recognition, yet their analysis is necessary for real-time applications such as safety and security systems. To increase the scalability of the edge-detection-based action recognition algorithm, the features of the video are grouped together using the Hadoop Image Processing Interface (HIPI). A MapReduce algorithm is proposed to parallelize the action recognition algorithm so that large-scale videos are processed in the minimum amount of time. For effective analysis, the video is converted into images, and the groups of images are assembled into a HIPI image bundle.
Keywords: Action recognition, edge detection, Hadoop Image Processing Interface,
MapReduce, Image Bundle
Cite this Article: Ushapreethi P, Balajee Jeyakumar and BalaKrishnan P, Action
Recognition in Video Surveillance Using HIPI and Map Reducing Model,
International Journal of Mechanical Engineering and Technology 8(11), 2017,
pp. 368–375.
http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11
1. INTRODUCTION
In recent years, a large amount of surveillance video data has been accumulating. When processing such massive data, standalone computers face bottlenecks such as limited computational power, insufficient storage and poor efficiency. Distributed systems are therefore used for such tasks and are very effective; in the present scenario, the Hadoop MapReduce framework is a suitable platform for distributed processing. A video can be split into multiple frames for analysis, and the resulting group of images can be analyzed using the Hadoop Image Processing Interface (HIPI). This interface provides tools for computing basic information about a group of images, such as the average pixel value, spatial dimensions and other image metadata. The HIPI framework hides most of the technical details of the distributed environment, so users who are new to distributed computing can still write Hadoop image processing applications, and much of the steep learning curve is avoided.
Edge detection algorithms are particularly useful for identifying actions in images [1]. The Canny edge detection algorithm is used for edge detection in this work, and the difference between adjacent images is calculated to identify the action. Action recognition consists of four phases, namely feature extraction, codebook generation, feature encoding and optimization (pooling) [2]. Initially, the video is divided into shots, and each shot is represented by a single key frame. In the first phase, local features are extracted from the key frames; the extracted features are known as bases or code words. In the second phase, called codebook generation or dictionary creation, a codebook (dictionary) is generated from the set of code words using clustering techniques such as the K-Means algorithm [3]. These dictionaries represent the visual descriptors. In the third phase, feature encoding, each descriptor activates a number of code words and a coding vector is generated using a coding technique; the length of the coding vector equals the number of code words. Several encoding techniques are used, such as vector quantization [4], soft coding [5] and sparse coding [6]. The last phase, optimization or pooling, creates a compact signature or feature vector for a given sample; max pooling [7, 8, 18] is the most commonly used technique. This paper presents the work in the following modules.
• Converting video into frames
• Image Analysis Using HIPI (Mapreduce)
• Edge detection based action recognition
Figure 1 shows the overall idea of the proposed work. The main advantage of this work is parallel processing. The HIPI framework makes the interface easy to use and supports distributed processing up to step 6 in Figure 1. Each module of the work is discussed in the following sections.
Figure 1 Overview of the proposed work
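The codebook generation, encoding and pooling phases outlined in the introduction can be illustrated with a small plain-Java sketch. It assumes the local descriptors have already been extracted as fixed-length `double[]` vectors. The class and method names (`BagOfWordsSketch`, `kMeans`, `encode`, `maxPool`) are hypothetical. Hard-assignment vector quantization is used because it is the simplest of the cited encoders; it is not necessarily the encoder used in this work.

```java
import java.util.Arrays;

// Hypothetical helper class: a plain-Java sketch, not the authors' code.
final class BagOfWordsSketch {

    // Lloyd's k-means [3]: learns k code words (centroids) from local descriptors.
    // Assumption: the first k descriptors seed the centroids (k-means++ is a common alternative).
    static double[][] kMeans(double[][] data, int k, int iterations) {
        int dim = data[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = data[c].clone();
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] d : data) {                 // assignment step
                int c = nearest(centers, d);
                counts[c]++;
                for (int j = 0; j < dim; j++) sums[c][j] += d[j];
            }
            for (int c = 0; c < k; c++)               // update step
                if (counts[c] > 0)
                    for (int j = 0; j < dim; j++) centers[c][j] = sums[c][j] / counts[c];
        }
        return centers;
    }

    // Index of the code word closest to d in squared Euclidean distance.
    static int nearest(double[][] codebook, double[] d) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < codebook.length; c++) {
            double dist = 0;
            for (int j = 0; j < d.length; j++) {
                double diff = d[j] - codebook[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }

    // Vector quantization (hard assignment): a one-hot coding vector whose
    // length equals the number of code words.
    static double[] encode(double[][] codebook, double[] descriptor) {
        double[] code = new double[codebook.length];
        code[nearest(codebook, descriptor)] = 1.0;
        return code;
    }

    // Max pooling: element-wise maximum over the coding vectors of one sample.
    static double[] maxPool(double[][] codes) {
        double[] pooled = new double[codes[0].length];
        Arrays.fill(pooled, Double.NEGATIVE_INFINITY);
        for (double[] code : codes)
            for (int j = 0; j < code.length; j++) pooled[j] = Math.max(pooled[j], code[j]);
        return pooled;
    }
}
```

With hard assignment, the max-pooled vector records which code words a sample activated at least once. Soft or sparse coding would replace `encode` with a real-valued coding vector while leaving the pooling step unchanged.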
2. RELATED WORK
With the rapid increase in the use of social media and surveillance videos, multimedia data is growing exponentially, and the need for analysis methods is correspondingly high. Many researchers are concentrating on analyzing such data using various technologies together with distributed processing. White et al. [9] present a clustering-based classifier using distributed processing; their work proceeds through image pre-processing and reports object recognition as its result. Pereira et al. [10] discussed the limitations of Hadoop processing for image-based video analytics and suggested cloud-based technologies. Lv et al. [11] describe naïve classification algorithms in a Hadoop environment and use satellite images for color-based analysis.
Human activity is categorized according to its level of complexity. Conceptually, the four levels identified in [12] are gestures, actions, interactions and group activities. A gesture is a body movement intended to express some meaning; it can be communicated with the hands, arms or body. Examples of gestures include head movements for expressing yes and no, and eye movements such as winking, exclaiming or rolling the eyes. An action is a goal-directed motion sequence, such as picking up a stone from the ground or a golf swing. Actions last longer than gestures and have clear start and end points. Both gestures and actions are performed by a single subject. Most research has concentrated on gesture recognition and has achieved good results on the benchmarks, and several surveys on gesture recognition have been published [13]. The research community has now moved to the next level, action recognition. Interactions and group activities involve more than one subject and are harder to recognize, so this paper focuses on action recognition. Human activity recognition approaches are hierarchically represented in Figure 2.
Figure 2 Human activity recognition approaches
3. METHODOLOGY
3.1. DIVIDING VIDEOS INTO FRAMES
Hadoop supports several programming languages, such as Java and Python. The basic MapReduce programs of this work are written in Java and executed on the Java Virtual Machine (JVM). Similarly, the videos are converted into frames (images) using Java. The accumulated images are then grouped into image bundles using the HIPI framework, and the acquired edge detection parameters are passed to the edge detection algorithm.
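The introduction describes dividing the video into shots with one key frame each, but this work does not detail how shots are found. One common approach, shown here only as an assumption, compares the grey-level histograms of consecutive frames and starts a new shot when the difference exceeds a threshold. Frames are assumed to be already decoded into `int[][]` grey arrays with values 0 to 255 (for example by JCodec). The 16-bin histogram, the L1 distance and the choice of the first frame of each shot as its key frame are illustrative choices, and `KeyFrameSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shot-boundary / key-frame helper; frames are grey-level arrays (0-255).
final class KeyFrameSketch {

    // Normalised grey-level histogram with the given number of bins.
    static double[] histogram(int[][] gray, int bins) {
        double[] h = new double[bins];
        int n = 0;
        for (int[] row : gray)
            for (int v : row) { h[v * bins / 256]++; n++; }
        for (int i = 0; i < bins; i++) h[i] /= n;
        return h;
    }

    // L1 distance between two normalised histograms (0 = identical, 2 = disjoint).
    static double l1(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    // Starts a new shot whenever consecutive histograms differ by more than the
    // threshold, and returns the first frame index of each shot as its key frame.
    static List<Integer> keyFrames(List<int[][]> frames, double threshold) {
        List<Integer> keys = new ArrayList<>();
        double[] prev = null;
        for (int i = 0; i < frames.size(); i++) {
            double[] cur = histogram(frames.get(i), 16);
            if (prev == null || l1(prev, cur) > threshold) keys.add(i);
            prev = cur;
        }
        return keys;
    }
}
```

Histogram comparison tolerates small object motion within a shot, which suits the goal of keeping one representative frame per shot. Abrupt lighting changes, however, can trigger false shot boundaries.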
3.2. CANNY EDGE DETECTOR
The Canny edge detector is a Gaussian-filter-based edge detector. The Gaussian filter removes noise from the image, and the intensity gradients of the smoothed image are then used to locate edges, followed by non-maximum suppression to thin the edges and double thresholding with hysteresis to keep connected edges. The Canny algorithm does not depend on any context-specific metadata, which makes it effective compared with other edge detection methods. Its detection mechanism is useful for extracting the structural information of an image, from which actions can be recognized in the key frames. Fig. 3 shows the result of the Canny edge detection algorithm applied to an MRI image (left) and the image with the edges highlighted (right).
Figure 3 Result of the Canny edge detection algorithm applied to an MRI image (left) and the image with the edges highlighted (right)
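The stages of the Canny detector can be sketched in plain Java as follows. This is a textbook version, not the implementation used in this work, which may rely on OpenCV through HIPI. The 5×5 Gaussian kernel (sigma about 1.4), the Sobel operator, the border clamping and the `low`/`high` hysteresis thresholds are all assumptions chosen for illustration, and `CannySketch` is a hypothetical name.

```java
import java.util.ArrayDeque;

// Textbook Canny edge detector on a grey image stored as double[rows][cols].
final class CannySketch {

    // 5x5 Gaussian kernel (sigma about 1.4); the weights sum to 159.
    private static final int[][] GAUSS = {
        {2, 4, 5, 4, 2},
        {4, 9, 12, 9, 4},
        {5, 12, 15, 12, 5},
        {4, 9, 12, 9, 4},
        {2, 4, 5, 4, 2}};

    static boolean[][] detect(double[][] img, double low, double high) {
        int h = img.length, w = img[0].length;

        // 1. Gaussian smoothing to suppress noise (image borders are clamped).
        double[][] s = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                double acc = 0;
                for (int dy = -2; dy <= 2; dy++)
                    for (int dx = -2; dx <= 2; dx++)
                        acc += GAUSS[dy + 2][dx + 2] * px(img, y + dy, x + dx);
                s[y][x] = acc / 159.0;
            }

        // 2. Intensity gradient with the Sobel operator.
        double[][] mag = new double[h][w], dir = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                double gx = px(s, y - 1, x + 1) + 2 * px(s, y, x + 1) + px(s, y + 1, x + 1)
                          - px(s, y - 1, x - 1) - 2 * px(s, y, x - 1) - px(s, y + 1, x - 1);
                double gy = px(s, y + 1, x - 1) + 2 * px(s, y + 1, x) + px(s, y + 1, x + 1)
                          - px(s, y - 1, x - 1) - 2 * px(s, y - 1, x) - px(s, y - 1, x + 1);
                mag[y][x] = Math.hypot(gx, gy);
                dir[y][x] = Math.toDegrees(Math.atan2(gy, gx));
            }

        // 3. Non-maximum suppression along the (quantised) gradient direction.
        double[][] thin = new double[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                double deg = dir[y][x] < 0 ? dir[y][x] + 180 : dir[y][x];
                int dx, dy;
                if (deg < 22.5 || deg >= 157.5) { dx = 1; dy = 0; }
                else if (deg < 67.5) { dx = 1; dy = 1; }
                else if (deg < 112.5) { dx = 0; dy = 1; }
                else { dx = -1; dy = 1; }
                double m = mag[y][x];
                if (m >= mag[y + dy][x + dx] && m >= mag[y - dy][x - dx]) thin[y][x] = m;
            }

        // 4. Double threshold and hysteresis: strong pixels seed edges, weak
        //    pixels are kept only if 8-connected to an edge pixel.
        boolean[][] edge = new boolean[h][w];
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (thin[y][x] >= high) { edge[y][x] = true; stack.push(new int[]{y, x}); }
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = p[0] + dy, nx = p[1] + dx;
                    if (ny < 0 || nx < 0 || ny >= h || nx >= w || edge[ny][nx]) continue;
                    if (thin[ny][nx] >= low) { edge[ny][nx] = true; stack.push(new int[]{ny, nx}); }
                }
        }
        return edge;
    }

    // Pixel access with clamping at the image border.
    private static double px(double[][] a, int y, int x) {
        y = Math.max(0, Math.min(a.length - 1, y));
        x = Math.max(0, Math.min(a[0].length - 1, x));
        return a[y][x];
    }
}
```

Lower thresholds keep weaker edges. Canny suggested an upper-to-lower threshold ratio between 2:1 and 3:1 as a starting point.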
3.3. HADOOP IMAGE PROCESSING INTERFACE (HIPI)
HIPI is an image processing library designed to be used with the Apache Hadoop MapReduce
parallel programming framework. HIPI facilitates efficient and high-throughput image
processing with MapReduce style parallel programs typically executed on a cluster. It
provides a solution for how to store a large collection of images on the Hadoop Distributed
File System (HDFS) and make them available for efficient distributed processing. HIPI also
provides integration with OpenCV, a popular open-source library that contains many
computer vision algorithms [14, 15].
The HIPI distribution includes several useful tools for creating HIPI Image Bundles (HIBs), including a MapReduce program that builds a HIB from a list of images downloaded from the Internet. The first processing stage of a HIPI program is a culling step that filters the images in a HIB based on user-defined conditions, such as spatial resolution or criteria related to the image metadata. The images that pass the culling step are assigned to map tasks, which apply the user-defined map function to each image. The records emitted by the Mapper are collected and transmitted to the Reducer according to the built-in MapReduce shuffle algorithm, which attempts to minimize network traffic. Finally, the user-defined reduce tasks are executed in parallel, and their output is aggregated and written to HDFS. Fig. 4 shows the conversion of video frames into an image bundle, and the MapReduce tasks are shown in Figure 1.
Figure 4 Conversion of video frames into image bundle.
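The cull, map, shuffle and reduce stages of a HIPI program can be simulated in memory to show how data flows through such a job. In this simulation the job computes the average pixel value per camera. This is only a sketch: it does not use the Hadoop or HIPI APIs. The `Image` record, the per-camera key and the resolution-based culling rule are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the HIPI processing stages; no Hadoop or HIPI classes are used.
final class HipiFlowSketch {

    // Hypothetical stand-in for one decoded image in a bundle (grey pixel values).
    static final class Image {
        final String camera;
        final int width, height;
        final double[] pixels;
        Image(String camera, int width, int height, double[] pixels) {
            this.camera = camera; this.width = width; this.height = height; this.pixels = pixels;
        }
    }

    // Culling step: keep only images that meet a minimum spatial resolution.
    static boolean keep(Image img, int minWidth, int minHeight) {
        return img.width >= minWidth && img.height >= minHeight;
    }

    // Map step: emit (camera id, average pixel value of the image).
    static Map.Entry<String, Double> map(Image img) {
        double sum = 0;
        for (double p : img.pixels) sum += p;
        return new AbstractMap.SimpleEntry<>(img.camera, sum / img.pixels.length);
    }

    // Reduce step: average the per-image values that share a key.
    static double reduce(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.size();
    }

    static Map<String, Double> run(List<Image> bundle, int minWidth, int minHeight) {
        // Shuffle: group map outputs by key (Hadoop does this between the two phases).
        Map<String, List<Double>> shuffled = new TreeMap<>();
        for (Image img : bundle) {
            if (!keep(img, minWidth, minHeight)) continue;
            Map.Entry<String, Double> kv = map(img);
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : shuffled.entrySet())
            result.put(e.getKey(), reduce(e.getValue()));
        return result;
    }
}
```

On a real cluster, the map calls for different images run on different nodes. The grouping done by the `TreeMap` here corresponds to Hadoop's shuffle, which sends all values with the same key to one reducer.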
3.4. Action Recognition Using Hadoop
For converting video into frames, JCodec, an open-source pure-Java library for video codecs and formats, is used; other tools for transcoding video data into frames include Xuggler. Putting the frames into HDFS directly with the put command is not practical [16]; instead, the frames are converted into streams of bytes and then stored in HDFS. Hadoop provides the ability to read and write binary files, so almost anything that can be converted into bytes can be stored in HDFS.
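The frame-to-bytes conversion described above can be sketched with the JDK's `javax.imageio.ImageIO`. PNG is assumed here because it is lossless, and the class name `FrameBytesSketch` is hypothetical. In a Hadoop job, the resulting byte array could be written to the `FSDataOutputStream` returned by `FileSystem.create(Path)`. That step is omitted to keep the example self-contained.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: turns a decoded frame into bytes that any OutputStream
// (for example an HDFS output stream) can store, and back again.
final class FrameBytesSketch {

    // Encodes the frame losslessly as PNG bytes.
    static byte[] toPngBytes(BufferedImage frame) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(frame, "png", out)) {
                throw new IOException("no PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes PNG (or any other ImageIO-supported) bytes back into an image.
    static BufferedImage fromBytes(byte[] bytes) {
        try {
            return ImageIO.read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A lossy format such as JPEG would shrink each frame but alter pixel values slightly, which can shift the edges found later by the edge detector.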
After transcoding, all the images are combined into a single large file that can be easily managed and analyzed. The images are stored in a HIPI image bundle: each mapper generates a HIPI bundle, and the reducer merges these bundles into a single large bundle. MapReduce jobs then run on these image bundles for image analysis [17, 19, 20]. The Hadoop MapReduce parallel programming framework makes it possible to process a large number of images, and HIPI delivers efficient, high-throughput image processing with MapReduce-style parallel programs on a cluster.
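The mapper/reducer split of bundle creation can be illustrated with a toy format. Each partial bundle keeps its encoded images in one byte array plus an offset index, and merging concatenates the parts. This only mimics the idea of a HIB, which stores the images in a data file with a separate index file. It is not HIPI's actual on-disk format, and `ToyBundle`, `pack` and `merge` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy bundle: one data array plus an offset index. It only mimics the idea of a
// HIB (data file plus index file); it is not HIPI's on-disk format.
final class ToyBundle {
    final byte[] data;     // encoded images stored back to back
    final int[] offsets;   // offsets[i]..offsets[i + 1] delimit image i

    private ToyBundle(byte[] data, int[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    // Mapper side: pack one batch of encoded frames into a partial bundle.
    static ToyBundle pack(List<byte[]> images) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] offsets = new int[images.size() + 1];
        for (int i = 0; i < images.size(); i++) {
            offsets[i] = out.size();
            byte[] img = images.get(i);
            out.write(img, 0, img.length);
        }
        offsets[images.size()] = out.size();
        return new ToyBundle(out.toByteArray(), offsets);
    }

    // Reducer side: merge partial bundles into a single large bundle.
    static ToyBundle merge(List<ToyBundle> parts) {
        List<byte[]> all = new ArrayList<>();
        for (ToyBundle part : parts)
            for (int i = 0; i < part.size(); i++) all.add(part.get(i));
        return pack(all);
    }

    int size() { return offsets.length - 1; }

    byte[] get(int i) { return Arrays.copyOfRange(data, offsets[i], offsets[i + 1]); }
}
```

Keeping the index separate from the data lets a reader jump straight to any image by its offset. This is what allows one large file to be split across map tasks without decoding the images that precede it.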
4. EXPERIMENTAL SETUP
The experimental setup uses four to six 1 TB hard disks in a JBOD configuration (one for the OS, two for the FS image [RAID 1], one for Apache ZooKeeper and one for the JournalNode), two quad-, hex- or octo-core CPUs running at 2–2.5 GHz or more, 64–128 GB of RAM, and bonded Gigabit Ethernet connecting the machines. The Hadoop Image Processing Interface and Gradle (a Java build tool) are installed. Image bundles are created for image processing in Hadoop using HIPI, and the MapReduce programming model for image analysis is implemented. The average pixel values and the edge information of sample images are calculated using the MapReduce model. Finally, actions are recognized based on edge detection.
4.1. Steps for Action Recognition
Figure 6 shows the steps for converting the sample images in Fig. 5 into a HIPI Image Bundle (HIB) after Hadoop and Gradle (the build tool used to compile HIPI) are installed. The metadata of the sample images are shown in Fig. 7, and Fig. 8 shows the edge-detection-based action recognition.
Figure 5 Sample Images
Figure 6 Converting images to HIB (HIPI Image Bundle)
Figure 7 The metadata information of the images in the image bundle
Figure 8 Action Recognition by edge detection – sample images
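The paper recognizes actions from the difference between edge maps of adjacent frames but does not give the exact decision rule. One plausible rule, shown here as an assumption, flags motion between two frames when the fraction of pixels whose edge status changed exceeds a threshold. The edge maps are assumed to come from a Canny detector as equally sized `boolean[][]` arrays. The class name `EdgeMotionSketch` and the threshold value are hypothetical.

```java
import java.util.List;

// Hypothetical frame-differencing rule on binary edge maps of consecutive frames.
final class EdgeMotionSketch {

    // Fraction of pixels whose edge / non-edge status differs between two frames.
    static double changedFraction(boolean[][] prev, boolean[][] cur) {
        int changed = 0, total = 0;
        for (int y = 0; y < prev.length; y++)
            for (int x = 0; x < prev[y].length; x++) {
                if (prev[y][x] != cur[y][x]) changed++;
                total++;
            }
        return (double) changed / total;
    }

    // flags[i] is true when motion is detected between frame i and frame i + 1.
    static boolean[] motionFlags(List<boolean[][]> edgeMaps, double threshold) {
        boolean[] flags = new boolean[edgeMaps.size() - 1];
        for (int i = 1; i < edgeMaps.size(); i++)
            flags[i - 1] = changedFraction(edgeMaps.get(i - 1), edgeMaps.get(i)) > threshold;
        return flags;
    }
}
```

Differencing edge maps rather than raw intensities makes the rule less sensitive to gradual lighting changes. The cost is that motion inside textureless regions, which produce no edges, goes unnoticed.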
5. CONCLUSIONS
To achieve efficient action recognition for large-scale video data, a MapReduce-based parallel algorithm for action recognition is proposed. Image analysis is the key concept of the action recognition. The video is converted into images, the images are analyzed, and the average pixel values and edges of the sample images are identified using HIPI. Actions are then recognized using edge detection. Extending the framework with other image analysis techniques is left as future work.
REFERENCES
[1] Ushapreethi, P. and Lakshmipriya, G.G., 2017. Survey on Video Big Data: Analysis
Methods and Applications. International Journal of Applied Engineering Research,
12(10), pp.2221-2231.
[2] Mohammad AB, Qigang G, Sergio E, Thomas BM, Huamin R, Elham E, (2017) Locality
regularized group sparse coding for action recognition. Computer Vision and Image
Understanding 158: 106–114.
[3] Lloyd, S.P., 1982. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28 (2), 129–137.
[4] Sivic, J, Zisserman, A, 2003. Video google: A text retrieval approach to object matching
in videos. In: IEEE International Conference on Computer Vision, pp. 1470–1477.
[5] Liu, L, Wang, L, Liu, X, 2011. In defense of soft-assignment coding. In: IEEE International
Conference on Computer Vision, pp. 2486–2493.
[6] Van Gemert, J.C, Veenman, C.J, Smeulders, A.W, Geusebroek, J.-M. , 2010. Visual word
ambiguity. IEEE Trans. Pattern Anal. Mach. Intell. 32 (7), 1271–1283.
[7] Yang, J, Yu, K, Gong, Y, Huang, T, 2009. Linear spatial pyramid matching using sparse
coding for image classification. In: IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1794–1801.
[8] Yao T., Wang Z., Xie Z., Gao J., Feng DD., Learning universal multiview dictionary for
human action recognition. Pattern Recognition, 64 (2017) 236–244.
[9] B. White, T. Yeh, J. Lin, and L. Davis, Web-scale computer vision using map reduce for
multimedia data mining, in Proceedings of the Tenth International Workshop on
Multimedia Data Mining, ser. MDMKDD ’10. New York, NY, USA: ACM, 2010, pp.
9:1–9:10.
[10] R. Pereira, M. Azambuja, K. Breitman, and M. Endler, An architecture for distributed
high performance video processing in the cloud, in Cloud Computing (CLOUD), 2010
IEEE 3rd International Conference on, July 2010, pp. 482–489.
[11] Z. Lv, Y. Hu, H. Zhong, J. Wu, B. Li, and H. Zhao, Parallel k-means clustering of remote
sensing images based on mapreduce, in Proceedings of the 2010 International Conference
on Web Information Systems and Mining, ser. WISM’10. Berlin, Heidelberg: Springer-
Verlag, 2010, pp. 162–170.
[12] Aggarwal JK, Ryoo MS (2011) Human activity analysis: a review. ACM Comput Surv
43:1–43
[13] M. A. Ranzato, F.-J. Huang, Y. Boureau, and Y. LeCun. Unsupervised Learning of
Invariant Feature Hierarchies with Applications to Object Recognition. In CVPR, 2007.
[14] Ding, S., Li, G., Li, Y., Li, X., Zhai, Q., Champion, A. C. & Zheng, Y. F. (2017).
Survsurf: human retrieval on large surveillance video data. Multimedia Tools and
Applications, 76(5), 6521-6549.
[15] Xu, Z., Mei, L., Hu, C., & Liu, Y. (2016). The big data analytics and applications of the
surveillance system using video structured description technology. Cluster Computing, 19(3), 1283-1292.
[16] Verma, B., Zhang, L., & Stockwell, D. (2017). Roadside Video Data Analysis
Framework. In Roadside Video Data Analysis (pp. 13-39). Springer Singapore.
[17] Wang, K., Mi, J., Xu, C., Shu, L., & Deng, D. J. (2016, July). Real-time big data analytics
for multimedia transmission and storage. In Communications in China (ICCC), 2016
IEEE/CIC International Conference on (pp. 1-6). IEEE.
[18] Jeyakumar, B., Durai, M. S., & Lopez, D. (2018). Case Studies in Amalgamation of Deep
Learning and Big Data. In HCI Challenges and Privacy Preservation in Big Data Security
(pp. 159-174). IGI Global.
[19] Srinivasa Raghava S Janarthanan Y, Balajee J.M. (2016). Content based video retrieval
and analysis using image processing: A review. International Journal of Pharmacy and
Technology 8(4) (pp. 5042-5048).
[20] Kamalakannan, S. (2015). G., Balajee, J., Srinivasa Raghavan, Superior content-based
video retrieval system according to query image. International Journal of Applied
Engineering Research, 10(3) 7951-7957.
[21] Reena Jangra and Abhishek Bhatnagar. Comparison Analysis of Sensitivity of Noise B/W
Various Edge Detection Technique by Estimating Their PSNR Value. International
Journal of Computer Engineering and Technology, 6(10), 2015, pp. 01-12.
[22] Bhupendra Fataniya, Mekhala Kar, Grishma Joshi, Dr. Tanish Zaveri and Dr. Sanjeev
Acharya. Edge Detection of Microscopic Image, International Journal of Electronics and
Communication Engineering & Technology, 7(3), 2016, pp. 01–10
[23] Ms. Sonali Meghare & Roshani Talmale, Developing and Comparing an Encoding System
Using Vector Quantization & Edge Detection, International Journal of Computer
Engineering & Technology (IJCET), Volume 4, Issue 3, May-June (2013), pp. 503-511