
A DEPTH-MAP APPROACH FOR AUTOMATIC MICE BEHAVIOR RECOGNITION

João P. Monteiro⋆, Student Member, IEEE, Hélder P. Oliveira⋆, Member, IEEE, Paulo Aguiar‡, Jaime S. Cardoso⋆†, Senior Member, IEEE

⋆INESC TEC, †Faculdade de Engenharia, Universidade do Porto
‡Instituto de Biologia Molecular e Celular and Centro de Matemática, Universidade do Porto

ABSTRACT

Animal behavior assessment plays an important role in basic and clinical neuroscience. Although assessing the higher functional level of the nervous system is already possible, behavioral tests are extremely complex to design and analyze. Animals' responses are often evaluated manually, making the assessment subjective, extremely time consuming, poorly reproducible and potentially fallible. The main goal of the present work is to evaluate the use of consumer depth cameras, such as Microsoft's Kinect, for the detection of behavioral patterns of mice. The hypothesis is that depth information should enable a more feasible and robust method for automatic behavior recognition. Thus, we introduce our depth-map based approach, comprising mouse segmentation, body-like per-frame feature extraction and per-frame classification given temporal context, to prove the usability of this methodology.

Index Terms— Depth sensors; Feature extraction; Animal behavior classification.

1. INTRODUCTION

A great effort has been made by disciplines such as neuroscience and pharmacology to understand the complex relationship between genes and behavior [1]. In this domain, animal experimentation remains a key instrument. Among animals used in research, mice can be recognized as one of the most important models [2]. They tend to be used for the analysis of behavioral patterns of targeted and chemically induced mutations, often evaluated manually by direct observation or by analysis of video recordings.

The demand for automated methods of mice behavior analysis arises primarily from the need to solve problems related not only to time and cost, but also to reproducibility. Additionally, the availability of such systems introduces the possibility of rethinking the behavior tests themselves. With this, the typical testing time scale can be easily extended, and thus the diversity of behaviors and contexts to be analyzed.

This work is financed by the ERDF European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-037281.

Automated analysis of mice behavior constitutes a challenge due to a large number of factors, including the huge variability of behavior test conditions and the generic problem of behavior recognition itself. Existing vision-based methods typically use standard video images and extract motion features, adapting existing work on the recognition of human actions and motion. Besides commercial systems such as PhenoTyper [3] by Noldus, HomeCageScan [4] by CleverSys or SmartCube [5] by PsychoGenics, it is also worth mentioning an open source software package made available in 2010 [6]. It uses a machine learning algorithm for classifying every frame of a video sequence. That work extended previous video-tracking based approaches (e.g. [7]) to allow the analysis of finer animal activities such as grooming or rearing. It is, however, possible to find some limitations in the current release, such as the lack of characterization of social behavior, restrictions on camera pose, lighting conditions and mice color, and a high computational cost.

Recent advances in computer vision highlight the potential advantages of depth-map images over color images for object recognition [8]. However, earlier depth sensors were expensive and difficult to use [9]. The task has been greatly simplified by RGB-D cameras, such as Microsoft's Kinect. Its depth sensor is based on an infrared laser projector combined with a monochrome CMOS sensor, which provides depth information at a frame rate of up to 30 fps. The convenience of such devices is not compromise-free: their data contain a significant amount of noise, owing in part to illumination changes, but mainly to the surface texture and orientation of the observed scene [10].

In this paper we introduce a general-purpose, automated, quantitative tool for mouse behavioral assessment based on depth-map images. Accordingly, taking the open field test [11] and the observation of mice within their home cage [12] as scenarios, we present a morphology-based feature extraction approach, serving as a mouse body shape descriptor, for motor function evaluation. Furthermore, the contribution of each feature to the recognition of behaviors of interest is evaluated, and the establishment of temporal context through feature-specific observation time windows is proposed.


Fig. 1. Mice behavior analysis framework: acquisition of data from a depth sensor; per-frame feature extraction; supervised behavior classification; and final report generation with both behavior recognition and direct feature representation.

2. MATERIALS AND METHODS

There are already several databases related to animal behavior analysis [13]; nonetheless, as far as we were able to ascertain, none of them comprehends depth-map video acquisition. Thereupon, bearing our proposed framework, summarized in Fig. 1, in mind, we video recorded singly housed mice using a properly calibrated Kinect [14]. Considering both the open field test arena and home cage housing, the Kinect was placed perpendicularly to the apparatus floor. Two mice behaving differently were used for these experiments. Each acquisition lasted around 6 minutes, corresponding to a grand total of over 1 hour of recording. A single observer manually annotated the videos. Each frame was labeled with a behavior of interest: walking (ambulation), resting (inactivity or nearly complete stillness), micro-movements (sweeping the forelimbs or hind limbs across the face or torso), rearing (upright posture with forelimbs off the ground), and other (residual category for unspecified conditions). The annotations were thereafter confirmed by an expert.

2.1. Image Segmentation

In both considered scenarios, and despite variations in the protocol used for behavior analysis, the floor of each apparatus should be recognized as a dynamic background. This is mainly due to the varying depth of the bedding covering the entire floor in the home cage, and to the casual introduction of pellets as well as the presence of urine and/or faeces throughout an open field test [15].

An adaptive background model (ABM) was considered, taking into account the fact that the mouse moves through the scene. Given a frame sequence from a fixed camera, the detection of all foreground objects can be accomplished as the difference between the current frame and an image of the scene's background model.

For this purpose, the background model considers a previously acquired empty view of the scene, which is updated with a moving per-pixel median during each experiment run. As a gold standard of global thresholding, Otsu's method [16] was tested. Another method used was the Gaussian mixture background model (GMBM) [17], which takes into account each pixel's recent history. A local background model (LBM) segmentation approach, introduced in [18], was also considered. That method defines a density image by transforming the depth information from the XY plane to the XZ plane. In the new image, each column is the histogram of the corresponding column in the original image. The threshold curve is then computed as the shortest path from one margin of the density image to the other, where the cost of each pixel is its frequency value.
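
As an illustration of the adaptive background model just described, the following is a minimal Python sketch assuming NumPy; the class name, buffer length and depth threshold are illustrative placeholders rather than the authors' implementation.

import numpy as np
from collections import deque

class AdaptiveBackgroundModel:
    """Background = per-pixel median over a sliding buffer of depth frames,
    initialized from a previously acquired empty view of the scene."""

    def __init__(self, empty_view, buffer_size=100, depth_threshold=15):
        self.buffer = deque([empty_view.astype(np.float32)], maxlen=buffer_size)
        self.threshold = depth_threshold  # minimum depth difference (sensor units)

    def segment(self, depth_frame):
        frame = depth_frame.astype(np.float32)
        background = np.median(np.stack(self.buffer), axis=0)
        # Foreground: pixels significantly closer to the camera than the background.
        mask = (background - frame) > self.threshold
        # Update the model only where no foreground was detected, so the mouse
        # itself is not absorbed into the background.
        self.buffer.append(np.where(mask, background, frame))
        return mask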

2.2. Feature Extraction

Given the goal of measuring motor activity, and bearing the previously mentioned behaviors of interest in mind, our depth-map approach proposes a global description of the mouse's body shape, hypothesizing that the temporal context of even a rough body approximation captures information discriminative enough for the desired behavior analysis. The following steps describe the extraction of the considered characteristics, illustrated in Fig. 2. The first step is to fit an ellipse to the segmentation mask. The major and minor axes are extracted and considered to be the mouse length (h) and width (w), respectively. The center of mass of the ellipse, C(t), is used for the calculation of the velocity, V(t) = C(t) − C(t−1). The velocity is then projected onto the minor and major axes of the fitted ellipse (Vh(t), Vw(t)). Finally, the angle formed by the mouse major axis and the cage floor (θ) is computed. For that, the mouse point cloud is projected onto a plane along the z-axis and parallel to the mouse major axis.
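
A rough sketch of this per-frame feature computation is given below, assuming NumPy and approximating the ellipse fit by a principal-axis analysis of the mask pixels; the elevation angle θ, which requires the depth values of the point cloud, is omitted, and all names are illustrative.

import numpy as np

def frame_features(mask, prev_center=None):
    """Approximate the mouse body by its principal axes and derive
    length (h), width (w) and the axis-aligned velocity components."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    center = pts.mean(axis=0)                      # C(t), centre of mass
    cov = np.cov((pts - center).T)
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    minor_axis, major_axis = eigvecs[:, 0], eigvecs[:, 1]
    proj = (pts - center) @ np.column_stack([major_axis, minor_axis])
    h = proj[:, 0].max() - proj[:, 0].min()        # mouse length
    w = proj[:, 1].max() - proj[:, 1].min()        # mouse width
    if prev_center is None:
        vh = vw = 0.0
    else:
        v = center - prev_center                   # V(t) = C(t) - C(t-1)
        vh, vw = float(v @ major_axis), float(v @ minor_axis)
    return h, w, vh, vw, center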

Following the aforementioned methodology, we propose an approach to efficiently follow the mouse extremities through a video. The premise is that, for a particular video sequence, the mouse does not reverse its direction between consecutive frames, nor does it walk backwards in a stable and continuous way over time.


Fig. 2. Illustration of the five per-frame features extracted in both the depth-map image space (#1) and the mouse point cloud lateral projection (#2). The considered features are: Vh - velocity component along the mouse fitting ellipse major axis; Vw - velocity component along the mouse fitting ellipse minor axis; θ - angle of elevation of the mouse; h - mouse length; w - mouse width.

In that way, one can establish a correspondence between the mouse extremities by checking the consistency of the angles measured between the mouse major axis and the image horizontal axis (Fig. 3). To determine whether the monitored angle corresponds to a forward or backward orientation, we compute the median principal velocity component over the initial 5 s of video.

Fig. 3. Example for determination of mouse orientation.
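
One possible reading of this orientation bookkeeping is sketched below, assuming the per-frame major-axis angles and the velocity component along that axis are already available; the 90-degree consistency test and the function name are illustrative assumptions, not the authors' exact rule.

import numpy as np

def resolve_orientation(angles, v_major, fps=30, init_seconds=5):
    """Keep the major-axis angle consistent between consecutive frames
    (no sudden ~180 degree reversals) and use the median forward velocity
    of the first seconds of video to decide which end of the axis is the head."""
    resolved = np.asarray(angles, dtype=float).copy()
    for t in range(1, len(resolved)):
        if abs(resolved[t] - resolved[t - 1]) > 90:
            resolved[t] -= 180 * np.sign(resolved[t] - resolved[t - 1])
    # Median principal velocity over the initial window: positive means the
    # chosen axis direction already points forward, negative means it is flipped.
    n0 = int(fps * init_seconds)
    forward = np.median(np.asarray(v_major, dtype=float)[:n0]) >= 0
    return resolved if forward else resolved + 180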

2.3. Behavior Classification

In this context we selected one of the most popular algorithms for classifying data of various types into prescribed categories: decision trees [19]. This approach has the advantage of producing decision rules that can be easily interpreted by a human without compromising computational cost. Additionally, it enables integration into an online system able to operate on the monitored environment and to control active elements (such as doors or automatic dispensers) based on the classification. Keeping to the decision tree tool, we sought to establish a temporal context for the features. Instead of a set of measures for each time instant (frame), or a window as narrow as the previous frame, we studied the concatenation of features comprising previous values, promoting a notion of weak trajectory and movement history. The optimal window size is selected by cross validation on the training set.
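
A minimal sketch of per-frame classification with such temporal context, assuming scikit-learn and NumPy, is given below; the way lagged copies of each feature are concatenated illustrates the idea rather than reproducing the authors' exact implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def add_temporal_context(X, window):
    """Concatenate each frame's features with those of the previous `window` frames."""
    n, d = X.shape
    lagged = [X]
    for lag in range(1, window + 1):
        # Pad the first frames by repeating the initial observation.
        shifted = np.vstack([np.repeat(X[:1], lag, axis=0), X[:-lag]])
        lagged.append(shifted)
    return np.hstack(lagged)

# Illustrative usage: X is (n_frames, 5) with columns (Vh, Vw, theta, h, w) and
# y holds the per-frame behavior labels; the window would be chosen by cross
# validation on the training videos.
# X_ctx = add_temporal_context(X, window=10)
# clf = DecisionTreeClassifier().fit(X_ctx, y)
# predictions = clf.predict(add_temporal_context(X_test, window=10))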

3. RESULTS

3.1. Segmentation

In order to test the different segmentation methods, we performed an additional manual annotation of the mice spatial localization in 41 depth-map images randomly selected from different videos under different acquisition conditions. Results for depth-based mouse segmentation are presented in Table 1. The ABM methodology proved to be the most robust. Since in our samples the mice area was much smaller than the background, the presence of different objects at different depths (substrate materials covering the bottom of the cage) caused wrong classifications by Otsu's method. Occasional long periods of immobility led to segmentation failures by the GMBM method. The LBM method showed issues dealing with complex backgrounds and with the uniformity of the cost criteria across situations as different as the open field and home cage arenas, dictating its poor results. Overall, the depth data sensed at the border of the mice was very noisy. This could be caused not only by the small size of our object of interest, but also by the influence of phenomena such as specularity or reflection, which undermine the projected dot pattern on which the Kinect depends to construct depth-maps.

In order to address this issue, assumptions were established on the mouse velocity and length/width ratio. Additional annotation work was performed on a representative video in order to establish the maximum velocity and the minimum mouse length/width ratio. This enabled labeling frames as uncertain whenever the established assumptions were not verified. The ABM segmentation method, combined with these feature verification metrics, was able to keep track of the mice in the tested videos (around one hour of recording). It presented a hit rate of 100% (the number of times the center of mass of the segmentation mask was detected inside a manually annotated bounding box, over the total number of frames).
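
The plausibility check described above might look like the following sketch; the maximum speed and minimum length/width ratio are placeholder values standing in for the thresholds established from the additional annotation.

def is_uncertain(vh, vw, h, w, max_speed=40.0, min_ratio=1.2):
    """Flag a frame as uncertain when the segmentation violates simple
    assumptions on mouse speed and body elongation (placeholder thresholds)."""
    speed = (vh ** 2 + vw ** 2) ** 0.5
    return speed > max_speed or (h / max(w, 1e-6)) < min_ratio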

Table 1. Results of depth-based mouse segmentation for the 41 manually segmented depth-map images (in % of pixels).

                      GMBM   Otsu   LBM    ABM
True Positive Rate    0.30   1.00   0.56   0.68
False Positive Rate   0.00   0.95   0.02   0.00
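
For clarity, the pixel-level rates reported in Table 1 can be computed along the following lines, assuming boolean predicted and ground-truth masks; the function name is illustrative.

import numpy as np

def pixel_rates(pred_mask, gt_mask):
    """True/false positive rates of a predicted mask against a manual mask, per pixel."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    fp = np.logical_and(pred_mask, ~gt_mask).sum()
    tpr = tp / max(gt_mask.sum(), 1)
    fpr = fp / max((~gt_mask).sum(), 1)
    return tpr, fpr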


Table 2. Confusion matrices for the best decision trees with and without temporal context. Both tables present the results for the best combination of two videos for training, tested on the remaining videos against manual annotation.

(a) Without temporal context

True \ Predicted   Walking   Rearing   Resting   Other   Micro
Walking            0.773     0.044     0.070     0.074   0.039
Rearing            0.177     0.632     0.051     0.085   0.054
Resting            0.068     0.001     0.824     0.082   0.025
Other              0.260     0.049     0.376     0.194   0.121
Micro              0.309     0.081     0.179     0.086   0.345

(b) With temporal context

True \ Predicted   Walking   Rearing   Resting   Other   Micro
Walking            0.809     0.030     0.035     0.086   0.041
Rearing            0.159     0.685     0.009     0.097   0.050
Resting            0.050     0.000     0.794     0.083   0.073
Other              0.252     0.048     0.316     0.210   0.174
Micro              0.291     0.074     0.097     0.151   0.387

Nonetheless, the segmentation was often incomplete, especially in the home cage scenario, whose videos presented an average of 19% of their area with unresolved pixels, against 8% in the open field apparatus.

3.2. Classification

Given the scope of this work, which looks to explore the learning and recognition of behavior itself, and due to the aforementioned noisy nature of the depth data, we chose to focus hereafter on the open field apparatus. The following results consider the features extracted from 4 videos (∼6 min each), relying on the segmentation approach that scored best in the above study of segmentation techniques for depth-map images.

We studied the best combination of features by training classifiers considering all possible subsets of features. Performance was estimated using a cross validation procedure, whereby 2 videos were used to train the system and performance was evaluated on the remaining videos, against manual annotations (Table 3). All features seem to improve the overall classification and are thus considered to contribute positively to the behavior recognition task.
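
The exhaustive feature-subset study can be sketched as follows, assuming scikit-learn decision trees and per-video feature matrices with columns ordered as (Vh, Vw, θ, h, w); the function and variable names are illustrative, and the train/test protocol mirrors the one described above.

from itertools import combinations
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Vh", "Vw", "theta", "h", "w"]

def best_subset(videos):
    """videos: list of (X, y) pairs, X with columns ordered as in FEATURES.
    Train on every pair of videos, test on the remaining ones, and keep the
    feature subset with the lowest mean misclassification error."""
    best_err, best_cols = 1.0, tuple(range(len(FEATURES)))
    for k in range(1, len(FEATURES) + 1):
        for cols in combinations(range(len(FEATURES)), k):
            errors = []
            for train_ids in combinations(range(len(videos)), 2):
                Xtr = np.vstack([videos[i][0][:, cols] for i in train_ids])
                ytr = np.concatenate([videos[i][1] for i in train_ids])
                clf = DecisionTreeClassifier().fit(Xtr, ytr)
                for j in set(range(len(videos))) - set(train_ids):
                    Xte, yte = videos[j][0][:, cols], videos[j][1]
                    errors.append(np.mean(clf.predict(Xte) != yte))
            if np.mean(errors) < best_err:
                best_err, best_cols = float(np.mean(errors)), cols
    return [FEATURES[i] for i in best_cols], best_err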

The use of different time windows for each feature was evaluated by simply adding previous occurrences of each feature and recomputing a decision tree. From 1 up to 75 previous instances were considered, using the same evaluation as described above for the study of feature selection, and the evaluation was performed for each feature individually. Using the criterion of minimum classification error, windows of 23, 7, 5, 52 and 1 previous instances were established for the features Vh, Vw, θ, h and w, respectively. Tables 2(a) and 2(b) present the confusion matrices for the best decision tree with five features against manual annotation, without and with temporal context; both present the results for the best combination of two videos for training, tested on the remaining videos.
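
The per-feature window search might be organized as in the sketch below, reusing the hypothetical add_temporal_context helper from the earlier sketch and an evaluate function standing in for the same train/test protocol; window lengths and names are illustrative.

import numpy as np

def best_windows(X, y, evaluate, max_window=75):
    """For each feature column independently, pick the number of previous
    instances (1..max_window) whose lagged concatenation yields the lowest
    misclassification error, as returned by `evaluate`."""
    windows = []
    for col in range(X.shape[1]):
        feature = X[:, [col]]
        errors = [evaluate(add_temporal_context(feature, win), y)
                  for win in range(1, max_window + 1)]
        windows.append(1 + int(np.argmin(errors)))
    return windows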

Table 3. Best feature combinations: misclassification error using decision trees.

Nr. of features   1       2       3          4             5
Error             0.448   0.358   0.352      0.350         0.346
Features          Vh      Vh,θ    Vh,θ,h     Vh,θ,h,Vw     Vh,θ,h,Vw,w

Although no fair comparison with existing systems can be established with the dataset introduced in this work, some remarks can be made. The previously mentioned work [6] achieved a 77.3% accuracy across frames for a set of 1.6 h of side-view video of a home cage apparatus. That system considers 8 behaviors (drink, eat, groom, hang, micro-movement, rear, rest, walk), given manual annotation of the water source, feeder and cage walls, as well as verification of mouse and background color constraints.

The best performance result of our trained depth-map based system, with cross validation on 4 videos (corresponding to about 25 min), for the recognition of 4 behaviors (excluding other from the total of 5), was 66.9%. We want to draw special attention to the results for the micro-movement behavior, which greatly undermines our system's cross-behavior accuracy. With all due caveats, if we consider only the remaining behaviors (walking, resting, rearing), our cross-behavior accuracy would be 76.3%, against 73.0% for the method of Jhuang et al. [6].
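
For concreteness, both figures quoted above appear to be the mean of the per-behavior recognition rates on the diagonal of Table 2(b), which reproduces them exactly:

# Diagonal of Table 2(b): per-behavior recognition rates with temporal context.
rates = {"walking": 0.809, "rearing": 0.685, "resting": 0.794, "micro": 0.387}
print(sum(rates.values()) / 4)                                        # ~0.669 (66.9%)
print((rates["walking"] + rates["rearing"] + rates["resting"]) / 3)   # ~0.763 (76.3%)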

4. CONCLUSIONS

We presented a method for the recognition of the behavior of a singly housed mouse, capable of identifying walking, resting, rearing and micro-movement occurrences. It relies on information acquired by a Kinect device, from which spatio-temporal features are extracted and passed to a previously trained classifier. Its 67% cross-behavior classification accuracy, obtained with depth-map videos from an open field apparatus, laid bare the typically noisy and incomplete nature of Kinect data. Such impact is, however, mainly noticeable for small-scale, high-speed behaviors such as micro-movement occurrences. For the remaining macro movements (walking, resting, rearing), our depth-map approach proved to be relevant, offering a highly interpretable and online-ready solution for mice behavioral analysis. These results lead us to consider that joint depth and color image processing can contribute to the overall improvement of the behavior recognition system, namely through the extraction of information from color data regarding small detail elements such as the ears, tip of the nose and claws.


5. REFERENCES

[1] Eric S. Lander, "Initial impact of the sequencing of the human genome," Nature, vol. 470, no. 7333, pp. 187–197, 2011.

[2] John F. Cryan and Andrew Holmes, "The ascent of mouse: advances in modelling human depression and anxiety," Nature Reviews Drug Discovery, vol. 4, no. 9, pp. 775–790, 2005.

[3] Noldus Information Technology, PhenoTyper, (last accessed June 4, 2014), http://www.noldus.com/animal-behavior-research/products/phenotyper.

[4] CleverSys Inc, HomeCageScan, (last accessed June 4, 2014), http://www.cleversysinc.com/products/software/homecagescan.

[5] PsychoGenics, SmartCube, (last accessed June 4, 2014), http://www.psychogenics.com/smartcube.html.

[6] Hueihan Jhuang, Estibaliz Garrote, Xinlin Yu, Vinita Khilnani, Tomaso Poggio, Andrew D. Steele, and Thomas Serre, "Automated home-cage behavioural phenotyping of mice," Nature Communications, vol. 1, pp. 68, 2010.

[7] Paulo Aguiar, Luís Mendonça, and Vasco Galhardo, "OpenControl: a free opensource software for video tracking and automated control of behavioral mazes," Journal of Neuroscience Methods, vol. 166, no. 1, pp. 66–72, 2007.

[8] Daniel Weinland, Rémi Ronfard, and Edmond Boyer, "A survey of vision-based methods for action representation, segmentation and recognition," Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224–241, 2011.

[9] Elena Stoykova, A. Aydin Alatan, Philip Benzie, Nikolaos Grammalidis, Sotiris Malassiotis, Joern Ostermann, Sergej Piekh, Ventseslav Sainov, Christian Theobalt, Thangavel Thevar, et al., "3-D time-varying scene capture technologies: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1568–1586, 2007.

[10] Cha Zhang and Zhengyou Zhang, "Calibration between depth and color sensors for commodity depth cameras," in IEEE International Conference on Multimedia and Expo. IEEE, 2011, pp. 1–6.

[11] S. C. Stanford, "The open field test: reinventing the wheel," Journal of Psychopharmacology, vol. 21, no. 2, pp. 134–135, 2007.

[12] L. De Visser, R. Van Den Bos, W. W. Kuurman, M. J. H. Kas, and B. M. Spruijt, "Novel approach to the behavioural characterization of inbred mice: automated home cage observations," Genes, Brain and Behavior, vol. 5, no. 6, pp. 458–466, 2006.

[13] Xavier P. Burgos-Artizzu, Piotr Dollár, Dayu Lin, David J. Anderson, and Pietro Perona, "Social behavior recognition in continuous video," in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 1322–1329.

[14] C. Herrera, Juho Kannala, Janne Heikkilä, et al., "Joint depth and color camera calibration with distortion correction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 10, pp. 2058–2064, 2012.

[15] A. Fawcett, "Animal Research Review Panel Guideline 22: Guidelines for the Housing of Mice in Scientific Institutions," http://www.animalethics.org.au/__data/assets/pdf_file/0004/249898/Guideline-22-mouse-housing.pdf, 2012.

[16] Nobuyuki Otsu, "A threshold selection method from gray-level histograms," Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.

[17] Chris Stauffer and W. Eric L. Grimson, "Adaptive background mixture models for real-time tracking," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1999, vol. 2.

[18] Hélder P. Oliveira, Jaime S. Cardoso, André T. Magalhães, and Maria J. Cardoso, "A 3D low-cost solution for the aesthetic evaluation of breast cancer conservative treatment," Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, pp. 1–17, 2013.

[19] Christopher M. Bishop et al., Pattern Recognition and Machine Learning, vol. 1, Springer New York, 2006.
