Cluster

Spatio-temporal local feature clusters

Iveel

Intro

• Overview
• Video as a cloud of feature points
• Clusters of feature points
• Video representation
• Classification
• Decision making
• Result

Overview

• In the Bag-of-Features (BOF) representation, the spatio-temporal configuration of the video is ignored.

• The proposed approach integrates spatio-temporal structure into the video representation.
– Local features are grouped (each group is referred to as a cluster) based on their spatio-temporal proximity.
– Each cluster is independently represented as a BOF (referred to as cluster-level BOF).

• This makes it possible to localize the action within the video segment.

Video

• A video segment can be viewed as a cloud of local features in 3D space (x, y, t).

Local feature grouping

• Intuition: features that are close to each other in the spatio-temporal domain are more likely to correspond to the same object; distant features are less likely to.

• To exploit this idea, a tree cluster is used to group local features based on their spatio-temporal proximity.

In this example, the local feature points are grouped into two clusters (red and blue).
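The grouping step can be illustrated with a minimal sketch. The slides do not specify the tree clustering algorithm, so this example assumes plain single-linkage agglomerative clustering, cut when a requested number of clusters remains. The function name `group_features` and the `time_scale` weight (which balances frames against pixels) are hypothetical and not taken from the slides.

```python
import math


def group_features(points, num_clusters, time_scale=1.0):
    """Group (x, y, t) feature points by spatio-temporal proximity.

    Assumption: single-linkage agglomerative clustering stands in for the
    "tree cluster" mentioned in the slides. time_scale is a hypothetical
    weight applied to the temporal axis.
    """
    def dist(a, b):
        return math.sqrt(
            (a[0] - b[0]) ** 2
            + (a[1] - b[1]) ** 2
            + (time_scale * (a[2] - b[2])) ** 2
        )

    # Start with one cluster per point and merge the closest pair repeatedly.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]

    # Convert the list of clusters into one label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Two well-separated groups of toy feature points (x, y, t).
points = [(10, 12, 1), (11, 13, 2), (12, 11, 3), (200, 180, 40), (202, 183, 41)]
labels = group_features(points, num_clusters=2)
print(labels)
```

This brute-force version recomputes every pairwise distance at each merge, so it is only suitable for small illustrations; a real feature cloud would need an efficient hierarchical clustering implementation.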

Cluster-level BOF

• Once local features are grouped into clusters, each cluster is represented using the BOF approach (referred to as cluster-level BOF).
– A frequency histogram is generated over the local descriptors that belong to a particular cluster.
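A cluster-level BOF can be sketched as follows. This assumes each local descriptor has already been quantized to a visual-word index from a codebook, which is the usual BOF setup but is not spelled out in the slides. The function name `cluster_level_bof` and the normalization to relative frequencies are illustrative choices.

```python
from collections import defaultdict


def cluster_level_bof(word_ids, cluster_labels, vocab_size):
    """Build one normalized visual-word histogram per cluster.

    Assumption: word_ids[i] is the codebook index of feature i and
    cluster_labels[i] is the cluster that feature i was grouped into.
    """
    counts = defaultdict(lambda: [0] * vocab_size)
    for word, label in zip(word_ids, cluster_labels):
        counts[label][word] += 1

    histograms = {}
    for label, hist in counts.items():
        total = sum(hist)  # always >= 1, since the cluster has a feature
        histograms[label] = [c / total for c in hist]
    return histograms


# Five features, a 3-word codebook, two clusters.
word_ids = [0, 2, 2, 1, 1]
cluster_labels = [0, 0, 0, 1, 1]
bofs = cluster_level_bof(word_ids, cluster_labels, vocab_size=3)
print(bofs)
```

Each histogram depends only on the features inside its own cluster, which is what allows a classifier decision to be tied to a sub-region of the segment.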

Training & Learning

• At each scale, an SVM classifier is trained on the cluster-level BOFs.

Experimental study

• Action segments from TRECVID SED are used for training and testing.
– 7 action classes: CellToEar, Embrace, ObjectPut, PeopleMeet, PeopleSplitUp, PersonRuns, Pointing.
• Training: 210 video segments in total
– 30 video segments per action class
• Testing: 138 video segments in total
– approx. 20 video segments per action class

Experimental study

• The spatio-temporal bounding box is manually drawn for both the training and test set segments.

Experiment 1 - Cluster number vs performance

• The optimal number of clusters is studied.
– In the experiment, 6 different cluster numbers are chosen: 1, 2, 4, 8, 16 and 32.
– For example, if the cluster number is 16, the video segment is divided into 16 sub-regions (clusters), each with its own BOF histogram (cluster-BOF). Each cluster-BOF is annotated based on the bounding box information.
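The slides do not say how a cluster-BOF is annotated from the bounding box. One possible rule, shown here purely as an assumption, is to give a cluster the action label when at least a fraction `min_fraction` of its feature points fall inside the spatio-temporal box, and a "background" label otherwise. The names `inside`, `annotate_clusters`, `min_fraction` and the box keys are all hypothetical.

```python
def inside(point, box):
    """Return True if an (x, y, t) point lies within the box (inclusive)."""
    x, y, t = point
    return (
        box["x0"] <= x <= box["x1"]
        and box["y0"] <= y <= box["y1"]
        and box["t0"] <= t <= box["t1"]
    )


def annotate_clusters(points, cluster_labels, box, action, min_fraction=0.5):
    """Label each cluster by how much of it overlaps the bounding box.

    Assumption: this overlap rule is illustrative only; the slides state
    just that bounding box information is used for annotation.
    """
    totals, hits = {}, {}
    for point, label in zip(points, cluster_labels):
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + (1 if inside(point, box) else 0)
    return {
        label: action if hits[label] / totals[label] >= min_fraction else "background"
        for label in totals
    }


# Toy example: cluster 0 lies inside the box, cluster 1 lies outside it.
points = [(10, 12, 1), (11, 13, 2), (12, 11, 3), (200, 180, 40), (202, 183, 41)]
cluster_labels = [0, 0, 0, 1, 1]
box = {"x0": 0, "x1": 50, "y0": 0, "y1": 50, "t0": 0, "t1": 10}
annotations = annotate_clusters(points, cluster_labels, box, action="Pointing")
print(annotations)
```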

Experiment 1 - Cluster number vs performance: CellToEar

Experiment 1 - Cluster number vs performance: Embrace

Experiment 1 - Cluster number vs performance: ObjectPut

Experiment 1 - Cluster number vs performance: PeopleMeet

Experiment 1 - Cluster number vs performance: PeopleSplitUp

Experiment 1 - Cluster number vs performance: PersonRuns

Experiment 1 - Cluster number vs performance: Pointing

Conclusion

• The results are based on cluster-level BOF.
• To give a segment-based result, a proper aggregation of the cluster-BOFs that belong to the same video segment is required.
– The naïve approach is to assign to the parent segment the action class that receives the most votes from its clusters.
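The naïve voting scheme can be sketched in a few lines. The example assumes each cluster of a segment has already received a predicted action class from the classifier. The function name `vote_segment_label` is hypothetical.

```python
from collections import Counter


def vote_segment_label(cluster_predictions):
    """Return the action class predicted by the most clusters of a segment.

    Ties are resolved in favor of the class that appears first in the input,
    which is an arbitrary choice made for this sketch.
    """
    if not cluster_predictions:
        raise ValueError("a segment needs at least one cluster prediction")
    counts = Counter(cluster_predictions)
    return counts.most_common(1)[0][0]


# Per-cluster predictions for one video segment.
predictions = ["Embrace", "PeopleMeet", "Embrace", "Pointing"]
segment_label = vote_segment_label(predictions)
print(segment_label)
```

A plain majority vote treats every cluster equally; weighting votes by classifier confidence or cluster size would be one way to refine the aggregation.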

Date post:	21-May-2015
Upload:	dcu