Date post: | 21-May-2015 |
Upload: | dcu |
Spatio-temporal local feature clusters
Iveel
Intro
• Overview
• Video as a cloud of feature points
• Clusters of feature points
• Video representation
• Classification
• Decision making
• Result
Overview
• In the Bag-of-Features (BOF) representation, the spatio-temporal configuration of the video is ignored.
• The proposed approach integrates spatio-temporal structure into the video representation.
– Local features are grouped (each group is referred to as a cluster) based on their spatio-temporal proximity.
– Each cluster is represented independently as a BOF (referred to as cluster-level BOF).
• This makes it possible to localize the action within the video segment.
Video
• A video segment can be viewed as a cloud of local features in 3D space (x, y, t).
Local feature grouping
• Intuition: local features that are close together in the spatio-temporal domain are more likely to belong to the same object; features that are far apart are less likely to.
• To exploit this idea, a tree cluster is used to group local features based on their spatio-temporal proximity.
In this example, the local feature points are grouped into two clusters (red and blue).
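The slides do not say how the tree is built, so the following is only a minimal sketch under an assumption: each cluster is split in two at the median of the axis (x, y or t) with the largest spread, so that level d of the tree holds up to 2^d clusters. The function names `split_cluster` and `cluster_tree` and the sample points are hypothetical.

```python
from statistics import median

def split_cluster(points):
    """Split one cluster in two along the axis (x, y or t) with the largest spread.

    Assumption: a median split stands in for the unspecified tree clustering.
    """
    spreads = [max(p[a] for p in points) - min(p[a] for p in points) for a in range(3)]
    axis = spreads.index(max(spreads))
    cut = median(p[axis] for p in points)
    left = [p for p in points if p[axis] < cut]
    right = [p for p in points if p[axis] >= cut]
    return left, right

def cluster_tree(points, depth):
    """Return the clusters at every level; level d holds up to 2**d clusters."""
    levels = [[list(points)]]
    for _ in range(depth):
        next_level = []
        for cluster in levels[-1]:
            if len(cluster) < 2:
                next_level.append(cluster)
                continue
            left, right = split_cluster(cluster)
            # Keep the cluster whole if the split leaves one side empty.
            next_level.extend([left, right] if left and right else [cluster])
        levels.append(next_level)
    return levels

# Hypothetical (x, y, t) feature points forming two well-separated groups.
points = [(0, 0, 0), (1, 0, 1), (0, 1, 2), (10, 10, 50), (11, 10, 51), (10, 11, 52)]
levels = cluster_tree(points, depth=1)
```

Here `levels[0]` is the whole segment as a single cluster and `levels[1]` holds the two groups, which mirrors the red and blue clusters in the example.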
Cluster-level BOF
• Once local features are grouped into clusters, each cluster is represented using the BOF approach (referred to as cluster-level BOF).
– A frequency histogram is generated over the local descriptors that belong to a particular cluster.
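A cluster-level BOF can be sketched as follows, assuming each local descriptor has already been quantized to a visual-word index and assigned a cluster label. The normalization to relative frequencies, the function name `cluster_bof` and the sample data are assumptions for illustration, not details given in the slides.

```python
from collections import Counter

def cluster_bof(word_ids, cluster_ids, vocab_size):
    """Build one normalized visual-word histogram per cluster.

    word_ids[i]    -- visual-word index of local descriptor i (assumed precomputed)
    cluster_ids[i] -- cluster that local descriptor i belongs to
    """
    counts = {}
    for word, cluster in zip(word_ids, cluster_ids):
        counts.setdefault(cluster, Counter())[word] += 1
    histograms = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        # Words that never occur in the cluster get frequency 0.
        histograms[cluster] = [counter[w] / total for w in range(vocab_size)]
    return histograms

# Hypothetical data: six descriptors, a vocabulary of three words, two clusters.
word_ids = [0, 2, 2, 1, 1, 1]
cluster_ids = [0, 0, 0, 1, 1, 1]
hists = cluster_bof(word_ids, cluster_ids, vocab_size=3)
```

Each entry of `hists` plays the role of one cluster-level BOF; a segment with 16 clusters would yield 16 such histograms instead of a single segment-wide one.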
Training & Learning
• At each scale, an SVM classifier is trained on the cluster-level BOFs.
Experimental study
• Action segments from TRECVID SED are used for training and testing.
– 7 action classes: CellToEar, Embrace, ObjectPut, PeopleMeet, PeopleSplitUp, PersonRuns, Pointing.
• Training: 210 video segments in total
– 30 video segments per action class
• Testing: 138 video segments in total
– approx. 20 video segments per action class
Experimental study
• A spatio-temporal bounding box is manually drawn for each segment in both the training and test sets.
Experiment 1 - Cluster number vs performance
• The optimal number of clusters is studied.
– In the experiment, 6 different cluster numbers are used: 1, 2, 4, 8, 16 and 32.
– For example, if the cluster number is 16, the video segment is divided into 16 sub-regions (clusters), each with its own BOF histogram (cluster-BOF). Each cluster-BOF is annotated based on the bounding box information.
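The slides do not state the exact annotation rule, so the sketch below assumes one plausible choice: a cluster receives the action label when at least half of its feature points fall inside the manually drawn spatio-temporal bounding box, and a background label otherwise. The 0.5 threshold, the "background" label, the box layout and the function names are all hypothetical.

```python
def inside(point, box):
    """True if an (x, y, t) point lies within the spatio-temporal bounding box."""
    x, y, t = point
    return (box["x0"] <= x <= box["x1"]
            and box["y0"] <= y <= box["y1"]
            and box["t0"] <= t <= box["t1"])

def annotate_cluster(points, box, action, min_fraction=0.5, background="background"):
    """Label a cluster by the fraction of its points inside the box (assumed rule)."""
    hits = sum(inside(p, box) for p in points)
    return action if points and hits / len(points) >= min_fraction else background

# Hypothetical bounding box and two clusters from one training segment.
box = {"x0": 0, "x1": 5, "y0": 0, "y1": 5, "t0": 0, "t1": 10}
cluster_a = [(1, 1, 1), (2, 2, 2), (9, 9, 9)]   # 2 of 3 points inside the box
cluster_b = [(8, 8, 20), (9, 9, 21)]            # no point inside the box
label_a = annotate_cluster(cluster_a, box, "Embrace")
label_b = annotate_cluster(cluster_b, box, "Embrace")
```

With a rule of this kind, raising the cluster number gives smaller clusters that tend to fall either clearly inside or clearly outside the box.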
Experiment 1 - Cluster number vs performance: CellToEar
Experiment 1 - Cluster number vs performance: Embrace
Experiment 1 - Cluster number vs performance: ObjectPut
Experiment 1 - Cluster number vs performance: PeopleMeet
Experiment 1 - Cluster number vs performance: PeopleSplitUp
Experiment 1 - Cluster number vs performance: PersonRuns
Experiment 1 - Cluster number vs performance: Pointing
Conclusion
• The results are based on cluster-level BOF.
• To give a segment-based result, the cluster-BOFs that belong to the same video segment must be properly aggregated.
– The naïve approach is to assign to the parent segment the action class that receives the most votes from its clusters.
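The naïve voting scheme can be sketched in a few lines, assuming each cluster of a segment has already been given a predicted action class by the classifier. The function name `vote_segment_label` and the sample predictions are hypothetical.

```python
from collections import Counter

def vote_segment_label(cluster_labels):
    """Return the action class predicted by the most clusters of one segment.

    Ties are broken in favour of the class that appears first in the list;
    an empty list yields None.
    """
    if not cluster_labels:
        return None
    votes = Counter(cluster_labels)
    label, _ = votes.most_common(1)[0]
    return label

# Hypothetical per-cluster predictions for a single video segment.
predictions = ["Pointing", "Embrace", "Pointing", "PersonRuns"]
segment_label = vote_segment_label(predictions)
```

A plain majority vote treats every cluster equally. Weighting votes by classifier confidence or cluster size would be one possible refinement, though the slides do not discuss it.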