UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

Khurram Soomro, Amir Roshan Zamir and Mubarak Shah

CRCV-TR-12-01

November 2012

Keywords: Action Dataset, UCF101, UCF50, Action Recognition

Center for Research in Computer Vision
University of Central Florida
4000 Central Florida Blvd.
Orlando, FL 32816-2365 USA

arXiv:1212.0402v1 [cs.CV] 3 Dec 2012

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

Khurram Soomro, Amir Roshan Zamir and Mubarak Shah
Center for Research in Computer Vision, Orlando, FL 32816, USA

{ksoomro, aroshan, shah}@cs.ucf.edu
http://crcv.ucf.edu/data/UCF101.php

Abstract

We introduce UCF101, which is currently the largest dataset of human actions. It consists of 101 action classes, over 13k clips and 27 hours of video data. The database consists of realistic user-uploaded videos containing camera motion and cluttered background. Additionally, we provide baseline action recognition results on this new dataset using a standard bag-of-words approach, with an overall performance of 44.5%. To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips and the unconstrained nature of such clips.

1. Introduction

The majority of existing action recognition datasets suffer from two disadvantages: 1) The number of their classes is typically very low compared to the richness of actions performed by humans in reality, e.g. the KTH [11], Weizmann [3], UCF Sports [10] and IXMAS [12] datasets include only 6, 9, 9 and 11 classes respectively. 2) The videos are recorded in unrealistically controlled environments. For instance, KTH, Weizmann and IXMAS are staged by actors; HOHA [7] and UCF Sports are composed of movie clips captured by professional filming crews. Recently, web videos have been used in order to utilize unconstrained user-uploaded data and alleviate the second issue [6, 8, 9, 5]. However, the first disadvantage remains unresolved, as the largest existing dataset does not include more than 51 actions, while several works have shown that the number of classes plays a crucial role in evaluating an action recognition method [4, 9]. Therefore, we have compiled a new dataset with 101 actions and 13320 clips, which is nearly twice as large as the largest existing dataset in terms of number of actions and clips. (HMDB51 [5] and UCF50 [9] are currently the largest ones, with 6766 clips of 51 actions and 6681 clips of 50 actions respectively.)

The dataset is composed of web videos which are recorded in unconstrained environments and typically include camera motion, various lighting conditions, partial occlusion, low quality frames, etc. Fig. 1 shows sample frames of 6 action classes from UCF101.

Figure 1. Sample frames for 6 action classes of UCF101. (Frame labels: Apply Eye Makeup, Baby Crawling, Haircut, Playing Dhol, Sky Diving, Surfing, Shaving Beard, Cricket Shot, Rafting.)

2. Dataset Details

Action Classes: UCF101 includes a total of 101 action classes, which we have divided into five types: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, Sports.

UCF101 is an extension of UCF50, which included the following 50 action classes: {Baseball Pitch, Basketball Shooting, Bench Press, Biking, Billiards Shot, Breaststroke, Clean and Jerk, Diving, Drumming, Fencing, Golf Swing, High Jump, Horse Race, Horse Riding, Hula Hoop, Javelin Throw, Juggling Balls, Jumping Jack, Jump Rope, Kayaking, Lunges, Military Parade, Mixing Batter, Nun chucks, Pizza Tossing, Playing Guitar, Playing Piano, Playing Tabla, Playing Violin, Pole Vault, Pommel Horse, Pull Ups, Punch, Push Ups, Rock Climbing Indoor, Rope Climbing, Rowing, Salsa Spins, Skate Boarding, Skiing, Skijet, Soccer Juggling, Swing, Tai Chi, Tennis Swing, Throw Discus, Trampoline Jumping, Volleyball Spiking, Walking with a dog, Yo Yo}. The color of the class labels specifies which predefined action type they belong to.

Figure 2. 101 actions included in UCF101 shown with one sample frame. The color of frame borders specifies to which action type they belong: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, Sports.

The following 51 new classes are introduced in UCF101: {Apply Eye Makeup, Apply Lipstick, Archery, Baby Crawling, Balance Beam, Band Marching, Basketball Dunk, Blow Drying Hair, Blowing Candles, Body Weight Squats, Bowling, Boxing-Punching Bag, Boxing-Speed Bag, Brushing Teeth, Cliff Diving, Cricket Bowling, Cricket Shot, Cutting In Kitchen, Field Hockey Penalty, Floor Gymnastics, Frisbee Catch, Front Crawl, Haircut, Hammering, Hammer Throw, Handstand Pushups, Handstand Walking, Head Massage, Ice Dancing, Knitting, Long Jump, Mopping Floor, Parallel Bars, Playing Cello, Playing Daf, Playing Dhol, Playing Flute, Playing Sitar, Rafting, Shaving Beard, Shot put, Sky Diving, Soccer Penalty, Still Rings, Sumo Wrestling, Surfing, Table Tennis Shot, Typing, Uneven Bars, Wall Pushups, Writing On Board}. Fig. 2 shows a sample frame for each action class of UCF101.

Figure 3. Number of clips per action class. The distribution of clip durations is illustrated by the colors. (Vertical axis: Number of Clips; legend: > 10.0 Sec, 5.0 - 10.0 Sec, 2.0 - 5.0 Sec, 0.0 - 2.0 Sec.)

Clip Groups: The clips of one action class are divided into 25 groups which contain 4-7 clips each. The clips in one group share some common features, such as the background or actors.

The bar chart of Fig. 3 shows the number of clips in each class. The colors on each bar illustrate the durations of different clips included in that class. The chart shown in Fig. 4 illustrates the average clip length (green) and total duration of clips (blue) for each action class.

The videos were downloaded from YouTube [2] and the irrelevant ones were manually removed. All clips have a fixed frame rate and resolution of 25 FPS and 320×240 respectively. The videos are saved as .avi files compressed using the DivX codec available in the k-lite package [1]. The audio is preserved for the clips of the new 51 actions. Table 1 summarizes the characteristics of the dataset.

Actions             101
Clips               13320
Groups per Action   25
Clips per Group     4-7
Mean Clip Length    7.21 sec
Total Duration      1600 mins
Min Clip Length     1.06 sec
Max Clip Length     71.04 sec
Frame Rate          25 fps
Resolution          320×240
Audio               Yes (51 actions)

Table 1. Summary of Characteristics of UCF101
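The figures in Table 1 can be cross-checked against each other with a few lines of arithmetic. The sketch below only reuses numbers reported in the text; the variable names are illustrative and not part of the dataset release.

```python
# Values copied from Table 1.
NUM_ACTIONS = 101
NUM_CLIPS = 13320
GROUPS_PER_ACTION = 25
CLIPS_PER_GROUP = (4, 7)        # minimum and maximum clips per group
MEAN_CLIP_LENGTH_SEC = 7.21

# Total duration implied by the clip count and the mean clip length.
total_sec = NUM_CLIPS * MEAN_CLIP_LENGTH_SEC
total_min = total_sec / 60
total_hours = total_sec / 3600

# Range of dataset sizes that the group structure allows.
min_clips = NUM_ACTIONS * GROUPS_PER_ACTION * CLIPS_PER_GROUP[0]
max_clips = NUM_ACTIONS * GROUPS_PER_ACTION * CLIPS_PER_GROUP[1]

print(f"total duration: {total_min:.1f} min ({total_hours:.1f} h)")
print(f"clip count allowed by group structure: {min_clips} to {max_clips}")
```

The implied total of roughly 1600.6 minutes (about 26.7 hours) agrees with the reported 1600 minutes and 27 hours, and the reported 13320 clips lies inside the 10100 to 17675 range that 25 groups of 4-7 clips per class permit.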

Figure 4. Total time of videos for each class is illustrated using the blue bars. The average length of the clips for each action is depicted in green. (Axis: Time (sec); series: Total Time, Average Clip Duration.)

Naming Convention: The zipped file of the dataset (available at http://crcv.ucf.edu/data/UCF101.php) includes 101 folders, each containing the clips of one action class. The name of each clip has the following form:

v_X_gY_cZ.avi

where X, Y and Z represent the action class label, group and clip number respectively. For instance, v_ApplyEyeMakeup_g03_c04.avi corresponds to clip 4 of group 3 of the action class ApplyEyeMakeup.
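As an illustration of the naming convention, the following minimal sketch splits a clip name into its class label, group and clip number. It assumes the three fields are joined by underscores, as in v_ApplyEyeMakeup_g03_c04.avi, and that class labels contain only letters; `ClipName` and `parse_clip_name` are hypothetical helper names, not part of the dataset release.

```python
import re
from typing import NamedTuple


class ClipName(NamedTuple):
    action: str   # X: action class label
    group: int    # Y: group number
    clip: int     # Z: clip number within the group


# Assumed pattern: v_<Action>_g<group>_c<clip>.avi
_CLIP_RE = re.compile(
    r"^v_(?P<action>[A-Za-z]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$"
)


def parse_clip_name(filename: str) -> ClipName:
    """Split a clip file name into (action, group, clip)."""
    match = _CLIP_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected clip name: {filename!r}")
    return ClipName(match["action"], int(match["group"]), int(match["clip"]))


print(parse_clip_name("v_ApplyEyeMakeup_g03_c04.avi"))
# ClipName(action='ApplyEyeMakeup', group=3, clip=4)
```

Keeping the group number as a separate field is convenient later, because the evaluation protocol partitions clips by group.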

3. Experimental Results

We performed an experiment using the bag-of-words approach, which is widely accepted as a standard action recognition method, to provide baseline results on UCF101.

From each clip, we extracted Harris3D corners (using the implementation by [7]) and computed 162-dimensional HOG/HOF descriptors for each. We clustered a randomly selected set of 100,000 space-time interest points (STIP) using k-means to build the codebook. The size of our codebook is k=4000, which has been shown to yield good results over a wide range of datasets. The descriptors were assigned to their closest video words using a nearest neighbor classifier, and each clip was represented by a 4000-dimensional histogram of its words. Utilizing a leave-one-group-out 25-fold cross validation scenario, an SVM was trained using the histogram vectors of the training folds. We employed a nonlinear multiclass SVM with histogram intersection kernel and 101 classes, each representing one action. For testing, a similar histogram representation for the query video was computed and classified using the trained SVM. This method yielded an overall accuracy of 44.5%. The confusion matrix for all 101 actions is shown in Fig. 5.
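The word-assignment and kernel steps described above can be sketched on toy data. This is not the authors' implementation: the codebook here has 3 two-dimensional words instead of 4000 words over 162-dimensional descriptors, the L1 normalisation of the histogram is an assumption (the text does not say how histograms were normalised), and all function names are illustrative. Feature extraction, k-means and SVM training are omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda k: squared_distance(descriptor, codebook[k]),
    )


def bow_histogram(descriptors: Sequence[Vector],
                  codebook: Sequence[Vector]) -> List[float]:
    """Histogram of visual-word counts for one clip (L1-normalised: an assumption)."""
    counts = [0.0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_intersection(h1: Vector, h2: Vector) -> float:
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy codebook and two toy clips, each given as a list of local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
clip_a = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]
clip_b = [(0.0, 0.2), (0.2, 0.1), (4.1, 3.8), (4.0, 4.0)]

hist_a = bow_histogram(clip_a, codebook)
hist_b = bow_histogram(clip_b, codebook)
print(hist_a)                                  # [0.25, 0.5, 0.25]
print(hist_b)                                  # [0.5, 0.0, 0.5]
print(histogram_intersection(hist_a, hist_b))  # 0.5
```

Evaluating `histogram_intersection` for every pair of training clips gives the kernel matrix that a kernel SVM would be trained on.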

The accuracies for the predefined action types are: Sports (50.54%), Playing Musical Instrument (37.42%), Human-Object Interaction (38.52%), Body-Motion Only (36.26%), Human-Human Interaction (44.14%). Sports actions achieve the highest accuracy since performing sports typically requires distinctive motions, which makes the classification easier. Moreover, the background in sports clips is generally less cluttered compared to other action types. Unlike Sports actions, Human-Object Interaction clips typically have a highly cluttered background. Additionally, the informative motions typically occupy a small portion of the motions in the clips, which explains the low recognition accuracy of this action type.
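One plausible way to obtain per-type accuracies from clip-level predictions is sketched below. The class-index ranges are taken from the Figure 5 caption; the definition used here (fraction of correctly classified clips among all clips of a type) is an assumption, since the text does not state whether the reported figures are clip-level or averaged over classes. The `results` list is made-up toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# 1-based class-index ranges, as listed in the Figure 5 caption.
TYPE_RANGES = {
    "Sports": (1, 50),
    "Playing Musical Instrument": (51, 60),
    "Human-Object Interaction": (61, 80),
    "Body-Motion Only": (81, 96),
    "Human-Human Interaction": (97, 101),
}


def action_type(class_index: int) -> str:
    """Map a 1-based class index to its action type."""
    for name, (low, high) in TYPE_RANGES.items():
        if low <= class_index <= high:
            return name
    raise ValueError(f"class index out of range: {class_index}")


def accuracy_by_type(results: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """results holds (true_class, predicted_class) pairs, one per clip."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for true_idx, pred_idx in results:
        kind = action_type(true_idx)
        total[kind] += 1
        correct[kind] += int(true_idx == pred_idx)
    return {kind: correct[kind] / total[kind] for kind in total}


# Toy predictions (hypothetical, not results from the paper).
results = [(1, 1), (2, 7), (55, 55), (60, 60), (61, 70), (100, 100)]
print(accuracy_by_type(results))
```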


Dataset          Number of Actions  Clips  Background  Camera Motion  Release Year  Resource
KTH [11]         6                  600    Static      Slight         2004          Actor Staged
Weizmann [3]     9                  81     Static      No             2005          Actor Staged
UCF Sports [10]  9                  182    Dynamic     Yes            2009          TV, Movies
IXMAS [12]       11                 165    Static      No             2006          Actor Staged
UCF11 [6]        11                 1168   Dynamic     Yes            2009          YouTube
HOHA [7]         12                 2517   Dynamic     Yes            2009          Movies
Olympic [8]      16                 800    Dynamic     Yes            2010          YouTube
UCF50 [9]        50                 6681   Dynamic     Yes            2010          YouTube
HMDB51 [5]       51                 6766   Dynamic     Yes            2011          Movies, YouTube, Web
UCF101           101                13320  Dynamic     Yes            2012          YouTube

Table 2. Summary of Major Action Recognition Datasets

We recommend a 25-fold cross validation experimental setup using all the videos in the dataset to keep the reported tests on UCF101 consistent; the baseline results provided in this section were computed using the same scenario.
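A minimal sketch of how the recommended folds could be generated is given below, assuming each clip carries the group number from its file name. With 25 groups this yields 25 folds, each holding out every clip of one group across all classes. Because clips in a group share background or actors, keeping a whole group on one side of the split is a natural way to avoid near-duplicate clips appearing in both training and test sets. The function name and the toy data are illustrative.

```python
from typing import Iterator, List, Tuple


def leave_one_group_out(
    groups: List[int],
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Toy example: 6 clips belonging to 3 groups (the dataset itself has 25 groups).
groups = [1, 1, 2, 2, 3, 3]
folds = list(leave_one_group_out(groups))
for held_out, train, test in folds:
    print(f"group {held_out}: train={train} test={test}")
```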

4. Related Datasets

UCF Sports, UCF11, UCF50 and UCF101 are the four action datasets compiled by UCF, in chronological order; each one includes its precursor. We made two minor modifications in the portion of UCF101 which includes UCF50 videos: the number of groups is fixed to 25 for all the actions, and each group includes up to 7 clips. Table 2 shows a list of existing action recognition datasets with detailed characteristics of each. Note that UCF101 is remarkably larger than the rest.
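The size comparison can be made concrete with the counts listed in Table 2. The short sketch below computes how many times larger UCF101 is than the two next-largest datasets; only the numbers come from the text, the variable names are illustrative.

```python
# (number of actions, number of clips), copied from Table 2.
datasets = {
    "HMDB51": (51, 6766),
    "UCF50": (50, 6681),
    "UCF101": (101, 13320),
}

ucf_actions, ucf_clips = datasets["UCF101"]
ratios = {}
for name in ("HMDB51", "UCF50"):
    actions, clips = datasets[name]
    ratios[name] = (ucf_actions / actions, ucf_clips / clips)
    print(f"UCF101 vs {name}: "
          f"{ratios[name][0]:.2f}x actions, {ratios[name][1]:.2f}x clips")
```

Both ratios come out at about 2 for either dataset, which matches the description of UCF101 as nearly twice the size of the previous largest datasets.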

5. Conclusion

We introduced UCF101, which is the most challenging dataset for action recognition compared to the existing ones. It includes 101 action classes and over 13k clips, which makes it considerably larger than other datasets. UCF101 is composed of unconstrained videos downloaded from YouTube which feature challenges such as poor lighting, cluttered background and severe camera motion. We provided baseline action recognition results on this new dataset using the standard bag-of-words method, with an overall accuracy of 44.5%.

References

[1] K-lite codec package. http://codecguide.com/.
[2] YouTube. http://www.youtube.com/.
[3] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes, 2005. International Conference on Computer Vision (ICCV).
[4] G. Johansson, S. Bergstrom, and W. Epstein. Perceiving events and objects, 1994. Lawrence Erlbaum Associates.
[5] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A large video database for human motion recognition, 2011. International Conference on Computer Vision (ICCV).
[6] J. Liu, J. Luo, and M. Shah. Recognizing realistic actions from videos in the wild, 2009. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] M. Marszałek, I. Laptev, and C. Schmid. Actions in context, 2009. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] J. Niebles, C. Chen, and L. Fei-Fei. Modeling temporal structure of decomposable motion segments for activity classification, 2010. European Conference on Computer Vision (ECCV).
[9] K. Reddy and M. Shah. Recognizing 50 human action categories of web videos, 2012. Machine Vision and Applications Journal (MVAP).
[10] M. Rodriguez, J. Ahmed, and M. Shah. Action MACH: A spatiotemporal maximum average correlation height filter for action recognition, 2008. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach, 2004. International Conference on Pattern Recognition (ICPR).
[12] D. Weinland, E. Boyer, and R. Ronfard. Action recognition from arbitrary views using 3D exemplars, 2007. International Conference on Computer Vision (ICCV).

Figure 5. Confusion table of baseline action recognition results using bag of words approach on UCF101. The drawn lines separate different types of actions; 1-50: Sports, 51-60: Playing Musical Instrument, 61-80: Human-Object Interaction, 81-96: Body-Motion Only, 97-101: Human-Human Interaction.