ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Date post: 03-Dec-2014
Upload: zukun
Human Action Recognition by Learning Bases of Action Attributes and Parts. Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei. Stanford University
Transcript
Page 1: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Human Action Recognition by Learning Bases of Action Attributes and Parts

Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei

Stanford University

Page 2: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Classification in Still Images

Low-level feature

Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011

Riding bike

Page 3: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Classification in Still Images

Low-level feature

Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011

High-level representation

- Semantic concepts – Attributes

Riding a bike; Sitting on a bike seat; Wearing a helmet; Pedaling the pedals; …

Riding bike

Page 4: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Classification in Still Images

Low-level feature

Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011

High-level representation

- Semantic concepts – Attributes
- Objects

Riding a bike; Sitting on a bike seat; Wearing a helmet; Pedaling the pedals; …

Riding bike

Page 5: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Classification in Still Images

Low-level feature

Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011

High-level representation

- Semantic concepts – Attributes
- Objects
- Human poses

Parts

Riding a bike; Sitting on a bike seat; Wearing a helmet; Pedaling the pedals; …

Riding bike

Page 6: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Classification in Still Images

Low-level feature

Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011

High-level representation

- Semantic concepts – Attributes
- Objects
- Human poses
- Contexts of attributes & parts

Parts

Riding a bike; Sitting on a bike seat; Wearing a helmet; Pedaling the pedals; …

Riding

Riding bike

Page 7: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Classification in Still Images

Low-level feature

Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011

High-level representation

- Semantic concepts – Attributes
- Objects
- Human poses
- Contexts of attributes & parts

Parts

Riding a bike; Wearing a helmet; Pedaling the pedals; Sitting on a bike seat

Farhadi et al., 2009; Lampert et al., 2009; Berg et al., 2010; Parikh & Grauman, 2011

Gupta et al., 2009; Yao & Fei-Fei, 2010; Torresani et al., 2010; Li et al., 2010

Yang et al., 2010; Maji et al., 2011; Liu et al., 2011

Incorporate human knowledge; more understanding of image content; more discriminative classifier.

Riding bike

Page 8: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Outline

• Intuition: Action Attributes and Parts

• Algorithm: Learning Bases of Attributes and Parts

• Experiments: PASCAL VOC & Stanford 40 Actions

• Conclusion

Page 9: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Outline

• Intuition: Action Attributes and Parts

• Algorithm: Learning Bases of Attributes and Parts

• Experiments: PASCAL VOC & Stanford 40 Actions

• Conclusion

Page 10: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

semantic descriptions of human actions

Page 11: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

semantic descriptions of human actions

Riding bike vs. not riding bike

Discriminative classifier, e.g. SVM

Lampert et al., 2009; Berg et al., 2010

Page 12: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

A pre-trained detector

Object Bank, Li et al., 2010; Poselet, Bourdev & Malik, 2009

Page 13: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

Attribute classification

Object detection

Poselet detection

a: Image feature vector

Page 14: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

Attribute classification

Object detection

Poselet detection

a: Image feature vector

Action bases Φ

Page 15: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

a: Image feature vector

Action bases Φ

Page 16: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

a: Image feature vector

Action bases Φ

Page 17: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

Action bases Φ

Bases coefficients w

a: Image feature vector

a ≈ Φw

Page 18: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

Action bases Φ

Bases coefficients w

a: Image feature vector

a ≈ Φw

• Sparse

• Encodes context

• Robust to initially weak detections
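As a rough illustration of the relation a ≈ Φw, the sketch below builds a feature vector from a handful of active bases and checks how well Φw matches a noisy copy of it. The dimension 191 (14 + 27 + 150) and the 400 bases echo numbers quoted elsewhere in the slides; the random bases, the number of active coefficients, and the noise level are assumptions made purely for illustration, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

P, M = 191, 400                      # feature dimension, number of bases
Phi = rng.standard_normal((P, M))    # hypothetical action bases, one per column

# Sparse coefficient vector: only a few bases are active for this image.
w = np.zeros(M)
active = rng.choice(M, size=5, replace=False)
w[active] = rng.uniform(0.5, 1.5, size=5)

# The feature vector is (approximately) a combination of the active bases.
a_clean = Phi @ w
a_noisy = a_clean + 0.01 * rng.standard_normal(P)   # stand-in for weak detections

rel_err = np.linalg.norm(a_noisy - Phi @ w) / np.linalg.norm(a_noisy)
n_active = int(np.count_nonzero(w))
print(f"active bases: {n_active} of {M}, relative error: {rel_err:.4f}")
```

Only the few non-zero entries of w need to be stored or passed to a classifier, which is one way to read the "sparse" bullet above.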

Page 19: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Outline

• Intuition: Action Attributes and Parts

• Algorithm: Learning Bases of Attributes and Parts

• Experiments: PASCAL VOC & Stanford 40 Actions

• Conclusion

Page 20: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

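Bases of Attr. & Parts: Training

a ≈ Φw

• Input: $a_1, \dots, a_N$

• Output: $\Phi = [\Phi_1, \dots, \Phi_M]$ and sparse $W = [w_1, \dots, w_N]$

• Jointly estimate $\Phi$ and $W$:

$$\min_{\Phi, W} \sum_{i=1}^{N} \left( \frac{1}{2} \| a_i - \Phi w_i \|_2^2 + \lambda \| w_i \|_1 \right) \quad \text{s.t. } \forall j,\ \| \Phi_j \|_1 + \frac{\gamma}{2} \| \Phi_j \|_2^2 \le 1$$

  - $\frac{1}{2} \| a_i - \Phi w_i \|_2^2$: accurate approximation
  - $\lambda \| w_i \|_1$: L1 regularization, sparsity of $W$
  - constraint on $\Phi_j$: elastic net, sparsity of $\Phi$ [Zou & Hastie, 2005]

• Optimization: stochastic gradient descent.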
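The training problem above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the coefficients are computed with ISTA (a swapped-in solver), the bases are updated with a plain stochastic gradient step, and the elastic-net constraint is enforced by rescaling each column rather than by an exact projection. All function names, the step size `lr`, and the values of `lam` and `gamma` are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(a, Phi, lam, n_iter=50):
    """ISTA for min_w 0.5 * ||a - Phi w||_2^2 + lam * ||w||_1."""
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + 1e-12)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w - step * (Phi.T @ (Phi @ w - a)), step * lam)
    return w

def constraint_values(Phi, gamma):
    """Per-column value of ||Phi_j||_1 + gamma/2 * ||Phi_j||_2^2."""
    return np.abs(Phi).sum(axis=0) + 0.5 * gamma * np.sum(Phi ** 2, axis=0)

def rescale_into_constraint(Phi, gamma):
    """Shrink columns that violate the constraint (a simplification:
    rescaling is not the exact Euclidean projection)."""
    Phi = Phi.copy()
    for j in range(Phi.shape[1]):
        l1 = np.abs(Phi[:, j]).sum()
        q = 0.5 * gamma * np.sum(Phi[:, j] ** 2)
        if l1 + q > 1.0:
            # Positive root of q * t^2 + l1 * t - 1 = 0, in a stable form.
            Phi[:, j] *= 2.0 / (l1 + np.sqrt(l1 ** 2 + 4.0 * q))
    return Phi

def learn_bases(A, M, lam=0.1, gamma=0.1, lr=0.05, epochs=5, seed=0):
    """Alternate sparse coding of one sample with a gradient step on Phi."""
    rng = np.random.default_rng(seed)
    P, N = A.shape
    Phi = rescale_into_constraint(rng.standard_normal((P, M)), gamma)
    W = np.zeros((M, N))
    for _ in range(epochs):
        for i in rng.permutation(N):
            W[:, i] = sparse_code(A[:, i], Phi, lam)
            residual = Phi @ W[:, i] - A[:, i]
            Phi = rescale_into_constraint(
                Phi - lr * np.outer(residual, W[:, i]), gamma)
    return Phi, W

rng = np.random.default_rng(0)
A = rng.random((20, 30))      # 30 hypothetical feature vectors of dimension 20
Phi, W = learn_bases(A, M=8)
print(Phi.shape, W.shape, constraint_values(Phi, 0.1).max())
```

Each column of the returned `Phi` satisfies the elastic-net constraint by construction, and `W` holds one coefficient vector per training image.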

Page 21: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

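Bases of Attr. & Parts: Testing

a ≈ Φw

• Input: $a$, $\Phi = [\Phi_1, \dots, \Phi_M]$

• Output: sparse $w$

• Estimate $w$:

$$\min_{w} \frac{1}{2} \| a - \Phi w \|_2^2 + \lambda \| w \|_1$$

  - $\frac{1}{2} \| a - \Phi w \|_2^2$: accurate approximation
  - $\lambda \| w \|_1$: L1 regularization, sparsity of $w$

• Optimization: stochastic gradient descent.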
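For a test image, the bases are fixed and only the coefficients are estimated, which is a standard L1-regularized least-squares (lasso) problem. The sketch below solves it with ISTA, a proximal-gradient method used here as a stand-in for the stochastic gradient descent mentioned on the slide; the function names and the value of `lam` are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def estimate_coefficients(a, Phi, lam, n_iter=200):
    """ISTA for min_w 0.5 * ||a - Phi w||_2^2 + lam * ||w||_1 with Phi fixed."""
    # Step size 1/L, where L is the Lipschitz constant of the smooth term.
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + 1e-12)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        gradient = Phi.T @ (Phi @ w - a)
        w = soft_threshold(w - step * gradient, step * lam)
    return w

# Sanity check: with identity bases the solution is soft_threshold(a, lam).
a = np.array([1.0, -0.05, 0.3])
w = estimate_coefficients(a, np.eye(3), lam=0.1)
print(w)
```

With identity bases, the small middle entry is driven exactly to zero, which is the sparsity effect of the L1 term.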

Page 22: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Outline

• Intuition: Action Attributes and Parts

• Algorithm: Learning Bases of Attributes and Parts

• Experiments: PASCAL VOC & Stanford 40 Actions

• Conclusion

Page 23: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

PASCAL VOC 2010 Action Dataset

Figure credit: Ivan Laptev

• 9 classes, 50-100 trainval / testing images per class

• 14 attributes – trained from the trainval images

• 27 objects – taken from Li et al., NIPS 2010

• 150 poselets – taken from Bourdev & Malik, ICCV 2009
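One plausible way to assemble the image feature vector a from these three sources is simple concatenation, giving 14 + 27 + 150 = 191 dimensions. The sketch below assumes each classifier or detector contributes one score per image; the function name and the random scores are placeholders, and the slides do not say how the scores are normalized.

```python
import numpy as np

# Counts quoted on the slide for the PASCAL VOC 2010 experiments.
N_ATTRIBUTES, N_OBJECTS, N_POSELETS = 14, 27, 150

def build_feature_vector(attribute_scores, object_scores, poselet_scores):
    """Concatenate per-image scores into one vector a (assumed layout)."""
    parts = [np.asarray(attribute_scores, dtype=float),
             np.asarray(object_scores, dtype=float),
             np.asarray(poselet_scores, dtype=float)]
    expected = (N_ATTRIBUTES, N_OBJECTS, N_POSELETS)
    for part, size in zip(parts, expected):
        if part.shape != (size,):
            raise ValueError(f"expected {size} scores, got shape {part.shape}")
    return np.concatenate(parts)

rng = np.random.default_rng(0)
a = build_feature_vector(rng.random(N_ATTRIBUTES),
                         rng.random(N_OBJECTS),
                         rng.random(N_POSELETS))
print(a.shape)
```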

Page 24: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

VOC 2010: Classification Result

[Bar chart: average precision (y-axis, 0.3 to 0.9) per class (x-axis, 1 to 9): Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking. Legend: Our method, use “a”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP.]

Page 25: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

VOC 2010: Classification Result

[Bar chart: average precision (y-axis, 0.3 to 0.9) per class (x-axis, 1 to 9): Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking. Legend: Our method, use “a”; Our method, use “w”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP.]

Page 26: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

VOC 2010: Analysis of Bases

[Figure: 400 action bases, with entries labeled as attributes, objects, and poselets, shown next to the per-class average precision bar chart (legend: Our method, use “a”; Our method, use “w”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP).]

Page 27: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

VOC 2010: Analysis of Bases

[Figure: 400 action bases, with entries labeled as attributes, objects, and poselets, shown next to the per-class average precision bar chart (legend: Our method, use “a”; Our method, use “w”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP).]

Page 28: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

VOC 2010: Analysis of Bases

[Figure: 400 action bases, with entries labeled as attributes, objects, and poselets, shown next to the per-class average precision bar chart (legend: Our method, use “a”; Our method, use “w”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP).]

Page 29: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

VOC 2010: Control Experiment

[Bar chart: mean average precision (y-axis, 0.45 to 0.7) for the feature combinations A+O+P, A+O, A+P, and O+P. Legend: Use “a”; Use “w”.]

A: attribute; O: object; P: poselet

Page 30: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

PASCAL VOC 2011 Result

• Our method ranks first in nine out of ten classes in comp10.

Class                Others’ best in comp9   Others’ best in comp10   Our method
Jumping              71.6                    59.5                     66.7
Phoning              50.7                    31.3                     41.1
Playing instrument   77.5                    45.6                     60.8
Reading              37.8                    27.8                     42.2
Riding bike          88.8                    84.4                     90.5
Riding horse         90.2                    88.3                     92.2
Running              87.9                    77.6                     86.2
Taking photo         25.7                    31.0                     28.8
Using computer       58.9                    47.4                     63.5
Walking              59.5                    57.6                     64.2

Page 31: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

PASCAL VOC 2011 Result

Class                Others’ best in comp9   Others’ best in comp10   Our method
Jumping              71.6                    59.5                     66.7
Phoning              50.7                    31.3                     41.1
Playing instrument   77.5                    45.6                     60.8
Reading              37.8                    27.8                     42.2
Riding bike          88.8                    84.4                     90.5
Riding horse         90.2                    88.3                     92.2
Running              87.9                    77.6                     86.2
Taking photo         25.7                    31.0                     28.8
Using computer       58.9                    47.4                     63.5
Walking              59.5                    57.6                     64.2

• Our method achieves the best performance in five out of ten classes if we consider both comp9 and comp10.
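The two summary claims (nine of ten classes against comp10, five of ten against both competitions) can be checked directly from the table. The short script below copies the table values and counts the wins; the variable names are illustrative.

```python
# (class, others' best in comp9, others' best in comp10, our method),
# copied from the PASCAL VOC 2011 result table.
results = [
    ("Jumping",            71.6, 59.5, 66.7),
    ("Phoning",            50.7, 31.3, 41.1),
    ("Playing instrument", 77.5, 45.6, 60.8),
    ("Reading",            37.8, 27.8, 42.2),
    ("Riding bike",        88.8, 84.4, 90.5),
    ("Riding horse",       90.2, 88.3, 92.2),
    ("Running",            87.9, 77.6, 86.2),
    ("Taking photo",       25.7, 31.0, 28.8),
    ("Using computer",     58.9, 47.4, 63.5),
    ("Walking",            59.5, 57.6, 64.2),
]

# Classes where our method beats the best other comp10 entry.
wins_comp10 = [name for name, comp9, comp10, ours in results if ours > comp10]

# Classes where our method beats the best other entry in either competition.
wins_overall = [name for name, comp9, comp10, ours in results
                if ours > max(comp9, comp10)]

print(len(wins_comp10), "of", len(results), "classes against comp10")
print(len(wins_overall), "of", len(results), "classes against comp9 and comp10")
print("overall wins:", ", ".join(wins_overall))
```

The only class lost within comp10 is Taking photo (28.8 vs. 31.0); the overall wins are Reading, Riding bike, Riding horse, Using computer, and Walking.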

Page 32: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Stanford 40 Actions

Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper

http://vision.stanford.edu/Datasets/40actions.html

• 40 action classes, 9532 real-world images from Google, Flickr, etc.

Page 33: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Stanford 40 Actions

Example classes: Riding bike, Fixing bike

Page 34: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Stanford 40 Actions

Example classes: Writing on board, Writing on paper

Page 35: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Stanford 40 Actions

Example classes: Drinking, Gardening, Smoking cigarette

Page 36: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Stanford 40 Actions: Result

• We use 45 attributes, 81 objects, and 150 poselets.

• Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline.

[Bar chart: average precision (y-axis, 0 to 0.9) for LLC and Our Method on each class, in x-axis order: Riding a horse, Rowing a boat, Riding a bike, Climbing mountain, Jumping, Cleaning the floor, Walking a dog, Shooting an arrow, Playing guitar, Fishing, Holding up an umbrella, Running, Throwing a frisbee, Writing on a board, Watching TV, Cutting trees, Feeding a horse, Gardening, Writing on a book, Repairing a car, Looking through a microscope, Cutting vegetables, Blowing bubbles, Playing violin, Brushing teeth, Repairing a bike, Pushing a cart, Using a computer, Applauding, Cooking, Smoking cigarette, Looking through a telescope, Washing dishes, Drinking, Calling, Waving hands, Pouring liquid, Reading a book, Taking photos, Texting message.]

Page 37: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Stanford 40 Actions: Result

[Bar chart: average precision (y-axis, 0 to 0.9) for LLC and Our Method on each of the 40 classes, from Riding a horse through Texting message.]

Page 38: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Outline

• Intuition: Action Attributes and Parts

• Algorithm: Learning Bases of Attributes and Parts

• Experiments: PASCAL VOC & Stanford 40 Actions

• Conclusion

Page 39: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Conclusion

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

Action bases Φ

Bases coefficients w

a: Image feature vector

a ≈ Φw

Page 40: ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

Acknowledgement

