Learning to Rotate 3D Objects: Weakly-supervised Disentangling with Recurrent Transformations

Jimei Yang (1,3), Scott Reed (2), Ming-Hsuan Yang (1) and Honglak Lee (2)
(1) UC Merced, (2) U Mich Ann Arbor, (3) Adobe Research

3D Vision from A Single Image

• 3D object recognition, Asthana et al. ICCV 2011
• 3D object manipulation, Banerjee et al. SIGGRAPH 2014

3D Vision from A Single Image

• Challenges:
– Partial observability inherent in projecting a 3D object onto the image space, and
– Ill-posedness of inferring object shape and pose
• Classic approach
– 3D object reconstruction
• Our approach …

Human 3D Vision

https://psychlopedia.wikispaces.com/mental+rotation

Human 3D Vision

• Mental rotation of three-dimensional objects, Shepard and Metzler, Science, 1971
– People have the ability to rotate two objects in their consciousness to decide whether they are actually the same object seen from different perspectives
– The greater the angle by which an object is rotated, the longer it takes people to identify it

3D Vision from A Single Image

• Solution inspired by mental rotation
– Jointly model 3D recognition and view synthesis
– Learn distributed representations (neural networks) instead of recovering a 3D model

Deep Convolutional Networks

• Convolutional networks have demonstrated a remarkable ability to recognize and generate objects
– Discriminative CNNs, Krizhevsky et al. NIPS 2012
– Generative CNNs, Dosovitskiy et al. CVPR 2015

Discriminative CNNs

• Given an image, the discriminative CNNs produce high-level feature representations

Generative CNNs

• Given high-level abstract representations, the generative CNNs produce object images

Action-driven Convolutional Encoder-Decoder Networks

• Input: an image of a 3D object
• Output: its rotated view
• Latent units: pose and identity
• Action units: [100], [010], [001]
• Transforming autoencoder, Hinton et al.
• What-where autoencoder, Zhao et al.

[Figure: example views at 15° and 30° with action unit [001]]
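The input, latent and action units listed on this slide can be sketched as a small forward pass. This is a toy illustration only: the linear `W_enc` and `W_dec` stand in for the convolutional encoder and decoder, the per-action matrices `W_act` are an assumed parameterisation, and all sizes are arbitrary. None of these names come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, ID_DIM, POSE_DIM, N_ACTIONS = 64, 8, 4, 3  # illustrative sizes only

# Hypothetical linear stand-ins for the convolutional encoder and decoder.
W_enc = rng.normal(scale=0.1, size=(ID_DIM + POSE_DIM, IMG_DIM))
W_dec = rng.normal(scale=0.1, size=(IMG_DIM, ID_DIM + POSE_DIM))
# One pose-to-pose matrix per action unit (an assumed parameterisation).
W_act = rng.normal(scale=0.1, size=(N_ACTIONS, POSE_DIM, POSE_DIM))


def encode(image):
    """Map a flattened image to (identity units, pose units)."""
    z = np.tanh(W_enc @ image)
    return z[:ID_DIM], z[ID_DIM:]


def apply_action(z_pose, action):
    """Transform the pose units with the matrix selected by a one-hot action."""
    W = np.einsum("k,kij->ij", action, W_act)
    return np.tanh(W @ z_pose)


def decode(z_id, z_pose):
    """Render a flattened image from identity and pose units."""
    return W_dec @ np.concatenate([z_id, z_pose])


def rotate_once(image, action):
    """Encode, transform only the pose units, then decode the rotated view."""
    z_id, z_pose = encode(image)
    z_pose_new = apply_action(z_pose, action)  # identity units are left untouched
    return decode(z_id, z_pose_new), z_id, z_pose_new


image = rng.normal(size=IMG_DIM)
action = np.array([0.0, 0.0, 1.0])  # the [001] action unit
rotated, z_id, z_pose_new = rotate_once(image, action)
```

The point of the sketch is the split: the action only ever touches the pose units, so whatever the decoder changes in the output view must come from the pose code.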

Recurrent Convolutional Encoder-Decoder Networks

• To enable long-term rotations, we allow the pose units to be recurrent (45° → {[001], [001], [001]})
• And fix the identity units across all the views in a sequence
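A minimal sketch of the recurrent idea, under stated assumptions: the pose is reduced to a 2D unit vector, each action unit is assumed to select a fixed rotation matrix, and the 15° step size is inferred from the example of 45° being three [001] actions. The mapping of [100], [010], [001] to -15°, 0°, +15° is an assumption made for this toy, not something the slides state.

```python
import numpy as np

STEP_DEG = 15.0  # assumed angular step per action, inferred from the 45° example


def rotation_matrix(deg):
    """2D rotation matrix for an angle given in degrees."""
    rad = np.deg2rad(deg)
    return np.array([[np.cos(rad), -np.sin(rad)],
                     [np.sin(rad),  np.cos(rad)]])


# Assumed meaning of the three action units in this toy: -15°, 0°, +15°.
ACTION_MATRICES = np.stack([rotation_matrix(-STEP_DEG),
                            rotation_matrix(0.0),
                            rotation_matrix(STEP_DEG)])


def actions_for_angle(angle_deg):
    """Decompose a rotation into a sequence of one-hot action units."""
    n_steps = int(round(abs(angle_deg) / STEP_DEG))
    index = 2 if angle_deg > 0 else 0
    return [np.eye(3)[index] for _ in range(n_steps)]


def rollout(z_id, z_pose, actions):
    """Apply the actions one by one; the identity units are shared by every view."""
    views = []
    for action in actions:
        z_pose = np.einsum("k,kij->ij", action, ACTION_MATRICES) @ z_pose
        views.append((z_id, z_pose.copy()))
    return views


z_id = np.array([0.3, -1.2, 0.7])  # arbitrary identity code
z_pose = np.array([1.0, 0.0])      # toy pose: unit vector at 0°
actions = actions_for_angle(45.0)
views = rollout(z_id, z_pose, actions)
final_angle = np.rad2deg(np.arctan2(views[-1][1][1], views[-1][1][0]))
```

In the real model each `(z_id, z_pose)` pair would be passed to the decoder to render one view of the sequence.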

Curriculum Training

• We present sequences of continuous views from the same object (training as pose manifold traversal)
• We gradually increase the difficulty of training by increasing the trajectory length (Bengio et al. ICML 2009)

• One-step rotation: RNN1
• Two-step rotation: RNN2
• Four-step rotation: RNN4
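The curriculum can be sketched as a schedule that samples runs of consecutive views and lengthens them stage by stage. The function names, the batch count and the non-circular sampling of view indices are assumptions for illustration. The actual training step (forward pass, loss, update) is omitted.

```python
import numpy as np


def sample_trajectory(num_views, length, rng):
    """Return length + 1 consecutive view indices: the input view plus `length` steps."""
    start = rng.integers(0, num_views - length)  # upper bound is exclusive
    return np.arange(start, start + length + 1)


def curriculum(stages, num_views, batches_per_stage, rng):
    """Yield (stage_length, trajectory) pairs, moving from short to long trajectories."""
    for length in sorted(stages):
        for _ in range(batches_per_stage):
            yield length, sample_trajectory(num_views, length, rng)


rng = np.random.default_rng(0)
schedule = list(curriculum(stages=[1, 2, 4], num_views=7, batches_per_stage=2, rng=rng))
```

Each yielded trajectory would supply one training sequence: the first index is the input view and the remaining indices are the targets for successive rotation steps.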

Multi-PIE Faces

• Data (Gross et al. IVC 2010)
– 337 people, 7 viewpoints from -45° to 45°
– 200 people for training, 137 people for test
– 80x60x3 pixels per image
• Models: RNN1, RNN2, RNN4 and RNN6
• Comparisons
– 3D face morphable model for pose normalization (Zhu et al. CVPR 2015)
– Discriminative CNN for face recognition

3D View Synthesis for Novel Objects

Comparing to 3D Face Morphable Model for Pose Normalization

• Zhu et al., CVPR 2015

Cross-View Face Recognition

• In the test set, one view serves as the gallery and the other views as probes
• Results are measured by matching success rates at different angle offsets between gallery and probe views
• Compared to discriminative CNN
– Train a 5-layered CNN with face identity labels
– Extract features from the layer before labels for matching
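The gallery/probe protocol can be illustrated with a small matching routine. Cosine similarity with nearest-neighbour matching is an assumption here, since the slides do not name the similarity measure. The feature vectors and identities are toy values.

```python
import numpy as np


def matching_success_rate(gallery_feats, gallery_ids, probe_feats, probe_ids):
    """Fraction of probes whose most similar gallery entry has the same identity."""
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    similarity = p @ g.T  # cosine similarity, shape (n_probes, n_gallery)
    predicted = gallery_ids[np.argmax(similarity, axis=1)]
    return float(np.mean(predicted == probe_ids))


# Toy data: three identities with one gallery feature each, and four probes.
gallery_feats = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
gallery_ids = np.array([0, 1, 2])
probe_feats = np.array([[0.9, 0.1], [0.2, 0.8], [-0.7, 0.1], [0.6, 0.5]])
probe_ids = np.array([0, 1, 2, 1])
rate = matching_success_rate(gallery_feats, gallery_ids, probe_feats, probe_ids)
```

To report results per angle offset, the same function would be called once per offset, with the probes restricted to views at that offset from the gallery view.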

Cross-View Face Recognition

Average success rates:
– RNN: 93.3
– CNN: 92.6

3D Chairs

• Data (Aubry et al. CVPR 2014)
– 809 chair instances, rendered from 31 azimuth angles and 2 elevation angles
– 500 instances for training, 309 instances for test
– 64x64x3 pixels per image
• Models: RNN1, RNN2, RNN4, RNN8, RNN16
• KNN baseline
– Extract "fc7" features using VGG-16 CNN to retrieve K nearest neighbors in the training set
– Given the target rotation angles, calculate the means of corresponding views of retrieved K nearest neighbors
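The KNN baseline can be sketched as retrieval followed by averaging. This toy assumes one feature vector per training instance, Euclidean distance, and random arrays standing in for the "fc7" features and rendered views. Those choices are illustrative and not taken from the slides.

```python
import numpy as np


def knn_view_synthesis(query_feat, train_feats, train_views, target_view, k):
    """Average the target view of the k training instances nearest to the query.

    train_feats: (n_instances, feat_dim) features of the training images
    train_views: (n_instances, n_views, H, W, C) rendered views per instance
    """
    distances = np.linalg.norm(train_feats - query_feat, axis=1)  # Euclidean (assumed)
    nearest = np.argsort(distances)[:k]
    return train_views[nearest, target_view].mean(axis=0)


rng = np.random.default_rng(0)
train_feats = rng.normal(size=(10, 16))            # stand-in for "fc7" features
train_views = rng.random(size=(10, 31, 4, 4, 3))   # tiny stand-in images, 31 views each
query_feat = train_feats[3] + 0.01                 # query placed close to instance 3
prediction = knn_view_synthesis(query_feat, train_feats, train_views, target_view=5, k=1)
blended = knn_view_synthesis(query_feat, train_feats, train_views, target_view=5, k=3)
```

With `k=1` the output is simply the retrieved neighbour's view, and larger `k` blends several chairs, which is one plausible reason such a baseline produces blurrier results than a learned decoder.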

Comparing RNNs

• Perform 16-step rotations using RNNs at different curriculum training stages

Comparing RNNs with KNNs

[Figure: four pairs of rows comparing KNN and RNN results]

Cross-View Chair Recognition

• The same setup as the Multi-PIE dataset
• Compared to VGG-16 CNN

Average success rates:
– RNN: 56.8
– CNN: 52.2

Class Interpolation and View Synthesis

• Given two chair images of the same view from different classes,
– the encoder is used to compute their identity units $z^1_{id}$, $z^2_{id}$ and pose units $z^1_{pose}$, $z^2_{pose}$,
– the interpolation is given by $b = [0.0, 0.2, \ldots, 0.8, 1.0]$: $z_{id} = b\, z^1_{id} + (1-b)\, z^2_{id}$, $z_{pose} = b\, z^1_{pose} + (1-b)\, z^2_{pose}$,
– $z_{id}$ and $z_{pose}$ are fed into the recurrent decoder to render novel images
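The interpolation step is a convex combination of the two encodings. A short sketch, assuming the codes are plain vectors and using toy values, is given below. The function name and the toy codes are hypothetical.

```python
import numpy as np


def interpolate_codes(z1_id, z1_pose, z2_id, z2_pose, num_steps=6):
    """Return a list of (b, z_id, z_pose) for b evenly spaced in [0, 1]."""
    results = []
    for b in np.linspace(0.0, 1.0, num_steps):
        z_id = b * z1_id + (1.0 - b) * z2_id
        z_pose = b * z1_pose + (1.0 - b) * z2_pose
        results.append((float(b), z_id, z_pose))
    return results


# Toy codes: two different identities observed at the same pose.
z1_id, z1_pose = np.array([1.0, 0.0]), np.array([0.5])
z2_id, z2_pose = np.array([0.0, 1.0]), np.array([0.5])
codes = interpolate_codes(z1_id, z1_pose, z2_id, z2_pose)
```

With six steps, `b` takes the values 0.0, 0.2, ..., 1.0, so the first entry reproduces the second chair's code and the last entry the first chair's code. Each pair would then be decoded, and rotated by the recurrent decoder, to render the novel views.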

Class Interpolation and View Synthesis

• Each column: pose manifold traversal
• Each row: style manifold traversal

Concluding Remarks

• High-quality 3D view synthesis is achieved with general deep convolutional networks from a single image
• Disentangled representations are learned with recurrent transformations without class labels
– Cross-view object recognition
– Chair interpolation
• Curriculum strategy helps with RNN training

