
Geometric Disentanglement for Generative Latent Shape Models · taumen/pubs/iccv-2019-poster.pdf ·...

Date post: 21-Oct-2020
Transcript
  • Geometric Disentanglement for Generative Latent Shape Models

    Tristan Aumentado-Armstrong, Stavros Tsogkas, Allan Jepson, Sven Dickinson

    University of Toronto; Vector Institute for AI; Samsung AI Center, Toronto

    Factorizing pose & shape

    Background: Intrinsic Geometry

    Geometrically Disentangled VAE: Model Architecture

    Results: Pose-Aware Retrieval

    References

    • Laplace-Beltrami Operator (LBO): $\Delta_g$

    • $\mathrm{LBOSpectrum}(\mathrm{shape}) \approx \hat{\lambda}$:

    - Captures intrinsic geometry (vector signature).

    - Isometry invariant (e.g., ~ignores articulation).
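
For reference, the spectrum named above is the eigenvalue sequence of the LBO. The following is a sketch of the standard definition, assuming the sign convention with non-negative eigenvalues on a closed, connected surface. The truncation length $k$ is an assumption for illustration; the poster does not state how many eigenvalues are used.

```latex
\Delta_g \phi_i = -\lambda_i \phi_i,
\qquad 0 = \lambda_0 \le \lambda_1 \le \lambda_2 \le \dots,
\qquad \lambda = (\lambda_1, \dots, \lambda_k)
```

Because $\Delta_g$ depends only on the metric $g$, an isometric deformation (such as an idealized articulation) leaves every $\lambda_i$ unchanged, which is why the spectrum can serve as an intrinsic signature.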

    [1]: Achlioptas et al., Learning Representations and Generative Models for 3D Point Clouds, ICLR, 2018.

    [2]: Esmaeili et al., Structured Disentangled Representations, AISTATS, 2019.

    Hierarchically Factorized (HF) VAE loss [2]:

    Penalty on Jacobian of each latent group with respect to another:
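
The poster's exact penalty formula did not survive extraction. As a minimal sketch of the general idea, the code below measures how strongly one vector depends on another through the squared Frobenius norm of a Jacobian, estimated by central finite differences. The choice of norm, the function names, and the toy linear map are all assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f at x, shape (len(f(x)), len(x))."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Column j holds the sensitivity of every output to input coordinate j.
        cols.append((f(x + step) - f(x - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def jacobian_penalty(f, x):
    """Squared Frobenius norm of the Jacobian of f at x (assumed penalty form)."""
    J = jacobian_fd(f, x)
    return float(np.sum(J ** 2))

# Hypothetical toy map standing in for "one latent group as a function of another".
A = np.array([[1.0, 2.0], [0.0, 3.0]])

def toy_map(z_group):
    return A @ z_group

penalty = jacobian_penalty(toy_map, np.array([0.5, -1.0]))
```

For a linear map the Jacobian is the matrix itself, so the penalty here equals the sum of squared entries of `A`. A penalty of zero would mean the output group does not respond to the input group at all, which is the behaviour a disentanglement term of this kind pushes towards.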

    VAE Loss Function

    • SMAL and SMPL have separate shape $\beta$ and pose $\theta$ parameters, so we can compute separate retrieval errors for each ($E_\beta$ and $E_\theta$).

    • Ideally, $z_I$ should have high $E_\theta$ and low $E_\beta$; $z_E$ should have high $E_\beta$ and low $E_\theta$.

    • Comparisons: full AE $X$ and VAE $z$ latent vectors.

    Dataset  Error        $X$    $z$    $z_E$  $z_I$
    SMAL     $E_\beta$    0.641  0.743  0.975  0.645
    SMAL     $E_\theta$   0.938  0.983  0.983  0.993
    SMPL     $E_\beta$    0.856  0.922  0.997  0.928
    SMPL     $E_\theta$   0.577  0.726  0.709  0.947
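
The poster does not spell out how $E_\beta$ and $E_\theta$ are computed or normalized. One plausible reading, sketched below under that assumption, is to retrieve each query's nearest neighbour in a chosen latent space and then measure how far apart the two shapes are in a ground-truth parameter space. The function name and the toy data are hypothetical, and the values in the table are not reproduced by this sketch.

```python
import numpy as np

def retrieval_error(latents, params):
    """Mean parameter distance between each query and its nearest latent neighbour."""
    latents = np.asarray(latents, dtype=float)
    params = np.asarray(params, dtype=float)
    # Pairwise squared Euclidean distances in latent space.
    diff = latents[:, None, :] - latents[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a query may not retrieve itself
    nearest = np.argmin(d2, axis=1)
    return float(np.mean(np.linalg.norm(params - params[nearest], axis=1)))

# Toy latent codes that cluster by shape and ignore pose.
latents = [[0.0], [0.1], [5.0], [5.2]]
beta = [[1.0], [1.0], [3.0], [3.0]]    # hypothetical shape parameters
theta = [[0.0], [2.0], [0.0], [2.0]]   # hypothetical pose parameters

e_beta = retrieval_error(latents, beta)
e_theta = retrieval_error(latents, theta)
```

In this toy case the latent space behaves like an ideal intrinsic code: neighbours share shape parameters (low shape error) but differ in pose (high pose error).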

    VAE Operations: Generation/Sampling, Autoencoding

    Failure Modes

    Datasets: point clouds derived from MNIST, DYNA, SMAL, and SMPL.

    Interpolations: visualizing movement in $z_E$ and $z_I$ independently largely shows that the latter controls intrinsic shape, while the former controls pose.

    Latent digit subspace traversal can roughly separate shape and style. Each row traverses the marked set.

    [Figure: digit traversals; rows marked $z$, $z_I$, $z_E$]

    Results: Latent Manipulations

    Latent interpolations between blue-coloured shapes (SMAL & SMPL). Note: upper-right and lower-left shapes are latent pose/deformation transfers. Vertical: movement in $z_E$; horizontal: movement in $z_I$.
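
The grid described in this caption can be sketched as follows: two latent codes are interpolated independently in their extrinsic and intrinsic parts, so the off-diagonal corners combine one shape's $z_E$ with the other's $z_I$. The `[z_E | z_I]` memory layout, the function name, the omission of $z_R$, and the assignment of the first shape to the upper-left corner are assumptions made for illustration; decoding the codes back to point clouds is not shown.

```python
import numpy as np

def interpolation_grid(z_a, z_b, n_e, n_i, steps=4):
    """Grid of latent codes: rows move z_E, columns move z_I.

    Assumes each code is laid out as [z_E | z_I] with lengths n_e and n_i.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    ts = np.linspace(0.0, 1.0, steps)
    grid = np.empty((steps, steps, n_e + n_i))
    for r, t_e in enumerate(ts):        # vertical: movement in z_E
        for c, t_i in enumerate(ts):    # horizontal: movement in z_I
            grid[r, c, :n_e] = (1 - t_e) * z_a[:n_e] + t_e * z_b[:n_e]
            grid[r, c, n_e:] = (1 - t_i) * z_a[n_e:] + t_i * z_b[n_e:]
    return grid

z_a = np.array([0.0, 0.0, 0.0])   # toy code: two extrinsic dims, one intrinsic dim
z_b = np.array([1.0, 1.0, 1.0])
grid = interpolation_grid(z_a, z_b, n_e=2, n_i=1)
```

Here `grid[0, 0]` and `grid[-1, -1]` are the two original codes, while `grid[0, -1]` and `grid[-1, 0]` are the transfer corners that mix the extrinsic part of one code with the intrinsic part of the other.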


    • Enables vision, graphics, & robotics tasks (e.g., pose-invariant recognition, manipulation, constrained inference, retrieval, pose transfer).

    • Spectral methods also factorize pose and shape by separating extrinsic from intrinsic geometry.

    Goal: disentangle extrinsic pose and intrinsic shape in the latent representation of a 3D point cloud, without annotations.

    Changing intrinsic shape: metric-altering deformations; they often change the object class/identity.

    Changing extrinsic pose: often intra-class deformations (rigid transforms, articulation, style-like changes).

    Observed trade-off between reconstruction fidelity, prior matching, and disentanglement.

    [Figure: retrieval examples; columns: Query, Retrieval with $z_E$, Retrieval with $z_I$]

    [Architecture diagram labels]

    Input: Point cloud $P$

    AE Space (PointNet)

    VAE Space: $(z_R, z_E, z_I) =: z$

    Predicted Spectrum

    Rotation

    Output: Reconstructed point cloud $P$

    Similar body shape, different pose ≈ Close in $\hat{\lambda}$ space

    Failure mode examples: unintuitive intermediates, incomplete disentanglement, encoding failure.

    • Model: two-level VAE (as in [1]).

    • Pretrain AE space independently.

    • Disentanglement with constricted dim($z$) forces pose into $z_E$.

    [Figure: encode into $z$ = ($z_R$, $z_E$, $z_I$); panels: vary rigid pose, vary non-rigid pose, vary intrinsic shape]

