TWO-STREAM CONVOLUTIONAL NETWORKS FOR DYNAMIC TEXTURE SYNTHESIS
Matthew Tesfaldet, Marcus A. Brubaker - York University; Konstantinos G. Derpanis - Ryerson University
{mtesfald, mab}@eecs.yorku.ca, [email protected]

[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. NIPS 2015.
[2] Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor S. Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. ICML 2016.
[3] Konstantinos G. Derpanis and Richard P. Wildes. Spacetime texture representation and recognition based on a spatiotemporal orientation analysis. PAMI 2012.
[4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.

[Figure: Dynamic Texture Synthesis and Dynamics Style Transfer results. Panels: fireplace_1 (original) vs. fireplace_1 (synthesized); fish (original), synthesized with the appearance stream only vs. with both streams; and a style-transfer example combining an appearance target with a dynamics target to produce the synthesized output.]

Dynamics Stream: network trained for optical flow prediction in an appearance-invariant manner. Models the dynamics of the input dynamic texture.

[Figure: Dynamics Stream architecture. Per the diagram labels: the input (N×2×H×W×1 frame pairs) is encoded at multiple scales (downsampled ×½ twice); each encode applies contrast norm → conv (2×11×11, 32 filters) → rectify (max) → pool (5×5, stride 1) → conv (1×1, 64 filters) → L1 norm. The per-scale features are resampled (×2, ×4), channel-concatenated, and decoded to flow via conv (3×3, 64 filters), conv (1×1, 2 filters), and a ReLU; training minimizes the average endpoint error (aEPE) against the target flow. The appearance stream is VGG-19 [4].]
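As a concrete illustration, here is a minimal PyTorch sketch of one scale of the dynamics-stream encoder, reconstructed from the diagram labels above (2×11×11 conv with 32 filters, max rectification, 5×5 pool with stride 1, 1×1 conv with 64 filters, L1 normalization). The contrast-normalization step, the exact form of the rectification, and all hyperparameters not shown in the figure are assumptions; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicsEncoder(nn.Module):
    """One scale of the dynamics-stream encoder (sketch)."""
    def __init__(self):
        super().__init__()
        # Spatiotemporal conv over a frame pair: 2 (time) x 11 x 11 (space), 32 filters.
        self.conv1 = nn.Conv3d(1, 32, kernel_size=(2, 11, 11), padding=(0, 5, 5))
        # 1x1 conv mixing the 32 channels into 64 features.
        self.conv2 = nn.Conv3d(32, 64, kernel_size=1)

    def forward(self, x):
        # x: (N, 1, 2, H, W) -- N grayscale frame pairs (the "Nx2xHxWx1" input,
        # reordered to PyTorch's channels-first layout).
        x = self.conv1(x)                        # -> (N, 32, 1, H, W)
        x = F.relu(x)                            # "rectify max": assumed max(0, .)
        x = F.max_pool3d(x, kernel_size=(1, 5, 5), stride=1,
                         padding=(0, 2, 2))      # 5x5 spatial pool, stride 1
        x = self.conv2(x)                        # -> (N, 64, 1, H, W)
        # L1-normalize across channels, discarding per-pixel contrast so the
        # representation is (approximately) appearance-invariant.
        return x / (x.abs().sum(dim=1, keepdim=True) + 1e-8)
```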

Overall Architecture

Iteratively coerce an initial Gaussian noise sequence such that its spatiotemporal statistics from each stream match those of an input dynamic texture. This is done by optimizing (3) w.r.t. the spacetime volume (initially noise).
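A minimal sketch of this optimization loop, assuming hypothetical helpers appearance_loss and dynamics_loss that implement Eqs. (1) and (2) below (see the sketch after the equations), weighted as in Eq. (3). The optimizer choice and all hyperparameters here are assumptions, not the authors' settings.

```python
import torch

def synthesize(appearance_loss, dynamics_loss, T, H, W,
               alpha=1.0, beta=1.0, max_iter=500):
    # Spacetime volume of T RGB frames, initialized to i.i.d. Gaussian noise.
    x = torch.randn(T, 3, H, W, requires_grad=True)
    opt = torch.optim.LBFGS([x], max_iter=max_iter)

    def closure():
        opt.zero_grad()
        # Eq. (3): weighted sum of the per-stream losses.
        loss = alpha * appearance_loss(x) + beta * dynamics_loss(x)
        loss.backward()
        return loss

    opt.step(closure)  # iteratively coerce the noise toward the target statistics
    return x.detach()
```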

(1) $\mathcal{L}_{\text{appearance}} = \sum_{l=1}^{L_{\text{app}}} \frac{1}{T} \sum_{t=1}^{T} \left\lVert G_t^{l} - \bar{G}^{l} \right\rVert_F^2$, where $L_{\text{app}}$ is the number of ConvNet layers being used in the appearance stream, $T$ is the number of generated frames, $\lVert \cdot \rVert_F$ is the Frobenius norm, $G_t^{l}$ is the Gram matrix that models the synthesized texture appearance, and $\bar{G}^{l}$ models the target texture appearance averaged across time.

(2) $\mathcal{L}_{\text{dynamics}} = \sum_{l=1}^{L_{\text{dyn}}} \frac{1}{T} \sum_{t=1}^{T} \left\lVert D_t^{l} - \bar{D}^{l} \right\rVert_F^2$, where $L_{\text{dyn}}$ is the number of ConvNet layers being used in the dynamics stream, $D_t^{l}$ is the Gram matrix that models the synthesized texture dynamics, and $\bar{D}^{l}$ models the target texture dynamics averaged across time.

(3) $\mathcal{L}_{\text{DT}} = \alpha \, \mathcal{L}_{\text{appearance}} + \beta \, \mathcal{L}_{\text{dynamics}}$, the overall objective, with weights $\alpha$ and $\beta$ balancing the two streams.
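As an illustration of Eq. (1), a minimal sketch of the Gram statistics and the appearance loss. Here features, layers, and target_grams are hypothetical names (the activations would come from VGG-19 [4] in the appearance stream), and the normalization convention is an assumption.

```python
import torch

def gram(F):
    # F: (C, H*W) activations at one layer; the Gram matrix F F^T captures
    # pairwise channel correlations, i.e., the appearance statistics.
    return (F @ F.t()) / F.shape[1]

def appearance_loss(x, features, layers, target_grams):
    # x: (T, 3, H, W) synthesized frames. target_grams[l] is the target
    # texture's Gram at layer l, already averaged across its frames.
    T = x.shape[0]
    loss = x.new_zeros(())
    for l in layers:
        for t in range(T):
            F_t = features(x[t:t + 1], l).squeeze(0).flatten(1)  # (C, H*W)
            # Squared Frobenius norm of the Gram difference, averaged over frames.
            loss = loss + (gram(F_t) - target_grams[l]).pow(2).sum() / T
    return loss
```

In the optimization loop sketched earlier, this would be wrapped (e.g., with functools.partial) so that it takes only the spacetime volume x; the dynamics loss of Eq. (2) has the same form, with the dynamics-stream activations in place of the VGG-19 features.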

1. Motivated by the recent successes in texture synthesis using ConvNets [1, 2], we present a novel, two-stream model of dynamic texture synthesis to capture both appearance and dynamics.

2. A novel network architecture (motivated by the spacetime oriented energy model of [3]) designed to compute optical flow in an appearance-invariant manner, serving as the dynamics stream of our dynamic texture synthesis model.

3. A two-stream model that enables dynamics style transfer, where the appearance and dynamics from different sources can be combined to generate a novel texture (see the usage sketch below).
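Dynamics style transfer falls out of the same optimization: reusing the hypothetical synthesize helper sketched earlier, one would simply precompute the appearance statistics from one texture and the dynamics statistics from another. All names below are assumptions for illustration.

```python
# Hypothetical usage: appearance from one texture, dynamics from another.
# build_appearance_loss / build_dynamics_loss would precompute the target
# Gram matrices from each source texture (names are assumptions).
app_loss = build_appearance_loss(fish_frames)
dyn_loss = build_dynamics_loss(fireplace_frames)
result = synthesize(app_loss, dyn_loss, T=12, H=256, W=256)
```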

We introduce a two-stream model for dynamic texture synthesis based on pre-trained convolutional networks (ConvNets) that target two independent tasks: object recognition and optical flow prediction. Given an input dynamic texture, the object recognition ConvNet models the per-frame appearance of the input texture, while the optical flow ConvNet models its dynamics. To generate a novel texture, a noise sequence is optimized to match the feature statistics from each stream of the input texture.

