
Generic Deep Networks with Wavelet Scattering

Édouard Oyallon, Stéphane Mallat and Laurent Sifre

DATA, Département Informatique, École Normale Supérieure

Scattering network as a deep architecture

▸We build a two-layer network, without any training, that achieves performance similar to a convolutional network pretrained on ImageNet (Alex CNN [1]).

▸Via groups acting on images, the scattering network creates a representation Φ invariant to:
▸ rotation
▸ translation.

▸Other properties:
▸ discriminability of colors
▸ stability to small deformations [2].

Deep scattering representation

▸A scattering transform is the cascade of linear wavelet transforms Wn and modulus non-linearities ∣.∣:

x ──∣W1∣──▸ U1x ──∣W2∣──▸ U2x ──▸ ...
│            │              │
Pooling      Pooling        Pooling
▾            ▾              ▾
S0x          S1x            S2x

Pooling is either Average-Pooling (Avg) or Max-Pooling (Max), computed on blocks of size 2^J.
▸ The first linear operator is a convolutional wavelet transform along space:

U1x(u, θ1, j1) = ∣x ⋆ ψ1_{θ1,j1}∣(u)

[Figure: complex wavelets ψ1_{θ1,j1} across scales j1 and angles θ1; phase is given by color, amplitude by contrast.]

▸ The second linear operator is a wavelet transform along angles and space, applied to U1 and computed with a separable convolution ⍟:

U2x = ∣U1x ⍟ ψ2_{θ2,j2,k2}∣, where the separable wavelet is

ψ2_{θ2,j2,k2}(u, θ1) = ψ1_{θ2,j2}(u) ψ̂_{k2}(θ1)

[Figure: convolution of U1x using separable wavelets.]

▸ Scattering coefficients are then

Sx = {S0x, S1x, S2x}
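The cascade producing Sx = {S0x, S1x, S2x} can be sketched end-to-end in a few dozen lines. The following is a minimal illustrative numpy version, not the poster's implementation: the Gabor-like filters, the small parameters (J = 3, 4 angles), and the restriction of the second order to spatial wavelets (omitting the angular wavelet ψ̂_{k2}) are all simplifying assumptions.

```python
import numpy as np

def gabor_filter(N, j, theta, xi=3 * np.pi / 4):
    """Complex oriented band-pass filter at scale 2**j and angle theta (illustrative)."""
    x, y = np.meshgrid(np.arange(N) - N // 2, np.arange(N) - N // 2, indexing="ij")
    sigma = 0.8 * 2 ** j
    u = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    psi = env * np.exp(1j * xi * u / 2 ** j)
    psi -= env * psi.sum() / env.sum()          # zero mean (admissibility)
    return psi / np.abs(psi).sum()

def conv2(x, h):
    """Circular 2-D convolution computed via FFT."""
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(np.fft.ifftshift(h)))

def avg_pool(x, block):
    """Average over non-overlapping block x block windows (the 2**J pooling)."""
    n = x.shape[0] // block
    return x[: n * block, : n * block].reshape(n, block, n, block).mean(axis=(1, 3))

def scattering(x, J=3, L=4):
    """Order-0, 1 and 2 scattering coefficients with average pooling."""
    N, block = x.shape[0], 2 ** J
    S0 = [avg_pool(x, block)]                               # S0x
    U1, S1 = {}, []
    for j1 in range(J):
        for t1 in range(L):
            u = np.abs(conv2(x, gabor_filter(N, j1, np.pi * t1 / L)))  # |W1|
            U1[(j1, t1)] = u
            S1.append(avg_pool(u, block))                   # S1x
    S2 = []
    for (j1, t1), u in U1.items():                          # |W2| on top of U1
        for j2 in range(j1 + 1, J):                         # only j2 > j1 carries energy
            for t2 in range(L):
                v = np.abs(conv2(u, gabor_filter(N, j2, np.pi * t2 / L)))
                S2.append(avg_pool(v, block))               # S2x
    return np.stack(S0), np.stack(S1), np.stack(S2)

x = np.random.rand(32, 32)
S0, S1, S2 = scattering(x)
print(S0.shape, S1.shape, S2.shape)  # (1, 4, 4) (12, 4, 4) (48, 4, 4)
```

With J = 3 and L = 4 there are J·L = 12 first-order paths and, keeping only j2 > j1, 48 second-order paths, each pooled down to a 4 × 4 spatial map.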

Color discriminability

Image x is separated into 3 color channels xY, xU, xV. The final image representation is the aggregation of the scattering coefficients of each channel:

Φx = {SxY,SxU,SxV}

[Figure: a cat image and its Y, U and V channels.]
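The per-channel aggregation Φx = {SxY, SxU, SxV} is a simple split-and-concatenate step. Below is a hedged numpy sketch: the poster only states that the image is split into Y, U, V channels, so the exact conversion matrix (standard BT.601) and the `scattering_vector` placeholder are assumptions for illustration.

```python
import numpy as np

# RGB -> YUV (BT.601) conversion matrix -- an assumed choice, since the
# poster does not specify which YUV convention is used.
RGB_TO_YUV = np.array([
    [ 0.299,  0.587,  0.114],
    [-0.147, -0.289,  0.436],
    [ 0.615, -0.515, -0.100],
])

def rgb_to_yuv(img):
    """img: (H, W, 3) RGB array -> (H, W, 3) YUV array."""
    return img @ RGB_TO_YUV.T

def phi(img, scattering_vector):
    """Φx: concatenate the scattering coefficients of the Y, U, V channels.

    `scattering_vector` is a placeholder for any map from a 2-D channel to a
    1-D coefficient vector (e.g. a flattened scattering transform Sx).
    """
    yuv = rgb_to_yuv(img)
    return np.concatenate([scattering_vector(yuv[..., c]) for c in range(3)])

img = np.random.rand(64, 64, 3)
desc = phi(img, scattering_vector=lambda ch: np.array([ch.mean()]))  # toy stand-in
print(desc.shape)  # (3,)
```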

Classifier

{x1, ..., xM} → Φ → {Φx1, ..., ΦxM} → Standardization → Linear SVM

▸Computation of the representations.
▸Standardization: normalization of the mean and variance of each coefficient.
▸Fed to a linear-kernel SVM.
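The classifier stage above can be sketched in a few lines. The standardization matches the poster's description; the tiny Pegasos-style sub-gradient solver is a stand-in for the off-the-shelf linear-kernel SVM actually used, and the toy data stand in for the descriptors {Φx1, ..., ΦxM}.

```python
import numpy as np

def standardize(train, test):
    """Normalize each feature to zero mean, unit variance (statistics from train)."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0) + 1e-8       # guard against constant features
    return (train - mu) / sd, (test - mu) / sd

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style solver for a binary linear SVM (labels in {-1, +1}).

    A minimal stand-in: in practice an off-the-shelf solver (e.g. liblinear)
    would be used on the scattering descriptors.
    """
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (X[i] @ w)
            w *= 1 - eta * lam          # shrink (regularization step)
            if margin < 1:              # hinge-loss sub-gradient step
                w += eta * y[i] * X[i]
    return w

# Toy separable data standing in for the scattering descriptors.
rng = np.random.default_rng(0)
Xtr = np.vstack([rng.normal(-2, 1, (50, 10)), rng.normal(2, 1, (50, 10))])
ytr = np.array([-1] * 50 + [1] * 50)
Xte = np.vstack([rng.normal(-2, 1, (20, 10)), rng.normal(2, 1, (20, 10))])
yte = np.array([-1] * 20 + [1] * 20)

Xtr_s, Xte_s = standardize(Xtr, Xte)
w = train_linear_svm(Xtr_s, ytr)
acc = np.mean(np.sign(Xte_s @ w) == yte)
print(f"test accuracy: {acc:.2f}")
```

Note that the test set is standardized with the training statistics, so no test information leaks into the representation.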

Numerical results

5 splits on Caltech-101 and Caltech-256. Image inputs: 256 × 256, J = 6, 8 angles; final descriptor size is 1.1 × 10^5.

[Figure: samples from Caltech-101 and Caltech-256.]

Caltech-101 (101 classes, 10^4 images)

Architecture     Layers  Accuracy
Alex CNN         1       44.8 ± 0.8
Scattering, Avg  1       54.6 ± 1.2
Scattering, Max  1       55.0 ± 0.6
LLC              2       73.4
Alex CNN         2       66.2 ± 0.5
Scattering, Avg  2       68.9 ± 0.5
Scattering, Max  2       68.7 ± 0.5
Alex CNN         7       85.5 ± 0.4

Caltech-256 (256 classes, 3 × 10^4 images)

Architecture     Layers  Accuracy
Alex CNN         1       24.6 ± 0.4
Scattering, Avg  1       23.5 ± 0.5
Scattering, Max  1       25.6 ± 0.2
LLC              2       47.7
Alex CNN         2       39.6 ± 0.3
Scattering, Avg  2       39.0 ± 0.5
Scattering, Max  2       37.2 ± 0.5
Alex CNN         7       72.6 ± 0.2

Comparison with other architectures

▸LLC [3] is a two-layer architecture: SIFT followed by unsupervised dictionary learning (specific to the dataset).

▸Scattering performs similarly to Alex CNN at 2 layers [4].

Main differences with Alex CNN

▸No learning step
▸Avg ≈ Max
▸No contrast normalization
▸Complex wavelets instead of real filters
▸Modulus (ℓ2-pooling) instead of ReLU
▸Separable filters (tensor structure).

Open questions

▸Predefined vs. learned:

∣W1∣ ... ∣Wn∣ ∣Wn+1∣ ... ∣WN∣ SVM
(hardcoded)   (learned)

Until which depth n ≤ N can we avoid learning?

▸Max-Pooling vs. Avg-Pooling

Conclusion & future work

▸Scattering network provides an efficient initialization of the first two layers of a network.

▸Optimizing scale invariance.
▸Designing a third layer?

Contacts

▸Website of the team: http://www.di.ens.fr/data/
▸Édouard Oyallon, [email protected]

References

[1] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.

[2] S. Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.

[3] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 3360–3367. IEEE, 2010.

[4] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. arXiv preprint arXiv:1311.2901, 2013.
