Shallow vs. Deep Sum-Product Networks

Sai Kiran Burle

Electrical and Computer Engineering, University of Illinois, Urbana-Champaign

References

Shallow vs. Deep Sum-Product Networks. Olivier Delalleau, Yoshua Bengio


Outline

1 Introduction
  Motivation
  Sum-product Network
  Main Results

2 Outline of proof


Motivation

Deep learning algorithms are based on multiple levels of representation, corresponding to a deep circuit.

It has been suggested that deep architectures are more powerful in the sense of being able to represent highly varying functions more efficiently.

In the context of this presentation, "efficiency" refers to:

Lower memory usage.
Lower computation cost.



There are multiple claims that deep sum-product networks represent polynomials more efficiently than shallow ones, but no proof.

This work exhibits families of circuits computing polynomials for which a deep architecture can be exponentially more efficient than a shallow one.




Sum-Product Network

Definition 1

A Sum-Product Network is a network composed of units that compute either the product of their inputs or a weighted sum of their inputs (where the weights are strictly positive).

Definition 2

The depth of a Sum-Product Network is the length of the longest path from an input unit to the output unit.



Definition 3

A Sum-Product Network is called shallow if the network contains only a single hidden layer (i.e., a depth equal to two).

Definition 4

A Sum-Product Network is called deep if the network contains more than one hidden layer (i.e., a depth of at least three).
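
Definitions 1 to 4 can be made concrete with a small sketch. The class and function names below (Input, Sum, Product, evaluate, depth, is_shallow, is_deep) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Input:
    """An input unit, identified by a variable name."""
    name: str


@dataclass
class Product:
    """A unit computing the product of its inputs (Definition 1)."""
    children: List["Node"]


@dataclass
class Sum:
    """A unit computing a weighted sum with strictly positive weights (Definition 1)."""
    children: List["Node"]
    weights: List[float]

    def __post_init__(self) -> None:
        if len(self.weights) != len(self.children):
            raise ValueError("one weight per child is required")
        if any(w <= 0 for w in self.weights):
            raise ValueError("weights must be strictly positive")


Node = Union[Input, Product, Sum]


def evaluate(node: Node, values: Dict[str, float]) -> float:
    """Evaluate the network rooted at `node` for the given input values."""
    if isinstance(node, Input):
        return values[node.name]
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= evaluate(child, values)
        return result
    return sum(w * evaluate(c, values) for w, c in zip(node.weights, node.children))


def depth(node: Node) -> int:
    """Length of the longest path from an input unit to `node` (Definition 2)."""
    if isinstance(node, Input):
        return 0
    return 1 + max(depth(child) for child in node.children)


def is_shallow(node: Node) -> bool:
    """Single hidden layer, i.e. depth equal to two (Definition 3)."""
    return depth(node) == 2


def is_deep(node: Node) -> bool:
    """More than one hidden layer, i.e. depth of at least three (Definition 4)."""
    return depth(node) >= 3


# Example: x1*x2 + x3*x4 has one hidden layer of product units and a sum output.
x1, x2, x3, x4 = (Input(s) for s in ("x1", "x2", "x3", "x4"))
shallow_net = Sum([Product([x1, x2]), Product([x3, x4])], [1.0, 1.0])
print(depth(shallow_net), is_shallow(shallow_net))
```

The network is stored as a tree for brevity; a unit shared by several parents would simply be re-evaluated, which is fine for a sketch but not for counting units.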




Main results

Theorem 5

A certain class of functions F of n inputs can be represented using a deep network with $O(n)$ units, whereas a shallow network would require $\Omega(2^{\sqrt{n}})$ units.

Theorem 6

For a certain class of functions G of n inputs, a deep sum-product network of depth k can represent them with $O(nk)$ units, whereas a shallow network would require $\Omega((n-1)^k)$ units.




Outline of proof

The family F

F is a class of functions with n inputs, built from deep sum-product networks that alternate layers of product and sum units with two inputs each.


The basic idea we use here is that composing layers (i.e. using a deep architecture) is equivalent to using a factorized representation of the polynomial function computed by the network.

Such a factorized representation can be exponentially more compact than its expansion as a sum of products (which can be associated with a shallow network with product units in its hidden layer and a sum unit as output).
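
The gap between the factorized and the expanded form can be checked on small instances. The sketch below assumes a concrete reading of family F: n is a power of 4, the first layer holds product units, and each unit takes two adjacent units of the previous layer as inputs. The helper names (poly_mul, poly_weighted_sum, build_family_f) and the weights lam and mu are hypothetical. Polynomials are stored as dictionaries mapping exponent tuples to coefficients.

```python
from itertools import product as cartesian


def poly_mul(p, q):
    """Multiply two polynomials given as {exponent tuple: coefficient}."""
    out = {}
    for (ma, ca), (mb, cb) in cartesian(p.items(), q.items()):
        mono = tuple(a + b for a, b in zip(ma, mb))
        out[mono] = out.get(mono, 0.0) + ca * cb
    return out


def poly_weighted_sum(terms):
    """Weighted sum of polynomials; `terms` is a list of (weight, polynomial)."""
    out = {}
    for weight, poly in terms:
        for mono, coeff in poly.items():
            out[mono] = out.get(mono, 0.0) + weight * coeff
    return out


def is_power_of_four(n):
    while n > 1 and n % 4 == 0:
        n //= 4
    return n == 1


def build_family_f(n, lam=1.0, mu=1.0):
    """Expand the output of a deep network of the assumed family F shape.

    Returns (expanded polynomial, number of units in the deep network).
    """
    if n < 4 or not is_power_of_four(n):
        raise ValueError("this sketch assumes n is a power of 4, n >= 4")
    # Layer 0: one polynomial per input variable x_i.
    layer = [{tuple(1 if j == i else 0 for j in range(n)): 1.0} for i in range(n)]
    units = 0
    use_product = True  # the first hidden layer holds product units
    while len(layer) > 1:
        pairs = list(zip(layer[0::2], layer[1::2]))
        if use_product:
            layer = [poly_mul(a, b) for a, b in pairs]
        else:
            layer = [poly_weighted_sum([(lam, a), (mu, b)]) for a, b in pairs]
        units += len(layer)
        use_product = not use_product
    return layer[0], units


for n in (4, 16, 64):
    expanded, units = build_family_f(n)
    print(f"n={n}: deep units={units}, products in expansion={len(expanded)}")
```

Each product in the expansion would need its own hidden unit in a shallow network with a sum output, so the second number printed is the size the shallow network cannot go below, while the first is the size of the deep network.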


Lemma 7

The number of products in the sum computed in the output unit of a network computing a function in F is $2^{\sqrt{n}-1}$.

Lemma 8

Any shallow sum-product network computing f ∈ F must have a "sum" unit as output.

Lemma 9

Any shallow sum-product network computing f ∈ F must have only multiplicative units in its hidden layer.
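
The count in Lemma 7 can be recovered with a short recurrence. This is a sketch under assumptions not spelled out on the slides: n = 4^i inputs, 2i layers, product units in the odd layers and sum units in the even layers, and units of the same layer depending on disjoint sets of inputs, so that no two products merge. The symbol m_j (number of products in the expansion of a unit of layer j) is introduced here for illustration.

```latex
% m_j: number of products in the expansion of a unit in layer j
\begin{align*}
m_1      &= 1, \\
m_{2j}   &= 2\, m_{2j-1}
         && \text{(a sum unit adds two expansions with no common product)}, \\
m_{2j+1} &= m_{2j}^{2}
         && \text{(a product unit multiplies two expansions over disjoint variables)}, \\
m_{2j}   &= 2^{2^{j}-1}
         && \text{(induction: } 2\,\bigl(2^{2^{j}-1}\bigr)^{2} = 2^{2^{j+1}-1}\text{)}.
\end{align*}
% With n = 4^i inputs the output unit sits in layer 2i and 2^i = \sqrt{n}, hence
\[
m_{2i} = 2^{2^{i}-1} = 2^{\sqrt{n}-1}.
\]
```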


Corollary 10

Any shallow sum-product network computing f ∈ F must have at least $2^{\sqrt{n}-1}$ hidden units.

This proves Theorem 5.
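
To get a feel for how fast the gap grows, the two counts can be tabulated. The shallow figure is the lower bound of Corollary 10. The deep figure n - 1 is an assumption of this sketch: it counts the two-input units of a network with n/2, n/4, ..., 1 units per layer when n is a power of 4. The function names are hypothetical.

```python
import math


def deep_units_family_f(n):
    """Units of the assumed deep network: n/2 + n/4 + ... + 1 = n - 1."""
    return n - 1


def shallow_lower_bound_family_f(n):
    """Lower bound 2^(sqrt(n) - 1) on hidden units of a shallow network."""
    root = math.isqrt(n)
    if root * root != n:
        raise ValueError("this sketch assumes n is a perfect square")
    return 2 ** (root - 1)


for n in (4, 16, 64, 256, 1024):
    deep = deep_units_family_f(n)
    shallow = shallow_lower_bound_family_f(n)
    print(f"n={n:5d}  deep units={deep:5d}  shallow hidden units >= {shallow}")
```

For n = 1024 the deep network has about a thousand units, while the bound on the shallow network already exceeds two billion.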


The family G

Definition 11

Networks in family G also alternate sum and product layers, but their units have as inputs all units from the previous layer except one.


Lemma 12

The output g of a sum-product network in G, with n inputs and k layers, when expanded as a sum of products, contains all products of variables of the form $\prod_{t=1}^{n} x_t^{\alpha_t}$ such that $\alpha_t \in \mathbb{N}$ and $\sum_t \alpha_t = (n-1)^k$.

Corollary 13

Any shallow sum-product network computing g ∈ G must have at least $(n-1)^k$ hidden units.

This proves Theorem 6.
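
Lemma 12 can be checked by brute-force expansion on tiny instances. The sketch below rests on several assumptions that the slides do not state: every hidden layer has n units, the first hidden layer holds sum units, unit j of a layer takes every unit of the previous layer except unit j, all weights equal 1, a final sum unit adds the last product layer, and k counts the (sum layer, product layer) pairs. The function names are hypothetical.

```python
from itertools import product as cartesian
from math import comb


def poly_mul(p, q):
    """Multiply two polynomials given as {exponent tuple: coefficient}."""
    out = {}
    for (ma, ca), (mb, cb) in cartesian(p.items(), q.items()):
        mono = tuple(a + b for a, b in zip(ma, mb))
        out[mono] = out.get(mono, 0) + ca * cb
    return out


def poly_sum(polys):
    """Sum of polynomials with unit weights (an assumption of this sketch)."""
    out = {}
    for poly in polys:
        for mono, coeff in poly.items():
            out[mono] = out.get(mono, 0) + coeff
    return out


def build_family_g(n, k):
    """Expand the output of the assumed family G network with k layer pairs."""
    layer = [{tuple(1 if j == i else 0 for j in range(n)): 1} for i in range(n)]
    for _ in range(k):
        # Sum layer: unit j adds every unit of the previous layer except unit j.
        layer = [poly_sum([layer[m] for m in range(n) if m != j]) for j in range(n)]
        # Product layer: unit j multiplies every unit of the sum layer except unit j.
        next_layer = []
        for j in range(n):
            prod = {(0,) * n: 1}
            for m in range(n):
                if m != j:
                    prod = poly_mul(prod, layer[m])
            next_layer.append(prod)
        layer = next_layer
    # Output unit: sums all units of the last layer.
    return poly_sum(layer)


for n, k in ((3, 1), (4, 1), (3, 2)):
    g = build_family_g(n, k)
    degree = (n - 1) ** k
    all_monomials = comb(degree + n - 1, n - 1)  # monomials of that degree in n variables
    print(f"n={n}, k={k}: degree={degree}, products={len(g)} of {all_monomials}")
```

In each of these cases the expansion contains every monomial of total degree (n - 1)^k, which is what Lemma 12 states, and that number of monomials is never below the (n - 1)^k of Corollary 13.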


Summary

Some deep sum-product networks with n inputs and depth log n can represent with $O(n)$ units what would require $\Omega(2^{\sqrt{n}})$ units for a depth-2 network.

Some deep sum-product networks with n inputs and depth k can represent with $O(nk)$ units what would require $\Omega((n-1)^k)$ units for a depth-2 network.

Future work

Finding more general parameterizations of functions leading to similar results would be an interesting topic.

Another open question is whether it is possible to represent such functions only approximately.


Questions?


Date post:	10-Jul-2020