Shallow vs. Deep Sum-Product Networks
Sai Kiran Burle
Electrical and Computer Engineering
University of Illinois, Urbana-Champaign
Sai Kiran Burle Shallow vs. Deep Sum-Product Networks 1 / 19
References
Shallow vs. Deep Sum-Product Networks. Olivier Delalleau, Yoshua Bengio
Outline
1 Introduction
    Motivation
    Sum-product Network
    Main Results
2 Outline of proof
Motivation
Deep learning algorithms are based on multiple levels of representation, corresponding to a deep circuit.
It has been suggested that deep architectures are more powerful, in the sense of being able to represent highly-varying functions more efficiently.
In the context of this presentation, "efficiency" refers to:
    Lower memory usage.
    Lower computation cost.
It has often been claimed that polynomials can be represented more efficiently by deep sum-product networks than by shallow ones, but these claims came without proof.
This work exhibits families of circuits for which a deep architecture can be exponentially more efficient than a shallow one, in the context of polynomials.
Sum-Product Network
Definition 1
A Sum-Product Network is a network composed of units that compute either the product of their inputs or a weighted sum of their inputs (where the weights are strictly positive).
Definition 2
The depth of a Sum-Product Network is the length of the longest path from an input unit to the output unit.
Definition 3
A Sum-Product Network is called shallow if the network contains only a single hidden layer (i.e. a depth equal to two).
Definition 4
A Sum-Product Network is called deep if the network contains more than one hidden layer (i.e. a depth of at least three).
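Definitions 1 to 4 can be made concrete with a small sketch. The class names `Input`, `Sum`, `Product` and the helpers `evaluate`, `depth`, `is_shallow` are hypothetical, chosen only for illustration; depth is counted in edges, which is one reading of Definition 2.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Union


@dataclass
class Input:
    name: str  # name of the input variable this unit reads


@dataclass
class Sum:
    children: List["Unit"]
    weights: List[float]  # strictly positive, as required by Definition 1

    def __post_init__(self) -> None:
        if len(self.weights) != len(self.children):
            raise ValueError("one weight per child is required")
        if any(w <= 0 for w in self.weights):
            raise ValueError("sum weights must be strictly positive")


@dataclass
class Product:
    children: List["Unit"]


Unit = Union[Input, Sum, Product]


def evaluate(unit: Unit, values: Dict[str, float]) -> float:
    """Compute the value of a unit for a given assignment of the inputs."""
    if isinstance(unit, Input):
        return values[unit.name]
    child_values = [evaluate(child, values) for child in unit.children]
    if isinstance(unit, Sum):
        return sum(w * v for w, v in zip(unit.weights, child_values))
    return prod(child_values)


def depth(unit: Unit) -> int:
    """Length, in edges, of the longest path from an input unit to this unit."""
    if isinstance(unit, Input):
        return 0
    return 1 + max(depth(child) for child in unit.children)


def is_shallow(output: Unit) -> bool:
    """Definition 3: a single hidden layer, i.e. depth equal to two."""
    return depth(output) == 2


x1, x2, x3, x4 = (Input(f"x{i}") for i in range(1, 5))
values = {"x1": 1.0, "x2": 2.0, "x3": 3.0, "x4": 4.0}

# Shallow: product units in the hidden layer, a sum unit as output.
shallow = Sum([Product([x1, x2]), Product([x3, x4])], [1.0, 1.0])

# Deep: sum -> product -> sum, so the longest path has three edges.
deep = Sum([Product([Sum([x1, x2], [1.0, 1.0]), x3]), x4], [1.0, 1.0])
```

Here `shallow` computes x1 x2 + x3 x4 with depth 2, while `deep` computes (x1 + x2) x3 + x4 with depth 3 and is therefore deep in the sense of Definition 4.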
Main results
Theorem 5
A certain class of functions F of n inputs can be represented using a deep network with $O(n)$ units, whereas a shallow network would require at least $2^{\sqrt{n}-1}$ units, i.e. on the order of $2^{\sqrt{n}}$.
Theorem 6
For a certain class of functions G of n inputs, a deep sum-product network of depth k can represent the function with $O(nk)$ units, whereas a shallow network would require at least $(n-1)^k$ units.
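To get a feel for the gap, the sketch below tabulates the two pairs of bounds, reading the shallow bounds as 2^(sqrt(n) - 1) and (n - 1)^k. The function names are hypothetical. The count n - 1 for the deep network in F is an assumption: it holds when every unit has two inputs so that each layer halves the previous one. For G, n * k is only the order of magnitude, with constants dropped.

```python
from math import isqrt


def deep_units_f(n: int) -> int:
    # Assumption: two-input units, so layers have n/2, n/4, ..., 1 units,
    # which adds up to n - 1 (consistent with the O(n) bound).
    return n - 1


def shallow_lower_bound_f(n: int) -> int:
    # Lower bound 2^(sqrt(n) - 1); n is assumed to be a perfect square.
    root = isqrt(n)
    if root * root != n:
        raise ValueError("n must be a perfect square")
    return 2 ** (root - 1)


def deep_units_g(n: int, k: int) -> int:
    # Order of magnitude only: the O(nk) bound with constants dropped.
    return n * k


def shallow_lower_bound_g(n: int, k: int) -> int:
    # Lower bound (n - 1)^k.
    return (n - 1) ** k


for n in (16, 64, 256, 1024):
    print(f"F: n={n:5d}  deep={deep_units_f(n):5d}  "
          f"shallow>={shallow_lower_bound_f(n)}")

for n, k in ((10, 2), (10, 5), (10, 10)):
    print(f"G: n={n}, k={k:2d}  deep~{deep_units_g(n, k):4d}  "
          f"shallow>={shallow_lower_bound_g(n, k)}")
```

The deep counts grow linearly in n (or in n and k), while the shallow lower bounds grow exponentially in sqrt(n) or in k.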
Outline of proof
The family F
F is a class of functions with n inputs, built from deep sum-product networks that alternate layers of product and sum units with two inputs each.
The basic idea we use here is that composing layers (i.e. using a deep architecture) is equivalent to using a factorized representation of the polynomial function computed by the network.
Such a factorized representation can be exponentially more compact than its expansion as a sum of products (which can be associated with a shallow network with product units in its hidden layer and a sum unit as output).
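The contrast between the factorized and the expanded form can be checked on small cases. The sketch below is one possible reading of family F, under stated assumptions: n is a power of 4, the first layer is a product layer over disjoint pairs of inputs, layers then alternate, and all sum weights are 1. The names `expand_family_f`, `add`, `multiply` and `variable` are hypothetical. A polynomial is stored as a dictionary from exponent tuples to coefficients.

```python
from collections import defaultdict
from typing import Dict, Tuple

Monomial = Tuple[int, ...]          # exponent of each input variable
Polynomial = Dict[Monomial, float]  # monomial -> positive coefficient


def variable(index: int, n: int) -> Polynomial:
    """Polynomial consisting of the single input variable x_index."""
    return {tuple(int(i == index) for i in range(n)): 1.0}


def add(p: Polynomial, q: Polynomial) -> Polynomial:
    """Sum unit with both weights set to 1 (an assumption)."""
    result: Dict[Monomial, float] = defaultdict(float)
    for poly in (p, q):
        for mono, coeff in poly.items():
            result[mono] += coeff
    return dict(result)


def multiply(p: Polynomial, q: Polynomial) -> Polynomial:
    """Product unit: multiply out every pair of monomials."""
    result: Dict[Monomial, float] = defaultdict(float)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            result[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return dict(result)


def expand_family_f(n: int) -> Tuple[Polynomial, int]:
    """Return (expanded output polynomial, number of units in the deep net)."""
    m = n
    while m > 1 and m % 4 == 0:
        m //= 4
    if n < 4 or m != 1:
        raise ValueError("n must be a power of 4")

    layer = [variable(i, n) for i in range(n)]
    units = 0
    use_product = True  # assumption: the first hidden layer is a product layer
    while len(layer) > 1:
        combine = multiply if use_product else add
        # Each unit reads two neighbouring units of the previous layer.
        layer = [combine(layer[j], layer[j + 1])
                 for j in range(0, len(layer), 2)]
        units += len(layer)
        use_product = not use_product
    return layer[0], units


for n in (4, 16, 64):
    polynomial, units = expand_family_f(n)
    print(f"n={n:2d}: deep units={units:2d}, products in expansion={len(polynomial)}")
```

Under these assumptions the deep network uses n - 1 units, while the number of products in the expanded sum doubles with every additional pair of layers.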
Lemma 7
The number of products in the sum computed in the output unit of a network computing a function in F is $2^{\sqrt{n}-1}$.
Lemma 8
Any shallow sum-product network computing f ∈ F must have a "sum" unit as output.
Lemma 9
Any shallow sum-product network computing f ∈ F must have only multiplicative units in its hidden layer.
Corollary 10
Any shallow sum-product network computing f ∈ F must have at least $2^{\sqrt{n}-1}$ hidden units.
This proves Theorem 5.
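As a worked illustration of how Lemmas 7 to 9 combine, consider the two smallest cases, assuming unit weights and a first layer of products over disjoint pairs of inputs (both assumptions made for this example only). In a shallow network, a hidden product unit reads the inputs directly and so computes a single product of variables; the output sum therefore needs one hidden unit per distinct product in the expansion.

```latex
\begin{align*}
n = 4:\quad  f &= x_1 x_2 + x_3 x_4
  && \text{2 products} = 2^{\sqrt{4}-1} \\
n = 16:\quad f &= (x_1 x_2 + x_3 x_4)(x_5 x_6 + x_7 x_8) \\
               &\quad + (x_9 x_{10} + x_{11} x_{12})(x_{13} x_{14} + x_{15} x_{16})
  && \text{8 products} = 2^{\sqrt{16}-1}
\end{align*}
```

For n = 16 the factorized form uses 15 units, while multiplying out the two brackets in each term gives 4 + 4 = 8 distinct products, each of which needs its own hidden unit in a shallow network.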
The family G
Definition 11
Networks in family G also alternate sum and product layers, but their units have as inputs all units from the previous layer except one.
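A small sketch can expand a network of this kind and list the products it contains. The details below are assumptions made for illustration and are not fixed by Definition 11: every hidden layer has n units, unit j skips unit j of the previous layer, the first hidden layer is a sum layer, k counts (sum, product) layer pairs, all weights are 1, and a single sum unit over the last layer gives the output. The names `expand_family_g`, `all_but` and `monomials_of_degree` are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from itertools import combinations_with_replacement
from typing import Dict, List, Set, Tuple

Monomial = Tuple[int, ...]        # exponent of each input variable
Polynomial = Dict[Monomial, int]  # monomial -> positive coefficient


def add(p: Polynomial, q: Polynomial) -> Polynomial:
    result: Dict[Monomial, int] = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            result[mono] += coeff
    return dict(result)


def multiply(p: Polynomial, q: Polynomial) -> Polynomial:
    result: Dict[Monomial, int] = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            result[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return dict(result)


def all_but(layer: List[Polynomial], skip: int) -> List[Polynomial]:
    """All units of a layer except the one at position `skip`."""
    return [unit for index, unit in enumerate(layer) if index != skip]


def expand_family_g(n: int, k: int) -> Polynomial:
    """Expanded output for n >= 2 inputs and k (sum, product) layer pairs."""
    if n < 2 or k < 0:
        raise ValueError("need n >= 2 and k >= 0")
    layer: List[Polynomial] = [
        {tuple(int(i == j) for i in range(n)): 1} for j in range(n)
    ]
    for _ in range(k):
        layer = [reduce(add, all_but(layer, j)) for j in range(n)]       # sums
        layer = [reduce(multiply, all_but(layer, j)) for j in range(n)]  # products
    return reduce(add, layer)  # assumed single output sum unit


def monomials_of_degree(n: int, degree: int) -> Set[Monomial]:
    """Every exponent tuple over n variables with the given total degree."""
    result: Set[Monomial] = set()
    for combo in combinations_with_replacement(range(n), degree):
        exponents = [0] * n
        for index in combo:
            exponents[index] += 1
        result.add(tuple(exponents))
    return result


g = expand_family_g(3, 1)
print(sorted(g.items()))
```

With this construction the deep network has 2nk hidden units plus the output, in line with an O(nk) count, and the expansion can be compared against `monomials_of_degree(n, (n - 1) ** k)` for small n and k.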
Lemma 12
The output g of a sum-product network in G, with n inputs and k layers, when expanded as a sum of products, contains all products of variables of the form $\prod_{t=1}^{n} x_t^{\alpha_t}$ such that $\alpha_t \in \mathbb{N}$ and $\sum_t \alpha_t = (n-1)^k$.
Corollary 13
Any shallow sum-product network computing g ∈ G must have at least $(n-1)^k$ hidden units.
This proves Theorem 6.
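A worked instance of Lemma 12 for n = 3 and k = 1 may help. It assumes unit weights, a first layer of sums, a second layer of products, and an output that sums the product units; these choices are assumptions made for this example, and the symbols $s_j$ and $p_j$ are introduced here only to name the layers.

```latex
\begin{align*}
s_1 &= x_2 + x_3, & s_2 &= x_1 + x_3, & s_3 &= x_1 + x_2, \\
p_1 &= s_2 s_3,   & p_2 &= s_1 s_3,   & p_3 &= s_1 s_2, \\
g &= p_1 + p_2 + p_3
   = x_1^2 + x_2^2 + x_3^2 + 3\,(x_1 x_2 + x_1 x_3 + x_2 x_3).
\end{align*}
```

Every product of total degree $(n-1)^k = 2$ appears in g, six in all, so a shallow network with product units in its hidden layer needs one unit for each of them, which is consistent with the lower bound $(n-1)^k = 2$ of Corollary 13.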
Summary
Some deep sum-product networks with n inputs and depth $\log n$ can represent with $O(n)$ units what would require on the order of $2^{\sqrt{n}}$ units for a depth-2 network.
Some deep sum-product networks with n inputs and depth k can represent with $O(nk)$ units what would require at least $(n-1)^k$ units for a depth-2 network.
Future work
Finding a more general parameterization of functions leading to similar results would be an interesting topic.
Another open question is whether it is possible to represent such functions only approximately.
Questions?