Date post: 10-Jul-2020
Source: swoh.web.engr.illinois.edu/courses/IE598/handout/fall2016_slide6.pdf

Shallow vs. Deep Sum-Product Networks

Sai Kiran Burle

Electrical and Computer Engineering
University of Illinois, Urbana-Champaign

Sai Kiran Burle Shallow vs. Deep Sum-Product Networks 1 / 19

References

Shallow vs. Deep Sum-Product Networks. Olivier Delalleau, Yoshua Bengio

Outline

1 Introduction
    Motivation
    Sum-product Network
    Main Results

2 Outline of proof

Motivation

Deep learning algorithms are based on multiple levels of representation, corresponding to a deep circuit.

It has been suggested that deep architectures are more powerful in the sense of being able to represent highly-varying functions more efficiently.

In the context of this presentation, "efficiency" refers to:

Lower memory usage.
Lower computation cost.

There are multiple claims that deep sum-product networks represent polynomials more efficiently than shallow ones, but these claims come without proof.

This work aims at exhibiting families of circuits for which a deep architecture can be exponentially more efficient than a shallow one, in the context of polynomials.

Sum-Product Network

Definition 1

A Sum-Product Network is a network composed of units that compute either the product of their inputs or a weighted sum of their inputs (where the weights are strictly positive).

Definition 2

The depth of a Sum-Product Network is the length of the longest path from an input unit to the output unit.

Definition 3

A Sum-Product Network is called shallow if the network contains only a single hidden layer (i.e. it has a depth equal to two).

Definition 4

A Sum-Product Network is called deep if the network contains more than one hidden layer (i.e. it has a depth of at least three).
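The four definitions above can be made concrete with a small sketch. The class names `Input`, `Product` and `Sum`, and the helper `is_shallow`, are hypothetical names chosen for illustration; they do not come from the slides or the paper. Depth is counted as the number of edges on the longest path from an input unit to the output unit.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Input:
    index: int  # position of this variable in the input vector

    def value(self, x):
        return x[self.index]

    def depth(self):
        return 0  # an input unit is at distance 0 from itself


@dataclass
class Product:
    children: list

    def value(self, x):
        return prod(c.value(x) for c in self.children)

    def depth(self):
        return 1 + max(c.depth() for c in self.children)


@dataclass
class Sum:
    children: list
    weights: list

    def __post_init__(self):
        # Definition 1: the weights of a sum unit are strictly positive.
        if len(self.weights) != len(self.children):
            raise ValueError("one weight per input is required")
        if any(w <= 0 for w in self.weights):
            raise ValueError("weights must be strictly positive")

    def value(self, x):
        return sum(w * c.value(x) for w, c in zip(self.weights, self.children))

    def depth(self):
        return 1 + max(c.depth() for c in self.children)


def is_shallow(net):
    """Definition 3: a single hidden layer, i.e. depth equal to two."""
    return net.depth() == 2


# x1*x2 + x3*x4: product units in the hidden layer, a sum unit as output.
shallow = Sum([Product([Input(0), Input(1)]), Product([Input(2), Input(3)])],
              [1.0, 1.0])
# Multiplying by another sum unit adds a layer, so the result is deep.
deep = Product([shallow, Sum([Input(0), Input(2)], [0.5, 0.5])])
```

Here `shallow.depth()` is 2 and `deep.depth()` is 3, so only the first network is shallow in the sense of Definition 3.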

Main results

Theorem 5

A certain class of functions $F$ of $n$ inputs can be represented using a deep network with $O(n)$ units, whereas a shallow network would require at least on the order of $2^{\sqrt{n}}$ units, i.e. $\Omega(2^{\sqrt{n}})$.

Theorem 6

For a certain class of functions $G$ of $n$ inputs, a deep sum-product network of depth $k$ can represent the function with $O(nk)$ units, whereas a shallow network would require at least on the order of $(n-1)^k$ units, i.e. $\Omega((n-1)^k)$.

Outline of proof

The family F

F is a class of functions with n inputs, built from deep sum-product networks that alternate layers of product and sum units with two inputs each.
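A minimal sketch of such a network, under assumptions the slide does not state: the number of inputs is n = 4^i, the first hidden layer is a product layer, each unit takes two adjacent units of the previous layer, and all sum weights are 1. The function name `evaluate_f` is hypothetical.

```python
def evaluate_f(x):
    """Evaluate a family-F style network on the inputs x.

    Returns (output value, number of units, depth).
    Assumes len(x) == 4**i for some i >= 1.
    """
    n = len(x)
    is_power_of_two = n >= 4 and (n & (n - 1)) == 0
    if not is_power_of_two or (n.bit_length() - 1) % 2 != 0:
        raise ValueError("number of inputs must be 4**i with i >= 1")

    layer = list(x)
    units = 0
    depth = 0
    use_product = True  # assumption: the first hidden layer is a product layer
    while len(layer) > 1:
        pairs = zip(layer[0::2], layer[1::2])  # two inputs per unit
        if use_product:
            layer = [a * b for a, b in pairs]
        else:
            layer = [a + b for a, b in pairs]  # assumption: unit weights
        units += len(layer)
        depth += 1
        use_product = not use_product
    return layer[0], units, depth
```

With n = 16 inputs this sketch uses 15 units across 4 layers, which matches the O(n) unit count and the log n depth quoted in the summary. Feeding it sixteen ones returns 8 as the output, which equals the number of products in the expanded polynomial when all weights are 1.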

The basic idea we use here is that composing layers (i.e. using a deep architecture) is equivalent to using a factorized representation of the polynomial function computed by the network.

Such a factorized representation can be exponentially more compact than its expansion as a sum of products (which can be associated with a shallow network with product units in its hidden layer and a sum unit as output).
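As a small worked illustration (not taken from the slides, with all weights set to 1 for readability), consider one product unit fed by two sum units, each summing two products of inputs:

```latex
\[
(x_1 x_2 + x_3 x_4)(x_5 x_6 + x_7 x_8)
  = x_1 x_2 x_5 x_6 + x_1 x_2 x_7 x_8
  + x_3 x_4 x_5 x_6 + x_3 x_4 x_7 x_8
\]
```

Each factor on the left has two terms, and the expansion on the right has 2 × 2 = 4. Every further product layer multiplies the term counts of its inputs in the same way, so the expanded form grows much faster than the factorized one.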

Lemma 7

The number of products in the sum computed in the output unit of a network computing a function in F is $2^{\sqrt{n}-1}$.

Lemma 8

Any shallow sum-product network computing f ∈ F must have a "sum" unit as output.

Lemma 9

Any shallow sum-product network computing f ∈ F must have only multiplicative units in its hidden layer.

Corollary 10

Any shallow sum-product network computing f ∈ F must have at least $2^{\sqrt{n}-1}$ hidden units.

This proves Theorem 5.
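The count in Lemma 7 can be checked numerically with a short recurrence. This is a sketch under the same assumptions as before: n = 4^i inputs, a product layer first, two inputs per unit, and disjoint inputs under each unit so that no terms merge. A product unit multiplies the term counts of its two inputs and a sum unit adds them. The function names are hypothetical.

```python
import math


def products_in_expansion(i):
    """Number of products at the output of a depth-2*i network of this kind."""
    m = 1  # a single input counts as one (trivial) product
    for _ in range(i):
        m = m * m  # product unit: every term of one input times every term of the other
        m = 2 * m  # sum unit: the terms of both inputs are kept
    return m


def compare(i):
    """Return (n, deep unit count, shallow lower bound) for n = 4**i inputs."""
    n = 4 ** i
    deep_units = n - 1  # a binary tree with n leaves has n - 1 internal units
    shallow_lower_bound = 2 ** (math.isqrt(n) - 1)
    return n, deep_units, shallow_lower_bound
```

For i = 1, 2, 3, 4 the recurrence gives 2, 8, 128 and 32768 products, which agrees with $2^{\sqrt{n}-1}$ for n = 4, 16, 64 and 256, while the deep network needs only n - 1 units.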

The family G

Definition 11

Networks in family G also alternate sum and product layers, but their units have as inputs all units from the previous layer except one.

Lemma 12

The output g of a sum-product network in G, with n inputs and k layers, when expanded as a sum of products, contains all products of variables of the form $\prod_{t=1}^{n} x_t^{\alpha_t}$ such that $\alpha_t \in \mathbb{N}$ and $\sum_t \alpha_t = (n-1)^k$.

Corollary 13

Any shallow sum-product network computing g ∈ G must have at least $(n-1)^k$ hidden units.

This proves Theorem 6.
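To get a feel for the gap, the sketch below compares the three quantities involved for given n and k. It is an illustration only, not the paper's proof. It relies on the standard stars-and-bars count of monomials of total degree d in n variables, C(d + n - 1, n - 1), and it assumes (as in the argument for family F) that a shallow network needs one hidden product unit per distinct monomial. The function names are hypothetical, and k is treated as a plain parameter because the slides do not say exactly how layers are counted.

```python
from math import comb


def monomials_of_degree(n, d):
    """Number of monomials x_1^a_1 ... x_n^a_n with a_1 + ... + a_n == d."""
    return comb(d + n - 1, n - 1)


def compare_g(n, k):
    """Return (deep unit order n*k, bound (n-1)**k, monomial count)."""
    d = (n - 1) ** k
    return n * k, d, monomials_of_degree(n, d)
```

For n = 4 and k = 3 this gives 12, 27 and 4060: the monomial count is well above the $(n-1)^k$ bound of Corollary 13, so under these assumptions the corollary is a conservative lower bound.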

Summary

Some deep sum-product networks with $n$ inputs and depth $\log n$ can represent with $O(n)$ units what would require $\Omega(2^{\sqrt{n}})$ units for a depth-2 network.

Some deep sum-product networks with $n$ inputs and depth $k$ can represent with $O(nk)$ units what would require $\Omega((n-1)^k)$ units for a depth-2 network.

Future work

Finding more general parameterizations of functions that lead to similar results would be an interesting topic.

Another open question is whether a shallow network could represent such functions approximately, rather than exactly, with far fewer units.

Questions?
