Neural Networks: Introduction

Transcript
Page 1: Neural Networks: Introduction

Machine Learning

Neural Networks: Introduction

1
Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

Page 2: Neural Networks: Introduction

Where are we?

General learning principles
• Overfitting
• Mistake-bound learning
• PAC learning, sample complexity
• Hypothesis choice & VC dimensions
• Training and generalization errors
• Regularized Empirical Loss Minimization
• Bayesian Learning

Learning algorithms
• Decision Trees
• Perceptron
• AdaBoost
• Support Vector Machines
• Naïve Bayes
• Logistic Regression

4

Produce linear classifiers

Page 3: Neural Networks: Introduction

Neural Networks

• What is a neural network?

• Predicting with a neural network

• Training neural networks

• Practical concerns

6

Page 4: Neural Networks: Introduction

This lecture

• What is a neural network?
  – The hypothesis class
  – Structure, expressiveness

• Predicting with a neural network

• Training neural networks

• Practical concerns

7

Page 5: Neural Networks: Introduction

We have seen linear threshold units

11

[Figure: a linear threshold unit, in which the input features feed into a dot product with weights, followed by a threshold]

Prediction = sgn(w^T x + b) = sgn(∑ w_i x_i + b)

Learning: various algorithms (perceptron, SVM, logistic regression, …)

in general, minimize loss

But where do these input features come from?

What if the features were outputs of another classifier?
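As a small added illustration (not part of the slides), the prediction rule of a linear threshold unit can be written in a few lines of NumPy; the weights, bias, and feature values below are made-up placeholders.

```python
import numpy as np

def predict_ltu(w, b, x):
    """Linear threshold unit: returns sgn(w^T x + b) as +1 or -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical weights, bias, and features, just to exercise the function.
w = np.array([0.5, -1.0, 2.0])
b = -0.25
x = np.array([1.0, 0.0, 1.0])
print(predict_ltu(w, b, x))  # prints 1, since 0.5 - 0.25 + 2.0 > 0
```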

Page 6: Neural Networks: Introduction

Features from classifiers

12


Page 8: Neural Networks: Introduction

Features from classifiers

14

Each of these connections has its own weight as well


Page 10: Neural Networks: Introduction

Features from classifiers

16

This is a two-layer feedforward neural network

Page 11: Neural Networks: Introduction

Features from classifiers

17

The output layer

The hidden layer
The input layer

This is a two-layer feedforward neural network

Think of the hidden layer as learning a good representation of the inputs

Page 12: Neural Networks: Introduction

Features from classifiers

19

The dot product followed by the threshold constitutes a neuron

Five neurons in this picture (four in the hidden layer and one output)

This is a two-layer feedforward neural network
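As an added sketch (not part of the slides), here is the network just described, with four hidden threshold neurons feeding one output threshold neuron, written with NumPy; the weights are random placeholders rather than trained values.

```python
import numpy as np

def sgn(z):
    """Threshold activation, applied elementwise: +1 if z >= 0, else -1."""
    return np.where(z >= 0, 1, -1)

def two_layer_forward(x, W1, b1, w2, b2):
    """Forward pass of a two-layer feedforward network with threshold units.

    W1, b1: weights and biases of the four hidden neurons (one per row of W1).
    w2, b2: weights and bias of the single output neuron.
    """
    h = sgn(W1 @ x + b1)      # hidden layer: four threshold units
    return sgn(w2 @ h + b2)   # output layer: one more threshold unit

# Hypothetical parameters: 3 input features, 4 hidden neurons, 1 output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
w2 = rng.normal(size=4)
b2 = 0.0
x = np.array([1.0, -2.0, 0.5])
print(int(two_layer_forward(x, W1, b1, w2, b2)))  # +1 or -1
```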

Page 13: Neural Networks: Introduction

But where do the inputs come from?

20

What if the inputs were the outputs of a classifier?
The input layer

We can make a three-layer network… and so on.

Page 14: Neural Networks: Introduction

Let us try to formalize this

21

Page 15: Neural Networks: Introduction

Neural networks

A robust approach for approximating real-valued, discrete-valued or vector-valued functions

Among the most effective general-purpose supervised learning methods currently known

Especially for complex and hard-to-interpret data such as real-world sensory data

The backpropagation algorithm for neural networks has been shown to be successful in many practical problems

Across various application domains

22

Page 16: Neural Networks: Introduction

Artificial neurons

Functions that very loosely mimic a biological neuron

A neuron accepts a collection of inputs (a vector x) and produces an output by:

1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation

25

output = activation(w^T x + b)

Page 17: Neural Networks: Introduction

Artificial neurons

Functions that very loosely mimic a biological neuron

A neuron accepts a collection of inputs (a vector x) and produces an output by:

1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation

27

Dot product

Threshold activation

Other activations are possible

output = activation(w^T x + b)
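As a small added illustration (not from the slides), a neuron of this form can be written with the activation passed in as a function, which makes it easy to swap the threshold for other activations; the weights, bias, and input below are made up.

```python
import numpy as np

def neuron(x, w, b, activation):
    """A single artificial neuron: output = activation(w^T x + b)."""
    return activation(np.dot(w, x) + b)

# Threshold activation as on this slide; other activations are possible.
threshold = lambda z: 1.0 if z >= 0 else -1.0

# Hypothetical weights, bias, and input, just for illustration.
w = np.array([0.4, -0.6])
b = 0.1
x = np.array([1.0, 1.0])
print(neuron(x, w, b, threshold))  # -1.0, since 0.4 - 0.6 + 0.1 < 0
```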

Page 18: Neural Networks: Introduction

Activation functions

Name of the neuron             Activation function: activation(z)
Linear unit                    z
Threshold/sign unit            sgn(z)
Sigmoid unit                   1 / (1 + exp(−z))
Rectified linear unit (ReLU)   max(0, z)
Tanh unit                      tanh(z)

28

output = activation(w^T x + b)

Many more activation functions exist (sinusoid, sinc, Gaussian, polynomial, …)

Also called transfer functions
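The table above translates directly into code. The following is a small NumPy sketch (added for illustration, not from the slides) of the listed activations; note that np.sign returns 0 at z = 0, which can differ from the convention chosen for sgn in a threshold unit.

```python
import numpy as np

# The activation functions from the table above, as NumPy functions.
activations = {
    "linear":    lambda z: z,
    "threshold": lambda z: np.sign(z),            # sgn(z); np.sign(0) is 0
    "sigmoid":   lambda z: 1.0 / (1.0 + np.exp(-z)),
    "relu":      lambda z: np.maximum(0.0, z),
    "tanh":      lambda z: np.tanh(z),
}

z = np.array([-2.0, 0.0, 3.0])
for name, f in activations.items():
    print(name, f(z))  # apply each activation elementwise to the sample inputs
```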

Page 19: Neural Networks: Introduction

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph

– Nodes, organized in layers, correspond to neurons

– Edges carry the output of one neuron to another, and are associated with weights

• To define a neural network, we need to specify:
  – The structure of the graph
    • How many nodes, the connectivity
  – The activation function on each node
  – The edge weights

30

[Figure: a network with an input layer, a hidden layer, and an output layer, with weights on the edges between layers]


Page 21: Neural Networks: Introduction

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph

– Nodes, organized in layers, correspond to neurons

– Edges carry the output of one neuron to another, and are associated with weights

• To define a neural network, we need to specify:
  – The structure of the graph
    • How many nodes, the connectivity
  – The activation function on each node
  – The edge weights

32

Called the architecture of the network. Typically predefined, part of the design of the classifier.

[Figure: a network with an input layer, a hidden layer, and an output layer, with weights on the edges between layers]

Page 22: Neural Networks: Introduction

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph

– Nodes, organized in layers, correspond to neurons

– Edges carry the output of one neuron to another, and are associated with weights

• To define a neural network, we need to specify (a code sketch follows the figure below):
  – The structure of the graph
    • How many nodes, the connectivity
  – The activation function on each node
  – The edge weights

33

Called the architecture of the network. Typically predefined, part of the design of the classifier.

Learned from data

[Figure: a network with an input layer, a hidden layer, and an output layer, with weights on the edges between layers]
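As an added sketch (not from the slides) of the three ingredients above, the code below represents a small fully connected feedforward network by its architecture (layer sizes implied by the weight shapes), one activation function per layer, and the edge weights; all parameter values here are hypothetical stand-ins for weights that would normally be learned from data.

```python
import numpy as np

def forward(x, weights, biases, activations):
    """Forward pass of a fully connected feedforward network.

    weights[i], biases[i]: edge weights and biases feeding layer i (learned from data).
    activations[i]: activation function used by the neurons in layer i.
    The number of layers and their sizes define the architecture.
    """
    h = x
    for W, b, act in zip(weights, biases, activations):
        h = act(W @ h + b)
    return h

# Hypothetical architecture: 3 inputs -> 4 hidden units (tanh) -> 1 output (sigmoid).
rng = np.random.default_rng(1)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [np.zeros(4), np.zeros(1)]
activations = [np.tanh, lambda z: 1.0 / (1.0 + np.exp(-z))]

x = np.array([0.2, -1.0, 0.7])
print(forward(x, weights, biases, activations))
```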

Page 23: Neural Networks: Introduction

A brief history of neural networks

• 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions

• 1949: Hebb suggested a learning rule that has some physiological plausibility

• 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron

• 1969: Minsky and Papert studied the neuron from a geometrical perspective

• 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)

• Early 2000s–today: More compute, more data, deeper networks

34
See also: http://people.idsia.ch/~juergen/deep-learning-overview.html


Page 24: Neural Networks: Introduction

What functions do neural networks express?

35

Page 25: Neural Networks: Introduction

A single neuron with threshold activation

36

Prediction = sgn(b + w1 x1 + w2 x2)

[Figure: positively and negatively labeled points in the plane, separated by the line b + w1 x1 + w2 x2 = 0]

Page 26: Neural Networks: Introduction

Two layers, with threshold activations

37

In general, convex polygons

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

Page 27: Neural Networks: Introduction

Three layers, with threshold activations

38

In general, unions of convex polygons

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

Page 28: Neural Networks: Introduction

Neural networks are universal function approximators

• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]

• Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]

• Two-layer threshold networks can express any Boolean function
  – Exercise: Prove this (a concrete example follows below)

• VC dimension of threshold networks with edges E: VC = O(|E| log |E|)

• VC dimension of sigmoid networks with nodes V and edges E:
  – Upper bound: O(|V|^2 |E|^2)
  – Lower bound: Ω(|E|^2)

39

Exercise: Show that if we have only linear units, then using multiple layers does not change the expressiveness (a short derivation sketch follows below)
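To make the Boolean-function claim above concrete, here is a small added illustration (not from the slides): a two-layer network of threshold units that computes XOR, a Boolean function no single linear threshold unit can express. The weights and thresholds are hand-picked for this example.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if z >= 0, else 0 (elementwise)."""
    return np.where(z >= 0, 1, 0)

def xor_net(x):
    """A two-layer threshold network computing XOR of two 0/1 inputs.

    Hidden unit 1 fires when x1 OR x2; hidden unit 2 fires when x1 AND x2;
    the output unit fires when OR holds but AND does not, which is XOR.
    """
    W1 = np.array([[1.0, 1.0],    # OR unit: fires when x1 + x2 >= 0.5
                   [1.0, 1.0]])   # AND unit: fires when x1 + x2 >= 1.5
    b1 = np.array([-0.5, -1.5])
    w2 = np.array([1.0, -1.0])    # output combines OR and (NOT AND)
    b2 = -0.5
    h = step(W1 @ x + b1)
    return int(step(w2 @ h + b2))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(np.array([x1, x2])))  # prints the XOR truth table
```

The same idea, with one hidden unit per term of a disjunctive normal form and an output unit that combines them, extends to any Boolean function.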


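As a hint for the exercise above (an added sketch, not part of the slides): with only linear units, each layer is an affine map, and composing affine maps yields another affine map, so a deep linear network expresses exactly the same functions as a single linear layer.

```latex
% A two-layer network with only linear units collapses to a single affine map:
\[
W_2 (W_1 x + b_1) + b_2 \;=\; (W_2 W_1)\, x + (W_2 b_1 + b_2) \;=\; W x + b,
\qquad W = W_2 W_1, \quad b = W_2 b_1 + b_2 .
\]
% Induction over the number of layers extends this to any depth.
```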

