
Exploring Complexity Reduction in Deep Learning

Sourya Dey

PhD Candidate, University of Southern California
Advisors: Peter A. Beerel and Keith M. Chugg

B.Tech, Instrumentation Engineering, IIT Kharagpur, 2014

January 3, 2020

Outline

Pre-Defined Sparsity

Reduce complexity of neural networks with minimal performance degradation


Overview

Neural networks (NNs) are key machine learning technologies

➢ Artificial intelligence
➢ Self-driving cars
➢ Speech recognition
➢ Face ID
➢ and more smart stuff…


Basic working of an artificial neural network

Nodes/Neurons in a layer

Edges/Connections in a junction

Weights on every connection (the figure shows example weight values on a small 2-junction network)

Inference: Feedforward pass through the junctions produces the outputs and a Cost

Training: Feedforward, then Backpropagation of the cost, then Update of the weights (the figure shows the weight values after one update)

Weights dominate complexity – they are all used in all 3 operations
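To make the three operations concrete, here is a minimal NumPy sketch of one training step for a tiny fully connected network. The layer sizes, sigmoid activation, squared-error cost, and learning rate are illustrative assumptions, not the network from the figure:

```python
import numpy as np

# Illustrative sizes: 4 input, 3 hidden, 2 output nodes (2 junctions)
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (4, 3))   # junction 1 weights
W2 = rng.normal(0.0, 0.5, (3, 2))   # junction 2 weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.normal(size=(1, 4))          # one input sample
y = np.array([[1.0, 0.0]])           # one-hot target

# Feedforward: activations computed junction by junction
a1 = sigmoid(x @ W1)
a2 = sigmoid(a1 @ W2)

# Cost: squared error here (cross-entropy is also common)
cost = 0.5 * np.sum((a2 - y) ** 2)

# Backpropagation: gradients of the cost flow back through both junctions
d2 = (a2 - y) * a2 * (1.0 - a2)      # delta at the output layer
d1 = (d2 @ W2.T) * a1 * (1.0 - a1)   # delta at the hidden layer
grad_W2 = a1.T @ d2
grad_W1 = x.T @ d1

# Update: every weight gets a gradient-descent step
lr = 0.1
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```

Every weight in W1 and W2 is read during feedforward, read again during backpropagation, and written during the update, which is why the weights dominate both compute and memory traffic.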


Motivation behind our work

Training can take weeks on CPU. Cloud GPU resources are expensive.

Fully connected (FC) Multilayer Perceptron (MLP)

Typical deep CNN

Modern neural networks suffer from parameter explosion
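As a rough illustration of the scale involved (layer sizes chosen only for the example), the weight count of an FC MLP grows with the product of adjacent layer sizes:

```python
# Hypothetical FC MLP on 784-dimensional inputs (e.g. MNIST-sized images)
layers = [784, 1024, 1024, 10]
weights = sum(a * b for a, b in zip(layers, layers[1:]))
print(weights)   # 784*1024 + 1024*1024 + 1024*10 = 1,861,632 weights
```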


Our Work: Pre-defined Sparsity

Pre-define a sparse connection pattern prior to training. Use this sparse network for both training and inference.

Structured Constraints: Fixed in- and out-degrees for every node

Overall Density compared to FC

Reduced training and inference complexity
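A minimal sketch of how a structured sparse pattern with fixed in- and out-degrees might be generated for one junction. The round-robin assignment and the function name below are illustrative only; the actual construction in this work is based on clash-free interleavers:

```python
import numpy as np

def predefined_sparse_mask(n_in, n_out, out_degree):
    """Binary connection mask for one junction with a fixed out-degree per
    input node; when n_in * out_degree is a multiple of n_out (and
    out_degree <= n_out), every output node also gets a fixed in-degree.
    Simple round-robin assignment, for illustration only."""
    assert (n_in * out_degree) % n_out == 0 and out_degree <= n_out
    mask = np.zeros((n_in, n_out), dtype=np.uint8)
    k = 0
    for i in range(n_in):
        for _ in range(out_degree):
            mask[i, k % n_out] = 1
            k += 1
    return mask

mask = predefined_sparse_mask(n_in=8, n_out=4, out_degree=1)
print(mask.sum(axis=1))                         # out-degree of each input node: all 1
print(mask.sum(axis=0))                         # in-degree of each output node: all 2
print(f"junction density = {mask.mean():.0%}")  # 25% of the FC junction's weights kept
```

In a software simulation, the mask would simply be multiplied element-wise with the junction's weight matrix in feedforward, backpropagation, and update, so the masked-out weights never contribute.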

Motivation behind pre-defined sparsity

In an FC network, most weights are very small in magnitude after training
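This can be checked by histogramming the magnitudes of a junction's trained weights; the sketch below uses a random placeholder matrix in place of an actual trained FC junction:

```python
import numpy as np

# Placeholder for a trained FC junction's weight matrix; in practice this
# would be loaded from a trained model
W = np.random.default_rng(0).normal(0.0, 0.1, (784, 100))

mags = np.abs(W).ravel()
for frac in (0.5, 0.8, 0.9):
    print(f"{frac:.0%} of weights have magnitude below {np.quantile(mags, frac):.3f}")
```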


Pre-defined sparsity performance on MLPs

Starting with only 20% of parameters reduces test accuracy by just 1%

Datasets tested:
MNIST handwritten digits
Reuters news articles
TIMIT phonemes
CIFAR images
Morse symbols

S. Dey, K. M. Chugg and P. A. Beerel, "Morse Code Datasets for Machine Learning," in ICCCNT 2018. Won Best Paper award. https://github.com/usc-hal/morse-dataset

Analysis and Applications

Deep dive into pre-defined sparsity for MLPs, and a corresponding application


Designing pre-defined sparse networks

A pre-defined sparse connection pattern is a hyperparameter to be set prior to training

Find trends and guidelines to optimize pre-defined sparse patterns


S. Dey, K. Huang, P. A. Beerel and K. M. Chugg, "Pre-Defined Sparse Neural Networks with Hardware Acceleration," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 332-345, June 2019.


Individual junction densities

Later junctions (closer to the output) need to be denser


Each curve keeps ρ2 (junction 2 density) fixed and varies ρnet (overall density) by varying ρ1 (junction 1 density)

For the same ρnet, ρ2 > ρ1 improves performance


Mostly similar trends observed for deeper networks
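The overall density of a 2-junction network is the connection-weighted average of the per-junction densities; a small worked example with illustrative layer sizes:

```python
# Layer sizes N0-N1-N2 and per-junction densities (all values illustrative)
N0, N1, N2 = 800, 100, 10
rho1, rho2 = 0.1, 0.5

fc1, fc2 = N0 * N1, N1 * N2                    # FC weight counts per junction
rho_net = (rho1 * fc1 + rho2 * fc2) / (fc1 + fc2)
print(f"overall density = {rho_net:.3f}")      # 0.105: dominated by junction 1
```

Because the earlier junction typically dominates the weight count, keeping the later junction denser (ρ2 > ρ1) costs little in overall density, consistent with the trend above.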


Dataset redundancy

High redundancy: MNIST with default 784 features

Low redundancy: MNIST reduced to 200 features (wider spread)

Less redundancy => Less sparsification possible
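The talk does not state how the 200-feature variant of MNIST was produced; purely as an illustration, one common way to reduce feature count is PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data standing in for flattened 28x28 MNIST images (784 features)
X = np.random.default_rng(0).random((1000, 784))

# Project onto 200 principal components (illustrative reduction only)
X_reduced = PCA(n_components=200).fit_transform(X)
print(X_reduced.shape)   # (1000, 200)
```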


Effect of redundancy on sparsity

Reducing redundancy leads to increased performance degradation on sparsification


‘Large sparse’ vs ‘small dense’ networks

A sparser network with more hidden nodes will outperform a denser network with fewer hidden nodes, when both have the same number of weights


Networks with the same number of parameters go from bad to good as the number of nodes in the hidden layers is increased
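A quick parameter-count check makes the comparison concrete. Both configurations below (layer sizes are illustrative) have the same weight budget but different hidden-layer widths:

```python
# 'Small dense' FC network: 100-20-10
dense_weights = 100 * 20 + 20 * 10               # 2200 weights, 20 hidden nodes

# 'Large sparse' network: 100-50-10 at 40% overall density
fc_weights = 100 * 50 + 50 * 10                  # 5500 weights if fully connected
sparse_weights = int(0.4 * fc_weights)           # 2200 weights kept, 50 hidden nodes

print(dense_weights, sparse_weights)             # equal weight budgets: 2200 2200
```

Per the result above, the 50-hidden-node sparse variant would be expected to outperform the 20-hidden-node dense one.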


Regularization


Regularized cost = Original unregularized cost (like cross-entropy) + λ × Regularization term

Pre-defined sparse networks need a smaller λ (as determined by validation)

Pre-defined sparsity reduces the overfitting problem stemming from over-parametrization in big networks

Example for MNIST 2-junction networks:

Overall Density    λ
100%               1.1x10^-4
40%                5.5x10^-5
11%                0
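A minimal sketch of the regularized cost, assuming the common L2 penalty (the slide does not spell out the exact form of the regularization term):

```python
import numpy as np

def regularized_cost(unreg_cost, weight_matrices, lam):
    """Regularized cost = original unregularized cost (e.g. cross-entropy)
    + lambda times a penalty on the weights (L2 assumed here)."""
    penalty = sum(np.sum(W ** 2) for W in weight_matrices)
    return unreg_cost + lam * penalty

W1, W2 = np.ones((4, 3)), np.ones((3, 2))        # toy weight matrices
print(regularized_cost(0.35, [W1, W2], lam=1.1e-4))
```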

Application: A hardware architecture for on-device training and inference

Degree of parallelism (z) = Number of weights processed in parallel in a junction

(Figure: trade-off between slow training, hardware-intensive design, and flexibility as z is varied; example shown with z = 3)

Connections designed for clash-free memory accesses to prevent stalling

Prototype implemented on FPGA

S. Dey, Y. Shao, K. M. Chugg and P. A. Beerel, "Accelerating training of deep neural networks via sparse edge processing," in 26th International Conference on Artificial Neural Networks (ICANN) Part 1, pp. 273-280. Springer, Sep 2017.

S. Dey, P. A. Beerel and K. M. Chugg, "Interleaver design for deep neural networks," in 51st Annual Asilomar Conference on Signals, Systems, and Computers (ACSSC), pp. 1979-1983, Oct 2017.

S. Dey, D. Chen, Z. Li, S. Kundu, K. Huang, K. M. Chugg and P. A. Beerel, "A Highly Parallel FPGA Implementation of Sparse Neural Network Training," in 2018 International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1-4, Dec 2018. Expanded pre-print version available at arXiv:1806.01087.
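A minimal sketch of the degree-of-parallelism idea, not the papers' actual architecture: if a junction's weights are spread across z memory banks so that every cycle touches exactly one weight per bank, then z weights can be read, used, and written back in parallel without bank clashes. The round-robin assignment below is a toy stand-in for the clash-free interleaver designs cited above:

```python
import numpy as np

def bank_schedule(n_weights, z):
    """Assign each of a junction's weights to one of z memory banks and to an
    access cycle. Clash-free means every cycle touches exactly one weight per
    bank. Round-robin toy assignment; the cited papers use interleavers."""
    assert n_weights % z == 0
    bank = np.arange(n_weights) % z              # bank holding each weight
    cycle = np.arange(n_weights) // z            # cycle in which it is accessed
    return bank.reshape(-1, z), cycle.reshape(-1, z)

bank, cycle = bank_schedule(n_weights=12, z=3)
print(bank)    # each row is one cycle and contains banks 0, 1, 2 exactly once
```

With z = 3, a junction holding 12 weights is processed in 4 clash-free cycles; larger z means fewer cycles (faster training) at the cost of more parallel hardware.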

Model Search

Automate the design of CNNs with good performance and low complexity


Model search is ongoing research and is hence not yet publicly available


Thank you!

https://souryadey.github.io/