
Exploring Complexity Reduction in Deep Learning

Sourya Dey

PhD Candidate, University of Southern California
Advisors: Peter A. Beerel and Keith M. Chugg

B.Tech, Instrumentation Engineering, IIT Kharagpur, 2014

January 3, 2020

Outline

Pre-Defined Sparsity

Reduce complexity of neural networks with minimal performance degradation


Overview

Neural networks (NNs) are key machine learning technologies

➢ Artificial intelligence
➢ Self-driving cars
➢ Speech recognition
➢ Face ID
➢ and more smart stuff…


Basic working of an artificial neural network

Nodes/Neurons in a layer

Edges/Connections in a junction

Weights on every connection (the figure shows example weight values on a small 2-junction network)

Inference: Feedforward pass through the junctions produces the outputs and a Cost

Training: Feedforward, then Backpropagation of the cost, then Update of the weights (the figure shows the weight values after one update)

Weights dominate complexity – they are all used in all 3 operations
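To make the three operations concrete, here is a minimal NumPy sketch of one training step for a tiny fully connected network. The layer sizes, sigmoid activation, squared-error cost, and learning rate are illustrative assumptions, not the network from the figure:

```python
import numpy as np

# Illustrative sizes: 4 input, 3 hidden, 2 output nodes (2 junctions)
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (4, 3))   # junction 1 weights
W2 = rng.normal(0.0, 0.5, (3, 2))   # junction 2 weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.normal(size=(1, 4))          # one input sample
y = np.array([[1.0, 0.0]])           # one-hot target

# Feedforward: activations computed junction by junction
a1 = sigmoid(x @ W1)
a2 = sigmoid(a1 @ W2)

# Cost: squared error here (cross-entropy is also common)
cost = 0.5 * np.sum((a2 - y) ** 2)

# Backpropagation: gradients of the cost flow back through both junctions
d2 = (a2 - y) * a2 * (1.0 - a2)      # delta at the output layer
d1 = (d2 @ W2.T) * a1 * (1.0 - a1)   # delta at the hidden layer
grad_W2 = a1.T @ d2
grad_W1 = x.T @ d1

# Update: every weight gets a gradient-descent step
lr = 0.1
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```

Every weight in W1 and W2 is read during feedforward, read again during backpropagation, and written during the update, which is why the weights dominate both compute and memory traffic.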


Motivation behind our work

Training can take weeks on CPU. Cloud GPU resources are expensive.

Fully connected (FC) Multilayer Perceptron (MLP)

Typical deep CNN

Modern neural networks suffer from parameter explosion
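As a rough illustration of the scale involved (layer sizes chosen only for the example), the weight count of an FC MLP grows with the product of adjacent layer sizes:

```python
# Hypothetical FC MLP on 784-dimensional inputs (e.g. MNIST-sized images)
layers = [784, 1024, 1024, 10]
weights = sum(a * b for a, b in zip(layers, layers[1:]))
print(weights)   # 784*1024 + 1024*1024 + 1024*10 = 1,861,632 weights
```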


Our Work: Pre-defined Sparsity

Pre-define a sparse connection pattern prior to training. Use this sparse network for both training and inference.

Structured Constraints: Fixed in- and out-degrees for every node

Overall Density compared to FC

Reduced training and inference complexity
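A minimal sketch of how a structured sparse pattern with fixed in- and out-degrees might be generated for one junction. The round-robin assignment and the function name below are illustrative only; the actual construction in this work is based on clash-free interleavers:

```python
import numpy as np

def predefined_sparse_mask(n_in, n_out, out_degree):
    """Binary connection mask for one junction with a fixed out-degree per
    input node; when n_in * out_degree is a multiple of n_out (and
    out_degree <= n_out), every output node also gets a fixed in-degree.
    Simple round-robin assignment, for illustration only."""
    assert (n_in * out_degree) % n_out == 0 and out_degree <= n_out
    mask = np.zeros((n_in, n_out), dtype=np.uint8)
    k = 0
    for i in range(n_in):
        for _ in range(out_degree):
            mask[i, k % n_out] = 1
            k += 1
    return mask

mask = predefined_sparse_mask(n_in=8, n_out=4, out_degree=1)
print(mask.sum(axis=1))                         # out-degree of each input node: all 1
print(mask.sum(axis=0))                         # in-degree of each output node: all 2
print(f"junction density = {mask.mean():.0%}")  # 25% of the FC junction's weights kept
```

In a software simulation, the mask would simply be multiplied element-wise with the junction's weight matrix in feedforward, backpropagation, and update, so the masked-out weights never contribute.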

Motivation behind pre-defined sparsity

In an FC network, most weights are very small in magnitude after training
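This can be checked by histogramming the magnitudes of a junction's trained weights; the sketch below uses a random placeholder matrix in place of an actual trained FC junction:

```python
import numpy as np

# Placeholder for a trained FC junction's weight matrix; in practice this
# would be loaded from a trained model
W = np.random.default_rng(0).normal(0.0, 0.1, (784, 100))

mags = np.abs(W).ravel()
for frac in (0.5, 0.8, 0.9):
    print(f"{frac:.0%} of weights have magnitude below {np.quantile(mags, frac):.3f}")
```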


Pre-defined sparsity performance on MLPs

Starting with only 20% of parameters reduces test accuracy by just 1%

Datasets tested:
MNIST handwritten digits
Reuters news articles
TIMIT phonemes
CIFAR images
Morse symbols

S. Dey, K. M. Chugg and P. A. Beerel, "Morse Code Datasets for Machine Learning," in ICCCNT 2018. Won Best Paper award. https://github.com/usc-hal/morse-dataset

Analysis and Applications

Deep dive into pre-defined sparsity for MLPs, and a corresponding application


Designing pre-defined sparse networks

A pre-defined sparse connection pattern is a hyperparameter to be set prior to training

Find trends and guidelines to optimize pre-defined sparse patterns


S. Dey, K. Huang, P. A. Beerel and K. M. Chugg, "Pre-Defined Sparse Neural Networks with Hardware Acceleration," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 332-345, June 2019.


Individual junction densities

Later junctions (closer to the output) need to be denser


Each curve keeps ρ2 (junction 2 density) fixed and varies ρnet (overall density) by varying ρ1 (junction 1 density)

For the same ρnet, ρ2 > ρ1 improves performance


Mostly similar trends observed for deeper networks
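The overall density of a 2-junction network is the connection-weighted average of the per-junction densities; a small worked example with illustrative layer sizes:

```python
# Layer sizes N0-N1-N2 and per-junction densities (all values illustrative)
N0, N1, N2 = 800, 100, 10
rho1, rho2 = 0.1, 0.5

fc1, fc2 = N0 * N1, N1 * N2                    # FC weight counts per junction
rho_net = (rho1 * fc1 + rho2 * fc2) / (fc1 + fc2)
print(f"overall density = {rho_net:.3f}")      # 0.105: dominated by junction 1
```

Because the earlier junction typically dominates the weight count, keeping the later junction denser (ρ2 > ρ1) costs little in overall density, consistent with the trend above.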


Dataset redundancy

High redundancy: MNIST with default 784 features

Low redundancy: MNIST reduced to 200 features (wider spread)

Less redundancy => Less sparsification possible
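The talk does not state how the 200-feature variant of MNIST was produced; purely as an illustration, one common way to reduce feature count is PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data standing in for flattened 28x28 MNIST images (784 features)
X = np.random.default_rng(0).random((1000, 784))

# Project onto 200 principal components (illustrative reduction only)
X_reduced = PCA(n_components=200).fit_transform(X)
print(X_reduced.shape)   # (1000, 200)
```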


Effect of redundancy on sparsity

Reducing redundancy leads to increased performance degradation on sparsification


‘Large sparse’ vs ‘small dense’ networks

A sparser network with more hidden nodes will outperform a denser network with fewer hidden nodes, when both have the same number of weights


Networks with the same number of parameters go from bad to good as the number of nodes in the hidden layers is increased
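A quick parameter-count check makes the comparison concrete. Both configurations below (layer sizes are illustrative) have the same weight budget but different hidden-layer widths:

```python
# 'Small dense' FC network: 100-20-10
dense_weights = 100 * 20 + 20 * 10               # 2200 weights, 20 hidden nodes

# 'Large sparse' network: 100-50-10 at 40% overall density
fc_weights = 100 * 50 + 50 * 10                  # 5500 weights if fully connected
sparse_weights = int(0.4 * fc_weights)           # 2200 weights kept, 50 hidden nodes

print(dense_weights, sparse_weights)             # equal weight budgets: 2200 2200
```

Per the result above, the 50-hidden-node sparse variant would be expected to outperform the 20-hidden-node dense one.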


Regularization


Regularized cost = Original unregularized cost (like cross-entropy) + λ × Regularization term

Pre-defined sparse networks need a smaller λ (as determined by validation)

Pre-defined sparsity reduces the overfitting problem stemming from over-parametrization in big networks

Example for MNIST 2-junction networks:

Overall Density    λ
100%               1.1x10^-4
40%                5.5x10^-5
11%                0
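A minimal sketch of the regularized cost, assuming the common L2 penalty (the slide does not spell out the exact form of the regularization term):

```python
import numpy as np

def regularized_cost(unreg_cost, weight_matrices, lam):
    """Regularized cost = original unregularized cost (e.g. cross-entropy)
    + lambda times a penalty on the weights (L2 assumed here)."""
    penalty = sum(np.sum(W ** 2) for W in weight_matrices)
    return unreg_cost + lam * penalty

W1, W2 = np.ones((4, 3)), np.ones((3, 2))        # toy weight matrices
print(regularized_cost(0.35, [W1, W2], lam=1.1e-4))
```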

Application: A hardware architecture for on-device training and inference

Degree of parallelism (z) = Number of weights processed in parallel in a junction

(Figure: trade-off between slow training, hardware-intensive design, and flexibility as z is varied; example shown with z = 3)

Connections designed for clash-free memory accesses to prevent stalling

Prototype implemented on FPGA

S. Dey, Y. Shao, K. M. Chugg and P. A. Beerel, "Accelerating training of deep neural networks via sparse edge processing," in 26th International Conference on Artificial Neural Networks (ICANN) Part 1, pp. 273-280. Springer, Sep 2017.

S. Dey, P. A. Beerel and K. M. Chugg, "Interleaver design for deep neural networks," in 51st Annual Asilomar Conference on Signals, Systems, and Computers (ACSSC), pp. 1979-1983, Oct 2017.

S. Dey, D. Chen, Z. Li, S. Kundu, K. Huang, K. M. Chugg and P. A. Beerel, "A Highly Parallel FPGA Implementation of Sparse Neural Network Training," in 2018 International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1-4, Dec 2018. Expanded pre-print version available at arXiv:1806.01087.
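A minimal sketch of the degree-of-parallelism idea, not the papers' actual architecture: if a junction's weights are spread across z memory banks so that every cycle touches exactly one weight per bank, then z weights can be read, used, and written back in parallel without bank clashes. The round-robin assignment below is a toy stand-in for the clash-free interleaver designs cited above:

```python
import numpy as np

def bank_schedule(n_weights, z):
    """Assign each of a junction's weights to one of z memory banks and to an
    access cycle. Clash-free means every cycle touches exactly one weight per
    bank. Round-robin toy assignment; the cited papers use interleavers."""
    assert n_weights % z == 0
    bank = np.arange(n_weights) % z              # bank holding each weight
    cycle = np.arange(n_weights) // z            # cycle in which it is accessed
    return bank.reshape(-1, z), cycle.reshape(-1, z)

bank, cycle = bank_schedule(n_weights=12, z=3)
print(bank)    # each row is one cycle and contains banks 0, 1, 2 exactly once
```

With z = 3, a junction holding 12 weights is processed in 4 clash-free cycles; larger z means fewer cycles (faster training) at the cost of more parallel hardware.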

Model Search

Automate the design of CNNs with good performance and low complexity


Model search is ongoing research and is hence not yet publicly available


Thank you!

https://souryadey.github.io/