Page 1: Neural Network-Based Accelerators for Transcendental Function Approximation


Schuyler Eldridge∗, Florian Raudies†, David Zou∗, Ajay Joshi∗

∗Department of Electrical and Computer Engineering, Boston University
†Center for Computational Neuroscience and Neural Technology, Boston University

[email protected]

May 22, 2014

This work was supported by a NASA Office of the Chief Technologist’s Space Technology Research Fellowship.


Page 2: Technology Scaling Trends


Figure 1: Trends in CMOS technology [Moore, 2011 Salishan]


Page 3: Accelerators to the Rescue?


Energy Efficient Accelerators...

Lessen the utilization crunch of Dark Silicon

Are cheap due to plentiful transistor counts

Are typically special-purpose

Approaches to General Purpose Acceleration

QsCores – Dedicated hardware for frequent code patterns [Venkatesh et al., 2011 MICRO]

NPU – Neural network-based approximation of code regions [Esmaeilzadeh et al., 2012 MICRO]


Page 4: Neural Networks (NNs) as General-Purpose Accelerators


The good and the bad...

NNs are general-purpose approximators [Cybenko, 1989 Math. Control Signal; Hornik, 1991 Neural Networks]

But... NNs are still approximate

Approximation may be acceptable

Modern recognition, mining, and synthesis (RMS) benchmarks are robust [Chippa et al., 2013 DAC]


Page 5: Library-Level Approximation with NN-Based Accelerators


Big Idea

Use NNs to approximate library-level functions

cos, exp, log, pow, and sin

Explore the design space of NN topologies

Define and use an energy–delay–error product (EDEP) metric

Evaluate energy–performance improvements

Use an energy–delay product (EDP) metric

Evaluate accuracy of...

NN-based accelerators vs. a traditional approach
Applications using NN-based accelerators


Page 6: Multilayer Perceptron (MLP) NN Primer


Figure 2: NN with i × h × o nodes: input nodes I1…Ii (inputs X1…Xi), hidden nodes H1…Hh, and output nodes O1…Oo (outputs Y1…Yo), with bias nodes in the input and hidden layers.

Figure 3: One neuron: inputs x1…xn with weights w1…wn produce the output y = φ(Σ_{k=1}^{n} x_k w_k).

Equations

y = φ( Σ_{k=1}^{n} x_k w_k )

φ_sigmoid(x) = 1 / (1 + e^(−2x))

φ_linear(x) = x
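To make the forward pass concrete, here is a minimal Python sketch of a 1 × h × 1 MLP built from these equations. The weight layout is a hypothetical placeholder for illustration, not the accelerator’s trained fixed-point parameters.

```python
import math

def sigmoid(x):
    # The symmetric sigmoid from the slide: 1 / (1 + e^(-2x)).
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def neuron(inputs, weights, phi):
    # y = phi(sum_k x_k * w_k); the bias is modeled as a constant input of 1.
    return phi(sum(x * w for x, w in zip(inputs, weights)))

def mlp_1xhx1(x, hidden_weights, output_weights):
    # One input, h hidden sigmoid neurons, one linear output neuron.
    # hidden_weights: one [w_input, w_bias] pair per hidden node;
    # output_weights: h weights for the hidden outputs plus one bias weight.
    hidden = [neuron([x, 1.0], w, sigmoid) for w in hidden_weights]
    return neuron(hidden + [1.0], output_weights, lambda v: v)  # linear output

# Hypothetical 1x3x1 example:
print(mlp_1xhx1(0.5, [[1.0, 0.0], [-0.5, 0.2], [2.0, -0.1]], [0.4, 0.4, 0.4, 0.0]))
```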


Page 7: NN-Based Approximation Requires Input–Output Scaling


Approximating Unbounded Functions on Bounded Domains

NNs cannot handle unbounded inputs

Input–output scaling can extend the effective domain and range of the approximated function

This approach is suitable when...

A small region is representative of the whole function
There exist easy* operations to scale inputs and outputs

Specifically, we use the CORDIC [Volder, 1959 IRE Tran. Comput.] scalings identified by Walther [Walther, 1971 AFIPS]

*By “easy”, I mean multiplication with a constant, addition, bit shifts, and rounding.


Page 8: Walther’s Scaling Approach [Walther, 1971 AFIPS] for exp x


exp(q log 2 − d) = 2^q exp(−d), where

q = ⌊x / log 2 + 1⌋
d = q log 2 − x

so that x̂ = q log 2 − d recovers x and −d lies in the bounded NN domain (−log 2, 0]. The NN evaluates exp_NN(−d), and the output is rescaled to 2^q exp_NN(−d). (A Python sketch of this scaling appears after the steps below.)

Figure 4: Graphical scaling for exp x. The curve y = exp x is plotted for x from −3 log 2 to 3 log 2 (y from 1 to 8) and divided into intervals of width log 2: one interval is the neural network domain, and the other domains are scaled onto it.

Scaling Steps

1 Scale inputs onto the NN domain
2 NN approximates the function
3 Scale outputs onto the full range

Similar Scalings Exist

cos x and sin x
log x
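A minimal sketch of this range reduction in Python, with math.exp standing in for the bounded-domain approximator exp_NN (an assumption for illustration; the accelerator evaluates its fixed-point MLP there instead):

```python
import math

LOG2 = math.log(2.0)

def exp_nn(x):
    # Stand-in for the NN that approximates exp on (-log 2, 0].
    return math.exp(x)

def exp_scaled(x):
    # Walther-style scaling: q = floor(x / log 2 + 1), d = q*log 2 - x,
    # so exp(x) = exp(q*log 2 - d) = 2^q * exp(-d), with -d in (-log 2, 0].
    q = math.floor(x / LOG2 + 1)
    d = q * LOG2 - x
    return (2.0 ** q) * exp_nn(-d)

assert abs(exp_scaled(3.7) - math.exp(3.7)) < 1e-9  # exact with math.exp
```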


Page 14: Fixed-Point Accelerator Architecture for a 1 × 3 × 1 NN


Figure 5: Block diagram of an NN-based accelerator architecture.

Page 15: NN Topology Evaluation – Design Space Exploration


Candidate NN Topologies

Fixed point

1–15 hidden nodes

6–10 fractional bits

NN Evaluation Criteria

Energy

Performance

Accuracy

Energy–Delay–Error Product (EDEP)

The optimal NN topology minimizes EDEP:

EDEP = energy × (latency in cycles / frequency) × mean squared error
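A sketch of how an EDEP-driven sweep might pick a topology. The candidate statistics below (the latency cycle counts in particular) are hypothetical placeholders, not measured values from the paper’s design space:

```python
def edep(energy_pj, latency_cycles, freq_mhz, mse):
    # EDEP = energy * (latency in cycles / frequency) * mean squared error.
    delay_s = latency_cycles / (freq_mhz * 1e6)
    return (energy_pj * 1e-12) * delay_s * mse

# Hypothetical (hidden nodes, fractional bits) candidates and their stats.
candidates = {
    ("h1", "b6"): dict(energy_pj=8,  latency_cycles=10, freq_mhz=340, mse=9e-4),
    ("h3", "b7"): dict(energy_pj=25, latency_cycles=14, freq_mhz=340, mse=2e-4),
}
best = min(candidates, key=lambda k: edep(**candidates[k]))
print(best)  # the topology with the lowest EDEP
```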


Page 16: NN Topology Evaluation – Results


Table 1: MSE and energy consumption of the NN-based accelerator implementations of transcendental functions.

Func.  NN     MSE (×10^−4)  Energy (pJ)  Area (µm²)  Freq. (MHz)
cos    h1 b6    9               8          1300        340
sin    h1 b6    7               8          1300        340
exp    h3 b7    2              25          3600        340
log    h3 b7    1              25          3600        340
pow    h3 b7  432             102          3600        340

Evaluation Notes

Evaluated with a 45 nm predictive technology model (PTM)

pow computed using exp and log: a^b = exp(b log a)
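A one-line sketch of that composition, with math.exp and math.log standing in for the exp and log networks:

```python
import math

def pow_approx(a, b, exp_fn=math.exp, log_fn=math.log):
    # a^b = exp(b * log a), valid for a > 0. In the accelerator, exp_fn and
    # log_fn are the NN-based approximators, so pow compounds the errors of
    # both networks, consistent with its larger MSE in Table 1.
    return exp_fn(b * log_fn(a))

print(pow_approx(2.0, 10.0))  # ~1024
```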


Page 17: NN Topology Evaluation Results


Figure 6: NN-based approximations of cos x, exp x, log x, pow(2, x), and sin x, showing function value (left y axis) and squared error (right y axis, 10^−3 to 10^3). Note: Error is plotted on a log scale using the right y axis.

Evaluation Notes

Functions are well approximated by their NNs

Due to input–output scaling, error is proportional to output value


Page 18: Evaluation Approach


Approach – Energy

Determine traditional glibc instruction breakdown

Determine energy/instruction in 45 nm PTM

Determine glibc energy/function

Compare traditional and NN-based execution using EDP

Approach – Accuracy

Replace all transcendental function calls with NNs

Evaluate application output accuracy


Page 19: Traditional glibc Instruction Breakdown


Table 2: Mean floating point instruction counts.

Func.  addsd  addss  mulsd  mulss  subsd  subss  Total Instructions
cos       7      0     12      0      8      0      115
cosf      0      3      0     10      0      7      103
exp      11      0     14      0      6      0      160
expf      5      1      5      1      2      1      218
log      18      0     12      0      5      0      227
logf      0      8      0     11      0      4      143
pow      32      0     31      0     21      0      338
powf      0     23      0     35      0     26      355
sin       8      0     11      0      6      0      109
sinf      0      3      0      9      0      5       97

Abbreviations Used

ss instructions, or an f suffix (e.g., cosf) ≡ single precision

sd instructions, or no suffix (e.g., cos) ≡ double precision


Page 20: Traditional glibc Energy/Instruction


Table 3: Parameters of traditional glibc implementations of floating point instructions.

Instruction  Area (µm²)  Freq. (MHz)  Energy (pJ)
addss            640        390           1
addsd           1500        390           2
mulss           6500        280          36
mulsd          16200        140          80

Evaluation Notes

Evaluated in the NCSU 45 nm predictive technology model

For scale, one NN-based exp function uses 25 pJ

Latency of one cycle


Page 21: Traditional glibc Energy/Function


Table 4: Mean floating point energy.

Function  Energy (pJ)
cos         967
cosf        365
exp        1158
expf        453
log         995
logf        415
pow        2561
powf       1292
sin         909
sinf        311

Observation

Energy consumption is two orders of magnitude higher than the NN-based implementations
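As a rough cross-check of how Table 4 follows from Tables 2 and 3, per-function floating point energy can be estimated by weighting instruction counts with per-instruction energies. This sketch assumes subtracts cost the same as the corresponding adds, which Table 3 does not list:

```python
# Per-instruction energy (pJ) from Table 3; subsd/subss are assumed
# equal to addsd/addss, since Table 3 omits subtract entries.
ENERGY_PJ = {"addsd": 2, "addss": 1, "mulsd": 80, "mulss": 36,
             "subsd": 2, "subss": 1}

# Mean floating point instruction counts for glibc cos from Table 2.
cos_counts = {"addsd": 7, "mulsd": 12, "subsd": 8}

estimate = sum(n * ENERGY_PJ[op] for op, n in cos_counts.items())
print(estimate)  # 990 pJ, close to the 967 pJ reported in Table 4
```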


Page 22: NN Approximators for EDP Reductions


Table 5: NN-based EDP is significantly lower than glibc. Data is normalized to the sin EDP, 3 × 10^−19.

EDP in Multiples of sin EDP

Func.  EDP-NN  EDP-Single  EDP-Double
cos       1        55          161
exp       4      1052          269
log       4        86          328
pow      31       666         1256
sin       1        44          144

Table 6: Applications that spend most of their cycles computing transcendental functions see large EDP improvements.

Normalized EDP

Benchmark     Transcendental Cycles  Single  Double
blackscholes          46%             56%     55%
swaptions             39%             62%     61%
bodytrack              2%             98%     98%
canneal                1%             99%     99%

Approximating Transcendental Functions

Energy-delay product is 68x lower vs. glibc

Mean squared error is 9 × 10^−3

Application improvements follow Amdahl’s law
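The application-level numbers in Table 6 are what Amdahl’s law predicts when only the transcendental fraction of cycles improves; a quick sketch using the roughly 68x function-level EDP reduction:

```python
def normalized_app_edp(transcendental_fraction, edp_reduction=68.0):
    # Amdahl's law: the (1 - f) non-transcendental cycles are unchanged,
    # and only the transcendental fraction f improves by edp_reduction.
    f = transcendental_fraction
    return (1.0 - f) + f / edp_reduction

print(round(100 * normalized_app_edp(0.46)))  # blackscholes: ~55%
print(round(100 * normalized_app_edp(0.02)))  # bodytrack: ~98%
```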


Page 23: NN-Based Accelerators in Applications – Accuracy


Table 7: Application output MSE and percent error using NN-based accelerators.

Benchmark     MSE (×10^−1)  E[|% error|]
blackscholes      4.00         25%
bodytrack         2.00         30%
ferret            0.01          2%
swaptions        60.00         37%
canneal       2.89 × 10^8       0.0025%

MSE and Percent Error

Qualitatively low error

canneal has one large output, hence its high MSE and low percent error (illustrated below)
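A small numeric illustration of that point: one large output with a tiny relative error still yields a huge squared error. The output magnitude below is back-derived from Table 7 for illustration only, not taken from the benchmark itself:

```python
# One large scalar output (magnitude assumed for illustration).
true_out = 6.8e8
approx_out = true_out * (1 - 2.5e-5)        # 0.0025% relative error

mse = (true_out - approx_out) ** 2          # a single output, so MSE = SE
percent_error = 100 * abs(true_out - approx_out) / true_out

print(f"{mse:.2e}")         # ~2.89e8, matching Table 7's MSE scale
print(f"{percent_error}%")  # 0.0025%
```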


Page 24: Library-Level NN-Based Accelerator Summary


Results

Accelerators demonstrate EDP reductions...

68x lower EDP than glibc
78% of the EDP of traditional applications

Library-level approximation is a suitable target for NN-based acceleration

Work in this area can be improved by enabling NN-based accelerators to approximate additional functions and applications through...

Extensions to additional libraries
Capabilities to automatically identify and approximate functions


Page 25: Appendix


Appendix Contents

1 References


Page 26: References


Moore, C. (2011). Data Exascale-Class Computer Systems. Presented at the Salishan Conference on High Speed Computing.

Venkatesh, G. et al. (2011). QsCores: Trading dark silicon for scalable energy efficiency with quasi-specific cores. In MICRO.

Esmaeilzadeh, H. et al. (2012). Neural acceleration for general-purpose approximate programs. In MICRO.

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Math. Control Signal, 2(4):303–314.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.

Chippa, V. K., Chakradhar, S. T., Roy, K., and Raghunathan, A. (2013). Analysis and characterization of inherent application resilience for approximate computing. In DAC.

Volder, J. E. (1959). The CORDIC trigonometric computing technique. IRE Tran. Comput., EC-8(3):330–334.

Walther, J. S. (1971). A unified algorithm for elementary functions. In AFIPS.

Chen, T., Chen, Y., Duranton, M., Guo, Q., Hashmi, A., Lipasti, M., Nere, A., Qiu, S., Sebag, M., and Temam, O. (2012). BenchNN: On the broad potential application scope of hardware neural network accelerators. In IISWC.

Li, B., Shan, Y., Hu, M., Wang, Y., Chen, Y., and Yang, H. (2013). Memristor-based approximated computation. In ISLPED.


