Post on 17-Jan-2016

On-chip Learning Neural Network Hardware Implementation for Real-time Control

Prof. Dr. Martin Brooke, Bortecene Terlemez

Current Status

Simulation
• Two-frequency simulation
• Added-noise simulation

Experiments
• 1-second suppression
• Long runs

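Simulation Setup

[Figure: block diagram of the simulation setup. Labeled elements: unstable combustion model (signals x and u), 1.5 ms delay, delay line, error signal, and software simulation of the neural network chip. A model equation in x, u, and b accompanies the diagram.]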
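The setup names a 1.5 ms delay and a delay line between the combustion model and the simulated chip. A minimal sketch of how such a delay line could be modeled in software follows. The sampling rate, the number of taps, and the names `DelayLine` and `delay_in_samples` are assumptions made for illustration; none of them come from the slides.

```python
from collections import deque


class DelayLine:
    """Tapped delay line holding the most recent `taps` samples (hypothetical helper)."""

    def __init__(self, taps, fill=0.0):
        # A bounded deque drops the oldest sample automatically when full.
        self.buf = deque([fill] * taps, maxlen=taps)

    def push(self, sample):
        """Insert the newest sample and return all taps, newest first."""
        self.buf.appendleft(sample)
        return list(self.buf)


def delay_in_samples(delay_s, sample_rate_hz):
    """Convert a delay in seconds to a whole number of samples."""
    return round(delay_s * sample_rate_hz)


# Assumed 10 kHz sampling rate (not stated in the slides).
n_delay = delay_in_samples(1.5e-3, 10_000)

line = DelayLine(taps=3)
line.push(1.0)
line.push(2.0)
line.push(3.0)
taps_now = line.push(4.0)  # the oldest sample (1.0) has been dropped
```

The taps returned by `push` would be the inputs presented to the network at each time step.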

One Frequency Plant without Control

One Frequency Result

[Figure: engine pressure and NN weight versus time (seconds). Parameters: f = 400 Hz, b = (value missing).]

Two Frequency Results

[Figure: NN weight and engine pressure versus time (seconds). Parameters: f = 400 Hz and 700 Hz, b = (value missing).]

10% Added Noise Results

f = 400 Hz, (symbol missing) = 0.005, b = 1

[Figure panels: Uncontrolled Engine; Neural Network Controlled Engine.]

Continuously Changing Plant Parameters (1 point/second)

Continuously Changing Plant Parameters (50 points/second)

Experimental Setup

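[Figure: experimental setup. National Instruments AT-MIO-16E and AT-AO-10 boards connect to the chip. Labeled connections: digital output (chip control signals), analog input (chip output), analog output (chip input), and current-to-voltage conversion. Line counts shown on the diagram: 5, 1, and 8.]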

Short run-time

f = 400 Hz

Long run-time

f = 400 Hz

Experimental Conclusions

Oscillation is suppressed in less than a few seconds.

Continuous Adaptation.

Issues

• Competing technology status
  - General-purpose HW vs. dedicated HW
• Controller initialization
  - How to find optimum weights?
  - How to set the weights?

Dedicated NN Hardware

• Serial Digital [1]
• Partially Parallel Digital [2]
• Fully Parallel Digital [3]
• Fully Parallel Analog [4]

References

[1] Torsten Lehmann, Erik Bruun, and Casper Dietrich, "Mixed Analog/Digital Matrix-Vector Multiplier for Neural Network Synapses", Analog Integrated Circuits and Signal Processing, Vol. 9, pp. 55-63, 1996.

[2] Antonio J. Montalvo, Ronald S. Gyurcsik, and John J. Paulos, "An Analog VLSI Neural Network with On-Chip Perturbation Learning", IEEE Journal of Solid-State Circuits, Vol. 32, No. 4, April 1997.

[3] S. Neusser and B. Hofflinger, "Parallel Digital Neural Hardware for Controller Design", Mathematics and Computers in Simulation, Vol. 41, pp. 149-160, 1996.

[4] Maurizio Valle, Daniele D. Caviglia, and Giacomo M. Bisio, "An Experimental Analog VLSI Neural Network with On-Chip Back-Propagation Learning", Analog Integrated Circuits and Signal Processing, Vol. 9, pp. 231-245, 1996.

Time for One Forward Propagation

Time (gate delays)            1x1      10x10    100x100    1,000x1,000
Serial Digital                40       4,000    400,000    40,000,000
Partially Parallel Digital    309      3,090    30,900     309,000
Fully Parallel Digital        1,770    1,770    1,770      1,770
Fully Parallel Analog         100      100      100        100

(Time: Number of Gate Delays)
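The four columns of the timing table are consistent with simple power laws in the network dimension N (for an N x N network): quadratic for serial digital, linear for partially parallel digital, and constant for both fully parallel designs. The sketch below encodes that reading. The exponents are inferred from the tabulated values rather than stated in the slides, and the function name `gate_delays` is hypothetical.

```python
# Gate-delay counts for a 1x1 network, taken from the timing table.
BASE_GATE_DELAYS = {
    "Serial Digital": 40,
    "Partially Parallel Digital": 309,
    "Fully Parallel Digital": 1770,
    "Fully Parallel Analog": 100,
}

# Exponent of N inferred from how each row grows across the columns.
TIME_EXPONENT = {
    "Serial Digital": 2,              # one multiply at a time: N * N steps
    "Partially Parallel Digital": 1,  # one row (or column) at a time: N steps
    "Fully Parallel Digital": 0,      # all synapses at once
    "Fully Parallel Analog": 0,       # all synapses at once
}


def gate_delays(architecture, n):
    """Estimated gate delays for one forward propagation of an n x n network."""
    return BASE_GATE_DELAYS[architecture] * n ** TIME_EXPONENT[architecture]


for arch in BASE_GATE_DELAYS:
    print(arch, [gate_delays(arch, n) for n in (1, 10, 100, 1000)])
```

Under this reading, the serial design is fastest only for the 1x1 case, and the fully parallel analog design is fastest from 10x10 upward.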

Area

Gate Numbers                  1x1       10x10      100x100      1,000x1,000
Serial Digital                82,500    82,500     82,500       82,500
Partially Parallel Digital    55,000    550,000    5,500,000    55,000,000
Fully Parallel Digital        140       14,000     1,400,000    140,000,000
Fully Parallel Analog         17        1,700      170,000      17,000,000

(Area: Number of Transistors)
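The area figures trade off in the opposite direction from the timing figures: the serial design has a fixed cost, while the fully parallel designs grow with the number of synapses. As a rough illustration, the sketch below finds the network dimension at which the fully parallel analog design starts to need more devices than the serial digital one. The scaling exponents are inferred from the table columns, and the helper names are hypothetical.

```python
# Device counts for a 1x1 network, taken from the area table.
BASE_AREA = {
    "Serial Digital": 82_500,
    "Partially Parallel Digital": 55_000,
    "Fully Parallel Digital": 140,
    "Fully Parallel Analog": 17,
}

# Exponent of N inferred from how each row grows across the columns.
AREA_EXPONENT = {
    "Serial Digital": 0,
    "Partially Parallel Digital": 1,
    "Fully Parallel Digital": 2,
    "Fully Parallel Analog": 2,
}


def device_count(architecture, n):
    """Estimated device count for an n x n network."""
    return BASE_AREA[architecture] * n ** AREA_EXPONENT[architecture]


def crossover(small_arch, big_arch, n_max=10_000):
    """Smallest n at which small_arch needs more devices than big_arch."""
    for n in range(1, n_max + 1):
        if device_count(small_arch, n) > device_count(big_arch, n):
            return n
    return None


analog_vs_serial = crossover("Fully Parallel Analog", "Serial Digital")
```

With the tabulated numbers, the analog design stays smaller than the serial design up to a 69 x 69 network and becomes larger from 70 x 70 onward.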

Today’s Technology - 0.35 µm CMOS

Speed (ns)                    1x1         10x10     100x100    1,000x1,000
Serial Digital                7.472       747.2     74,720     7,472,000
Partially Parallel Digital    57.72       577.2     5,772      57,720
Fully Parallel Digital        330.63      330.63    330.63     330.63
Fully Parallel Analog         18.68       18.68     18.68      18.68

Chip Area (mm²)               1x1         10x10     100x100    1,000x1,000
Serial Digital                2.476       2.476     2.476      2.476
Partially Parallel Digital    2.68        26.8      268        2,680
Fully Parallel Digital        0.362       36.2      3,620      362,000
Fully Parallel Analog         0.000868    0.0868    8.68       868
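The speed figures for the 0.35 µm process can be cross-checked against the earlier gate-delay counts. Dividing each 1x1 time in nanoseconds by the corresponding 1x1 gate-delay count gives nearly the same value for every architecture, which suggests a single per-gate delay was used for the conversion. The sketch below performs that division; the interpretation is an inference from the numbers, not a statement from the slides.

```python
# 1x1 gate-delay counts from the "Time for One Forward Propagation" table.
GATE_DELAYS_1X1 = {
    "Serial Digital": 40,
    "Partially Parallel Digital": 309,
    "Fully Parallel Digital": 1770,
    "Fully Parallel Analog": 100,
}

# 1x1 speeds in nanoseconds from the 0.35 um table.
SPEED_NS_1X1 = {
    "Serial Digital": 7.472,
    "Partially Parallel Digital": 57.72,
    "Fully Parallel Digital": 330.63,
    "Fully Parallel Analog": 18.68,
}

# Implied time per gate delay, in nanoseconds, for each architecture.
ns_per_gate_delay = {
    arch: SPEED_NS_1X1[arch] / GATE_DELAYS_1X1[arch] for arch in GATE_DELAYS_1X1
}

for arch, value in ns_per_gate_delay.items():
    print(f"{arch}: {value:.4f} ns per gate delay")
```

All four ratios come out at about 0.1868 ns per gate delay.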

Area and Time Requirement for 0.35-µm CMOS Process

[Figure: log-log plot of time (ns) versus area (mm x mm) for Serial Digital, Partially Parallel Digital, Fully Parallel Digital, and Fully Parallel Analog at network sizes 1x1, 10x10, 100x100, and 1000x1000.]

Area and Time Estimation for 70-nm CMOS Process

Speed (ns)                    1x1           10x10       100x100    1,000x1,000
Serial Digital                2.596         259.6       25,960     2,596,000
Partially Parallel Digital    20.05         200.5       2,005      20,050
Fully Parallel Digital        114.87        114.87      114.87     114.87
Fully Parallel Analog         6.49          6.49        6.49       6.49

Chip Area (mm²)               1x1           10x10       100x100    1,000x1,000
Serial Digital                0.099         0.099       0.099      0.099
Partially Parallel Digital    0.1027        1.027       10.27      102.7
Fully Parallel Digital        0.01448       1.448       144.8      14,480
Fully Parallel Analog         0.00003472    0.003472    0.3472     34.72
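Comparing the 70 nm estimates with the 0.35 µm figures shows what scaling was applied. The sketch below computes the ratio of the 1x1 entries for speed and for area. The observation that the area ratio matches the square of the feature-size ratio, (350/70)² = 25, is an inference from the numbers rather than something the slides state.

```python
# 1x1 entries from the 0.35 um and 70 nm tables.
SPEED_NS_035UM = {"Serial Digital": 7.472, "Partially Parallel Digital": 57.72,
                  "Fully Parallel Digital": 330.63, "Fully Parallel Analog": 18.68}
SPEED_NS_70NM = {"Serial Digital": 2.596, "Partially Parallel Digital": 20.05,
                 "Fully Parallel Digital": 114.87, "Fully Parallel Analog": 6.49}
AREA_MM2_035UM = {"Serial Digital": 2.476, "Partially Parallel Digital": 2.68,
                  "Fully Parallel Digital": 0.362, "Fully Parallel Analog": 0.000868}
AREA_MM2_70NM = {"Serial Digital": 0.099, "Partially Parallel Digital": 0.1027,
                 "Fully Parallel Digital": 0.01448, "Fully Parallel Analog": 0.00003472}

# How many times faster and smaller each design is estimated to be at 70 nm.
speedup = {a: SPEED_NS_035UM[a] / SPEED_NS_70NM[a] for a in SPEED_NS_035UM}
shrink = {a: AREA_MM2_035UM[a] / AREA_MM2_70NM[a] for a in AREA_MM2_035UM}

# Square of the linear feature-size ratio between the two processes.
feature_ratio_squared = (350 / 70) ** 2

for a in speedup:
    print(f"{a}: {speedup[a]:.3f}x faster, {shrink[a]:.2f}x smaller")
```

Every architecture is estimated to be about 2.88 times faster at 70 nm. Three of the four area ratios are close to 25; the partially parallel digital row gives about 26.1, so its two area entries may have been rounded or derived differently.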

Area and Time Requirement for 70-nm CMOS Process

[Figure: log-log plot of time (ns) versus area (mm x mm) for Serial Digital, Partially Parallel Digital, Fully Parallel Digital, and Fully Parallel Analog at network sizes 1x1, 10x10, 100x100, and 1000x1000.]

Controller Initialization

How to find weights
• Simulation
  - Is this good enough?
• Recorded training
  - Simulation
  - Problem: current chips are volatile
  - Solution: FPGA

Recorded Simulation (current chip)

[Figure: NN weight and engine pressure versus time (seconds). Annotation: Error Decreases.]

f = 400 Hz, (symbol missing) = 0.0, b = 0.1

• Error Decrease Signal
• Random Sequence
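The two recorded signals, an error-decrease flag and a random sequence, together with the "RWC module" named later in the slides, suggest a random-weight-change style of learning: each weight moves by a small random step, the same step is repeated while the error keeps decreasing, and a fresh random step is drawn otherwise. The sketch below is an illustration under that assumption. It is not taken from the slides, and the function name, step size, and toy error function are all hypothetical.

```python
import random


def rwc_train(error_fn, weights, delta=0.01, steps=2000, seed=0):
    """Random-weight-change style search (illustrative sketch, assumed algorithm)."""
    rng = random.Random(seed)  # stands in for the recorded random sequence
    w = list(weights)
    change = [rng.choice((-delta, delta)) for _ in w]
    prev_err = error_fn(w)
    for _ in range(steps):
        # Apply the current change to every weight in parallel.
        w = [wi + ci for wi, ci in zip(w, change)]
        err = error_fn(w)
        error_decreased = err < prev_err  # the "error decrease signal"
        if not error_decreased:
            # Error did not decrease: draw a new random change for each weight.
            change = [rng.choice((-delta, delta)) for _ in w]
        prev_err = err
    return w, prev_err


# Toy quadratic error with a made-up target, used only to exercise the loop.
TARGET = [0.5, -0.3, 0.8]


def toy_error(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


initial_error = toy_error([0.0, 0.0, 0.0])
final_weights, final_error = rwc_train(toy_error, [0.0, 0.0, 0.0])
```

Because the update needs only the sign of the error change and a random bit per weight, recording those two signals is enough to replay a training run.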

Controller Initialization

How to set weights
• Recorded simulation/training (current chips)
• Permanent analog weight
• Digital weight storage (FPGA, custom)

Permanent Weight Storage

[Figure: timeline of floating-gate (FG) devices and circuits, with markers at 1967, 1989, and 1999. Labeled entries: Kahng and Sze; digital non-volatile memories (EEPROM, FLASH); ETANN; Brooke et al.; Shibata/Ohmi; Yang; STLS; AFGA; Adaptive Retina; ISD Voice Recorder.]

• Past EEPROM NN
• Permanent weight version of current chip

Permanent Analog Weight: Floating-Gate MOS

[Figure: cross-sections of a regular CMOS device and a floating-gate MOS device, showing n and p regions and the n-well.]

RWC module with Floating-Gate MOS

Digital Weight Storage

• Custom digital chips
• Field Programmable Gate Arrays (FPGA)

Custom digital chips

• 13-bit programmable DAC
• 6-8 bits probably enough
• Expensive/slow to develop
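To give a feel for the difference between 6, 8, and 13 bits of weight resolution, the sketch below quantizes a weight to a uniform grid and reports the worst-case rounding error. The weight range of -1 to 1 and the function names are assumptions chosen for illustration; the slides do not specify the DAC range or coding.

```python
def quantize(weight, bits, w_min=-1.0, w_max=1.0):
    """Round a weight to the nearest of 2**bits uniformly spaced levels."""
    levels = 2 ** bits - 1
    step = (w_max - w_min) / levels
    clipped = min(max(weight, w_min), w_max)
    code = round((clipped - w_min) / step)  # integer DAC code
    return w_min + code * step


def max_quantization_error(bits, w_min=-1.0, w_max=1.0):
    """Worst-case rounding error: half of one quantization step."""
    return (w_max - w_min) / (2 ** bits - 1) / 2


for bits in (6, 8, 13):
    q = quantize(0.3, bits)
    print(f"{bits:2d} bits: 0.3 -> {q:.6f}, max error {max_quantization_error(bits):.6f}")
```

Each extra bit halves the worst-case error, so going from 13 bits down to 6-8 bits costs resolution by a factor of roughly 32 to 128.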

Field Programmable Gate Arrays (FPGA)

• Reconfigurable
• Flexible
• Low-cost design cycle
• 1992: first ANN on FPGA
  - 30 XC3090 devices used (8,000 gates each)
  - Each neuron with 14 synapses: 2 FPGAs + 1 EPROM
• Today: very high-density FPGAs (> 3 million gates) with partial dynamic reconfiguration

RRANN

Run-time Reconfigurable Artificial Neural Networks (RRANN)

Time sharing the limited computing resource.
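As a generic illustration of time sharing, the sketch below evaluates a layer of neurons using a fixed number of physical processing units, reusing them over several passes. This is not a description of the RRANN design itself, whose details are not given in the slides. The function name and the pass-based scheme are assumptions.

```python
def layer_output_time_shared(weights, inputs, physical_units):
    """Compute weighted sums for all neurons using a limited pool of units.

    weights: one row of synapse weights per neuron.
    physical_units: how many neurons the hardware can evaluate per pass.
    Returns the outputs and the number of passes that were needed.
    """
    outputs = []
    passes = 0
    for start in range(0, len(weights), physical_units):
        batch = weights[start:start + physical_units]  # neurons mapped this pass
        outputs.extend(sum(w * x for w, x in zip(row, inputs)) for row in batch)
        passes += 1
    return outputs, passes


# Five neurons, two inputs each, but only two physical units available.
example_weights = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
example_outputs, example_passes = layer_output_time_shared(example_weights, [1, 1], 2)
```

The result is identical to a fully parallel evaluation; the cost is that the five neurons take three passes instead of one.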

Conclusion

• FPGA technology ready
  - Faster development
  - Plan to adapt current test setups
• Plan to attempt weight initialization
  - Recorded simulation/training (current chips)
  - Digital weights (FPGA)