RAPID-PROTOTYPING OF ARTIFICIAL NEURAL NETWORKS
BY
ROGER K. W. NG
A Thesis Submitted to the Faculty of Graduate Studies in Partial Fulfillment of the Requirements
for the Degree of
MASTER OF SCIENCE
Department of Electrical and Computer Engineering, University of Manitoba
Winnipeg, Canada
National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
A Thesis submitted to the Faculty of Graduate Studies of the University of Manitoba in partial fulfillment of the requirements of the degree of
Permission has been granted to the LIBRARY OF THE UNIVERSITY OF MANITOBA to lend or sell copies of this thesis, to the NATIONAL LIBRARY OF CANADA to microfilm this thesis and to lend or sell copies of the film, and UNIVERSITY MICROFILMS to publish an abstract of this thesis.
The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author's written permission.
I hereby declare that I am the sole author of this thesis.
I authorize the University of Manitoba to lend this thesis to other institutions or
individuals for the purpose of scholarly research.
Roger K W. Ng
I furthermore authorize the University of Manitoba to reproduce this thesis by
photocopying or by other means, in total or in part, at the request of other
institutions or individuals for the purpose of scholarly research.
Roger K W. Ng
The University of Manitoba requires the signatures of all persons using or
photocopying this thesis. Please sign, and give address and date.
ABSTRACT
This thesis explores Field Programmable Gate Array (FPGA) implementations of
artificial neural networks employing pulse-code arithmetic. Pulse-code arithmetic
uses values encoded as probabilistic pulse streams. Artificial neural networks
employing pulse-code arithmetic require only simple digital logic gates to perform
multiplication and addition, which are the essential operations for these
networks. As such, pulse-code techniques offer considerable potential to construct
very large neural networks using FPGA technology. The implementation
results presented in this thesis show that each neuron and synapse element uses
an average of 23 CLBs on Xilinx XC4000 series FPGAs for the XOR problem. One
of the advantages of using FPGAs for implementing neural networks is that they
allow the overall network to be easily modified or replaced by simply downloading
new circuitry. In addition, this thesis describes a top-down design flow methodology
from abstract simulation through to implementation in Xilinx FPGAs.
The use of top-down design and FPGA technologies shortens the overall development
cycle. In addition, as these networks are extremely compact, there is the
potential for experimentation during prototyping.
ACKNOWLEDGEMENTS
When everything in your life seems to be leading you towards some unknown yet
strangely familiar conclusion, it is not until you reach the end that you realize
that coincidence has been conspiring to guide you into this particular window of
the eternal now. My three-year graduate studies were sort of like that. So thanks
to all the guides and earth angels who generously contributed and supported me
throughout my graduate studies.

I would especially like to thank my advisor, Professor Robert McLeod, for his advice,
encouragement, and assistance throughout this thesis.

I would also like to acknowledge all the members of the VLSI Laboratory at the
University of Manitoba, specifically Hart Poskar, Dean McNeill, Richard Wieler
and Ken Ferens. In addition, I would like to acknowledge the support of
Computer Services at the University of Manitoba.

Support provided by the Natural Sciences and Engineering Research Council of
Canada, the Federal Government's Centres of Excellence Micronet Program, and
the Canadian Microelectronics Corporation is highly appreciated.
TABLE OF CONTENTS

CHAPTER 1
Introduction
    Artificial Neural Network
    Summary
    Organization of the Thesis

CHAPTER 2
Pulse-Code Neural Networks
    Fundamentals of Pulse-Code Arithmetic
        Pulse Stream Addition
        Pulse Stream Multiplication
        Pulse Stream Generation
    Pulse-Code Neural Network
        Pulse-Code Feedforward Neural Network
        Training Pulse-Code Neural Networks

CHAPTER 3
Simulation of Pulse-Code Neural Networks
    Simulation Environment
    Cheque Character Recognition

CHAPTER 4
Implementation of Pulse-Code Neural Networks in Xilinx FPGAs
    FPGA Implementation
        Modular Design of Pulse Stream Neural Networks
        Neuron Synapse Units
        Re-randomizer
        Weight Resolution
    Overview of Xilinx Field Programmable Gate Arrays
    Design Flow of Pulse-Code Neural Network Hardware
        Database Structure
        Xerion Neural Network Simulator
        VHDL Code
        Synthesis and Optimization
        Design Verification
        NeoCAD FPGA Foundry
    FPGA Design Examples
        XOR Problem
        Encoder Problem
        Cheque Character Recognition

CHAPTER 5
Conclusions and Future Work

APPENDIX A
Targeting VHDL Designs to Xilinx FPGAs
    Wait for XX ns Statement
    After XX ns Statement
    Initial Values
    Order and Group Arithmetic Functions
    Xilinx Name Conventions
    Latches and Registers
    Implementing Multiplexers with Tristate Buffers

APPENDIX B
Xilinx Device Quick References
LIST OF FIGURES

Figure 1.1: Multi-layer feedforward network
Figure 2.1: Random pulse sequence
Figure 2.2: Unipolar pulse stream representation
Figure 2.3: OR gate addition
Figure 2.4: Example of the OR-gate addition and AND-gate multiplication
Figure 2.5: Block diagram of a pulse stream generation
Figure 2.6: Rate multiplier schematic
Figure 2.7: Negative and positive weight
Figure 3.1: The user interface of the pulse-code neural network simulator
Figure 3.2: Training evolution of the XOR problem
Figure 3.3: The impact of division on network training
Figure 3.4: Training for the four-bit parity problem
Figure 3.5: Training for 8-6-8 encoder
Figure 3.6: Sample cheque
Figure 3.7: Training for the cheque character recognition
Figure 3.8: Input activation for the cheque character recognition problem
Figure 4.1: The top-level of pulse stream neural networks
Figure 4.2: Neuron Synapse Unit
Figure 4.3: Block diagram of the re-randomizer circuit
Figure 4.4: Xilinx FPGA architecture
Figure 4.5: XC4000 CLB
Figure 4.6: Pulse-code neural network design process
Figure 4.7: Database organization
Figure 4.8: VHDL testbenches
Figure 4.9: XOR simulation results
Figure 4.10: The FPGA layout of the XOR problem
Figure 4.11: The timing report of the XOR FPGA design
Figure 4.12: The FPGA layout of the 8-6-8 encoder
Figure 4.13: The FPGA layout of cheque character recognition
Figure A.1: Latch inference
Figure A.2: Latch implemented with gates
Figure A.3: Implementing 5-to-1 MUX with gates
Figure A.4: 5-to-1 MUX implemented with gates
Figure A.5: Implementing 5-to-1 MUX with BUFTs
Figure A.6: 5-to-1 MUX implemented with BUFTs

LIST OF TABLES

Table 2.1: Construction of CA with maximal cycle length
Table A.1: D latch implementation comparison
Table B.1: Xilinx Devices, Packages and Speed Grades
Chapter 1
Introduction
Almost everything in the field of neural networks has been done by simulating
the networks on serial computers. There has been comparatively little
study of hardware implementations. General purpose computers are not optimized
for neural network calculations: they require specialized hardware in
order to utilize the inherent parallelism. The alternative approach is to build
special hardware for neural networks on a single chip or multi-chip system. A
neural network architecture can be implemented as an integrated circuit using
analog, digital, or mixed analog/digital structures. The analog circuitry permits
high density implementation as the multiplication is based on modified Gilbert
multipliers [1] and summation on Kirchhoff's current law. However, analog hardware
does not produce high accuracy arithmetic and the storage of the analog
weight values required for the synapse is difficult. On the other hand, digital
hardware can perform arithmetic operations with a high degree of accuracy and
the storage of the weight values is easy in digital form. Also, digital hardware
can take advantage of some of the benefits of current VLSI technology, such as
well understood and advanced design techniques, as well as prototyping in Field
Programmable Gate Array (FPGA) technologies. However, one of the major constraints
of digital implementations of neural networks is the amount of circuitry
required to perform the multiplication. This problem is especially acute in high
speed digital designs, where parallel multipliers are extremely expensive in
terms of circuitry. Adopting an equivalent bit-serial architecture significantly
reduces this complexity, but still tends to result in large and complex designs. In
addition, a single multiplier would consume a significant proportion of a current
state-of-the-art FPGA, thus making the use of such devices impractical for this
approach.
This thesis describes an alternative neural network architecture which may be
implemented using standard VLSI technology, but also maps extremely efficiently
to FPGAs such as those of Xilinx [2]. The central idea is to represent the
real-valued signals passing between neurons using encoded binary pulse
streams. Pulse stream arithmetic requires only simple digital logic gates to perform
multiplication and addition. The main advantage of such an approach is the
structural simplicity of the artificial synapse and neuron, comparable to analog
implementations, thus allowing very efficient space usage of the fine grained
FPGAs.
The work presented in this thesis examines pulse-code neural network implementations
on FPGAs using a top-down design methodology. In the remainder of
this chapter background material on neural networks is presented which will
serve to familiarize the reader with some of the general concepts. This will help
establish a common reference from which to base the discussions in the later
chapters.
1.1 Artificial Neural Network
Modern neural network theories can be traced back to ideas first introduced
in the 1940s and 1950s. In 1943, McCulloch and Pitts proposed a simple
model of neuron operation [3]. This model attracted much interest because of its
simplicity. In the late 1950s, Rosenblatt developed networks that could learn to
recognize simple patterns [4]. The perceptron, as it was called, could decide
whether an input belonged to one of two classes. A single neuron would compute
the weighted sum of binary-type inputs, subtract a threshold and pass the
result through a non-linear hard limiting threshold that classified the input. In
1969, Minsky and Papert [5] showed that a single layer of perceptrons could not
perform certain tasks in pattern recognition. The simple example is the exclusive
or (XOR) problem: a single output neuron is required to turn on if one or the
other of two input lines is on, but not when neither or both inputs are on. They
believed that structures with more layers of neurons could solve the problem,
but they could not find a learning rule to train a multi-layer network. With this
roadblock, researchers left the neural network paradigm for almost 20 years. It
was not until 1986, when Rumelhart, Hinton and Williams introduced a new
learning algorithm, known as backpropagation [6], to the problem of the networks
discussed in Perceptrons [5] that neural networks regained their popularity.
Neural networks are usually characterized by the way in which neurons are
interconnected. There are two major classes of neural network topologies: multi-layer
feedforward networks and feedback networks. Feedback networks are
beyond the scope of this thesis, which will focus on multi-layer feedforward networks
only. The general form of the multi-layer feedforward network consists of
an input layer, one or more hidden layers, and an output layer of neurons (see
Figure 1.1).
Figure 1.1: Multi-layer feedforward network.
Complete bipartite graphs are constructed between each adjacent layer of neurons
using weighted connections. Data is presented to the input layer; each value is
multiplied by a weight and summed at neurons of the connecting layer. These
weighted sums are then passed through a reversible nonlinear transfer function
forming the input to the following layer. This procedure is repeated until reaching
the output layer, thereby completing a forward pass.
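The forward pass described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis: the layer sizes are made up, and the logistic sigmoid stands in for the nonlinear transfer function.

```python
import math
import random

def forward_pass(weights, inputs):
    """Propagate an input vector through a multi-layer feedforward network.

    weights[l][j][i] connects neuron i of layer l to neuron j of layer l+1.
    Each neuron computes a weighted sum and passes it through a sigmoid."""
    activation = inputs
    for layer in weights:
        activation = [
            1.0 / (1.0 + math.exp(-sum(w * a for w, a in zip(row, activation))))
            for row in layer
        ]
    return activation

# A 2-2-1 network (two inputs, one hidden layer of two neurons, one output).
random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_output = [[random.uniform(-1, 1) for _ in range(2)]]
out = forward_pass([w_hidden, w_output], [0.0, 1.0])
```

With all weights zero the weighted sum is zero and the sigmoid returns exactly 0.5, which is a convenient sanity check on the implementation.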
Without a program of instructions a computer is a useless machine. The program
usually instructs the computer to perform specific tasks on a set of input
data to create some sort of output. Therefore, the program is an essential part of
a computer environment. Neural networks are not programmed in the conventional
sense; they are taught. Teaching a neural network cognitive knowledge is
basically a modification of the synaptic weights according to some learning algorithm
or rule. Therefore, the knowledge or "program" of a neural network is in
the weights. There are two main types of learning rule: supervised learning and
unsupervised learning. In supervised learning, an example set of input/output
pairs is necessary, and the error between the actual response and the target
response is used to correct or modify the network. In contrast to supervised
learning, unsupervised learning is not given any information about whether its
outputs are right or wrong. Instead, the network must decide what characteristics
of the training set are relevant, and modify the weights in order to extract
those features. This thesis will focus on the supervised learning neural network.
A popular supervised learning algorithm is the backpropagation learning algorithm,
which was introduced by Rumelhart, Hinton and Williams in 1986 [6].
The backpropagation learning algorithm involves the presentation of a training
set of input/output pattern pairs. The objective is to find a set of weights that
ensures that the output produced by the network is the same as, or close to, the
target output pattern for each of the input patterns. During the backward pass
(learning phase), the actual output is compared to the target output and an error
vector is created. These errors are then backpropagated through the network,
modifying the connection weights according to an iterative gradient descent algorithm.
After many iterations of the training set the connection weights settle to a
local minimum of the output error over the training set. Better minima may be
found by repeated training with randomly selected initial connection weights.
1.2 Summary
This chapter has provided a quick overview of the advantages and disadvantages
of analog and digital implementations of neural networks. In order to
implement these networks in FPGAs, the area of the circuitry is the most important
criterion. The use of pulse stream arithmetic was proposed as it allows low
area implementation of the hardware required to perform the arithmetic for neural
networks. A brief history of neural networks, neural network topologies and
learning rules was presented. Multi-layer feedforward networks and the learning
algorithm known as backpropagation were also discussed.
1.3 Organization of the Thesis
The next chapter discusses the implementation of pulse-coded neural
networks. It begins with the fundamentals of pulse stream arithmetic followed
by a discussion of applying pulse stream arithmetic to neural networks. Chapter
3 presents the software simulation of these networks. Chapter 4 describes the
design process of the pulse-coded neural networks onto Xilinx FPGAs. Finally,
conclusions are drawn and proposals for future work are presented.
Chapter 2
Pulse-Code Neural Networks
Pulse-code neural networks use pulse streams to perform the network
calculations. The idea of using pulse streams to communicate information
between neurons is motivated from biological models, although biological pulse
streams are much more complex than the simple pulse representation considered
in this thesis. The main motivation here of applying pulse-code arithmetic
to neural networks is the ability to implement high density arithmetic operations
using digital circuitry. Specifically, pulse-code representations are able to use
simple digital hardware to perform addition and multiplication, which are the
two most important operations in a neural network. Also, pulse-code implementations
offer the advantages of both analog and digital computation. Like analog,
pulse representation requires only one line to carry the values, and the size of
the hardware (digital gates) needed to perform arithmetic computations is comparable
to analog hardware. Like digital, the design methods and implementations
are well established.

In this chapter the fundamentals of pulse-code arithmetic are presented and
this discussion leads to application of pulse-code arithmetic for neural networks.
2.1 Fundamentals of Pulse-Code Arithmetic
The fundamental idea of pulse-code arithmetic is to use probabilities to
carry information [7, 8]. Here the probability p is defined experimentally by considering
the frequency of the occurrence of an event (pulse in a time slot). A
small number of time slots results in an erroneous assessment of the probability
and the number which it represents. In the limiting case of an infinite number of
time slots: if there are n pulses in N slots for a given time, and if n/N tends
towards a limit as N → ∞, we set

    p = lim (N → ∞) n/N                                      (2.1)

Figure 2.1 shows a synchronous random pulse sequence. At the top of Figure
2.1, 3 pulses are in 10 time slots, leading to the conclusion that the number
transmitted is 0.3. At the bottom, 3 pulses are arranged in different time slots,
leading again to 0.3. The order of the pulses in the time slots does not affect the
outcome of the representation. The probability can be transformed into some
physical quantity by an appropriate mapping. This thesis only considers a linear
mapping, although it should be noted that nonlinear mappings exist which permit
computations with numbers in an infinite range with logarithmic error characteristics
[8]. There are two kinds of linear mapping: unipolar mapping and bipolar
mapping.
3 in 10 (average) → 0.3
Different arrangement of 3 in 10 (also → 0.3)
Figure 2.1: Random pulse sequence.
Unipolar linear mapping is used as the implementation method in this thesis. In
unipolar mapping, the values are encoded between 0 and 1. An example of unipolar
pulse representation is shown in Figure 2.2. The value of the unipolar
pulse stream is represented by the number of pulses, n, being ON within the time
interval divided by the length of the time interval, N, or

    p = n/N                                                  (2.2)

A unipolar pulse stream with N bits can represent N+1 unique values. For example,
a 10-bit pulse stream can represent 11 values from 0.0 to 1.0, in increments
of 0.1. The resolution of a value depends on the length of the pulse stream. The
longer the pulse stream, the higher the resolution that can be achieved.
Figure 2.2: Unipolar pulse stream representation.
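The unipolar representation can be illustrated with a short simulation (an illustrative sketch, not code from the thesis): a value p is encoded as a random stream of N time slots, each ON with probability p, and recovered as the fraction of ON slots. Longer streams recover the value with higher resolution, as the text describes.

```python
import random

def encode(p, n_slots, rng):
    """Encode a value p in [0, 1] as a unipolar pulse stream of n_slots bits:
    each slot is ON with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_slots)]

def decode(stream):
    """Recover the value as the fraction of ON slots (equation 2.2)."""
    return sum(stream) / len(stream)

rng = random.Random(42)
coarse = decode(encode(0.3, 10, rng))     # 10 slots: resolution only 0.1
fine = decode(encode(0.3, 100_000, rng))  # long stream: much closer to 0.3
```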
2.1.1 Pulse Stream Addition
The addition of two unipolar pulse streams can be performed by using an OR gate.
However, OR gate addition does not perform exact addition with pulse streams
because of limitations imposed by the representation of the pulse streams; furthermore
it cannot handle a sum greater than 1. The output of the OR gate is
given by

    A ∪ B = A + B − AB                                       (2.3)

Thus for A << 1 and B << 1 the AB term is small and the output of the OR gate is
approximately A + B. For large A and B, the output of the OR gate saturates to 1.
This result of a saturating nonlinearity will be useful to the implementation of
neural networks. The output for an OR gate with n inputs is given by

    Output = 1 − ∏ᵢ (1 − Iᵢ)

The n-input OR gate can be easily implemented in hardware by using wired-OR
logic. Figure 2.3 shows the output probability of the OR gate addition. As the
number of inputs increases, the output of the OR gate addition saturates for
a greater range of inputs. Also, as the value of the average input increases, the output
of the OR gate addition saturates. Therefore, it is desirable to keep the fan-in
of the inputs as well as the value of the inputs small.
Figure 2.3: OR gate addition.
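The saturating behaviour of OR-gate addition is easy to check empirically (an illustrative sketch; the stream length and rates are arbitrary choices): for small rates the OR of two independent streams approximates A + B, while for large rates it saturates toward the exact expectation A + B − AB, which approaches 1.

```python
import random

def or_add(a, b, n_slots, rng):
    """OR two independent unipolar pulse streams with rates a and b,
    slot by slot, and return the measured rate of the result."""
    hits = sum(
        1 for _ in range(n_slots)
        if rng.random() < a or rng.random() < b
    )
    return hits / n_slots

rng = random.Random(1)
small = or_add(0.05, 0.10, 200_000, rng)  # ~0.145 = A + B - AB, close to A + B
large = or_add(0.90, 0.90, 200_000, rng)  # ~0.99: the sum has saturated
```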
2.1.2 Pulse Stream Multiplication
To multiply two unipolar pulse streams, an AND gate may be used if the two
pulse sequences are statistically uncorrelated [8]. Since the unipolar value is
always less than or equal to one, the product of two numbers is guaranteed to be
at most one and the result will not saturate as it did with OR-gate addition.
Assuming the pulse sequences A and B are statistically uncorrelated, the output
sequence of an AND gate multiplication is given by

    A ∩ B = AB                                               (2.4)

Figure 2.4 shows an example of the OR-gate addition and AND-gate multiplication.
Figure 2.4: Example of the OR-gate addition and AND-gate multiplication.
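AND-gate multiplication can be checked the same way (again an illustrative sketch, not thesis code). The example also shows why the derivation insists on statistically uncorrelated streams: ANDing a stream with itself yields the rate A rather than the product A².

```python
import random

def stream(p, n_slots, rng):
    """A unipolar pulse stream with rate p."""
    return [rng.random() < p for _ in range(n_slots)]

rng = random.Random(7)
n = 200_000
a = stream(0.5, n, rng)
b = stream(0.4, n, rng)

# Independent streams: the AND rate approximates the product 0.5 * 0.4 = 0.2.
independent = sum(x and y for x, y in zip(a, b)) / n

# Fully correlated streams (a stream ANDed with itself): rate stays ~0.5,
# not 0.25, so the multiplication property is destroyed.
correlated = sum(x and y for x, y in zip(a, a)) / n
```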
2.1.3 Pulse Stream Generation
An essential component of pulse-code arithmetic is the generation of the
pulse streams for use in the arithmetic operations. As mentioned earlier, the
information is carried by the probability of occurrence of an ON logic level within
a time slot. Each logic level is generated from a random variable, and the statistically
independent results form a pulse sequence whose average pulse rate is
determined by the variable to be represented. It should be noted that the validity
of pulse stream arithmetic relies heavily on the assumed property of statistical
independence between operating variables. Hence, of vital importance are
generators for the provision of independent uniformly distributed random numbers.

In general, a random pulse stream is generated with a uniform random number
generator and a digital comparator. Figure 2.5 shows a block diagram of a rate
multiplier to generate weighted pulse streams. The following procedure can be
used to produce a random pulse stream with probability P(ON) = W:

Generate a random number R such that 0 ≤ R ≤ 1.
If W > R output a 1, else output a 0.

In digital hardware R and W are usually represented as binary integers. If the
maximum possible weight value is M, and the value stored in the weight register
is W, then the probability of a pulse should be P(ON) = W/M.
Figure 2.5: Block diagram of a pulse stream generation.
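The two-step procedure above maps directly to code. This is a behavioural sketch of the comparator, with Python's random generator standing in for the hardware random source: with an M-level weight register holding W, comparing W against a uniform random integer in [0, M) gives P(ON) = W/M.

```python
import random

def weighted_pulse_stream(w, m, n_slots, rng):
    """Digital comparator: emit 1 whenever the weight register value w
    exceeds a uniform random integer R in [0, m).  There are exactly w
    values of R below w, so P(ON) = w/m."""
    return [1 if w > rng.randrange(m) else 0 for _ in range(n_slots)]

rng = random.Random(3)
pulses = weighted_pulse_stream(96, 256, 100_000, rng)
rate = sum(pulses) / len(pulses)   # ~96/256 = 0.375
```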
A common technique to generate a pseudorandom number in digital hardware is using a
linear feedback shift register (LFSR) [9]. Another method to produce a digital random
number is to employ a particular configuration of a one-dimensional Cellular Automata
(CA) array [10]. Hortensius [11] has shown that certain arrangements of CAs possess
maximal length sequences with superior random number properties compared
to the LFSR. A CA is a set of registers whose next state is governed by nearest
neighbour connections. A CA can yield a maximal-length binary sequence from each site
(i.e. 2ⁿ − 1), like the maximal-length LFSR, by combining rule 90,

    aᵢ(t+1) = aᵢ₋₁(t) ⊕ aᵢ₊₁(t)

and rule 150,

    aᵢ(t+1) = aᵢ₋₁(t) ⊕ aᵢ(t) ⊕ aᵢ₊₁(t)

where aᵢ(t) is the value of the register at position i at time t.

The ordering of the rules for construction of a maximal-length binary sequence
is irregular, with complexity similar to that involved in determining the polynomial
for a maximal-length LFSR. Table 2.1 gives a sample of possible constructions
for producing CAs with maximal cycle length up to length 15. Here, "1"
refers to CA rule 150 and "0" refers to CA rule 90. Hence, a length-5 maximal-length
CA would be constructed by using rules 90 and 150 in the following
order: 150, 150, 90, 90, 150.
Table 2.1: Construction of CA with maximal cycle length

Length n | Construction | Cycle Length
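The hybrid rule-90/150 CA is straightforward to emulate in software (a behavioural sketch, with null boundary conditions assumed as in Hortensius's constructions). Using the length-5 ordering given in the text, 150, 150, 90, 90, 150, the register visits all 2⁵ − 1 nonzero states before repeating:

```python
def ca_step(state, rules):
    """One synchronous update of a hybrid 90/150 CA with null boundaries.
    rules[i] is 90 or 150 for cell i."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right              # rule 90: XOR of the neighbours
        if rules[i] == 150:             # rule 150 also XORs the cell itself
            bit ^= state[i]
        nxt.append(bit)
    return nxt

def cycle_length(seed, rules):
    """Steps until the CA returns to its seed state."""
    state, steps = ca_step(seed, rules), 1
    while state != seed:
        state = ca_step(state, rules)
        steps += 1
    return steps

rules = [150, 150, 90, 90, 150]                 # length-5 construction above
length = cycle_length([1, 0, 0, 0, 0], rules)   # 31 = 2**5 - 1, maximal
```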
Figure 2.6: Rate multiplier schematic
A rate multiplier can be used to compare a binary set of weights and a set of random
bit streams with P(ON) = 0.5, and produce a weighted bit stream with
P(ON) = W/2^(N−1). The rate multiplier as shown in Figure 2.6 originated out of
research in VLSI pseudo-random test pattern generation [12].
2.2 Pulse-Code Neural Network
The idea of using pulse streams has been tried by Tomberg and his co-workers,
who published two papers on neural networks using pulse-density
modulation [13, 14]. The implementation was a Hopfield type fully connected
neural network architecture based on bipolar pulse density modulation. Tomlinson
[15] and Dickson [16] also studied in-situ learning neural networks using
unipolar pulse streams. The work presented in this section is based on the
research from Tomlinson and Dickson, extending the design methodology for
these implementations.
2.2.1 Pulse-Code Feedforward Neural Network
The basic computational operations required in feedforward neural networks are
multiplication, summation, and a non-linearity function. Each neuron computes
a weighted sum of its inputs from other neurons, and passes this summation
through a nonlinear function to produce an output. This output forms the input
to the following layer. Tomlinson [15] has proposed a new neural activation function
where the summation and the nonlinear activation are performed simultaneously
using the OR logic. As mentioned earlier the OR gate addition saturates
to 1 with either an increase in the number of inputs or an increase in the value
of those inputs. The saturating effect of the OR gate addition requires no extra
hardware to implement a nonlinear activation function. Also, the logical OR can
be easily implemented in hardware using wired-OR logic. The multiplication of
the weight (wᵢⱼ) and the input (oᵢ) is performed with a simple AND gate as previously
described, assuming that these pulse sequences are statistically uncorrelated.

Let nⱼ be the probability of a pulse occurrence in the output sequence of an n-input
OR gate. The inputs of the OR gate are the products of wᵢⱼ and oᵢ produced
from the AND gates. This is represented mathematically by the following equation:

    nⱼ = 1 − ∏ᵢ (1 − wᵢⱼ oᵢ)                                  (2.5)
Since the unipolar nature of the pulse stream representation does not support
negative values, each synaptic weight is separated into two distinct nets: the
excitatory and the inhibitory nets. Therefore, there are two dedicated wired-OR
lines per neuron, ANDed together to form the activation function, and the net
input variables are defined as

    nⱼ⁺ = 1 − ∏ᵢ (1 − wᵢⱼ⁺ oᵢ)                                (2.6)
    nⱼ⁻ = 1 − ∏ᵢ (1 − wᵢⱼ⁻ oᵢ)                                (2.7)

Each neuron j combines the excitatory net input nⱼ⁺ and the inhibitory net input
nⱼ⁻ to determine the neuron output oⱼ. Since there is no means to perform subtraction
in pulse-code arithmetic, the net output of the neuron is not simply
oⱼ = nⱼ⁺ − nⱼ⁻. Moreover, negative and positive nets would require the accommodation
of negative neuron outputs. If the excitatory net input nⱼ⁺ and inhibitory net
input nⱼ⁻ are statistically uncorrelated, the probability of output pulse occurrence
oⱼ is

    oⱼ = nⱼ⁺ (1 − nⱼ⁻)                                        (2.8)
While the mathematics implies that a weight can have a positive (wᵢⱼ⁺) component
and a negative (wᵢⱼ⁻) component, it is not necessary to accommodate both
simultaneously. Therefore, it only requires one register to store the weight value
and one bit to indicate the sign of the weight. The hardware required for this
computation is shown in Figure 2.7.
Weight Sign Bit (positive = 1)
Figure 2.7: Negative and positive weight.
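Equations 2.5 to 2.8 can be evaluated directly at the probability level (an illustrative sketch with made-up weights and inputs, not thesis code): each net input is a wired-OR of AND products, and the output gates the excitatory net with the complement of the inhibitory net.

```python
def or_net(weights, inputs):
    """Wired-OR of AND products: n = 1 - prod_i(1 - w_i * o_i), eq. 2.5."""
    prod = 1.0
    for w, o in zip(weights, inputs):
        prod *= 1.0 - w * o
    return 1.0 - prod

def neuron_output(w_pos, w_neg, inputs):
    """o_j = n_j+ * (1 - n_j-): excitatory net gated by the inhibitory net
    (eqs. 2.6-2.8)."""
    n_pos = or_net(w_pos, inputs)
    n_neg = or_net(w_neg, inputs)
    return n_pos * (1.0 - n_neg)

# Two inputs; each synaptic weight is split into excitatory/inhibitory parts.
o_j = neuron_output([0.5, 0.25], [0.5, 0.0], [0.4, 0.8])
# n+ = 1 - (1-0.2)(1-0.2) = 0.36;  n- = 1 - (1-0.2) = 0.2;  o_j = 0.36 * 0.8
```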
2.2.2 Training Pulse-Code Neural Networks
Because a pulse-code neural network uses a non-traditional activation
fiinction. it is instructive to consider a modifieci learning algorithm where this
activation fiinction is acorporated into a popular leaniing algorithm. Equations
2.5 and 2.8 are continuous and difllerentiable. inüicating that the backpropaga-
tion learning algorithm can be used for training. Backpropagation is an iterative
technique that performs a gradient descent, typicaily over a sum squared error
measure:
where t j is the desired output and oi is the actual output of the neuronf. The
weights should be modlfied dong the negatkve gradient of this error with respect
to each weighk
The goal of the backpropagation learning algorithm is to reduce the total
error by adjusting the weights. Since the output of the neurons are computed
fiom the excitatory nets and inhibitory nets. the derivative must be considered
separatdy for positive and negative weights. Using the chain rule. the positive
and negative equations go- the change of weights is as follows
Let us define an error term δ_j for each neuron. The positive and negative weight-update equations can then be rewritten in terms of δ_j.
Equation 2.9 shows that the error at the output neurons is simply the difference between the training data and the network output:

δ_j = t_j − o_j    (2.9)
For the hidden layers, the error is propagated back through the network. Each of the K output neurons is connected to hidden neuron j, and will contribute to this error. This error has two components, from the excitatory and inhibitory net inputs to each output neuron. Therefore, the error for a hidden neuron is the sum, over the K output neurons, of their error contributions through both the excitatory and inhibitory connections.
Summary
This chapter has presented a method of arithmetic using pulse streams. As discussed, pulse streams allow computation using only simple gates. In addition, this chapter has discussed the suitability of pulse stream implementations for neural networks. The theoretical analysis showed that the backpropagation learning algorithm can be used for training this network, using the OR-gate activation function as a continuous and differentiable non-linear function. It may be possible to implement neural networks of high density due to the use of simple digital gates for performing arithmetic.
Chapter 3
Simulation of Pulse-Code Neural Networks
This chapter examines the simulation of pulse stream neural networks. Four examples will be discussed in this section. The first three examples are "toy problems" which are often used for testing and benchmarking neural networks. Typically the training set contains all possible input patterns, so there is no question of generalization. The last example is a more real-world classification problem in which a network is trained to recognize the digits of a cheque book.
3.1 Simulation Environment
The networks can be simulated on two levels: probability and pulse stream. The probability level models the network activations as probabilities; the pulse level models all activations as pulses and directly emulates a hardware implementation of the network. In our simulator the probability model is chosen for the training of the networks, since it produces more accurate results without worrying about the hardware limitations of the network. A backpropagation algorithm is employed for training, as described in the previous section.
Figure 3.1: The user interface of the pulse-code neural network simulator
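The relationship between the two simulation levels can be illustrated with a small Python sketch (not part of the Xerion toolchain; the names are invented). The probability model computes the wired-OR density analytically, while the pulse-level model draws Bernoulli pulse streams and measures the output density; the two agree as the number of clocks grows.

```python
import random

def or_gate_prob(densities):
    """Probability-level model of wired-OR addition: the output is
    high unless every input is low."""
    p = 1.0
    for d in densities:
        p *= (1.0 - d)
    return 1.0 - p

def or_gate_pulse(densities, n_clocks=200_000, seed=1):
    """Pulse-level emulation: draw independent Bernoulli streams,
    OR them clock by clock, and measure the output density."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_clocks):
        if any(rng.random() < d for d in densities):
            hits += 1
    return hits / n_clocks

dens = [0.2, 0.3, 0.4]
print(or_gate_prob(dens))    # exact: 1 - 0.8*0.7*0.6 = 0.664
print(or_gate_pulse(dens))   # measured; converges to the same value
```

The probability level is faster and noise-free, which is why it is the natural choice for training, while the pulse level reflects what the hardware will actually do.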
The training of the pulse-code neural network has been performed off-line using C code interfaced with the Xerion neural network simulator library [17]. Xerion's libraries contain routines for constructing, displaying and training networks. The software also contains routines to graphically display neuron outputs and weights using Hinton diagrams. The pulse-code neural network simulator written using the Xerion libraries has access to a command line interface with built-in commands for creating and training networks, examining and modifying data structures, as well as miscellaneous utilities. Once the training is completed, the simulator can generate a network description file which includes the network topology and the weights of the synapses for hardware implementation.
3.2 XOR Problem
The XOR problem [6] is a frequently applied test of neural networks, since it cannot be solved by a single-layer network because it is not a linearly separable problem. A network consisting of two input neurons, two hidden neurons, and one output neuron was applied to the pulse-code neural network.
Figure 3.2: Training evolution of the XOR problem.
Figure 3.2 shows the learning curve of the XOR problem trained with the regular and pulse-code backpropagation algorithms. It is an interesting result that the pulse-code network learns the XOR problem almost 10 times faster than the normal backpropagation network. The normal backpropagation network uses the sigmoid function as the nonlinear transfer function, while the pulse-code network uses the wired-OR gate addition saturation to obtain the nonlinearity. In addition, the initial weights of the pulse-code network must be very small to prevent premature saturation. In general the wired-OR gate addition saturation is not uniform throughout the network, depending on the number of inputs to the neuron (see Figure 2.3).
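The non-uniform saturation can be made concrete: for a wired-OR with N independent input streams of density p each, the output density is 1 − (1 − p)^N. A short illustration:

```python
def or_saturation(p, n_inputs):
    """Output density of an N-input wired-OR fed by independent
    pulse streams of density p each: 1 - (1 - p)**N."""
    return 1.0 - (1.0 - p) ** n_inputs

# The same per-line density drives the neuron much closer to
# saturation as fan-in grows, which is why small initial weights
# are needed to avoid premature saturation.
for n in (2, 8, 32):
    print(n, round(or_saturation(0.1, n), 3))
```

At p = 0.1 the output density climbs from 0.19 at two inputs to above 0.96 at thirty-two, so the effective nonlinearity of each neuron depends strongly on its fan-in.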
The learning equations for pulse-code networks contain division. Although division can be performed using pulse-code arithmetic [8], it is undesirable for a number of reasons. For example, there are problems associated with division by zero, in addition to it being a time-consuming operation and requiring more hardware. Dickson [18] has proposed to omit division for training pulse-code networks in order to reduce the hardware overhead of implementing on-chip learning capability.

To determine whether division is necessary, the networks were trained with the division operation omitted. The results of the simulation are shown in Figure 3.3. Although the XOR problem trained in more cycles without the division, in contrast again to regular backpropagation it still required only half the time. The results show that eliminating division does not impair the ability of the network to minimize the error.
Figure 3.3: The impact of division on network training. (a) Training for the XOR problem with the denominator; (b) training for the XOR problem without the denominator.
3.3 Parity Problem
The parity problem is well known to researchers doing performance evaluation of neural networks. This problem is essentially a generalization of the XOR problem to N inputs [5]. Here the network is presented with binary inputs, 0's and 1's, and the single output neuron must output a high value if the input pattern has odd parity (an odd number of 1's), and a low value if the inputs have even parity. The parity problem is one of the most difficult problems for an artificial neural network to solve, because the network must look at all the input signals in order to determine whether the pattern has even or odd parity. This is untypical of most real-world classification problems, which usually have much more regularity and allow generalization within classes of similar input patterns.

Four inputs were used in this study. It is known that a network with four hidden neurons [19] can solve this problem using the standard backpropagation algorithm, but the pulse-code network failed to do so because of the limitation of the pulse stream representation. After trying different network topologies, the final network topology for the four-bit parity problem consisted of two hidden layers of 12 and 8 units respectively. Figure 3.4 shows the error during training.
Figure 3.4: Training for the four-bit parity problem.
This problem has shown a limitation of the pulse-code representation, specifically the limited range of the synaptic weights. The other potential problem is using the OR gate for addition. As the number of inputs to a neuron (the OR gate) increases, the probability of a 1 output for a given set of inputs rises. This result suggests that the weights in a neural network must be very small to prevent the neuron from constantly saturating. The accommodation of the limited synaptic connections and premature neuron saturation is crucial for the success of pulse-code networks. While the complexity of the network is greater for these networks, the hardware complexity is still significantly less than that of conventional digital networks. Since the hardware requirement is significantly less, the cost of adding more computing elements is not severe.
3.4 Encoder Problem

The general encoding problem involves finding an efficient set of hidden neuron patterns to encode a large number of input/output patterns. The number of hidden neurons is intentionally made small to force an efficient encoding. The specific problem usually considered involves auto-association, using identical unary input and output patterns. A three-layer network consists of N inputs, N outputs and M hidden neurons, with M < N. This is often referred to as an N-M-N encoder. There are exactly N members of the training set, each having one input and the corresponding target on, and the rest off.

An 8-input encoder was used in this study. Conventional backpropagation could solve this problem with an 8-3-8 encoder network [19]. The activation patterns of the hidden neurons give the binary representation of the training pattern number. This can be achieved with the connection strengths patterned after the binary numbers. Clearly pulse-code networks will fail as the weight connection is
bounded by [−1, +1]. In order to solve this problem, a more complex pulse-code network is needed. The final network uses 6 hidden neurons, which is an 8-6-8 encoder. Figure 3.5 shows the learning of the 8-6-8 encoder. Once again, the limited range of the synaptic weights is the roadblock of the learning capability in pulse-code networks.
Figure 3.5: Training for the 8-6-8 encoder (error vs. number of epochs).
3.5 Cheque Character Recognition
In the preceding three examples the training set normally includes all possible input patterns, so no generalization issues arise. Cheque character recognition is a more real-life problem in which noisy patterns can be used for testing the generalization ability. Most cheques have some strange characters indicating the account number, as shown in Figure 3.6. These characters are mapped to a 5x5 matrix as the input to this network, with 10 output neurons corresponding to each of the possible characters (0, 1, ..., 9). One hidden layer of six neurons is used to train the pulse-code network. The training error is shown in Figure 3.7 and the input activation is shown in Figure 3.8.
Figure 3.6: Sample cheque
Figure 3.7: Training for the cheque character recognition problem (error vs. number of epochs)

Figure 3.8: Input activation for the cheque character recognition problem
3.6 Summary

This chapter presented a number of simulations contrasting pulse-code neural network implementations with more traditional networks. Two potential problems arose using pulse-code representations. The first concerns the use of the OR gate for addition. As the nonlinearity of the OR gate depends on the number of inputs, some of the neurons could prematurely saturate. This problem can be solved using a multi-layered architecture and limiting the number of inputs to each neuron. Rumelhart et al. [19] stated that "A simple method for overcoming the fan-out limitation is simply to use multiple layers of units." The second potential problem with pulse-code neural networks is the limited range of the weight connections. Increasing the network topology or complexity could accommodate these limitations.
Chapter 4
Implementation of Pulse-Code Neural Networks in Xilinx FPGAs
This chapter describes the FPGA implementation of the pulse stream neural networks. It starts off by describing the overall hardware architecture of these networks, followed by a brief overview of the Xilinx XC4000 series FPGAs. The design process of the network implementation is then described in detail. Finally, several examples of the networks implemented in Xilinx FPGAs are examined.
4.1 FPGA Implementation
This section discusses the implementation of pulse stream neural networks in FPGAs.
4.1.1 Modular Design of Pulse Stream Neural Networks
There are two main components required to construct a pulse stream neural network: a random number generator and a neuron/synapse element. Figure 4.1 shows the block diagram of the network structure. Each neuron/synapse layer has one CA-based random number generator. This random number is used to generate a weighted pulse stream for each synapse. The outputs of the neuron/synapse elements are passed to the inputs of the next layer.
Figure 4.1: The top level of pulse stream neural networks
stream. This weighted pulse stream is ANDed with an input pulse stream from the previous layer to produce a synaptic multiplication. The product is transmitted to an excitatory net-input line or an inhibitory net-input line. If the sign bit of the weight is '0', the product pulse stream is transmitted to an excitatory net-input line. Otherwise, it is transmitted to an inhibitory net-input line.

The neuron performs addition of all the net input signals from the synapses through an OR gate. The outcomes of these excitatory and inhibitory net input signals are ANDed to form the activation signal. This signal passes through a re-randomizer circuit to generate the output pulse stream.
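One clock of the neuron element can be sketched as follows. This fragment is an illustration, and it assumes that the inhibitory net is inverted before the final AND, so that inhibition suppresses rather than enables the output pulse; the text above only states that the two nets are ANDed, so the polarity here is an assumption.

```python
def neuron_clock(exc_products, inh_products):
    """One clock tick of the neuron element: OR together the
    excitatory synaptic products, OR the inhibitory ones, then AND
    the excitatory net with the inverted inhibitory net (assumed
    polarity) to form the activation pulse."""
    net_exc = 0
    for p in exc_products:
        net_exc |= p          # wired-OR addition of excitatory lines
    net_inh = 0
    for p in inh_products:
        net_inh |= p          # wired-OR addition of inhibitory lines
    return net_exc & (net_inh ^ 1)

# Fires only when excitation is present and inhibition is absent.
print(neuron_clock([1, 0], [0, 0]))   # 1
print(neuron_clock([1, 0], [1, 0]))   # 0
print(neuron_clock([0, 0], [0, 0]))   # 0
```

Under the statistical independence assumption, averaging this output over many clocks gives o_j = n_j+ (1 − n_j−), matching the probability-level activation.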
Figure 4.3: Block diagram of the re-randomizer circuit
As mentioned in the earlier chapter, the pulse stream arithmetic relies heavily on the assumed property of statistical independence between pulse streams. The re-randomizer re-orders the neuron output in order to prevent correlation between the output pulse streams from the earlier layers.
The re-randomizer [18] consists of an up-down counter and a rate multiplier, as shown in Figure 4.3. The up-down counter controls the density of the output stream. If the output is high when the input is low, the counter is decremented. If the output is low and the input is high, then the counter is incremented. If the input and the output are equivalent then there is no change. The re-randomizer uses the random number from the neuron/synapse shifted one bit to the left. The rate multiplier compares the counter value and the random number to form the re-randomized output pulse stream.
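The counter-balancing behaviour described above can be sketched in Python. This is an illustration with invented widths, and it substitutes a fresh pseudo-random draw for the shifted neuron/synapse random number used in the hardware.

```python
import random

class ReRandomizer:
    """Sketch of the re-randomizer of Figure 4.3: an up-down counter
    tracks the input density and a rate multiplier regenerates an
    uncorrelated stream of the same density."""
    def __init__(self, bits=8):
        self.max = (1 << bits) - 1
        self.count = (self.max + 1) // 2   # precharged to half scale
        self.last_out = 0

    def clock(self, in_pulse, rand_val):
        # Counter update: increment when the input leads the output,
        # decrement when the output leads the input, else hold.
        if in_pulse and not self.last_out:
            self.count = min(self.count + 1, self.max)
        elif self.last_out and not in_pulse:
            self.count = max(self.count - 1, 0)
        # Rate multiplier: fire when the random number is below the
        # counter value, so the output density tracks count/2**bits.
        self.last_out = int(rand_val < self.count)
        return self.last_out

rng = random.Random(2)
rr = ReRandomizer()
T = 200_000
out = sum(rr.clock(int(rng.random() < 0.3), rng.randrange(256))
          for _ in range(T))
print(out / T)   # settles at the input density of about 0.3
```

The balancing property is exact in the long run: the difference between the number of input and output pulses equals the net counter movement, which is bounded by the counter width, so the regenerated density converges to the input density while the pulse positions are decorrelated.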
4.1.4 Weight Resolution
The weights are stored in the form of an n-bit fractional sign-magnitude number. Each pulse represents a magnitude of 1/2^(n-1). A 9-bit weight resolution is chosen, which has 8 bits of magnitude and 1 sign bit. Thus there are 2^8 − 2 possible positive values, 2^8 − 2 negative values, and zero. Increased resolution requires more hardware due to a larger random number generator, neuron re-randomizer and rate multiplier. It has been shown [20] that a 9-bit weight provides an acceptable result for neural network classification and generalization.
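The weight format can be illustrated as follows. The function names are invented, and the full-scale choice (magnitude 255 representing 1.0, i.e. an LSB of 1/(2^8 − 1)) is an assumption for this sketch.

```python
def quantize_weight(w, mag_bits=8):
    """Quantize a weight in [-1, +1] to sign-magnitude form: a sign
    bit (positive = 1) and an integer magnitude of mag_bits bits.
    Full scale is assumed to be 2**mag_bits - 1."""
    scale = (1 << mag_bits) - 1
    sign = 1 if w >= 0 else 0
    mag = min(round(abs(w) * scale), scale)
    return sign, mag

def dequantize(sign, mag, mag_bits=8):
    """Recover the real-valued weight from its stored form."""
    scale = (1 << mag_bits) - 1
    value = mag / scale
    return value if sign else -value

s, m = quantize_weight(-0.75)
print(s, m, dequantize(s, m))
```

The quantization error is at most half an LSB, about 0.002 for 8 magnitude bits, which is consistent with the observation that 9-bit weights are sufficient in practice.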
4.2 Overview of Xilinx Field Programmable Gate Arrays
In 1985, a new technology for implementing digital logic was introduced: Field Programmable Gate Arrays (FPGAs). These devices can be viewed as a cross between Mask-Programmable Gate Arrays (MPGAs) and Programmable Logic Devices (PLDs). FPGAs are capable of implementing significantly more logic than PLDs, because they can implement multiple levels of logic, while most PLDs are optimized for two-level logic. While they do not have the capacity of MPGAs, they also do not have to be custom fabricated, greatly lowering the costs for low-volume parts and avoiding long fabrication delays. One of the best known FPGAs is the Xilinx Logic Cell Array (LCA) [2]. In this section their third-generation FPGA, the Xilinx 4000 series, will be discussed.
Xilinx FPGAs consist of an array of uncommitted logic elements that can be interconnected in a general way, like MPGAs. They use static RAM (SRAM) cells as the programmable element, so a device can be re-programmed as many times as the designer wishes. Figure 4.4 shows a typical architecture of a Xilinx FPGA. It is a symmetrical array architecture, consisting of a two-dimensional array of Configurable Logic Blocks (CLBs) that can be connected by programmable interconnection resources. The interconnect comprises segments of wire, where the segments may be of various lengths. Present in the interconnect are programmable switches that serve to connect the CLBs to the wire segments, or one wire segment to another. Logic circuits are implemented in the FPGA by partitioning the logic into individual CLBs and then interconnecting the blocks as required via the switches. The I/O Blocks (IOBs) surround the boundary of the FPGA, providing the interface between the package pins and the internal signals.
Figure 4.4: Xilinx FPGA architecture.
A Xilinx 4000 series CLB, as shown in Figure 4.5, is made up of three Lookup Tables (LUTs), two programmable flip-flops, and multiple programmable multiplexers. The LUTs allow arbitrary combinational functions of their inputs to be created. Thus, the structure can perform any function of five inputs (using all three LUTs, with the F and G inputs identical), any two functions of four inputs (the two 4-input LUTs used independently), or some functions of up to nine inputs (using all three LUTs, with the F and G inputs different). SRAM-controlled multiplexers then route these signals out the X and Y outputs, as well as to the two flip-flops. The inputs at top (C1-C4) provide the third input to the 3-input LUT, enable and set or reset signals to the flip-flops, and a direct connection to the flip-flop inputs. This structure yields a very powerful method of implementing arbitrary, complex digital logic. Note that there are several additional features of the Xilinx FPGA not shown in these figures, including support for embedded memories and carry chains.
Figure 4.5: XC4000 CLB
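The five-input claim follows from Shannon decomposition: the two 4-input LUTs F and G compute the cofactors of f with respect to the fifth input, and the 3-input LUT H need only act as a 2:1 multiplexer. A small Python check of this decomposition (an illustration, not a model of the actual CLB):

```python
from itertools import product

def five_input_via_clb(f):
    """Realize an arbitrary 5-input function with the XC4000 CLB
    structure: two 4-input LUTs F and G sharing the same inputs
    (the cofactors of f at e=0 and e=1), combined by the 3-input
    LUT H acting as a multiplexer on the fifth input."""
    F = {x: f(*x, 0) for x in product((0, 1), repeat=4)}  # cofactor e=0
    G = {x: f(*x, 1) for x in product((0, 1), repeat=4)}  # cofactor e=1
    def clb(a, b, c, d, e):
        return G[(a, b, c, d)] if e else F[(a, b, c, d)]  # H = mux
    return clb

# Verify on an arbitrary 5-input function (here: odd parity).
parity = lambda a, b, c, d, e: (a + b + c + d + e) % 2
clb = five_input_via_clb(parity)
assert all(clb(*x) == parity(*x) for x in product((0, 1), repeat=5))
print("all 32 input patterns match")
```

Since F and G hold the two cofactors, any five-input function is covered; freeing the F and G inputs to differ is what admits some functions of up to nine inputs.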
The CLBs are surrounded by horizontal and vertical routing channels that permit arbitrary point-to-point communication. All internal connections are composed of metal segments with programmable switching points to implement the desired routing. There are three main types of interconnect, distinguished by the relative length of their segments: single-length lines, double-length lines, and longlines. Single-length lines travel the height of a single CLB, where they then enter a switch matrix. The switch matrix allows the signal to travel out vertically and/or horizontally from the switch matrix. Thus, multiple single-length lines can be cascaded together to travel longer distances. Double-length lines are similar, except that they travel the height of two CLBs before entering a switch matrix; thus double-length lines are useful for longer-distance routing, traversing two CLB heights without the extra delay and the wasted configuration sites of an intermediate switch matrix. Finally, longlines are lines that go half the chip height, and do not enter the switch matrix. In this way, very long-distance routes can be accommodated efficiently. With this rich sea of routing resources, the Xilinx 4000 series is able to handle fairly arbitrary routing demands, though mappings that emphasize local communication will still be handled more efficiently.
4.3 Design Flow of Pulse Code Neural Network Hardware
Although the Xerion neural network simulator is a valuable tool for simulating pulse-code networks, another goal of this thesis is to search for a design flow generating FPGA hardware from a high-level network description. Ideally the design flow progresses from the Xerion neural network simulation to the generation of a Xilinx bit file for programming the FPGA device. Xerion is used to simulate and train the pulse-code neural network, iterate the design, and then generate a network description file including the network topology and final weights. Using this description, a custom 'C' program converts it to a VHDL description. Mentor Graphics Top-Down Tools [21] are used for VHDL compilation, syntax verification, synthesis, optimization, and simulation. NeoCad FPGA Foundry tools [22] are used to map the design to a physical FPGA device and to create the bit file for programming the Xilinx chip. Static timing analysis of the placed and routed design is also done within the NeoCad tools. Finally, the design information is back-annotated to a Mentor Graphics database and functionally tested against the top-level VHDL testbench. The complete design flow is shown in Figure 4.6.

Figure 4.6: Pulse-code neural network design process (Xerion training and simulation, VHDL generation, Autologic synthesis and optimization)
4.3.1 Database Structure
As the design goes through the various tools, it is very important to organize the database properly. Many procedural problems can be avoided by planning the directory structure. Figure 4.7 illustrates one example of how to organize the design database.
1. VHDL - VHSIC Hardware Description Language, a language for designing integrated circuits.
CHAFTER 4 - Implemantation of PuhmCade Neural Networks in Xllinx FPGAs
The neural network design directory is divided into five sub-directories for Xerion simulation, VHDL source, synthesis and optimization, gate-level schematic, and the physical FPGA layout.
4.3.2 Xerion Neural Network Simulator
Xerion is used to train and simulate the pulse-code neural networks, as mentioned in the previous chapter. The input of the simulator is a text file with the description of the network topology and example training data. The networks train with the backpropagation algorithm. Once the training is completed, Xerion generates a network specification for hardware implementation. This specification includes the topology and the weights of the synapses.
4.3.3 VHDL Code
The network specification generated from Xerion is converted into VHDL code for hardware implementation. VHDL is a language for designing integrated circuits, which can describe circuits at the behavioural and/or structural level. In order to ensure that the VHDL code is synthesizable, the designs must be described at the Register Transfer Level (RTL). In addition, there are certain coding styles to use when targeting Xilinx FPGAs. Appendix A provides design hints for writing VHDL for Xilinx FPGA designs. A custom 'C' program is used to generate the synthesizable VHDL code for Xilinx FPGAs from the network specification.
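The role of the converter can be illustrated with a toy generator. This is a Python sketch rather than the thesis's 'C' program, and the entity and port names are invented; it only emits an entity skeleton, not the RTL body.

```python
def spec_to_vhdl_entity(name, n_inputs, n_outputs):
    """Emit an entity skeleton for one neuron/synapse unit from a
    network specification (illustrative names, pulse I/O as
    single-bit std_logic ports)."""
    ports = ["clk : in  std_logic"]
    ports += [f"x{i} : in  std_logic" for i in range(n_inputs)]
    ports += [f"o{i} : out std_logic" for i in range(n_outputs)]
    port_block = ";\n    ".join(ports)
    return (
        "library ieee;\n"
        "use ieee.std_logic_1164.all;\n"
        f"entity {name} is\n"
        "  port (\n"
        f"    {port_block}\n"
        "  );\n"
        f"end {name};"
    )

print(spec_to_vhdl_entity("neuron_unit", 2, 1))
```

The real converter additionally emits the RTL and structural architectures and wires the weights from the trained specification into the synapse logic.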
The VHDL description of the network is hierarchically organized. The top-level circuit represents the connections between the neuron/synapse units and the CA random number generators. It is described in structural VHDL. The neuron/synapse units and CAs are described in RTL descriptions.
Each VHDL component must have an entity which defines the I/O of the model. Each VHDL component in the design also has several architectures. For the neuron/synapse units and CAs, the RTL description of the circuit is in an architecture called RTL. The top-level circuit uses a structural VHDL model which should be in an architecture called struct.
Other VHDL architectures are also necessary. In order to facilitate the creation of a hierarchical schematic for the top-level circuit, a dummy architecture has to be created for each low-level circuit (neuron/synapse elements and CAs). These dummy architectures have nothing between the BEGIN and END statements in the VHDL. They are just placeholders for the schematic generation process. A schem architecture is required for the top-level circuit. This architecture is the same as the struct architecture except that it calls only the dummy VHDL architectures for the circuits beneath it.
Mentor Graphics' System-1076 compiler performs the syntax checking and database generation of the VHDL design. Once syntactically correct, the compiler creates a Mentor EDDM database from the VHDL code which can be simulated in Quicksim and read into Autologic for synthesis.
4.3.4 Synthesis and Optimization
After compiling the VHDL files, the Mentor EDDM database is read into Autologic and synthesized to the Xilinx XC4000 FPGAs. The NeoCad Xilinx XC4000 library is used and the target environment variables are set to commercial derating factors.

The neuron/synapse elements and CA circuits in the low-level hierarchy are written in RTL-level VHDL. This code is synthesized directly to XC4000 gates. These circuits are synthesized separately. All hierarchy implied in the VHDL code at this level is flattened to improve the area optimization. The optimization recipe used in Autologic is AREA(LOW) with an AREA REPORT. Since timing optimization is not available, this is all that is required at this level.
Symbols must be created for each low-level circuit so that they can be referenced by the top-level hierarchical schematic. These symbols are automatically generated when the VHDL entities are compiled. To save these symbols, they must be opened within Design Architect and saved.
The top level for this design is only the connection between the neuron/synapse units and CAs, which is defined in a structural VHDL netlist. In order to generate a hierarchical schematic for the top-level circuit in Autologic, the following multiple-step process is used:
- All neuron/synapse units and CA circuits must have dummy architectures.
- The low-level circuits must have been previously synthesized to gates and a symbol created to represent each circuit.
- The top-level circuit should have a schem architecture which calls the dummy architectures of the low-level circuits, as previously mentioned.
- The schem architecture of the top-level circuit is synthesized and optimized into a schematic.
- On the resulting schematic, the dummy components are replaced with symbols for the real circuits that have been previously synthesized.
- Back in Autologic, the resulting schematic with real components is re-optimized. The I/O ports and buffers are added to the final schematic.
4.3.5 Design Verification
Figure 4.8: VHDL testbenches (test driver and test monitor, both behavioural VHDL)
Quicksim is used to simulate the operation of the circuits in both VHDL and XC4000 gate representations. To functionally verify the design, VHDL testbenches are created for the top-level circuits. After synthesis, the same testbenches are used to simulate the gate-level circuit. The VHDL and gate operations are compared to make sure they match.
The VHDL testbenches are only a piece of behavioural VHDL to drive the inputs. If the design is complex, then a piece of VHDL can also be written to monitor the outputs and report if an error is seen. The advantage of the VHDL driver/monitor testbench is that it can be used to test the gate-level models as well as the VHDL. Figure 4.8 shows the connections of a VHDL testbench and how it can be used to drive and monitor the operation of a circuit block. For the pulse stream neural network design, the test driver provides the input examples to the network, and the test monitor uses a number of up-counters to monitor the pulse density of the output neurons.
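The monitor's measurement reduces to counting pulses over a window. A minimal sketch of the idea (the function name is invented):

```python
def pulse_density(pulses):
    """The test-monitor idea in miniature: an up-counter accumulates
    the output pulses over a fixed window, and the count divided by
    the window length estimates the neuron's output density."""
    count = 0
    for p in pulses:
        count += p            # up-counter increments on each pulse
    return count / len(pulses)

print(pulse_density([1, 0, 1, 1, 0, 0, 1, 0]))   # 4 pulses / 8 clocks = 0.5
```

A longer window gives a less noisy density estimate, at the cost of a longer simulation per input vector.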
4.3.6 NeoCad FPGA Foundry
The NeoCad Foundry tools are used to map the Xilinx XC4000 gates into a physical array. The NeoCad tools accept data from Mentor Graphics in the form of an EDIF 2 0 0 netlist. Mapsh, the mapping tool, maps the EDIF file into the specific Xilinx part and package, performs the design rule checking, and generates a NeoCad database file for place and route. Parsh, the place and route tool, performs the place and route of the FPGA. Tricesh is then used to analyze the timing of the design. The layout-related timing information is back-annotated to the Mentor Graphics design verification tools. Finally, NeoCad can write a bitstream file which is used to physically program the Xilinx FPGA chip.
4.4 FPGA Design Examples
Three neural network example problems are implemented in Xilinx FPGAs: the XOR, the encoder and the cheque character recognition. The first two examples are implemented on Xilinx XC4010 PG-6 FPGAs. Appendix B provides a quick reference of the different Xilinx 4000 series FPGAs that are available. The XC4010 part has 400 CLBs, equivalent to 10,000 gates. Therefore, it is able to accommodate a significantly large design in a single FPGA. The speed grade of this part is 6, which means there is a 6ns delay for each CLB. The last problem is implemented on a Xilinx XC4013 FPGA. In this section, the simulation of the XOR problem will be discussed, and the area and timing of all three problems will also be examined.
4.4.1 XOR Problem
The XOR problem, as mentioned in Chapter 3, is implemented in a Xilinx XC4010 FPGA. The network to solve this problem consists of two input neurons, two hidden neurons and one output neuron.

A top-level testbench was created to verify the VHDL model and the gate-level representation. This testbench presented inputs to the network and monitored the output pulse density using an up-counter. Figure 4.9 shows the simulation result of the network. The curve in the figure represents the value of the counter in the re-randomizer of the output neuron. The initial value of the re-randomizers has a 'precharging' value, 1/2 of the maximum counter value. With each presentation of an input vector the value of the re-randomizer is reset to the 'precharging' value. The simulation result has proved that the design is functioning as expected.
Figure 4.9: XOR simulation results
The complete layout of the design is shown in Figure 4.10. The design required 69 CLBs and 5 IOBs. It has the interesting result that there are two clusters of CLBs in the layout. The small cluster of CLBs is the output neuron and the larger one is the two hidden neurons. It only takes 23 CLBs per neuron in this design.
NeoCad's timing analysis tool, Tricesh, is used to perform the static timing analysis. Figure 4.11 shows the timing report of the XOR design. It details the maximum delay path of the design. This report identifies both logic delays and route delays. The 'R' next to a delay entry indicates a delay based on rising edge timing of signals through an IOB or CLB. The maximum delay path of this design is 171.585ns. The report also shows a breakdown of the percentage of delay attributed to logic vs. routing delays; in this case 53.6% is logic delay and 46.4% is routing delay. The total delay of the design is the 171.585ns logic and routing delay plus 8.0ns setup, which is 179.585ns. Therefore the maximum operating frequency of the XOR design is 5.568MHz.
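The frequency figure can be reproduced directly from the critical path; the 8.0ns setup value is the one quoted in the report above.

```python
def max_frequency_mhz(path_delay_ns, setup_ns):
    """Maximum clock frequency from the worst-case logic+routing
    delay plus the flip-flop setup time."""
    period_ns = path_delay_ns + setup_ns
    return 1e3 / period_ns   # ns period -> MHz

# Numbers from the XOR timing report: 171.585 ns path + 8.0 ns setup.
print(round(max_frequency_mhz(171.585, 8.0), 3))   # 5.568 MHz
```

Applying the same formula to the later examples (assuming the same setup time) reproduces their quoted operating frequencies as well.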
Figure 4.10: The FPGA layout of the XOR problem
Figure 4.11: The timing report of the XOR FPGA design
4.4.2 Encoder Problem
Figure 4.12: The FPGA layout of the 5-4-5 encoder
The other design example is an encoder. The network has five inputs, four hidden and five output neurons. This network is smaller than the one discussed in Chapter 3, but it is the same class of problem. The FPGA layout of the encoder design is shown in Figure 4.12. There are 215 CLBs and 12 IOBs used. The total logic and routing delay is 249.346ns, with 44.9% from logic and 55.1% from routing. The routing delay in this design has a higher percentage than the XOR as it has more synaptic connections between the layers. The maximum operating frequency is 3.886MHz. The average number of CLBs per neuron is 23.89. The design has more synapses per neuron than the XOR but this average is still very close to that of the XOR. As all the synapses are combinational logic, the place and route is able to efficiently pack them into the CLBs.
4.4.9 Cheque Character Recognition
This problem was discussed in chapter 3; the network has 25 inputs,
6 hidden and 10 output neurons. This design is implemented in a Xilinx
XC4013 FPGA. It used 506 CLBs and 37 IOBs, and the average number of CLBs
per neuron is 31.63. The maximum delay path of this design is 340.09ns with
32.6% logic and 67.4% routing. The maximum operating frequency is
2.873MHz.
Figure 4.13: The FPGA layout of cheque character recognition
4.5 Summary
This chapter has presented a hardware implementation of multi-layer
neural networks using pulse-code arithmetic. The design of the networks is hier-
archically organized so that it results in more optimized circuitry during logic
synthesis and optimization. The networks are divided into neuron/synapse
units and random number generators. This chapter also discussed a re-rand-
omizer for the neuron output in order to prevent correlation between the neu-
ron pulse streams.
A top-down design flow for constructing these networks has been discussed.
This flow progresses from a high-level network description to the generation of a
Xilinx bit stream for programming the Xilinx FPGA device. The use of a VHDL
testbench for design verification was also discussed. Following this, three exam-
ple problems were implemented on Xilinx 4000 series FPGAs. The implementa-
tion results showed that the networks are extremely compact and use only 23
CLBs per neuron/synapse unit for the XOR problem.
Chapter 5
Conclusions and Future Work
This thesis has demonstrated the implementation of pulse-code neural
networks in Xilinx FPGAs. The hardware requirements of these networks were
shown to be minimal; only simple digital gates were required to perform the
arithmetic. Also, the use of the backpropagation learning algorithm for training,
as well as the simulation of these networks, was discussed. The simulation results
suggested that the weights in a neural network must be very small to prevent
the neurons from constantly saturating, and that multi-layer networks should be
used in order to overcome the fan-in limitation of the neuron.
The hardware architecture of these networks was described, as well as the top-down
design flow for implementing these networks in Xilinx FPGAs. In addition, two
design examples, the XOR and encoder, were implemented and examined. The
implementation results have shown that the average number of CLBs per neu-
ron/synapse unit was only 23 for the XOR problem. The increase in the number of
synaptic connections of the neuron did not significantly contribute to this
hardware cost. As a result, a significantly larger network can be implemented on a
single Xilinx FPGA. For example, approximately 44 neuron/synapse units could
be implemented on a Xilinx XC4025 part, the largest Xilinx 4000 series part,
which has 1024 CLBs on a single chip. With the aid of the current state-of-the-art
CAD tools, the design cycle took only days to complete, as opposed to the weeks
or months of traditional design methodology.
Continued work in this area should investigate the use of time multiplexing to
further increase the number of neurons per device. The idea is to have one single
physical layer of neurons and re-use it for different layers, emulating multiple
layers of neurons.
The use of multiple-FPGA environments for implementing these networks should
also be investigated. In this case, very large scale neural networks can be imple-
mented for prototyping. As well, other FPGA devices should be considered. One
potential candidate is the new Xilinx 8000 series, which is a sea-of-gates
architecture. Since neural network structures are highly regular with little glo-
bal wiring, the basic architecture is similar to the architecture of the XC8000
series FPGAs; therefore better utilization of the FPGA can be achieved.
A sophisticated high-level interface that compiles a given neural architecture
directly to a single-FPGA or multi-FPGA based hardware system should be devel-
oped. However, the logic synthesis tools could be a problem for such systems, as
the current logic synthesis tools do not work well when the design exceeds 3000
gates. Further work should be done on partitioning the networks into small
pieces for logic synthesis, as well as on the use of Mentor Graphics' design man-
agement software, WorkXpert, to automate the design flow and design capture.
Appendix A
Targeting VHDL Design to Xilinx FPGAs
As the density and complexity of Xilinx FPGA designs increase to 20,000
gates and beyond, the traditional schematic capture design entry is often cum-
bersome. The use of hardware description languages (HDLs), such as VHDL and
Verilog HDL, can raise designer productivity. High-level languages combined
with logic synthesis can provide a consistent design methodology across a range
of technologies. By raising the level of design abstraction, synthesis tools can
increase productivity, ensuring error-free gate-level realizations and freeing
designers for more creative tasks. However, the designer should not ease up on
hardware implementation considerations when synthesis tool aids are available.
The methods for designing ASICs do not always apply to designing with Xilinx
FPGAs. ASICs have more gates and routing resources than Xilinx FPGAs. Since
ASICs have a large number of available resources, the designer can easily create
inefficient code that results in a large number of gates. When designing with Xil-
inx FPGAs, the designer must create efficient code.
The VHSIC Hardware Description Language (VHDL) is a language for designing
integrated circuits (ICs), which can describe designs at the behavioural and/or
structural level. VHDL designs can be behaviourally simulated and tested to be
functionally correct before synthesis. However, many VHDL constructs are not
supported by synthesis tools. In general, only a subset of VHDL constructs,
called Register Transfer Level (RTL) constructs, is accepted by the synthesis
tools. In addition, synthesis tools interpret the VHDL code differently when tar-
geting different technologies. The following guidelines ensure VHDL code that
takes the best advantage of Xilinx's resources and produces the same function-
ality after synthesis.
A.1 Wait for XX ns Statement
The wait for XX ns statement specifies the number of nanoseconds that must
pass before a condition is executed. This statement does not synthesize to a
component. In designs that include this statement, the functionality of the sim-
ulated design does not match the functionality of the synthesized design.
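As a minimal illustration of the pitfall, consider the following hypothetical fragment (not taken from the thesis designs; the signal name CLK_INT is illustrative). It simulates correctly but has no gate-level equivalent:

```vhdl
-- Hypothetical example: a simulation-only delay.
-- "wait for 10 ns" is meaningful to the simulator, but a synthesis
-- tool cannot map a time delay to any FPGA component.
process
begin
    CLK_INT <= '0';
    wait for 10 ns;   -- not synthesizable: no hardware equivalent
    CLK_INT <= '1';
    wait for 10 ns;
end process;
```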
A.2 After XX ns Statement
The after XX ns clause is usually used as a condition of a signal assign-
ment. This statement is usually ignored by the synthesis tool. An example of this
statement is:
Q <= '0' after XX ns;
A.3 Initial Values
Initial values assigned to signals and variables are ignored by most synthe-
sis tools. The functionality of the simulated design may not match the function-
ality of the synthesized design. For example, do not use initialization statements
such as the following:
variable SUM: INTEGER := 0;
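A sketch of the portable alternative: instead of relying on a declaration-time initial value, drive the starting state from an explicit reset input, so that simulation and synthesis agree. The signal names (CLK, RST, SUM_REG, DIN) are illustrative, not from the thesis designs:

```vhdl
-- Illustrative only: initialize SUM_REG through a reset input
-- rather than an initial value in the declaration.
process (CLK)
begin
    if (CLK'event and CLK = '1') then
        if (RST = '1') then
            SUM_REG <= 0;             -- explicit, synthesizable reset
        else
            SUM_REG <= SUM_REG + DIN; -- normal accumulate
        end if;
    end if;
end process;
```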
A.4 Order and Group Arithmetic Functions
The ordering and grouping of arithmetic functions influence design per-
formance. For example, the following two statements are not equivalent:
ADD <= A1 + A2 + A3 + A4;
ADD <= (A1 + A2) + (A3 + A4);
The first statement cascades three adders in series. The second statement cre-
ates two adders in parallel: A1 + A2 and A3 + A4. In the second statement, the
two additions are evaluated in parallel and the results are combined with a third
adder. RTL simulation results are the same for both statements; however, the
second statement results in a faster circuit after synthesis.
A.5 Xilinx Naming Conventions
Xilinx has reserved names for their FPGAs. The following FPGA resource
names are reserved and should not be used to name nets or components:
- Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs), clock buffers,
tristate buffers (BUFTs), oscillators, package pin names, CCLK, DP, GND,
VCC, and RST
- CLB names such as AA, AB, and RE2
- Primitive names such as TDO, BSCAN, M0, M1, M2, or STARTUP
- Do not use pin names such as P1 and P2 for component names
- Do not use pad names such as PAD1 for component names
For further Xilinx naming conventions, the Xilinx Data Book [2] provides a more
detailed reference.
A.6 Latches and Registers
VHDL compilers infer latches from incomplete specifications of condi-
tional expressions. Latch primitives are not available in XC4000 CLBs; however,
the IOBs contain input latches. Latches described in VHDL are implemented
with gates in the CLB function generators. For example, the D latch shown in
Figure A.1 is implemented with one function generator. The D latch imple-
mented with gates is shown in Figure A.2.
LIBRARY mgc_portable;
USE mgc_portable.qsim_logic.all;
ENTITY d_latch IS
    PORT ( GATE, DATA: in qsim_state;
           Q: out qsim_state );
end d_latch;
ARCHITECTURE BEHAV OF d_latch IS
begin
    LATCH: process (GATE, DATA)
    begin
        if (GATE = '1') then
            Q <= DATA;
        end if;
    end process; -- End LATCH
end BEHAV;
Figure A.1: Latch inference
Figure A.2: Latch implemented with gates
In this example, the VHDL code contains an IF statement without an ELSE
clause, which always implies a latch in the gate-level representation. The drawback of a
latch is that it is implemented as a combinatorial feedback loop in a CLB, and
synthesis tools do not process hold-time requirements because of the uncer-
tainty of routing delays. In order to eliminate unnecessary latches, it is desirable
to replace them with D registers, as each CLB has two D flip-flops. To convert a
latch to a D register, use an ELSE clause in the IF statement or a WAIT UNTIL
statement.
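Applying this rule to the latch of Figure A.1, a WAIT UNTIL version might look as follows. This is a sketch rather than a listing from the thesis: it reuses the d_latch entity and qsim_state types of Figure A.1, and it deliberately changes the behaviour from level-sensitive to edge-triggered, which is the intent of the conversion:

```vhdl
-- Sketch: D register version of the Figure A.1 latch.
-- The WAIT UNTIL clocks the assignment, so synthesis maps it to a
-- CLB flip-flop instead of a combinatorial feedback loop.
ARCHITECTURE BEHAV_FF OF d_latch IS
begin
    REG: process
    begin
        wait until (GATE'event and GATE = '1'); -- rising edge of GATE
        Q <= DATA;
    end process; -- End REG
end BEHAV_FF;
```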
In all other cases (such as latches with reset/set or enable), use the D flip-flop
instead of a latch. This rule also applies to JK and SR flip-flops. Table A.1 pro-
vides a comparison of area and speed for a D latch implemented with gates and
a D flip-flop.
Table A.1: D latch implementation comparison

                            D Flip-Flop                        D Latch
Area                        1 Register                         1 Function Generator
Speed                       1 logic level; no combinatorial    Combinatorial feedback loop
                            feedback loop
Advantages/                 Requires a change to the VHDL to   VHDL infers a D latch implemented
Disadvantages               convert D latches to D flip-flops; with gates; the combinatorial
                            no hold time or combinatorial      feedback loop results in a
                            loop                               hold-time requirement
A.7 Implementing Multiplexers with Tristate Buffers
A 4-to-1 multiplexer is efficiently implemented in a single XC4000 CLB.
The six input signals (four inputs, two select lines) use the F, G, and H function
generators. Multiplexers that are larger than 4-to-1 exceed the capacity of one
CLB. For example, a 16-to-1 multiplexer requires five CLBs and has two logic lev-
els. These additional CLBs increase area and delay. In order to utilize XC4000
resources efficiently, using tristate buffers (BUFTs) is recommended to implement multi-
plexers larger than 4-to-1.
A VHDL design of a 5-to-1 multiplexer built with gates is shown in Figure A.3.
Typically, the gate version of this multiplexer has binary-encoded selector inputs
and requires three select inputs (SEL<2:0>). The schematic representation of
this design is shown in Figure A.4.
LIBRARY mgc_portable;
USE mgc_portable.qsim_logic.all;
ENTITY mux_gate IS
    PORT ( SEL: in qsim_state_vector(2 downto 0);
           A, B, C, D, E: in qsim_state;
           MUX_OUT: out qsim_state );
end mux_gate;
ARCHITECTURE BEHAV OF mux_gate IS
begin
    SEL_PROCESS: process (SEL, A, B, C, D, E)
    begin
        case SEL is
            when "000"  => MUX_OUT <= A;
            when "001"  => MUX_OUT <= B;
            when "010"  => MUX_OUT <= C;
            when "011"  => MUX_OUT <= D;
            when others => MUX_OUT <= E;
        end case;
    end process; -- End SEL_PROCESS
end BEHAV;
Figure A.3: Implementing 5-to-1 MUX with gates
Figure A.4: 5-to-1 MUX implemented with gates
LIBRARY mgc_portable;
USE mgc_portable.qsim_logic.all;
ENTITY mux_tbuf IS
    PORT ( SEL: in qsim_state_vector(4 downto 0);
           A, B, C, D, E: in qsim_state;
           MUX_OUT: out qsim_state_resolved );
end mux_tbuf;
ARCHITECTURE BEHAV OF mux_tbuf IS
begin
    MUX_OUT <= A when (SEL(0) = '0') else 'Z';
    MUX_OUT <= B when (SEL(1) = '0') else 'Z';
    MUX_OUT <= C when (SEL(2) = '0') else 'Z';
    MUX_OUT <= D when (SEL(3) = '0') else 'Z';
    MUX_OUT <= E when (SEL(4) = '0') else 'Z';
end BEHAV;
Figure A.5: Implementing 5-to-1 MUX with BUFTs
The VHDL design shown in Figure A.5 is a 5-to-1 multiplexer built with tristate
buffers. The tristate buffer version of the multiplexer has one-hot encoded selec-
tor inputs and requires five select inputs (SEL<4:0>). The schematic representa-
tion of this design is shown in Figure A.6.
Figure A.6: 5-to-1 MUX implemented with BUFTs
Appendix B
Xilinx Device Quick References

Table B.1: Xilinx Devices, Packages and Speed Grades

Device      Packages                Speed Grades
XC4002A     PC84, PQ100, PG120      -5, -6
References
[1] B. Gilbert, "A High-Performance Monolithic Multiplier Using Active Feedback," IEEE J. Solid-State Circuits, vol. SC-9, pp. 364-373, 1974.
[2] Xilinx Inc., The Xilinx Data Book, 1994.
[3] W. S. McCulloch and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics 5, pp. 115-133, 1943.
[4] F. Rosenblatt, "Principles of Neurodynamics," New York: Spartan Books, 1959.
[5] M. Minsky and S. Papert, "Perceptrons: An Introduction to Computational Geometry," Cambridge, MA: The MIT Press, 1969.
[6] D. Rumelhart, G. Hinton and R. Williams, "Learning Internal Representations by Backpropagating Errors," Nature 323, pp. 533-536, 1986.
[7] B. R. Gaines, "Stochastic Computing Systems," Advances in Information Systems Science, volume 2, Julius T. Tou, editor, Plenum Press, 1969.
[8] P. Mars, Stochastic and Deterministic Averaging Processors, The Institution of Electrical Engineers, London and New York, 1981.
[9] S. W. Golomb, Shift Register Sequences, Holden-Day Publishing Co., San Francisco, 1982.
[10] P. Hortensius, R. McLeod and B. Podaima, "Cellular Automata Circuits for Built-In Self-Test," IBM Journal of Research and Development, vol. 34, March 1990.
[11] P. Hortensius, "Parallel Computation of Non-deterministic Algorithms in VLSI," Ph.D. thesis, Department of Electrical and Computer Engineering, University of Manitoba, 1987.
[12] F. Brglez, C. Gloster and G. Kedem, "Hardware-based Weighted Random Pattern Generation for Boundary Scan," IEEE International Test Conference, Aug. 1989.
[13] J. Tomberg, T. Ritoniemi, K. Kaski and H. Tenhunen, "Fully Digital Neural Network Implementation Based on Pulse Density Modulation," Proc. IEEE Custom Integrated Circuits Conf. (San Diego, CA: May 15-17), pp. 12.7.1-12.7.4, 1989.
[14] J. Tomberg and K. Kaski, "Pulse-density Modulation Technique in VLSI Implementation of Neural Network Algorithms," IEEE Journal of Solid-State Circuits, 25(5), pp. 1277-1286, Oct. 1990.
[15] M. Tomlinson Jr., M. Walker and M. Sivilotti, "A Digital Neural Network Architecture for VLSI," Proc. IJCNN-90, pp. 545-550, San Diego, CA, 1990.
[16] J. Dickson, R. McLeod and H. Card, "Stochastic Arithmetic Implementations of Neural Networks with In Situ Learning," IEEE International Conference on Neural Networks (San Francisco, CA: Mar. 28-Apr. 1), pp. 711-716, 1993.
[17] Drew van Camp, Evan E. Steeg and Tony Plate, XERION Neural Network Simulator, Computer Science Department, University of Toronto, 1991.
[18] J. Dickson, "Stochastic Arithmetic Implementation of Artificial Neural Networks," M.Sc. thesis, Department of Electrical Engineering, University of Manitoba, 1992.
[19] D. Rumelhart, J. McClelland and the PDP Research Group, Parallel Distributed Processing, Volume 1, The MIT Press, 1986.
[20] Y. C. Kim and M. Shanblatt, "Random Noise Effects in Pulse-Mode Digital Multilayer Neural Networks," IEEE Transactions on Neural Networks, vol. 6, no. 1, January 1995.
[21] Mentor Graphics, Bold Browser, 1995.
[22] Neocad Inc., Neocad FPGA Foundry Tutorial, 1994.