JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2010, ISSN 2151-9617


Back-Propagation on Horizontal-Vertical-Reconfigurable-Mesh

Seyed Abolfazl Mousavi, Ali Moeini, Mohammad Reza Salehnamadi

S.A. Mousavi is with the Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.
A. Moeini is with the Department of Algorithms and Computation, Faculty of Engineering, University of Tehran, Tehran, Iran.
M.R. Salehnamadi is with the Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.

Abstract — This article presents a new adaptive approach: implementing the Check-Boarding algorithm on the Horizontal-Vertical Reconfigurable Mesh (HVRM). First, the Back-Propagation algorithm is summarized; second, the computational model used in the basic Check-Boarding algorithm is explained; finally, the Reconfigurable Mesh and the new method are presented.

Index Terms —Neural Network; Reconfigurable Mesh; Check-Boarding; Back-Propagation; Hypercube; Reconfiguration; RTR


1 INTRODUCTION

Back-Propagation is a learning algorithm embedded in several neural network implementations. It has been realized across a range of technologies, from general-purpose processors to ASICs, but during the 1990s interest centered on computational models that could execute a general domain of programs while taking advantage of parallelism. Check-Boarding is one such scheme, originally mapped onto the hypercube. The HVRM has properties that make it a better host for Check-Boarding, and this article revises the mapping accordingly; at the end, the two models are compared with respect to scalability, run-time performance, and occupied area. The treatment here is theoretical; a proof-of-concept implementation remains future work.

2 NEURAL NETWORK 

A neural network is a collection of computational nodes joined by a connection network. Before use it must be trained: a learning algorithm alters the weight attached to each link, and in doing so the network is formed. The first and foremost algorithm of this sort is Back-Propagation [1], applied to the multi-layer perceptron (MLP). If every layer has the same number of nodes, the ANN is uniform; otherwise it is non-uniform. The maximum number of perceptrons per layer is denoted n; f, w, y, and θ denote the activation function, weight, output, and bias, respectively. By the universal approximation theorem, the activation function must be nonlinear if the network is to approximate arbitrary mappings. Back-Propagation has two stages: a forward pass from the first layer to the last, and a backward pass from the last layer to the first. In the forward pass, the outputs and the error are computed from the current state; in the backward pass, the error drives the adjustment of the weights, layer by layer, from the last layer back to the first. The adjustment may be performed per pattern, per epoch, or after a complete training run. Note that training is iterative: the training set may be fed in more than once, following these formulas, stated here in the standard form of [1] (d denotes the desired output and η the learning rate):

$a_j = \sum_i w_{ij}\, y_i + \theta_j$    (1)

$y_j = f(a_j)$    (2)

$\delta_j = f'(a_j)\,(d_j - y_j)$ for output units;  $\delta_j = f'(a_j)\,\sum_k \delta_k w_{jk}$ for hidden units    (3)

$\Delta w_{ij} = \eta\, \delta_j\, y_i$    (4)
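As a concrete illustration of formulas (1)-(4), the following minimal Python/NumPy sketch trains a small uniform MLP per pattern. The layer sizes, the logistic activation, the learning rate, and the XOR training set are illustrative assumptions, not values taken from this article.

```python
import numpy as np

def f(a):           # logistic activation (an assumed nonlinearity)
    return 1.0 / (1.0 + np.exp(-a))

def f_prime(a):     # derivative of the logistic function
    s = f(a)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
n = 4                                             # nodes per layer (uniform net)
W1, th1 = rng.normal(size=(n, 2)), np.zeros(n)    # hidden layer, 2 inputs
W2, th2 = rng.normal(size=(1, n)), np.zeros(1)    # output layer, 1 output
eta = 0.5                                         # learning rate

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # XOR patterns
D = np.array([0.0, 1.0, 1.0, 0.0])                     # desired outputs d

for epoch in range(5000):
    for x, d in zip(X, D):
        # forward pass: formulas (1) and (2)
        a1 = W1 @ x + th1;  y1 = f(a1)
        a2 = W2 @ y1 + th2; y2 = f(a2)
        # backward pass: formula (3)
        d2 = f_prime(a2) * (d - y2)        # delta of the output unit
        d1 = f_prime(a1) * (W2.T @ d2)     # deltas of the hidden units
        # weight adjustment per pattern: formula (4)
        W2 += eta * np.outer(d2, y1); th2 += eta * d2
        W1 += eta * np.outer(d1, x);  th1 += eta * d1
```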

3 COMPUTATIONAL MODEL 

ASIC-like circuits are designed on a Field-Programmable Gate Array (FPGA) or other logic array using a high-level hardware description language such as VHDL or Verilog. Since the chip surface is devoted to a single special purpose, area consumption is not optimized and a broad range of hardware explorations is lost. At the opposite end, general-purpose programming is achieved through a fetch-decode-execute cycle, which carries the extra overhead of a sequential execution regime [2]. These two ends of the programming spectrum meet in the computational model. The hypercube is one such model, defined recursively on the basis of the Reflected Gray Code: an n-cube is built from two (n-1)-cubes by prefixing one extra bit, 0 on one copy and 1 on the other, and linking the equivalent points of the 0-prefixed and 1-prefixed cubes. Neighboring nodes differ in exactly one bit, so the shortest path between two nodes is given by their Hamming distance, and the diameter of the topology equals n. The good reachability of this model is bought at the cost of wiring and an excessive degree, n, at every node. The scalability of the model is limited because the number of nodes must be 2^n.
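These addressing properties are easy to state in code. The sketch below labels nodes with integers whose binary representation is the node address, a common convention assumed here rather than taken from the article.

```python
def neighbors(v: int, n: int) -> list[int]:
    """All nodes adjacent to v in an n-dimensional hypercube: flip one bit."""
    return [v ^ (1 << i) for i in range(n)]

def hamming_distance(u: int, v: int) -> int:
    """Shortest-path length between u and v: the number of differing bits."""
    return bin(u ^ v).count("1")

# Example: in a 3-cube, node 000 is adjacent to 001, 010, and 100, and the
# distance from 000 to 111 is 3 -- the diameter n of the topology.
assert neighbors(0b000, 3) == [0b001, 0b010, 0b100]
assert hamming_distance(0b000, 0b111) == 3
```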

4 RECONFIGURATION 

While a computational model and hardware are shared among several programs, hardware deployment is made more effective by mapping the computational model onto it. Programs advance consecutively, each with its own configuration and state, which forces the logic array to change its circumstances at every next entry. Reconfiguration is therefore an inevitable outcome.

Under static reconfiguration, the hardware configuration is alternated step by step, according to the source code.



Run-Time Reconfiguration (RTR), in contrast, is dynamic: it is invoked to evolve the computation of a program by changing the state of the physical substrate at run time. RTR is applied at various levels, from coarse to fine granularity.

Involving reconfiguration at the programming, fine-grain level is done by attaching switches to a static topology. The Reconfigurable Mesh [3], hereafter RM, is one such evolutionary model: each of its processing elements (PEs) is attached to local switches. The mesh is a simple, regular, popular shape that fits well onto a logic array such as an FPGA. It is classified by tile pattern, wrap-around links, and communication mode; the topology is formed by the tile pattern and the wrap-around links [4].

Data is transported through this network in two different ways: link-oriented and bus-oriented. With links, messages are sent point-to-point. In contrast, an ideal bus would convey an entry to all destinations in constant time; physical restrictions prevent this, so a propagation delay is loaded onto the computational model. In sum, sewing a bus onto the topology makes propagation delay a feature of the computational model. The delay is measured either as a constant or as a function of the number of connected processing elements. Three types are currently considered:

1. Constant-time: delay = O(1);
2. Log-cost: delay = O(log n), where n is the number of bus-connected PEs;
3. Bend-cost: delay equals the number of bends, counted along the bus structure.

The PEs of an RM can read/write on the link (bus) and local memory, alter the wiring between their local switches, and perform arithmetic/logical operations. Link (bus) accessibility follows the same scheme as PRAM memory (CRCW, CREW, ERCW, and EREW); RMs are therefore classified not only as link- or bus-oriented but also by their link or bus accessibility.
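The three delay models can be written as simple cost functions. The sketch below is illustrative only; the article gives the asymptotic forms, and any constant factors here are placeholders.

```python
import math

def delay_constant(num_pes: int) -> float:
    """Constant-time model: delay = O(1), independent of bus length."""
    return 1.0

def delay_log_cost(num_pes: int) -> float:
    """Log-cost model: delay grows as log(n) in the bus-connected PEs."""
    return math.log2(max(num_pes, 2))

def delay_bend_cost(num_bends: int) -> float:
    """Bend-cost model: delay equals the number of bends along the bus."""
    return float(num_bends)
```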

The involvement of switches in the model allows flexible arrangements: fifteen connection patterns are possible within a mesh PE. Which patterns are legitimate in practice differs between RM types, so the legitimate set is itself counted among an RM's features. The more connection patterns an RM admits, the more computational power it achieves [3].
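The count of fifteen matches the number of ways to partition a PE's four ports into connected groups, the Bell number B(4) = 15. The sketch below enumerates them; the port names N, E, S, W follow the usual compass convention, assumed here.

```python
def partitions(items):
    """Yield every set partition of `items`; each partition is a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part

ports = ["N", "E", "S", "W"]
patterns = list(partitions(ports))
print(len(patterns))  # 15 -- every internal connection pattern of one PE
# The HVRM of Section 5 keeps only two of these patterns:
# the straight-through North-South and East-West connections.
```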

Many algorithms have been implemented on different types of RM, but which one is efficient and uses reconfiguration in a proper way? The important factor is scalability. Algorithmic scalability is plainly endangered by lavish use of reconfiguration, so the computational model must be chosen with the algorithm's demands in mind. Designers usually state their solutions on an unrestricted model, in which the size of the model depends on the size of the algorithm; in practice, however, only a smaller model exists. Every missing processor is then replaced by memory, and its function is overloaded onto another processor. This property is reduced in a dynamic model by frequent modification of the arrangement, so it should be evaluated during algorithm design by trading off reconfiguration use against scalability. This is possible by taking a general view of a reconfigurable model and a closer view of the algorithm mapping in the presence of heavy reconfiguration use. The general view categorizes reconfiguration into optimal, strong, and weak; the closer view measures the degree of scalability with the purpose of reducing reconfiguration use. This measurement is done on strong and weak models.

5 CHECK-BOARDING ON RECONFIGURABLE MESH

Check-Boarding is a distribution of weights over the hypercube that uses multi-node broadcasting to gain better performance. Each layer is distributed across all PEs without any dependency on other layers: every input weight is saved on the row of its neuron, at the column matching its input index, and each row is responsible for the internal processing of one neural node. Data is fed into a layer from the diagonal nodes by vertical broadcasting; after each PE multiplies the data by its weight, the results are accumulated by parallel horizontal broadcasting and placed on the diagonal nodes as inputs to the next layer [5]. The major feature of Check-Boarding is this arrangement, and multi-node broadcasting [6] is the hypercube facility exploited for better performance; the arrangement itself is what is meant by Check-Boarding. A contribution on RM was presented in [7], but it was designed without regard to model scalability, which is weak there. In this section, an algorithm using the Check-Boarding arrangement is described on the CREW Horizontal-Vertical RM (CREW-HVRM), which is bus-oriented. The HVRM has just two legitimate connection patterns, North-to-South and East-to-West, both supporting bidirectional transport. These limited patterns bound the broadcasting delay on a bus between O(1) at minimum and O(log n) at maximum, where n is the number of PEs along the length or width of the mesh. In addition, the model has optimal scalability [3].

Check-Boarding is performed in a forward and a backward pass. The forward pass starts with data entering from the I/O port into the diagonal PEs; the inputs are then broadcast along the columns, and each summation is calculated by traversing a binary tree from leaves to root, the diagonal nodes being the roots. At the end, the outputs are ready for the next layer. This process continues until the results of the neural network are worked out in the last layer. Computing the margin between the desired network outputs and the computed values is essential for running the backward pass; note that the desired outputs also enter from the I/O port. The backward pass differs from the forward pass in that broadcasting runs along rows and the binary trees are traversed along columns. Although these functions run along different dimensions (forward: column broadcast, row addition, then row broadcast; backward: row broadcast, column addition, then column broadcast), the structures are the same:

1. Binary tree addition;
2. Broadcasting.

The basic operations of the present algorithm are introduced below:
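As a minimal model of the two operations, the following Python sketch treats a row (or column) of PEs as a list; the function names and data layout are illustrative assumptions, not the article's pseudo-code.

```python
def broadcast(pes: list, src: int) -> None:
    """Bus broadcast: the value held by PE `src` is written to every PE on the
    same row/column bus (CREW: one writer, many concurrent readers)."""
    value = pes[src]
    for i in range(len(pes)):
        pes[i] = value

def tree_add(pes: list) -> float:
    """Binary-tree addition along a row/column. In phase k, PEs 2**k apart are
    fused by long links and add pairwise, so n values are summed in
    ceil(log2(n)) parallel phases (cf. Fig. 3)."""
    n = len(pes)
    vals = list(pes)
    step = 1
    while step < n:
        for i in range(0, n - step, 2 * step):
            vals[i] += vals[i + step]   # partner lies 2**k positions away
        step *= 2
    return vals[0]

row = [1, 2, 3, 4, 5, 6, 7, 8]   # an 8-PE row, as in Fig. 3
assert tree_add(row) == 36        # summed in 3 parallel phases
```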


Fig. 3. Example of adding values in a row of an 8-PE HVRM. The label on each long link gives the phase number of the parallel transfer between nodes.

The forward- and backward-pass pseudo-codes are the same; the forward pass is shown for illustration. This pseudo-code applies to a uniform neural net having L layers with n nodes in each layer. The first step is ColumnBroadcast, which has the same structure as RowBroadcast: x is the row, y is the column, and the node at (x, y) sends a packet to all other nodes in the same column. Every PE stores the weights of the L layers in the float array PE.w, and the outputs of the nodes are duplicated in the PEs as PE.y. The output is held during the backward phase for link adaptation.
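Combining the basic operations, a forward-pass layer can be sketched on an n-by-n grid of PEs with plain matrix operations. The function below is a reconstruction for illustration; its names and structure are assumptions, not the article's pseudo-code.

```python
import numpy as np

def forward_layer(W: np.ndarray, y_in: np.ndarray, f) -> np.ndarray:
    """One Check-Boarding forward layer on an n-by-n PE grid. W[i, j] is the
    weight stored at PE(i, j); y_in[j] is the input held by diagonal PE(j, j)."""
    n = W.shape[0]
    # ColumnBroadcast: diagonal PE(j, j) sends y_in[j] down column j,
    # so every PE(i, j) holds y_in[j].
    col = np.tile(y_in, (n, 1))
    # Local product at every PE: W[i, j] * y_in[j].
    prod = W * col
    # Binary-tree addition along each row i, rooted at diagonal PE(i, i).
    sums = prod.sum(axis=1)
    # Activation applied at the diagonal PEs; the outputs feed the next layer.
    return f(sums)

# Example with assumed sizes: a 4-node layer and a logistic activation.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 4))
y0 = rng.normal(size=4)
y1 = forward_layer(W, y0, lambda a: 1.0 / (1.0 + np.exp(-a)))
```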

6 MISCELLANEOUS 

Suppose an HVRM of m-by-m elements is meant to host an algorithm written for a computational model of n-by-n PEs, with n greater than m. This situation occurs in real environments. To execute the program, the local data and the functional duties are divided among the existing PEs; the duties instruct the PEs holding the corresponding data to complete the computation. The main obstacle to this division is the efficient distribution of the data owned by the computational model's processors, while the duties themselves are assigned automatically. As mentioned at the beginning of Section 5, the HVRM has this quality; it can therefore accept this situation, and moreover nonuniform neural nets. A further benefit is that an HVRM need not have equal width and length to map this algorithm. Scalability is thus improved in comparison with the hypercube: whereas an n-dimensional hypercube must have exactly 2^n elements, an HVRM can be designed with any numbers of elements along its width and length.

Broadcasting is the main factor in the computation of run-time performance. Multi-node broadcasting on the hypercube takes O(log P) time, and the HVRM is observed to take at most O(log P) as well; P is the number of PEs.

The area occupied by a hypercube or an HVRM can be taken as a function of the number of links in the model. The number of links is the product of the node degree and P: in the hypercube it equals P log P, and in the HVRM, whose node degree is at most four, it is less than 4P.
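These cost functions admit a quick numeric comparison. The sketch below follows the counts above; rounding the hypercube to a power of two reflects its 2^n size constraint, while the HVRM dimensions are free.

```python
import math

def hypercube_costs(P: int) -> dict:
    """Hypercube of P = 2**n nodes, degree n per node."""
    n = max(1, round(math.log2(P)))
    P = 2 ** n                      # node count is forced to a power of two
    return {"nodes": P, "broadcast": n, "links": P * n}

def hvrm_costs(width: int, height: int) -> dict:
    """HVRM of width-by-height PEs, degree at most 4 per node."""
    P = width * height
    bcast = math.ceil(math.log2(max(width, height)))
    return {"nodes": P, "broadcast": bcast, "links_upper_bound": 4 * P}

print(hypercube_costs(1000))   # snaps to 1024 nodes -- the 2**n constraint
print(hvrm_costs(25, 40))      # exactly 1000 PEs, any width and length
```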

7 CONCLUSION 

In summary, Table 1 compares the two models and indicates that the HVRM yields a much better implementation of Check-Boarding than the hypercube; of course, this needs confirming simulations, which are left as future work.

TABLE 1
COMPARISON BETWEEN HYPERCUBE AND HVRM

Implementations | Scalability         | Run-Time Performance (broadcasting) | Area
Hypercube       | increases in O(2^n) | O(log P)                            | O(P log P)
HVRM            | O(P)                | at most O(log P)                    | O(P)

ACKNOWLEDGMENT 

The authors hope readers will enrich the content by raising new issues in our mailbox; finally, we send our special regards to S.Z. Mousavi for drawing the figures.

REFERENCES 

[1] Ben Krose and Patrick van der Smagt, "Chapter 2 Fundamentals; Chapter 3 Perceptron and Adaline; Chapter 4 Back-Propagation," in An Introduction to Neural Networks, 8th ed.: The University of Amsterdam, 1996, http://www.divshare.com/download/7105390-f51.
[2] Kiran Bondalapati and Viktor K. Prasanna, "Reconfigurable Computing: Architectures, Models and Algorithms," Current Science, vol. 78, pp. 828-837, 2000.
[3] Ramachandran Vaidyanathan and Jerry L. Trahan, Dynamic Reconfiguration: Architectures and Algorithms, 1st ed.: Springer, 2004.
[4] Behrooz Parhami, "Mesh-Based Architectures; Low-Diameter Architectures," in Introduction to Parallel Processing: Algorithms and Architectures, 1st ed.: Springer, 1999, pp. 169-340.
[5] Vipin Kumar, Shashi Shekhar, and Minesh B. Amin, "A Scalable Parallel Formulation of the Backpropagation Algorithm for Hypercubes and Related Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 10, pp. 1073-1090, October 1994.
[6] Dimitri P. Bertsekas and John N. Tsitsiklis, "Hypercube Mappings," in Parallel and Distributed Computation: Numerical Methods, 1st ed. Massachusetts: Athena Scientific, 1997, ch. 1, pp. 50-65.
[7] Jing Fu Jenq and Wing Ning Li, "Artificial Neural Networks on Reconfigurable Meshes," in Parallel and Distributed Processing (Workshop on Biologically Inspired Solutions to Parallel Processing Problems), vol. 1388, Heidelberg: Springer Berlin, 1998, pp. 234-242.

S.A. Mousavi received his M.S. from Islamic Azad University and his bachelor's degree from Ferdowsi University. A. Moeini teaches Computer Science at the University of Tehran. M.R. Salehnamadi teaches Software Engineering at Islamic Azad University.

