
TOWARDS APPLICATION OF NEURAL NETWORKS FOR OPTIMAL STRUCTURAL SYNTHESIS OF DISTRIBUTED DATABASE SYSTEMS

Edward Babkin 1, Margarita Karpunina 2

1 State University - Higher School of Economics (Nizhniy Novgorod Branch), Dept. of Information Systems & Technologies, Russia, Nizhniy Novgorod, B. Petchorskay 25/12, 603155
[email protected] (contact author). Tel: +7(8312)169549

2 Nizhniy Novgorod State Technical University, Faculty of Information Systems & Technologies, Dept. of Applied Mathematics, Russia, Nizhniy Novgorod, Minina 24, 603024
[email protected]

ABSTRACT

Although Distributed Database Management Systems increase the usefulness of sensor networks, proper database design requires solution of various complex optimization problems. Synthesis of an optimal logical structure belongs to the class of the most important database design problems, and practice recommends using heuristic optimization methods, such as the paradigm of artificial neural networks. This article describes a new approach to representing the procedure of logical database structure synthesis in terms of Hopfield neural networks. Development of the energy function, modification of the stable state analysis, and some details of the software implementation are illustrated by a simple example, but the same approach can be generalized to more complex situations.

1. INTRODUCTION

Ongoing developments in distributed peer-to-peer systems and sensor networks necessitate a thorough study of architectural and operational issues of distributed data processing in the presence of dynamic changes of network topology and operating conditions [1, 2]. In the framework of such systems Distributed Databases (DDB) play an important role because they facilitate parallelization of query processing and increase data availability and fault tolerance. Adoption of a DDB architecture gives architects and developers indisputable benefits, but at the same time this approach requires solution of various complex optimization problems arising during synthesis of optimal DDB structures.

In particular, the synthesis of an optimal DDB logical structure is one of the central problems of DDB design. The solution of this problem gives an optimal composition of multiple data elements into several logical record types with subsequent allocation of the record types to certain network hosts. The mathematical formulation of this problem includes multiple criteria, such as minimum total time of consecutive single-query processing, minimum total time of consecutive transaction processing, minimum operational costs during processing of users' queries, etc. Fully specified, this problem belongs to the nonlinear problems of integer programming, and it is considered to be NP-complete.

The analysis of combinatorial features has allowed developing a number of exact methods for solution of this problem [3]. Usually the exact methods are based on the dynamic programming paradigm or on the branch-and-bound scheme (interesting results are presented in [4]). However, such approaches to the solution of NP-complete optimization problems are labor-consuming, difficult to implement and inefficient in terms of execution time when the problem's parameters are slightly modified. We would also like to emphasize the mismatch between the distributed nature of DDB and the centralized character of branch-and-bound algorithms. That is why the practice of DDB design recommends using heuristic methods for designing optimal logical structures of DDB. The artificial neural networks (ANN) approach [6] falls within this category. Whereas the use of ANN for pattern recognition and prediction problems is a non-linear extension of conventional linear interpolation/extrapolation methods, application of ANN in the discrete optimization domain really puts something new on the table. A rather large number of studies demonstrates the effectiveness of ANN in this domain. Indeed, the neural approach is particularly transparent for the graph bisection problem [5] due to its binary nature. Also, the results of [7] show the capability of the Hopfield neural network to solve the Traveling Salesman Problem (TSP), which frequently arises in practice.

Although synthesis and optimization of the DDB logical structure has many similarities with the mentioned discrete optimization problems, direct application of known ANN-based approaches is not suitable. So, we investigated a number of different types of ANN, and a specialization of Hopfield networks was found to be very efficient for the discussed problem. In this article we demonstrate the main stages of our approach to the development of a Hopfield network algorithm for a simplified case of the problem of logical structure optimization, but the same approach can be generalized to more complex situations.


2. THE MAIN IDEA OF THE APPROACH

2.1. Description of the problem

Let's consider a case where only semantic contiguity of DDB data elements is taken into account. So, the input data can be represented in the form of a matrix of semantic contiguity. This matrix A^z with binary elements has dimensions n x n, where n is the number of data elements. The element a^z_{ij} is equal to 1 if data element i is semantically related to data element j. The ANN algorithm should find an optimal composition of data elements into some number of logical record types, i.e. determine a splitting of the set of capacity n into some number (m <= n) of subsets.

The ANN consists of neurons in a mode with a large steepness of the characteristic, i.e. the transfer function of the network is "almost threshold":

F(x) = \frac{1}{1 + e^{-\lambda \cdot NET}}, \quad \lambda \to \infty,    (1)

where NET is the weighed sum of inputs. If F is a threshold function with a threshold T_j, the output of the j-th neuron is calculated as

OUT_j = 1, if NET_j > T_j;  OUT_j = 0, if NET_j < T_j;  OUT_j does not change, if NET_j = T_j.    (2)

In our task m = n^2, where n is the number of data elements.

In our approach each logical record type is described by a line of n neurons. Those of these n neurons (data elements) whose outputs are equal to 1 are included in the given logical record type. If, for example, n = 5 (five data elements exist), the result of the neural network's work can be represented in the form of the following table (Table 1).

Table 1. Results of the synthesis.

data element \ logical record type    1   2   3   4   5
1                                     1   0   0   0   0
2                                     0   1   0   0   0
3                                     0   1   0   0   0
4                                     0   0   1   1   0
5                                     0   0   1   1   0

We consider a case of minimal DDB redundancy, so each data element should be included in only one synthesized logical record type. It means that in each row of the resulting matrix only one non-zero element should exist:

\sum_{t=1}^{t_0} x_{it} = 1, \quad \forall i = \overline{1, l},    (3)

where t_0 is the capacity of the set of logical record types, l is the capacity of the data elements set, x_{it} = 1 if the i-th data element is included in the t-th logical record type, and x_{it} = 0 otherwise.
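To make this encoding concrete, the following sketch (our own illustration, not part of the paper; helper names such as decode_record_types are hypothetical) decodes a binary output matrix of the kind shown in Table 1 into logical record types and checks the minimal-redundancy constraint (3):

import numpy as np

def decode_record_types(out_matrix):
    """Rows = data elements, columns = logical record types."""
    n_elements, n_types = out_matrix.shape
    records = {t: [] for t in range(n_types)}
    for i in range(n_elements):
        for t in range(n_types):
            if out_matrix[i, t] == 1:
                records[t].append(i + 1)          # 1-based data element numbers
    return {t + 1: elems for t, elems in records.items() if elems}

def satisfies_minimal_redundancy(out_matrix):
    """Constraint (3): every row contains exactly one non-zero element."""
    return bool(np.all(out_matrix.sum(axis=1) == 1))

# A composition of five data elements into four record types
out = np.array([[1, 0, 0, 0, 0],
                [0, 1, 0, 0, 0],
                [0, 1, 0, 0, 0],
                [0, 0, 1, 0, 0],
                [0, 0, 0, 1, 0]])
print(decode_record_types(out))           # {1: [1], 2: [2, 3], 3: [4], 4: [5]}
print(satisfies_minimal_redundancy(out))  # True

A row with more than one non-zero element, as for data elements 4 and 5 in Table 1, would fail this check and is excluded once minimal redundancy is assumed.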

2.2. Specialization of Hopfield network

For the problem described we propose to use a Hopfield network; its block diagram is given in Fig. 1.

[Figure 1. Structure of the Hopfield network: Layer 0 feeds the outputs OUT_j back to the inputs of Layer 1; each first-layer neuron computes NET_j = \sum_{i \ne j} w_{ij} \cdot OUT_i + IN_j, the weighed sum of the outputs from the other neurons. The dashed lines designate zero weights.]

The network design differs slightly from the one used in Hopfield's original work and others, but it is equivalent to them from the functional point of view. The zero layer does not carry out a computing function and only distributes the outputs of the network back to the inputs. The output of each neuron is connected to the inputs of all other neurons. Each neuron of the first layer calculates the weighed sum of its inputs, giving a signal NET, which is then transformed into a signal OUT with the help of the nonlinear function F. Input vectors are submitted through separate neuron inputs.

Now we need to express the energy function of the neural network considering the terms and constraints of the given problem. This energy function should satisfy two general requirements:

1. it should be small only for those solutions which have only one non-zero element in each row of the resulting matrix;

2. it should give preference to the solutions with such combinations of data elements as are deduced from the matrix A^z of semantic contiguity of data elements.

For representation of our task in terms of neural networks we supply each neuron with two indexes corresponding to a certain data element and logical record type. For example, OUT_{xi} = 1 means that the data element x will be included into the i-th logical record type. All outputs of the network neurons OUT_{xi} have a binary nature, i.e. accept values from the set {0, 1}. Given such a specification, we propose to use the following energy function E that should be minimized:


E = \frac{A}{2} \sum_x \sum_i \sum_{j \ne i} OUT_{xi} \cdot OUT_{xj} + \frac{B}{2} \sum_i \sum_x \sum_{y \ne x} OUT_{xi} \cdot (OUT_{yi} - a^z_{xy})^2 + \frac{C}{2} \Big( \sum_y \sum_j OUT_{yj} - n \Big)^2,    (4)

where A^z = {a^z_{xy}}, x, y = \overline{1, n}, is the given matrix of semantic contiguity and A, B, C are some coefficients.

The first and third sum items of the energy function (4) provide for satisfaction of the first requirement: the first sum item is equal to zero if each row contains no more than one non-zero element, and the third sum item is equal to zero if the resulting matrix contains exactly n non-zero elements. Simultaneously, the second requirement is satisfied with the help of the second sum item added to the energy function E. At large enough values of A and C, the low-energy states will represent allowable compositions of data elements into logical record types, while large values of B guarantee finding the most preferable solution.

The next step is establishing conformity between the members of the energy function (4) and the members of the energy function in the general form for feedback networks (i.e. the Lyapunov function) (5):

E = -\frac{1}{2} \sum_i \sum_j \sum_x \sum_y w_{xi,yj} \cdot OUT_{xi} \cdot OUT_{yj} - \sum_y \sum_j I_{yj} \cdot OUT_{yj} + \sum_y \sum_j T_{yj} \cdot OUT_{yj},    (5)

where E is the artificial energy of the network, w_{xi,yj} is the weight from the output of the xi-th neuron to the input of the yj-th neuron, OUT_{xi} is the output of the xi-th neuron, I_{yj} is the external input of the yj-th neuron, and T_{yj} is the threshold of the yj-th neuron.

As a result, the complete expression of our Hopfield network dynamics (6) is obtained:

NET_{xi} = -A \cdot \sum_{j \ne i} OUT_{xj} + B \cdot \sum_{y \ne x} (2 a^z_{xy} - 1) \cdot OUT_{yi} - C \cdot \sum_{(y,j) \ne (x,i)} OUT_{yj} + C \cdot n.    (6)
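As an illustration only (not the authors' implementation), the dynamics (6) together with the steep logistic transfer function (1) can be simulated as in the sketch below; the asynchronous update order, the steepness lam, the iteration count and the final 0.5 threshold are our assumptions:

import numpy as np

def run_hopfield(a_z, A, B, C, lam=10.0, iters=200, seed=0):
    """a_z: n x n semantic-contiguity matrix; OUT[x, i] ~ 'data element x
    belongs to logical record type i'. Returns a thresholded 0/1 matrix."""
    rng = np.random.default_rng(seed)
    n = a_z.shape[0]
    out = rng.uniform(0.4, 0.6, size=(n, n))            # start near the centre
    for _ in range(iters):
        for x in range(n):
            for i in range(n):
                row_term = out[x].sum() - out[x, i]      # sum over j != i, same row
                col_term = sum((2 * a_z[x, y] - 1) * out[y, i]
                               for y in range(n) if y != x)
                glob_term = out.sum() - out[x, i]        # all (y, j) != (x, i)
                net = -A * row_term + B * col_term - C * glob_term + C * n
                out[x, i] = 1.0 / (1.0 + np.exp(-lam * np.clip(net, -60.0, 60.0)))
    return (out > 0.5).astype(int)

The returned matrix can then be decoded and validated with the helpers sketched in Section 2.1; how to choose A, B and C is the subject of the next subsection.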

2.3. Using of SSA-technology

The optimality of the solution obtained with the help of Hopfield feedback neural networks depends to a large degree on the values of the coefficients A, B, C used in the energy function. In [7] a general method called the stable state analysis (SSA) technique was developed to determine the constraints that the weights in the Hopfield energy function must satisfy so that valid solutions of high quality can always be obtained.

In our work an adaptation of SSA-technology is applied to define feasible domains and the most acceptable values for the coefficients B and C of the energy function. The energy function of the Hopfield network designed for our task looks like (4). Having taken the partial derivative of this energy function with respect to OUT_{xi}, we receive the following equation of movement:

\frac{du_{xi}}{dt} = -\frac{A}{2} \cdot \sum_{j \ne i} OUT_{xj} - C \cdot \Big[ \sum_y \sum_j OUT_{yj} - n \Big] - \frac{B}{2} \cdot \sum_{y \ne x} (OUT_{yi} - a^z_{xy})^2 - \frac{B}{2} \cdot \sum_{y \ne x} \big[ OUT_{yi} \cdot 2 \cdot (OUT_{xi} - a^z_{yx}) \big].    (7)

We shall assume that in a steady state, when the network converges to a fair solution, among all elements in rows x_1, x_2, ..., x_k and x only OUT_{x_1 i_1}, OUT_{x_2 i_2}, ..., OUT_{x_k i_k} and OUT_{xi} are greater than zero, while the other elements are equal to zero. The equation du_{xi}/dt = 0 is fair, as the state is steady, i.e. there is no dynamics of the inputs u_{xi}: they do not change and, hence, the derivative is equal to zero. Thus, in a steady state we have the following equation of movement:

\frac{du_{xi}}{dt} = -C \cdot \big[ OUT_{xi} + OUT_{x_1 i_1} + OUT_{x_2 i_2} + \ldots + OUT_{x_k i_k} - n \big] - \frac{B}{2} \cdot \sum_{y \ne x} (a^z_{xy})^2 = 0.    (8)

Then we have:

C \cdot n - \frac{B}{2} \cdot \sum_{y \ne x} (a^z_{xy})^2 = C \cdot \big[ OUT_{xi} + OUT_{x_1 i_1} + OUT_{x_2 i_2} + \ldots + OUT_{x_k i_k} \big].    (9)

Let's divide both parts by C > 0 (in accordance with the meaning of the energy function (4)) and obtain:

n - \frac{B}{2C} \cdot \sum_{y \ne x} (a^z_{xy})^2 = OUT_{xi} + OUT_{x_1 i_1} + OUT_{x_2 i_2} + \ldots + OUT_{x_k i_k}.    (10)

Under the assumption made by us, OUT_{xi} + OUT_{x_1 i_1} + OUT_{x_2 i_2} + \ldots + OUT_{x_k i_k} > 0 (as the neurons' transfer function is the logistic function with values in the interval (0; 1)). Hence,


n - \frac{\frac{B}{2} \cdot \sum_{y \ne x} (a^z_{xy})^2}{C} > 0 \;\Rightarrow\; C > \frac{\frac{B}{2} \cdot \sum_{y \ne x} (a^z_{xy})^2}{n}.    (11)

Having generalized the inequality (11), we obtain the following restriction for the coefficients of the energy function:

C > \frac{\frac{B}{2} \cdot \max\big\{ \sum_{y \ne x} (a^z_{xy})^2, \; \forall x = \overline{1, n} \big\}}{n}.    (12)

The received inequality (12) restricts the feasibility domain of the coefficients during the solution of our problem.
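Once the matrix A^z and the coefficient B are fixed, the bound (12) is straightforward to evaluate; the helper below is a small sketch of that computation (our own, with a hypothetical name):

import numpy as np

def min_coefficient_c(a_z, B):
    """Strict lower bound on C from (12): (B/2) * max_x sum_{y != x} a_z[x, y]**2 / n."""
    a_z = np.asarray(a_z, dtype=float)
    n = a_z.shape[0]
    row_sums = (a_z ** 2).sum(axis=1) - np.diag(a_z) ** 2   # exclude y == x
    return (B / 2.0) * row_sums.max() / n

Any C strictly greater than the returned value, e.g. 1.1 * min_coefficient_c(a_z, B), satisfies the restriction (12).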

3. DEVELOPED SOFTWARE

Despite the reduced feasibility domain of the coefficients, there are no regular ways of a priori definition of their optimal values. That is why our test-bed software implementation applies principles of genetic algorithms to provide a tuning "framework" for the core neural network algorithm (Fig. 2). The purpose of this framework is the evolutionary selection of the best results (and the corresponding values of A, B, C) among multiple solutions produced with the help of the neural network algorithm (each of which is, in general, a local optimum). We developed special rules of population representation, suitable genetic operators and a suitability (fitness) function that allow finding a global optimum effectively. In most experiments the global optimum is already present in the third population.
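The tuning framework can be pictured as in the sketch below; it is only our reading of the description above, since the paper does not specify the population size, the selection scheme or the mutation operator, and run_hopfield refers to the simulation sketch in Section 2.2:

import numpy as np

def tune_coefficients(a_z, suitability, pop_size=20, generations=10, seed=0):
    """Evolve (A, B, C) triples; `suitability` scores a 0/1 result matrix."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.1, 5.0, size=(pop_size, 3))          # individuals = (A, B, C)
    for _ in range(generations):
        scores = np.array([suitability(run_hopfield(a_z, *ind)) for ind in pop])
        parents = pop[np.argsort(scores)[-(pop_size // 2):]]  # keep the fitter half
        children = np.abs(parents + rng.normal(0.0, 0.2, parents.shape))
        pop = np.vstack([parents, children])                  # next generation
    scores = np.array([suitability(run_hopfield(a_z, *ind)) for ind in pop])
    return pop[np.argmax(scores)]                             # best (A, B, C) found

A suitability function could, for example, reward satisfaction of constraint (3) and the semantic cohesion of each synthesized record type.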

[Figure 2. The block diagram of the test-bed software implementation: the initial data (the matrix of semantic contiguity) are fed to the neural network, and the output data are the resulting matrix of the distribution of data over logical records, with graphics.]

4. DISCUSSION

Since the Hopfield network does not require preliminary generation of training sets and training with a teacher, our approach facilitates the development of actually autonomous and self-organized optimization components of DDB: the whole process of ANN initialization and calculation of parameters can be done in an automatic manner. So, in comparison with training with a teacher, our approach has the following advantages:

1. The increased speed of convergence to the solution.

2. Flexibility of our algorithm: including new constraints requires changes in the suitability function used by the genetic algorithm and does not affect the structure of the neural network itself.

The latter advantage allowed us to start extending the ANN energy function with new components. These new components will consider such criteria as network host capacity. On the basis of the preliminary results we believe that our approach can soon be generalized to include most of the relevant constraints of modern DDB design.

Also we could briefly mention the effect of good robustness of the ANN-based algorithm. The exact optimization methods require a complete rerun whenever the operational conditions of the DDB are changed. In contrast, the ANN is able to obtain quasi-optimal solutions without any overhead initialization costs. Nevertheless, we still suffer from poor performance of our test-bed software implementation at the stage of initialization of the initial parameters. It is relatively slow in comparison with the exact branch-and-bound optimization algorithm. Analysis shows that the genetic framework is the most time-consuming component of our system. We are looking forward to a successful resolution of the performance problem and believe that application of parallel computing will help us to reduce the operating time of the neural network algorithm for large problems: as the individuals of a population in the genetic framework are independent from each other, parallel calculation of their suitability will be an effective solution of the performance problem.

This work was supported by the RFBR grant 03-07-90225.

5. REFERENCES

[1] Z. Mao, C. Douligeris. A Distributed Database Architecture for Global Roaming in Next-Generation Mobile Networks. IEEE/ACM Transactions on Networking, Vol. 12, No. 1, February 2004.
[2] S. R. Madden, M. J. Franklin, J. M. Hellerstein, W. Hong. TinyDB: An Acquisitional Query Processing System for Sensor Networks. ACM Transactions on Database Systems, Vol. 30, No. 1, March 2005.
[3] D. Kossmann. The State of the Art in Distributed Query Processing. ACM Computing Surveys, Vol. 32, No. 4, December 2000.
[4] V. V. Kulba, S. S. Kovalevskiy, S. A. Kosyachenko, V. O. Sirotyuck. Theoretical Bases of Designing Optimum Structures of the Distributed Databases. Series "Information of Russia on a Threshold of the XXI-th Century". Moscow: SINTEG, 1999, 660 p. (in Russian).
[5] C. Peterson, B. Soderberg. Artificial Neural Networks and Combinatorial Optimization Problems. In: Local Search in Combinatorial Optimization, eds. E. H. L. Aarts and J. K. Lenstra. New York: John Wiley & Sons, 1997.
[6] M. A. Arbib (ed.). Handbook of Brain Theory and Neural Networks. 2nd ed. MIT Press, 2003.
[7] G. Feng, C. Douligeris. Using Hopfield Networks to Solve Traveling Salesman Problems Based on Stable State Analysis Technique. IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00), Vol. 6, 2000.

