Information Storage Capacity of Incompletely Connected Associative Memories

Holger Bosch
Département de Mathématiques et d'Informatique, École Normale Supérieure de Lyon, Lyon, France

Franz Kurfess
Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey

Submitted for publication to "Neural Networks", Oct. 1995

Abstract

The memory capacities for auto- and hetero-associative incompletely connected memories are calculated. First, the capacity is computed for fixed parameters of the system. Optimization yields a maximum capacity between 0.53 and 0.69 for hetero-association and half of it for auto-association, improving previous results. The maximum capacity requires sparse input and output patterns and grows with increasing connectivity of the memory. Furthermore, parameters can be chosen in such a way that the information content per pattern asymptotically approaches 1.

1 Introduction

Tasks like voice or face recognition are quite difficult to realize with conventional computer systems, even for the most powerful of them. On the contrary, these tasks are easily performed by the human brain despite its highly inferior computation speed. This indicates that it is rather a conceptual problem of the conventional computer model, and one could ask for alternatives.

An important difference from conventional computers is that some of the brain's functionality is based on an associative memory. The main advantages of this kind of memory are its noise tolerance and its nearly constant retrieval time. Therefore it has been intensively studied in the last decades (Steinbuch 1961, Kohonen 1979, Palm 1980, Willshaw et al. 1993) and some interesting applications and theoretical results have been obtained. Most of the theoretical models are fully connected memories, which can be applied to investigate the actual digital implementations of artificial associative memories.

However, some future realizations like optical or analog associative memories may not possess fully connected layers, due to physical properties of the implementations. Hence incompletely connected models should also be considered to supply a theoretical foundation for these memories.

Another reason to study incompletely connected associative memories is that the corresponding parts of the brain are not at all fully connected. For example, the neurons of some parts of the hippocampus are connected to not more than 5% of the neurons in their neighborhood (Amaral et al. 1990). The results obtained from these investigations could improve the understanding of the brain, even if the considered models are far from neurobiological reality.

A naturally arising question about a memory concerns its storage capacity. For fully connected memories, Palm (1980) calculated it for a simple model and found a maximum capacity of log(2) ≈ 0.69 for the hetero-associative memory and log(2)/2 ≈ 0.34 for the auto-associative memory. Furthermore, he mentioned a capacity of 0.26 for the incompletely connected hetero-associative memory.

In contrast to the fully connected memory, there is no natural retrieval strategy for the incompletely connected case. Buckingham and Willshaw (1993) investigated several possibilities and analyzed their performance. Our calculations are based on one of their retrieval strategies, since the capacity of a memory depends on the retrieval technique used.

We will recalculate the capacity for both the hetero- and the auto-associative memory, and improve on the value of 0.26 given in (Palm 1980). For the maximum capacity C of hetero-association we obtain an increasing function between 0.53 for low connectivity and 0.69 for full connectivity, confirming Palm's result of C = log(2). Auto-association achieves maximum capacities between 0.27 and 0.34.

2 Description of the Model

This section contains a description of the associative memories on which the following calculations are based. The hetero-associative case is described in detail; the auto-associative memory, discussed in the second subsection, differs only slightly from it.

2.1 Hetero-association

The memory considered here is similar to the completely connected model used in (Palm 1980).

The input patterns X^s and the output patterns Y^s are binary vectors of finite dimension, i.e. X^s ∈ {0,1}^M and Y^s ∈ {0,1}^N. A total number of R associations between input and output patterns should be stored in the memory, i.e. s = 1..R.

The neural network used for the memory task consists of one input layer of M neurons and one output layer of N neurons, binary weights w_ij ∈ {0,1} for the existing synapses between the two layers, and threshold functions Θ_i for the output units, which are set individually during the retrieval phase. The two layers are incompletely connected, and therefore the net can be represented as a partially filled matrix W ∈ {0,1,b}^{N×M} in which the blank b stands for a missing matrix element.

The memory is trained with a one-step Hebbian learning rule, and the values of the existing weights w_ij are calculated by

    w_{ij} = \bigvee_{s=1}^{R} Y_i^s X_j^s

To retrieve the response for an input pattern X, first the vector Y' = WX is calculated, where the sums in the matrix product run only over the existing weights w_ij.

To obtain the final response Y, the threshold functions are determined and applied to Y':

    Y_i = \Theta_i\!\left(\sum_{j=1}^{M} w_{ij} X_j\right)

The calculation of the memory capacity requires some further specifications of the model. The number K of "ones" in the output pattern Y^s is fixed for all patterns in order to simplify the correction of the obtained response. To obtain a consistent model, the number L of "ones" in the input pattern X^s is also fixed. This restriction can be replaced by a probability for the occurrence of a "one" in an input unit, which gives roughly the same results.

Furthermore, the net contains a total of ZMN uniformly distributed connections between the input and the output layer, where Z, 0 < Z ≤ 1, is the connectivity of the network.

The threshold functions are set using the activities of the input pattern X (see Buckingham and Willshaw 1993). The individual activity A_i for an output unit Y_i is defined as

    A_i = \sum_{j} X_j

where the sum runs over the existing synapses w_ij between input neuron j and output neuron i. Note that the activities A_i can range from 0 to L, depending on the distribution of the existing connections. The threshold for the output unit Y_i is set to Θ_i = A_i, which means that an output neuron fires if its dendritic sum equals its input activity A_i.

Hence, if a learned input pattern X^s is applied to the net, the answer contains all K "ones" of its associated output pattern Y^s but may contain O additional wrong "ones". In this respect the net has the same behavior as Palm's model.

The described retrieval strategy can be efficiently implemented by switching to the complement matrix W̄ of W, i.e. w̄_ij = 1 − w_ij. An output unit returns "one" if its dendritic sum equals its input activity, i.e. if all existing connections between the high input units and the output unit are high. In terms of the complement matrix this means that the corresponding complementary connections are low, i.e. that the complementary dendritic sum is zero:

    Y_i = 1 - \bigvee_{j} \bar{w}_{ij} X_j

Therefore no additional calculations are required to retrieve the output vector Y.
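For concreteness, the following minimal sketch (Python with NumPy; all identifiers, sizes and the random connection mask are our own illustrative choices, not taken from the paper) implements the storage and retrieval rules just described:

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 100, 100   # input and output dimensions
Z = 0.5           # connectivity
L, K = 4, 4       # "ones" per input / output pattern
R = 50            # number of stored associations

# Connection mask: entry 1 where a synapse exists (fraction Z, uniformly placed).
mask = (rng.random((N, M)) < Z).astype(np.uint8)

def sparse_patterns(count, dim, ones):
    """Binary patterns with a fixed number of 'ones'."""
    P = np.zeros((count, dim), dtype=np.uint8)
    for p in P:
        p[rng.choice(dim, size=ones, replace=False)] = 1
    return P

X = sparse_patterns(R, M, L)
Y = sparse_patterns(R, N, K)

# One-step Hebbian rule: w_ij = OR over s of Y_i^s X_j^s, kept only on existing synapses.
W = np.zeros((N, M), dtype=np.uint8)
for s in range(R):
    W |= np.outer(Y[s], X[s])
W &= mask

def retrieve(x):
    """Threshold each output unit at its own input activity A_i and fire on equality."""
    x = x.astype(np.int64)
    A = mask @ x          # A_i: high inputs seen through existing synapses
    dendritic = W @ x     # dendritic sum over existing high-weight synapses
    return (dendritic == A).astype(np.uint8)

y = retrieve(X[0])
print("genuine ones kept:", int((y & Y[0]).sum()), "of", K)  # always all K
print("spurious ones O:", int((y & (1 - Y[0])).sum()))
```

As stated above, a stored input always recovers its K genuine "ones"; the spurious count O is what the capacity analysis below controls.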

2.2 Auto-association

The same model as for hetero-association is used, except for the following natural modifications: the dimensions of the input and output vectors are the same, i.e. N = M, and both vectors contain the same number of "ones", i.e. K = L. Note that the resulting weight matrix is symmetric, and therefore only roughly half of it is needed, which explains the lower capacity of auto-association.

Having specified the associative memories as targets of our investigations, we can now proceed to calculate the memory capacities.

3 Capacity of the Hetero-associative Memory

First, the capacity is calculated for a given memory. In a next step, this capacity is optimized with respect to the connectivity Z. The obtained values are then numerically verified.

3.1 Theoretical Results for Fixed Parameters

The following calculations are partly inspired by the calculations for the completely connected model in (Palm 1980). First, the expectation value of the information stored in the memory is determined. The definition of information is based on the corresponding expression of Shannon's information theory.

The total information I is the sum of the individual informations I_s stored per output pattern:

    I = \sum_{s=1}^{R} I_s

I_s is the difference between the information contained in the output pattern Y^s and the information needed to determine this pattern given the memory's response to X^s:

    I_s = \mathrm{ld}\binom{N}{K} - \mathrm{ld}\binom{O+K}{K}

where ld denotes the binary logarithm.

Therefore we obtain for the expectation value of the total information

    E(I) = R\,E\!\left[\mathrm{ld}\binom{N}{K} - \mathrm{ld}\binom{O+K}{K}\right]
         = R\,E\!\left[\sum_{i=0}^{K-1} \mathrm{ld}\,\frac{N-i}{K+O-i}\right]
         = R \sum_{i=0}^{K-1} E\!\left[\mathrm{ld}\,\frac{N-i}{K+O-i}\right]

Using the general inequality E(ld A) ≤ ld E(A), which follows from the arithmetic-geometric mean inequality, a lower bound can be derived:

    E(I) \ge R \sum_{i=0}^{K-1} \mathrm{ld}\,\frac{N-i}{K+E(O)-i} \qquad (1)

The expected number of wrong "ones" E(O) in the response needs to be determined next. E(O) depends highly on the distribution of "ones" in the connection matrix. Unfortunately, the occurrences of "ones" in the same row or column are not independent. But for sparse input patterns (i.e. input vectors containing only a few "ones") and a large number of stored patterns, the dependence is very weak. Since the maximum capacity is reached for this choice of parameters, independence of the occurrences is assumed for this calculation. For a more detailed discussion see (Palm 1980).

The expected number E(O) of wrong "ones" is given by

    E(O) = (N-K)\,\Pr\big((WX)_i = A_i \mid Y_i = 0\big)

(WX)_i = A_i holds if and only if all existing connections between the output unit and the high input units have high weights. Since independence of the weights is assumed, we obtain:

    E(O) \approx (N-K)\big(1 - Z\,\Pr(w_{ij}=0)\big)^{L}
         = (N-K)\left(1 - Z\left(1 - \frac{LK}{MN}\right)^{R-1}\right)^{L} \qquad (2)
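To make formulas (1) and (2) concrete, the following short evaluation (Python; the parameter values are arbitrary illustrative choices of ours) computes E(O) and the resulting lower bound on E(I):

```python
import math

def expected_wrong_ones(N, M, K, L, R, Z):
    """Formula (2): expected number of spurious 'ones' in the response."""
    p_zero = (1 - L * K / (M * N)) ** (R - 1)   # Pr(w_ij = 0)
    return (N - K) * (1 - Z * p_zero) ** L

def info_lower_bound(N, M, K, L, R, Z):
    """Formula (1): lower bound on the expected stored information, in bits."""
    EO = expected_wrong_ones(N, M, K, L, R, Z)
    return R * sum(math.log2((N - i) / (K + EO - i)) for i in range(K))

N = M = 1000
K = L = 10
Z, R = 0.5, 5000
print(f"E(O) = {expected_wrong_ones(N, M, K, L, R, Z):.1f} spurious ones")
print(f"E(I) >= {info_lower_bound(N, M, K, L, R, Z):.0f} bits")
```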

which completes the calculation of the stored information. The expectation value for the memory capacity E(C) is then given by

    E(C) = \frac{E(I)}{ZMN}

since the memory contains a total of ZMN storage units.

3.2 Maximum Capacity

In order to maximize the capacity, some approximations are required. Since (A-i)/(B-i) ≥ A/B for A ≥ B, and N ≥ K + E(O), we obtain from (1)

    E(I) \ge RK\,\mathrm{ld}\,\frac{N}{K+E(O)}

For sparse output patterns, i.e. i ≤ K << N, this is a tight approximation. Substituting E(O) by (2) yields:

    E(I) \ge RK\,\mathrm{ld}\,\frac{N}{K + (N-K)\left(1 - Z\left(1-\frac{LK}{MN}\right)^{R-1}\right)^{L}}
         \approx RK\,\mathrm{ld}\,\frac{N}{K + (N-K)\left(1 - Z e^{-R\frac{LK}{MN}}\right)^{L}}
         \approx RK\,\mathrm{ld}\,\frac{N}{N\left(1 - Z e^{-R\frac{LK}{MN}}\right)^{L}} \qquad (3)
         = RKL\,\mathrm{ld}\,\frac{1}{1 - Z e^{-R\frac{LK}{MN}}}

The approximation in (3) overestimates the memory capacity, but the real value will be close to the estimate if K << E(O). This requires sparse input patterns and a certain loss of information, i.e. there are many more wrong than genuine "ones" in the answer. But if furthermore E(O) << N, even a high percentage of the information can be stored (see Section 5).

Now, writing R = p(Z)MN/(LK) for the number of stored patterns, the expectation value of the total capacity is

    E(C) \approx \frac{p(Z)}{Z}\,\mathrm{ld}\,\frac{1}{1 - Z e^{-p(Z)}} \qquad (4)
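The numerical optimization behind the next two figures can be reproduced with a simple grid search over p for each connectivity Z. The sketch below (Python; our own code, assuming the reconstruction of formula (4) above) recovers p(Z) between roughly 0.7 and 1.0 and maximum capacities between roughly 0.53 and 0.69:

```python
import math

def capacity(p, Z):
    """Formula (4): E(C) = (p/Z) * ld(1 / (1 - Z * exp(-p)))."""
    return (p / Z) * math.log2(1.0 / (1.0 - Z * math.exp(-p)))

for Z in (0.1, 0.25, 0.5, 0.75, 1.0):
    # Grid search over p; the optimum lies between ln 2 = 0.69 (Z = 1) and 1.0 (Z -> 0).
    best_p = max((p / 1000 for p in range(500, 1201)),
                 key=lambda p: capacity(p, Z))
    print(f"Z = {Z:4.2f}: p(Z) = {best_p:.3f}, max E(C) = {capacity(best_p, Z):.3f}")
```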

Figure 1: Optimal values for p(Z), where R = p(Z)MN/(LK) is the number of stored patterns.

Numerical optimization of E(C) in (4) yields values for p(Z) between 0.7 and 1.0 (Figure 1). With these values we obtain a theoretical maximum capacity between 0.53 and 0.69 which depends only on the connectivity Z (Figure 2). This is much higher than the previously estimated 0.26 (Palm 1980) and brings the capacities of incompletely connected networks close to those of completely connected memories.

3.3 Numerical Verification

To verify the obtained result, the capacity has been optimized for M = N = 100 (Figure 3) and M = N = 1000 (Figure 4), testing a wide range of parameters. Both figures contain the theoretical capacity from Figure 2, the effective capacity from the optimization, and the capacity for K = L = 2 and R = p(Z)MN/(LK), which is called the capacity for estimated parameters.

For M = N = 100, the theoretical and the real capacities are indistinguishable, whereas the capacity for estimated parameters is not yet optimal. For M = N = 1000, all three capacities are indistinguishable, confirming the theoretical result.

For the verification, formulas (1) and (2) were used even though they don't give the exact capacity, since the exact formula requires very time-intensive calculations.
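A rough version of this scan can be redone directly from the bound formulas. The sketch below (Python; the parameter grid is our own illustrative choice, not the authors' experiment) searches over K, L and R for M = N = 100 at several connectivities, using E(C) = E(I)/(ZMN):

```python
import math

def EO(N, M, K, L, R, Z):
    """Formula (2): expected number of spurious 'ones'."""
    return (N - K) * (1 - Z * (1 - L * K / (M * N)) ** (R - 1)) ** L

def EC(N, M, K, L, R, Z):
    """Capacity from bound (1) and E(C) = E(I) / (Z*M*N)."""
    o = EO(N, M, K, L, R, Z)
    EI = R * sum(math.log2((N - i) / (K + o - i)) for i in range(K))
    return EI / (Z * M * N)

N = M = 100
for Z in (0.2, 0.5, 1.0):
    best = max(((EC(N, M, K, L, R, Z), K, L, R)
                for K in (1, 2, 3)
                for L in (2, 4, 8)
                for R in range(100, 6001, 100)),
               key=lambda t: t[0])
    print(f"Z = {Z}: E(C) = {best[0]:.3f} at K = {best[1]}, L = {best[2]}, R = {best[3]}")
```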

Figure 2: Maximum capacity of the incompletely connected memory. The capacity C increases with the connectivity Z.

Figure 3: For 100 neurons the theoretical and the real capacity are indistinguishable (upper curve), whereas the capacity for estimated parameters is not yet optimal (lower curve).

Figure 4: For 1000 neurons the theoretical capacity, the real capacity and the capacity for estimated parameters are indistinguishable.

The obtained results were confirmed by the exact formula for the optimal parameters, which showed in these cases only insignificantly different capacities.

In this chapter the capacity for hetero-associative memories has been calculated and then optimized for fixed connectivity. These theoretical values have been confirmed by numerical evaluations of the exact formula.

4 Capacity of the Auto-associative Memory

The corresponding calculations are now performed for the auto-associative memory. The capacity is optimized in the same manner as for hetero-association and the derivation is therefore only briefly outlined. Since the result is quite similar to hetero-association, no numerical verifications were performed.

The amount of information stored in the auto-associative memory is defined in the same way as for the hetero-associative memory. Clearly, the correction information per pattern must be defined differently, since the input pattern equals the output pattern. To retrieve a pattern, the high units are determined successively, one at each step, starting with an "empty" vector. A step consists of the application of the current pattern to the memory and the determination of a further genuine "one" among the obtained high output units. The sum of the information of these determinations equals the correction information.
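The successive retrieval just described can be sketched as follows (Python with NumPy; sizes, names and the oracle-style correction step are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, Z, K, R = 200, 0.5, 5, 30   # units, connectivity, 'ones' per pattern, patterns

mask = (rng.random((N, N)) < Z).astype(np.uint8)   # existing synapses
P = np.zeros((R, N), dtype=np.uint8)
for p in P:
    p[rng.choice(N, size=K, replace=False)] = 1

# Auto-associative storage: w_ij = OR over s of P_i^s P_j^s, on existing synapses only.
W = np.zeros((N, N), dtype=np.uint8)
for s in range(R):
    W |= np.outer(P[s], P[s])
W &= mask

def retrieve(target):
    """Determine the K high units one per step, starting from the empty vector."""
    x = np.zeros(N, dtype=np.int64)
    for _ in range(K):
        # All units whose dendritic sum equals their activity fire ...
        fires = (W @ x) == (mask.astype(np.int64) @ x)
        # ... and the correction step picks one further genuine "one" among them.
        genuine = np.flatnonzero(fires & (target == 1) & (x == 0))
        x[genuine[0]] = 1
    return x.astype(np.uint8)

print(np.array_equal(retrieve(P[0]), P[0]))  # True: the stored pattern is recovered
```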

The information stored in the memory is hence given by (see Palm 1980 for the derivation of the formula)

    E(I) \ge R \sum_{i=0}^{K-1} \mathrm{ld}\,\frac{N-i}{K+E(O_i)-i}

where E(O_i) is the expected number of spurious "ones" at the i-th step. Note that the applied input pattern at the i-th step contains exactly i "ones". Similarly to the hetero-associative case,

    E(O_i) \approx (N-K)\big(1 - Z\,\Pr(w_{ij}=0)\big)^{i}
           = (N-K)\left(1 - Z\left(1 - \frac{K^2}{N^2}\right)^{R-1}\right)^{i}

and comparable approximations yield

    E(I) \gtrsim R \sum_{i=0}^{K-1} \mathrm{ld}\left(\frac{1}{1 - Z e^{-R\frac{K^2}{N^2}}}\right)^{\!i}
         = \frac{1}{2} RK(K-1)\,\mathrm{ld}\,\frac{1}{1 - Z e^{-R\frac{K^2}{N^2}}}

Finally, with R = p(Z) N^2/K^2 we obtain

    E(C) \approx \frac{1}{2}\,\frac{p(Z)}{Z}\,\mathrm{ld}\,\frac{1}{1 - Z e^{-p(Z)}}

Considering that only half of the symmetric matrix is necessary, the result is the same as in the case of hetero-association, and the same optimization yields values between 0.26 and 0.34 for the maximum capacity.
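As a quick numeric check (Python; assuming the reconstructed capacity formula above), halving the hetero-associative expression and optimizing over p reproduces the stated range of roughly 0.26 to 0.34:

```python
import math

def auto_capacity(p, Z):
    """Auto-associative capacity: half of the hetero-associative formula (4)."""
    return 0.5 * (p / Z) * math.log2(1.0 / (1.0 - Z * math.exp(-p)))

for Z in (0.1, 0.5, 1.0):
    best_p = max((p / 1000 for p in range(500, 1201)),
                 key=lambda p: auto_capacity(p, Z))
    print(f"Z = {Z}: max E(C) = {auto_capacity(best_p, Z):.3f}")
```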

5 Information Content per Pattern

Besides the maximum capacity of an associative memory, the information stored per pattern is also of interest. The corresponding value will be calculated and then optimized. Again, numerical verifications confirm the theoretical results.
Figure 5: c(Z) log N "ones" in the input pattern are required to store almost the entire information of the pattern.

The information contained in a pattern is

    I_p = \mathrm{ld}\binom{N}{K} \approx K\,\mathrm{ld}\,N

in the case of sparse output patterns. The condition R K ld N = E(I) guarantees that roughly all information of the patterns is stored, since R K ld N is the total information of the patterns. For the optimal information E(I) = RKL ld(1/(1 − Z e^{−p(Z)})), this yields the following condition for L:

    L \approx \frac{\log N}{\log\frac{1}{1 - Z e^{-p(Z)}}} = c(Z)\log N \qquad (5)

where c(Z) is the function shown in Figure 5. Low connectivity requires a larger number of input "ones" than high connectivity, since a certain level of activity must be present to obtain a high percentage of information storage. Based on the above value of L, the average activity A = ZL is a decreasing function of the connectivity, resulting in values between 2.7 log N and 1.4 log N.

Recall from Section 3.2 the necessary condition K << E(O) for obtaining a high capacity. It can be rewritten as

    K \ll (N-K)\left(1 - Z e^{-p(Z)}\right)^{L} \approx N\left(1 - Z e^{-p(Z)}\right)^{L}

Applying the logarithm yields

    \log K < \log N + L\,\log\!\left(1 - Z e^{-p(Z)}\right)

which gives another condition for L:

    L < \frac{\log N - \log K}{\log\frac{1}{1 - Z e^{-p(Z)}}} = c(Z)(\log N - \log K) \qquad (6)

Thus, comparing conditions (5) and (6), high capacity and a high percentage of stored information are achieved if log K << log N and L = c(Z) log N << N. In particular, for small connectivity this is reached only for very large N. It is striking that, neglecting log K in (6), the same function, i.e. c(Z) log N, is derived by two totally different approaches.

Figure 6 shows the maximum capacity and the corresponding fraction of stored information for a very large memory. The number of "ones" is chosen between c(Z) log N and c(Z)(log N − log K), which results in high capacities and high information contents. For L = c(Z)(log N − log K), maximum capacity is attained, whereas L = c(Z) log N leads to approximately complete information storage.

The dependency of the capacity on the number of "ones" in the input pattern is illustrated in Figure 7 for connectivity Z = 0.5. The number of patterns has been optimized to obtain maximum capacity. The increasing function represents the information content per pattern.

Figure 8 shows the capacity and the information content for an increasing number of stored patterns R and connectivity Z = 0.5. The capacity first increases with the number of patterns before it begins to decrease when too many "ones" are stored in the matrix. The information content is a steadily decreasing function, as expected.

The previous results show that a high information content per pattern is possible and can be realized together with high capacity for appropriate choices of parameters.
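As a worked example of conditions (5) and (6), the sketch below (Python; the helper names, the natural-logarithm convention and the parameter values are our assumptions) evaluates c(Z) and the two bounds on L for a concrete memory size:

```python
import math

def optimal_p(Z):
    """The p that maximizes formula (4) for connectivity Z (grid search)."""
    return max((p / 1000 for p in range(500, 1201)),
               key=lambda p: (p / Z) * math.log2(1.0 / (1.0 - Z * math.exp(-p))))

def c(Z):
    """c(Z) = 1 / log(1 / (1 - Z * exp(-p(Z)))), natural logs assumed."""
    return 1.0 / math.log(1.0 / (1.0 - Z * math.exp(-optimal_p(Z))))

N, K = 10**6, 20
for Z in (0.1, 0.5, 1.0):
    L_info = c(Z) * math.log(N)                  # condition (5): near-complete information
    L_cap = c(Z) * (math.log(N) - math.log(K))   # condition (6): high capacity
    print(f"Z = {Z}: c(Z) = {c(Z):.2f}, "
          f"L = {L_info:.0f} for full info, L < {L_cap:.0f} for high capacity")
```

Low connectivity indeed demands many more input "ones", as stated above.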

Figure 6: Maximum capacity (upper curve) and information content for N = M = 10^20, K = log N, L = c(Z)(log N − ½ log K) and R = p(Z)MN/(LK).

Figure 7: Maximum capacity (decreasing curve) and information content (increasing curve) as functions of L. The fixed parameters are N = M = 1000, K = 10 and Z = 0.5, whereas the number of stored patterns is optimized. The capacity decreases with L, whereas the information content increases with the number of "ones".

Figure 8: Capacity and information content (decreasing function) for increasing numbers of patterns R. The fixed parameters are N = M = 1000, Z = 0.5, K = 10 ≈ log N and L = 30 ≈ c(0.5) log N.

6 Discussion

The memory capacities of incompletely connected associative memories have been investigated. The maximum capacities range from 0.53 to 0.69, depending on the connectivity of the network, and can be obtained for sparse input and output patterns and convenient numbers of stored associations. Additionally, more restricted choices of parameters guarantee a high information storage per pattern which asymptotically approaches 1.

The memory loses its stored information if too many weights in the matrix are set to "one". Sparse input patterns therefore achieve better storage capacities, since the information content does not increase with the number of "ones" in the input patterns, whereas the number of "ones" in the matrix does. On the other hand, a large number of "ones" in the input pattern leads to a high information content.

Memories with low connectivity require more "ones" in the input patterns in order to achieve a high information content, since a minimum activity is needed.

The advantage of our model is that only spurious "ones" (and no spurious "zeros") occur in the answer, and that the number of output "ones" K is fixed.

Therefore no information is required to determine the number of wrong "ones" O. These specifications result in a high capacity, but they restrict the biological validity of the model.

Furthermore, our retrieval strategy implies that an output unit is set to "high" if it has no connection to any of the high input units, i.e. if its activity is zero. This does not seem biologically plausible and shows a limit of our model.

Further work should investigate the memory capacity for other retrieval strategies, modified learning rules, or different architectures of incompletely connected associative memories. The present work can be seen as a basis for such investigations.


7 List of Symbols Used

A_i    individual activity
C      capacity of the memory
c(Z)   constant for the optimal number of input "ones"
I      stored information
I_s    stored information per pattern
K      number of "ones" in the output pattern
L      number of "ones" in the input pattern
M      length of the input pattern
N      length of the output pattern
O      number of wrong "ones"
p(Z)   constant for the optimal number of patterns
R      number of stored patterns
W      weight matrix
X      input pattern
Y      output pattern
Z      connectivity of the network
Θ_i    threshold function


References

[1] Amaral, D. G., Ishizuka, N., & Claiborne, B. (1990). Neurons, numbers and the hippocampal network. Progress in Brain Research, 83, 1-11.

[2] Buckingham, J., & Willshaw, D. (1993). On setting unit thresholds in an incompletely connected associative net. Network, 4, 441-459.

[3] Hogan, J., & Diederich, J. (1995). Random neural networks of biologically plausible connectivity. Technical Report, Queensland University of Technology, Australia.

[4] Kohonen, T. (1979). Content-Addressable Memories. New York: Springer Verlag.

[5] Palm, G. (1980). On associative memory. Biological Cybernetics, 36, 19-31.

[6] Palm, G., Kurfess, F., Schwenker, F., & Strey, A. (1993). Neural associative memories. Technical Report, Universität Ulm, Germany.

[7] Steinbuch, K. (1961). Die Lernmatrix. Kybernetik, 1, 36.

[8] Vogel, D., & Boos, W. (1993). Minimally connective, auto-associative, neural networks. Connection Science, 6(4), 461-469.


