
MATLAB Project I
Single Layer Perceptrons

Michael J. Knapp

CAP6615, Neural Networks for Computing
Department of Computer and Information Science and Engineering
University of Florida

Instructor: Dr. G. Ritter
Department of Computer and Information Science and Engineering
University of Florida, Gainesville, FL 32611

Date: 4 October 2006


I. Introduction

The overriding theme of this project is to train and test single layer perceptrons (SLPs) using Rosenblatt’s training algorithm. Because the project is very intensive in vector and matrix mathematics, MATLAB was chosen, since it was originally designed as a matrix laboratory. MATLAB also includes powerful built-in graphing features that make it ideal for visualizing data. This removes any cumbersome code from the project and allows the focus to be placed on the SLP itself.

Since Rosenblatt’s algorithm can be generalized from a single-output, single layer perceptron to a multi-output, single layer perceptron, I wanted to write a core SLP function that accepts the input data X and the desired output D and produces the matrix of weights based on the dimensions of the parameters. Since each part requires its own program, using this centralized function lets each of the three parts focus on the process it is investigating rather than maintaining three disparate incarnations of the SLP algorithm.
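As a rough sketch (not the exact scripts used for each part), the calling convention for this shared routine, whose full listing appears in Appendix I, looks like the following; the variable names here are illustrative only:

    % Illustrative call into the shared SLP training routine (Appendix I).
    % X is an [n,k] matrix with one training pattern per column, D_out is
    % the [m,k] matrix of desired outputs, and alpha is the learning rate.
    X = [vecA, vecB, vecC, vecD];   % e.g. the four character vectors of Part A
    D_out = [1, 0, 0, 0];           % single output: 'A' versus not 'A'
    W = slp(X, D_out, 0.75);        % returns the learned weight matrix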

I also wanted to explore how varying α affected the number of epochs required by the algorithm. Therefore, before collecting data for the results section, I stepped α from 1.0 down to 0.01 to gauge where the run time decreased and where it began increasing again. I then chose α values that illustrate this behavior.

For the run-time metric, I specifically chose CPU time over system time. This was done since in today’s multi-tasking computing environments there is no guarantee on how much CPU time an application will receive from the kernel. As such, MATLAB’s cputime feature was used to keep track of how much CPU time the MATLAB process received during an interval and not just the elapsed system time.
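The timing pattern is simply a difference of two cputime samples; a minimal sketch (variable names illustrative):

    % CPU-time interval measurement (cputime is a built-in MATLAB function).
    t0 = cputime;                 % CPU seconds consumed by MATLAB so far
    W = slp(X, D_out, alpha);     % the work being timed
    elapsed = cputime - t0;       % CPU time used during training, not wall-clock time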

For the error during a training session, I looked at two interesting measures. The first is the mean squared error of ΔW for each epoch, which indicates how much, on average, the weights were adjusted during that epoch; this is a good measure of how far we are from convergence, since it reflects the level of activity in that epoch. The second is the average of the per-epoch M.S.E. over all epochs, which is a strong indicator of how fast we converged: if many epochs pass with a small M.S.E., the average M.S.E. shrinks.
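In terms of the quantities tracked in the Appendix I listing, the two measures amount to the following (mse is the toolbox mean-squared-error function used there):

    thismse  = mse(dW);             % M.S.E. of the weight update dW for this epoch
    totalmse = totalmse + thismse;  % accumulated over the whole run
    avgmse   = totalmse / t;        % average M.S.E. over all t epochs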

Depending on the behavior of each trial, different information was deemed important, and only that information is presented in this report. However, detailed information for each run of each trial at each epoch is available, and a soft copy can be provided to the reader on request. The complete code listing of the program for each part can be provided as well.

II. Part A.

a) Statement of the Problem

This part is to design a single layer, single-output perceptron and train it with Rosenblatt’s algorithm so that it classifies the binary image representation of the character ‘A’ into one class and the characters ‘B’, ‘C’ and ‘D’ into the class not-‘A’. After training is completed, we must verify that the learned weights classify the data correctly by checking each of the characters.


b) Approach & Algorithm

To approach this problem, I decided to break it into two parts. The first part is the general logical flow of the program, as follows (a minimal sketch of this driver appears after the list):

a. Translate the uni-polar matrices of ‘A’, ‘B’, ‘C’ and ‘D’ into vectors by a row-major translation.

b. Set the inputs to be A, B, C and D and the desired outputs to 1, 0, 0, 0 respectively.

c. Take the current CPU time and start the SLP function.

d. Output the time after the function returns.

e. Verify the weights classify the data by setting the calculated output Y equal to one if the dot product of the weight vector and the input vector is greater than or equal to zero, zero otherwise.
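A minimal sketch of that driver, assuming the characters are stored as 18 x 18 uni-polar matrices named charA through charD (hypothetical names; the real scripts may differ):

    % Part A driver sketch (charA..charD and vecA..vecD are illustrative names).
    vecA = reshape(charA', [], 1);   % row-major translation into a column vector
    vecB = reshape(charB', [], 1);
    vecC = reshape(charC', [], 1);
    vecD = reshape(charD', [], 1);
    X = [vecA, vecB, vecC, vecD];    % one training pattern per column
    D_out = [1, 0, 0, 0];            % desired output: 'A' -> 1, others -> 0
    t0 = cputime;                    % step c: record CPU time and train
    W = slp(X, D_out, 0.75);
    fprintf('Training took %f s of CPU time\n', cputime - t0);   % step d
    % Step e: verify with the hard limiter; slp prepends x0 = 1, so do the same here.
    Y = (W' * [ones(1, 4); X] >= 0); % 1 iff w . x >= 0, zero otherwise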

The second part is the generic SLP training algorithm function that was described in the introduction and is covered in detail in Appendix I.

c) Results

For this portion, the important information chosen was the total number of epochs the algorithm ran, the CPU time it took to run, and which characters were successfully classified by the network after training. Since all of these trials run for so few epochs, and convergence of the algorithm guarantees a final error of zero, error was not deemed important for this part. Detailed information about all data from each epoch is available on request.

For α = 0.85

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1        | 9      | 0.046875         | 0.080772 | ‘A’ ‘B’ ‘C’ ‘D’
2        | 9      | 0.046875         | 0.080772 | ‘A’ ‘B’ ‘C’ ‘D’
3        | 6      | 0.046875         | 0.072991 | ‘A’ ‘B’ ‘C’ ‘D’
4        | 10     | 0.015625         | 0.116489 | ‘A’ ‘B’ ‘C’ ‘D’
5        | 12     | 0.093750         | 0.092258 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average  | 9.2    | 0.0500           | 0.0887   | ‘A’ ‘B’ ‘C’ ‘D’

For α = 0.75

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1        | 6      | 0.031250         | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
2        | 6      | 0.015625         | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
3        | 6      | 0.046875         | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
4        | 6      | 0.046875         | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
5        | 6      | 0.046875         | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average  | 6      | 0.0375           | 0.0568   | ‘A’ ‘B’ ‘C’ ‘D’


For α = 0.5

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1        | 7      | 0.062500         | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
2        | 8      | 0.046875         | 0.035385 | ‘A’ ‘B’ ‘C’ ‘D’
3        | 7      | 0.031250         | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
4        | 7      | 0.046875         | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
5        | 9      | 0.062500         | 0.040342 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average  | 7.6    | 0.0500           | 0.0377   | ‘A’ ‘B’ ‘C’ ‘D’

For α = 0.2

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1        | 8      | 0.046875         | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
2        | 10     | 0.062500         | 0.008738 | ‘A’ ‘B’ ‘C’ ‘D’
3        | 8      | 0.046875         | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
4        | 10     | 0.046875         | 0.008738 | ‘A’ ‘B’ ‘C’ ‘D’
5        | 8      | 0.062500         | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average  | 8.8    | 0.0531           | 0.0082   | ‘A’ ‘B’ ‘C’ ‘D’

For α = 0.1

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1        | 14     | 0.046875         | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
2        | 12     | 0.062500         | 0.002631 | ‘A’ ‘B’ ‘C’ ‘D’
3        | 14     | 0.093750         | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
4        | 14     | 0.078125         | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
5        | 12     | 0.062500         | 0.002631 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average  | 13.2   | 0.0688           | 0.0027   | ‘A’ ‘B’ ‘C’ ‘D’

d) Conclusion

The first point to note is how short all of the trials were. In the best case, where α = 0.75, only 1.5 passes through the input data were required, and in the worst case, where α = 0.1, barely over 3 passes were needed. This corresponds to roughly a factor-of-two increase in epochs run. As one would expect, the elapsed time increased by a factor of just over 1.8 as well, in line with epochs and elapsed time being linearly related.

One thing that is easily derived from the algorithm, but worth mentioning, is that the mean squared error of ΔW is always zero for a training point the last time the algorithm passes through it. This is intuitive, since the algorithm has converged once a full pass can be made through the training points with no change in weights and hence no error. Here this meant that upwards of half the epochs in this part had no error, since the trials were so short in duration.

Here, the error proved to be of particular interest. For each α, every case where the number of epochs to converge was the same also produced the same average mean squared error. This is counter-intuitive, since the random initialization of the weights is supposed to add non-determinism to the algorithm, yet this seems to be a sign of some determinism. I hypothesize that, with the total number of epochs being so few, initial weights falling within a certain range fix the number of epochs at one of the observed totals and force the error into the observed range; if a higher precision were used, the differences would likely become visible.

Finally, the idea introduced in class that choosing α is somewhat of an art is reinforced here. The optimal α was found to be ~0.75, with a sharper increase in run time when varying α larger and a more gradual increase when varying it smaller.

III. Part B.

a) Statement of the Problem

This part asks us to generate a relatively large set of points that are separated by, and thus do not fall on, a line. Since a line is linear, the two classes are linearly separable. Hence, we are designing a three-input, one-output single layer perceptron, where one of these inputs is, as usual, x0 = 1. Once the network is trained, we are to generate a series of test points and make sure they are appropriately classified. If not, we add them to our training set and repeat.

b) Approach & Algorithm

My approach was to first generate the 1000 points by drawing the x values ~uniform [0,1] and mapping them to [b1,b2). Once the x values were generated, I then generated the y values uniformly about the separating plane. This yields a healthy random population enclosed in a rectangle about the separating plane. I then test the pairs against the equation of the separating plane to classify them as one if they are above the plane and zero if they are below it, and finally train the network.
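A minimal sketch of this generation step, assuming an illustrative true separating line y = m*x + c and x-bounds b1, b2 (the actual constants used in the project may differ):

    % Training-data generation sketch for Part B (constants are illustrative).
    m = 0.5;  c = 2.0;                        % assumed true separating line y = m*x + c
    b1 = 0;   b2 = 10;                        % assumed x-range
    x = b1 + (b2 - b1) * rand(1, 1000);       % x ~ uniform[0,1] mapped onto [b1, b2)
    y = (m*x + c) + 2*(rand(1, 1000) - 0.5);  % y uniform about the separating line
    D_out = (y > m*x + c);                    % 1 above the line, 0 below it
    X = [x; y];                               % two real inputs; slp prepends x0 = 1
    W = slp(X, D_out, 0.20);                  % train the three-input, one-output SLP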

Once the network is trained, I generate a set of test points with x ~uniform [b1,b2) and corresponding y ~uniform [separating plane - 0.1, separating plane + 0.1], since the most difficult points to classify are those near the plane. If at least one point fails classification, I add the entire test set to the training data and repeat the entire training process; if all classify correctly, I display the results.
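A sketch of that test-and-retrain loop, continuing the illustrative names above (five test points per round is an assumption):

    % Test-and-retrain loop sketch (continues the generation sketch above).
    all_ok = false;
    while ~all_ok
        xt = b1 + (b2 - b1) * rand(1, 5);           % test x values on [b1, b2)
        yt = (m*xt + c) + 0.2*(rand(1, 5) - 0.5);   % y within +/- 0.1 of the true line
        Dt = (yt > m*xt + c);                       % true classes of the test points
        Yt = (W' * [ones(1, 5); xt; yt] >= 0);      % SLP outputs with the hard limiter
        if all(Yt == Dt)
            all_ok = true;                          % every test point classified
        else
            X = [X, [xt; yt]];                      % add the whole test set ...
            D_out = [D_out, Dt];                    % ... and its desired outputs
            W = slp(X, D_out, 0.20);                % retrain with the enlarged set
        end
    end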

To display the results, I employed MATLAB’s plot feature. I first plot all of the original training points. I then plot all of the test points in a distinct style, regardless of whether they failed classification and became training points. I then plot the true separating plane and finally plot the learned separating plane according to the definition w0 + w1*x + w2*y = 0.
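The learned line is obtained by solving w0 + w1*x + w2*y = 0 for y; a minimal plotting sketch, with W ordered [w0; w1; w2] as returned by slp for the inputs above:

    % Plotting sketch: training points, test points, true line, learned line.
    figure; hold on;
    plot(X(1,:), X(2,:), 'b.');                 % all training points (incl. added ones)
    plot(xt, yt, 'go');                         % the final round of test points
    xs = linspace(b1, b2, 100);
    plot(xs, m*xs + c, 'k-');                   % true separating line
    plot(xs, -(W(1) + W(2)*xs) / W(3), 'r--');  % learned line from w0 + w1*x + w2*y = 0
    legend('training', 'test', 'true line', 'learned line');
    hold off;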

For this run, I felt that the mean squared error was not as important an error measure as the number of training points. Since 1000 points are used per the project specification, any point over 1000 is a test point that was misclassified and forced the network to be retrained with the new set of points, which gives a more tangible feel for the error than listing a numerical mean squared error.

c) Results

For reference, I have included an example output from trial 1, where α = 0.20. This is a good example because it not only shows how the data was represented visually, but also shows how, when more than 5 test points are needed, they are kept distinct from the original training data in the display.


For α = 0.50

Trial #  | Epochs   | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+----------+------------------+-----------+--------------------------
1        | 1445560  | 55.86            | 1000      | y = 0.508015*x + 1.517851
2        | 16056640 | 565.33           | 1000      | y = 0.500945*x + 1.952659
3        | 617589   | 26.58            | 1000      | y = 0.500915*x + 1.928097
---------+----------+------------------+-----------+--------------------------
Average  | 6039929  | 224.78           | 1000.00   | y = 0.503292*x + 1.799536


For α = 0.30

Trial #  | Epochs   | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+----------+------------------+-----------+--------------------------
1        | 29803146 | 1037.84          | 1000      | y = 0.500858*x + 1.958729
2        | 2087857  | 78.02            | 1000      | y = 0.500538*x + 1.980784
3        | 10634854 | 373.00           | 1000      | y = 0.501288*x + 1.956244
---------+----------+------------------+-----------+--------------------------
Average  | 14175286 | 496.29           | 1000.00   | y = 0.500895*x + 1.965252

For α = 0.20

Trial #  | Epochs   | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+----------+------------------+-----------+--------------------------
1        | 18960794 | 678.81           | 1005      | y = 0.501190*x + 1.887784
2        | 13613850 | 485.11           | 1000      | y = 0.501521*x + 1.893265
3        | 8795781  | 314.70           | 1000      | y = 0.500414*x + 1.986367
---------+----------+------------------+-----------+--------------------------
Average  | 13790142 | 492.87           | 1001.67   | y = 0.501042*x + 1.922472

For α = 0.10

Trial #  | Epochs   | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+----------+------------------+-----------+--------------------------
1        | 12409023 | 440.02           | 1000      | y = 0.502540*x + 1.873927
2        | 10242057 | 365.23           | 1000      | y = 0.502869*x + 1.826852
3        | 2954272  | 109.16           | 1000      | y = 0.505318*x + 1.754559
---------+----------+------------------+-----------+--------------------------
Average  | 8535117  | 304.80           | 1000.00   | y = 0.503576*x + 1.818446

For α = 0.05

Trial #  | Epochs   | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+----------+------------------+-----------+--------------------------
1        | 6086848  | 218.53           | 1000      | y = 0.502416*x + 1.839083
2        | 4968506  | 177.94           | 1000      | y = 0.502180*x + 1.847443
3        | 1005184  | 40.60            | 1000      | y = 0.507828*x + 1.509208
---------+----------+------------------+-----------+--------------------------
Average  | 4020179  | 145.69           | 1000.00   | y = 0.504141*x + 1.731911

d) Conclusion

This part really illuminated the effect of α on the network. Depending on the α chosen, the network converged in longer or shorter times on average; once again, the effect of α on run time appears non-linear. Also, based on the varying times for each α, it appears that the initial values of W affect the convergence as well, something not seen previously. The relation between α and the y-intercept seems to be parabolic, with a maximum when α = 0.30. The converse appears true for the slope of the line: while it still appears parabolic, the minimum is at α = 0.30.

Regarding the error, the most surprising thing was how few of the networks had to be retrained because of failing to classify the test points. At first I thought this was an error in my code, so I tried starting with only 100 points and found that my code was indeed correct. Given this, I feel that for a problem of this nature, where there are only two linearly separable classes, 1000 training points are far more than needed. Also, although it is obvious, having to retrain a network is undesirable, as it approximately doubles the training time since the original run was fruitless.


Possibly a genetic algorithm could be used in these cases to attempt to use the resultant weights as the input to the next training attempt.

Finally, this is an excellent exercise for seeing visually, in two dimensions, how SLPs classify data. The neural network starts with random weights, having no idea where the separating plane lies, and after some finite number of epochs it is able to choose a plane that linearly separates the data. While simply finding a line may not seem impressive, I find it quite remarkable that, knowing nothing other than some data points, nearly the exact function used to generate the data can be recovered.

IV. Part C.

a) Statement of the Problem

The problem here is to design a single layer, multi-output perceptron that classifies the characters ‘A’, ‘B’, ‘C’ and ‘D’ into four disparate classes using the generalized Rosenblatt algorithm. Once the network has converged for the four classes, we are to add noise by flipping an increasing percentage of bits until the network no longer classifies the data.

b) Approach & Algorithm

For my approach, I studied the generalized Rosenblatt algorithm and read it as training one set of inputs fed to m single layer, single-output perceptrons, where m is the number of output nodes. For this problem I chose to stay with the same representation of the data outlined in section I. I also chose to take the problem at face value and use one output node per class rather than employing an encoding to define the classes, which left the network with four output nodes.
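Concretely, with one output node per class, the desired-output matrix handed to the slp function is just a 4 x 4 identity pattern; a minimal sketch, reusing the hypothetical vecA..vecD character vectors from Part A:

    % Part C setup sketch: four output nodes, one per character class.
    X = [vecA, vecB, vecC, vecD];   % same row-major character vectors as before
    D_out = eye(4);                 % column j is the desired output for character j
    W = slp(X, D_out, 0.50);        % generalized (multi-output) Rosenblatt training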

For testing, I felt it would better expose the weaknesses of an artificial neural network (ANN) if, instead of stopping the noise-tolerance tests when one class failed, I stopped when all classes failed. I felt this would show whether the ANN had a particular weakness and might suggest class encodings to circumvent such weaknesses. The overall flow of the program was as follows (a minimal sketch of the noise-testing loop appears after the list):

a. Create row-major vectors of the uni-polar character matrices and create class vectors for the desired output D.

b. Obtain the weight matrix by running the SLP function on the aforementioned input/output pairs.

c. Initialize the noise level to 0.00%.

d. Create a random mask, ~uniform [0,1], of the input vector’s size; any value below the noise level sets the corresponding mask bit to one (so the expected fraction of flipped bits equals the noise level), zero otherwise.

e. Exclusive-or (XOR) the input vector with the mask to toggle bits, i.e. simulate noise at the specified level.

f. Perform a dot product and apply the hard-limiter to the weights with the noisy input to arrive at the output class values.

g. Verify pattern classification for each character and display which characters classified correctly. If no characters classified correctly then exit; otherwise increase the noise level by 5.0% and go to d. Note the noise mask is kept constant at each noise level.
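A minimal sketch of steps c through g, under the assumption stated in step d that the mask bit is set when the uniform draw falls below the noise level (names illustrative, not the exact script):

    % Noise-tolerance loop sketch for Part C.
    noise = 0.0;
    while true
        mask = (rand(size(X)) < noise);     % step d: 1 marks a bit to flip
        Xn = xor(X, mask);                  % step e: toggle bits to simulate noise
        Y = (W' * [ones(1, 4); Xn] >= 0);   % step f: dot products plus hard limiter
        ok = all(Y == eye(4), 1);           % step g: which characters still classify
        fprintf('Noise %.0f%%: %d of 4 characters classified\n', 100*noise, sum(ok));
        if ~any(ok)
            break;                          % stop only when every class fails
        end
        noise = noise + 0.05;               % increase the noise level by 5%
    end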


c) Results

For α = 0.80 (last four columns give the maximum noise tolerance per character)

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’  | ‘B’  | ‘C’  | ‘D’
---------+--------+------------------+----------+------+------+------+------
1        | 23     | 0.140625         | 0.083542 | 0.30 | 0.45 | 0.35 | 0.20
2        | 23     | 0.171875         | 0.083542 | 0.45 | 0.45 | 0.40 | 0.35
3        | 19     | 0.109375         | 0.092943 | 0.15 | 0.45 | 0.35 | 0.00
4        | 19     | 0.140625         | 0.092943 | 0.50 | 0.40 | 0.35 | 0.35
5        | 23     | 0.140625         | 0.083542 | 0.40 | 0.45 | 0.40 | 0.05
---------+--------+------------------+----------+------+------+------+------
Average  | 21.4   | 0.140625         | 0.087302 | 0.36 | 0.44 | 0.37 | 0.19

For α = 0.65

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’  | ‘B’  | ‘C’  | ‘D’
---------+--------+------------------+----------+------+------+------+------
1        | 19     | 0.140625         | 0.060912 | 0.30 | 0.25 | 0.25 | 0.15
2        | 19     | 0.093750         | 0.060912 | 0.35 | 0.35 | 0.15 | 0.15
3        | 19     | 0.140625         | 0.060912 | 0.40 | 0.30 | 0.15 | 0.15
4        | 19     | 0.109375         | 0.060912 | 0.20 | 0.30 | 0.20 | 0.15
5        | 19     | 0.125000         | 0.060912 | 0.30 | 0.30 | 0.20 | 0.15
---------+--------+------------------+----------+------+------+------+------
Average  | 19.0   | 0.121875         | 0.060912 | 0.31 | 0.30 | 0.19 | 0.15

For α = 0.50

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’  | ‘B’  | ‘C’  | ‘D’
---------+--------+------------------+----------+------+------+------+------
1        | 23     | 0.109375         | 0.031421 | 0.10 | 0.40 | 0.25 | 0.10
2        | 27     | 0.125000         | 0.029017 | 0.10 | 0.40 | 0.30 | 0.10
3        | 23     | 0.140625         | 0.031421 | 0.05 | 0.30 | 0.30 | 0.05
4        | 23     | 0.187500         | 0.028344 | 0.00 | 0.10 | 0.15 | 0.05
5        | 27     | 0.156250         | 0.023960 | 0.25 | 0.40 | 0.25 | 0.00
---------+--------+------------------+----------+------+------+------+------
Average  | 24.6   | 0.143750         | 0.028833 | 0.10 | 0.32 | 0.25 | 0.06

For α = 0.40

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’  | ‘B’  | ‘C’  | ‘D’
---------+--------+------------------+----------+------+------+------+------
1        | 17     | 0.093750         | 0.017759 | 0.25 | 0.20 | 0.00 | 0.00
2        | 13     | 0.046875         | 0.025723 | 0.15 | 0.05 | 0.00 | 0.15
3        | 27     | 0.156250         | 0.017605 | 0.30 | 0.30 | 0.30 | 0.20
4        | 17     | 0.109375         | 0.020047 | 0.20 | 0.15 | 0.00 | 0.20
5        | 26     | 0.140625         | 0.021889 | 0.10 | 0.25 | 0.20 | 0.00
---------+--------+------------------+----------+------+------+------+------
Average  | 20.0   | 0.109375         | 0.020605 | 0.20 | 0.19 | 0.10 | 0.11

For α = 0.30

Trial #  | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’  | ‘B’  | ‘C’  | ‘D’
---------+--------+------------------+----------+------+------+------+------
1        | 23     | 0.093750         | 0.015026 | 0.05 | 0.10 | 0.45 | 0.00
2        | 27     | 0.140625         | 0.014992 | 0.05 | 0.25 | 0.05 | 0.00
3        | 23     | 0.171875         | 0.015417 | 0.35 | 0.40 | 0.35 | 0.35
4        | 23     | 0.140625         | 0.015541 | 0.00 | 0.05 | 0.05 | 0.00
5        | 19     | 0.140625         | 0.017038 | 0.00 | 0.00 | 0.00 | 0.00
---------+--------+------------------+----------+------+------+------+------
Average  | 23.0   | 0.1375           | 0.015603 | 0.09 | 0.16 | 0.18 | 0.07


d) Conclusion

The most interesting point I found was how the last element in the training set was the most susceptible to noise; that is, it was usually the first element to fail classification as the noise level increased. I surmise this is because it is the last element to be accounted for on the first pass, and that fact propagates through and manifests itself as stated. Also of note is that the second element is consistently the least susceptible to noise, which does not follow from the converse of the logic used for the last element’s susceptibility.

One very interesting thing I found was that sometimes a character would not classify at one noise level and then classify again at a higher noise level. I suspect this is due to where the noise is located. Looking at the matrices for the four characters, we see that many of the points stay the same between some classes, which intuitively means the neural network has to classify on fewer than the full 18 x 18 points. So, if the noise is clustered in areas that are distinct between classes, as opposed to areas that are the same between classes, classification is much more likely to fail.

I also noticed that these trials, even more so than in part A, had a tendency to take the same number of epochs for a given α; however, as α decreases, the variance in epochs between trials increases.

I also noticed that trials with more epochs tended to make the classes less susceptible to noise. This follows intuitively, since each epoch brings us closer to the solution, so if more epochs bring us closer to the solution, then we have more room for noise. Specifically, looking at α = 0.40, trials 1 and 2 have low noise tolerance, but trial 3 has a much higher noise tolerance.

Once again, run time and epochs seem linearly related, as one would expect. Also, as α decreases the average mean squared error decreases, and not in correlation with the number of epochs. This seems counter-intuitive and would need further investigation.

V. Conclusion

Neural networks bring something completely new to the table: non-determinism as well as non-linearity. They are able to start with no knowledge of the function used to generate the training points and train to behave as a linear approximation to that function. Also, once the network is trained, only a vector dot product and a Boolean condition are required to generate the output from the input. This is extremely powerful, since a multiply-accumulate unit (MAC) can be driven at extreme speed, while implementing the actual function by more traditional methods cannot. This allows data to flow forward in real time.

VI. References

[1] Ritter, Gerhard X. (personal communication, August 30, 2006).


VII. Appendix I

The core SLP function’s MATLAB source code is provided here as a reference. Since this is the core functionality, it is straightforward to design the calling programs around this function, which implements Rosenblatt’s training algorithm.

% This is a function to train a SLP. It takes as input a unipolar matrix X,
% which is an [n,k] matrix where n is the number of inputs to the SLP and k
% is the number of training sets. D is [m,k] unipolar for the output classes
% and alpha is the learning rate parameter.
function W = slp(X, D, alpha)

% Add x0 = 1 to the input and get values for n, k, m
[n,k] = size(X);
X = [ones(1,k); X];
[n,K] = size(X);
[m,k] = size(D);
if (K ~= k)
    disp('Error: X and D do not match.');
    return;
end

% 1. Initialize t = 0, the learning rate parameter alpha, and set the weights
%    W(0) to arbitrary values, i.e. w_i(0) = arbitrary for i = 0,...,n;
t = 0;
W = rand(n,m);  % one (n+1)-dimensional column of weights per output node

% Prepare for the loop (Steps 2-5)
disp(sprintf('Epoch#\tM.S.E.\t\tElapsed Time (s)\tIterations w/o change'));
not_done = k;   % we need k consecutive iterations where the weights do not change
% 2. For each pair (x_k, d_k) from the training set, do Steps 3-5;
time = cputime; % get the current CPU time
totalmse = 0;   % running total of the M.S.E.
maxrun = 0;     % longest run without updating W
while (not_done > 0)
    i = mod(t,k) + 1;   % current training-pattern index
    % 3. Compute dW(t) = alpha*e_k*x_k, i.e. dw_i(t) = alpha*e_k*x_ik for
    %    i = 0,...,n, where e_k = d_k - y_k and y_k = f(sum_{i=0..n} w_i(t)*x_ik);
    if (m > 1)
        y = ((W' * X(:,i)) >= 0);
        e = (D(:,i) - y);
        dW = (alpha * e * X(:,i)')';
    else
        y = (dot(W, X(:,i)) >= 0);  % optimized for speed
        dW = alpha * (D(i) - y) * X(:,i);
    end


    % 4. Increment t = t + 1;
    t = t + 1;
    % 5. Update W(t) = W(t-1) + dW(t-1), i.e. w_i(t) = w_i(t-1) + dw_i(t-1) for i = 0,...,n;
    W = W + dW;
    % 6. If no weight changes occurred during the last epoch (iteration of Steps 3-5),
    %    or another stopping condition is true, then stop; otherwise, repeat from Step 2.
    if (nnz(dW) == 0)
        not_done = not_done - 1;
    else
        not_done = k;
    end
    if (cputime - time) > 3600
        disp(sprintf('Giving up, over one hour of CPU time used.\n'));
        not_done = 0;
    end

    % Display data about this iteration
    thismse = mse(dW);  % !!! Comment these out if the runtime is too long
    totalmse = totalmse + thismse;
    showstats = false;
    if (t <= 25)
        showstats = true;
    end
    if (maxrun < (k - not_done))
        maxrun = (k - not_done);
        showstats = true;
    end
    if (showstats == true)
        disp(sprintf('%d\t\t%f\t\t%f\t\t%d', t, mse(dW), (cputime - time), (k - not_done)));
    end
end

disp(sprintf('Average M.S.E. is %f for %d epochs', (totalmse / t), t));  % display mean M.S.E.
disp(sprintf('Total CPU Time: %f (seconds).', (cputime - time)));        % display elapsed CPU time


VIII. Appendix II

Experimental test bench specifications, which were partially generated using CPUID’s CPU-Z 1.37, available at http://www.cpuid.com/cpuz.php:

- Processor(s)
  Number of processors: 1
  Number of cores: 2 per processor
  Number of threads: 2 (max 2) per processor
  Name: Intel Core 2 Duo E6600
  Code Name: Conroe
  Specification: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
  Package: Socket 775 LGA
  Family/Model/Stepping: 6.F.6
  Extended Family/Model: 6.F
  Core Stepping: B2
  Technology: 65 nm
  Core Speed: 2397.6 MHz
  Multiplier x Bus speed: 9.0 x 266.4 MHz
  Rated Bus speed: 1065.6 MHz
  Stock frequency: 2400 MHz
  Instruction sets: MMX, SSE, SSE2, SSE3, SSSE3, EM64T
  L1 Data cache: 2 x 32 KBytes, 8-way set associative, 64-byte line size
  L1 Instruction cache: 2 x 32 KBytes, 8-way set associative, 64-byte line size
  L2 cache: 4096 KBytes, 16-way set associative, 64-byte line size

- Chipset & Memory
  Northbridge: Intel P965/G965 rev. C2
  Southbridge: Intel 82801HB (ICH8) rev. 02
  Memory Type: DDR2
  Memory Size: 1024 MBytes

- System
  Mainboard Vendor: Intel Corporation
  Mainboard Model: DG965SS
  BIOS Vendor: Intel Corp.
  BIOS Version: MQ96510J.86A.1176.2006.0906.1633
  BIOS Date: 09/06/2006

- Memory SPD
  Module 1: DDR2, PC2-4300 (266 MHz), 512 MBytes, Mushkin
  Module 2: DDR2, PC2-4300 (266 MHz), 512 MBytes, Mushkin

- Software
  Windows Version: Microsoft Windows XP Professional Service Pack 2 (Build 2600)
  DirectX Version: 9.0c
  MATLAB Version: 7.2.0.232 (R2006a)

