The top row in the table gives headings for the columns: Item; I-1, I-2, I-3 (I-k standing for input layer neuron k); H-1, H-2 (for the hidden layer neurons); and O-1, O-2, O-3 (for the output layer neurons).
In the first column of the table, M-1 and M-2 refer to the weight matrices described above. Where an entry has the suffix -H, as in Output -H, the information refers to the hidden layer. Similarly, -O refers to the output layer, as in Activation + threshold -O.
The next iteration uses the following information from the previous iteration, which you can identify from Table 7.1. The input pattern is (0.52, 0.75, 0.97), and the desired output pattern is (0.24, 0.17, 0.65). The current weight matrices are as follows:
M-1 Matrix of weights from input layer to hidden layer:
 0.6004    -0.4
 0.2006     0.8001
-0.4992     0.3002
M-2 Matrix of weights from hidden layer to output layer:
-0.910     0.412     0.262
 0.096    -0.694    -0.734
The threshold values (or biases) for the neurons in the hidden layer are 0.2008 and 0.3002, while those for the output neurons are 0.1404, 0.2336, and 0.0616, respectively.
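For concreteness, here is a minimal C++ sketch (an illustration, not the book's simulator code) that carries these numbers through one forward pass, adding each threshold to the activation as in the Activation + threshold rows of Table 7.1:

#include <cmath>
#include <cstdio>

// Sigmoid thresholding function used throughout this chapter.
double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main() {
    double x[3] = {0.52, 0.75, 0.97};               // input pattern
    double M1[3][2] = {{ 0.6004, -0.4   },          // weights, input -> hidden
                       { 0.2006,  0.8001},
                       {-0.4992,  0.3002}};
    double M2[2][3] = {{-0.910,  0.412,  0.262},    // weights, hidden -> output
                       { 0.096, -0.694, -0.734}};
    double theta[2] = {0.2008, 0.3002};             // hidden layer thresholds
    double tau[3]   = {0.1404, 0.2336, 0.0616};     // output layer thresholds

    double y[2], z[3];
    for (int j = 0; j < 2; ++j) {                   // hidden layer outputs
        double sum = theta[j];
        for (int i = 0; i < 3; ++i) sum += x[i] * M1[i][j];
        y[j] = sigmoid(sum);
    }
    for (int j = 0; j < 3; ++j) {                   // output layer outputs
        double sum = tau[j];
        for (int i = 0; i < 2; ++i) sum += y[i] * M2[i][j];
        z[j] = sigmoid(sum);
    }
    printf("computed output: %f %f %f\n", z[0], z[1], z[2]);
    return 0;
}

The computed output can then be compared, component by component, with the desired pattern (0.24, 0.17, 0.65) to obtain the errors that drive the weight adjustments of the next iteration.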
You can keep the learning parameters at 0.15 for the connections between the input and hidden layer neurons, and 0.2 for the connections between the hidden layer neurons and the output neurons, or you can modify them slightly. Whether to change these two parameters is a decision best made at a later iteration, once you have a sense of how the process is converging.
If you are satisfied with the rate at which the computed output pattern is approaching the target output pattern, you would leave these learning rates alone. If convergence seems much slower than you would like, the learning rate parameters can be adjusted slightly upward. It is a subjective decision, both as to when (if at all) and to what new levels these parameters should be revised.
You have just seen an example of the process of training in the feedforward backpropagation network, described in relation to one hidden layer neuron and one input neuron. A few vectors were shown and used along the way, though perhaps without being clearly identified. We therefore introduce some notation and describe the equations that were used implicitly in the example.
Let us talk about two matrices whose elements are the weights on connections. One matrix refers to the interface between the input and hidden layers, and the second refers to that between the hidden layer and the output layer. Since connections exist from each neuron in one layer to every neuron in the next layer, there is a vector of weights on the connections going out from any one neuron. Putting this vector into a row of the matrix, we get as many rows as there are neurons from which connections are established.
Let M1 and M2 be these matrices of weights. Then what does M1[i][j] represent? It is the weight on the connection from the ith input neuron to the jth neuron in the hidden layer. Similarly, M2[i][j] denotes the weight on the connection from the ith neuron in the hidden layer to the jth output neuron.
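As a concrete illustration (the array sizes here are taken from the running example and are not part of the notation), these two matrices could be declared in C++ as:

const int m = 3;    // number of input layer neurons, as in the example
const int n = 2;    // number of hidden layer neurons
const int r = 3;    // number of output layer neurons

double M1[m][n];    // M1[i][j]: weight from input neuron i to hidden neuron j
double M2[n][r];    // M2[i][j]: weight from hidden neuron i to output neuron j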
Next, we will use x, y, z for the outputs of neurons in the input layer, hidden layer, and output layer, respectively, with a subscript attached to denote which neuron in a given layer we are referring to. Let P denote the desired output pattern, with pi as its components. Let m be the number of input neurons, so that, in our notation, (x1, x2, ..., xm) will denote the input pattern. If P has, say, r components, the output layer needs r neurons. Let the number of hidden layer neurons be n. Let βh be the learning rate parameter for the hidden layer, and βo that for the output layer. Let θ with an appropriate subscript represent the threshold value, or bias, of a hidden layer neuron, and τ with an appropriate subscript refer to the threshold value of an output neuron.
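With this notation in place, the forward computations carried out in the example can be summarized as follows. This is a sketch anticipating the formal equations, using the convention of adding the threshold to the activation as in Table 7.1:

$$y_j = f\left( \theta_j + \sum_{i=1}^{m} x_i \, M1[i][j] \right), \qquad 1 \le j \le n$$

$$z_j = f\left( \tau_j + \sum_{i=1}^{n} y_i \, M2[i][j] \right), \qquad 1 \le j \le r$$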
Let the errors in output at the output layer be denoted by ej's and those at the hidden layer by ti's. If we use a Δ prefix on any parameter, then we are looking at the change in, or adjustment to, that parameter. Also, the thresholding function we use is the sigmoid function, f(x) = 1 / (1 + exp(-x)).
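The backpropagation adjustments also need the derivative of this thresholding function, and the sigmoid conveniently yields it in terms of its own output: f'(x) = f(x)(1 - f(x)). A minimal C++ sketch:

#include <cmath>

// Sigmoid thresholding function, f(x) = 1 / (1 + exp(-x)).
double sigmoid(double x) {
    return 1.0 / (1.0 + exp(-x));
}

// Derivative of the sigmoid, expressed through the function value itself:
// f'(x) = f(x) * (1 - f(x)).
double sigmoid_derivative(double x) {
    double f = sigmoid(x);
    return f * (1.0 - f);
}

This identity is the reason factors of the form z(1 - z) appear in the error terms of backpropagation with sigmoid activations.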