Back propagation always reminds me of waves on a beach - the input signal moves forward through the network and the errors trickle backwards (they "propagate backwards", hence the name), followed by the next signal.
The network is trained by presenting each input pattern in turn at the inputs, propagating forwards and then backwards, and then moving on to the next input pattern. This cycle is then repeated many times. For instance, if you were training a net to recognise the digits 0 to 9, you would train it once on digit 0, then once on digit 1, and so on until you reached 9. Then you would repeat the whole process many more times, with the network getting closer and closer to the required weight values each time.
The reason for not putting input pattern 0 at the inputs and propagating many times over, then moving to pattern 1 and doing the same, is that the network would develop weights that reflect only the pattern on which it was being trained at the time. By the time it had undergone 1000 cycles of training on pattern 1, it would have lost all the weight values it had for recognising pattern 0. By cycling through the patterns repeatedly, it is to be hoped that the net develops average weight values that are sufficient for every pattern in the "vocabulary" it has to learn.
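The presentation order described above can be sketched as two nested loops, with the pattern loop on the inside. This is only an illustration: NUM_PATTERNS, NUM_CYCLES and train_on_pattern are hypothetical names, and train_on_pattern is a stub standing in for one forward and one backward pass.

```pascal
program training_order;
{Sketch only: NUM_PATTERNS, NUM_CYCLES and train_on_pattern are
 hypothetical names, not part of the code in this chapter}
const NUM_PATTERNS = 10;   {digits 0 to 9}
      NUM_CYCLES   = 3;    {a real run would use many more}
var cycle, pat : integer;

procedure train_on_pattern(p : integer);
begin
  {Stub: a real version would run the network forwards on pattern p
   and then propagate the errors backwards once}
  write(p, ' ')
end;

begin
  for cycle := 1 to NUM_CYCLES do
    begin
      for pat := 0 to NUM_PATTERNS - 1 do   {every pattern once per cycle}
        train_on_pattern(pat);
      writeln
    end
end.
```

Each line of output is one complete cycle through the digits. Swapping the two loops would give the "1000 cycles on one pattern" ordering that the text warns against.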
NUM_HID is the number of nodes in the hidden layer (we start counting at 1).
NUM_OUT is the number of nodes in the output layer.
The first step is to run the network forwards on the input pattern, using the run_network procedure defined in the previous chapter. That's the easy part.
The error for each node in the output layer is then calculated as:
Ei = (ti - ai) . ai . (1 - ai)
In this case Ei is the error for node i in the output layer, ai is the activation for node i in the output layer and ti is the target activation for the node, i.e. the desired output for that node.
Here's the Pascal that does the same thing:
procedure calculate_output_layer_errors;
var i : byte; {for loop variable}
begin
  for i:=1 to NUM_OUT do
    with ol[i] do
      E:=(desired_output[i] - a) * a * (1 - a)
end;
Now you can see why I included the error value as part of the record. It means that you don't need to define an array specially.
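As a quick check on the formula, here is a small stand-alone program that works it through for a single output node. The numbers are made up for illustration and are not taken from any network in this tutorial.

```pascal
program output_error_example;
{Worked example with made-up numbers}
var a, t, E : real;
begin
  a := 0.8;                      {actual activation of the output node}
  t := 1.0;                      {target activation}
  E := (t - a) * a * (1 - a);    {same formula as in the procedure above}
  writeln('E = ', E:0:4)         {0.2 * 0.8 * 0.2 = 0.032}
end.
```

The error is positive because the activation is below the target. The a * (1 - a) factor can never be more than 0.25, so E is always much smaller than the raw difference between target and activation.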
Ei = ai . (1 - ai) . Σj Ej.wij
Here the terminology has changed a little. This time the subscript i refers to the neurons in the hidden layer, not the output layer, so Ei is the error value for hidden node i, and ai is the activation of hidden node i. The subscript j refers to an output node, so wij is the weight from hidden node i to output node j and Ej is the error value for output node j. The sum is performed over all the weights from the particular hidden node i that you are looking at to all the output nodes j (i.e. j = 1, then 2, then 3 etc.).
And here it is again in Turbo Pascal. Notice that the sum includes only the weighted links from this hidden layer neuron to each of the neurons in the output layer.
procedure calculate_hidden_layer_errors;
var i,j : byte; {for loop variables}
    sum : real;
begin
  for i:=1 to NUM_HID do {Go through entire hidden layer}
    with hl[i] do
      begin
        sum:=0; {sum error values from O/P layer}
        for j:=1 to NUM_OUT do
          sum:=sum + ol[j].E * ol[j].w[i]; {Just w[i] only}
        E:=a * (1 - a) * sum {no other w[] value}
      end
end;
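To see the sum at work, the following stand-alone sketch calculates the error for one hidden node that feeds two output nodes. All the values are made up, and the arrays out_E and out_w are hypothetical stand-ins for ol[j].E and ol[j].w[i].

```pascal
program hidden_error_example;
{Worked example with made-up numbers: one hidden node feeding two output nodes}
const NUM_OUT = 2;
var out_E : array[1..NUM_OUT] of real;  {error values of the output nodes}
    out_w : array[1..NUM_OUT] of real;  {weight from this hidden node to each output node}
    a, sum, E : real;
    j : integer;
begin
  a := 0.6;                              {activation of the hidden node}
  out_E[1] := 0.032;  out_w[1] := 0.5;
  out_E[2] := -0.05;  out_w[2] := -0.4;
  sum := 0;
  for j := 1 to NUM_OUT do
    sum := sum + out_E[j] * out_w[j];    {0.016 + 0.020 = 0.036}
  E := a * (1 - a) * sum;                {0.6 * 0.4 * 0.036 = 0.00864}
  writeln('E = ', E:0:5)
end.
```

Note that a negative error arriving over a negative weight makes a positive contribution to the hidden node's error.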
Here is the equation we use to update each weight:
wij = wij + η . δj . xi
In this case, wij is the weight from hidden node i to output node j, δj is the error term that we calculated for output node j and xi is the output of hidden node i. η is a constant called the learning rate, a small decimal value (typically 0.01 or 0.025). It makes sure that learning is done slowly, i.e. the weights cannot change too much in any one go. This makes learning more stable, as the weights are not prone to vast swings.
What about the threshold of the node? That is also a value that has been set to a small random number and needs to be trained. The threshold is treated exactly as if it were a weight from a hidden node, except that the "input" from that node is taken to be -1 (as the threshold value is subtracted when the neural net is run):
new threshold = old threshold - (learning rate × error term for the node)
The following Pascal procedure trains the weights and the threshold:
procedure update_output_weights;
const LEARNING_RATE = 0.025;
var i,j : byte;
begin
  for j := 1 to NUM_OUT do {Go through all the output nodes}
    with ol[j] do
      begin {Process all weights from hidden layer to this O/P node}
        for i := 1 to NUM_HID do
          w[i] := w[i] + LEARNING_RATE * E * hl[i].out;
        {Now train threshold for this node}
        threshold := threshold - LEARNING_RATE * E
      end
end;
In this code I have used j for the index of the output layer node in order to match the subscripts in the equation above (i.e. E matches the error term δj). Similarly, hl[i].out matches xi, and w[i] matches wij.
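To get a feel for how small each adjustment is, here is a stand-alone sketch of one update for a single output node fed by two hidden nodes. The starting values are made up, and the array x is a hypothetical stand-in for the hidden node outputs.

```pascal
program weight_update_example;
{Worked example with made-up numbers: one output node fed by two hidden nodes}
const NUM_HID = 2;
      LEARNING_RATE = 0.025;
var w, x : array[1..NUM_HID] of real;  {weights into the node, hidden outputs}
    E, threshold : real;
    i : integer;
begin
  E := 0.032;  threshold := 0.1;
  w[1] := 0.5;   x[1] := 0.6;
  w[2] := -0.3;  x[2] := 0.9;
  for i := 1 to NUM_HID do
    w[i] := w[i] + LEARNING_RATE * E * x[i];
  threshold := threshold - LEARNING_RATE * E;
  writeln('w[1] = ', w[1]:0:5);           {0.5 + 0.00048}
  writeln('w[2] = ', w[2]:0:5);           {-0.3 + 0.00072}
  writeln('threshold = ', threshold:0:5)  {0.1 - 0.0008}
end.
```

With these numbers the weights move by less than a thousandth in one go, which is why so many training cycles are needed.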
procedure update_hidden_weights;
const LEARNING_RATE = 0.025; {No reason why this should be
                              the same as for the O/P nodes}
var i,j : byte;
begin
  for j := 1 to NUM_HID do {Go through all the hidden nodes}
    with hl[j] do
      begin {Process all weights from input layer to this node}
        for i := 1 to NUM_INP do
          w[i] := w[i] + LEARNING_RATE * E * test_pat[i];
        {Now train threshold for this node}
        threshold := threshold - LEARNING_RATE * E
      end
end;
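Putting the pieces together, one training cycle runs the network forwards, calculates the output errors, calculates the hidden errors and then updates both sets of weights. The stand-alone sketch below follows that order on a tiny network. Several things in it are assumptions rather than part of this chapter: the sigmoid activation function, the 2-2-1 network size, the fixed starting weights (a real net would start from small random values), and the use of the a field for the hidden node outputs.

```pascal
program training_cycle;
{Sketch only. Assumed here, not stated in this chapter: a sigmoid
 activation, a tiny 2-2-1 network and fixed (not random) start weights.}
const NUM_INP = 2;
      NUM_HID = 2;
      NUM_OUT = 1;
      LEARNING_RATE = 0.025;
type neuron = record
       w : array[1..2] of real;   {weights from the previous layer}
       threshold, a, E : real     {threshold, activation, error value}
     end;
var hl : array[1..NUM_HID] of neuron;
    ol : array[1..NUM_OUT] of neuron;
    test_pat : array[1..NUM_INP] of real;
    desired_output : array[1..NUM_OUT] of real;
    i, j, cycle : integer;
    sum : real;

function sigmoid(x : real) : real;
begin
  sigmoid := 1 / (1 + exp(-x))
end;

begin
  {made-up pattern and starting values}
  test_pat[1] := 1;  test_pat[2] := 0;
  desired_output[1] := 1;
  hl[1].w[1] := 0.2;   hl[1].w[2] := -0.1;  hl[1].threshold := 0.1;
  hl[2].w[1] := -0.3;  hl[2].w[2] := 0.4;   hl[2].threshold := -0.2;
  ol[1].w[1] := 0.5;   ol[1].w[2] := -0.4;  ol[1].threshold := 0.3;

  for cycle := 1 to 1000 do
    begin
      {1. forward pass (stands in for run_network)}
      for j := 1 to NUM_HID do
        begin
          sum := -hl[j].threshold;
          for i := 1 to NUM_INP do
            sum := sum + hl[j].w[i] * test_pat[i];
          hl[j].a := sigmoid(sum)
        end;
      for j := 1 to NUM_OUT do
        begin
          sum := -ol[j].threshold;
          for i := 1 to NUM_HID do
            sum := sum + ol[j].w[i] * hl[i].a;
          ol[j].a := sigmoid(sum)
        end;
      if (cycle = 1) or (cycle = 1000) then
        writeln('cycle ', cycle, ': output = ', ol[1].a:0:4);

      {2. output layer errors}
      for j := 1 to NUM_OUT do
        ol[j].E := (desired_output[j] - ol[j].a) * ol[j].a * (1 - ol[j].a);

      {3. hidden layer errors, calculated before the output weights change}
      for i := 1 to NUM_HID do
        begin
          sum := 0;
          for j := 1 to NUM_OUT do
            sum := sum + ol[j].E * ol[j].w[i];
          hl[i].E := hl[i].a * (1 - hl[i].a) * sum
        end;

      {4. update output weights and thresholds}
      for j := 1 to NUM_OUT do
        begin
          for i := 1 to NUM_HID do
            ol[j].w[i] := ol[j].w[i] + LEARNING_RATE * ol[j].E * hl[i].a;
          ol[j].threshold := ol[j].threshold - LEARNING_RATE * ol[j].E
        end;

      {5. update hidden weights and thresholds}
      for j := 1 to NUM_HID do
        begin
          for i := 1 to NUM_INP do
            hl[j].w[i] := hl[j].w[i] + LEARNING_RATE * hl[j].E * test_pat[i];
          hl[j].threshold := hl[j].threshold - LEARNING_RATE * hl[j].E
        end
    end
end.
```

Because only one pattern is being trained, the output printed for cycle 1000 should be closer to the desired value of 1 than the output printed for cycle 1. With several patterns, the pattern loop would sit inside the cycle loop.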
If you want the Java source code for this, then click here. Feel free to alter it as much as you like, of course.
If you want the Java class file, which you can include on your web site of course, then click here.
The network, as it stands, has a fixed number of inputs (6) and a fixed number of outputs (5). The hidden nodes in the middle are represented by the blue blobs.
The training patterns are represented on the left side of the screen. To set a training pattern (both the input and the desired output), click the mouse on the column of 6 input squares or the column of 5 output squares. Each mouse click darkens the square, where white represents 0, light gray represents 0.3, dark gray represents 0.7 and black represents 1. The sequence cycles, so clicking on a black square sets it to white again.
For example, a training pattern might have the input set to (1, 0, 0, 0.3, 0, 0.7) and the desired output set to (0, 0, 0.7, 0.7, 0). Click on the + symbol or the - symbol next to the word "Training patterns" to increase or decrease the number of training patterns.
To train the network on the input patterns that you have set up, click on the Train icon. To test the neural network you need to enter figures into the slots by the input nodes in the middle of the screen. You do this by entering figures into the slot at the top, then clicking on the input value that you want to change. When you want to run the test pattern, just click on the Run icon.