
What are neural networks?

Neural networks (or "nets") are small models inspired by the small processing units found in the human brain (brain cells called "neurons"). Scientists build these models and get them to behave intelligently. They can be taught to do things which traditional computers find difficult, such as recognising people's faces. One widely used type of neural network is the Multi-Layer Perceptron (MLP).

Another name for neural nets is "connectionist nets", and the people who make and design them are often called "connectionists". The nets often take the form of computer programs that simulate these models (and all throughout this guide you will find small snippets of computer code), although more and more silicon chips are being produced with neural networks designed directly in the form of electronic components.

The main feature of nets is that they learn from experience. If you wanted to sort out male faces from the female ones, you would show the net a large number (the more the better) of photographs of faces, and tell it which ones are male and which are female. The net adapts itself so that it learns the differences between male and female faces - imagine having to program a traditional computer with a set of rules for doing this! It would be practically impossible.

A simple mathematical model of the neuron was first put forward by McCulloch and Pitts in the 1940s. Neural networks seemed very promising in the early days of computing, but they fell out of favour and only in the last few years have they been making a comeback. Nowadays they are starting to appear everywhere, from teaching a small robot to navigate round a maze to acting as an artificial "nose" that can smell the difference between fine wine and plonk.

What do neural networks do, exactly?

Neural nets are generally used as classifiers, i.e. to choose between several options when faced with some input. For instance, I've already mentioned showing a neural network a large number of photographs of people's faces and asking it to sort them into male and female. Alternatively, it could be given spoken words as input and be required to recognise what the words were (i.e. to choose one word from the possible vocabulary).
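One common way for a net to choose between several options, assumed in the sketch below, is to give it one output cell per option and pick the option whose cell is most strongly active. The function name and the activation values are hypothetical, for illustration only:

```python
def choose_option(outputs, options):
    """Return the option whose output cell is most strongly active.

    Assumes one output cell per option, a common but not universal set-up.
    """
    if len(outputs) != len(options):
        raise ValueError("need exactly one output cell per option")
    # Index of the largest activation (the first one wins a tie)
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return options[best]


# Hypothetical activations of a two-cell face-sorting net
print(choose_option([0.12, 0.87], ["female", "male"]))  # male
```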

They can do other things, of course. Neural networks can act as content addressable memories: they store complex patterns, and if you prompt them with part of a pattern, or with a corrupted version of it, they retrieve the whole original pattern for you. Below is an example of this, in which a neural network is given a quarter of a pattern and reproduces the whole picture from it:
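The same idea can be sketched with a Hopfield network, one classic kind of content addressable memory, which stores its patterns using a Hebbian rule. This is a minimal sketch under several assumptions: cells take the values +1 and -1 rather than 0 and 1, a 0 in the prompt marks a pixel we know nothing about, and the two 4x4 patterns and the function names are made up for illustration.

```python
# Two hypothetical 4x4 "pictures", flattened row by row; +1 = dark, -1 = light.
# (Assumption: +1/-1 coding instead of 0/1.)
BARS = [1] * 8 + [-1] * 8      # top two rows dark
STRIPES = [1, -1] * 8          # columns 0 and 2 dark


def store(patterns):
    """Hebbian storage: strengthen connections between cells that agree."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no cell is connected to itself
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue, max_steps=10):
    """Update every cell from its weighted inputs until nothing changes."""
    state = list(cue)
    for _ in range(max_steps):
        new_state = [
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights
        ]
        if new_state == state:
            break
        state = new_state
    return state


weights = store([BARS, STRIPES])
quarter = BARS[:4] + [0] * 12   # only the top row is known; 0 = unknown
print(recall(weights, quarter) == BARS)  # True
```

Prompting with the top row of STRIPES instead brings back the striped picture, so the prompt alone decides which stored pattern is retrieved.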

Of course, once you start stitching different neural networks together you can get much more complex behaviour. After all, the human brain is basically that - an enormous number of specialised neural networks all working together to give complex behaviour. Such systems are generally called multinet architectures.

General issues with neural networks

Nets consist of small units called cells, which are connected to each other so that they can pass signals to each other. In many nets, the signal that one cell sends to another is a simple number between 0 and 1, where 0 means no signal and 1 means a large signal:

The connections have certain strengths, or weights. The net starts off with these connection strengths set randomly. The network is then exposed to various inputs, and the strengths adjust themselves according to a mathematical learning rule. This is what we call training; afterwards, the network can recognise input patterns or at least do something sensible, whatever it has been trained to do. The information is therefore stored in the strengths of the connections, much as it is thought to be stored in the human brain.
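The text doesn't say how a cell turns its incoming signals into a single number between 0 and 1. A common choice, assumed in this sketch, is to multiply each incoming signal by the strength of its connection, add up the results and squash the total with the logistic (sigmoid) function. The range -0.5 to 0.5 for the random starting strengths and the function names are also assumptions:

```python
import math
import random


def random_strengths(n_inputs, rng):
    """Starting point: small random connection strengths (assumed range)."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]


def cell_output(inputs, strengths, bias=0.0):
    """Weighted sum of incoming signals, squashed into the range 0..1
    by the logistic function (an assumed, though common, choice)."""
    total = bias + sum(x * w for x, w in zip(inputs, strengths))
    return 1.0 / (1.0 + math.exp(-total))


rng = random.Random(42)  # fixed seed so runs are repeatable
strengths = random_strengths(3, rng)
print(cell_output([1.0, 0.0, 1.0], strengths))  # some value between 0 and 1
print(cell_output([0.0, 0.0, 0.0], strengths))  # 0.5: no input and no bias
```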

The training is done in tiny steps. With the exception of the WISARD architecture (which trains very quickly), the training is done as follows:

  1. The first input (training) pattern is presented to the network.
  2. The connections are adjusted a tiny amount to improve the network's chances of recognising that pattern if it sees it again.
  3. The second pattern is presented, and step 2 repeated.
  4. The same thing happens for all the training patterns.
  5. The whole process is repeated with all the training patterns, hundreds or even thousands of times!

The reason the connections are only adjusted a slight amount is to ensure that the network learns to recognise all the patterns fairly well. If the connections were adjusted a great deal after every pattern then the network would be able to recognise the most recently presented pattern brilliantly, and all the others not at all.

Typically, the training routine looks like this:

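{ Repeat the whole training set many times }
for training_run := 1 to 10000 do
  begin
    { Present every training pattern once per run }
    for pattern := 1 to NUMBER_OF_PATTERNS do
      begin
        present_input_pattern(pattern);  { feed this pattern to the net }
        adjust_strengths                 { nudge the strengths a tiny amount }
      end
  end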
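The loop above is Pascal-style and leaves present_input_pattern and adjust_strengths undefined. Here is a minimal runnable sketch in Python, assuming the simplest possible net (a single cell with two inputs and a sigmoid output) and using the delta rule to adjust the strengths. The delta rule is one common choice, swapped in here for illustration; the OR training set, learning rate and number of runs are all assumptions.

```python
import math
import random

# Hypothetical labelled training set: (inputs, desired output) for logical OR
PATTERNS = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
LEARNING_RATE = 0.5    # size of each small adjustment (assumed)
TRAINING_RUNS = 2000   # passes through the whole training set (assumed)


def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))


def present_input_pattern(inputs, strengths, bias):
    """Output of the single cell for one input pattern."""
    return sigmoid(bias + sum(x * w for x, w in zip(inputs, strengths)))


def train(seed=0):
    rng = random.Random(seed)
    # Connection strengths start off small and random
    strengths = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(TRAINING_RUNS):
        for inputs, target in PATTERNS:
            out = present_input_pattern(inputs, strengths, bias)
            # adjust_strengths: delta rule, a small step downhill on the
            # squared error (target - out)**2 for this one pattern
            delta = (target - out) * out * (1.0 - out)
            strengths = [w + LEARNING_RATE * delta * x
                         for w, x in zip(strengths, inputs)]
            bias += LEARNING_RATE * delta
    return strengths, bias


strengths, bias = train()
for inputs, target in PATTERNS:
    out = present_input_pattern(inputs, strengths, bias)
    print(inputs, "target", target, "output", round(out, 2))
```

Each pattern only nudges the strengths by a fraction of LEARNING_RATE, so no single pattern dominates; this is the reason given above for adjusting the connections in tiny steps.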

Supervised and unsupervised training

Most neural nets are trained using supervised training. This means that they are presented with input patterns and the corresponding desired output patterns, i.e. they are not only told what the input pattern is but what output they are supposed to produce when they are given that input pattern. For instance, a network might be trained to recognise whether a picture presented to it was of a man or a woman (like SEXNET by Terry Sejnowski). The network might be trained to produce a 1 output for a man, a 0 output for a woman (or vice-versa). The training would consist of a large number of pictures presented repeatedly (see above) and each accompanied by a 1 or 0 as appropriate.

Supervised training is all very well, but we computer scientists have better things to do with our time than spoon-feed stupid neural networks with input all day! All right, I admit, it's automated, but there are still some situations where providing input with corresponding output data (so-called "labelled data") is difficult or inappropriate. For this reason, it would be nice if the neural nets could work out for themselves what the output should be.

This is unsupervised training. The networks are given input, but aren't told what they are supposed to produce for output. Instead, they learn to organise it for themselves into sensible regions of their "output space". The standard network for this sort of thing is the Self-Organising Feature Map (SOFM) by Teuvo Kohonen (Helsinki University). It organises its connections so that similar input patterns produce similar patterns of activity in the net, and in this way it learns to group similar input patterns together. Of course, it can't put names to those patterns, but it does create some sort of order out of the chaos of the different input patterns fed to it.
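Here is a minimal sketch of the idea behind Kohonen's map. Real SOFMs are usually 2-D grids of cells fed with vectors; to keep things short, this version assumes a 1-D row of five cells, single numbers as input, a Gaussian neighbourhood, and a learning rate and neighbourhood size that shrink in a straight line. These are illustrative choices, not Kohonen's exact recipe, and the names are hypothetical. Note that no desired outputs are supplied anywhere:

```python
import math


def best_matching_cell(weights, x):
    """Index of the map cell whose weight is closest to the input x."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))


def train_sofm(inputs, n_cells=5, epochs=100, lr0=0.5, radius0=2.0):
    """Train a 1-D map on unlabelled scalar inputs (illustrative settings)."""
    # Assumption: start from a gentle ramp instead of random values,
    # so that every run gives the same result.
    weights = [0.4 + 0.05 * i for i in range(n_cells)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac) + 0.01         # learning rate shrinks
        radius = radius0 * (1.0 - frac) + 0.1  # neighbourhood shrinks
        for x in inputs:
            winner = best_matching_cell(weights, x)
            for i in range(n_cells):
                # Cells close to the winner along the map move towards x too
                d = i - winner
                pull = math.exp(-d * d / (2.0 * radius * radius))
                weights[i] += lr * pull * (x - weights[i])
    return weights


# Two hypothetical groups of inputs, with no labels attached
data = [0.1, 0.12, 0.88, 0.9]
weights = train_sofm(data)
for x in data:
    print(x, "-> cell", best_matching_cell(weights, x))
```

After training, the two groups should be won by different cells whose weights sit close to each group, even though the map was never told which inputs belong together.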

Representing Input

It is important to the success of a neural network that the input signals fed to it are in an appropriate form. Some thought has to be given as to whether the input can be "coded", i.e. transformed from the form in which it appears in the outside world into some more logical form in which patterns may be more obvious. Here are two commonly used examples:

I am often asked how a neural network can be used to recognise handwritten text. Since handwritten text takes the form (usually) of dark marks on white paper, the most obvious way of presenting it to a neural net is as a rectangular grid of pixels. The two grids below illustrate this for two versions of a handwritten letter "a".

To a human eye, these are both clearly letter "a"s, but they translate to different sets of pixels. To some extent this is due to the crude scaling of the pixel elements themselves, but a similar problem might exist in a real system. This suggests that a rectangular grid of pixels isn't the best way of representing handwritten text.
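Here is a small sketch of how a grid is flattened into the 0/1 input signals a net would receive, assuming # marks ink and . marks paper. The two grids are hypothetical: they show the same little loop, the second shifted one pixel to the right, yet they share only 2 of their 8 inked pixels:

```python
def to_input_vector(grid):
    """Flatten a grid ('#' = ink, '.' = paper) row by row into 1s and 0s."""
    return [1 if ch == "#" else 0 for row in grid for ch in row]


def shared_ink(a, b):
    """How many inked pixels two grids have in common."""
    return sum(x & y for x, y in zip(to_input_vector(a), to_input_vector(b)))


# Hypothetical 5x4 grids: the same small loop, the second shifted right
SMALL_O = [".##..",
           "#..#.",
           "#..#.",
           ".##.."]
SHIFTED_O = ["..##.",
             ".#..#",
             ".#..#",
             "..##."]

print(sum(to_input_vector(SMALL_O)), shared_ink(SMALL_O, SHIFTED_O))  # 8 2
```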

All right, then, what could we use? Well, let's think about what handwriting is for a moment. You press the pen nib to the paper and then move it continuously, tracing out a path. At various points, say at the end of a word or at the end of some letters, you have to lift the nib and put it down again in a nearby position, usually slightly to the right of where you lifted it (if you are writing the Western alphabet; it would be different for writers of Arabic or Chinese, which are not always written from left to right).

A better way of encoding this is as a series of vectors, i.e. numbers describing the movement of the nib, specifically which direction it is going in at any time. Wherever the nib is at any point (i.e. whichever pixel it occupies), it can move in one of 8 directions, as shown.

I have labelled each vector with a number from 1 to 8, to which we can add a vector labelled 0 to indicate that the line stops (i.e. the nib is lifted from the paper). Here is the first letter "a" above marked with these vectors.
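The numbering in this sketch is an assumption and may not match the numbering in the diagram: 1 means up, the numbers go clockwise round to 8 (up-left), and 0 marks a pen lift. Strokes are given as lists of neighbouring pixel coordinates with y pointing up; the names are hypothetical:

```python
# Assumed numbering: 1 = up, then clockwise round to 8 = up-left; 0 = pen lift.
DIRECTION_CODES = {
    (0, 1): 1, (1, 1): 2, (1, 0): 3, (1, -1): 4,
    (0, -1): 5, (-1, -1): 6, (-1, 0): 7, (-1, 1): 8,
}


def encode_strokes(strokes):
    """Turn pen strokes (lists of (x, y) pixels, y pointing up) into digits."""
    digits = []
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            step = (x1 - x0, y1 - y0)
            if step not in DIRECTION_CODES:
                raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not neighbours")
            digits.append(str(DIRECTION_CODES[step]))
        digits.append("0")  # the nib is lifted at the end of every stroke
    return "".join(digits)


print(encode_strokes([[(0, 0), (1, 1), (2, 1), (2, 0)]]))  # 2350
```

Because only the moves are recorded, the same shape drawn from a different starting pixel gives the same digits, which is why the starting position has to be supplied separately.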

Now the two letters can be translated into a series of numbers:

Left "a" : 28886876665444242232666560
Right "a" : 1687665422322666664220

Now features start to appear. I have picked out the runs of digits common to both strings: 687 represents the top-left curve of the "a", 6654 the bottom-left curve, 223 the rising line just after the bottom of the letter, and 2666 the sharp spike on the right together with its down-stroke. The fact that the two strings are different lengths (26 and 22 digits) is a problem, and the neural net also needs to know in which pixel the vector sequence is supposed to start, but at least the representation of the letter is improved.
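Finding the shared runs of digits can itself be automated. This hypothetical helper collects every run of a given length that occurs in both strings; it is a plain substring comparison, not a technique taken from the text:

```python
def common_runs(a, b, length):
    """Every run of `length` consecutive digits found in both strings."""
    runs_a = {a[i:i + length] for i in range(len(a) - length + 1)}
    runs_b = {b[i:i + length] for i in range(len(b) - length + 1)}
    return sorted(runs_a & runs_b)


LEFT_A = "28886876665444242232666560"
RIGHT_A = "1687665422322666664220"

print(len(LEFT_A), len(RIGHT_A))        # 26 22: different lengths
print(common_runs(LEFT_A, RIGHT_A, 4))  # includes '2666' and '6654'
```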

The second example is simpler, but only if you can read music! A musician will tell you at a glance that the two snippets shown below basically represent the same tune - it is the first couple of bars of "Baa, Baa, Black Sheep", in fact!

The only difference between the snippets is that the second one has been moved up one semitone, i.e. it is in D flat, rather than in C. However, this transposition means that the two versions only have one note in common!

The secret in this case is to represent the tune as a series of note changes, i.e. how many semitones each note is above or below the previous note. If the tune goes up by one semitone, this is represented by +1; a drop of two semitones is -2, and so on. Together with a second list giving the length of each note (the rhythm), each version of the tune can then be represented by two lists, which are identical for both versions.

Pitch difference (semitones):      0 +7  0 +2 +2 +1 -3 -2
Rhythm (length in quavers):     2  2  2  2  1  1  1  1  4

(There is no pitch difference for the first note as there is no previous note to which to compare it.)

This same pattern would represent this tune regardless of its key.
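As a check, here is a small sketch that writes notes as MIDI note numbers (60 = middle C, one number per semitone), assuming the version in C starts on middle C. The note names C C G G A B C A G are an assumed reading of the tune, consistent with the pitch differences above; the names in the code are hypothetical:

```python
def pitch_differences(notes):
    """Semitone change from each note to the next (MIDI note numbers)."""
    return [b - a for a, b in zip(notes, notes[1:])]


# Assumed opening of "Baa, Baa, Black Sheep", starting on middle C (60)
IN_C = [60, 60, 67, 67, 69, 71, 72, 69, 67]   # C C G G A B C A G
IN_D_FLAT = [n + 1 for n in IN_C]             # same tune, one semitone up
RHYTHM = [2, 2, 2, 2, 1, 1, 1, 1, 4]          # note lengths from the text

print(pitch_differences(IN_C))       # [0, 7, 0, 2, 2, 1, -3, -2]
print(pitch_differences(IN_D_FLAT))  # [0, 7, 0, 2, 2, 1, -3, -2]
print(set(IN_C) & set(IN_D_FLAT))    # {72}: only one pitch in common
```

The rhythm list needs no conversion at all, since transposing a tune leaves the note lengths unchanged.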

