Data Mining - An example of Hebbian Links

This section describes some work that I did with a colleague at university to use simple Hebbian links to produce some useful results in the field of data mining. We had hoped that this work would lead to an academic paper but it was not to be, so you will probably be the only people who find out about it.

What is Data Mining?

Data mining is the process of deriving simple rules that describe (some of) the behaviour of large amounts of apparently chaotic data. Imagine, if you will, a situation where we want to discover which combinations of symptoms were typical for any particular disease. This is a data mining problem: gather a large amount of case data, describing the symptoms displyed by hundreds of patients together with the diagnosis produced by the doctors for each of the diseases. Then look for patterns in the data that link the diseases to the symptoms.

The example that my colleague was using was that of deriving connections in web pages. A computer program would trawl thousands of web sites looking for keywords such as "Microsoft" and "system analyst", and would try to find connections between them. For instance, if 90% of the web pages that mentioned the keyword "cirrhosis" also mentioned the word "alcohol", then the system would conclude that there was a connection of some sort between alcohol and cirrhosis.

The problem was that his symbolic rule-generation system wasn't working very well. That was when I stopped him in a corridor and suggested Hebbian Links.

Data Mining using Hebbian Links

Now, it won't have escaped your attention that Hebbian Links are a trained neural architecture. This means that we present it with an input pattern together with an output pattern, as shown in the diagram on the right. However, in the case of data mining from web pages, we don't have data in the form of input and output patterns, merely patterns of which words are present and which ones aren't.

So we solve this problem by using the Hebbian Links as an auto associator. This means that the same pattern is presented both as the input and as the output to the Hebbian Links and they learn (hopefully) to associate the pattern with itself, as shown in the diagram on the right.

You will notice in that second diagram that there is no connection between each input node and the corresponding output node (i.e. there is no connection between input 1 and output 1, between input 2 and output 2 etc.) It should be pretty obvious why this is the case: Whenever the same pattern is present on both input and output, there will always be a correlation between each input bit and the corresponding output bit, and this would lead to a very strong connection between them if the network weren't restricted. We are looking for more subtle connections, so I have disabled any connections between corresponding input and output bits.

Experiment 1

For initial tests, an artificial data set was created as follows: 1000 records of 10 binary data bits each were created. The values of the bits were set at random, with equal probabilities of being set or cleared, apart from the following possible combinations: If bits 1, 2 and 3 were all set, there was a 90% chance that bit 4 was set. If bits 5, 7 and 9 were set, there was a 70% chance that bit 10 was set, and if bits 2 and 6 were set, there was a 50% chance that bit 8 was set.

Two tests were carried out. A two-layer Hebbian net with 10 inputs and outputs was initialised as outlined above and then trained using LTP with post-not-pre LTD. Another similar Hebbian net was initialised and trained using LTP with pre-not-post LTD. The training involved presenting all 1000 records in turn and repeating the sequence 10 times. After training, the strengths of the weights were examined and are recorded in Tables 1 and 2 below. (All software was written in Turbo Pascal and run on a standard personal computer).

Table 1. Weights for the Hebbian net trained using LTP with post-not-pre LTD. Red type indicates weights that correspond to pre-programmed rules.
 

Output node

   

1

2

3

4

5

6

7

8

9

10

Input node

1

0 -3.75 -4.444 12.779 -1.174 -1.964 5.267 -0.415 -0.047 -2.605

2

-0.954 0 6.174 15.566 2.974 2.056 7.564 8.021 0.865 4.611

3

-4.382 3.54 0 12.372 3.873 -10.997 -4.154 4.757 5.643 -2.859

4

24.414 24.361 24.286 0 17.623 10.926 14.85 8.528 5.493 18.893

5

-0.972 0.441 4.216 5.88 0 -1.706 2.454 2.52 -4.53 4.558

6

2.482 4.106 -6.259 3.844 2.756 0 11.886 7.302 -0.015 10.053

7

10.466 9.958 1.229 8.626 7.563 12.428 0 4.028 6.964 8.058

8

-2.568 3.471 2.873 -5.182 0.331 0.447 -3.152 0 -1.525 -2.49

9

-0.423 -1.927 5.156 -6.536 -4.977 -5.057 1.469 0.269 0 11.378

10

5.313 10.387 5.516 15.601 13.037 13.916 11.553 8.03 19.916 0

Table 2. Weights for the Hebbian net trained using LTP with pre-not-post LTD.

 

Output node

   

1

2

3

4

5

6

7

8

9

10

Input node

1

0 -1.25 -4.694 24.279 -1.174 2.536 10.267 -2.665 -0.547 5.645

2

-3.454 0 3.424 24.566 0.474 4.056 10.064 3.271 -2.135 10.361

3

-4.132 6.29 0 24.122 4.123 -6.247 1.096 2.757 5.393 5.641

4

12.914 15.361 12.536 0 6.123 3.926 8.35 -5.222 -6.507 15.643

5

-0.972 2.941 3.966 17.38 0 2.794 7.454 0.27 -5.03 12.808

6

-2.018 2.106 -11.009 10.844 -1.744 0 12.386 0.552 -5.015 13.803

7

5.466 7.458 -4.021 15.126 2.563 11.928 0 -3.222 1.464 11.308

8

-0.318 8.221 4.873 8.568 2.581 7.197 4.098 0 0.225 8.01

9

0.077 1.073 5.406 5.464 -4.477 -0.057 6.969 -1.481 0 20.128

10

-2.937 4.637 -2.984 18.851 4.787 10.166 8.303 -2.47 11.166 0

The entries in the tables in red type correspond to the weights that should be high in order to encode the rules specified. Since there is no inherent direction in the connections, (the algorithm does not specify which of the patterns is input and which is output), the weight matrix should be approximately symmetrical about the major diagonal (with variations due to the initial random setting of the weights).

Conclusions

If the encoding of the rules were perfect then only the entries in red type would be high. The others would be low or even negative. Unfortunately, both training types seem to produce erroneously high figures and some figures which should be high are actually low. An examination of the records indicate that the erroneously high figures correspond to inadvertent correlations between data bits. It might be argued that these correlations are unintentional rules. However, this doesn't explain why some of the figures which should be high are actually low. The only explanation that I can offer is that some records contain bit patterns for these inputs that tend to increase the weight and that others contain bit patterns that tend to decrease them. However, this in itself wouldn't explain why the weight for wxy is low whereas wyx is relatively high. I suspect that the discrepancy indicates a basic limitation of this particular technique.

One problem that can be perceived with this experiment is that different rules that affect a particular data bit cannot be represented as separate items. Consider the following rules Bit 1, bit 3, bit 4 ® bit 10 and Bit 2, bit 5 ® bit 10. These would be represented by the network as high weight values from bits 1, 2, 3, 4 and 5 to bit 10, which would be indistinguishable from a rule Bit 1, bit 2, bit 3, bit 4, bit 5 ® bit 10.

This experiment has shown that the notion of a connectionist architecture being able to perform data mining is not an impossible one. Clearly, although rules can be represented using a simple two-layer Hebbian network, it suffers from the following limitations (among others):


Go back