
Artificial Neural Networks - an early AI stumble

lpetrich

AI stumble? Let us look at some history.

An early version of artificial neural networks was the "perceptron". It takes some inputs, forms a weighted linear combination of them, then passes that combination through some output function, such as a step function or a smoothed version like the logistic function. But there are many problems that a perceptron cannot solve, including many very simple ones.
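To make that concrete, here is a minimal sketch of a perceptron in Python/NumPy (my own illustration, not from any particular library; the OR-gate weights at the end are just an example):

[CODE]
import numpy as np

def perceptron(x, w, b):
    # Weighted linear combination of the inputs, then a hard step output
    return 1.0 if np.dot(w, x) + b > 0 else 0.0

def logistic_perceptron(x, w, b):
    # Same combination, but with a smoothed (logistic) output function
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

# Example: weights that make the perceptron compute OR of two binary inputs
w, b = np.array([1.0, 1.0]), -0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x, dtype=float), w, b))   # 0, 1, 1, 1
[/CODE]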

Here is a very simple problem: exclusive or, XOR. Here is its operation table, where z = x XOR y:
[TABLE="class: grid"]
[TR][TD]x[/TD][TD]y[/TD][TD]z[/TD][/TR]
[TR][TD]0[/TD][TD]0[/TD][TD]0[/TD][/TR]
[TR][TD]0[/TD][TD]1[/TD][TD]1[/TD][/TR]
[TR][TD]1[/TD][TD]0[/TD][TD]1[/TD][/TR]
[TR][TD]1[/TD][TD]1[/TD][TD]0[/TD][/TR]
[/TABLE]
Or graphically, with # marking the input pairs where z = 1:
[table="class: grid"]
[tr][td][/td][td]#[/td][/tr]
[tr][td]#[/td][td][/td][/tr]
[/table]
It's easy to show that there is no way a perceptron can do XOR: a single perceptron can only separate its inputs with a straight line (a hyperplane in general), and no line separates the two marked cells above from the two empty ones. That was the conclusion reached by AI researchers Marvin Minsky and Seymour Papert in their 1969 book Perceptrons. Systems of cascaded perceptrons could get around that problem, but it was some years before anyone discovered how to train them without consuming a grotesquely large amount of computer time. Remember that this was the 1970s. That discovery led to a lot of interest in cascaded perceptrons, more usually called artificial neural networks - Artificial neural network. In fact, perceptrons are nowadays usually called "neurons".

Cascaded perceptrons proved capable of doing what single perceptrons were unable to do, like the XOR problem. One needs two "hidden units" that receive the inputs, and one "output unit" that works from the outputs of the hidden units. I will give two solutions for the XOR problem. H(x) is the step function: 1 if x > 0 and 0 otherwise.

hid1 = H(x+y-1/2)
hid2 = H(x+y-3/2)
out = hid1 - hid2

hid1 = H(x-y-1/2)
hid2 = H(-x+y-1/2)
out = hid1 + hid2

I use a purely linear perceptron as the output unit, but it could be semilinear like the hidden ones here.
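As a check, here is a small Python sketch (my own, purely illustrative) that evaluates both solutions on all four input pairs:

[CODE]
def H(x):
    # Step function: 1 if x > 0, else 0
    return 1 if x > 0 else 0

def xor_net_1(x, y):
    # First solution: both hidden units see x + y; the output subtracts them
    hid1 = H(x + y - 0.5)
    hid2 = H(x + y - 1.5)
    return hid1 - hid2

def xor_net_2(x, y):
    # Second solution: each hidden unit detects one of the two "1" cases
    hid1 = H(x - y - 0.5)
    hid2 = H(-x + y - 0.5)
    return hid1 + hid2

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, xor_net_1(x, y), xor_net_2(x, y))
# Both columns come out 0, 1, 1, 0, matching the XOR table above
[/CODE]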

Artificial neural networks have seen a lot of development since the 1980s, with several types of perceptrons/neurons and connection architectures having come into use.

The original kind of training for ANNs is something called backpropagation, or "backprop". It is essentially gradient descent, and one can use various elaborations that speed up training, like trying to go as fast as one can without overshooting, as well as function-optimization algorithms like conjugate-gradient (CG) and quasi-Newton (QN) methods. I once had to implement an ANN in Matlab, and I found that backprop was very slow. So I looked for function-optimization routines, and I found CG and QN ones. The QN one was faster, but it needs O(n^2) memory for n variables; the CG one needs only O(n) memory, which makes it better suited for large ANNs.
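Here is a minimal sketch of plain backprop as gradient descent on the XOR set, written in Python/NumPy rather than Matlab (the 2-2-1 layout, learning rate, random seed, and iteration count are my own arbitrary choices; with only two hidden units a run can occasionally stall in a poor local minimum):

[CODE]
import numpy as np

rng = np.random.default_rng(0)

# XOR training set: inputs X and targets T
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-2-1 network: hidden-layer and output-layer weights and biases
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)

lr = 1.0
for step in range(10000):
    # Forward pass
    Hid = sigmoid(X @ W1 + b1)            # hidden activations, shape (4, 2)
    Y = sigmoid(Hid @ W2 + b2)            # outputs, shape (4, 1)

    # Backward pass: deltas for a squared-error loss
    dY = (Y - T) * Y * (1 - Y)            # output-layer delta
    dHid = (dY @ W2.T) * Hid * (1 - Hid)  # hidden-layer delta

    # Plain gradient-descent step -- the essence of backprop
    W2 -= lr * Hid.T @ dY;  b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dHid;  b1 -= lr * dHid.sum(axis=0)

print(Y.round(2).ravel())  # typically close to [0, 1, 1, 0]
[/CODE]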

But fancy forms of backprop are still used, notably for big training sets, where one presents randomly chosen selections from the set at each step. Algorithms like CG and QN require presenting the entire set for each iteration.
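For contrast, here is a skeleton of the two styles of data presentation (grad_fn and update_fn are hypothetical callables standing in for the gradient computation and the weight update; only the sampling differs):

[CODE]
import numpy as np

rng = np.random.default_rng(0)

def train_stochastic(X, T, grad_fn, update_fn, steps, batch_size):
    # Backprop-style presentation: each step uses a random selection of the data
    n = len(X)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        update_fn(grad_fn(X[idx], T[idx]))

def train_full_batch(X, T, grad_fn, update_fn, steps):
    # CG/QN-style presentation: every iteration sees the entire training set
    for _ in range(steps):
        update_fn(grad_fn(X, T))
[/CODE]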
 