background on models
This work grew out of a lot of tiny ideas forming over the years since my initial Cand.Scient. thesis in 1994. It was about “Nevromorfe prosessorer for akustisk lokalisering” (“Neuromorphic processors for acoustic localization”), and most of it isn't really of much interest in this context. Some of it touched on adaptive beamforming and neural networks, and some of that on Herault-Jutten networks. Some of those ideas kept haunting me for years, until they finally snapped into place. It is possible to use decorrelating (principal component analysis – PCA) or separating (independent component analysis – ICA) networks to make adjustments between neural networks and avoid excessive retraining. In other words, a traditional neural network does not have to learn all possible poses of an object, as long as the decorrelating or separating network can normalize the pose.
This may not seem that important, but it means the outputs from one network can be normalized before they reach another, and this is done as a black box with a strictly feed-forward process. Inside the black box there is some feedback, but it does not extend outside the box. It does this without knowing anything about the actual layout; the only requirement is independent signals. Whether a signal vector is composed of a, b, and c, or of b, c, and a does not matter in our case – it is just a rotation of a cube.
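As a rough illustration of why the ordering of the signals does not matter, here is a minimal sketch in Python. It assumes plain batch PCA with whitening on two signals, instead of an adaptive decorrelating network, and the names `pca_2d`, `signals` and `swapped` are hypothetical. Feeding the same signals in the order (x, y) or (y, x) gives the same normalized outputs, up to the sign of each component, which PCA by itself cannot fix.

```python
import math
import random


def pca_2d(samples):
    """Rotate 2-D samples onto their principal axes and whiten them.

    Returns (z1, z2) pairs where z1 lies along the direction of largest
    variance and both components have unit variance.
    """
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / n
    syy = sum((y - my) ** 2 for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples) / n
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * (x - mx) + s * (y - my), -s * (x - mx) + c * (y - my))
               for x, y in samples]
    # Whitening: scale each principal component to unit variance.
    v1 = sum(p * p for p, _ in rotated) / n
    v2 = sum(q * q for _, q in rotated) / n
    return [(p / math.sqrt(v1), q / math.sqrt(v2)) for p, q in rotated]


rng = random.Random(0)
a = [3.0 * rng.gauss(0.0, 1.0) for _ in range(500)]      # strong source
b = [rng.gauss(0.0, 1.0) for _ in range(500)]            # weak source
signals = [(ai + 0.5 * bi, bi) for ai, bi in zip(a, b)]  # correlated observations
swapped = [(y, x) for x, y in signals]                   # same signals, other order

z = pca_2d(signals)
z_swapped = pca_2d(swapped)
same = all(abs(abs(p1) - abs(q1)) < 1e-9 and abs(abs(p2) - abs(q2)) < 1e-9
           for (p1, p2), (q1, q2) in zip(z, z_swapped))
print(same)  # True
```

A decorrelating network would reach a similar result adaptively, sample by sample; the batch version is used here only because it is short.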
It was not until Geoffrey Hinton published articles about capsule networks in 2017 that I realized this was in fact a pretty important finding. The clean forward propagation is pretty darn hard to get right, but under all the mumbo-jumbo it is a PCA. An ICA, that is, a Herault-Jutten network, would probably fare better. I've been sitting on a solution to this problem for over 20 years, and nature for a good bit longer. So much for sharing ideas.
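For reference, here is a minimal sketch of a two-unit Herault-Jutten network in Python. The setup is assumed for illustration and not taken from the text: two sub-Gaussian (uniform) sources, a hypothetical mixing matrix with unit diagonal, the nonlinearities f(y) = y³ and g(y) = y, and a fixed learning rate `mu`. The two outputs inhibit each other through the weights `w12` and `w21`; this is the feedback kept inside the black box. From the outside, mixtures go in and separated signals come out, and once the weights match the mixing coefficients the outputs equal the sources.

```python
import random


def herault_jutten(mixtures, mu=0.0005):
    """Adapt a two-unit Herault-Jutten network, one sample at a time.

    The outputs inhibit each other: y = x - W y with W = [[0, w12], [w21, 0]].
    Weights follow dw_ij = mu * f(y_i) * g(y_j), here f(y) = y**3, g(y) = y.
    Returns the final weights and the output sequence.
    """
    w12 = w21 = 0.0
    outputs = []
    for x1, x2 in mixtures:
        # Solve the feedback loop y1 = x1 - w12*y2, y2 = x2 - w21*y1 exactly.
        det = 1.0 - w12 * w21
        y1 = (x1 - w12 * x2) / det
        y2 = (x2 - w21 * x1) / det
        w12 += mu * y1 ** 3 * y2
        w21 += mu * y2 ** 3 * y1
        outputs.append((y1, y2))
    return w12, w21, outputs


rng = random.Random(1)
r3 = 3 ** 0.5
# Two independent uniform sources with zero mean and unit variance.
sources = [(rng.uniform(-r3, r3), rng.uniform(-r3, r3)) for _ in range(40000)]
# Hypothetical mixing: x1 = s1 + 0.3*s2, x2 = 0.2*s1 + s2.
mixtures = [(s1 + 0.3 * s2, 0.2 * s1 + s2) for s1, s2 in sources]

w12, w21, outputs = herault_jutten(mixtures)
print(round(w12, 2), round(w21, 2))  # should end up close to 0.3 and 0.2
```

A standard local-stability argument suggests this choice of f and g suits sub-Gaussian sources; for super-Gaussian sources the roles of f and g are usually swapped.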
In the cerebral cortex there are clear indications that information propagates both sideways within a hemisphere and between hemispheres. If there were no normalization of pose, the whole interconnected network would have to relearn for each new pose. That implies backpropagation from one part of the neocortex to another, which is bad. When the interconnect goes through a normalizing network, long-haul backpropagation can be avoided, and each part can learn in isolation. That is good!
This makes it quite a bit easier to grasp what goes on in the neocortex. When there is no sideways interference, each neuron can be viewed in isolation, and then each microcolumn and even each macrocolumn. That makes it somewhat simpler, but not really simple.
If we neglect sideways connections, we know that neurons within each macrocolumn have nearly identical receptive fields. This means microcolumns within a macrocolumn have nearly identical receptive fields, which in turn means neurons within a microcolumn have nearly identical receptive fields. That should imply they implement the exact same function, but they don't.
The receptive field is the same for all neurons. Because the input is the same, the space encoded by the excitatory connections will collapse into identical outputs. If we add inhibitory cross-connections between the somas in a single layer, this will force the encoded space (the output vector) to open up. If the neurons are inside a microcolumn, they will encode a point in a binary space as a binary output vector, as seen by activation of the axon hillock. If the microcolumns are inside a macrocolumn, and the inhibitory connections reach neighboring microcolumns, the points are strung out in binary space. It isn't easy to imagine a string of points in binary space, but if we imagine the activations as points plotted in some “activation space”, it becomes more obvious that they form paths approximating lines.
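The effect of the inhibitory cross-connections can be sketched with a few binary threshold units in Python. This is a toy model under assumptions made only for this illustration, not a description of real microcolumn wiring: every unit receives the same feed-forward drive (the shared receptive field), each active unit subtracts a fixed amount from the others, and units are updated one at a time in a fixed order, which is what breaks the symmetry. The names `settle`, `inhibition` and `threshold` are hypothetical.

```python
def settle(drive, n=4, inhibition=0.0, threshold=0.5, max_sweeps=10):
    """Let n binary units with a shared input settle under lateral inhibition.

    Every unit sees the same feed-forward drive. Units are updated one at a
    time in a fixed order; unit i fires when the drive minus inhibition from
    the other active units exceeds the threshold.
    """
    active = [0] * n
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            others = sum(active) - active[i]
            new = 1 if drive - inhibition * others > threshold else 0
            if new != active[i]:
                active[i] = new
                changed = True
        if not changed:  # stable state reached
            break
    return tuple(active)


drives = [1.0, 2.0, 3.0, 4.0]
print([settle(d) for d in drives])
# [(1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1)]
print([settle(d, inhibition=1.0) for d in drives])
# [(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]
```

Without inhibition every drive above threshold gives the same all-ones vector, so the output collapses to a single point. With inhibition, increasing the drive walks the output through neighboring points in binary space, a small example of points strung out along a path.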
The neurons inside each layer are a bit different from an artificial node in a neural net. They are typically pyramidal neurons, with one apical dendrite and one or more basal dendrites. The apical dendrite can be quite long. In some sense this can be compared to a residual neural network, but note that each layer has its own external inputs. It isn't just a plain feed-forward path from the molecular layer (layer I) with skip connections to later layers.
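The comparison with a residual network can be made concrete with a scalar toy model in Python. It is only a sketch under assumed choices (a tanh update, weights `w` and `v`, one external input per layer, all hypothetical): a plain residual block adds a nonlinear update to its skip connection, while the layer described here also adds a term driven by its own external input. With all external inputs set to zero, the stack reduces to a plain residual network.

```python
import math


def residual_layer(h, w=0.5):
    """Plain residual block: skip connection plus a nonlinear update."""
    return h + math.tanh(w * h)


def cortical_layer(h, external, w=0.5, v=1.0):
    """Residual-style layer that also receives its own external input."""
    return residual_layer(h, w) + v * external


def run(h0, externals):
    # Pass a signal up through one layer per external input.
    h = h0
    for e in externals:
        h = cortical_layer(h, e)
    return h


plain = residual_layer(residual_layer(residual_layer(1.0)))
print(run(1.0, [0.0, 0.0, 0.0]) == plain)  # True: no external input, plain residual stack
print(run(1.0, [0.0, 0.5, 0.0]) == plain)  # False: a mid-stack input changes the result
```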
Each layer in a macrocolumn implements paths in an activation space, on the surface of a manifold. That manifold has two incarnations, one represented by the apical dendrite and one by the basal dendrite. The apical manifold can be viewed as the target, and the basal as the source. The two will approximate each other, but the source depends more on the sensory input, so it will usually win, although the target has some tricks for being more persistent. In some cases only the target is present, and it will act as a fill-in.
If an approximation of the “world” exists in the target, that is, in the apical dendrite rooted in the molecular layer (layer I), it can be compared with sensory input from the source. The sensory input for this purpose comes from the basal dendrite rooted in the external granular layer (layer II). When both match, the accumulated sum goes high, and the activation signals the match. If the accumulated sum is high, but neither source nor target is sufficient on its own, there may not be a match. For a source to trigger a match, it must be very strong or be supported by the target. In some cases the target could just as well be called the expectation, which makes it somewhat easier to grasp what goes on.
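The match condition can be sketched as a simple threshold rule in Python. The thresholds `both` and `strong`, the [0, 1] scaling and the function name are hypothetical choices for illustration; the text does not give numbers, and the fill-in case, where the target acts alone, is left out. The second example shows the situation where the accumulated sum is as high as in a match, yet neither part suffices on its own.

```python
def match(basal, apical, both=0.5, strong=0.9):
    """Hypothetical match rule for a unit with basal and apical input.

    basal:  drive from the source (sensory input), scaled to [0, 1].
    apical: drive from the target (expectation), scaled to [0, 1].
    A match needs support from both compartments, or a very strong source.
    """
    supported = basal >= both and apical >= both
    return supported or basal >= strong


print(match(0.6, 0.7))    # True: source supported by target
print(match(0.85, 0.45))  # False: same total as above, but neither part suffices
print(match(0.95, 0.0))   # True: very strong source alone
print(match(0.1, 0.95))   # False: target alone does not trigger here
```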
Now, assume there is a mixin coming from long-tailed pyramidal neurons in some other area. It is just a bunch of axons, and neurons in the local column attach wherever they can. Those attaching neurons are a little different; they are called stellate neurons and reside in layer III. Together with short-tailed neurons they form an alternate sensory input. This can then be used to strengthen the previous source, given that the target is the same. A weak local sensory input might then be sufficient, given similar input from neighboring intracortical areas as an alternate source. Some research indicates that they connect to the long-tailed pyramidal neurons in the same layer, other research that they connect to long-tailed pyramidal neurons in the next layer. In the first case they may get away without an apical dendrite by piggy-backing on the activity of the primary neuron; in the second case they may not, and must establish their own target. The different findings may also indicate two different functions; it is not quite clear which interpretation is correct.
A mixin may be viewed as alternate supporting evidence for a specific interpretation within a given context. Previously the local neurons had a specific hypothesis about the world, but now lack the supporting evidence. Instead, they use an alternate input that does supply the evidence, and indicates a positive outcome.
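The mixin can be added to the same kind of threshold rule, again as a sketch in Python. The additive combination, the cap at 1.0, the thresholds and the names are assumptions made for illustration; the text does not say how the alternate input is combined with the local source. The point is only that a weak local source, which fails on its own, can be lifted to a match by the mixin when the target agrees, while the mixin does nothing when the target disagrees.

```python
def match_with_mixin(basal, mixin, apical, both=0.5, strong=0.9):
    """Hypothetical match rule where a mixin can stand in for local evidence.

    basal:  local sensory source, scaled to [0, 1].
    mixin:  alternate source from another area (via layer III), scaled to [0, 1].
    apical: target (expectation), scaled to [0, 1].
    The mixin only helps when the target supports the same interpretation.
    """
    source = min(1.0, basal + mixin)
    supported = source >= both and apical >= both
    return supported or basal >= strong


print(match_with_mixin(0.2, 0.0, 0.8))  # False: weak local evidence alone
print(match_with_mixin(0.2, 0.4, 0.8))  # True: mixin supplies the missing support
print(match_with_mixin(0.2, 0.4, 0.1))  # False: target does not agree
```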