
Curt Welch

I didn't know you were working on that. I have HUGE interest in reinforcement learning in spiking networks. I've been playing with that for around 10 years now. Nothing near 10 years of work, but it's the only approach I've been looking at over the past 10 years when I have time to play with it. I'll look at what you have done...

Since the '80s I've played around with trying to create neural networks that can be trained by reinforcement, as a path toward creating general human intelligence. I still believe that the correct, and only, path to making machines act like truly intelligent humans is to create a reinforcement learning algorithm that works in the high-dimensional sensor and effector spaces we find in the real world. Most of my work was done with fairly traditional-looking neural networks: nodes had three inputs and one output, and most carried binary signals, so each input and output was a simple 1/0 value. The difference from standard neural networks is that I was only interested in reward-based learning algorithms to train them, instead of backprop or other such approaches, because I've always believed that only reward-based training will lead us to truly human-like behavior.

Though I always just thought of this as AI, this sort of work now goes under the name AGI (Artificial General Intelligence). This search has been a long-standing passion of mine that I pick up and play with from time to time.

My earlier networks were all standard synchronous networks: the entire network was given an input vector with a 1/0 value for each input, then the outputs for each node were calculated layer by layer until the output values were produced. I used these simple networks to drive the behavior of trivially simple agents moving in simple environments, like a 2D grid world with a task such as trying to find food. The network was given rewards when the agent found food. I played with endless algorithm variations of how these sorts of networks could perform the network calculations and how they would learn/change from the reward signal.
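A minimal Python sketch of that kind of synchronous binary network: the 3-inputs-per-node structure and 1/0 signals follow the description above, but the random wiring, the threshold rule, and all the names are illustrative assumptions, not the original code:

```python
import random

def make_network(n_layers, n_nodes):
    # each node: 3 input taps into the previous layer, plus 3 weights
    # that a reward-based rule would adjust (illustrative placeholder)
    return [[([random.randrange(n_nodes) for _ in range(3)],
              [random.uniform(-1.0, 1.0) for _ in range(3)])
             for _ in range(n_nodes)]
            for _ in range(n_layers)]

def step(network, inputs):
    # one synchronous clock tick: propagate a 1/0 input vector
    # layer by layer until the output values are produced
    signal = inputs
    for layer in network:
        signal = [1 if sum(w * signal[i] for i, w in zip(taps, weights)) > 0 else 0
                  for taps, weights in layer]
    return signal  # binary output vector driving the agent's actions

net = make_network(n_layers=3, n_nodes=8)
actions = step(net, [random.randint(0, 1) for _ in range(8)])
```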

But, roughly 10 years ago, I started to think more about the temporal issues of the problem and about the concept of asynchronous spiking neural networks. I've since developed a real attraction to using asynchronous spiking networks instead of binary signal formats. Though it works to emulate the concept of a spiking network with binary inputs, where 0 means no spike and 1 means a spike, this becomes very computationally expensive: the network ends up having to process millions of zeros and only a few 1 values. So I've switched to a very different style of implementing the networks, where each spike shows up as an input with a timestamp, and no two spikes are ever allowed to carry the same timestamp. This changes how the network is coded, so that only one input spike at a time is dealt with as it enters the network.
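Concretely, the event-driven format might look like this sketch, where each spike is just a (timestamp, input id) pair pulled off a priority queue one at a time; the tie-breaking nudge that keeps timestamps unique is an illustrative assumption:

```python
import heapq

events = []  # pending spikes as (timestamp, input_id) pairs

def inject_spike(t, input_id):
    # nudge the timestamp until it is unique, so no two spikes
    # ever share a timestamp and the ordering is total
    while any(abs(t - te) < 1e-9 for te, _ in events):
        t += 1e-6
    heapq.heappush(events, (t, input_id))

def run(process_spike):
    # the network deals with exactly one input spike at a time,
    # in timestamp order, instead of scanning dense 0/1 vectors
    while events:
        t, input_id = heapq.heappop(events)
        process_spike(t, input_id)
```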

For other reasons, I've also decided to build networks where a spike can never fork into two spikes, or die out. So every spike that enters the neural network travels through it, from node to node, and exits out of one node in the final layer. Each layer of the network therefore always has one, and only one, neuron that "fires" to pass the spike on to the next layer.

So, instead of writing code that processes 100 input values propagating through 100 layers, taking on the order of 100x100x3 computations per clock cycle, at maybe 1,000 cycles per second, for 30,000,000 computations per second, my new style of networks are what I call "pulse sorting" networks. Even with 100 inputs, the average pulse rate per input could be 10 per second, even though the temporal resolution of the timestamps could be much finer than 1/1000 of a second. So the network may have only 1,000 pulses per second to process, and each pulse needs only about 100x3 computations to pass through the 100 layers. This ends up taking only 300,000 computations per second, reducing the computational cost by a factor of about 100. Which means the same CPU power can process a network about 100 times larger (nodes/synapses) using this sort of pulse coding. The savings are even greater if the pulse density of the inputs is lower.
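As a sketch of that per-pulse cost, routing one spike through a pulse-sorting network might look like this; the argmax routing rule is just a stand-in for whatever selection rule the layers actually learn, and the names are hypothetical:

```python
def route_pulse(network, input_id, timestamp):
    # network: list of layers; each layer maps an incoming node id
    # to its fan-out choices as (next_node_id, weight) pairs
    node = input_id
    for layer in network:
        # exactly one outgoing path is chosen, so the pulse never
        # forks or dies: ~3 operations per layer, not a full layer pass
        node = max(layer[node], key=lambda choice: choice[1])[0]
    return node, timestamp  # the single output node that fires
```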

The difference is that most of the data is encoded in the arrival timestamp of each pulse, and the restriction that pulses can never "fork" creates a sparse encoding that greatly reduces the computational cost. Another way to understand it is that the data is encoded in the temporal spacing of the spikes, not in the "values" of the inputs, which is conceptually how traditional artificial neural networks are coded.

I've not attempted to create accurate simulations of real neurons in any of these networks. I've only borrowed the conceptual idea of asynchronous spiking networks, and experimented with different sorts of algorithms.

But, with this approach, the action of a node becomes one of deciding which path to send a spike down, rather than whether to "fire" or "not fire". This path selection is what is ultimately trained by reinforcement learning in the style of networks I've been playing with.
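A sketch of what reward-trained path selection could look like at a single node; the stochastic choice, the eligibility trace, and the learning rate are all illustrative assumptions rather than the specific update rule described here:

```python
import random

class RoutingNode:
    def __init__(self, n_paths):
        self.weights = [1.0] * n_paths   # preference for each outgoing path
        self.trace = [0.0] * n_paths     # which paths recently carried spikes

    def route(self):
        # pick one output path with probability proportional to its weight
        r, acc = random.uniform(0, sum(self.weights)), 0.0
        for path, w in enumerate(self.weights):
            acc += w
            if r <= acc:
                self.trace[path] += 1.0
                return path
        return len(self.weights) - 1

    def reward(self, r, lr=0.01):
        # a global reward strengthens (or weakens) recently used paths
        for p in range(len(self.weights)):
            self.weights[p] = max(0.01, self.weights[p] + lr * r * self.trace[p])
            self.trace[p] *= 0.5  # decay the eligibility trace
```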

Real-world behavior problems are very temporal in nature. That is, it's more about deciding when to perform an action than just what action to perform. This spiking signal format, which encodes most of its information in the time domain, seems to lead more naturally to networks that are inherently time-based in their actions, and seems to be a good fit for real-world problems, whereas traditional value-calculating neural networks are timeless, and external memory has to be added to give them the power to make time-based decisions.
