Gupta Paper Explanation Discussion
We have been discussing this paper, sometimes by email, Facebook, and in person. The goal of this page is to keep a record of the conversation so everybody can be part of it.
The file experiment.m runs the demo shown in Gupta's "Character Recognition using Spiking Neural Networks" paper; all the other files are called from there.
I read the paper. However, I don't understand the language or the math that describes the actual operation of the neurons they are using. Do you? I'd have to read more of the papers they reference to figure it out. As I've said, I've never bothered to try and study how real neurons work at the level this work seems to be emulating them.
The paper describes a two-level network, but it's really just a single level, because the first level of neurons is doing nothing other than acting as pulse generators.
Though the results show the network in general learns to make each neuron function as a pattern detector, it's unclear to me whether they are attempting to train specific neurons to learn specific letters, or whether they simply let the network converge on its own, so that neurons end up randomly picking letters to match. I believe the second is what is happening.
When given the full 48 characters to train on, the network mostly converges on a solution with each node configured to recognize a unique input character pattern, with the exception that one node recognizes 3 letters and another locks on to 2 letters, leaving 3 other nodes never used.
To act as a perceptron network, the system needs learning rules that will balance the activity of all neurons. In this example, one neuron was firing 3 times as much as most of the others, and 3 neurons weren't firing at all. To act as a good perceptron network, all output neurons should converge on solutions that allow them all to have a roughly equal amount of activity (and, in so doing, carry a roughly equal amount of the information flowing through the network).
It should not be hard to tweak the learning rules to make it work like that. It only needs a small bias in the system that causes active nodes to adjust their weights to reduce their odds of firing, and nodes that don't fire to adjust their weights in the direction of making them fire. It's a question of correctly balancing the learning so as to balance the OUTPUT activity of each neuron.
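Something along these lines, purely as an illustration (the variable names are hypothetical, not from the project code):

```matlab
% Hypothetical sketch (not from the Gupta paper): a small homeostatic bias that
% scales each output neuron's incoming weights toward a common target firing rate.
% Assumed variables: W (inputs x outputs), spike_counts (inputs x outputs), sim_seconds.
target_rate = 5;                                   % desired spikes/s for every output neuron
eta_homeo   = 0.001;                               % small correction factor
rates       = sum(spike_counts, 1) / sim_seconds;  % measured rate of each output neuron

for j = 1:size(W, 2)
    % Over-active neurons shrink their weights a little; silent neurons grow them.
    W(:, j) = W(:, j) * (1 - eta_homeo * (rates(j) - target_rate));
end
```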
Yes, we understood the same thing.
The math includes some ordinary differential equations to describe the membrane voltage, and STDP (spike-timing-dependent plasticity). That is, if a neuron A causes neuron B to fire, then the synaptic strength is increased.
The nice thing about this simple learning rule is that it leads to spatial coding, which makes neurons end up randomly picking letters to match.
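For reference, this is roughly what a pair-based STDP update looks like, assuming t_pre and t_post hold the last spike times and w the current weight; the amplitudes and time constant here are placeholders, not the values used in the paper:

```matlab
% Pair-based STDP sketch; constants are placeholders, not the paper's values.
A_plus  = 0.01;    % potentiation amplitude
A_minus = 0.012;   % depression amplitude
tau     = 20;      % STDP time constant, ms
w_max   = 1;       % upper bound on the synaptic weight

dt = t_post - t_pre;                  % spike-time difference, ms
if dt > 0
    dw =  A_plus  * exp(-dt / tau);   % pre fires before post: A helped B fire, strengthen
else
    dw = -A_minus * exp( dt / tau);   % post fired before pre: weaken
end
w = min(max(w + dw, 0), w_max);       % keep the weight inside [0, w_max]
```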
I agree that the integrate-and-fire neurons are just used as an interface to study the next layer.
The nice thing about this paper is that it shows that STDP is all you need to do unsupervised learning, so you can train a visual or auditory system.
I have uploaded to the izhikevick folder an example of how you can inject dopamine to modulate STDP. So, if a neuron A causes neuron B to fire, increase its strength only a little bit. But if a reward is found, inject dopamine, which will increase that strength a lot.
You can see, in the .m files in that folder, that you can select one synapse and strengthen only that one by injecting dopamine into the whole network.
Which, in my opinion, is all you need to do reinforcement learning and make AGI.
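To make the dopamine idea concrete, here is a stripped-down sketch in the spirit of Izhikevich's dopamine-modulated STDP; the constants and variable names are illustrative, not taken from the files in the folder:

```matlab
% Dopamine-modulated STDP sketch (illustrative, not the folder's code): STDP writes
% into an eligibility trace c, which only becomes a weight change when dopamine d is present.
tau_c = 1000;    % eligibility-trace decay constant, ms
tau_d = 200;     % dopamine decay constant, ms
dt    = 1;       % simulation step, ms

c = c + stdp_update;          % the STDP result accumulates in the trace, not the weight
c = c * exp(-dt / tau_c);     % the trace slowly fades if no reward arrives
d = d * exp(-dt / tau_d);     % the dopamine level also decays
if reward_found
    d = d + 0.5;              % a reward injects dopamine into the whole network
end
w = w + c * d * dt;           % the weight only grows when trace AND dopamine overlap
```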
Ok, so help me understand the underlying model if you would. Or, if you have a link to a reference that gives an overview of the terminology and model, that would be good. Trying to understand the math when I'm not sure I understand the underlying models is difficult. (I can read the math, but I don't understand what the variables are referring to.)
With a "leaky integrator", I assume each neuron is modeled only as a device that has one (and only one) dynamic state, which is its internal voltage level. Is that internal state for each neuron what the term "membrane voltage" refers to?
So when that voltage exceeds a threshold, the neuron "fires" and sends a pulse to all the downstream connected neurons. Is that threshold just a constant for the entire net, or does the threshold needed to fire change dynamically for some reason?
Is the energy of the pulse, and the shape of the voltage spike that represents the pulse, modeled in any way? Is the propagation delay of the pulse through a 3D network modeled in any way? Or is it just treated as an instantaneous event which affects all downstream neurons as a function of the synaptic weights? I believe the answer is the latter.
So if the voltage in the neurons is "leaky", it must decline over time. How is that modeled? Defined simply with a differential equation, or are they modeling it as current flow through a resistance (or both)?
There is mention in the paper of the voltage being reset to −1 mV after the neuron fires, but that it will then rise (not leak) back up to the "resting potential" value. What is the resting potential value? 0? And how is this effect modeled?
And how does the output spike of one neuron change the voltage of the downstream neurons? How are the synapses and their effect on the downstream neuron modeled? Does the spike drive a finite amount of current into the neuron for a given period? Or does it just magically cause a finite positive jump in the neuron's voltage relative to the synapse strength?
The paper uses terms like "soma potential". What's a soma? What's the "soma potential" referring to? Equation (3) defines the "soma resistance". What resistance is that referring to?
So, equation (4) is in reference to "The other influence to an output neuron comes from the somatic synapses feeding directly or close to the soma", and (4) "governs" the "post-synaptic current". What is that all about? Is this maybe a negative effect that governs lateral inhibition between neurons?
Also, there is a reference to the term "Dirac δ pulses". What's that?
So, since there are about 10 terms being thrown around here whose meaning I'm not sure of, it's hard to follow the math. Could you give me a quick overview of the model and what these terms mean? Or a link to a paper or web site that would give me that overview?
When I look at the level of computational complexity they talk about in the paper, where they are doing a simulation with a cycle time of 0.2 ms, or 5000 steps per second, through the entire network, I know this is massively over-complex, and that we should be able to greatly simplify the simulation to get the same learning results with 1000 times less CPU work.
Simulating networks like this is educational and required for us to gain a good understanding of how the brain works. But once we understand the core principles that allow it to work, we don't need to keep simulating the complexity.
This would be like building an electronics simulator that simulates all the voltage and current flows in a digital AND gate, that would have to run at a million cycles per second, and calculate all 100 or so separate current and voltage flows in the circuit, just to simulate the operation of a binary AND gate.
Once we understand the operation of the AND gate, we can stop simulating the complex ways the voltages and currents change over time, and just do the AND operation with 1 and 0 values, saving massive amounts of wasted CPU work.
The same will be true for these neural networks. Once we understand why it works, we will be able to greatly simplify the computational work needed to get the same learning effects.
Once I understand the simulation, I'll be able to demonstrate exactly what I'm talking about by writing code that is 1000 times faster, but still able to learn the same way.
You have asked a lot of questions that I don't really have answers to. I will try to answer to the best of my knowledge.
"The first layer consists of simple leaky integrate and fire neurons." Wikipedia explains this very well (http://en.wikipedia.org/wiki/Biological_neuron_model#Integrate-and-fire), and there is also a paper that shows how to simulate it in Python. The membrane voltage Vm(t) is the electrical potential difference between the cell and its surroundings; it is observed to sometimes produce a voltage spike called an action potential, which travels the length of the cell and triggers the release of further neurotransmitters. That voltage, Vm(t), is the quantity of interest, and it is what "membrane voltage" refers to.
The spiking events are not explicitly modeled in the LIF model. Instead, when the membrane potential Vm(t) reaches a certain threshold (the spiking threshold), it is instantaneously reset to a lower value Vr (the reset potential) and the leaky integration process starts again. The threshold is a constant for the whole net (http://en.wikipedia.org/wiki/Threshold_potential). I believe that in this paper the threshold is a constant 10 mV for the whole net.
You are right, a spike is just treated as an instantaneous event which affects all downstream neurons as a function of the synaptic weights. There are also models that consider the shape of the neuron (see Wikipedia) that we can experiment with.
The voltage of the neuron is "leaky" because, if it weren't, a below-threshold signal received at some time would be retained as a voltage boost forever, until the neuron fired again. That is clearly not in line with observed neuronal behavior. In the leaky integrate-and-fire model, the problem is solved by adding a "leak" term to the membrane potential equation, reflecting the diffusion of ions that occurs through the membrane when some equilibrium is not reached in the cell.
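If it helps, here is a tiny leaky integrate-and-fire loop in MATLAB, just to make the terms concrete. Apart from the 0.2 ms step, the 10 mV threshold, and the −1 mV reset mentioned above, the constants are generic placeholders rather than the paper's values:

```matlab
% Minimal leaky integrate-and-fire neuron; placeholder constants, not the paper's.
dt      = 0.2;                     % time step, ms (the paper's 0.2 ms step)
tau_m   = 10;                      % membrane time constant, ms (placeholder)
V_rest  = 0;                       % resting potential, mV (placeholder)
V_reset = -1;                      % reset potential after a spike, mV
V_th    = 10;                      % firing threshold, mV
n_steps = 5000;                    % 1 second of simulated time
I_in    = 11 * ones(1, n_steps);   % constant supra-threshold input (arbitrary units)

V      = V_rest;
spikes = zeros(1, n_steps);
for step = 1:n_steps
    V = V + dt * (-(V - V_rest) + I_in(step)) / tau_m;   % leak toward rest + integrate input
    if V >= V_th
        spikes(step) = 1;          % membrane voltage crossed threshold: the neuron "fires"
        V = V_reset;               % voltage is reset and the leaky integration starts over
    end
end
```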
I think the resting potential in the paper is set to −1 mV.
From Wikipedia: in most neurons the resting potential has a value of approximately −70 mV. The resting potential is mostly determined by the concentrations of the ions in the fluids on both sides of the cell membrane and the ion transport proteins that are in the cell membrane. Also from Wikipedia: most often the threshold potential is a membrane potential value between −40 and −55 mV.
I don't know why the values in this paper differ from the ones on Wikipedia.
Soma is the Greek word for body. So "soma potential" is the same as membrane potential, and "soma resistance" refers to the leak term.
So usually one neuron's axon forms a synapse with the dendrite of another neuron. But sometimes an axon forms a synapse directly onto the cell body of the other neuron. We need different equations to model both cases.
Lateral inhibition is the capacity of an excited neuron to reduce the activity of its neighbors. Lateral inhibition disables the spreading of action potentials from excited neurons to neighboring neurons in the lateral direction. This creates a contrast in stimulation that allows increased sensory perception. It is also referred to as lateral antagonism and occurs primarily in visual processes, but also in tactile, auditory, and even olfactory processing.
I believe that the Dirac delta means that the spikes are considered instantaneous ([Wikipedia](http://en.wikipedia.org/wiki/Dirac_delta_function)).
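In that notation a spike train is usually written as a sum of Dirac delta functions, one per firing time, so each spike delivers its whole effect at a single instant (the exact symbols in the paper may differ):

```latex
s(t) = \sum_{f} \delta\left(t - t_{f}\right)
```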
I agree that this takes a lot of CPU. This is nature's solution to the problem; there is probably a better solution, but we can only start looking for it after understanding how this one works.
I know there are still some unanswered questions. Let's research those. Also, let me know if I haven't been clear about something.
Those answers do a good job of getting me started on understanding the equations and the underlying model.
On the CPU usage issue, I'll explain my point. These neuron models only affect each other when they fire. The only thing that changes between firing events is the exponential decay (leak) of the cell's membrane voltage (if I understand the model correctly). If we use a small time-step simulation, all we are doing is wasting 100 cycles between firing events simulating an exponential decay that can be calculated directly with one simple formula instead of simulated with 100 steps. So all that actually needs to be simulated is the firing events themselves, and all the other steps can be eliminated. So instead of structuring the code as a time-step simulation loop, it can be restructured as a pulse-event queue processor. Each pulse event is represented as a neuron that will fire, and a timestamp of when it will fire.
So if as in the paper, we want to have a constant pulse stream injected into the network, with a 100 ms constant spacing, for input neurons 1, 5, and 8, we can start the simulation off by placing the first 3 pulses into the event queue to be processed. That is, neuron 1, at time .1, neuron 5, at time .1, and neuron 8, at time .1.
The simulation code then simulates the firing of neuron 1 by resetting its voltage, and then following its output links and adjusting each of the downstream neuron voltages. Any that exceed the firing threshold are then added to the event queue to be processed next.
Each neuron however must maintain a timestamp of the last time its internal voltage was updated. So if a neuron started with an internal voltage of 20 mV at time 0, and we are processing a pulse at time 0.1, then we must calculate the new voltage that results after 100 ms of "leak", which is some sort of e^-t exponential decay function.
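For a standard leaky integrate-and-fire cell, that closed-form leak between two events would be something like this (assuming a membrane time constant tau_m):

```latex
V(t_0 + \Delta t) = V_{\mathrm{rest}} + \bigl(V(t_0) - V_{\mathrm{rest}}\bigr)\, e^{-\Delta t / \tau_m}
```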
So the code loops, processing all pulses in the queue, until there are no more pulses to process, and then it stops. It returns to the input pulse-generating system, which then places 3 more pulses into the queue, with a timestamp of 200 ms.
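A rough MATLAB sketch of that event-queue loop, with all names illustrative (this is the structure I mean, not working project code):

```matlab
% Event-driven sketch: process spikes from a queue instead of stepping time.
% Assumed state (illustrative names): W(i,j) = weight from neuron i to j,
% V = membrane voltages, last_update = time each voltage was last computed,
% queue = rows of [fire_time, neuron_id], V_rest/V_reset/V_th/tau_m as before.
while ~isempty(queue)
    [~, k]  = min(queue(:, 1));              % next event = earliest timestamp
    t_now   = queue(k, 1);
    src     = queue(k, 2);
    queue(k, :) = [];                        % pop the event

    V(src) = V_reset;                        % the source neuron fires and resets
    last_update(src) = t_now;

    for j = find(W(src, :) ~= 0)             % only downstream neurons are touched
        elapsed = t_now - last_update(j);    % apply the leak since the last update...
        V(j) = V_rest + (V(j) - V_rest) * exp(-elapsed / tau_m);
        V(j) = V(j) + W(src, j);             % ...then add the synaptic kick
        last_update(j) = t_now;
        if V(j) >= V_th
            queue(end+1, :) = [t_now, j];    %#ok<AGROW>  schedule this neuron's spike
        end
    end
end
```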
For a small single layer network like the one in the paper, this will be a savings, but only maybe a factor of 10 or 100. And with a network this small, the inefficient step-wise approach is still so fast on modern systems, that it makes no difference that we are wasting so much CPU. But, when we expand to larger networks, with millions of neurons, it makes a huge difference. Not only does it eliminate all the extra steps simulating the voltage decay, it means that for a given pulse injected into the network, large portions of the network aren't affected at all. So with a million node network, one pulse might only touch 10,000 neurons as it propagates through the network. So to process that one pulse, only requires the updating of 10,000 neurons, instead of all 1 million. So between eliminating the updating of voltages just to simulate leakage when no neurons are firing, and the ability to ignore large sections of the network that are idle at any instant, the savings can add up to a factor of 1000 or more in CPU usage. The actual savings is heavily dependent on fan-out factors, topology, and input pulse density patterns. Which means we can run a network, 1000 times larger, by using event queue processing instead of stepwise simulation.
Likewise, when we run a network with stepwise simulation, the step rate sets the limit of the temporal resolution of the spikes. So if we use a 1 ms step rate in the simulation, the temporal resolution of the spikes is limited to 1 ms (1/1000th of a second). This defines, and limits, the amount of information that is able to flow into the network with each spike. So when an input has an average spike rate of 10 per second, the amount of information that can travel with each spike is about 100 possible time steps per spike, or about 6.6 bits (log2(100) is about 6.6), times 10 per second, so a maximum of about 66 bits per second flowing through each neuron in the network.
But with event queue processing, the temporal resolution of the spikes no longer needs to be limited to the resolution of the simulation time step. It can be a 64 bit floating point real number for example. So input spikes don't just happen at a timestamp like 12.002 seconds, but rather, at 12.0024847746 seconds. Which gives us closer to 64 bits of temporal information per spike, instead of 6.6 bits. This allows the injection of much higher resolution data into, and throughout the network, with far fewer spikes. It matches the information carrying capacity of the network, to the innate floating point processing resolution of the hardware.
So, in real-world terms, let's say we are trying to inject video data into the network at 30 frames a second. We can encode one pixel with a single spike. But if a single pixel (one color) is 8 bits, and we try to inject 30 of these a second through a single neuron, we need 240 bits of data flowing into the network each second per pixel. But with the 1 ms time step, we are limited to around 66 bits per second if we keep the average spike rate down to 10 per second. To try and squeeze in that much data, we would need to go to a spike rate so high that the simulation would probably fail (a spike every 5 simulation steps maybe?), or more likely, we would have to go to a much smaller step size (.1 ms or .01 ms), which explodes the real-time CPU requirements.
But with spike-event queue processing, where the timestamp of each spike is a full floating-point real number, each pixel is easily encoded with one spike, and we end up with 30 spikes per second and code that runs only 30 simulation "cycles" a second, instead of 10,000 step-wise simulation cycles per second, to carry the data.
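As a toy example of that kind of encoding (all values here are made up for illustration):

```matlab
% Toy example: encode one 8-bit pixel value as the precise timestamp of a
% single spike inside its 1/30 s video frame. All values are illustrative.
frame_period = 1 / 30;        % seconds per video frame
frame_idx    = 362;           % which frame this sample belongs to
pixel        = 173;           % 8-bit intensity, 0..255

frame_start  = frame_idx * frame_period;
spike_time   = frame_start + (pixel / 255) * frame_period;   % about 12.0893 s

% The receiving side recovers the pixel by inverting the mapping:
decoded = round((spike_time - frame_start) / frame_period * 255);   % gives 173
```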
Whether it actually saves CPU depends greatly on the full structure of the system, but in general it is far more flexible and CPU efficient.
My point in mentioning all this now is that if the intent is to develop an algorithm that can operate a real-time robot (which has always been my personal interest), efficiency becomes key to the success of the approach and the algorithm. If the code becomes structured in this way, it's also useful to see if there are ways to simplify the actual math and behavior of the neurons to make the code run faster. If, for example, the differential equations used are too complex to have a direct solution that can be calculated, then this approach isn't even possible until a simplified version of the equations is found first, one that creates the same learning ability for the network.
So though there is no need to change this project to this code structure, it's good to keep this potential structure in mind as changes to the algorithm are explored.
I really appreciate that you took the time to explain this spike-driven approach to us in detail. I think it is brilliant, and it is definitely worth implementing. We can start developing this right now, and make this net try to learn to differentiate the dictionary proposed in the Gupta paper. Do you have any already-written code we can adapt for this purpose?