Dopamine Paper
On your thoughts of adding the effects of dopamine to create reinforcement learning...
In addition to the entire network needing to be biased by a global reward signal, there must be internal rewards passed from neuron to neuron as well to create secondary reinforcement effects. The neurons must train each other.
Secondary rewards are where we learn to assign value to objects and actions that our perception system can recognize, such as a piece of paper called money.
So the innate primary rewards are the effects that cause us to learn when we receive pain and pleasure. We trip and fall and bash our knee into hard concrete pavement, and the hard-wired pain sensors in our skin and knee generate a global reward signal that shapes the network.
But the primary point of "shaping the network" with that reward signal is NOT to change our behavior directly, but rather to train the network to become an accurate pain and pleasure PREDICTION system.
So when we are around a hard concrete sidewalk, our brain in effect "remembers" the pain it caused us in the past. And that "memory" is the brain's estimated prediction of us receiving pain today.
So, say we visit the sidewalk 100 times. On all but one of those visits we receive no pain, but once we fall and get 10 units of pain, so the average expected pain per visit becomes 10/100 = 1/10 of a pain unit for the "concrete sidewalk". The brain learns that being around the sidewalk carries a risk of 1/10th of a unit of pain.
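To make the arithmetic concrete, here is a minimal sketch of that running average. The function name `update_expected_pain` and the choice of which visit the fall lands on are just assumptions for illustration; the idea is only that the estimate is nudged toward each new observation, with no need to store the full history.

```python
def update_expected_pain(estimate, visits, pain):
    """Incremental running average: returns (new_estimate, new_visit_count)."""
    visits += 1
    estimate += (pain - estimate) / visits
    return estimate, visits


# 100 visits to the sidewalk; exactly one of them ends in a 10-unit fall.
pains = [0.0] * 100
pains[41] = 10.0  # which visit the fall happens on is arbitrary

expected_pain, n_visits = 0.0, 0
for p in pains:
    expected_pain, n_visits = update_expected_pain(expected_pain, n_visits, p)

print(round(expected_pain, 6))  # 0.1
```

A plain average like this weights the fall the same as any uneventful visit. A real system would probably weight recent or intense events differently, but that is beyond what this sketch tries to show.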
This ability of the network to predict the danger of being around a sidewalk is what does most of our behavior training over time. So once the brain has predicted it's safer to walk on the grass, and we learn to act so as to avoid the sidewalk, the behavior of walking on the grass instead of on the sidewalk emerges from the system over time.
So the one fall that caused real pain becomes a learning signal that shapes all behavior around a sidewalk for the next year.
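As a rough sketch of how that could play out, here is a toy agent that keeps a running-average pain prediction per path and always takes the path with the lowest prediction. Everything here (the `simulate` function, the two path names, the fall on day 100, the purely greedy choice) is an assumption made up for illustration, not a claim about how the network should be built.

```python
def simulate(days=365, fall_day=100, fall_pain=10.0):
    """Toy agent: pick the path with the lowest predicted pain each day."""
    predicted = {"sidewalk": 0.0, "grass": 0.0}  # predicted pain per path
    counts = {"sidewalk": 0, "grass": 0}         # times each path was taken
    choices = []
    for day in range(1, days + 1):
        # Greedy choice; on a tie, min() returns the first key, the sidewalk.
        path = min(predicted, key=predicted.get)
        # The only real (primary) pain event: one fall on the sidewalk.
        pain = fall_pain if (path == "sidewalk" and day == fall_day) else 0.0
        # Running-average update of the prediction for the path taken.
        counts[path] += 1
        predicted[path] += (pain - predicted[path]) / counts[path]
        choices.append(path)
    return choices, predicted


choices, predicted = simulate()
print(choices.count("sidewalk"), choices.count("grass"))  # 100 265
```

No rule ever says "avoid the sidewalk". The switch to the grass falls out of the prediction alone, and because this agent is purely greedy it never goes back to re-test the sidewalk, so the single fall shapes the rest of the year.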
So, one way or another, the network must implement secondary learning effects. It must use the primary reward events to make the entire network act as a reward prediction system, and its ability to constantly predict rewards is what does most of the shaping of our actual behavior over time.
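One standard way to get this kind of secondary effect is temporal-difference (TD) learning. I am swapping that in here as an illustration, not claiming it is how the network has to do it. In the sketch below only eating produces a primary reward, yet the learned prediction propagates backward until the sight of money carries value too. The three-step chain, the state names, and the `alpha`/`gamma` settings are all hypothetical.

```python
def td0_values(episodes=200, alpha=0.1, gamma=0.9):
    """TD(0) value learning along a fixed chain of states."""
    # Hypothetical chain: seeing money -> buying food -> eating.
    chain = ["see_money", "buy_food", "eat"]
    # Only eating delivers a primary (innate) reward.
    primary_reward = {"see_money": 0.0, "buy_food": 0.0, "eat": 1.0}
    value = {state: 0.0 for state in chain}  # learned reward predictions

    for _ in range(episodes):
        for i, state in enumerate(chain):
            # Prediction held by the next state (0 after the chain ends).
            next_value = value[chain[i + 1]] if i + 1 < len(chain) else 0.0
            # TD error: reward received plus what the successor predicts,
            # minus what this state predicted.
            td_error = primary_reward[state] + gamma * next_value - value[state]
            value[state] += alpha * td_error
    return value


values = td0_values()
for state, v in values.items():
    print(state, round(v, 3))
```

With these settings the values approach 1.0 for eating, 0.9 for buying food, and 0.81 for seeing money. Each state is trained by the prediction of the state after it, which is one reading of "the neurons must train each other": the earlier states never see the primary reward directly. The TD error is also the quantity that is often compared to dopamine signals.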