
lazy-growing synapse graph #21

Open
floybix opened this issue Jun 27, 2015 · 5 comments

@floybix
Member

floybix commented Jun 27, 2015

HTM model creation can be extremely slow. The time goes into creating the huge proximal synapse graphs containing all potential connections.

The problem of explicitly representing full potential synapse graphs is more acute in higher-level layers because their input -- from cell layers -- is extremely sparse: column activation of 2% spread over a depth of 20 cells gives cell activation of 0.1% (except when bursting). With such sparsity, each column needs a lot of synapses in order to reach a reasonable stimulus threshold: to reach 5 active synapses, around 5000 random synapse connections are needed on average.
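A quick back-of-envelope check of those figures (illustrative Python only, not Comportex code; the 2%, 20 and 5 are the numbers quoted above):

```python
column_activation = 0.02                    # 2% of columns active
depth = 20                                  # cells per column
cell_density = column_activation / depth    # = 0.001, i.e. 0.1% of input bits active
stimulus_threshold = 5                      # active synapses needed to recognise a pattern
# Expected active synapses = n_synapses * cell_density, so:
n_synapses_needed = stimulus_threshold / cell_density   # = 5000 random synapses per column
print(cell_density, n_synapses_needed)      # 0.001 5000.0
```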

This is mitigated to some extent by the learning mechanism which can grow additional synapses directly to the active inputs (same mechanism as on distal dendrite segments), but we still need a reasonable degree of initial connectivity to activate columns in the first place.

First proposal

  • Lazy creation of the proximal synapse graph: synapses are only created upon the first activation of each source bit (see the sketch after this list).
  • That would be equivalent to the current behaviour except that lazy synapses would not be decremented until they come into existence.
  • We could bias the new synapses towards neglected columns, achieving boosting and also partially adjusting for the above point.
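
A minimal sketch of the lazy-creation idea, in illustrative Python rather than Comportex's actual data structures; the LazyProximalGraph name, the permanence value and the 2% growth fraction are assumptions for the example:

```python
import random

class LazyProximalGraph:
    """Proximal synapses are only created when a source bit is first seen active."""

    def __init__(self, n_columns, init_permanence=0.25, grow_fraction=0.02):
        self.n_columns = n_columns
        self.init_permanence = init_permanence
        self.grow_fraction = grow_fraction
        self.seen_bits = set()                               # source bits encountered so far
        self.synapses = [dict() for _ in range(n_columns)]   # column -> {source bit: permanence}
        self.overlap_history = [0] * n_columns               # rough measure of column neglect

    def step(self, active_bits):
        # Grow synapses only for source bits that have never been active before.
        for bit in (b for b in active_bits if b not in self.seen_bits):
            self.seen_bits.add(bit)
            # Bias growth towards neglected columns (a boosting-like effect).
            weights = [1.0 / (1 + h) for h in self.overlap_history]
            n_targets = max(1, int(self.grow_fraction * self.n_columns))
            for col in random.choices(range(self.n_columns), weights=weights, k=n_targets):
                self.synapses[col].setdefault(bit, self.init_permanence)
        # Overlap only counts synapses that exist; unseen ("lazy") synapses are
        # never decremented because they have not come into existence yet.
        overlaps = [sum(1 for b in active_bits if b in syns) for syns in self.synapses]
        for col, ov in enumerate(overlaps):
            self.overlap_history[col] += ov
        return overlaps
```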

Second proposal

Lazy creation would only happen while previously unseen input bits continued to appear. But random growth and death of synapses could also continue indefinitely (either eagerly or lazily), giving a boosting effect.
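
The ongoing growth/death part could look something like the following (again illustrative Python; `synapses` is a per-column {source bit: permanence} map as in the sketch above, and the rates are arbitrary placeholders):

```python
import random

def synapse_turnover(synapses, seen_bits, births=2, deaths=2, init_permanence=0.25):
    # "Death": randomly remove a few existing synapses on this column.
    for bit in random.sample(list(synapses), min(deaths, len(synapses))):
        del synapses[bit]
    # "Birth": grow a few new synapses to source bits seen so far, giving
    # under-used columns a continuing chance to win, i.e. a boosting effect.
    for bit in random.sample(list(seen_bits), min(births, len(seen_bits))):
        synapses.setdefault(bit, init_permanence)
```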

@cogmission
Member

I know Jeff once described it as having the consistency of tapioca, but are there any papers which describe biologically what happens with interregional communication that could perhaps provide a hint?

@floybix
Member Author

floybix commented Dec 31, 2015

Boosting causes representations to be unstable, and to the extent they are unstable they are meaningless. I usually turn it off. I wonder if instead we could use the mechanism that we have for distal synapses (selecting winner cells in a column), but applied to proximal synapses (selecting columns); a rough sketch follows the list:

  • Set a stimulus threshold of, say, 10 proximal synapses, which will indicate clearly recognised patterns. The top 2% of columns become active, provided they reach the stimulus threshold.
  • If no columns reached the stimulus threshold (or fewer than 2% did), choose random columns and have them grow new proximal synapses.
    • Actually, first check for matches on disconnected synapses, and give those matches priority. That gives the stability necessary for tentative synapses to be reinforced.
    • Column matches are by the number of active connected proximal synapses, but could also include predictive cell depolarisation (Fergal's "prediction assistance").
  • In this scheme we don't need to initialise the HTM with a million proximal synapses, just start empty like we do with distal synapses. So fast start up. But running would be slower. Maybe a lot slower.
  • There is a problem. Partial matches (below the stimulus threshold) are ignored. If we set a low stimulus threshold, previously matched columns would be adapted a lot, to anything remotely similar, losing discriminability. If we set a high stimulus threshold, each new stimulus gets a unique representation, but we fail to represent the similarity between them.
    • One solution: select a fraction of the columns as partially-matching ones in preference to random ones.
    • Actually this problem applies to cell selection in a column too!
  • For local topographic connections, only grow within a radius. And consider that when selecting random columns.
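
A rough sketch of the column selection in illustrative Python (not Comportex code); the 0.2 connected-permanence and other numbers are placeholders:

```python
import random

def select_active_columns(columns, active_bits, sparsity=0.02,
                          stimulus_threshold=10, connected_perm=0.2):
    """columns: one {source bit: permanence} dict per column."""
    n_cols = len(columns)
    n_active = max(1, int(round(sparsity * n_cols)))
    active_bits = set(active_bits)

    def overlap(col, connected):
        # Count active synapses on this column that are connected (or disconnected).
        return sum(1 for b in active_bits
                   if b in col and (col[b] >= connected_perm) == connected)

    # 1. Columns whose connected synapses reach the stimulus threshold; take the top 2%.
    conn = sorted(((overlap(col, True), i) for i, col in enumerate(columns)), reverse=True)
    active = [i for ov, i in conn if ov >= stimulus_threshold][:n_active]

    # 2. If fewer than 2% matched, prefer partial matches on disconnected synapses,
    #    so that tentative synapses get the stability needed to be reinforced.
    if len(active) < n_active:
        disc = sorted(((overlap(col, False), i) for i, col in enumerate(columns)
                       if i not in active), reverse=True)
        active += [i for ov, i in disc if ov > 0][:n_active - len(active)]

    # 3. Otherwise fall back to random columns, which would then grow new
    #    proximal synapses directly to the currently active input bits.
    if len(active) < n_active:
        remaining = [i for i in range(n_cols) if i not in active]
        active += random.sample(remaining, n_active - len(active))
    return active
```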

@mrcslws
Collaborator

mrcslws commented Dec 31, 2015

Thinking of a tall hierarchy, it's interesting to think about how this would change things. Starting with an untrained model, the first region would start activating. Then the second. Then the third. And so on.

Currently, with random initial connections, the entire hierarchy might light up on the first input. Every region will do proximal/distal/apical learning right from the start, shaping a pile of random connections into something meaningful. With this new approach, it'd be more of a blank slate.

@floybix
Member Author

floybix commented Jan 13, 2016

Thinking of a tall hierarchy, it's interesting to think about how this would change things. Starting with an untrained model, the first region would start activating. Then the second. Then the third. And so on.

Actually that's not obvious to me. I thought all layers should activate cells even if they don't have pre-existing proximal synapses - that is, even if those columns/cells are chosen randomly (and will then grow new synapses).

Maybe you mean, should we grow proximal synapses to bursting cells, or only to predicted cells? I'm leaning toward the former, given that first-level layers do not have predicted input (only sense input), but they still grow proximal synapses. The learning rate to predicted cells could be higher though.

On the other hand, it may not make much sense to learn a bursting signal, since once the stimulus is learned/predicted in a lower layer it will have a different representation. But I think that could be OK if we have a low/slow learning rate. This paper (via Joseph Rocca) describes the cortex as slowly learning to capture statistical properties of the world, in contrast with, and complementing, the hippocampus, which learns much faster: http://psych-www.colorado.edu/~oreilly/papers/OReillyRudy00_hippo.pdf

@floybix
Member Author

floybix commented Feb 2, 2016

My experiments so far have shown that it is fatal to grow new proximal synapses directly to active sources. It results in column sets taking over -- masking -- multiple inputs, particularly if there are subset/overlap relationships between inputs. I guess a solution would be to enforce a unique sub-sampling of "potential synapses" on each column, i.e. some sort of local topographic radius, even if the inputs are not meaningfully topographic: even if the input bits are in fact randomly shuffled.
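
A tiny sketch of that fixed per-column sub-sampling (illustrative Python; pool size and seed are arbitrary): each column gets its own random "potential pool" and may only ever grow proximal synapses to bits inside it.

```python
import random

def make_potential_pools(n_columns, n_input_bits, pool_size=128, seed=0):
    rng = random.Random(seed)
    # Each column gets a fixed random subset of the input bits.
    return [frozenset(rng.sample(range(n_input_bits), pool_size)) for _ in range(n_columns)]

def growable_bits(column_idx, active_bits, pools):
    # A column may only grow synapses to active bits within its own potential pool.
    return pools[column_idx] & set(active_bits)
```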

Here's a completely different approach to the problem of boosting / decorrelating representations. Leabra's XCAL BCM rule is based on comparing the short-term and long-term average activations to apply a homeostatic stabilisation:
https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Learning/Leabra

the BCM contrast or normalization is all about the receiver long-term average activity y_l, with the sending activity serving as the "conditioning" variable -- you only update the weights if the sending unit is active, and conditioned on that, compare the current receiver activity relative to the long-term average.
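
As a sketch, a simplified BCM-flavoured update in that spirit (illustrative Python, not Leabra's actual XCAL function; the learning rate and averaging constant are arbitrary):

```python
def bcm_like_update(w, x, y, y_long, lrate=0.01, avg_tau=0.005):
    """w: synapse weight; x: sender activity; y: receiver activity;
    y_long: receiver's slowly-tracked long-term average activity."""
    if x > 0:                            # only update when the sending unit is active
        w += lrate * x * (y - y_long)    # potentiate above the long-term average, depress below
        w = min(max(w, 0.0), 1.0)        # keep the weight bounded
    y_long += avg_tau * (y - y_long)     # slow homeostatic tracking of receiver activity
    return w, y_long
```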
