Epistemic Status: Far from certain and mostly speculation, but it does make sense.
Recently, I was pondering how continual learning works in the brain and realized that the interaction of the brain’s continual learning mechanisms with the hippocampal memory system would naturally explain a lot of the weirdness in how human memories work. Although, as I discussed in my TED talk, long-term memory and continual learning are both vital systems the brain has that are missing in current AI systems, I have recently realized the very obvious fact that there is a fundamental tension between them. Long-term memories have to store the traces of, or paths to, distributed neural representations over long periods of time. However, the entire function of continual learning is to keep updating and relearning these precise representations, thus slowly causing the stored memories, or the paths to them, to drift out of date. This means there is a deep trade-off – the more continual learning you do, the harder it is to maintain the consistency and availability of old memories. It is possible to try to design some kind of synchronization system that regularly updates all memories to keep them in sync with the drifting representations; however, this eliminates the main computational advantage of long-term memory storage – that you can largely leave it alone unless needed. Having to constantly refresh all of your memories imposes an increasingly large burden as the memory store grows.
This problem is actually more general and occurs in any continually learning system with modularity. If you have two systems that try to communicate by sending representations to one another over some relatively low-bandwidth channel, then, naively, if the underlying representations of these systems drift, they will become less able to talk to one another, and eventually the crucial patterns of communication will break down. It would be like trying to communicate with another person while both of your languages are evolving at the same time.
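To make the drift problem concrete, here is a minimal toy sketch of my own (an illustration, not a claim about neural implementation): two systems start with identical embeddings for a set of concepts and communicate by nearest-neighbour decoding; as each system’s embeddings drift independently, communication accuracy collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_concepts, dim = 50, 16

def communication_accuracy(sender, receiver):
    # sender transmits each concept as its own embedding vector;
    # receiver decodes by nearest neighbour in *its* embedding table
    dists = np.linalg.norm(sender[:, None, :] - receiver[None, :, :], axis=-1)
    decoded = dists.argmin(axis=1)
    return (decoded == np.arange(len(sender))).mean()

shared = rng.normal(size=(n_concepts, dim))  # initially identical representations
sender, receiver = shared.copy(), shared.copy()

accuracies = [communication_accuracy(sender, receiver)]  # starts at 1.0
for step in range(200):
    # independent representational drift in each system
    sender += 0.1 * rng.normal(size=sender.shape)
    receiver += 0.1 * rng.normal(size=receiver.shape)
    accuracies.append(communication_accuracy(sender, receiver))
# by the end, the two systems largely fail to understand each other
```

The drift magnitudes and dimensions here are arbitrary; the point is only that uncoordinated drift on both ends of a channel degrades the channel itself.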
The brain is both modular and also performs continual updating of its own representations. So how does it solve this problem? I obviously don’t have a good answer, and this is something that seems surprisingly under-investigated in neuroscience research. My speculation is that the brain does this through a combination of methods such as:
1.) If the systems regularly communicate, then although they will naturally drift over time, there is also pressure for them to continue to work together successfully. Any kind of end-to-end optimization system will be able to propagate the required updates through the communication channel. Even if the optimization is not fully end-to-end through the channel, some kind of local consistency objective could be used to try to keep them in sync.
2.) Once established, the interface between the two regions can be frozen, and then each independently updating subsystem has to learn an encoder/decoder to map its representations, whatever they may be, to the frozen representations of the communication channel. This freezing does not have to be absolute; it simply means that the interface updates at a slower timescale than the regular representation updating. This is exactly how human language works. When two people speak to one another, each is a continually learning agent with their own idiosyncratic internal representations that the other cannot parse. However, there is a shared interface (e.g. words, syntax, etc.) which is defined by arbitrary convention and ‘frozen’, at least at the timescale of a conversation. Internally, to use language, each person must construct their own encoder/decoder which can map their representations to and from the known units of the communication interface.
3.) Some kind of regularization to prevent too rapid reorganization and hence forgetting of existing knowledge. This is important in any case to prevent catastrophic forgetting, so the brain must have a general mechanism to control this.
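Point (2) above can be sketched in code (again a toy linear model of my own devising): a fixed set of ‘word’ vectors acts as the frozen interface; each agent’s private representations drift arbitrarily, but each agent re-fits its own linear encoder/decoder against the frozen words by least squares, so communication survives the drift.

```python
import numpy as np

rng = np.random.default_rng(1)
n_concepts, dim = 50, 16

# the frozen interface: an arbitrary but fixed "word" vector per concept
interface = rng.normal(size=(n_concepts, dim))

# each agent's private representations = the same concepts viewed through
# its own internal basis, which drifts freely over time
basis_s = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
basis_r = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
for _ in range(200):
    basis_s += 0.02 * rng.normal(size=basis_s.shape)
    basis_r += 0.02 * rng.normal(size=basis_r.shape)
sender_reps = interface @ basis_s
receiver_reps = interface @ basis_r

# after any amount of drift, each agent re-fits its own map against the
# *frozen* interface by least squares
enc = np.linalg.lstsq(sender_reps, interface, rcond=None)[0]    # private -> words
dec = np.linalg.lstsq(interface, receiver_reps, rcond=None)[0]  # words -> private

received = sender_reps @ enc @ dec  # speak in shared words, decode privately
match = np.linalg.norm(received[:, None] - receiver_reps[None], axis=-1).argmin(axis=1)
accuracy = (match == np.arange(n_concepts)).mean()  # stays perfect despite drift
```

The linearity here is purely for convenience; the design point is that only the per-agent encoders/decoders need relearning, while the interface itself never moves.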
This idea of having a frozen interface which lets continually updating representations talk is likely a common motif throughout the brain, and seems to be the real reason behind the commonly observed ‘critical periods’ in development. Almost all senses appear to have their own critical periods, and there are likely other critical periods for more subtle developmental stages. It has long puzzled me why the brain does this versus just keeping plasticity high forever, since it would seem obviously advantageous not to be stuck with bad representations after the critical period ends, as sometimes happens. The necessity of keeping a complex modular system of brain regions functional in an online continual learning regime thus provides a good explanation for why such periods are necessary. This also helps with catastrophic forgetting. The ‘critical period’ establishes the interfaces and the core representations, and then the only updates that can happen are smaller tweaks to the base representations and updates that keep the encoders/decoders in sync with the fixed interface.
If the brain is not optimized end-to-end, then this problem becomes even more acute, since changes to fundamental representations of, for instance, how to see or hear could completely break the rest of the complex representational pipeline built on top of these perceptual systems. This could happen quite naturally given the non-iid streams of experience that all animals/agents receive. It would be extremely bad if, for instance, after spending a night in darkness, your visual cortex overfitted to the sensations of darkness and forgot how to see when the sun came back up. However, this is exactly what a neural network naively trained with a high learning rate on non-iid data would do.
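The darkness example is easy to reproduce in a toy model (my own illustration): a linear ‘perceptual system’ trained online with SGD first sees iid daytime data and converges, then spends a ‘night’ on non-iid inputs confined to a single direction with near-zero targets. The high learning rate lets the night data overwrite part of the learned mapping.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8
w_true = rng.normal(size=dim)  # the "correct" perceptual mapping

def sgd_step(w, x, y, lr):
    # one online gradient step on squared error for a single sample
    return w - lr * (x @ w - y) * x

# daytime: iid inputs with correct targets -> w converges to w_true
w = np.zeros(dim)
for _ in range(2000):
    x = rng.normal(size=dim)
    w = sgd_step(w, x, x @ w_true, lr=0.05)
day_error = np.linalg.norm(w - w_true)  # essentially zero

# night: non-iid inputs confined to one direction, targets near zero
night_dir = np.zeros(dim)
night_dir[0] = 1.0
for _ in range(2000):
    x = night_dir * rng.normal()
    w = sgd_step(w, x, 0.0, lr=0.05)
night_error = np.linalg.norm(w - w_true)  # the "morning after": worse
```

A single night of one-directional data wipes out the weight along that direction while leaving the rest untouched – catastrophic forgetting localized to whatever the recent data happened to exercise.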
This combination of methods presumably works pretty well for most brain systems, in that continual learning operates in most people with a surprisingly high success rate given the complexity of the dynamics involved and how easy it would be for it to go off the rails. Nevertheless, for the long term memory system, this challenge is much more intense. Many memories are accessed rarely, even over years, so online updating has much less opportunity to keep them in sync. Moreover, since memories store both keys and values, and the entire point of the memory is to be able to keep storing new information, neither the interfaces nor the memories can remain fully frozen. Certainly, parts of the interface, such as the key-retrieval system, can remain frozen after some critical period, but the entire system cannot if the memory system ever wants to add new memories or update old ones.
This means that some set of trade-offs between retrieval ability, memory storage, and fidelity is necessary. This, I believe, then explains a couple of puzzling phenomena about human memory:
1.) Early childhood amnesia: This one is pretty obvious. Most of early childhood is during or before the critical periods for various systems in the brain. This means that the brain is running at super high plasticity during this time, so the interfaces are not frozen as they are later in life. The representations stored in early childhood memories, as well as the keys required to access them, have therefore likely changed dramatically compared to those that exist in the post-critical-period brain. This means that even if the memories are still ‘there’ in some sense – and there is some evidence that they are – they cannot reasonably be retrieved in the normal way, since the keys needed to access them are far from the regular distribution of representations that can query the hippocampus. This results in apparent amnesia for early childhood even though the hippocampus was active and storing memories the whole time. It also explains why the memories that are accessible are so weird and disjointed, since the brain is effectively trying to ‘read’ something in a very different format than usual, and why these memories are often accessible via associations with particularly strong base sensations such as smells or feelings, since the representations of these have plausibly remained more consistent over time than other, more abstract representations and so can still function as partial keys.
2.) Unreliability and anachronism of longer-term memories: Memories from long ago often contain subtle anachronisms. For instance, you often remember the participants as they are now rather than as they were then (i.e. their current age vs much younger), or you remember your childhood car as your current car or a later one, especially when these things are incidental to the main ‘point’ of the memory. This is because the memory is stored not as sense-data directly but as abstract representations of the sense-data. I.e. you don’t remember every last detail of your experience, but an abstract ‘pointer’ to a concept such as ‘this is when me and my friend did X’. The ‘my friend’ part here points to the general representation you have of them. However, this representation is continually being updated by your more recent interactions with them, so that when the memory is retrieved and reconstructed, the brain seamlessly inserts its current representation into the old memory, causing the anachronism. Interestingly, this doesn’t happen in the few memories that happen to be highly salient, for which the specific sensory data is stored rather than the high-level gist.
3.) Memories changing when accessed: This is likely part of the brain’s system of rolling updates to maintain consistency. To ensure that memories can still be accessed, the brain has to update their ‘value’ representations to be closer to the current representational scheme. Specifically, it will decode the memory and then re-encode it with the current representations. This repeated re-encoding is not lossless and can (and indeed must) change the semantic content of the memory, since it enforces consistency with the current representational scheme. This can introduce new inaccuracies into old memories or even result in complete confabulations of events that never occurred. This is again a fundamental consequence of the choice to encode memories as highly compressed representations rather than as raw sense data – while much cheaper in terms of storage and allowing more abstract search and processing of memories, it means that the memories themselves are slowly lost and distorted as the representational structure changes.
4.) Why procedural memories/skills become ‘rusty’ and then improve dramatically back to their original level with a little practice: Plausibly, a similar mechanism plays out here. Your motor cortex stores a huge number of procedural ‘patterns/memories’ of, e.g., playing an instrument you learnt as a kid. When you come back to it years later, you try to retrieve these ‘memories’, but the representations have changed in the meantime, even if only subtly. This means that not only are the ‘keys’ slightly wrong, so it is harder to retrieve the exact memories you want, but the stored ‘values’ are also out of date: when naively replayed, they no longer map to exactly the right actions. To fix this, you need to replay the actions, quickly do a round of online RL to fix up the representations based on real-world feedback, and then re-encode the memory pathways so that the new keys and values are stored again.
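Point (1) can be illustrated with a toy key-value memory (my own construction, not a model of the hippocampus): keys are written under a childhood representation scheme; by adulthood the ‘abstract’ key dimensions have drifted heavily while a few ‘sensory’ dimensions (smell-like cues) stayed nearly stable. Querying with the full adult representation fails, while querying on the stable sensory dimensions alone still retrieves most memories.

```python
import numpy as np

rng = np.random.default_rng(3)
n_memories, dim, stable = 100, 32, 8  # first 8 dims: stable "base sensations"

# keys written into memory under the childhood representation scheme
child_keys = rng.normal(size=(n_memories, dim))

# the adult representation of the same events: abstract dimensions have
# drifted heavily, sensory dimensions only slightly
adult_keys = child_keys.copy()
adult_keys[:, stable:] += 3.0 * rng.normal(size=(n_memories, dim - stable))
adult_keys[:, :stable] += 0.1 * rng.normal(size=(n_memories, stable))

def recall_accuracy(queries, keys):
    # nearest-neighbour retrieval: does each query find its own memory?
    idx = np.linalg.norm(queries[:, None] - keys[None], axis=-1).argmin(axis=1)
    return (idx == np.arange(len(queries))).mean()

full_recall = recall_accuracy(adult_keys, child_keys)   # adult query, child keys
smell_recall = recall_accuracy(adult_keys[:, :stable],  # sensory cue only
                               child_keys[:, :stable])
```

The split into ‘stable’ and ‘drifted’ dimensions is of course an assumption made for the sketch; it mirrors the claim that low-level sensory representations drift less than abstract ones and so can serve as partial keys.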
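Point (3) amounts to a lossy read-rewrite loop. A toy version (again my own sketch): a memory is stored as a compressed code, here a projection onto the current low-dimensional representation subspace; between accesses the subspace drifts, and each access decodes and then re-encodes the memory under the current subspace. The reconstruction error relative to the original event grows with every access.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, k = 64, 16

def random_subspace(rng, dim, k):
    # orthonormal basis (k x dim) for the current "representational scheme"
    q, _ = np.linalg.qr(rng.normal(size=(dim, k)))
    return q.T

original = rng.normal(size=dim)        # the event as experienced
basis = random_subspace(rng, dim, k)
memory = basis.T @ (basis @ original)  # storage itself is lossy compression
initial_error = np.linalg.norm(memory - original)

errors = []
for access in range(20):
    # the representational subspace drifts between accesses ...
    q, _ = np.linalg.qr((basis + 0.1 * rng.normal(size=basis.shape)).T)
    basis = q.T
    # ... and each access decodes, then re-encodes under the current scheme
    memory = basis.T @ (basis @ memory)
    errors.append(np.linalg.norm(memory - original))
# errors[-1] exceeds initial_error: each access distorts the memory further
```

Each re-encoding discards whatever part of the memory the current scheme cannot express, which is the proposed mechanism behind access-driven distortion and, in the limit, confabulation.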