I am not sure how to attribute the saying above, but I read it on a Spinnaker SpongeBob SquarePants special edition watch.
It resonated with me because of the Agent Long-term Memory problem.
The Agent Long-Term Memory Problem
The human memory system supports remembering and recalling. This makes memory less like data and more like a function operating on data. Memory is never really available to us as a whole (unless we focus on a narrow slice of it or possess a photographic memory).
Example: you met your friend for lunch. You will not remember each and every moment of that meeting, but you will recall certain facts, like what you ate and where you met. Beyond those narrow facts there will be big gaps (e.g., whether you drank still or sparkling water). You will also remember certain other facts, but not completely – e.g., what colour shirt they were wearing.
The whole process is about converting a moment we have experienced into a networked node that is explicitly tied to other moments through a subjective and objective value chain. This network changes over time as we experience new moments in our lives. Nodes are compressed, connected, and discarded.
In the example above that would be the name of your friend and their life state (the closer the friend, the bigger the network associated with them, because the more you know about them).
Attention Mechanism Associated with Long-term Memory
There are at least two attention mechanisms at play here: what you were focussing on when you experienced the moment (the context of the moment, or attention at write) and what you are focussing on when you attempt to recall the moment (the context of the recall, or attention at read).
The duality of this process is what I call the Agent Long-term Memory Problem.
Typically, in ‘Agentic Memory’ literature (excluding the ‘agents need memory’ type of articles) we find three types of memory being considered:
- Procedural – ‘how to carry out a task’, what worked well for a particular process and what worked well for a particular customer for a given process. There is a degree of personalisation in the latter.
- For example, what worked well when I was successful in preventing a customer from churning, and what worked the last time I successfully prevented John Smith from churning.
- Episodic – ‘sequence of events and what they mean’, this is the most common example in current literature. The concept is to stitch together a sequence of interactions into a cohesive whole to allow for a warm start.
- For example, to continue customer on-boarding journeys, or to ‘predict’ the reason for the customer to contact us for support.
- Factual – ‘recalling generic facts (semantics) and specific facts (declarative)’, this is the most commonly confused aspect in current literature. The concept here is to recall factual information about an entity (e.g., customer, product, journey etc.).
- For example, recalling that the customer John Smith likes to be called John or the fact that a premium subscription costs £10 per month or that SLA for account unblock is 24 hrs.
Then there are two types that we find are absent:
- Prospective – ‘what must be remembered for the future’, this is about remembering to carry out a task in the future when certain time/space condition is met.
- For example, the agent must remember to send a message when interest rates go down (space) or after 6 months (time) because the customer mentioned ‘the interest rates are too high’ or ‘I have recently changed my job and have a 6 month probation period’.
- Implicit – ‘what I remember but don’t know I remember it’, this is the most interesting one for me. This is about the effortless recall (especially associated with procedural memory) that allows us to do mundane tasks. This is critical for efficient use of AI for low value but high criticality tasks.
- For example, I know how to ride a bicycle and I do not need to strain to remember it, as I may strain to remember my passwords. In the same way, an AI model must remember what ‘civil’ behaviour is, so we need not spend precious space in the prompt instructing it to be a ‘helpful assistant’ or to ‘not make up information’.
But there is limited mention of the two attention mechanisms at play.
Keep It Simple: Agents and Attention
Treating memory like a database is the first anti-pattern. A database has perfect recall: once you find the required record you will get exactly what was stored – not a version of it, nor a mixture of related but not relevant results, nor a summary.
For AI agents we have a rather helpful software layer that can store the moment. The moment can then be recalled perfectly and processed into the traces required for the use-case, focusing attention on (or away from) specific topics.
A trace can be thought of as a data item created from a raw moment by application of some kind of attention mechanism (attention at write). This data item then can be used by AI for further processing (attention at read). In between the write and the read there is the recall (see next section).
Humans do this all the time. We have lots of ways of perfectly recording a moment thanks to our smartphones, but where we point the camera is attention at write. When we review a video we took, we get to pay attention to different aspects (attention at read) and create new traces that we may choose to use in the future. In between is the recall, where I look for an old video to view it.
As an example, the other day I was looking for a photo of a receipt to check the name of an item. I knew the date, so it was easy to find (lookup). When I found it I realised my attention at the time had been on the bar code and the total, so I had missed the full receipt!
This changing focus to generate a trace is context driven and closely aligned with the use-case and the stage within the use-case. There may be some general traces (e.g., customer name, time of day) that we will always need to recall but these are expected to be a small proportion of the traces needed.
The same principle can be applied when recalling a trace. Note the use of the word ‘can’ because it is not mandatory. It depends on the specificity of the trace. If the trace is a single fact (e.g., does the user own a house) then those traces can be recalled as a default.
If the trace is complex (e.g., a conversational chunk where the user spoke about their financial situation) then we may wish to use the built-in attention mechanism of an LLM and a prompt to focus on specific aspects, generate a specific trace (e.g., how much is their current income) and store that. This would be a perfect example of attention at write tuned by the instruction prompt.
Think of it like building a Customer 360 record, which has different sections: some really precise key-value type, others more descriptive (e.g., a free text box) and therefore requiring context-aware, attention-based processing.
This can then be used via an LLM (attention at read) tuned by the instruction prompt to focus on different aspects as required by the use-case and stage of interaction.
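The write/read duality can be sketched end to end. The `call_llm` function below is a placeholder standing in for any chat-completion API (it is not a real library call); the prompts and the raw moment are illustrative.

```python
# Placeholder for a real LLM call: a production version would call a hosted
# model. Here we just tag the output so the data flow is visible.
def call_llm(instruction: str, text: str) -> str:
    return f"[{instruction}] applied to: {text[:40]}..."

raw_moment = (
    "Customer mentioned they recently changed jobs, earn around 45k, "
    "and are worried about their mortgage payments."
)

# Attention at write: the instruction prompt focuses the LLM on one aspect
# of the raw moment, producing a specific trace to store.
write_prompt = "Extract the customer's current income"
stored_trace = call_llm(write_prompt, raw_moment)

# Attention at read: a different instruction prompt focuses on the aspect
# needed by the current use-case and stage of interaction.
read_prompt = "Assess affordability for a premium subscription"
answer = call_llm(read_prompt, stored_trace)

print(stored_trace)
print(answer)
```

The key point is that the two prompts are independent: the same stored trace can be read many times under different instructions, just as the same video can be reviewed with different questions in mind.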
Principles for Memory Implementation
Memory is storage; remembering what is stored is what we are really interested in.
Remembering can be implemented as a deterministic lookup or as compute.
Databases use a mix of deterministic and light-weight compute (no ML) to remember precisely what was stored. Deterministic is lookup by value (e.g., find me all rows where name = John and surname = Smith). Compute is lookup by a computed value (e.g., ID 123 is hashed to get the bucket where the full record can be found).
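Both database-style mechanisms fit in a few lines. This is a toy sketch with illustrative rows: lookup by value, then a hash of the ID to pick the bucket holding the record.

```python
rows = [
    {"id": 123, "name": "John", "surname": "Smith"},
    {"id": 456, "name": "Jane", "surname": "Doe"},
]

# Deterministic: lookup by value.
matches = [r for r in rows if r["name"] == "John" and r["surname"] == "Smith"]

# Light-weight compute: hash the ID to pick a bucket, then look inside it.
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]
for r in rows:
    buckets[hash(r["id"]) % NUM_BUCKETS].append(r)

record = next(r for r in buckets[hash(123) % NUM_BUCKETS] if r["id"] == 123)
print(matches[0]["id"], record["surname"])
```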
Any vectorised retrieval (so-called semantic retrieval) relies on medium-weight compute, because we use an embedding model to produce a vector that represents the input text in a high-dimensional space. This vector is then used to look up its neighbours via a distance calculation.
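The neighbour lookup itself is just a distance calculation over vectors. In this sketch the embeddings are hand-made toy vectors, not the output of a real embedding model, so only the retrieval mechanics are illustrated.

```python
import math

# Hand-made toy embeddings standing in for a real embedding model's output.
TOY_EMBEDDINGS = {
    "customer wants to cancel": [0.9, 0.1, 0.0],
    "customer is churning":     [0.8, 0.2, 0.1],
    "premium costs £10":        [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec, store):
    # Return the stored text whose vector is closest to the query vector.
    return max(store, key=lambda text: cosine_similarity(query_vec, store[text]))

query = TOY_EMBEDDINGS["customer wants to cancel"]
others = {t: v for t, v in TOY_EMBEDDINGS.items() if t != "customer wants to cancel"}
print(nearest(query, others))
```

The medium-weight part in production is running the embedding model; the distance calculation over neighbours is comparatively cheap (and is what vector indexes accelerate).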
Any question asked of an LLM uses heavy compute. Think about how an LLM answers a question like ‘What is the capital of Italy?’. That particular fact is ‘stored’ deep inside the model somewhere. As our question flows through the model the fact is looked up and churned out in the response. This is pure compute – no lookups.
The heavier the compute gets, the more difficult it is to scale, as more resources are required.
Create Traces Not Summaries
Focus on capturing traces generated by paying attention through specific lenses, captured as tags (see context identifier tags). This ensures whatever is ‘remembered’ is specific to the application consuming it, rather than attempting to build a one-size-fits-all trace (a.k.a. a ‘summary’).
If using LLMs for attention at write, make sure there are context tags associated with the instruction prompts used with the LLM. The tags can be human- or LLM-generated. This links the generation process with the generated trace.
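A minimal sketch of that linkage: a trace record carries context tags, one of which identifies the instruction prompt that generated it. The `Trace` name, field names, and tag formats are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    content: str
    tags: set[str] = field(default_factory=set)

# A tag identifying the instruction prompt links the generation process
# to the generated trace (tag format is illustrative).
prompt_tag = "prompt:extract-income-v2"
trace = Trace(
    content="Current income approx £45k",
    tags={"customer:123", "topic:income", prompt_tag},
)

print(prompt_tag in trace.tags)
```

With the prompt version recorded on every trace, you can later find and regenerate all traces produced by a prompt that turned out to be flawed.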
Use Context Identifier Tags
Create a tag cloud around the generic traces to identify use-case specific context and to enable lookups (e.g., customer ID, product tag, topic clouds). Make this extendable so the same trace can be tagged with different context identifiers for reuse.
The context identifier tag then provides a signpost for the next lookup with the same or similar context. This reduces the weight of the lookups, in the limit converging to a database-style lookup based on tag matching.
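That convergence point looks like this: once traces carry context identifier tags, recall becomes a cheap subset match with no embedding or LLM compute involved. The tag vocabulary and trace contents below are illustrative.

```python
traces = [
    {"content": "Likes to be called John",
     "tags": {"customer:123", "topic:preferences"}},
    {"content": "Current income approx £45k",
     "tags": {"customer:123", "topic:income"}},
    {"content": "Premium costs £10/month",
     "tags": {"product:premium", "topic:pricing"}},
]

def recall_by_tags(required: set[str]) -> list[str]:
    # Database-style recall: return traces whose tags cover the required set.
    return [t["content"] for t in traces if required <= t["tags"]]

print(recall_by_tags({"customer:123", "topic:income"}))
```

Because the same trace can carry multiple context identifiers, it is reused across use-cases simply by querying with a different tag set.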
Connect Traces
Where you can use LLMs or humans to start connecting traces, do it. This will allow concepts to be correlated, ensuring related memories are retrieved. Link properties can describe whether a link is mandatory or not.
For example, when I retrieve the memory of today I will remember the names of the colleagues I worked with, or the office I worked at.
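A sketch of that example, under the assumption that links between trace IDs carry a mandatory flag: recalling a trace pulls in its mandatory neighbours, while optional ones stay behind unless specifically asked for. Trace IDs and contents are illustrative.

```python
traces = {
    "day": "Worked on the memory design doc",
    "colleagues": "Worked with Priya and Tom",
    "office": "Was at the London office",
    "lunch": "Had a sandwich at my desk",
}
# (source, target, mandatory) - mandatory links are always recalled together.
links = [
    ("day", "colleagues", True),
    ("day", "office", True),
    ("day", "lunch", False),   # optional: only if specifically asked
]

def recall(trace_id: str) -> list[str]:
    result = [traces[trace_id]]
    for src, dst, mandatory in links:
        if src == trace_id and mandatory:
            result.append(traces[dst])
    return result

print(recall("day"))
```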
But Don’t Create a Mess
Context is good but only the right amount of it. Too many connections can lead to confusion and difficulty in maintaining the trace network.
Here we can leverage graph complexity metrics (a deep topic in itself), starting with simple edge/vertex counts and ratios before moving to more complex measures.
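The simplest such check is the edge-to-vertex ratio of the trace network, used as a first warning sign of over-connection. The threshold below is purely illustrative; a sensible value depends on your trace network.

```python
# Toy trace network: vertices are trace IDs, edges are connections.
vertices = {"day", "colleagues", "office", "lunch"}
edges = [("day", "colleagues"), ("day", "office"),
         ("day", "lunch"), ("colleagues", "office")]

ratio = len(edges) / len(vertices)
print(f"edge/vertex ratio: {ratio:.2f}")

MAX_RATIO = 3.0  # illustrative budget for network density
if ratio > MAX_RATIO:
    print("Trace network getting dense - consider pruning optional links")
```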
The guiding principle is to know just enough about your customers to complete the journeys you are offering through AI. Your agent doesn’t need to be their best friend. If most of the data sits in your existing CRM as structured data, why do you need a separate ‘customer memory’? Structured data is a precise trace; consider extending it with specific attributes (which you may use LLMs to extract from a conversation and populate).