The focus in AI is all about Machine Learning – training larger models with more data and then showcasing how that model performs on tests. This is testing two main aspects of the model:
- Ability to recall relevant information.
- To organize and present the relevant information so that it resembles human generated content.
But no one is talking about the ability to forget information and to ignore information that is not relevant or out of date.
Reasons to Forget
Forgetting is a natural function of learning. It allows us to learn new knowledge, connect new knowledge with existing (reinforcement) and deal with the ever increasing flood of information as we grow older.
For Machine Learning models this is a critical requirement. This will allow models to keep learning without it taking more time and effort (energy) as the body of available knowledge grows. This will also allow models to build their knowledge in specific areas in an incremental way.
ChatGPT has a time horizon of September 2021 and has big gaps in its knowledge. Imagine being 1.5 years behind in today’s day and age [see Image 1].
Reasons to Ignore
Machine learning models need to learn to ignore. This is something humans do naturally, using the context of the task to direct our attention and ignore the noise. For example, doctors, lawyers, accountants need to focus on the latest available information in their field.
When we start to learn something new we take the same approach of focusing on specific items and ignoring everything else. Once we have mastered a topic we are able to understand why some items were ignored and what are the rules and risks of ignoring.
Current transformer models have an attention mechanism which does not tell us what to ignore and why. The ‘why’ part is very important because it adds a level of explainability to the output. For example the model can end up paying more attention to the facts that are incorrect or no longer relevant (e.g. treatments, laws, rules) because of larger presence in the training data. If it was able to describe why it ignored related but less repeated facts (or the opposite – ignore repeated facts in favor of unique ones – see Image 2 below) then we can build specific re-training regimes and knowledge on-boarding frameworks. This can be thought of as choosing between ‘exploration’ and ‘exploitation’ of facts.
Image 2: ChatGPT giving wrong information about Queen Elizabeth II, which is expected given its limitations (see Image 1), as she passed away in 2022.
Image 3: Asking for references used and ChatGPT responding with a few valid references (all accessed on the day of writing the post as part of the request to ChatGPT). Surely, one of those links would be updated with the facts beyond 2021. Let us investigate the second link.
Image 4: Accessing the second link we find updated information about Queen Elizabeth II that she passed away in 2022. If I was not aware of this fact I might have taken the references at face value and used the output. ChatGPT should have ignored what it knew from its training in favor of newer information. But it was not able to do that.
What Does It Mean For Generative AI?
For agile applications where LLMs and other generative models are ubiquitous we will need to allow these models to re-train as and when required. For that to happen we will need a mechanism for the model to forget and learn in a way that builds on what was learnt before. We will also need the model to learn to ignore information.
With ML Models we will also need a way of validating/confirming that the right items have been forgotten. This points to regular certification of generative models especially those being used in regulated verticals such as finance, insurance , healthcare and telecoms. This is similar to certification regimes in place for financial advisors, doctors etc.
Looking back at this article – it is simply amazing that we are talking about Generative AI in the same context as human cognition and reasoning!