Risks in agentic systems are underappreciated. These risks are layered, with different sources and therefore different owners. Materialised risks can be thought of as the fruit of the risk tree.

The Model
The model determines the first level of risk; it is the root of the tree. Model providers often describe knowledge and behavioural risk parameters for their models.
Knowledge risk covers the model not having the right knowledge (e.g., it is used for advanced data analysis but is bad at maths). Behavioural risk comes from task misalignment (e.g., using a generic model for a specific task).
Risk Owner: Development team selecting the model.
Agent
The way we interact with the model adds the second layer of risk. The key decisions include how the agent reasons and decides, where we use self-reflection, and what guardrails we put in place. Agents with complex internal architectures produce behaviours that are difficult to debug and issues that are hard to detect.
Risk Owner: Development team selecting the architecture.
Multi-agents
As we start to connect agents together we add the third layer of risk. Chain effects start to creep in: individual agents may behave as required, but the sequence of interactions leads to misalignment. If the interactions are dynamic we can end up with behaviour that is not only unpredictable but also impossible to reproduce.
Risk Owner: Development team creating the multi-agent system.
Data
Data is the one big variable and therefore adds the fourth layer of risk. The system may work well with some samples of data but not with others, or data drift may cause the system to become unstable. This level of variability can be quite difficult to detect.
Risk Owner: Data owner – you must know if the data you own is suitable for a given project.
Use-case
The final layer of risk. If the use-case is open-ended (e.g., a customer support agent) the risk is higher because it is difficult to create tests for all eventualities.
Given that the functionality is defined mostly using text (prompts) and less using code, we cannot have an objective ‘test coverage’ figure for the use-case.
We can use all kinds of inputs as tests, but there is no way to quantify the level of coverage (except for the narrow class of use-cases where outputs can be easily validated).
Risk Owner: Use-case owner – you must know what you are putting in front of the users and how you can make it easier for the good actors and more difficult for the bad actors.
Examples
Let us explore the risk tree with a real example using Google’s Agent Development Kit and Gemini 2.0 Flash.
The use-case is a simple Gen AI app – we are building a set of maths tools for the four basic operations, which the LLM will use to answer questions. This problem allows us to address the following layers of the tree: Use-case, Data, and Agent (single agent).
The twist we add is that the four basic operations are restricted to integer inputs and outputs. The integer restriction gives us a governance rule that can be objectively evaluated.
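Because the rule is objective, it can be checked deterministically in code. Here is a minimal sketch of such a check (my own illustration, not part of the original build; the function name is hypothetical):
def violates_integer_rule(value) -> bool:
    """Return True if a value breaks the integers-only governance rule."""
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, bool) or not isinstance(value, int)

# The rule flags a float result even when the inputs were integers
assert not violates_integer_rule(5 + 2)   # 7 is an integer – allowed
assert violates_integer_rule(5 / 2)       # 2.5 is a float – flagged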
Version 1
Prompt v1: “you are a helpful agent that can perform basic math operations”
Python Tool v1:
def add(x: int, y: int) -> int:
    """Add two numbers"""
    print("Adding:", x, y)
    return x + y
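It is worth noting that Python does not enforce type hints at runtime, so the integer annotations above are advisory only – they guide the LLM (and the reader) but do not constrain the code. For example:
# Type hints are not checked at runtime – this call succeeds and returns a float
result = add(2.5, 3)   # prints: Adding: 2.5 3
print(result)          # 5.5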
Output:
[Screenshot: chat transcript – user messages on a blue background, agent responses on grey]
In the above output the blue background is the human user and the grey is the agent responding. We see that the explicit type hint (integer) provided in the tool definition is easily overcome by some simple rephrasing – ‘what could it be?’.
The LLM answers this without using the tool (we can trace tool use), thereby not only disregarding the tool definition but also using its own knowledge to answer (which is a big risk!).
Issues Discovered: the LLM not obeying the tool guardrails and losing its (weak) grounding – risk at the Data and Use-case levels.
Version 2
Prompt v2: “you are a helpful agent that can perform basic math operations as provided by the functions using only integers. Do not imagine.”
Python tool remains the same.
Output:
[Screenshot: chat transcript]
We see that with these changes to the prompt it refuses to solve a question that does not have integer parameters (e.g., 14.4 + 3). But if you try a division problem (e.g., 5 / 2) it does return a float response! Once again it ignores the tool definition, which clearly states ‘integer’ as the return type. Not only that, with some confrontational prompting we can get it to say all kinds of incorrect things about the tool.
Issues Discovered: Firstly, the tool does not return a dictionary, as can be clearly seen in its definition; this is probably the agent framework causing issues, where the internal plumbing may be using a dictionary (see the sketch below). This is risk at the Agent (architecture) level.
Secondly, with confrontational prompting we can break the grounding, especially because with agents and increased looping certain messages can get reinforced without too much effort. Once a ‘thought’ is part of a conversation it can easily get amplified.
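If the framework’s plumbing does prefer dictionaries, one defensive option is to return an explicit, structured dictionary from the tool rather than relying on any implicit wrapping. This is a sketch of that pattern, not the code used above, and the exact ADK behaviour should be checked against its documentation:
def add(x: int, y: int) -> dict:
    """Add two integers and return a structured result."""
    print("Adding:", x, y)
    # An explicit dictionary removes ambiguity about what the framework hands back
    return {"status": "success", "result": x + y}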
Version 3
Prompt remains the same.
Python Tool v2:
def divide(x: int, y: int) -> int:
    """Divide two numbers and return an integer"""
    print("Dividing:", x, y)
    if y == 0:
        raise ValueError("Cannot divide by zero")
    # Note: Python's / always produces a float, so the 'int' hint is not enforced here
    return x / y
We change the description of the tool (which is used by the LLM) to explicitly add the guidance – ‘return an integer’.
Output:
[Screenshot: chat transcript]
Even with additional protection at the use-case level (Versions 2 and 3) we can still get the Agent to break the guardrails.
At first we give it an aligned problem for divide – integer inputs and result. Everything works.
Next we give it integer inputs whose result is not an integer. It executes the divide, checks the result, then refuses to give us the result as it is not an integer. This is an example of the partial information problem: the agent does not know whether the guardrails are violated until it has done the task. And this is not a theoretical problem – the same issue can come up whenever the agent engages external systems that return any kind of data back to it (e.g., an API call, a data lookup). The agent’s response in that respect is not predictable beforehand.
The agent in this example runs the divide function, finds the result, and this time, instead of blocking it, shares the result! This clearly breaks the guidelines established in the prompt and the tool, but there was no way we could have predicted that beforehand.
Issues Discovered: This time a combination of the Agent, Data, and Use-case risks is clearly visible. Plugging existing gaps can create new gaps. Finally, when new information is brought in, it is not possible to predict the agent’s behaviour beforehand.
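One way to mitigate the partial information problem is to enforce the integer rule in code after the tool executes, so the blocking decision never depends on the LLM. A minimal sketch (my own illustration, not the tool used above):
def guarded_divide(x: int, y: int) -> int:
    """Divide two integers, refusing to release a non-integer result."""
    print("Dividing:", x, y)
    if y == 0:
        raise ValueError("Cannot divide by zero")
    result = x / y
    # Deterministic guardrail: block non-integer results instead of relying on the LLM
    if result != int(result):
        raise ValueError("Result is not an integer; refusing to return it")
    return int(result)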
Results
We can see that even in such a simple example it is easy to break governance with confrontational prompting. Therefore, we must do everything we can to educate the good actors and block the bad actors.
I will summarise the results as a set of statements…
Statement 1: Agentic AI (and multi-agent systems) will not get it right 100% of the time – there will be negative outcomes and frustrated customers.
Statement 2: The Government and Regulators will need to change their outlook, and organisations will need to change their risk appetite, as things have changed compared to the days of machine learning.
Statement 3: The customers must be educated to ‘trust but verify’. This will help improve the outcomes for the good actors.
Statement 4: Automated and manual monitoring of all Gen AI inputs and outputs (100% monitoring coverage) – to block the bad actors.
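As a concrete (and deliberately simplified) illustration of Statement 4, tool inputs and outputs can be logged automatically by wrapping each tool before it is registered with the agent. This is a sketch of the idea, not a production monitoring design:
import functools
import logging

logging.basicConfig(level=logging.INFO)

def monitored(tool):
    """Wrap a tool so that every call, its inputs, and its output are logged for review."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        logging.info("Tool call: %s args=%s kwargs=%s", tool.__name__, args, kwargs)
        result = tool(*args, **kwargs)
        logging.info("Tool result: %s -> %r", tool.__name__, result)
        return result
    return wrapper

# Usage: register monitored(add), monitored(divide), etc. with the agent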
Code
Google’s ADK makes it easy to create agents. If you want the code just drop a comment and I will share it with you.
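For context, here is a minimal sketch of how such an agent can be wired up with ADK, following the standard quickstart pattern (the agent name is my own, and only add and divide are shown in this post – subtract and multiply would follow the same shape):
from google.adk.agents import Agent

root_agent = Agent(
    name="maths_agent",                # hypothetical name for this sketch
    model="gemini-2.0-flash",
    description="Agent that performs the four basic maths operations on integers.",
    instruction=(
        "you are a helpful agent that can perform basic math operations "
        "as provided by the functions using only integers. Do not imagine."
    ),
    tools=[add, divide],               # plus subtract and multiply in the full build
)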