Behaviours in Generative AI (Part 1)

While this may seem like a post about functions and testing in Python, it is not. I need to establish some concepts before we can introduce Generative AI.

When we write a software function we expect it to be ‘well-behaved’. I define a well-behaved function as one that is testable, and a testable function needs stable behaviour so that it produces consistent outputs.

Tests provide some confidence for us to integrate and deploy a given function in production.

If the function’s behaviour is inconsistent, producing different outputs for the same input, then it becomes a lot harder to test.

To explain this in a simple way, imagine you have a function that accepts a single integer parameter x, adds ‘1’ to it, and returns the result (y) as an integer.

In Python we could write this function as:

def add_one(x: int) -> int:
    y = x + 1
    return y

Now the above function is easily testable based on the stated requirements for add_one. We can, for example, use assert statements to compare the actual function output with the expected output. This allows us to make guarantees about the behaviour of the function in the ‘wild’ (in production).

def test_add_one() -> bool:
    assert add_one(10) == 11
    assert add_one(-1) == 0
    return True
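
If every assertion holds, test_add_one returns True; if add_one deviates from the expected behaviour, Python raises an AssertionError. A minimal way to run the test directly (a test runner such as pytest would also discover a function named like this) is:

if __name__ == "__main__":
    test_add_one()
    print("All add_one assertions passed.")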

Introducing and Detecting Bad Behaviour

Bad behaviour involves (as per our definition) inconsistency in the input-output pairing. It can be introduced in two ways:

  • Evolve the function
  • Introduce randomness

Introduce Randomness

Let us investigate the second option first, as it is easier to demonstrate. We will modify add_one by adding a rounded random number to the result. The impact this has is subtle (try the code): the result is as expected some of the time. Our existing tests may still pass occasionally, but there will be failures, and this makes add_one much harder to test. The frequency of inconsistent output depends on how the randomness is introduced within the function. Given the implementation below, each assertion fails roughly 50% of the time (figure out why).

import random

def add_one(x: int) -> int:
    y = x + 1 + round(random.random())
    return y
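
A quick empirical check makes the failure rate concrete (a sketch; the sample size and test value are arbitrary). round(random.random()) is 1 whenever the random draw is greater than 0.5, which happens about half the time:

# Count how often the randomised add_one deviates from the documented x + 1.
mismatches = sum(add_one(10) != 11 for _ in range(10_000))
print(f"Mismatch rate: {mismatches / 10_000:.1%}")  # roughly 50%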

Evolve the Function

Assume we have a rogue developer who keeps changing the code of the add_one function without updating the documentation or the function signature. For example, the developer could change the operation from addition to subtraction without touching the function name, comments, or associated documentation.
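
For illustration, such an evolved version might look like this (a hypothetical change, with the name and signature kept intact while the behaviour drifts):

def add_one(x: int) -> int:
    # Same name, same signature, same documentation, but a different operation.
    y = x - 1
    return y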

Testing Our Example

Given that our function is a single mathematical operation with one input and one output, we can objectively verify its results. The inconsistent behaviour introduced by randomness, or by the rogue developer, will therefore be caught by our tests before the code is deployed.

Testing Functions with Complex Inputs and Outputs

Imagine if the function was processing and/or producing unstructured/semi-structured data.

Say it was taking a string and returning another string, or returning an image of the string written in cursive, or the spoken version of the string as an audio file (I hope the connection with Gen AI is becoming clearer!). Below is an example of a summarising function that takes in some text (string) and returns its summary (string).

def summarise_text(input_text: str) -> str:
    return model.generate([{"role": "user", "content": f"Summarise: {input_text}"}])
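
Here model is assumed to be a pre-configured generative model client exposing a chat-style generate() method; the exact API shown is illustrative rather than that of a specific library.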

Such functions are difficult to test in an objective manner. Since tests need exact input-output pairs, any tests will only help us validate the function within the narrow confines of the chosen test inputs.
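
The best we can usually do is assert properties of the output rather than exact values. A rough sketch of what such a test might look like (the sample text, length check, and keyword below are made up for illustration):

def test_summarise_text() -> bool:
    text = "The quick brown fox jumps over the lazy dog. " * 20
    summary = summarise_text(text)
    assert isinstance(summary, str)
    assert len(summary) < len(text)   # a summary should be shorter than its input
    assert "fox" in summary.lower()   # and hopefully retain a key entity
    return True

Note that even a perfectly good summary could fail the keyword check, which shows how weak such property-based assertions are.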

Therefore, in the above case we may not catch changes made to such a complex function (whether through the addition of randomness or through function evolution), especially if the incorrect behaviour surfaces only for certain inputs.

Putting such a component into production therefore presents a different kind of challenge.

The Human Brain: The Ultimate Evolving Function

The human brain is the ultimate evolving function.

It takes all the inputs it receives, absorbs them selectively and changes the way it works – this is how we learn. The impressive thing is that as we learn new things we do not forget what we learnt before – the evolution is mostly constructive. For example, learning Math doesn’t mean we forget how to write English or ride a bicycle.

To mimic this, our add_one function should be able to evolve and learn new tricks – for example, how to deal with adding one to a complex number or, for that matter, adding one to anything. A generic signature for such a function would be:

def add_one(a: Any) -> Any:

It may surprise you to know that humans can ‘add_one’ quite easily to a wide range of inputs. Beyond mathematics we can (see the sketch after this list):

  • add one object to a set of objects (e.g., marbles or toys or sweets)
  • add one time-period to a date or time
  • add one more spoon of sugar to the cake mix
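
In Python terms, one rough way to sketch such a type-flexible add_one is with functools.singledispatch. This is an illustration of the idea rather than a serious implementation, and the handlers below are my own examples:

from datetime import date, timedelta
from functools import singledispatch
from typing import Any

@singledispatch
def add_one(a: Any) -> Any:
    # Fallback for input types the function has not yet 'learnt' to handle.
    raise NotImplementedError(f"Don't yet know how to add one to {type(a).__name__}")

@add_one.register
def _(a: int) -> int:
    return a + 1

@add_one.register
def _(a: complex) -> complex:
    return a + 1

@add_one.register
def _(a: date) -> date:
    return a + timedelta(days=1)  # add one time-period to a date

@add_one.register
def _(a: list) -> list:
    return a + [object()]  # add one more object to a collection

Each newly registered handler is a new trick the function has learnt without breaking the behaviour it already had – evolution that is mostly constructive.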

Conclusion

So in this part of the series I have shown how well-behaved functions can be made to misbehave, either by changing the function internals or by introducing randomness.

Furthermore, the input and output types also have an impact on how we identify whether a given function is well-behaved or not. Operations that give objective results or cases where the expected output can be calculated independently are easy to validate.

The deployment of such hard-to-validate functions into production presents a significant challenge.

Generative AI models show exactly the same characteristics as misbehaving functions.

All Generative AI models can be cast as functions (see the next post in this series). The source of their misbehaviour comes from randomness as well as evolution. They do not evolve like our brains (through continuous learning) or through the actions of a rogue developer; they evolve every time they are re-trained and a newer version is released (e.g., ChatGPT-4 after ChatGPT-3.5).
