Understanding ChatGPT

We know GPT stands for Generative Pre-trained Transformer. But what does ‘Chat’ mean in ChatGPT, and how is ChatGPT different from GPT-3.5, the OpenAI large language model?

And the really interesting question for me: Why doesn’t ChatGPT say ‘Hi’?

The Chat in ChatGPT

Any automated chat product must have the following capabilities to do useful work:

  1. Understand the entity (who), the context (background of the interaction), the intent (what do they want) and, if required, the user sentiment.
  2. Trigger action/workflow to retrieve data for response or to carry out some task.
  3. Generate appropriate response incorporating retrieved data/task result.

The first step is called Natural Language Understanding and the third step is called Natural Language Generation. In traditional systems the language generation part usually involves mapping the results from step (1) to a particular pre-written response template. If the response is instead generated on the fly, without using pre-written responses, then the model is called a generative AI model, as it is generating language.
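To make the contrast concrete, here is a toy sketch of the traditional, template-based approach; the intent names, entities and templates are made up for illustration and not taken from any particular product.

```python
# Toy sketch of a traditional (non-generative) chatbot response step:
# the NLU output (intent + entities) is mapped to a pre-written template.

RESPONSE_TEMPLATES = {
    "check_balance": "Hi {name}, your current balance is {balance}.",
    "opening_hours": "We are open Monday to Friday, 9am to 5pm.",
}

def generate_response(nlu_result: dict) -> str:
    """Map the understood intent to a canned template and fill in the entities."""
    template = RESPONSE_TEMPLATES.get(
        nlu_result["intent"], "Sorry, I didn't understand that."
    )
    return template.format(**nlu_result.get("entities", {}))

print(generate_response({"intent": "check_balance",
                         "entities": {"name": "Asha", "balance": "£120.50"}}))
```

A generative model replaces the template lookup with text generated on the fly, which is what the rest of this post is about.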

ChatGPT is able to do both (1) and (3) and can be considered generative AI, as it does not depend on canned responses. It is also capable of generating a wide variety of correct responses to the same question.

With generative AI we cannot be 100% sure about the generated response. This is not a trivial issue: for example, we may not want the system to generate different terms and conditions in different interactions. On the other hand, we would like it to show some ‘creativity’ in general conversation to make the whole interaction feel life-like. This is similar to a human agent reading off a fixed script (a mapping) versus being allowed to give their own responses.
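The usual knob for trading this determinism against ‘creativity’ in generative models is the sampling temperature. Below is a toy sketch of temperature-based sampling over a model's output scores; it illustrates the general idea only and is not a description of ChatGPT's actual decoding setup.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0):
    """Pick the next token id from the model's scores.

    temperature -> 0 : always pick the highest-scoring token (fixed, script-like answers)
    temperature  > 0 : sample from the distribution, allowing varied ('creative') answers
    """
    logits = np.asarray(logits, dtype=float)
    if temperature <= 1e-6:                      # effectively deterministic
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

scores = [2.0, 1.5, 0.3]                         # toy scores for three candidate tokens
print([sample_next_token(scores, 0.0) for _ in range(5)])   # always the same token
print([sample_next_token(scores, 1.2) for _ in range(5)])   # varies from run to run
```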

Another important point specific to ChatGPT is that, unlike an automated chat product, it does not have access to any back-end systems to do ‘useful’ work. All the knowledge it has (up to the year 2021) is stored within the 175-billion-parameter neural network model. There is no workflow or actuator layer (as yet) in ChatGPT that would allow it to send requests to external systems (e.g. Google Search) and incorporate the fetched data in the generated response.

Opening the Box

Let us now focus on ChatGPT specifics.

ChatGPT is a conversational AI model based on the GPT-3.5 large language model (as of writing this). A language model is an AI model that encapsulates the rules of a given language and is able to use those rules to carry out various tasks (e.g. text generation).

The term language can be understood as a means of expressing something by using a set of rules to assemble some finite set of tokens that make up the language. This applies to human language (expressing our thoughts by assembling letters), computer code (expressing software functionality by assembling keywords and variables) as well as protein structures (expressing biological behavior by assembling amino acids).

The term large refers to the number of parameters (175 billion) within the model, which are required to learn the rules. Think of the model as a sponge and the complex language rules as water. The more complex the rules, the bigger the sponge you need to soak them all up. If the sponge is too small then the rules will start to leak out and we won’t get an accurate model.

Now, a large language model (LLM) is the core of ChatGPT, but it is not the only thing. Remember our three capabilities above? The LLM is involved in step (3), but there is still step (1) to consider.

This is where the ChatGPT model comes in. The ChatGPT model is a fine-tuned model based on the GPT-3.5 LLM. In other words, we take the language rules captured by the GPT-3.5 model and fine-tune it (i.e. retrain a part of the model) to be able to answer questions. So ChatGPT is not a chat platform (as defined by the capability to do steps 1-3 above) but a platform that can respond to a prompt in a human-like way without resorting to a library of canned responses.

Why do I say ‘respond to a prompt’? Did you notice that ChatGPT doesn’t greet you? It doesn’t know when you have logged in and are ready to go, unlike a conventional chatbot that chirps up with a greeting. It doesn’t initiate a conversation; instead it waits for a prompt (i.e. for you to seed the dialog with a question or a task). See examples of some prompts in Figure 1.

Figure 1: ChatGPT example prompts, capabilities and limitations. Source [https://chat.openai.com/chat]

This concept of needing a prompt is an important clue to how ChatGPT was fine-tuned from the GPT-3.5 base model.

Fine Tuning GPT-3.5 for Prompts

As the first step, GPT-3.5 is fine-tuned using supervised learning on prompts sampled from a prompt database. This is quite a time-consuming process because, while we may have a large collection of prompts (e.g. https://github.com/f/awesome-chatgpt-prompts) and a model capable of generating a response based on a prompt, it is not easy to measure the quality of the response except in the simplest of cases (e.g. factual answers).

For example, if the prompt was ‘Tell me about the city of Paris’ then we have to ensure that the facts are correct (e.g. Paris is the capital of France) and that their presentation is clear. Furthermore we have to ensure correct grouping and flow within the text. It is also important to detect where opinion is presented as fact (hence the second limitation in Figure 1).

Human in the Loop

The easiest way to do this is to get a human to write the desired response to a prompt sampled from the prompt dataset, based on model-generated suggestions. This output, when formatted into a dialog structure (see Figure 2), provides labelled data for fine-tuning the GPT-3.5 model using supervised learning. This basically teaches GPT-3.5 what a dialog is.

Figure 2: Casting text as a set of dialogs.
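A rough sketch of what such human-written demonstrations might look like once cast into a dialog structure for supervised fine-tuning is shown below. The field names and role markers are illustrative assumptions, not OpenAI's actual training format.

```python
# Illustrative (prompt, completion) pairs in a dialog structure.
# The model is trained, with supervised learning, to produce the
# human labeller's completion given the prompt.

demonstrations = [
    {
        "prompt": "User: Tell me about the city of Paris.\nAssistant:",
        "completion": " Paris is the capital of France and its largest city...",
    },
    {
        "prompt": "User: Explain quantum computing in simple terms.\nAssistant:",
        "completion": " Quantum computing uses quantum bits, or qubits, which...",
    },
]
```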

But this is not the end of Human in the Loop. To ensure that the model can self-learn, a reward model is built. This reward model is built by taking a prompt and a few generated outputs (from the fine-tuned model) and asking a human to rank them in order of quality.

This labelled data is then used to create the reward function. Reward functions are found in Reinforcement Learning (RL) systems, which also allow self-learning, so there must be some RL going on in ChatGPT training.
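Reward models of this kind are commonly trained with a pairwise ranking objective: for each human comparison, the model should score the preferred response above the rejected one. A minimal sketch of such a loss, assuming the standard log-sigmoid formulation (not necessarily OpenAI's exact objective):

```python
import numpy as np

def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """Loss that pushes the reward model to score the human-preferred
    response above the rejected one (log-sigmoid of the score gap)."""
    return -np.log(1.0 / (1.0 + np.exp(-(reward_preferred - reward_rejected))))

# One human comparison: response A was ranked above response B for the same prompt.
print(pairwise_ranking_loss(reward_preferred=1.8, reward_rejected=0.4))  # small loss
print(pairwise_ranking_loss(reward_preferred=0.2, reward_rejected=1.1))  # large loss
```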

Reinforcement Learning: How ChatGPT Self-Learns

ChatGPT uses the Proximal Policy Optimization (PPO) RL algorithm (https://openai.com/blog/openai-baselines-ppo/#ppo) in a game-playing setting to further fine-tune the model. The action is the generated output and the reward comes from the reward function described above. Using this iterative process the model can be continuously fine-tuned using simple feedback (e.g. the ‘like’ and ‘dislike’ buttons that come up next to the response). This is very much the wisdom of the masses being used to direct the evolution of the model. It is not clear, though, how much of this feedback is reflected back into the model; given the public-facing nature of the model you would want to carefully monitor any feedback that is incorporated into the training.
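Put together, the loop looks roughly like the sketch below. Every function here is an illustrative stand-in; the real PPO update, with its clipped objective and KL penalty against the original model, is far more involved.

```python
import random

def sample_prompt():
    return random.choice(["Explain gravity.", "Write a haiku about rain."])

def policy_generate(prompt):
    return prompt + " ... generated response ..."    # stand-in for the fine-tuned LLM

def reward_model(prompt, response):
    return random.uniform(0.0, 1.0)                  # stand-in for the learned reward function

def ppo_update(prompt, response, reward):
    pass                                             # stand-in for the policy-gradient step

for step in range(3):
    prompt = sample_prompt()                  # the prompt / dialog so far
    response = policy_generate(prompt)        # action: the generated output
    reward = reward_model(prompt, response)   # reward from the learned reward function
    ppo_update(prompt, response, reward)      # nudge the policy towards higher-reward text
```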

What is the ChatGPT Model Doing?

By now it should be clear that ChatGPT is not chatting at all. It is filling in the next bit of text in a dialog. This process starts from the prompt (seed) that the user provides, and this can be seen in the way it is fine-tuned.

ChatGPT responds based on tokens. A token is (as per my understanding) a chunk of roughly four characters on average and can be a full word or part of one. It can create text consisting of up to 2048 tokens (which is a lot of text!).
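If you want to see how a piece of text actually breaks into tokens, OpenAI's tiktoken library can be used. The encoding chosen below ("gpt2") is just for illustration and may not match the tokenizer ChatGPT itself uses.

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # illustrative choice of encoding

text = "Why doesn't ChatGPT say Hi?"
tokens = enc.encode(text)
print(tokens)                                # the token ids
print([enc.decode([t]) for t in tokens])     # the text piece each token covers
```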

Figure 3: Generating response as a dialog.

The procedure for generating a response (see Figure 3, and the sketch after this list) is:

  1. Start with the prompt
  2. Take the text so far (including the prompt) and process it to decide what goes next
  3. Add that to the existing text and check if we have encountered the end
  4. If yes, then stop; otherwise go to step 2
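A minimal sketch of this loop is below, with a stand-in next_token() function in place of the actual model.

```python
# Toy sketch of the generation loop. next_token() stands in for the model's
# "decide what goes next" step and is purely illustrative.

END_TOKEN = "<end>"
MAX_TOKENS = 2048

def next_token(text_so_far):
    # Stand-in for the model: in reality this is a forward pass over the
    # whole dialog so far, returning the most likely next token.
    return END_TOKEN if len(text_so_far.split()) > 5 else "word"

def generate(prompt):
    text = prompt                              # 1. start with the prompt
    for _ in range(MAX_TOKENS):
        token = next_token(text)               # 2. decide what goes next
        if token == END_TOKEN:                 # 3./4. stop if we hit the end
            break
        text += " " + token                    # 3. otherwise add it to the existing text
    return text

print(generate("Tell me about Paris."))
```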

This allows us to answer the question: why doesn’t ChatGPT say ‘Hi’?

Because if it seeded the conversation with some type of greeting then we would be bounding the conversation trajectory. Imagine starting with the same block in Figure 3: we would soon find that the model starts going down a few select paths.

ChatGPT confirms this for us:

I hope you have enjoyed this short journey inside ChatGPT.

Sampling and Time-series Forecasting

In the previous post we built a simple time-series forecasting model using a custom neural network as well as a SARIMAX model from the Statsmodels TSA library. The data we used was monthly house sales transaction counts. The training data ran from 1995 to 2015 and the test data from 2016 to the end of 2022.

But only about 340 data points are available (Figure 1a) and the data is quite spiky (see Figures 1a/1b). From Figure 1b we can see that the Test data distribution differs from the Training data; note the outliers in the Test data histogram. The distribution plot was produced using Seaborn (using the .distplot() method).

Figure 1a: Monthly transaction counts – showing test/train split.
Figure 1b: Density plots for Training (blue) and Test (orange) data.
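For reference, a plot like Figure 1b could be produced along the lines below; the file name and column names are placeholders, not the actual data set used in the post.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder file and column names for illustration.
df = pd.read_csv("monthly_transactions.csv", parse_dates=["month"])

train = df[df["month"] < "2016-01-01"]["transaction_count"]
test = df[df["month"] >= "2016-01-01"]["transaction_count"]

sns.distplot(train, label="Training (1995-2015)")   # .distplot() as used in the post
sns.distplot(test, label="Test (2016-2022)")        # (newer Seaborn prefers displot/histplot)
plt.legend()
plt.show()
```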

Any model trained on this data set is unlikely to perform well, especially for the period after 2016 (see Figure 1), due to the sharp rise and fall there. In fact this is the time period we would use to validate our model before attempting to forecast beyond the available time horizon.

We can see from the performance of the Neural Network (NN) trained on the available data (Figure 2, Bottom) that in the areas where the data is spiky (orange line) the trained network (blue line) doesn’t quite reach the peaks (see boxes).

Figure 2 Top: Neural network trained using over sampled data; Bottom: Neural network trained using available data.

Similarly, if we oversample from the available data, especially around the spiky regions, the performance improves (Figure 2, Top). We can see the predicted spikes (blue line) are a lot closer to the actual spikes in the data (orange line). If they seem ‘one step ahead’ it is because this is a ‘toy’ neural network model which has learnt to follow the last observed value.

As an aside, we can see the custom Neural Network tracks SARIMAX quite nicely, except that it is not able to model the seasonal fluctuation.

More sophisticated methods such as RNNs will produce very different output. We can see in Figure 3 how RNNs model the transaction data. This is just modelling the data, not doing any forecasting yet. The red line in Figure 3 is a custom RNN implementation and the orange line is the TensorFlow RNN implementation.
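For completeness, a toy sketch of fitting a small RNN to a monthly series with TensorFlow/Keras follows. The window length, layer size and stand-in data are arbitrary illustrative choices, not the configuration behind Figure 3.

```python
import numpy as np
import tensorflow as tf

WINDOW = 12  # use the previous 12 months to predict the next one

def make_windows(series, window=WINDOW):
    """Slice a 1-D series into (samples, timesteps, features) windows and targets."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X[..., np.newaxis], y

# Stand-in series roughly the same length as the transaction data (~340 points).
series = np.sin(np.linspace(0, 20, 340)) + np.random.normal(0, 0.1, 340)
X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, input_shape=(WINDOW, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)
```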

Sampling

To understand why oversampling has this effect, let us look at how we sample.

Figure 4 shows this sampling process at work. We normalise the data and take a chunk of data around a central point. For this chunk we calculate the standard deviation. If the standard deviation of the chunk is more than 0.5, the central point is accepted into the sample.

As we decrease the chunk size we see fewer points are collected. The smaller the chunk size, the more variability is needed in the neighborhood for the central point to be sampled. For example, with a chunk size of two (which means two points on either side of the central point, see Figure 4), we find we are sampling from the areas with major changes in transaction counts.

Figure 4: Sample points (orange) collected from the Training data using different chunk sizes.
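A minimal sketch of this sampling rule, assuming a normalised series and the 0.5 standard-deviation threshold mentioned above:

```python
import numpy as np

def sample_spiky_points(series, chunk_size=2, threshold=0.5):
    """Collect central points whose neighborhood is 'spiky'.

    For each central point we take chunk_size points on either side and keep
    the central point if the chunk's standard deviation exceeds the threshold.
    """
    sampled = []
    for i in range(chunk_size, len(series) - chunk_size):
        chunk = series[i - chunk_size : i + chunk_size + 1]
        if np.std(chunk) > threshold:
            sampled.append(i)
    return sampled

# Toy normalised series: flat apart from a spike in the middle.
series = np.zeros(50)
series[24:27] = [1.5, 2.0, 1.2]
print(sample_spiky_points(series, chunk_size=2, threshold=0.5))
```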

The other way of getting around this is to use a generative AI system to create synthetic data that we can add to the sample set. The generative AI system would have to create both the time value (month-year) and the transaction count.