Web-server for LLMs

One thing that really bugs me when running larger LLMs locally is the load time for each model. The larger the on-disk size of the model, the longer it takes for the model to be ready to query.

One solution is to run the model in a Jupyter notebook so we load it once and then query as many times as we want in subsequent code blocks. But this is not ideal because we can still have issues that require us to restart the notebook. These issues usually have little to do with the Gen AI model itself (which in most use-cases is treated as a closed box).

Another requirement is to be able to run your application and your LLM on different machines/instances. This can be a common requirement if you want to create a scalable Gen-AI based app. This is the main reason we have services like the Google Model Garden that don’t require you to ‘host’ models and provide API-based access instead.

To get around this I developed a web-server using Python and Flask that can load any model accessible through the Hugging Face transformers library (using a plugin approach). This is mainly useful for testing and learning but can be made production-ready with little effort.

The code can be found here: https://github.com/amachwe/gen_ai_web_server

Key steps to wrap and set up your model web-server:

  1. Load the tokenizer and model using the correct class from the Hugging Face transformers Python library.
  2. Create the wrapped model using the selected LLM_Server_Wrapper and pass that to the server.
  3. Start the server.

The code for the above steps is available in the llm_server.py file in the GitHub repo linked above.
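
Here is a minimal sketch of those three steps. It is not the code from the repo: `SimpleWrapper` is a hypothetical stand-in for the LLM_Server_Wrapper plugins, and the `/generate` route, the `prompt_hint` field, the JSON shapes, the `gpt2` model name and the port are all assumptions made for illustration. Only the `/info` path comes from the server described here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleWrapper:
    """Hypothetical stand-in for one of the repo's LLM_Server_Wrapper plugins."""
    generate_fn: Callable[[str], str]  # maps a prompt to generated text
    prompt_hint: str = "{prompt}"      # assumed template that clients fill in

    def info(self) -> dict:
        # Prompting hints that a client can fetch before building a prompt.
        return {"prompt_hint": self.prompt_hint}

    def generate(self, prompt: str) -> dict:
        return {"response": self.generate_fn(prompt)}


def load_hf_generate_fn(model_name: str, max_new_tokens: int = 50) -> Callable[[str], str]:
    """Step 1: load the tokenizer and model once; return a function that reuses them."""
    # Imported here so the wrapper can be exercised without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return generate


def create_app(wrapper: SimpleWrapper):
    """Step 2: hand the wrapped model to a Flask app that exposes it over HTTP."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.get("/info")
    def info():
        return jsonify(wrapper.info())

    @app.post("/generate")  # assumed route name
    def generate():
        payload = request.get_json(force=True)
        return jsonify(wrapper.generate(payload["prompt"]))

    return app


def main() -> None:
    """Step 3: start the server. The model is loaded once, before the first request."""
    wrapper = SimpleWrapper(load_hf_generate_fn("gpt2"))  # model name is an assumption
    # Listen on all interfaces so an app on another machine can reach the model.
    create_app(wrapper).run(host="0.0.0.0", port=5000)
```

Calling `main()` loads the model and starts serving. Because the wrapper only needs a function from prompt to text, swapping in a different model means changing the loader, not the server.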

Will share further examples of how I have been using this…

… as promised I have now added a client to help you get started (see link below).

Simple Client

This client shows the power of abstracting the LLM away from its use. We can use the “/info” path to get the prompting hints from the wrapped model. This will help the client create the prompt properly. This is required because each model could have its own prompting style.
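
To make the idea concrete, here is a rough sketch of such a client using only the Python standard library. The "/info" path is the one mentioned above; the `prompt_hint` field, its `{prompt}` placeholder, the `/generate` route and the `response` field are assumptions for illustration, so check the repo for the real names.

```python
import json
from urllib import request


def fetch_info(base_url: str) -> dict:
    """Ask the server how the wrapped model expects to be prompted."""
    with request.urlopen(f"{base_url}/info") as resp:
        return json.load(resp)


def build_prompt(info: dict, question: str) -> str:
    """Fill the model-specific template; fall back to the bare question."""
    template = info.get("prompt_hint", "{prompt}")  # assumed field name
    return template.replace("{prompt}", question)


def ask(base_url: str, question: str) -> str:
    """Build a prompt from the server's hints and send it for generation."""
    prompt = build_prompt(fetch_info(base_url), question)
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    req = request.Request(
        f"{base_url}/generate",  # assumed route name
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]  # assumed response field
```

With a server running, `ask("http://localhost:5000", "What is Flask?")` would return the generated text. The client never needs to know which model sits behind the server, only how to read its hints.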

Indian General Elections (2024)

Numbers are really beautiful and can often illuminate hidden corners of a complex situation. But before we start, I want to congratulate the citizens of India on an amazing election during record-breaking heatwaves.

The result was a transition from Modi 2.0 to NDA 3.0 (previous two being under Mr Vajpayee) with the BJP emerging as the single largest party but falling 32 seats short of an absolute majority (240 seats against the 272 needed).

In many ways this is a good outcome: the BJP is on solid enough ground to anchor the coalition, while its clear dependency on the other members should encourage a more consultative approach to governance.

Figure 1: Comparing Total Parliamentary Seats in a state vs % of Seats won by NDA. Yellow line shows the half way mark (50% seats).

Back to the numbers…

I only want to show one graph to help explain why we have the given situation. Figure 1 above shows total seats in a State against the win % (in terms of number of seats won by the BJP).

To get an absolute majority, the states below the orange line (the lower envelope) had to be much closer to it or, in the case of Uttar Pradesh, above it.

What didn’t help was that the states in the centre and right of the graph (those with a larger number of seats in the Parliament) ended up below the yellow line (the 50% seat win line).

The states below or on the yellow line but above the orange line needed to be above the yellow line. For example, this time the BJP lost one seat in Jammu and Kashmir, which took the win % below the 50% line (2 out of 5 seats, or 40%) but still kept it above the orange line.

I want to add one more piece of information to the graph: the trend in voter turnout. In Figure 2 (below) we see that the red marked states (where voter turnout was lower than in 2019) are mostly above the 50% win rate (yellow) line. Those below the orange line are mostly green (voter turnout higher than in 2019).

Figure 2: Same plot with colour coded states based on voter turnout trend (green for increase, red for decrease, and black for no data).

My first hypothesis is that BJP voters were far more resilient and enthusiastic about voting.

My second hypothesis is that those supporting non-BJP candidates were not convinced about their chances in the face of the shock and awe campaign undertaken by the BJP.