Customizing Large Language Models for your Gen AI use case
- Andrew Patricio
- May 5
- 5 min read

When building a Gen AI tool, it’s important to understand how it works so that you know where to focus your development efforts. Large Language Models (LLMs) can be customized fairly easily, but the way to do so is different from coding up a web or mobile application or working with traditional AI/ML (Machine Learning).
In a previous post I explained that an LLM does one basic thing: predict the next word given a prompt. In this post I will give a high-level explanation of how an LLM does that magic and what customization techniques are available for each stage.
This is not an explanation of how an LLM is designed or implemented, nor is it a precise description of what is actually happening. Instead, I’m aiming for a working concept of what is happening under the hood that will help you understand where to focus your development efforts.
While there are a lot of differences between various LLMs, most of that comes down to their training sets. Even with different implementations, however, when you submit a prompt all LLMs do three things:
Contextualize the input
Calculate probabilities for what the next word will be
Make a pseudo random choice from the list of most probable words
There are different methods of customizing the LLM for your use case for each of these three parts. Let’s take them in turn.
Contextualization
Words in every language (but especially English) carry a lot of ambiguity. The word “set”, for example, has over 400 different meanings. When we read a sentence containing that word, we determine which of all those meanings is most relevant by looking at all the other words. For example:
She set
By itself, pretty ambiguous. It does give some meaning but nowhere near enough to be able to predict the next word. But see what happens when we add a few more words:
She set the ball to
She set the table for
She set herself against
The word “set” gets a different meaning “baked” into it from all the other words in each particular sentence.
Some words, like “the”, have almost no effect on the meaning of “set”; others, like “ball” and “table”, have a great deal of impact on determining which of its varied meanings is most relevant. With the additional context the other words provide to each other, we can much more effectively narrow down what the next word should be.
This is exactly what the LLM does first: It takes the entire input prompt and bakes context into each word based on all the other words.
So when you type in something like “Create a poem about dogs”, each one of those five words gets its meaning adjusted by the other four words in the prompt. This is a critical step because without the correct nuanced meaning of each word, it’s impossible to determine what the next word should be.
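To make this concrete, here is a toy sketch of the self-attention idea that transformer-based LLMs use for this contextualization step. The words, vector sizes, and random embeddings below are made up purely for illustration; real models use learned projections and stack many such layers.

```python
# Toy sketch of contextualization via self-attention.
# Each word starts as its own (made-up) vector, and ends up as a weighted
# blend of every word's vector, so "set" carries information from the
# words around it.
import numpy as np

words = ["She", "set", "the", "table", "for"]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(words), 8))  # one made-up vector per word

# Attention scores: how relevant each word is to each other word.
scores = embeddings @ embeddings.T / np.sqrt(embeddings.shape[1])
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax per row

# Each row of `contextualized` is that word's vector blended with all the others.
contextualized = weights @ embeddings

# How much each word contributes to the new, contextualized meaning of "set".
print(weights[words.index("set")].round(2))
```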
Next word likelihood calculation
Step two is what we usually think of as what a trained neural network does: it takes all the inputs and calculates a probability for which of its outputs is most likely.
In the case of Gen AI, every LLM is trained on an immense training set. A vocabulary containing every possible word (actually “tokens”, which are word-ish enough for this high-level explanation) is created from all that text, and the model is trained to output a probability for each one: specifically, the probability that that particular word is the next expected word given the input prompt.
In this second step, the LLM feeds all the contextualized words from the prompt into this model and calculates the next-word likelihood for every word in the vocabulary. So the output is one number for every word in the vocabulary.
It is important to recognize that this is a completely deterministic calculation. That may seem surprising: when you use a consumer tool like ChatGPT, typing in the same prompt multiple times will usually get you different responses.
However, that “creativity” aspect of LLM tools comes into play in the next step. This current step of determining next-word probabilities is just a straight-up calculation, and it will produce the same set of probabilities for every word in the vocabulary given the same input prompt.
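As a rough illustration, here is how raw model scores become one probability per vocabulary word. The tiny vocabulary and logits below are invented; a real LLM scores tens of thousands of tokens, but the mapping from a given input to its probabilities is the same every time.

```python
# Toy sketch of step two: turning raw model scores (logits) into one
# probability per word in the vocabulary via a softmax. The vocabulary and
# logits here are invented; the point is that this step is deterministic.
import numpy as np

vocab = ["dog", "cat", "poem", "table", "runs"]
logits = np.array([2.1, 0.3, 1.7, -0.5, 0.9])  # made-up raw scores from the network

probs = np.exp(logits) / np.exp(logits).sum()  # same input -> same probabilities

for word, p in sorted(zip(vocab, probs), key=lambda x: -x[1]):
    print(f"{word}: {p:.3f}")
```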
Random choice of expected word
In the final step, once a likelihood has been assigned to every word in the entire vocabulary, a choice has to be made as to which word to actually return to the user. A naive way to do this is to always return the most likely word, but that would make the LLM very mechanistic and less useful.
Instead of returning the single most probable word every time, a function is applied that makes a pseudo-random choice from the ordered list of likelihoods. That means a word with a higher probability is more likely to be returned, but it is not necessarily the word with the absolute highest probability.
There is usually a parameter called “temperature” or something similar that governs this. Adjust it one way and the pseudo-random choice is hardly random at all, drawing only from the very most likely words; turn it all the way to its limit and you will always get the single most likely word. Turn it too far in the other direction and less likely words start appearing, to the point where you may get gibberish.
Either way, it’s important to understand that the LLM is at its core deterministic, in that it calculates the probabilities deterministically; it is only this last step that makes it more “creative”.
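Here is a toy sketch of that temperature-controlled choice, continuing the made-up vocabulary and scores from before. It is only meant to show how the knob changes the behavior, not how any particular LLM implements it.

```python
# Toy sketch of step three: a pseudo-random draw from the next-word
# distribution, with a temperature knob. Low temperature sharpens the
# distribution toward the single most likely word; high temperature flattens
# it so less likely words start to appear.
import numpy as np

vocab = ["dog", "cat", "poem", "table", "runs"]
logits = np.array([2.1, 0.3, 1.7, -0.5, 0.9])  # same made-up scores as before

def sample_next_word(logits, temperature, rng):
    scaled = logits / temperature                    # temperature rescales the scores
    probs = np.exp(scaled) / np.exp(scaled).sum()    # softmax over the rescaled scores
    return str(rng.choice(vocab, p=probs))           # weighted pseudo-random pick

rng = np.random.default_rng(42)
for t in (0.1, 1.0, 2.0):
    picks = [sample_next_word(logits, t, rng) for _ in range(5)]
    print(f"temperature={t}: {picks}")
```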
Customization Techniques
So now that we have an understanding of what is happening under the hood, the question becomes: what is the best way to build out your tool for your particular use case?
For the first stage, the techniques involve either giving more information to the LLM so it generates a more accurate response or, in effect, “programming” (quotation marks used intentionally) it to behave a certain way. I will go through this in more detail in a future post because it’s really where you can get the biggest bang for the buck. If you’ve heard of “agentic Gen AI”, this is where that comes into play.
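As a minimal sketch of that first-stage idea, here is what “giving more information to the LLM” can look like in practice: the tool assembles instructions and reference material around the user’s question before anything reaches the model. The documents are hardcoded here; in a real tool they might come from a search or retrieval step.

```python
# Sketch of stage-one customization: shaping what the model sees by adding
# instructions and extra reference material to the user's question before it
# ever reaches the LLM.
def build_prompt(user_question: str, reference_docs: list[str]) -> str:
    instructions = "You are a support assistant. Answer only from the provided context."
    context = "\n".join(f"- {doc}" for doc in reference_docs)
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {user_question}"

# Hardcoded example documents; a real tool would look these up per question.
docs = [
    "Password resets are done from Settings > Security.",
    "Reset links expire after 24 hours.",
]
print(build_prompt("How do I reset my password?", docs))
```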
The second stage is the hardest to customize. This is where you get into more traditional machine learning techniques, and as a result it can get quite expensive. However, you can also use pre-trained models with characteristics that more closely match what you are looking for. In all cases, the customization has a much more subtle effect than in the first stage.
In the last stage, you are generally tuning the parameters of a pre-trained model. This customization requires a lot of experimentation, which in turn requires time and experience. The practical result is that you will rarely monkey with these parameters and will generally use the default settings.
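When you do want to adjust them, it usually amounts to passing a sampling parameter through whichever API you are using. As one example, assuming the OpenAI Python SDK (other providers expose similar knobs under slightly different names):

```python
# Sketch of stage-three tuning: passing a temperature value to a hosted model.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name is just an example.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Create a poem about dogs"}],
    temperature=0.2,  # low = close to deterministic; higher = more varied output
)
print(response.choices[0].message.content)
```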
To summarize, the main techniques for each stage are:
Contextualization: prompt engineering, that is, giving the model more information or “programming” its behavior through the prompt (including agentic approaches)
Next word likelihood: training or fine-tuning a model, or choosing a pre-trained model that better matches your use case
Random word choice: tuning sampling parameters such as temperature

At the end of the day, Gen AI is a revolution in the making precisely because these techniques don’t all require a high degree of technical skill, or even much expertise. You can obtain a fairly high ROI from experimentation alone.