How to fine-tune your own LLM
Learn how to fine-tune transformer-based language models using Hugging Face to create specialized AI for your specific use cases and domains.
This article is written by Andrei Danila, a Machine Learning Engineer.
Below is a step-by-step guide to fine-tuning a language model (in this case, GPT-2 or its variants) on a text dataset using Hugging Face’s transformers library and the datasets library.
We’ve also written a companion Colab notebook which can be accessed here.
Note: If you’d like a deeper explanation of how Transformers work, please refer to our article on ChatGPT and Transformers. We’ll keep this guide relatively high-level, focusing on the main steps you need to fine-tune a model.
Pretrained transformer models like GPT-2, BERT, and others come with a wealth of linguistic knowledge acquired from massive amounts of text data. However, these models are often general-purpose. If you want a model to focus on a specific style, topic, or domain, you can fine-tune it on a smaller dataset of text that is more relevant to your use case.
Fine-tuning:
Note: If you’re on Colab, click on Runtime at the top of the page, select Change runtime type, select T4 GPU, then press Save. This will significantly speed up training.

First, install the transformers and datasets libraries. These provide tools to load, process, and train state-of-the-art language models. Both libraries are created by Hugging Face, the industry leader in open-source ML infrastructure.

```
!pip install -qqq transformers datasets
```
Next, import the libraries you’ll need in your Python environment or Colab notebook. Importing them “brings” the relevant classes and functions into our notebook.
```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
```
Modern deep learning libraries can leverage GPUs for faster training. We can automatically detect if a GPU is available and use it. Otherwise, fall back to CPU.
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
```
Here are two important choices: which base model to fine-tune, and which dataset to train it on.
```python
model_name = "distilgpt2"   # Try: "gpt2", "gpt2-medium", etc.
dataset_name = "imdb"       # Could be: "yelp_polarity", "wiki40b", etc.
```
The load_dataset function from datasets makes it straightforward to load common datasets.
This is what our dataset looks like (from here on Hugging Face):

```python
dataset = load_dataset(dataset_name)
print(dataset['train'][0]['text'][:250])
```
Transformers deal with tokens—numeric representations of pieces of text. That is to say, they can’t deal with text directly, so we map each piece of text in our vocabulary (often a whole word, sometimes a fragment of a word) to a number (e.g. “dog” becomes 5). We use a Tokenizer to achieve this.
```python
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Some GPT-style models do not come with a pad token by default.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["label"])
```
Key points:
padding="max_length" and truncation=True: Ensures all sequences are the same length (here, 128 tokens).
GPT-style models have no pad token by default, so we reuse the end-of-sequence token for padding.
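To make tokenization, padding, and truncation concrete, here is a toy sketch. The vocabulary and ids below are made up for illustration; the real GPT-2 tokenizer uses byte-pair encoding over roughly 50,000 subword tokens.

```python
# Toy illustration of tokenization with fixed-length padding/truncation.
# This made-up vocabulary stands in for the real ~50k-token GPT-2 vocabulary.
vocab = {"<pad>": 0, "the": 1, "movie": 2, "was": 3, "great": 4, "awful": 5}

def toy_tokenize(text, max_length=4):
    """Map each word to an id, then pad or truncate to max_length."""
    ids = [vocab[word] for word in text.split()]
    ids = ids[:max_length]                              # truncation
    ids += [vocab["<pad>"]] * (max_length - len(ids))   # padding
    return ids

print(toy_tokenize("the movie was great"))        # [1, 2, 3, 4]
print(toy_tokenize("the movie"))                  # [1, 2, 0, 0]
print(toy_tokenize("the movie was great awful"))  # [1, 2, 3, 4]
```

Every output has the same length, which is what lets the GPU process many examples in one batch.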
We use a model class that’s designed for causal language modeling: AutoModelForCausalLM. The actual model loaded will be the one we specified above (e.g. distilgpt2). This is the model page for distilgpt2, from Hugging Face:

```python
model = AutoModelForCausalLM.from_pretrained(model_name)
model.to(device)
```
Moving to device ensures the model is on the GPU (if available).
It’s often useful to see what the model outputs before fine-tuning. Let’s provide a simple prompt to gauge the model’s initial response.
```python
prompt = "The movie was absolutely awful because"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=50, num_beams=5, no_repeat_ngram_size=2)
print(tokenizer.decode(outputs[0]))
```
This is a sample output:
The movie was absolutely awful because of the way it was made, and I don’t know if I’ll ever get to see it again, but I can’t wait for it to come out.
Not very good, as you can see.
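Under the hood, generate produces text one token at a time: at each step the model scores every token in the vocabulary, a token is chosen, and the process repeats. The sketch below shows the simplest strategy (greedy decoding) with a made-up bigram lookup table in place of a real transformer; with num_beams=5, the real call tracks the 5 most promising candidate sequences in parallel instead of just one.

```python
# Toy sketch of autoregressive generation with greedy decoding.
# The "model" here is a made-up bigram probability table, not a real network.
next_token = {
    "the":   {"movie": 0.6, "was": 0.4},
    "movie": {"was": 0.9, "the": 0.1},
    "was":   {"awful": 0.7, "the": 0.3},
    "awful": {"<eos>": 1.0},
}

def greedy_generate(prompt, max_length=6):
    tokens = prompt.split()
    while len(tokens) < max_length:
        probs = next_token.get(tokens[-1], {})
        if not probs:
            break
        best = max(probs, key=probs.get)  # greedy: always take the argmax
        if best == "<eos>":               # stop at the end-of-sequence token
            break
        tokens.append(best)
    return " ".join(tokens)

print(greedy_generate("the"))  # the movie was awful
```

The no_repeat_ngram_size=2 argument in the real call simply forbids the model from repeating any two-token sequence, which reduces the looping text that greedy and beam search are prone to.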
We specify how we want to train. This includes batch size, number of epochs, and where to save checkpoints.
```python
training_args = TrainingArguments(
    output_dir="./distilgpt2-finetuned-imdb",
    evaluation_strategy="epoch",   # Evaluate once every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    weight_decay=0.01,
    report_to="none"               # Turn off logging to external services
)
```
The Trainer class wraps the model, your training arguments, and the datasets together.
```python
from transformers import DataCollatorForLanguageModeling

# For causal language modeling, the collator copies the input ids into the
# labels so the Trainer can compute the next-token prediction loss.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    data_collator=data_collator
)
```
Run the actual fine-tuning process. Depending on the size of your dataset and the power of your GPU, this could take a while.
```python
trainer.train()
```
You’ll see a training loop with metrics like loss being printed out.
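The loss being printed is the average cross-entropy of next-token prediction: the negative log of the probability the model assigned to each token that actually came next. A toy calculation with made-up probabilities:

```python
import math

# Made-up probabilities the model assigned to the three tokens that
# actually appeared next in a tiny example sequence.
probs_of_correct_next_token = [0.5, 0.25, 0.8]

# Cross-entropy loss: average negative log-probability of the true tokens.
loss = -sum(math.log(p) for p in probs_of_correct_next_token) / len(probs_of_correct_next_token)
print(round(loss, 4))  # 0.7675
```

As training progresses, the model assigns higher probability to the correct next tokens, so this number should trend downward.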
After training finishes, you can evaluate the model on the test set to see how it’s performing.
```python
trainer.evaluate()
```
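trainer.evaluate() returns a dictionary that includes "eval_loss", the average cross-entropy on the test set. A common way to report language-model quality is perplexity, the exponential of that loss; the loss value below is a made-up example, not a real result from this run.

```python
import math

# Hypothetical eval loss, standing in for metrics["eval_loss"] from
# trainer.evaluate(). Lower loss means lower (better) perplexity.
eval_loss = 3.2
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 24.53
```

Intuitively, a perplexity of ~25 means the model is, on average, about as uncertain about the next token as if it were choosing uniformly among 25 options.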
Finally, let’s see how the model’s output might have changed after fine-tuning. We’ll use the same prompt as before:
```python
prompt = "The movie was absolutely awful because"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=50, num_beams=5, no_repeat_ngram_size=2)
print(tokenizer.decode(outputs[0]))
```
Compare this output to the pre-trained model’s response. Ideally, the fine-tuned model now has more relevant knowledge or style that you introduced during training (in this case, likely more knowledge of IMDB reviews).
Sample output:
The movie was absolutely awful because it was so bad. The acting was terrible, the plot was horrible, and the acting wasn’t as good as you’d expect from a movie like this. It was a waste of time and money to make this movie.
Much better than before we fine-tuned it!
That’s it! You’ve successfully fine-tuned a GPT-style language model with Hugging Face Transformers. Remember, this guide is flexible—feel free to swap out models, datasets, and parameters to fit your specific needs. If you want to dive deeper into how Transformers work, make sure to check out our article on ChatGPT and Transformers. Happy fine-tuning!